Why not extract the images from your legally owned PDFs yourself? If you have access to a *nix system, you can easily do this from the command line
with the pdfimages utility (part of poppler-utils):
- First navigate to the folder where you want the images to be extracted to
- Run the following command:
Code:
pdfimages -all path_to_source_pdf extract
- You will get an ungodly number of images dumped in the folder you navigated to, each named with the prefix "extract"
- Use an image browser to get rid of the images you don't want or need, but keep the ones you want together with their mask image*
- Convert any non-PNG images to PNG format first**
- Make three subfolders named Masks, Images and Composites
- Move the images to the Images subfolder and the associated masks to the Masks subfolder
- Add a sequence number to the files in both the Masks and Images subfolders, making sure that each mask gets the same sequence number as its associated image (a batch rename program is the easiest way to do this)
- Now navigate through the CLI into the Composites folder
- Run the following bash script:
Bash:
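#!/bin/bash
# Images and masks are paired by sorted position, so their sequence numbers must line up.
Arr1=(/path/to/Images/*)
Arr2=(/path/to/Masks/*)
for i in "${!Arr1[@]}"; do
  # Use the mask as the alpha channel of the image; quoting keeps paths with spaces intact.
  magick "${Arr1[i]}" "${Arr2[i]}" -alpha off -compose copy-opacity -composite "alpha_${i}.png"
done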
Once the script has run, you will have nice composited PNGs with proper transparency in the Composites folder.
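If you don't have a batch rename program at hand, the sequence-numbering step can also be sketched in bash. This is only a rough sketch: the function names add_sequence_numbers and same_file_count are made up for illustration, and it assumes that the files in Images and Masks sort in the same order (which should hold if you kept exactly one mask per image and left the extracted file names alone).

```bash
#!/bin/bash
# Hypothetical helper: prefix every file in a folder with a zero-padded
# sequence number (0001_, 0002_, ...), following the sorted glob order.
add_sequence_numbers() {
  local dir=$1 n=1 f base
  for f in "$dir"/*; do
    [ -f "$f" ] || continue            # skip subfolders and an empty glob
    base=$(basename "$f")
    mv -- "$f" "$dir/$(printf '%04d' "$n")_$base"
    n=$((n + 1))
  done
}

# Hypothetical helper: succeed only if both folders hold the same number of
# regular files, as a sanity check before compositing.
same_file_count() {
  local a=0 b=0 f
  for f in "$1"/*; do [ -f "$f" ] && a=$((a + 1)); done
  for f in "$2"/*; do [ -f "$f" ] && b=$((b + 1)); done
  [ "$a" -eq "$b" ]
}
```

Usage would be along the lines of add_sequence_numbers /path/to/Images, then add_sequence_numbers /path/to/Masks, then same_file_count /path/to/Images /path/to/Masks || echo "image and mask counts differ". The zero padding keeps 0010 from sorting before 0002.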
* You will notice that almost every image has an associated B/W 'sharp' mask and a B/W 'soft' mask. You only need to keep the sharp mask; you can throw away all the soft masks.
** Some files will be extracted as seemingly colour-inverted JPGs. You will need to 'colour-correct' these before converting them to PNG, using ImageMagick and the command:
Code:
magick original.jpg -negate fixed.jpg
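If a lot of files turn out inverted, the negate and PNG conversion steps can be combined in a small loop. Again just a sketch: negate_to_png is a made-up name, and it assumes you pass it only the files that actually look inverted, since negating a correct image would break it.

```bash
#!/bin/bash
# Hypothetical helper: negate each given JPG and write the result as a PNG
# next to it. Pass only the files that look colour-inverted.
negate_to_png() {
  local src
  for src in "$@"; do
    # original.jpg becomes original.png; the source file is left untouched.
    magick "$src" -negate "${src%.*}.png"
  done
}
```

Usage would be something like negate_to_png extract-012.jpg extract-045.jpg, where the file names are just examples.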