GeekTips

Most used PDF operations performed with various free apps

Highlight text in yellow with Document Viewer (Evince)
select text | right-click or Ctrl-H
change the color by right-clicking the highlight | Annotation Properties

flatpak install flathub org.gnome.Evince
flatpak run org.gnome.Evince

To print many pages on one sheet in Evince, choose

Print | Print to File
Page Setup | Pages per side:

1, 2, 4, 6, 9 or 16
------------------------------------------------

More pages-per-side options, and making booklets out of a linear PDF
https://kjo.herbesfolles.org/bookletimposer/

sudo apt install bookletimposer

------------------------------------------------

Combine many images from a directory into a PDF.

sudo apt install img2pdf

(or with pip)

pip3 install img2pdf

img2pdf *.jpg -o output.pdf

If a few of your images are much bigger than the others, you might get some extremely small pages in your document. Use Pix or Image Viewer (Xviewer) to quickly browse through the images and check their dimensions. If most are around 2000 x 1500 and just a few are 3000 x 2500 or larger, set a maximum height and width with --imgsize. Every image is then fitted into that box, so all PDF pages come out relatively uniform in size and with no white margins. (img2pdf reads a value without a unit as points, not pixels.)

img2pdf --imgsize 2000x2000 *.jpg -o output.pdf
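
To spot the oversized images from the terminal instead of an image viewer, something like this rough sketch could work. It assumes ImageMagick's identify is installed (it is not one of the tools above); oversized is a made-up helper name and 2000 is just the example limit.

```shell
# oversized LIMIT: reads "width height filename" lines on stdin and
# prints only the lines where width or height is above LIMIT (pixels).
# (oversized is a hypothetical helper, not part of img2pdf.)
oversized() {
    awk -v max="$1" '$1 + 0 > max + 0 || $2 + 0 > max + 0'
}

# Assumed usage with ImageMagick:
#   identify -format '%w %h %f\n' *.jpg | oversized 2000
```

Any file it lists is one of the few that would otherwise throw off the page sizes.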

------------------------------------------------

OCR (Optical Character Recognition) a PDF document while keeping the page image and hiding the OCR'ed text behind it

1) install Tesseract 5.x, which is about 15% faster than 4.x

sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt update
sudo apt install tesseract-ocr

2) install ocrmypdf

pip install ocrmypdf

3) install JBIG2 for image compression
https://ocrmypdf.readthedocs.io/en/latest/jbig2.html

OCR a PDF

ocrmypdf input.pdf output.pdf

OCR a PDF and add metadata

ocrmypdf --title "title" --author "author" input.pdf output.pdf

OCR a PDF and optimize file size by compressing images

ocrmypdf -O 3 input.pdf output.pdf

Optimize a PDF only, skipping OCR

ocrmypdf -s -O 3 --skip-big .1 input.pdf output.pdf

-s is the same as --skip-text (skips OCR on pages that already have text)
-O (that's a letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)
--skip-big tells it to skip OCR on any page over 0.1 megapixels (which would be every page)
--output-type pdf can be added to disable PDF/A generation and keep annotations
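
The same optimize-only run can be spelled with long options, which is easier to read back later in a script. A minimal sketch: optimize_only and the OCRMYPDF override variable are names made up here, and the override exists only so the command can be dry-run with echo.

```shell
# optimize_only INPUT OUTPUT: compress images without OCR-ing anything.
# OCRMYPDF is a hypothetical override; it defaults to the real ocrmypdf.
optimize_only() {
    "${OCRMYPDF:-ocrmypdf}" \
        --skip-text \
        --optimize 3 \
        --skip-big 0.1 \
        --output-type pdf \
        "$1" "$2"
}

# Dry run:  OCRMYPDF=echo optimize_only input.pdf output.pdf
# Real run: optimize_only input.pdf output.pdf
```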

Batch-run ocrmypdf, limiting it to 2 PDFs at a time

sudo apt install parallel
mkdir output

(run in the directory with the PDFs)

parallel --tag -j 2 ocrmypdf -s -O 3 --skip-big .1 '{}' 'output/{}' ::: *.pdf
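
If GNU parallel is not available, xargs -P can do roughly the same job. This is a sketch, not a drop-in replacement (there is no --tag style labelling of the output); batch_ocr and OCR_CMD are hypothetical names, and OCR_CMD exists only so the batch can be tried with echo first.

```shell
# batch_ocr: run the optimize-only command on every PDF in the current
# directory, at most 2 at a time, writing results into output/.
# OCR_CMD is a hypothetical override; it defaults to the real ocrmypdf.
batch_ocr() {
    mkdir -p output
    # NUL-separated names keep file names with spaces intact.
    printf '%s\0' *.pdf |
        xargs -0 -P 2 -I{} ${OCR_CMD:-ocrmypdf} -s -O 3 --skip-big .1 {} output/{}
}

# Dry run:  OCR_CMD=echo batch_ocr
# Real run: batch_ocr
```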

------------------------------------------------

To extract, delete, rotate, split or combine PDF pages, use PDF Slicer (Windows, Linux; rearrange pages with the keyboard) or PDF Arranger (drag pages to rearrange)

flatpak install flathub com.github.junrrein.PDFSlicer
flatpak run com.github.junrrein.PDFSlicer

flatpak install flathub com.github.jeromerobert.pdfarranger
flatpak run com.github.jeromerobert.pdfarranger

------------------------------------------------

Combine PDFs

pdftk one.pdf two.pdf three.pdf cat output combined.pdf
pdftk *.pdf cat output combined.pdf

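To combine in natural order, list the names with ls -v first (file names must not contain spaces)
-v = natural sort of (version) numbers within text

ls -v *.pdf > namelist
pdftk $(cat namelist) cat output combined.pdf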
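
To see what natural (version) sorting buys you, compare it with the plain alphabetical order you get from *.pdf. A small illustration using sort -V from GNU coreutils, which does the same kind of version ordering as ls -v; the file names are invented and natural_order is a made-up helper name.

```shell
# natural_order: sort names on stdin so that embedded numbers compare
# numerically (page2 before page10). Hypothetical helper name.
natural_order() {
    sort -V
}

printf '%s\n' page10.pdf page2.pdf page1.pdf | natural_order
# prints page1.pdf, page2.pdf, page10.pdf (one per line);
# plain alphabetical order would put page10.pdf before page2.pdf
```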

------------------------------------------------

Clean PDF metadata
https://exifcleaner.com (AppImage, DEB, RPM, Windows, Mac)
Drag your PDFs into the ExifCleaner window and their metadata is wiped
------------------------------------------------

Crop a PDF (a real crop, not just hiding the margins) (Linux, Mac, Windows)
Master PDF Editor (DEB, RPM) costs $70, but you can use the non-expiring free version
https://code-industry.net/free-pdf-editor/

Crop a PDF in Master PDF Editor (free version)
Crop a page or pages manually by selecting the area to keep. Click

Document | Crop Pages

------------------------------------------------
