GeekTips
Linux Mint, video encoding, ffmpeg, geek tips, regex, pdf manipulation, substitcher, mpv config
Forwarded from GeekTips
Optimizing a PDF for file size only, with no need to OCR it: 20.3MiB -> 10.7MiB.

-s is the same as --skip-text (skips pages that already contain text, e.g. already OCR'd)
-O (that's the letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)
--skip-big tells it to skip OCR on any page over 0.1 megapixels (which would be every page)

ocrmypdf -s -O 3 --skip-big .1 some.pdf some_optimized.pdf
Scan: 100%|███████████████████████████████████████████| 399/399 [00:35<00:00, 11.28page/s]
INFO - Start processing 4 pages concurrently
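
To double-check a reduction like the 20.3MiB -> 10.7MiB above, the savings can be worked out from the two sizes. This is only a minimal sketch: the helper names savings_percent and file_bytes are made up for illustration and are not part of ocrmypdf.

```shell
# Hypothetical helper: print the percentage saved going from OLD ($1) to NEW ($2).
# Both numbers just need to be in the same unit (bytes, MiB, ...).
savings_percent() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}

# Hypothetical helper: size of a file in bytes (wc -c is portable).
file_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Usage, after running the ocrmypdf command above:
#   savings_percent "$(file_bytes some.pdf)" "$(file_bytes some_optimized.pdf)"
# With the sizes from this post:
#   savings_percent 20.3 10.7
```

With the sizes from this post it prints 47.3, so a bit under half the file was saved.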
Forwarded from GeekTips
ocrmypdf -O 3 input.pdf output_ocr.pdf

-O 3 (the letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)

OCR: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 125.0/125.0 [06:56<00:00, 3.33s/page]
WARNING - Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
JPEGs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 125/125 [00:14<00:00, 8.29image/s]
PNGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
INFO - Optimize ratio: 1.74 savings: 42.5%
INFO - Output file is a PDF/A-2B (as expected)

The PDF is 56MB, reduced from 98MB. To use this you need to install the jbig2 encoder.
Even after following the instructions I still had to install leptonica, which was a pain but worth it in the end. If you don't install libtiff5-dev before you compile leptonica you'll get an error like
Error in pixReadMemTiff: function not present

sudo apt-get install libtiff5-dev

tar zxvf leptonica-1.82.0.tar.gz
cd leptonica-1.82.0/
./autogen.sh
./configure
make
sudo make install
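
Before starting a build like the one above it can save time to check which build tools are missing. A minimal sketch: the helper name missing_tools is made up, and the list of tools in the usage comment is my assumption for a typical autotools build, not something taken from the leptonica instructions.

```shell
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed list of build tools for an autotools project; adjust as needed:
#   missing_tools autoconf automake libtoolize make gcc
# After installing the encoder, check that its jbig2 binary is on PATH:
#   missing_tools jbig2
```

No output means everything named was found.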

If you only want to optimize a PDF to reduce size by reducing images then see here https://t.iss.one/geektips/185
16kbps Opus audio with chapters is ready for prime time. I suppose it always was, perhaps even in 2020. The Opus audio codec is superior at lower bitrates compared to AAC, mp3, Vorbis, etc. Anyone can create 16kbps Opus chaptered audiobooks with freac (free, GPL), which is key for wide adoption.

I've been a fan of chaptered 32kbps AAC m4b audiobooks for a while. I've used m4b-tool to create hundreds of m4b audiobooks, but it is too complicated for the average user. Currently most app developers only support mp3 and m4b audiobooks, but I believe that will change eventually.

Make an Opus chaptered audiobook

1) It's super easy for anyone to create an Opus chaptered audiobook using freac (freac.org), which is free (GPL) on Windows, Mac and Linux. Drag your m4b, mp3 or opus files into freac, export to opus and set the maximum bitrate to 16kbps. Under the Tag tab set Artist = Author of audiobook and Album = Title of audiobook. Add a cover if you wish.

2) freac General Settings, Opus Encoder Settings, Tag Settings

3) freac main window setting Author of audiobook, chapter names, Encode to a single file, output folder

4) freac Metadata tags for Artist and Album, Cover, Encode

5) Update Cover without re-encoding with TagEditor

6) Playing Opus chaptered audiobook with VLC on Windows, Mac, Linux

7) Playing Opus chaptered audiobook with VLC on iOS, Android and Settings

8) move opus audiobooks from downloads to VLC directory in Files on iOS or share to open in VLC

9) Fix title tag to match mp3 filename for chapters with MusicBrainz Picard

10) Low volume m4b audiobook normalize the chapters using Audacity

11) Low volume m4b audiobook normalize each chapter's audio using Ocenaudio

12) Automatically create chapters from a single mp3 audiobook

13) Manually create chapters from a single mp3 audiobook

14) Manually create chapters using timecodes with LosslessCut

15) Huge audiobook collections from various sources (video / audio) have various sample rates, which results in inaccurate chapter times

16) Edit chaptered opus audiobook chapter names or times with MusicBrainz Picard

17) automatically Human or Smart Title Case chapters and append : after chapter numbers

For advanced users you can convert an existing m4b chaptered audiobook with ffmpeg (5.x retains cover and 4.x doesn't)

ffmpeg -i input.m4b -vn -c:a libopus -b:a 16k output.opus

or just use videomass with this preset
-vn -c:a libopus -b:a 16k
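
To convert a whole folder of m4b files with the same ffmpeg settings, a small loop does the job. A minimal sketch: the function names opus_name and convert_all_m4b are made up for illustration.

```shell
# Hypothetical helper: map "Some Book.m4b" to "Some Book.opus".
opus_name() {
    printf '%s\n' "${1%.*}.opus"
}

# Convert every .m4b in the current directory with the settings above.
# -n tells ffmpeg never to overwrite, so existing .opus files are left alone.
convert_all_m4b() {
    for f in ./*.m4b; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal, skip it
        ffmpeg -n -i "$f" -vn -c:a libopus -b:a 16k "$(opus_name "$f")"
    done
}

# Usage: cd into the audiobook folder, then run
#   convert_all_m4b
```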
Make an opus chaptered 16kbps audiobook with freac (free GPL) freac.org (Windows, Mac, Linux)

— General Settings
Filename pattern: <artist> - <album>
artist = Author of audiobook
album = Title of audiobook

Under the Tags section you only need to check Vorbis Comment, since Opus is the successor to Ogg Vorbis and uses the same tag format. You don't need ID3v2 or the others, so just uncheck the rest.

Selected encoder: Opus Audio Encoder
click Configure Selected Audio Encoder

1) Encoding mode: Voice (Auto is ok too)

2) File extension: opus (oga is an Ogg container extension used for various audio codecs; an opus file can embed cover art, metadata and chapters)

3) Uncheck Enable variable bitrate encoding. It won't save you anything here. You can use it for music encoding at 96kbps though if you wish.

4) Bitrate: 16kbps, which is just as good as 32kbps m4b AAC in my tests and is half the file size.
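
The "half the file size" point is just arithmetic: size is roughly bitrate x duration / 8. A rough sketch, assuming a constant bitrate, 1 kbps = 1000 bits per second and ignoring container overhead; the function name estimate_mb is made up.

```shell
# Estimated audio size in MB (1 MB = 1,000,000 bytes) at a constant bitrate.
# $1 = bitrate in kbps, $2 = duration in hours. Container overhead is ignored.
estimate_mb() {
    awk -v kbps="$1" -v hours="$2" \
        'BEGIN { printf "%.0f\n", kbps * 1000 / 8 * hours * 3600 / 1000000 }'
}

# Usage:
#   estimate_mb 16 10   # 10-hour book at 16kbps
#   estimate_mb 32 10   # same book at 32kbps
```

For a 10-hour book this gives 72 at 16kbps and 144 at 32kbps.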
freac main window

1) Artist = Author of audiobook

2) Title = chapter names. You can import an m4b chaptered audiobook or multiple mp3 or opus files and the chapter names will be the file names of the audio files.

3) Check Encode to a single file which creates an Opus chaptered audiobook if using multiple audio files.

4) Select filters: disabled

5) select an Output Folder: to save the opus audiobook in
1) click Tags tab then Albums tab

2) click line showing Artist and Album. If it says unknown fill in the appropriate text. Genre is optional but if you want to put one choose Audiobook.

3) Covers: add a Cover if you wish either a jpg or a png. It'll show Other unless you click on it then it'll show Cover (front).

4) click Encode to start encoding. The output file should show Author - Title.opus like Warner Von Lorne - Wanted 7 Fearless Engineers.opus

5) click Joblist tab to show the progress and time remaining for the audiobook encode
Update opus audiobook metadata without having to re-encode audiobook

I don't recommend using VLC to modify metadata for opus files as it may destroy cover metadata. Won't delete this post though.

1) VLC choose View | Playlist or Ctrl-L then double click on album cover to edit metadata. Can change Artist = author or Album = Title.

2) You can't change the album cover, even by right clicking on it, changing some metadata and clicking Save Metadata. For that you need to use TagEditor (free, GPL, for Windows or Linux).
TagEditor https://github.com/Martchus/tageditor (free, GPL; Windows and Linux AppImage). For those on Mac, use MP3Tag.

1) Use it to change the opus audiobook cover, since VLC doesn't actually work for changing covers.

2) Change Author = Artist or Title = Album of audiobook

3) Just pointing out under Tag Management it'll show you Vorbis Comment (in Opus stream)

4) Save your metadata changes
VLC videolan.org/vlc (free, GPL) on Linux, Mac and Windows plays opus chaptered audiobooks.

If you wish to increase or decrease the playback speed (0.25x to 4.00x), be sure to check under Preferences | Audio that Enable Time-Stretching audio is checked, as it keeps the pitch natural at faster or slower speeds. VLC on Linux doesn't show chapter durations or starting times, unfortunately.
VLC Settings on iOS: make sure Continue audio playback is set to Always so it'll resume from the point you last listened to. For variable playback speed make sure Time-stretching audio is checked.

When importing opus audiobooks into VLC probably the best way is to transfer the opus audiobook(s) to Files on iOS then move them to the VLC directory.
Use Audacity (free, GPL; Win, Linux, Mac) to Normalize the audio if the m4b audiobook volume is too low. You'll very rarely encounter such a case.

First export from freac. In General Settings under Output filenames put
Filename pattern: <title>
Each chapter will be encoded and output to a file matching the chapter name.
Uncheck Encode to a single file
Encoder: choose opus 32kbps or as high as the original bitrate, as these files need to be re-encoded three times in total.

1) in Audacity import all chapter files at once. Select All (Ctrl-A) then Effect | Normalize.

2) For that rare chapter where the waveform doesn't go up like the rest, double click on the track to highlight both tracks, then choose Effect | Amplify and pick an Amplification (dB) that roughly matches the height of the other track waveforms.

3) File | Export | Export Multiple and choose mp3 or ogg. Don't set the bitrate lower than the original.

Audacity seems a tad quicker for normalization than Ocenaudio (next post).
Ocenaudio (free GPL Win, Linux, Mac) can also Normalize audio.

1) Import all chapter audio files by dragging them into Ocenaudio. It takes a while to analyze them (about 3x longer than Audacity). For each chapter, double click on it and choose Effects | Normalize or press the Normalize button.

2) For each subsequent chapter, double click on the track to select it, then press Ctrl-Y to repeat the Normalization.

3) File | Save All and it'll output the opus files at a variable bitrate around 50kbps, overwriting the original ones.

4) For that rare chapter that doesn't normalize like the other tracks, judging by the waveform graph, use the Gain Tool and increase the dB appropriately.

For advanced users: you can use MKVToolnix to extract the chapters without re-encoding them although you'll have to manually rename the chapter names. https://t.iss.one/geektips/79
Automatically create audiobook chapters from a single mp3 audiobook that has no chapters.

1) Import single mp3 audiobook into Audacity 3.x or higher

2) Ctrl-A to select all, then Analyze | Label Sounds (this process took me 5 minutes on my laptop). It automatically detects chapter breaks at silences of at least 3 seconds, as seen in the screenshot. You could try 2.5 seconds or even 2 seconds, but don't go below that.

3) In Audacity choose File | Export | Export Multiple and Format: Opus Bitrate: 128kbps (same as original mp3)

4) VBR Mode: Off (Variable Bitrate mode)

5) Split files based on: Labels by default they are Chapter 001, Chapter 002, Chapter 003, etc. if you set it like the screenshot in the previous post.

6) Name files: Using Label/Track Name and click Export and it'll re-encode the one mp3 file into multiple opus audio files, one for each chapter detected.

7) Rename them to match the chapter names in the book. Then, before importing them into freac to make an opus chaptered audiobook, drag them into MusicBrainz Picard and Tag from file names.
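
For those who prefer the command line, ffmpeg's silencedetect filter can find the same kind of long silences. Note this is a different tool swapped in for Audacity's Label Sounds, not what the steps above use. A minimal sketch: the noise threshold of -30dB is my assumption and will need tuning per audiobook, and the function names are made up.

```shell
# Run ffmpeg's silencedetect filter; it logs silence_start / silence_end
# lines to stderr. noise=-30dB is an assumed threshold; d=3 matches the
# 3 second minimum silence used above.
detect_silence() {
    ffmpeg -hide_banner -nostats -i "$1" \
        -af silencedetect=noise=-30dB:d=3 -f null - 2>&1
}

# Read that log on stdin and print the end time of each silence in seconds,
# which is roughly where the next chapter starts.
chapter_starts() {
    sed -n 's/.*silence_end: \([0-9.]*\).*/\1/p'
}

# Usage:
#   detect_silence audiobook.mp3 | chapter_starts
```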
Many times the auto detection doesn't work, so you can find chapters manually. Once you locate the starting point of the next chapter click it then choose Select | Clip Boundaries | Previous Clip Boundary to Cursor. Then Edit | Labels | Add Label at Selection (Ctrl+B) then name your chapter. This is a tedious process and lots of work but might be worth it if the audiobook is a great one.

To manually locate chapters, zoom in and look for big breaks in the waveform graph.

An even quicker way if you have just a few chapters is to use LosslessCut (Linux, Mac, Win; free, GPL). It doesn't have to analyze the mp3 or opus file. Create in and out points and export the segments as separate files.
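
The same in/out point idea can also be done directly with ffmpeg using stream copy, so nothing is re-encoded. This is a sketch with a different tool than LosslessCut's interface: the function names, timecodes and file names are placeholders.

```shell
# Convert an HH:MM:SS timecode to whole seconds (handy for checking
# how long a chapter is).
tc_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Cut one chapter between two timecodes without re-encoding (-c copy).
# $1 = input file, $2 = start, $3 = end, $4 = output file.
# -n tells ffmpeg never to overwrite an existing output file.
cut_chapter() {
    ffmpeg -n -i "$1" -ss "$2" -to "$3" -c copy "$4"
}

# Usage (timecodes and file names are placeholders):
#   cut_chapter book.mp3 00:00:00 00:42:10 "Chapter 001.mp3"
#   tc_to_seconds 00:42:10
```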
Chapters created automatically and here is the result in VLC on Linux.
Most used PDF operations performed with various free apps

Highlight text in yellow with Document Viewer (Evince)
select text | right click or Ctrl-H
able to change color by right-clicking | Annotation Properties

flatpak install flathub org.gnome.Evince
flatpak run org.gnome.Evince

In Evince, to print many pages on one sheet
choose Print | Print to File
Page Setup | Pages per side:
1, 2, 4, 6, 9 or 16
------------------------------------------------

PDF pages per side (more options) and to make booklets out of a linear PDF
https://kjo.herbesfolles.org/bookletimposer/
sudo apt install bookletimposer
------------------------------------------------

Combine many images from a directory into a PDF.
sudo apt install img2pdf
(or python)
pip3 install img2pdf

img2pdf *.jpg -o output.pdf

If a few of your images are much bigger than the others, you might get extremely small pages in your document. Use Pix or Image Viewer (Xviewer) to quickly browse through the images and check their dimensions. If most are, say, 2000 x 1500 or so and just a few are 3000 x 2500 or higher, set the max pixel width and height so all PDF pages will be relatively uniform in size with no white margins.

img2pdf --imgsize 2000x2000 *.jpg -o output.pdf
------------------------------------------------

OCR (Optical Character Recognition) a PDF document while retaining the image and putting the OCR'ed text hidden behind it

1) install tesseract 5.x which is 15% faster than 4.x
sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt update
sudo apt install tesseract-ocr

2) install ocrmypdf
pip install ocrmypdf

3) install JBIG2 for image compression
https://ocrmypdf.readthedocs.io/en/latest/jbig2.html

OCR a PDF
ocrmypdf input.pdf output.pdf

OCR a PDF and add metadata
ocrmypdf --title "title" --author "author" input.pdf output.pdf

OCR a PDF and optimize file size by compressing images
ocrmypdf -O 3 input.pdf output.pdf

Only optimizing a PDF and skipping OCR
ocrmypdf -s -O 3 --skip-big .1 input.pdf output.pdf

-s is the same as --skip-text (skips pages that already contain text, e.g. already OCR'd)
-O (that's the letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)
--skip-big tells it to skip OCR on any page over 0.1 megapixels (which would be every page)
--output-type pdf disables PDF/A generation and maintains annotations
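
Here is the optimize-only command with every option spelled out in its long form, which is easier to read back later. A minimal sketch: the function names and the _opt naming are just illustration.

```shell
# Optimize-only run spelled out with long option names.
# $1 = input PDF, $2 = output PDF.
optimize_only() {
    ocrmypdf --skip-text --optimize 3 --skip-big 0.1 --output-type pdf "$1" "$2"
}

# Hypothetical naming helper: "scan.pdf" -> "scan_opt.pdf".
opt_name() {
    printf '%s\n' "${1%.pdf}_opt.pdf"
}

# Usage:
#   optimize_only scan.pdf "$(opt_name scan.pdf)"
```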

Batch ocrmypdf limiting it to 2 pdfs at a time
sudo apt install parallel
mkdir output
(in dir of PDFs)
parallel --tag -j 2 ocrmypdf -s -O 3 --skip-big .1 '{}' 'output/{}' ::: *.pdf
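
If GNU parallel isn't installed, find plus xargs -P can do much the same. This is a swapped-in alternative, not the command above, and it relies on the -print0 / -0 / -P extensions found in GNU and BSD tools. The OCR_TOOL variable is a made-up override so the loop can be dry-run with echo first.

```shell
# Run the optimize-only command on every PDF in the current directory,
# at most two at a time, writing results to ./output.
# OCR_TOOL is a hypothetical override; it defaults to ocrmypdf.
batch_optimize() {
    mkdir -p output
    find . -maxdepth 1 -type f -name '*.pdf' -print0 |
        xargs -0 -P 2 -I '{}' "${OCR_TOOL:-ocrmypdf}" -s -O 3 --skip-big .1 '{}' 'output/{}'
}

# Dry run that only prints the arguments each job would get:
#   OCR_TOOL=echo; batch_optimize
# Real run:
#   unset OCR_TOOL; batch_optimize
```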
------------------------------------------------

To extract, delete, rotate, split or combine PDF pages, use PDF Slicer (Windows, Linux; rearrange with the keyboard) or PDF Arranger (drag PDF pages to rearrange)

flatpak install flathub com.github.junrrein.PDFSlicer
flatpak run com.github.junrrein.PDFSlicer

flatpak install flathub com.github.jeromerobert.pdfarranger
flatpak run com.github.jeromerobert.pdfarranger
------------------------------------------------

Combine PDFs
pdftk one.pdf two.pdf three.pdf cat output combined.pdf
pdftk *.pdf cat output combined.pdf
-v = natural sort of (version) numbers within text
ls -v *.pdf > namelist
pdftk $(cat namelist) cat output combined.pdf
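
Expanding a list file on the command line breaks on file names that contain spaces. Here is a sketch that keeps the natural sort but copes with spaces. It assumes GNU sort (for -V) and that no file name contains a newline; the function names are made up.

```shell
# Print the PDFs in the current directory in natural (version) order,
# one per line, e.g. part2.pdf before part10.pdf.
sorted_pdfs() {
    for f in ./*.pdf; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -V
}

# Merge them in that order. $1 = output file; write it somewhere outside
# this directory so a later run does not pick it up as an input.
merge_sorted() {
    out=$1
    list=$(sorted_pdfs)
    oldifs=$IFS
    IFS='
'
    set -f            # no glob expansion while splitting the list on newlines
    set -- $list
    set +f
    IFS=$oldifs
    pdftk "$@" cat output "$out"
}

# Usage:
#   merge_sorted ../combined.pdf
```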
------------------------------------------------

Clean PDF metadata
https://exifcleaner.com AppImage, DEB, rpm, Windows, Mac
drag your PDFs into ExifCleaner window and their metadata is wiped
------------------------------------------------

Crop a PDF (not just crop and hide margins) (Linux, Mac, Windows)
Master PDF Editor (DEB, rpm) $70, but you can use the non-expiring free version
https://code-industry.net/free-pdf-editor/

Crop a PDF in Master PDF Editor (free version)
Crop a page or pages manually by selecting the area to keep. Click Document | Crop Pages

------------------------------------------------
Edit text in a PDF with LibreOffice Draw. Has problems with some complicated documents though.
------------------------------------------------

NormCap is an app that lets you select any part of your screen and OCR it. The extracted text is copied to your clipboard and can then be pasted into a text editor.

In Applications | Settings | Keyboard under Application Shortcuts add Ctrl+Print and for the command navigate to the NormCap-unstable-x86_64.AppImage
------------------------------------------------

View a PDF in Dark Mode / Night Mode with grayscale images, NOT inverted images like in most PDF viewers

Master PDF Editor (free version)
Settings | Display and check Replace Document Colors
change Page Background: black color #2c2c2c and Text: white or light gray
To change to Dark Mode grayscale click View | Replace Document Colors

Also Zathura has Dark Mode grayscale
sudo apt install zathura
Add comic book support (cbz, cbr files), with dark mode grayscale on the images too
sudo apt install zathura-cb

Zathura has no thumbnails and doesn't show document properties

nano ~/.config/zathura/zathurarc
Paste the following into the zathurarc text configuration file and save it. You don't need to create this configuration file, as Zathura already has dark mode with CTRL-R. Only do this if you don't want pure black and pure white colors.

set recolor true
set recolor-darkcolor "#dcdccc"
set recolor-lightcolor "#1f1f1f"
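
If you'd rather script it than open nano, something like this writes the same three settings, but only when no zathurarc exists yet. A minimal sketch: the function name write_zathurarc is made up.

```shell
# Write the recolor settings above to zathurarc, but only if no config
# exists yet, so an existing file is never overwritten.
# $1 = config directory (defaults to ~/.config/zathura).
write_zathurarc() {
    dir=${1:-"$HOME/.config/zathura"}
    [ -e "$dir/zathurarc" ] && return 0
    mkdir -p "$dir" || return 1
    cat > "$dir/zathurarc" <<'EOF'
set recolor true
set recolor-darkcolor "#dcdccc"
set recolor-lightcolor "#1f1f1f"
EOF
}

# Usage:
#   write_zathurarc
```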

Zathura Keyboard shortcut keys since there isn't any menu

/ search for text
n next search result
shift+n previous search result

d toggles dual page display
o open another PDF
r rotate page
s fit to width (a fits the whole page)
- zoom out
+ zoom in
Tab toggle index
CTRL-P print
CTRL-R toggle recolor (dark mode)
F5 Presentation mode
F11 fullscreen
Optimized PDF from 3.1MB to 176KB

ocrmypdf -s -O 3 --skip-big .1 Harvey\ Weinstein\ accuser\ Mogul\ lacks\ male\ genitalia.pdf Harvey\ Weinstein\ accuser\ Mogul\ lacks\ male\ genitalia_opt.pdf 

metadata cleaned with ExifCleaner. Screenshot shows metadata before being wiped
SparkelDrinkIdeas_ocr.pdf
2.7 MB
34 Sparkel Drink Ideas all non-alcoholic and no tea ones. Just put this together for personal reference.