Forwarded from GeekTips
Only optimizing a PDF for file size, with no need to OCR it, took it from 20.3 MiB to 10.7 MiB.
ocrmypdf -s -O 3 --skip-big .1 some.pdf some_optimized.pdf
-s is the same as --skip-text (skips pages that already have text, e.g. already OCR'd)
-O (that's a letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)
--skip-big tells it to skip OCR on any page over 0.1 megapixels (which would be every page)
Scan: 100%|███████████████████████████████████████████| 399/399 [00:35<00:00, 11.28page/s]
INFO - Start processing 4 pages concurrently
Forwarded from GeekTips
ocrmypdf -O 3 input.pdf output_ocr.pdf
-O 3 (letter O, not zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)
OCR: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 125.0/125.0 [06:56<00:00, 3.33s/page]
WARNING - Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
JPEGs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 125/125 [00:14<00:00, 8.29image/s]
PNGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
INFO - Optimize ratio: 1.74 savings: 42.5%
INFO - Output file is a PDF/A-2B (as expected)
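The ratio and savings figures in the log relate to the before and after file sizes in a simple way. A minimal sketch, assuming a hypothetical helper named size_savings (it is not part of ocrmypdf) and that awk is available:

```shell
# size_savings BEFORE AFTER: print the compression ratio and the percent saved.
# BEFORE and AFTER can be in any unit as long as both use the same one.
size_savings() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    printf "ratio %.2f savings %.1f%%\n", before / after, (1 - after / before) * 100
  }'
}

size_savings 98 56       # the 98MB -> 56MB PDF
size_savings 20.3 10.7   # the 20.3MiB -> 10.7MiB PDF

# With real files, byte counts can be passed in, for example:
# size_savings "$(wc -c < input.pdf)" "$(wc -c < output.pdf)"
```

The rounded sizes give numbers slightly different from the log because the log works from exact byte counts.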
The PDF is 56MB, reduced from 98MB. To use this you need to install the jbig2 encoder, and even after following the instructions I still had to install leptonica, which was a pain but worth it in the end. If you don't install libtiff5-dev before you compile leptonica you'll get an error like
Error in pixReadMemTiff: function not present
sudo apt-get install libtiff5-dev
tar zxvf leptonica-1.82.0.tar.gz
cd leptonica-1.82.0/
./autogen.sh
./configure
make
sudo make install
If you only want to optimize a PDF to reduce size by reducing images then see here https://t.iss.one/geektips/185
16kbps Opus audio with chapters is ready for prime time. I suppose it always was, perhaps even in 2020. The Opus audio codec is superior at lower bitrates compared to AAC, mp3, Vorbis, etc. Anyone can create Opus 16kbps chaptered audiobooks with freac (GPL, free), which is key for wide adoption.
I've been a fan of m4b chaptered 32kbps AAC audiobooks for a while. I used m4b-tool to create hundreds of m4b audiobooks, but it is too complicated for the average user. Currently most app developers only support mp3 and m4b audiobooks, but I believe that will change eventually.
Make an Opus chaptered audiobook
1) It's super easy for anyone to create an Opus chaptered audiobook using freac (freac.org), which is GPL and free on Windows, Mac and Linux. Drag your m4b, mp3 or opus files into freac, export to opus and set the maximum bitrate to 16kbps. Under the Tag tab, Artist = Author of audiobook and Album = Title of audiobook. Add a cover if you wish.
2) freac General Settings, Opus Encoder Settings, Tag Settings
3) freac main window setting Author of audiobook, chapter names, Encode to a single file, output folder
4) freac Metadata tags for Artist and Album, Cover, Encode
5) Update Cover without re-encoding with TagEditor
6) Playing Opus chaptered audiobook with VLC on Windows, Mac, Linux
7) Playing Opus chaptered audiobook with VLC on iOS, Android and Settings
8) Move opus audiobooks from Downloads to the VLC directory in Files on iOS, or share to open in VLC
9) Fix title tag to match mp3 filename for chapters with MusicBrainz Picard
10) Low volume m4b audiobook: normalize the chapters using Audacity
11) Low volume m4b audiobook: normalize each chapter's audio using Ocenaudio
12) Automatically create chapters from a single mp3 audiobook
13) Manually create chapters from a single mp3 audiobook
14) Manually create chapters using timecodes with LosslessCut
15) Huge audiobook collections from various video / audio sources have various sample rates, which results in inaccurate chapter times
16) Edit chaptered opus audiobook chapter names or times with MusicBrainz Picard
17) Automatically Human or Smart Title Case chapters and append : after chapter numbers
For advanced users: you can convert an existing m4b chaptered audiobook with ffmpeg (5.x retains the cover and 4.x doesn't)
ffmpeg -i input.m4b -vn -c:a libopus -b:a 16k output.opus
or just use Videomass with this preset:
-vn -c:a libopus -b:a 16k
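For a whole folder of m4b files, the same ffmpeg options can be wrapped in a loop. This is only a sketch: the function names to_opus_name and convert_all and the DRY_RUN switch are made up for illustration, and it assumes a bash-like shell with ffmpeg on the PATH.

```shell
# to_opus_name FILE: print FILE with its extension replaced by .opus
to_opus_name() { printf '%s\n' "${1%.*}.opus"; }

# convert_all: convert every .m4b in the current directory to 16 kbps Opus,
# using the same options as the single-file command.
# With DRY_RUN=1 it only prints the commands it would run.
convert_all() {
  for f in ./*.m4b; do
    [ -e "$f" ] || continue          # no .m4b files: the glob stays literal, skip it
    out=$(to_opus_name "$f")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'ffmpeg -i %s -vn -c:a libopus -b:a 16k %s\n' "$f" "$out"
    else
      ffmpeg -nostdin -i "$f" -vn -c:a libopus -b:a 16k "$out"
    fi
  done
}

# Preview first; run plain `convert_all` to actually encode.
DRY_RUN=1 convert_all
```

The -nostdin flag keeps ffmpeg from reading the terminal inside the loop; everything else matches the one-file command.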
Shows why Opus has superior audio quality at 16kbps. All audiobooks at 16kbps 48kHz or 44kHz.
Make an opus chaptered 16kbps audiobook with freac (free GPL) freac.org (Windows, Mac, Linux)
- General Settings
Filename pattern: <artist> - <album>
artist = Author of audiobook
album = Title of audiobook
Under the Tags section you only need to check Vorbis Comment, as Opus is the successor to Ogg Vorbis. You don't need ID3v2 or the other ones, so just uncheck the rest.
Selected encoder: Opus Audio Encoder
Click Configure Selected Audio Encoder
1) Encoding mode: Voice (Auto is ok too)
2) File extension: opus (oga is a container, but it's for various audio codecs, and opus can embed cover art, metadata and chapters)
3) Uncheck Enable variable bitrate encoding. It won't save you anything. You can use it for music encoding at 96kbps though if you wish.
4) Bitrate: 16kbps, which is just as good as 32kbps m4b AAC in my tests. Obviously it is half the file size of 32kbps.
freac main window
1) Artist = Author of audiobook
2) Title = chapter names. You can import an m4b chaptered audiobook or multiple mp3 or opus files, and the chapter names will be the file names of the audio files.
3) Check Encode to a single file, which creates an Opus chaptered audiobook if using multiple audio files.
4) Select filters: disabled
5) Select an Output Folder: to save the opus audiobook in
freac Metadata tags for Artist and Album, Cover, Encode
1) Click the Tags tab, then the Albums tab
2) Click the line showing Artist and Album. If it says unknown, fill in the appropriate text. Genre is optional, but if you want to put one choose Audiobook.
3) Covers: add a Cover if you wish, either a jpg or a png. It'll show Other unless you click on it, then it'll show Cover (front).
4) Click Encode to start encoding. The output file should show Author - Title.opus like Warner Von Lorne - Wanted 7 Fearless Engineers.opus
5) Click the Joblist tab to show the progress and time remaining for the audiobook encode
Update opus audiobook metadata without having to re-encode audiobook
I don't recommend using VLC to modify metadata for opus files, as it may destroy the cover metadata. I won't delete this post though.
1) In VLC choose View | Playlist or Ctrl-L, then double click on the album cover to edit metadata. You can change Artist = Author or Album = Title.
2) You can't change the album cover, even by right clicking on it, changing some metadata and clicking Save Metadata. For that you need to use TagEditor (free, GPL, for Windows or Linux).
TagEditor https://github.com/Martchus/tageditor (free, GPL, Windows and Linux AppImage); for those on Mac use MP3Tag
1) Can be used to change the opus audiobook cover, since VLC doesn't actually work for changing covers.
2) Change Author = Artist or Title = Album of audiobook
3) Just pointing out that under Tag Management it'll show you Vorbis Comment (in Opus stream)
4) Save your metadata changes
VLC videolan.org/vlc (free, GPL) on Linux, Mac, Windows plays opus chaptered audiobooks.
If you wish to increase or decrease the playback speed (0.25x to 4.00x), be sure to check under Preferences | Audio that Enable Time-Stretching audio is checked, as it adjusts the pitch to improve output at faster or slower speeds. VLC on Linux doesn't show chapter durations or starting times, unfortunately.
VLC Settings on iOS: make sure Continue audio playback is set to Always so it'll resume from the point you last listened to. For variable playback speed make sure Time-stretching audio is checked.
When importing opus audiobooks into VLC, probably the best way is to transfer the opus audiobook(s) to Files on iOS, then move them to the VLC directory.
Audacity (free, GPL, Win, Linux, Mac) can Normalize the audio if the m4b audiobook volume is too low. It is very rare that you'll encounter such a case.
First export from freac. In General Settings under Output filenames put Filename pattern: <title>
Each chapter will be encoded and output to a file matching the chapter name.
Uncheck Encode to a single file
For the encoder choose opus 32kbps or as high as the original bitrate, as these files need to be re-encoded three times in total.
1) In Audacity import all chapter files at once. Select All (Ctrl-A) then Effect | Normalize.
2) For that rare chapter where the waveform doesn't go up like the rest, double click on the track to highlight both tracks, then choose Effect | Amplify and pick an Amplification (dB) that roughly matches the height of the other track waveforms.
3) File | Export | Export Multiple and choose mp3 or ogg. Don't set the bitrate lower than the original.
Using Audacity seems a tad quicker for normalization than Ocenaudio (next post).
Ocenaudio (free, Win, Linux, Mac) can also Normalize audio.
1) Import all chaptered audio files by dragging them into Ocenaudio; it takes a while to analyze them (about 3x longer than Audacity). Double click on a chapter and choose Effects | Normalize or press the Normalize button.
2) For each subsequent chapter just press Ctrl-Y to repeat the Normalization after double clicking on the track to select it.
3) File | Save All and it'll output the opus files at a variable bitrate around 50kbps, overwriting the original ones.
4) For that rare chapter that doesn't normalize like the other tracks, judging by the waveform graph, use the Gain Tool and increase the dB appropriately.
For advanced users: you can use MKVToolNix to extract the chapters without re-encoding them, although you'll have to manually rename the chapter names. https://t.iss.one/geektips/79
Automatically create audiobook chapters from a single mp3 audiobook that has no chapters.
1) Import single mp3 audiobook into Audacity 3.x or higher
2) Ctrl-A to select all, then Analyze | Label Sounds (this process took me 5 minutes on my laptop). Automatically detect chapter breaks for silence of at least 3 seconds as seen in the screenshot. You could try 2.5 seconds or even 2 seconds, but don't go below that.
3) In Audacity choose File | Export | Export Multiple and Format: Opus, Bitrate: 128kbps (same as original mp3)
4) VBR Mode: Off (Variable Bitrate mode)
5) Split files based on: Labels. By default they are Chapter 001, Chapter 002, Chapter 003, etc. if you set it like the screenshot in the previous post.
6) Name files: Using Label/Track Name, then click Export and it'll re-encode the one mp3 file into multiple opus audio files, one for each chapter detected.
7) Rename them to the chapter names in the book, and before importing them into freac to make an opus chaptered audiobook, drag them into MusicBrainz Picard and Tag from file names.
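The same "silence of at least 3 seconds" idea can also be tried from the command line. Note this swaps in a different tool: ffmpeg's silencedetect filter instead of Audacity's Label Sounds, so the detected breaks may not match. A rough sketch, where chapter_starts is a made-up helper name, the -30dB noise threshold is an assumption to tune per recording, and the sample log lines are illustrative rather than real output:

```shell
# chapter_starts: read ffmpeg silencedetect log lines on stdin and print
# candidate chapter start times in seconds (0, then the end of each silence).
chapter_starts() {
  echo 0
  awk '{ for (i = 1; i < NF; i++) if ($i == "silence_end:") print $(i + 1) }'
}

# Usage with a real file (ffmpeg writes the filter log to stderr):
# ffmpeg -nostdin -i audiobook.mp3 -af silencedetect=noise=-30dB:d=3 -f null - 2>&1 | chapter_starts

# Illustrative sample of the log format, not real output:
printf '%s\n' \
  '[silencedetect @ 0x1] silence_start: 1795.2' \
  '[silencedetect @ 0x1] silence_end: 1798.6 | silence_duration: 3.4' |
  chapter_starts
```

The printed times are only candidates; they still need checking against the book's real chapter breaks, just like the automatic labels in Audacity.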
Many times the auto detection doesn't work, so you can find chapters manually. Once you locate the starting point of the next chapter click it, then choose Select | Clip Boundaries | Previous Clip Boundary to Cursor. Then Edit | Labels | Add Label at Selection (Ctrl+B), then name your chapter. This is a tedious process and a lot of work, but might be worth it if the audiobook is a great one.
To manually locate chapters, zoom in and look for big breaks in the waveform graph.
An even quicker way, if you have just a few chapters, is to use LosslessCut (Linux, Mac, Win), free GPL. It doesn't have to analyze the mp3 or opus file. Create in and out points and separate the files to export.
Most used PDF operations performed with various free apps
Highlight text in yellow with Document Viewer (Evince)
select text | right click or Ctrl-H
able to change color by right-clicking | Annotation Properties
flatpak install flathub org.gnome.Evince
flatpak run org.gnome.Evince
In Evince, to print many pages on one page choose
Print | Print to File
Page Setup | Pages per side: 1, 2, 4, 6, 9 or 16
------------------------------------------------
PDF pages per side (more options) and to make booklets out of a linear PDF
https://kjo.herbesfolles.org/bookletimposer/
sudo apt install bookletimposer
------------------------------------------------
Combine many images from a directory into a PDF.
sudo apt install img2pdf
(or python)
pip3 install img2pdf
If you have images and a few are much bigger than the others, you might get extremely small pages in your document. Use Pix or Image Viewer (Xviewer) to quickly browse through the images and check the image dimensions. If most are, say, 2000 x 1500 and just a few are 3000 x 2500 or higher, set the max pixel height and width so all PDF pages will be relatively uniform in size and with no white margins.
img2pdf *.jpg -o output.pdf
img2pdf --imgsize 2000x2000 *.jpg -o output.pdf
------------------------------------------------
OCR (Optical Character Recognition) a PDF document while retaining the image and putting the OCR'ed text hidden behind it
1) install tesseract 5.x which is 15% faster than 4.x
sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt update
sudo apt install tesseract-ocr
2) install ocrmypdf
pip install ocrmypdf
3) install JBIG2 for image compression
https://ocrmypdf.readthedocs.io/en/latest/jbig2.html
OCR a PDF
ocrmypdf input.pdf output.pdf
OCR a PDF and add metadata
ocrmypdf --title "title" --author "author" input.pdf output.pdf
OCR a PDF and optimize file size by compressing images
ocrmypdf -O 3 input.pdf output.pdf
Only optimizing a PDF and skipping OCR
ocrmypdf -s -O 3 --skip-big .1 input.pdf output.pdf
-s is the same as --skip-text (skips pages that already have text, e.g. already OCR'd)
-O (that's a letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)
--skip-big tells it to skip OCR on any page over 0.1 megapixels (which would be every page)
--output-type pdf disables PDF/A generation and maintains annotations
Batch ocrmypdf limiting it to 2 pdfs at a time
sudo apt install parallel
(in dir of PDFs)
mkdir output
parallel --tag -j 2 ocrmypdf -s -O 3 --skip-big .1 '{}' 'output/{}' ::: *.pdf
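If GNU parallel isn't installed, roughly the same two-at-a-time batch can be done with find and xargs. This is a sketch rather than a drop-in replacement: batch_optimize and the OCR_CMD override are hypothetical names used here for illustration, and it assumes a find that supports -maxdepth and -print0 and an xargs that supports -0 and -P (GNU and BSD versions do).

```shell
# batch_optimize: run the same ocrmypdf command on every PDF in the current
# directory, at most 2 at a time, writing results into ./output.
# OCR_CMD is left unquoted on purpose so "echo ocrmypdf" splits into two words.
batch_optimize() {
  mkdir -p output
  find . -maxdepth 1 -name '*.pdf' -print0 |
    xargs -0 -P 2 -I{} ${OCR_CMD:-ocrmypdf} -s -O 3 --skip-big .1 {} output/{}
}

# Dry run: print the commands instead of running them.
OCR_CMD="echo ocrmypdf" batch_optimize
```

find prints names as ./file.pdf, so the output path becomes output/./file.pdf, which is the same file as output/file.pdf. Run plain batch_optimize once the dry run looks right.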
------------------------------------------------
To extract, delete, rotate, split or combine PDF pages use PDF Slicer (Windows, Linux, keyboard to rearrange) or PDF Arranger (drag PDF pages to rearrange)
flatpak install flathub com.github.junrrein.PDFSlicer
flatpak run com.github.junrrein.PDFSlicer
flatpak install flathub com.github.jeromerobert.pdfarranger
flatpak run com.github.jeromerobert.pdfarranger
------------------------------------------------
Combine PDFs
pdftk one.pdf two.pdf three.pdf cat output combined.pdf
pdftk *.pdf cat output combined.pdf
-v = natural sort of (version) numbers within text
ls -v *.pdf > namelist
pdftk $(cat namelist) cat output combined.pdf
------------------------------------------------
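Why the natural sort matters when combining PDFs: a plain shell glob puts ch10.pdf before ch2.pdf, so the pages come out in the wrong order. A small sketch, assuming a sort that supports -V (GNU coreutils and recent BSD/macOS do); list_natural is a made-up helper name:

```shell
# list_natural: print the arguments one per line in natural (version) order,
# so that ch2.pdf sorts before ch10.pdf.
list_natural() { printf '%s\n' "$@" | sort -V; }

list_natural ch10.pdf ch2.pdf ch1.pdf

# With pdftk installed, the sorted list can be passed straight to cat
# (this assumes the file names contain no spaces):
# pdftk $(list_natural *.pdf) cat output combined.pdf
```

This does the same job as the ls -v namelist, without the intermediate file.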
Clean PDF metadata
https://exifcleaner.com AppImage, DEB, rpm, Windows, Mac
drag your PDFs into ExifCleaner window and their metadata is wiped
------------------------------------------------
Crop a PDF (not just crop and hide margins) (Linux, Mac, Windows)
Master PDF Editor (DEB, rpm) costs $70, but you can use the non-expiring free version
https://code-industry.net/free-pdf-editor/
Crop a PDF in Master PDF Editor (free version)
Crop a page or pages manually by selecting the area to keep. Click
Document | Crop Pages
------------------------------------------------
Edit text in a PDF with LibreOffice Draw. It has problems with some complicated documents though.
------------------------------------------------
NormCap is an app that lets you capture and OCR any part of your screen that you select, and copies the extracted text to your clipboard so it can be pasted into a text editor.
In Applications | Settings | Keyboard under Application Shortcuts add Ctrl+Print and for the command navigate to the NormCap-unstable-x86_64.AppImage
------------------------------------------------
View a PDF in Dark Mode / Night Mode Grayscale, NOT inverted images like most PDF viewers
Master PDF Editor (free version)
Settings | Display and check Replace Document Colors
change Page Background: black color #2c2c2c and Text: white or light gray
To change to Dark Mode grayscale click
View | Replace Document Colors
Also Zathura has Dark Mode grayscale
sudo apt install zathura
Add comic book support (cbz, cbr files), with dark mode grayscale on the images too
sudo apt install zathura-cb
Zathura has no thumbnails nor shows document properties
nano ~/.config/zathura/zathurarc
Paste the following into the zathurarc text configuration file and save it. You don't need to create this configuration file, as Zathura already has dark mode with CTRL-R. So only do this if you don't want pure black and pure white colors.
set recolor true
set recolor-darkcolor "#dcdccc"
set recolor-lightcolor "#1f1f1f"
Zathura keyboard shortcut keys, since there isn't any menu
/ search for text
n next search result
shift+n previous search result
d toggles dual page display
o open another PDF
r rotate page
s fit to screen
- zoom out
+ zoom in
Tab toggle index
CTRL-P print
CTRL-R toggle recolor (dark mode)
F5 Presentation mode
F11 fullscreen
GitHub - dynobo/normcap: OCR powered screen-capture tool to capture information instead of images
SparkelDrinkIdeas_ocr.pdf
2.7 MB
34 Sparkel Drink Ideas all non-alcoholic and no tea ones. Just put this together for personal reference.