GeekTips
Linux Mint, video encoding, ffmpeg, geek tips, regex, pdf manipulation, substitcher, mpv config
sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt update
sudo apt install tesseract-ocr

Installed Tesseract 5.0, which in my testing was 15% faster than 4.1.

ocrmypdf --optimize 3 input.pdf output_optimized.pdf

PDF/A. You may wish to examine the output PDF's XMP metadata.
JPEGs: 100%|███████████████████████████████| 180/180 [00:22<00:00, 6.97image/s]
PNGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
INFO - Optimize ratio: 1.64 savings: 38.9%
INFO - Output file is a PDF/A-2B (as expected)

It optimized the PDF from 165 MB to 101 MB.
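As a rough check on those numbers, here is a minimal sketch of how a ratio and a savings percentage relate to the before and after sizes. It assumes the ratio is simply input size divided by output size; the function name is made up for illustration and is not part of ocrmypdf.

```python
def optimize_stats(input_size: float, output_size: float) -> tuple[float, float]:
    """Return (ratio, savings_percent) for a before/after file size pair.

    Assumption: ratio = input / output, savings = share of the input removed.
    """
    if input_size <= 0 or output_size <= 0:
        raise ValueError("sizes must be positive")
    ratio = input_size / output_size
    savings = (1 - output_size / input_size) * 100
    return ratio, savings


# Sizes from the post above, in MB (rounded, so the result is approximate).
ratio, savings = optimize_stats(165, 101)
print(f"ratio {ratio:.2f}, savings {savings:.1f}%")  # ratio 1.63, savings 38.8%
```

The small difference from the log (1.64 and 38.9%) is what you would expect from working with file sizes rounded to whole megabytes.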

Noticed gImageReader released a package three weeks ago that's compatible with Tesseract 5. Probably have to wait 6 to 12 months till it gets released as a deb file though.
NormCap is an app that lets you select any part of your screen, OCR the image, and copy the extracted text to your clipboard so it can be pasted into a text editor.

I tried installing NormCap as a Python package but it failed. Just download the AppImage.

In Applications | Settings | Keyboard, under Application Shortcuts,
add Ctrl+Print and for the command navigate to the NormCap-unstable-x86_64.AppImage
MinderKBShortcuts(Outliner).pdf
27.8 KB
Minder Keyboard Shortcuts made with Outliner
Also use Flatseal (Flatpak) to grant home folder file permissions to apps where you see /run/user/1000/doc/ paths, since Flatpak apps are sandboxed.

Need to do this for Annotator, Minder and Outliner.
First paragraph – Example text copied from a PDF which contains
unwanted line breaks and
the text doesn’t flow.

Second paragraph – To fix this in LibreOffice you
need to Edit | Search and Replace three
times to fix it.

Third paragraph – now if you have tabs in
your existing document this won’t work. Always wanted an
offline solution not those websites to do this.

Fourth paragraph - So you should see four paragraphs nicely formatted
and the text flowing
so it looks like it should.

You want it to look like so:

First paragraph – Example text copied from a PDF which contains unwanted line breaks and the text doesn’t flow.

Second paragraph – To fix this in LibreOffice you need to Edit | Search and Replace three times to fix it.

Third paragraph – now if you have tabs in your existing document this won’t work. Always wanted an offline solution not those websites to do this.

Fourth paragraph - So you should see four paragraphs nicely formatted and the text flowing so it looks like it should.

Open Search and Replace with Regular expressions checked and do a Replace All four times:

1) Find: ^$ (finds only the empty lines, i.e. the paragraph breaks, not every line break)
Replace: \t (replaces with a tab)

2) Find: $ (finds all remaining line breaks)
Replace: space (just put a single space character and all the text flows together)

3) Find: \t (finds the tabs)
Replace: \n\n (puts a new line twice to create a paragraph break; on Windows does one need to put \r\n\r\n ?)

4) Find: spacespace (two spaces together "  ")
Replace: space (one space)
This can be recorded as a macro. First check Tools | Options | LibreOffice | Advanced | Enable macro recording if it wasn't already checked, then restart LibreOffice.

Press Ctrl-H to open Search and Replace, then choose Tools | Macros | Record Macro (a small dialog will pop up showing Stop Recording). Perform steps 1 to 4 and when done click Stop Recording. Under Save Macro In, click My Macros | Standard | New Module and name it RemoveLineBreaks.

Whenever you wish to run the macro, go to Tools | Macros | Run Macro, select RemoveLineBreaks and click Run, and the line breaks are removed within the document. You can also assign a keyboard shortcut to the macro under Tools | Customize | Keyboard:
Under Shortcut Keys click on Shift+Ctrl+L (or any unassigned one).
Under Category choose LibreOffice Macros | My Macros | Standard | RemoveLineBreaks, under Function click RemoveLineBreaks, then click Modify, then Save and name your custom keyboard configuration file. All done.
1) choose keyboard shortcut (any unassigned one)

2) click RemoveLineBreaks

3) and again

4) click Modify

5) click Save and name your keyboard customization configuration file and save it
I played around with the LMDE 5 beta under QEMU (like VirtualBox). I believe Linux Mint Debian Edition is the way forward. Canonical, who makes Ubuntu, is evil with their Snap package management system. Flatpak and AppImage are way better.

Nemo (file manager) is better than Thunar (XFCE file manager) for dual pane, search, and searching content within files. Thunar has better file renaming though, like Numbering and better regex, in my testing. Still not sure if I’ll switch to Cinnamon or stick with XFCE. Getting off the Ubuntu base is a huge step and it looks like Linux Mint can fully pull the plug if they wish to.
Palette Cam (free on iOS, no ads) is a quick way to save colors and get a hex color from an image. Save palettes without needing to create an online account.
Translating services using google while protecting your privacy
https://lingva.ml/
https://simplytranslate.org/

Many are moving away from DuckDuckGo. You could use
https://www.qwant.com/
https://startpage.com/
but at the end of the day they're all controlled. Even the Telegram CEO is a WEF member.
Modified the way to OCR a PDF and to only optimize PDFs, using --skip-big instead of --tesseract-timeout=0
Only optimizing a PDF for file size with no need to OCR it, so it went from 20.3 MiB to 10.7 MiB.

-s is the same as --skip-text (skips pages that already have text, e.g. already OCR'd)
-O (that's a letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)
--skip-big tells it to skip OCR on any page over 0.1 megapixels (which would be every page)

ocrmypdf -s -O 3 --skip-big .1 some.pdf some_optimized.pdf
Scan: 100%|███████████████████████████████████████████| 399/399 [00:35<00:00, 11.28page/s]
INFO - Start processing 4 pages concurrently
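To see why a 0.1 megapixel limit catches every page, here is a small sketch of the arithmetic. The page size (US Letter) and the 300 dpi scan resolution are assumptions for illustration, and the function name is made up; the sketch also assumes megapixels are simply pixel width times pixel height.

```python
def page_megapixels(width_in: float, height_in: float, dpi: int) -> float:
    """Pixel count of a scanned page in megapixels (millions of pixels)."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


# Hypothetical scan: US Letter (8.5 x 11 inches) at 300 dpi.
mp = page_megapixels(8.5, 11, 300)
print(f"{mp:.1f} MP, over the 0.1 MP limit: {mp > 0.1}")
```

Even a low-resolution scan lands far above 0.1 megapixels, so OCR is skipped on every page and only the optimization runs.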
ocrmypdf -O 3 input.pdf output_ocr.pdf

-O 3 (letter O, not a zero) is --optimize, and 3 does aggressive lossy optimizations (including lossy JBIG2)

OCR: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 125.0/125.0 [06:56<00:00, 3.33s/page]
WARNING - Some input metadata could not be copied because it is not permitted in PDF/A. You may wish to examine the output PDF's XMP metadata.
JPEGs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 125/125 [00:14<00:00, 8.29image/s]
PNGs: 0image [00:00, ?image/s]
JBIG2: 0item [00:00, ?item/s]
INFO - Optimize ratio: 1.74 savings: 42.5%
INFO - Output file is a PDF/A-2B (as expected)

The PDF is 56 MB, reduced from 98 MB. To use this you need to install the jbig2 encoder. Even after following the instructions I still had to install leptonica, which was a pain but worth it in the end. If you don't install libtiff5-dev before you compile leptonica you'll get an error like
Error in pixReadMemTiff: function not present

sudo apt-get install libtiff5-dev

tar zxvf leptonica-1.82.0.tar.gz
cd leptonica-1.82.0/
./autogen.sh
./configure
make
sudo make install

If you only want to optimize a PDF to reduce size by reducing images then see here https://t.iss.one/geektips/185
16 kbps Opus audio with chapters is ready for prime time. I suppose it always was, perhaps even in 2020. The Opus audio codec is superior at lower bitrates compared to AAC, MP3, Vorbis, etc. Anyone can create 16 kbps chaptered Opus audiobooks with freac (GPL, free), which is key for wide adoption.

I've been a fan of m4b chaptered 32 kbps AAC audiobooks for a while. I used m4b-tool to create hundreds of m4b audiobooks, but it is too complicated for the average user. Currently most app developers only support mp3 and m4b audiobooks, but I believe that'll change eventually.

Make an Opus chaptered audiobook

1) It’s super easy for anyone to create an Opus chaptered audiobook using freac (freac.org), which is GPL and free on Windows, Mac and Linux. Drag your m4b, mp3 or opus files into freac, export to Opus and set the maximum bitrate to 16 kbps. Under the Tag tab, Artist = Author of audiobook and Album = Title of audiobook. Add a cover if you wish.

2) freac General Settings, Opus Encoder Settings, Tag Settings

3) freac main window setting Author of audiobook, chapter names, Encode to a single file, output folder

4) freac Metadata tags for Artist and Album, Cover, Encode

5) Update Cover without re-encoding with TagEditor

6) Playing Opus chaptered audiobook with VLC on Windows, Mac, Linux

7) Playing Opus chaptered audiobook with VLC on iOS, Android and Settings

8) move opus audiobooks from downloads to VLC directory in Files on iOS or share to open in VLC

9) Fix title tag to match mp3 filename for chapters with MusicBrainz Picard

10) Low volume m4b audiobook normalize the chapters using Audacity

11) Low volume m4b audiobook: normalize each chapter's audio using Ocenaudio

12) Automatically create chapters from a single mp3 audiobook

13) Manually create chapters from a single mp3 audiobook

14) Manually create chapters using timecodes with LosslessCut

15) Huge audiobook collections from various video / audio sources have various sample rates, which results in inaccurate chapter times

16) Edit chaptered opus audiobook chapter names or times with MusicBrainz Picard

17) automatically Human or Smart Title Case chapters and append : after chapter numbers

For advanced users you can convert an existing m4b chaptered audiobook with ffmpeg (5.x retains cover and 4.x doesn't)

ffmpeg -i input.m4b -vn -c:a libopus -b:a 16k output.opus

or just use videomass with this preset
-vn -c:a libopus -b:a 16k
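If you have a whole folder of m4b files, the same ffmpeg options can be generated per file. This is a minimal sketch that only builds and prints the command; the function name and the example filename are made up, and actually running it (for example with subprocess.run) assumes ffmpeg is installed and on your PATH.

```python
import shlex
from pathlib import Path


def opus_command(src: Path, bitrate: str = "16k") -> list[str]:
    """Build the ffmpeg argument list from the post for one input file.

    The output keeps the same name with an .opus extension.
    """
    dst = src.with_suffix(".opus")
    return ["ffmpeg", "-i", str(src), "-vn", "-c:a", "libopus", "-b:a", bitrate, str(dst)]


# Hypothetical file name, printed as a copy-pasteable shell command.
cmd = opus_command(Path("book.m4b"))
print(shlex.join(cmd))
# To convert for real: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids quoting problems with spaces in audiobook filenames.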
Make an opus chaptered 16kbps audiobook with freac (free GPL) freac.org (Windows, Mac, Linux)

— General Settings
Filename pattern: <artist> - <album>
artist = Author of audiobook
album = Title of audiobook

Under the Tags section you only need to check Vorbis Comment, as Opus is the successor to Ogg Vorbis. You don't need ID3v2 or the others, so just uncheck the rest.

Selected encoder: Opus Audio Encoder
click Configure Selected Audio Encoder

1) Encoding mode: Voice (Auto is ok too)

2) File extension: opus (oga is a container but it's for various audio codecs and opus can embed cover art and metadata and chapters)

3) Uncheck Enable variable bitrate encoding. Won't save you anything. Can use for music encoding at 96kbps though if you wish.

4) Bitrate: 16 kbps, which is just as good as 32 kbps m4b AAC in my tests and obviously half the file size.
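To put numbers on the file size difference, here is a rough sketch of the arithmetic for constant bitrate audio. The 10 hour length is a made-up example and the function name is hypothetical; container overhead, cover art and metadata are ignored.

```python
def audiobook_size_mb(hours: float, bitrate_kbps: float) -> float:
    """Approximate audio size in megabytes (10^6 bytes) at a constant bitrate."""
    seconds = hours * 3600
    total_bytes = bitrate_kbps * 1000 / 8 * seconds  # kilobits/s -> bytes/s
    return total_bytes / 1_000_000


# Hypothetical 10 hour audiobook at both bitrates.
for kbps in (16, 32):
    print(f"{kbps} kbps, 10 h: {audiobook_size_mb(10, kbps):.0f} MB")
# 16 kbps, 10 h: 72 MB
# 32 kbps, 10 h: 144 MB
```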
freac main window

1) Artist = Author of audiobook

2) Title = chapter names. You can import an m4b chaptered audiobook or multiple mp3 or opus files and the chapter names will be the file names of the audio files.

3) Check Encode to a single file which creates an Opus chaptered audiobook if using multiple audio files.

4) Select filters: disabled

5) select an Output Folder: to save the opus audiobook in
1) click Tags tab then Albums tab

2) click line showing Artist and Album. If it says unknown fill in the appropriate text. Genre is optional but if you want to put one choose Audiobook.

3) Covers: add a Cover if you wish either a jpg or a png. It'll show Other unless you click on it then it'll show Cover (front).

4) click Encode to start encoding. The output file should show Author - Title.opus like Warner Von Lorne - Wanted 7 Fearless Engineers.opus

5) click Joblist tab to show the progress and time remaining for the audiobook encode
Update opus audiobook metadata without having to re-encode audiobook

I don't recommend using VLC to modify metadata for opus files as it may destroy cover metadata. Won't delete this post though.

1) VLC choose View | Playlist or Ctrl-L then double click on album cover to edit metadata. Can change Artist = author or Album = Title.

2) Can't change Album cover even by right clicking on it and changing some metadata and clicking Save Metadata. For that you need to use TagEditor (free GPL for Windows or Linux).
TagEditor https://github.com/Martchus/tageditor (free GPL Windows and Linux appimage) for those on Mac use MP3Tag

1) Can be used to change the opus audiobook cover, since VLC doesn't actually work for changing covers.

2) Change Author = Artist or Title = Album of audiobook

3) Just pointing out under Tag Management it'll show you Vorbis Comment (in Opus stream)

4) Save your metadata changes
VLC videolan.org/vlc (free, GPL) on Linux, Mac and Windows plays Opus chaptered audiobooks.

If you wish to increase or decrease the playback speed (0.25x to 4.00x), be sure to check under Preferences | Audio that Enable Time-Stretching audio is checked, as it adjusts the pitch to improve output at faster or slower speeds. VLC on Linux doesn't show chapter durations or starting times unfortunately.
VLC Settings on iOS make sure Continue audio playback is set to Always so it’ll resume from the point you last listened to. For variable playback speed make sure Time-stretching audio is checked.

When importing opus audiobooks into VLC probably the best way is to transfer the opus audiobook(s) to Files on iOS then move them to the VLC directory.