GeekTips
Linux Mint, video encoding, ffmpeg, geek tips, regex, pdf manipulation, substitcher, mpv config
Edit a few videos together and add soundtracks in Shotcut (Linux, Mac, Windoze GPL)
Using LosslessCut (Linux, Mac, Windoze GPL) to make quick cuts in MP3s: trim off the first 22 seconds and the last 20 seconds of each file before encoding them into a chaptered Opus audiobook with fre:ac.
These are the options I use for MP3s; for videos I use SmartCut or keyframe cuts.
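For anyone who prefers the command line, here is a rough ffmpeg sketch of the same trim (drop the first 22 s and the last 20 s of each MP3). It is not how LosslessCut works internally; the function names and the trimmed/ folder are hypothetical, and it assumes ffmpeg and ffprobe are installed and every file is longer than 42 seconds.

```shell
#!/usr/bin/env bash
# Sketch: trim the first 22 s and last 20 s off MP3s with ffmpeg (no re-encode).
# Function names are hypothetical; assumes ffmpeg/ffprobe on PATH and files longer than 42 s.
HEAD_CUT=22   # seconds removed from the start
TAIL_CUT=20   # seconds removed from the end

# kept_duration TOTAL: seconds left after both cuts, printed with 3 decimals
kept_duration() {
  awk -v t="$1" -v h="$HEAD_CUT" -v e="$TAIL_CUT" 'BEGIN { printf "%.3f\n", t - h - e }'
}

# trim_mp3 SRC DST
trim_mp3() {
  local src=$1 dst=$2 total
  # ffprobe prints only the duration in seconds, e.g. 1834.527
  total=$(ffprobe -v error -show_entries format=duration \
                  -of default=noprint_wrappers=1:nokey=1 "$src")
  # -ss before -i seeks in the input, -t caps the output length,
  # -c copy keeps the original MP3 frames, -n never overwrites DST
  ffmpeg -nostdin -n -ss "$HEAD_CUT" -i "$src" \
         -t "$(kept_duration "$total")" -c copy "$dst"
}

# Usage:
#   mkdir -p trimmed && for f in *.mp3; do trim_mp3 "$f" "trimmed/$f"; done
```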
Queue up the files you want to batch download in Videomass (Linux, Mac, Windoze GPL free), which uses yt-dlp to download from YouTube, BitChute, Odysee, etc.
Download all the videos in Videomass
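Since Videomass hands the downloads to yt-dlp, the same queue can also be fed to yt-dlp directly from a text file. A minimal sketch, assuming a hypothetical urls.txt with one URL per line; --download-archive keeps a log of finished videos so a rerun skips them.

```shell
#!/usr/bin/env bash
# Sketch: batch download with yt-dlp directly instead of through Videomass.
# urls.txt and downloaded.txt are hypothetical file names.
batch_download() {
  local list=${1:-urls.txt}
  # --batch-file: read URLs from a file, one per line
  # --download-archive: record finished video IDs and skip them next time
  # -o: output name template, here "Title [id].ext"
  yt-dlp --batch-file "$list" --download-archive downloaded.txt \
         -o '%(title)s [%(id)s].%(ext)s'
}

# Usage: batch_download urls.txt
```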
Queue up all the videos you want to re-encode at 720p to reduce their file size.
-c:v hevc -crf 28 -c:a libopus -b:a 16k -vf scale="-2:720"

The preset I use. -2 keeps the aspect ratio whether you upscale or downscale, and rounds the width to an even number. hevc = x265. I use -crf 28 for most videos, -crf 23 for a great documentary or movie, and -crf 31 for VHS quality (a lower CRF means higher quality and a bigger file).
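The same preset can be run as a plain ffmpeg loop outside Videomass. A sketch, assuming a hypothetical small/ output folder and .mkv output (Matroska holds HEVC and Opus without fuss); the CRF is the third argument so switching between 23, 28 and 31 is easy.

```shell
#!/usr/bin/env bash
# Sketch: the 720p HEVC + Opus preset as an ffmpeg function.
# encode_720p SRC DST [CRF]; CRF defaults to 28 (lower = better quality, bigger file).
encode_720p() {
  local src=$1 dst=$2 crf=${3:-28}
  # -c:v hevc lets ffmpeg pick its HEVC encoder (typically libx265)
  # scale=-2:720: 720 px tall, width follows the aspect ratio, rounded to an even number
  # -n refuses to overwrite an existing DST
  ffmpeg -nostdin -n -i "$src" -c:v hevc -crf "$crf" \
         -c:a libopus -b:a 16k -vf "scale=-2:720" "$dst"
}

# Usage (folder name and .mkv extension are assumptions):
#   mkdir -p small
#   for f in *.mp4; do encode_720p "$f" "small/${f%.*}.mkv" 28; done
```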
Legogender
Liaspec
Liaspec
Librafeminine
Librafeminine
Libragender
Libragender
Libralesbian
Libralesbian
Libramasculine
Libramasculine
Libramaverique
Libramaverique
Librandrogyne
Librandrogyne
Libranonbinary
Libranonbinary
Lilafluid
Lilafluid
Lingender
Lingender
Linkgender
Littlefluid
Littlefluid
Lolgender
Ludogender
Lunagender
Lunagender
Lunarset
Lunarset

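Remove duplicate lines without changing the order (the list above, saved as gender.txt):
nl -w1 gender.txt | sort -k2 -k1,1n | uniq -f1 | sort -n | cut -f2- > output.txt
(-k1,1n breaks ties by line number, so the first copy of each line is the one kept.)

Or this one, which also keeps the original order:
awk '!seen[$0]++' gender.txt > output.txt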

Legogender
Liaspec
Librafeminine
Libragender
Libralesbian
Libramasculine
Libramaverique
Librandrogyne
Libranonbinary
Lilafluid
Lingender
Linkgender
Littlefluid
Lolgender
Ludogender
Lunagender
Lunarset
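To check that both one-liners keep the first copy of each line, here is a small sketch wrapping them as stdin filters (function names are hypothetical). The nl version breaks ties by line number with -k1,1n; without it, sort falls back to comparing whole lines as text, and a copy on line 12 can beat the one on line 3 because "12" sorts before "3".

```shell
#!/usr/bin/env bash
# Sketch: the two order-preserving de-duplication one-liners as stdin filters.

# awk: seen[$0]++ returns 0 the first time a line appears, so !0 prints it;
# every later copy has a non-zero count and is skipped.
dedupe_awk() {
  awk '!seen[$0]++'
}

# nl: number the lines, sort by text (ties by line number), drop repeated text,
# put the survivors back in line-number order, then cut the numbers off.
dedupe_nl() {
  nl -w1 | sort -k2 -k1,1n | uniq -f1 | sort -n | cut -f2-
}

# Usage: dedupe_awk < gender.txt > output.txt
```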
Regex to find a dash followed by exactly 13 digits and replace it with nothing ({13} means exactly 13, not up to 13):
-(\d){13}
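The same removal can be done from a terminal with sed -E, which has no \d, so [0-9] stands in for it. A sketch with hypothetical function and file names; the rename loop only touches names that actually change.

```shell
#!/usr/bin/env bash
# Sketch: drop a dash followed by exactly 13 digits, like the -(\d){13} search above.
strip_13_digits() {
  sed -E 's/-[0-9]{13}//'
}

# Rename every matching file in the current folder; mv -n never overwrites.
rename_strip_13_digits() {
  local f new
  for f in *-*; do
    new=$(printf '%s\n' "$f" | strip_13_digits)
    if [ "$new" != "$f" ]; then
      mv -n -- "$f" "$new"
    fi
  done
}
```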
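Regex to find and remove the digits mixed with periods (like 00.34.77) that LosslessCut adds to file names as timestamps.
A digit quantifier like \d{24} can't match here because the timestamp mixes digits and dots, so I used . (which matches any single character) instead, one per remaining character of the timestamp:
-(\d)........................

This also works (each unescaped . again matches any single character, so it also matches a dash between the numbers):
-(\d+).(\d+).(\d+).(\d+).(\d+).(\d+).(\d+).(\d+)

There are multiple dashes in the file names, so to anchor the match I searched for
(\D)-(\d)........................
and replaced it with
$1
(\D) matches one non-digit (a letter like a, b, c or d) right before the dash. The whole match is replaced with $1, the first capture group, which puts that letter back; otherwise the last letter of the name gets chopped off.

If the name ends in a numeral, like Part 1, change the first (\D) to lowercase (\d), which matches a digit:
(\d)-(\d)........................
\1
(\1 and $1 both mean the first capture group; which spelling works depends on the tool.)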
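An alternative that avoids the (\D)/(\d) juggling is to spell out the timestamp format with escaped dots, which only match literal periods. A sketch with sed -E, assuming the suffix looks like -00.00.22.000-00.05.10.500 (HH.MM.SS.mmm twice, i.e. the eight digit groups the (\d+) pattern counts); that format is inferred here, so check your own file names first.

```shell
#!/usr/bin/env bash
# Sketch: remove a LosslessCut-style "-HH.MM.SS.mmm-HH.MM.SS.mmm" range from a name.
# The timestamp format is an assumption; adjust TS if your names differ.
TS='[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{3}'   # \. matches only a literal period

strip_lc_timestamp() {
  # The match starts at the dash, so the character before it (a letter, or the
  # 1 in "Part 1") is never consumed and nothing has to be put back.
  sed -E "s/-${TS}-${TS}//"
}

# Usage: printf '%s\n' *.mp3 | strip_lc_timestamp   (preview the new names)
```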
I'm turning this book into an audiobook, but the original OCR text in the PDF is pretty much impossible to correct:

pdftotext -layout book.pdf output.txt
so I had to force OCR on it:
ocrmypdf --force-ocr book.pdf book_ocr.pdf
Now it's a tad better. Formatting isn't all that important for text to speech.
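The two steps can be chained so the bad text layer is replaced before extracting. A sketch, assuming ocrmypdf and pdftotext (poppler-utils) are installed; the _ocr.pdf and .txt output names are just a convention chosen here.

```shell
#!/usr/bin/env bash
# Sketch: force a fresh OCR pass, then pull the text out for text to speech.
# book.pdf -> book_ocr.pdf -> book.txt (output names are an assumption)
ocr_to_text() {
  local pdf=$1 base=${1%.pdf}
  # --force-ocr rasterizes each page and OCRs it again, discarding the old text layer
  ocrmypdf --force-ocr "$pdf" "${base}_ocr.pdf" &&
    pdftotext -layout "${base}_ocr.pdf" "${base}.txt"
}

# Usage: ocr_to_text book.pdf
```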
Removing hyphens from words hyphenated at the end of a line. For the text to speech to work correctly, defi- nitely has to become definitely, while non-partisanship must stay as it is.
Search for the following and replace it with nothing:
-$\n\s+
- is the dash
$ means at the end of a line
\n is the line break
\s is whitespace (blank spaces)
\s+ is one or more whitespace characters
That missed im- portance and cir- cumstances, because the next line didn't start with any whitespace for \s+ to match. So search and replace all again using -$\n
Now importance and circumstances are correct and non-partisanship isn't changed. Now I just have to spell-check it before feeding it to ttstool (text to speech).
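Batch-compressed the PDFs, each by about 75%.

Made a subdirectory called output, then ran
parallel --tag -j 2 ocrmypdf -s -O 2 --skip-big .1 '{}' 'output/{}' ::: *.pdf
-j 2 runs two files at a time and --tag prefixes each output line with its file name. -s (--skip-text) leaves pages that already have text alone, -O 2 turns on lossy image optimization, and --skip-big .1 skips OCR on any page larger than 0.1 megapixels, so in practice the pages are only optimized, not re-OCRed.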
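To confirm the savings per file, a small sketch that compares each PDF in output/ with the original of the same name in the current folder (the layout the parallel command produces). The function name is hypothetical, and the percentage is rounded down to a whole number.

```shell
#!/usr/bin/env bash
# Sketch: print how much smaller each compressed PDF in output/ is than its original.
report_savings() {
  local dir=${1:-output} f name before after
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue                        # glob matched nothing
    name=${f##*/}                                  # file name without the folder
    [ -e "$name" ] || continue                     # no original to compare with
    before=$(wc -c < "$name" | tr -d ' ')          # original, current folder
    after=$(wc -c < "$f" | tr -d ' ')              # compressed copy
    [ "$before" -gt 0 ] || continue                # avoid dividing by zero
    printf '%s: %d%% smaller\n' "$name" $(( (before - after) * 100 / before ))
  done
}

# Usage: report_savings output
```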

Got an OutOfMemoryError (Java heap space) in PDFsam when trying to process a ton of PDFs. So start PDFsam this way, giving it 2.4GB of heap instead of the 512MB default it was running with:

java -Xmx2400m -jar /opt/pdfsam-basic/pdfsam-basic-4.3.0.jar

As to why PDFsam isn't compressing even with PDF 1.5 checked, I have no idea, so I use ocrmypdf to do the compression.
One PDF covered the years 1960 to 1985, and if merged as-is it would get a single entry in the table of contents named 1960-1985. I wanted each year from 1960 on to have its own link in the TOC (table of contents).

The solution was to split the PDF by bookmark with PDFsam.
To split by bookmark, choose level 1; then, in the File names settings, right-click, choose [BOOKMARK_NAME] and delete PDF_SAM
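Putting spaces between joined capitalized words with regex (renaming files)

Rename TheParableofTheFigTree to The Parable of The Fig Tree

Search and replace all using
([a-z])([A-Z])
replace all with
$1 $2

[a-z] lowercase letters
([a-z]) groups that single letter
[A-Z] uppercase letters
$1 is the 1st captured group and $2 the 2nd, with a space between them

The regex only splits where a lowercase letter meets a capital, so the all-lowercase "of" stays stuck to Parable (The Parableof The Fig Tree) and needs fixing by hand.

Make sure Case Sensitive Search is checked; otherwise [a-z] and [A-Z] both match any letter and spaces get inserted all through the words.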
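The same search and replace works with sed -E, where the back-references are written \1 and \2 instead of $1 and $2. A sketch with hypothetical function names; the rename loop skips names that don't change and never overwrites an existing file.

```shell
#!/usr/bin/env bash
# Sketch: insert a space wherever a lowercase letter is followed by a capital.
# sed is case-sensitive by default; LC_ALL=C also keeps [a-z] and [A-Z] to ASCII letters.
split_camel() {
  printf '%s\n' "$1" | LC_ALL=C sed -E 's/([a-z])([A-Z])/\1 \2/g'
}

# Rename files in the current folder; mv -n never overwrites an existing name.
rename_camel_files() {
  local f new
  for f in *; do
    new=$(split_camel "$f")
    if [ "$new" != "$f" ]; then
      mv -n -- "$f" "$new"
    fi
  done
}
```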