GeekTips
Linux Mint, video encoding, ffmpeg, geek tips, regex, pdf manipulation, substitcher, mpv config
-$\n\s+
- is a dash
$ means the end of a line
\n is a line break
\s is whitespace (blank spaces)
\s+ is one or more whitespace characters
That pattern didn't catch im- portance or cir- cumstances, as there wasn't any whitespace following the line break after the dash. So search and replace all again using -$\n
Now importance and circumstances are correct and non-partisanship isn't changed. Now I just have to spell check it before feeding it to ttstool (text to speech).
Batch compressed each PDF by about 75%.

Made a subdirectory named output, then ran:
parallel --tag -j 2 ocrmypdf -s -O 2 --skip-big .1 '{}' 'output/{}' ::: *.pdf
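To check a figure like the 75% for your own files, the size of each original can be compared with its compressed copy. A minimal sketch; percent_saved is a hypothetical helper name, and it relies only on wc and awk.

```shell
# Sketch only: percent_saved is a hypothetical helper, not part of ocrmypdf.
# Usage: percent_saved original.pdf output/original.pdf
percent_saved() {
  before=$(wc -c < "$1")   # size of the original in bytes
  after=$(wc -c < "$2")    # size of the compressed copy in bytes
  awk -v b="$before" -v a="$after" 'BEGIN {
    if (b == 0) { print "n/a"; exit }
    printf "%.0f\n", (b - a) * 100 / b
  }'
}

# Report the saving for every PDF that has a counterpart in output/
for f in *.pdf; do
  if [ -f "output/$f" ]; then
    echo "$f: $(percent_saved "$f" "output/$f")% smaller"
  fi
done
```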

Got an OutOfMemory heap error in PDFSam when trying to process a ton of PDFs. So start PDFSam this way, using 2.4GB of memory instead of the 512MB default.

java -Xmx2400m -jar /opt/pdfsam-basic/pdfsam-basic-4.3.0.jar

As to why PDFSam isn't compressing even with PDF 1.5 checked I have no idea. Thus it's necessary to use ocrmypdf to do the compression.
One PDF had the years 1960 to 1985, and if merged it would have a single entry in the Table of Contents named 1960-1985. I wanted each year from 1960 on to have its own link in the TOC (table of contents).

Solution was to split the PDF by bookmark with PDFSAM.
To split by bookmark, choose level 1, and in the File names settings right click, choose [BOOKMARK_NAME], and delete PDF_SAM.
Putting spaces between joined capitalized words with regex — renaming files

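Rename TheParableofTheFigTree to The Parable of The Fig Tree. The pattern below only splits where a lowercase letter meets a capital, so the lowercase of still has to be separated from Parable by hand.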

Search and replace all using
([a-z])([A-Z])
replace all with
$1 $2

[a-z] matches a lowercase letter
([a-z]) groups (captures) that single letter
[A-Z] matches an uppercase letter
$1 is the 1st captured group and $2 is the 2nd

Make sure Case Sensitive Search is checked.
Must repeat the search and replace again for the 2nd instance, then once again for the 3rd instance, and so on.
Now do the same for numbers and years, etc.

search using
([a-z])([0-9])
replace all with
\1 \2

[0-9] for numbers
$1 $2 and \1 \2 are the same...use either syntax
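The same two substitutions can be tried on the command line before renaming anything. A minimal sketch using sed; space_out is a hypothetical helper name. With sed's g flag a single pass handles every instance, and the first example shows that a lowercase word such as of stays attached to the word before it.

```shell
# Sketch only: space_out is a hypothetical helper name.
# Inserts a space where a lowercase letter meets a capital or a digit.
space_out() {
  # LC_ALL=C keeps [a-z] and [A-Z] limited to plain ASCII letters.
  printf '%s\n' "$1" | LC_ALL=C sed -E \
    -e 's/([a-z])([A-Z])/\1 \2/g' \
    -e 's/([a-z])([0-9])/\1 \2/g'
}

space_out TheParableofTheFigTree   # prints: The Parableof The Fig Tree
space_out Newsletter1960           # prints: Newsletter 1960
```

For an actual rename, the result could be passed to mv, for example mv -- "$f" "$(space_out "$f")", where f holds a file name.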
Beavis and Butthead learn of their White Privilege

original video re-encoded to 720p but has a low volume

ffmpeg -i beavisprivilege_original.mp4 -af volumedetect -f null /dev/null
shows the following
mean_volume: -32.7 dB
max_volume: -7.6 dB
To boost the audio volume by 12dB (make sure to capitalize the B) add -af "volume=12dB" just before the -vf (video filter). -af stands for audio filter

If you just wish to re-encode the audio without having to re-encode the video again, change the following
-c:v hevc -crf 28 -c:a libopus -b:a 16k -af "volume=12dB" -vf scale="-2:720"
to this, which copies the video stream instead of re-encoding it (the -vf scale filter has to go, since filters can't be applied to a copied stream)
-c:v copy -c:a libopus -b:a 16k -af "volume=12dB"
This is the video with boosted audio volume. Running
ffmpeg -i beavisprivilege_12dB+.mp4 -af volumedetect -f null /dev/null

shows the following
mean_volume: -20.8 dB
max_volume: 0.0 dB
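
The max_volume reading also tells how much gain fits before the loudest peak reaches 0 dB. A minimal sketch that pulls it out of a volumedetect log; headroom_db is a hypothetical helper name. For the original file the headroom is 7.6 dB, so the 12dB boost goes past it, which fits the max_volume of 0.0 dB after boosting: the loudest peaks are most likely being clipped.

```shell
# Sketch only: headroom_db is a hypothetical helper name.
# Reads a volumedetect log on stdin and prints how many dB can be added
# before the loudest peak reaches 0 dB (that is, 0 minus max_volume).
headroom_db() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "max_volume:") printf "%.1f\n", 0 - $(i + 1)
  }'
}

printf 'mean_volume: -32.7 dB\nmax_volume: -7.6 dB\n' | headroom_db   # prints: 7.6
```

ffmpeg writes its log to stderr, so with a real file the pipe would look like ffmpeg -i input.mp4 -af volumedetect -f null /dev/null 2>&1 | headroom_db (input.mp4 is a placeholder name).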
ocrmypdf -O 3 --deskew input.pdf output.pdf

--deskew option straightens out PDFs

batch process PDFs using OCR
parallel --tag -j 2 ocrmypdf -O 3 --deskew '{}' 'output/{}' ::: *.pdf
Use PDF Arranger and hold SHIFT to select the page range of each magazine issue, then CTRL-E to export the selection to a single PDF. Rename each PDF to the year and month of the magazine. Then you can use PDFSAM to generate an index / table of contents.

Sidenote: PDF Slicer also works but is more cumbersome for this particular task, as it requires you to select the issue, press CTRL-I (invert selection), then delete all selected pages. Then Save as... and, once done, undo and wait for the thumbnails to be re-generated.
Convert an epub to PDF with an Outline using the free Calibre command line tool.

ebook-convert input.epub output.pdf --base-font-size=13 --change-justification=justify --embed-font-family=freesans

I prefer justification rather than left alignment and hyphenation.
Converted PDF from epub with an Outline
Made montages in Fotoxx then combined montage images from a directory into a PDF.
sudo apt install img2pdf

(or via pip)
pip3 install img2pdf

img2pdf *.jpg -o output.pdf

If a few of your images are much bigger than the others, you might get extremely small pages in your document. Use Pix or Image Viewer (Xviewer) to quickly browse through the images and check their dimensions. If most are, say, 2000 x 1500 or so and just a few are 3000 x 2500 or higher, set the max pixel height and width and all PDF pages will be relatively uniform in size, with no white margins.

img2pdf --imgsize 2000x2000 *.jpg -o output.pdf

When combining multiple PDFs with various page sizes in PDFSAM choose Normalize Page Size (all pages same width as first page in PDF). Able to compress this from 145MiB to 81MiB.

Then when compressing it, since it won't do pdfa 1.6 (pdfa is the default output type), you have to specify --output-type=pdf

ocrmypdf -O 2 --skip-big .1 -s --output-type=pdf PDFsam_merge.pdf output.pdf
Also their free app AnyBuffer (free tip app) works well as a clipboard app.
Noise Reduction to remove hiss, hum and increase volume on terrible audio quality of speeches / talks — played around with many different methods.

In Audacity you need to get a noise profile from a few seconds of no talking, which is cumbersome. I even tried the Noise Repellent plugin and it's impressive, but I couldn't figure out how to apply it: even after hitting Apply, the changes wouldn't take effect.

In videomass (ffmpeg frontend GUI)
-c:a libopus -vbr off -b:a 32k -ar 48000 -af highpass=200,lowpass=3000,afftdn,aformat=channel_layouts=stereo,volume=12dB,dynaudnorm

Also tried highpass=500 and lowpass=1000; it's not bad, but not great for super noisy audio. For just hiss and old recordings, and especially for batch processing, this can't be beat.

Ultimately the one I liked best is OcenAudio (free for Linux, Mac, Win). I installed the deb file and had used it in the past for normalization.

In OcenAudio there is a manual noise reduction, which I played around with. If you choose that one, I suggest you only change the Reduction Factor (Noise Reductor tab), say from 12dB to 20dB, after you click Get Profile.

OcenAudio Automatic Noise Reduction: first I apply Amplitude | Gain of 200% (+6dB) or 250% (+8dB), then apply Automatic Noise Reduction once, or even twice if necessary.
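
The percent and dB figures for Gain are related by dB = 20 * log10(percent / 100). A quick sketch for converting other values; percent_to_db is a hypothetical helper name.

```shell
# Sketch only: percent_to_db is a hypothetical helper name.
# Converts an amplitude gain in percent to decibels.
percent_to_db() {
  # awk's log() is the natural log, so divide by log(10) to get log10.
  awk -v p="$1" 'BEGIN { printf "%.1f\n", 20 * log(p / 100) / log(10) }'
}

percent_to_db 200   # prints: 6.0
percent_to_db 250   # prints: 8.0
```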
OcenAudio spectrogram view