GeekTips

A regex search and replace that strips the timestamps LosslessCut adds to file names: runs of digits mixed with periods / dots, such as 00.34.77.

Instead of a quantifier, use a . (which matches any single character) once for each character. I used 23 or 24 periods.

-(\d)........................

this also works
-(\d+).(\d+).(\d+).(\d+).(\d+).(\d+).(\d+).(\d+)
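
Here is a minimal Python sketch of the same replacement. The file name is hypothetical and its start-end timestamp layout is an assumption; LosslessCut's actual naming may differ by version and settings. It runs both patterns from the post, with `.{24}` standing in for the typed-out periods, plus a stricter variant (not from the post) that escapes the dots so they only match literal periods.

```python
import re

# Hypothetical file name; the timestamp layout is an assumption for illustration.
name = "lecture-00.00.34.770-00.01.20.500.mp4"

# First pattern from the post: a dash, one digit, then 24 "any character" dots.
# .{24} is the quantifier form of typing the period 24 times.
dots = re.sub(r"-(\d).{24}", "", name)

# Second pattern from the post: eight digit groups joined by "any character".
groups = re.sub(r"-(\d+).(\d+).(\d+).(\d+).(\d+).(\d+).(\d+).(\d+)", "", name)

# Stricter variant (not from the post): \. matches only a literal period.
stamp = r"\d+\.\d+\.\d+\.\d+"
strict = re.sub(rf"-{stamp}-{stamp}", "", name)

print(dots, groups, strict)
```

All three give the same result on this name. The unescaped `.` in the post's patterns also matches the dash between the two timestamps, which is why they work at all, but it would match any other character in that position too.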


(\D)-(\d)........................
$1

With multiple dashes in the filename, (\D) matches a non-digit (a letter such as a, b, c, d) right before the dash, and the replacement $1 puts that captured character back. Otherwise the last letter before the timestamp gets chopped off the filename.

If the name ends in a numeral, like Part 1, change the first (\D) to lowercase (\d), which matches a digit, like so

(\d)-(\d)........................
\1
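
To see why the captured character has to be written back, here is a small Python sketch. Both file names are made up for illustration and the timestamp layout is an assumption. Python spells the back-reference `\1`.

```python
import re

# Hypothetical names; the timestamp layout is assumed for illustration.
with_letter = "my-talk-00.00.34.770-00.01.20.500.mp4"
with_digit = "My Talk Part 1-00.00.34.770-00.01.20.500.mp4"

# (\D) consumes the character before the dash, so replacing the whole
# match with nothing drops that character along with the timestamp.
chopped = re.sub(r"(\D)-(\d).{24}", "", with_letter)

# Writing group 1 back keeps the character.
kept = re.sub(r"(\D)-(\d).{24}", r"\1", with_letter)

# For a name ending in a numeral, the first group must match a digit.
kept_digit = re.sub(r"(\d)-(\d).{24}", r"\1", with_digit)

print(chopped, kept, kept_digit, sep="\n")
```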


I'm making this book into an audiobook, but the original OCR on the document is pretty much impossible to correct.

pdftotext -layout book.pdf output.txt


So I had to force OCR on it

ocrmypdf --force-ocr book.pdf book_ocr.pdf

and now it's a tad better. Formatting isn't all that important for text to speech.


Removing hyphens from words that were hyphenated at the end of a line. For the text to speech to work correctly, defi- nitely needs to become definitely, while non-partisanship must stay unchanged since it's correct as it is.


Search for

-$\n\s+

and replace with nothing.

- is a dash
$ means the end of a line
\n is a line break
\s is whitespace (blank spaces)
\s+ is one or more whitespace characters


It didn't catch im- portance or cir- cumstances, as there wasn't any whitespace after the dash and line break. So search and replace all again using -$\n
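
The two passes can probably be folded into one by using `\s*` (zero or more whitespace) in place of `\s+`. A minimal Python sketch, with a made-up sample text:

```python
import re

# Sample text: two words split across lines, one genuinely hyphenated word.
text = (
    "It is defi-\n"
    "    nitely of great im-\n"
    "portance to non-partisanship.\n"
)

# A dash at the end of a line, the line break, then zero or more whitespace.
# re.MULTILINE lets $ match at the end of every line, not just the last one.
joined = re.sub(r"-$\n\s*", "", text, flags=re.MULTILINE)

print(joined)
```

The hyphen in non-partisanship is left alone because it isn't at the end of a line. A genuinely hyphenated word that happens to break at a line end would still get joined, so a spell check afterwards is worth it.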


Now importance and circumstances are correct and non-partisanship is unchanged. Next I just have to spell check it before feeding it to ttstool (text to speech).


Batch compressed each PDF by about 75%.

First made a subdirectory named output, then ran

parallel --tag -j 2 ocrmypdf -s -O 2 --skip-big .1 '{}' 'output/{}' ::: *.pdf
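
To check how much each file actually shrank, something like the following Python sketch could be run afterwards. The function names are made up for illustration, and it assumes the originals sit in the current directory with the compressed copies under output, as in the command above.

```python
from pathlib import Path


def percent_saved(original: Path, compressed: Path) -> float:
    """Return how much smaller the compressed file is, as a percentage."""
    before = original.stat().st_size
    after = compressed.stat().st_size
    return 100.0 * (before - after) / before


def report(src_dir: str = ".", out_dir: str = "output") -> dict[str, float]:
    """Map each PDF name to its size reduction, for files present in both folders."""
    results = {}
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        twin = Path(out_dir) / pdf.name
        # Skip files with no compressed copy, and empty files (avoids dividing by zero).
        if twin.exists() and pdf.stat().st_size > 0:
            results[pdf.name] = round(percent_saved(pdf, twin), 1)
    return results
```

Calling `report()` from the directory holding the PDFs returns a dictionary of file name to percent saved.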

Got an out-of-memory (Java heap) error in PDFSam when trying to process a ton of PDFs. So start PDFSam this way, giving it 2.4GB of memory instead of the 512MB default.

java -jar -Xmx2400m /opt/pdfsam-basic/pdfsam-basic-4.3.0.jar

As to why PDFSam isn't compressing even with PDF 1.5 checked I have no idea. Thus it's necessary to use ocrmypdf to do the compression.


One PDF covered the years 1960 to 1985, and if merged it would have a single entry in the table of contents (TOC) named 1960-1985. I wanted each year from 1960 on to have its own link in the TOC.

Solution was to Split the PDF by Bookmark with PDFSAM.


To split by bookmark, choose level 1, then in the File names settings right click, choose [BOOKMARK_NAME], and delete PDF_SAM


Putting spaces between joined capitalized words with regex — renaming files

TheParableofTheFigTree rename it to

The Parable of The Fig Tree

Search and replace all using

([a-z])([A-Z])

replace all with

$1 $2

[a-z] matches one lowercase letter
([a-z]) captures that single letter as a group
[A-Z] matches one uppercase letter
$1 is the first captured group and $2 is the second

Make sure Case Sensitive Search is checked


Repeat the search and replace for the 2nd instance, then once again for the 3rd instance, and so on.
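
For comparison, here is a minimal Python sketch of the same pattern. Python's `re.sub` replaces every match in one call, so no repeated runs are needed, and matching is case sensitive by default.

```python
import re

name = "TheParableofTheFigTree"

# Insert a space between each lowercase letter and the capital that follows it.
spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)

print(spaced)  # The Parableof The Fig Tree
```

The lowercase "of" has no capital letter to split on, so it stays attached to "Parable" and needs a manual fix.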


Now do the same for numbers and years, etc.

search using

([a-z])([0-9])

replace all with

\1 \2

[0-9] matches a digit
$1 $2 and \1 \2 are the same here, so use either syntax
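
Which replacement syntax works depends on the tool. As one example, Python's `re` module accepts `\1` (or `\g<1>`) but treats `$1` as plain text. A short sketch, with a made-up file name:

```python
import re

# Hypothetical name, for illustration only.
name = "AnnualReport1985"

# Space between a lowercase letter and the digit that follows it.
spaced = re.sub(r"([a-z])([0-9])", r"\1 \2", name)

# \g<1> is Python's unambiguous spelling of \1.
same = re.sub(r"([a-z])([0-9])", r"\g<1> \g<2>", name)

# "$1" is not a group reference in Python's re module; it is inserted literally.
literal = re.sub(r"([a-z])([0-9])", "$1 $2", name)

print(spaced, same, literal, sep="\n")
```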


Beavis and Butthead learn of their White Privilege

The original video was re-encoded to 720p but has low volume

ffmpeg -i beavisprivilege_original.mp4 -af volumedetect -f null /dev/null

shows the following

mean_volume: -32.7 dB
max_volume: -7.6 dB
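
If these readings are needed in a script, they can be pulled out with a regex. A minimal Python sketch follows; the function name is made up, and the sample text is just the two lines shown above.

```python
import re

# The two readings from the post, as plain text.
log = "mean_volume: -32.7 dB\nmax_volume: -7.6 dB\n"


def read_levels(text: str) -> dict[str, float]:
    """Collect the mean_volume and max_volume readings (in dB) from ffmpeg output."""
    pairs = re.findall(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?) dB", text)
    return {key: float(value) for key, value in pairs}


levels = read_levels(log)

# Gain that would bring the loudest peak exactly to 0 dB.
headroom = round(0.0 - levels["max_volume"], 1)

print(levels, headroom)
```

ffmpeg writes this report to stderr, so when calling it from Python the text would come from `subprocess.run(..., capture_output=True, text=True).stderr`.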


To boost the audio volume by 12 dB, add -af "volume=12dB" just before the -vf (video filter) option; make sure to capitalize the B. -af stands for audio filter.

If you just want to re-encode the audio without re-encoding the video again, change the following

-c:v hevc -crf 28 -c:a libopus -b:a 16k -af "volume=12dB" -vf scale="-2:720"

to this, which copies the video stream instead of re-encoding it (the -vf scale filter is dropped, since a copied stream cannot be filtered)

-c:v copy -c:a libopus -b:a 16k -af "volume=12dB"


This is the video with boosted audio volume and

ffmpeg -i beavisprivilege_12dB+.mp4 -af volumedetect -f null /dev/null

shows the following

mean_volume: -20.8 dB
max_volume: 0.0 dB
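
A quick arithmetic check of those numbers in Python, using only the readings from the two volumedetect runs and the 12 dB gain:

```python
before_mean, before_max = -32.7, -7.6   # dB, from the first volumedetect run
after_mean, after_max = -20.8, 0.0      # dB, from the second run
gain = 12.0                             # dB, from volume=12dB

# Adding the gain to the original readings gives the nominal new levels.
expected_mean = round(before_mean + gain, 1)
expected_max = round(before_max + gain, 1)

# A nominal peak above 0 dB cannot be represented at full scale.
over_full_scale = expected_max > 0.0

print(expected_mean, expected_max, over_full_scale)
```

The measured mean of -20.8 dB is close to the nominal -20.7 dB. The nominal peak works out to +4.4 dB while the measured peak is 0.0 dB, which suggests the loudest peaks hit the ceiling and may have been clipped. A gain of about 7.6 dB would have kept the peak at 0 dB.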


ocrmypdf -O 3 --deskew input.pdf output.pdf

The --deskew option straightens crooked (skewed) pages.

Batch process PDFs using OCR:
parallel --tag -j 2 ocrmypdf -O 3 --deskew '{}' 'output/{}' ::: *.pdf
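
Where GNU parallel isn't available, roughly the same batch could be driven from Python. This is only a sketch: the function names are made up, it assumes ocrmypdf is on the PATH, and it uses the same flags as the command above with two jobs at a time.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(pdf: Path, out_dir: Path) -> list[str]:
    """Same flags as the post: optimization level 3 plus deskew."""
    return ["ocrmypdf", "-O", "3", "--deskew", str(pdf), str(out_dir / pdf.name)]


def run_batch(src_dir: str = ".", out_dir: str = "output", jobs: int = 2) -> list[int]:
    """Run ocrmypdf on every PDF in src_dir, `jobs` at a time; return the exit codes."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    commands = [build_command(p, out) for p in sorted(Path(src_dir).glob("*.pdf"))]
    # Threads are enough here: each one just waits on an external process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return [done.returncode for done in pool.map(subprocess.run, commands)]
```

Calling `run_batch()` from the directory holding the PDFs writes the results to output and returns one exit code per file.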


In PDF Arranger, use SHIFT to select the page range of each magazine issue, then CTRL-E to export the selection to a single PDF. Rename each PDF to the year and month of the magazine. PDFSAM can then generate an index / table of contents.

Sidenote: PDF Slicer also works but is more cumbersome for this particular task, as it requires you to select the issue, press CTRL-I (invert selection), then delete all the selected pages. Then use Save as ... and once that's done, undo and wait for the thumbnails to be regenerated.


Convert an epub to a PDF with an outline using the free Calibre command line tool.

ebook-convert input.epub output.pdf --base-font-size=13 --change-justification=justify --embed-font-family=freesans

I prefer justification rather than left alignment and hyphenation.
