NORMCAP and TextSnatcher both OCR text from images. NORMCAP seems to do a tad better, especially with line breaks. Both are good to have at your disposal though.
PDF with non-uniform images. Resize the larger images down to match the smaller ones, maintaining aspect ratio, to solve this problem.
Smaller ones (just a few examples):
1501 x 1152
1491 x 1155
1547 x 1159
1514 x 1159
Larger ones (just a few examples):
4649 x 3610
4641 x 3647
4677 x 3597
4849 x 3819
The problem is that in ScanTailor Advanced the smaller images remain small compared to the larger-resolution images after setting margins. This PDF is landscape, with two book pages scanned per PDF page.
A4 = 210mm x 297mm
A4 + A4 = A3 = 297mm x 420mm
I thought just using the -scale-to option would do the trick but it didn't.
pdftoppm -r 300 -scale-to 842 -tiff -tiffcompression deflate nonuniformimages.pdf dump/img
So instead I just did this and didn't specify the dpi:
pdftoppm -tiff -tiffcompression deflate nonuniformimages.pdf dump/img
Sort by size to select the larger-resolution images and move them to a new directory (I used dump3/).
mogrify modifies the existing images in place, so make a quick copy of them first just in case. ImageMagick doesn't have a compression type called deflate (its Zip type is the closest match), so I used LZW.
-resize 555 resizes to a width of 555 and -resize x555 to a height of 555.
mogrify -resize x1172 -compress LZW dump3/*.tif
They maintained their aspect ratio, and the new resolutions of a few of them are:
1510 x 1172
1519 x 1172
1516 x 1172
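As a rough check of what -resize x1172 does to the width, the arithmetic can be sketched in shell. scaled_width is a hypothetical helper name, and rounding to the nearest pixel is an assumption about how ImageMagick rounds, so the real result may differ by a pixel.

```shell
# scaled_width is a hypothetical helper, not an ImageMagick command.
# It prints the width an image ends up with when its height is scaled
# to a target height and the aspect ratio is kept.
# Assumption: the result is rounded to the nearest pixel.
scaled_width() {
    # $1 = original width, $2 = original height, $3 = target height
    awk -v w="$1" -v h="$2" -v t="$3" 'BEGIN { printf "%d\n", (w * t / h) + 0.5 }'
}

# Two of the larger images listed earlier, scaled to a height of 1172
scaled_width 4649 3610 1172
scaled_width 4849 3819 1172
```

Both come out close to 1500 wide, which is in the same range as the smaller images, so ScanTailor no longer sees two very different page sizes.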
Copy these back into your dump/ directory and ScanTailor will process them just fine now.
You can try to align chapter numbers by adding blank pages or deleting some unwanted pages so the table of contents numbers match the actual PDF page numbers.
Or you can use vim and a plugin called vim-NumUtils.
sudo apt install vim
Download vim-NumUtils and, after unzipping, put the doc/ and plugin/ directories into the ~/.vim directory. It can do NumUtilsAdd, NumUtilsSub, NumUtilsMul, NumUtilsDiv
Here all the actual PDF page numbers are 3 pages behind the listed page numbers.
:% NumUtilsSub 3, '\,'
Sometimes I type a space after the , so to get those too do
:% NumUtilsSub 3, '\, '
Quick vim commands
:wq - write/save file and quit
:q - quit
u - undo
Ctrl-R - redo
i - insert mode (ESC to quit insert mode)
a - append text
% - a range for the entire document
Great vim guide
{
foreword by the author,6
{
poem:,
first, my country, 15
this is my own, my dear, and native land,16
god made me free, 21
to the stars and stripes,33
Renumber chapter page numbers in text file with math operations. How to do it with sed and awk
Sometimes I'll type a space after the , so the first substitution removes a space between a comma and a number. The second substitution, separated by a ; then swaps the comma in front of the number for a %
sed -i -r 's/(, )([0-9])/,\2/; s/,([0-9])/%\1/' toc.txt
-i edits the file in place
s/.../.../ substitutions look for a , followed by a digit. Some chapter titles have a , in them and those are ignored
-r extended regex, so capturing groups don't need escaping
\2 is the 2nd capture group: the replacement puts a , then that single digit, which removes the space
\1 is the first capture group ([0-9]), which is just a single digit, re-inserted after the % without touching other commas in chapter names
changes to:
{
foreword by the author%6
{
poem:,
first, my country%15
this is my own, my dear, and native land%16
god made me free%21
to the stars and stripes%33
Now subtract 3 from the chapter page numbers
awk -F% '{if (/%/) {print $1 "," $2-3} else {print $0}}' toc.txt > renumbered.txt
-F% sets the field separator to %. A , would work, but chapter titles sometimes contain a , which screws up the fields, so it needs to be something else like %
if (/%/) if there's a match for % on the line, then
print $1 prints field 1, which is the text up to the %
"," prints a comma between $1 and $2 so it's ready for titlecase
$2-3 prints the number after the % minus 3. You can also do + addition, * multiplication, and / division
else print $0 prints the entire line if there's no match
changes to:
{
foreword by the author,3
{
poem:,
first, my country,12
this is my own, my dear, and native land,13
god made me free,18
to the stars and stripes,30
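The two steps above could also be wrapped in a small function so the offset becomes an argument instead of being edited into the awk program. This is only a sketch: renumber_toc and toc_demo.txt are hypothetical names, and it writes to standard output rather than editing the file in place. It uses sed -E, which is the same extended-regex mode as -r.

```shell
# renumber_toc is a hypothetical helper name, not part of booky or titlecase.
# Usage: renumber_toc OFFSET FILE   (OFFSET may be negative, e.g. -3)
renumber_toc() {
    offset="$1"
    file="$2"
    # Same two substitutions as above, but the result goes to stdout
    # instead of being written back into the file.
    sed -E 's/(, )([0-9])/,\2/; s/,([0-9])/%\1/' "$file" |
        awk -F% -v off="$offset" '{ if (/%/) { print $1 "," $2 + off } else { print $0 } }'
}

# Small demo file with a few of the lines from toc.txt
printf '%s\n' '{' 'foreword by the author,6' 'poem:,' 'first, my country, 15' > toc_demo.txt

renumber_toc -3 toc_demo.txt
```

With an offset of -3 the demo lines come out as foreword by the author,3 and first, my country,12, while the { and poem:, lines pass through unchanged.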
Now it's ready for title case
titlecase -f renumbered.txt -o chapters.txt
changes to:
{
Foreword by the Author,3
{
Poem:,
First, My Country,12
This Is My Own, My Dear, and Native Land,13
God Made Me Free,18
To the Stars and Stripes,30
Combine all three commands on one line
sed -i -r 's/(, )([0-9])/,\2/; s/,([0-9])/%\1/' toc.txt && awk -F% '{if (/%/) {print $1 "," $2-3} else {print $0}}' toc.txt > renumbered.txt && titlecase -f renumbered.txt -o chapters.txt
then booky to add chapters to the pdf
booky.sh SomeBook.pdf chapters.txt
If you want to offset the page numbers by +5 just change $2-3 to $2+5
sed -i -r 's/(, )([0-9])/,\2/; s/,([0-9])/%\1/' toc.txt && awk -F% '{if (/%/) {print $1 "," $2+5} else {print $0}}' toc.txt > renumbered.txt && titlecase -f renumbered.txt -o chapters.txt
changes to:
{
Foreword by the Author,11
{
Poem:,
First, My Country,20
This Is My Own, My Dear, and Native Land,21
God Made Me Free,26
To the Stars and Stripes,38
Stumbled upon Sidebery and I prefer this one over Tree Style Tab. Sidebery has a bookmark side panel too. I'm aware Tree Style Tab has a bookmark extension.
Can create more panels for your tabs. Has a modern interface / feeling to it. One of those extensions I can live without but will try it out. One huge thing for me is that hovering over Sidebery bookmarks also displays the full URL, unlike with regular tabs.
addons.mozilla.org
Sidebery – Get this Extension for 🦊 Firefox (en-US)
Download Sidebery for Firefox. Vertical tabs tree and bookmarks in sidebar with advanced containers configuration, grouping and many other features.
Sidebery: the only settings I changed to get it to my liking were
Settings | Appearance | Color Scheme: dark
Settings | Styles editor | Tabs
  Background color on hover (brown) #63452cff
  Active tab background color (purple) #613583ff
Settings | Styles editor | Bookmarks
  Bookmark background color on hover (brown) #63452cff
  Bookmark background color on click (purple) #613583ff
  Closed folder color (blue) #62a0eaff
  Expanded folder color (orange) #ff7800ff (not working now..bug)
Modifying opus chaptered audiobooks
linux (some are multiplatform) apps that can
add or remove cover image:
  tageditor
  puddletag
  kid3
Modify existing opus chapter names from opus audiobook:
  MusicBrainz Picard
  Ex Falso
  puddletag (Edit | Extended tags)
  kid3
add metadata tag from filename (so one can name chapter names from filenames before adding to freac):
  puddletag
    functions choose Filename to Tag
    Pattern: %title%
  kid3
    Format: (up arrow) %f
    Format: (down arrow) %{title} press Tag 2
  Ex Falso
    Tags from Path put <title> also can save it as a pattern named whatever
  MusicBrainz Picard
    Options | Options | User Interface
    add Parse File Names
    then just select all Ctrl-A and click Parse File Names
Since puddletag and kid3 can do all three I'll quickly show them
Smart Title Case Converter. This one is the best online one and does it great. Doesn't matter all that much if you use AP, Bluebook, Chicago...just avoid NYTimes as that had weird exceptions.
For an explanation of which words NOT to capitalize see https://titlecaseconverter.com/words-to-capitalize/
So if I want to rename mp3 files with Smart Title case before adding to freac to make an opus chaptered audiobook it's not so easy. Sure any filemanager can Title Case them but not Smart Title case them. Ex Falso has a plugin for Human Title Case but it only works on one file at a time. So here's the bash script I wrote to do that. The end part of renaming old files to new I found online.
Titlecaseconverter
Title Case Converter – A Smart Title Capitalization Tool
Automatically convert text to title case (AMA, AP, APA, MLA, Chicago), sentence case (with proper noun handling), uppercase, lowercase. Try it out!
try this in a test directory
touch '1 this is to test out chapters.mp3' '2 lord of the rings.mp3' '3 the sun rises in the west.mp3' '44 of all the various species.mp3' '45 never again but tomorrow.mp3' '101 should work till 999 king louis xiv here.mp3' '241 but will not work with period between numbers.mp3'
I have these mp3 files that look like so:
1 this is to test out chapters.mp3
2 lord of the rings.mp3
3 the sun rises in the west.mp3
44 of all the various species.mp3
45 never again but tomorrow.mp3
101 should work till 999 king louis xiv here.mp3
241 but will not work with period between numbers.mp3
And the goal is to get them to be like so:
1: This Is to Test Out Chapters.mp3
2: Lord of the Rings.mp3
3: The Sun Rises in the West.mp3
44: Of All the Various Species.mp3
45: Never Again but Tomorrow.mp3
101: Should Work Till 999 King Louis XIV Here.mp3
241: But Will Not Work With Period Between Numbers.mp3
1.2 some chapter
1.3 some chapter
these won't work as it'll put 1:.2 and 1:.3
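The numbering step can be sketched on its own. This is not the chaptersrename.sh script itself, only a guess at how the colon could be inserted: add_colon is a hypothetical helper, and the title casing is left out. It reproduces the 1:.2 behavior described above because it only looks at the leading run of digits.

```shell
# add_colon is a hypothetical helper: put a ":" right after the leading number.
# It does no title casing; that would be a separate step with titlecase.
add_colon() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]+)/\1:/'
}

add_colon '44 of all the various species.mp3'
add_colon '1.2 some chapter.mp3'

# Dry run over the mp3 files in the current directory:
# echo the mv command instead of running it.
for f in *.mp3; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    echo mv -- "$f" "$(add_colon "$f")"
done
```

Run it in the test directory first. Only the echo is executed, so no files are renamed until echo is removed.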
You must add a : or a - after the number, otherwise titlecase won't capitalize the word after the number. Without it,
45 the flowers
becomes
45 the Flowers
instead of
45: the flowers
becoming
45: The Flowers
It will make xiv capital, but not if it's at the end of the title. I'll share the titlecase.txt file I keep in my ~/ directory for the titlecase script.
Just name this script chaptersrename.sh
chmod a+x chaptersrename.sh
sudo cp chaptersrename.sh /usr/bin
Also put echo before mv, like echo mv, if you just wish to preview the changes. You do need to replace mp3 in three places, or better, just rename your file extensions from, say, m4a to mp3 temporarily.
titlecase.txt
This is the ~/.titlecase.txt file I use for exceptions for the titlecase script. It has a huge list of roman numerals: I II I. II. I, II,
remember though if at end of chapter it won't convert it
1: the life of louis xiv
1: The Life of Louis Xiv
5: the life of Louis xiv and other stuff
5: The Life of Louis XIV and Other Stuff
a small preview of text
i.e.
e.g.
etc.
#Roman Numerals 1-99 and . + , after
I
II
III
IV
V
VI
VII
VIII
IX
X
XI
XII
XIII
XIV
XV