Machine Learning with Python

# 📚 Python Tutorial: Convert EPUB to PDF (Preserving Images)
#Python #EPUB #PDF #EbookConversion #Automation

This guide shows how to convert EPUB files, including those with images, to PDF using Python.

---

## 🔹 Required Tools & Libraries
We'll use these Python packages:
- ebooklib - For EPUB parsing
- beautifulsoup4 - For parsing and cleaning the extracted HTML
- pdfkit (wrapper for wkhtmltopdf) - For PDF generation
- Pillow - For image validation (optional)

pip install ebooklib beautifulsoup4 pdfkit pillow

Also install system dependencies:

# On Ubuntu/Debian
sudo apt-get install wkhtmltopdf

# On macOS
brew install wkhtmltopdf

# On Windows (download from wkhtmltopdf.org)
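
Before running the converter it can help to confirm that the wkhtmltopdf binary is actually reachable. The following is a minimal sketch using only the standard library; the helper name `find_wkhtmltopdf` is made up for this example and is not part of pdfkit.

```python
import shutil


def find_wkhtmltopdf(binary_name="wkhtmltopdf"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_wkhtmltopdf()
    if path is None:
        print("wkhtmltopdf not found; install it or add it to PATH")
    else:
        print(f"wkhtmltopdf found at {path}")
```

If the binary lives outside PATH (common on Windows), pdfkit accepts an explicit location via `pdfkit.configuration(wkhtmltopdf=path)`, passed to `from_file` as `configuration=...`.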

---

## 🔹 Step 1: Extract EPUB Contents
First, we'll unpack the EPUB file to access its HTML and images.

import os

import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

def extract_epub(epub_path, output_dir):
    book = epub.read_epub(epub_path)

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)

    html_files = []
    # Extract chapters and images (the ITEM_* constants live in the ebooklib package)
    for item in book.get_items():
        if item.get_type() not in (ebooklib.ITEM_IMAGE, ebooklib.ITEM_DOCUMENT):
            continue
        # Item names may include subfolders (e.g. "images/cover.jpg")
        target = os.path.join(output_dir, item.get_name())
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, 'wb') as f:
            f.write(item.get_content())
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            html_files.append(item.get_name())

    # Relative paths of the HTML chapters, in the order ebooklib lists them
    return html_files
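
Because item names come from inside the EPUB archive, a malformed or hostile file could contain names such as `../evil.txt`. Here is a small sketch of a path check that could be applied before writing each item; `safe_join` is a hypothetical helper, not something ebooklib provides.

```python
import os


def safe_join(output_dir, item_name):
    """Join item_name onto output_dir, refusing paths that escape output_dir."""
    base = os.path.abspath(output_dir)
    target = os.path.abspath(os.path.join(base, item_name))
    # commonpath equals base only when target is inside base (or is base itself)
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"Unsafe item name: {item_name!r}")
    return target
```

In the extraction loop, `safe_join(output_dir, item.get_name())` would replace the plain `os.path.join` call.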

---

## 🔹 Step 2: Convert HTML to PDF
Now we'll convert the extracted HTML files to PDF while preserving images.

import pdfkit
from PIL import Image  # For image validation (optional)

def html_to_pdf(html_files, output_pdf, base_dir):
    options = {
        'encoding': "UTF-8",
        'quiet': '',
        'enable-local-file-access': '',  # Critical for local images
        'no-outline': None,
        'margin-top': '15mm',
        'margin-right': '15mm',
        'margin-bottom': '15mm',
        'margin-left': '15mm',
    }
    
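    # Validate images (optional) and drop the ones that cannot be opened
    for html_file in html_files:
        html_path = os.path.join(base_dir, html_file)
        with open(html_path, encoding='utf-8') as f:
            soup = BeautifulSoup(f, 'html.parser')
        changed = False
        for img in soup.find_all('img', src=True):
            # src is relative to the HTML file, not to base_dir
            img_path = os.path.join(os.path.dirname(html_path), img['src'])
            try:
                with Image.open(img_path) as im:
                    im.verify()  # Raises if the file is not a readable image
            except Exception as e:
                print(f"Image error in {html_file}: {e}")
                img.decompose()  # Remove broken images
                changed = True
        if changed:
            # Write the cleaned HTML back so wkhtmltopdf sees the change
            with open(html_path, 'w', encoding='utf-8') as f:
                f.write(str(soup))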
    
    # Convert to PDF
    pdfkit.from_file(
        [os.path.join(base_dir, f) for f in html_files],
        output_pdf,
        options=options
    )
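
The options dictionary maps wkhtmltopdf command-line flags (without the leading `--`) to values, with an empty string for flags that take no value. As an illustration, the sketch below builds it from a few parameters; `build_pdf_options` and its arguments are assumptions made for this example, and `page-size` is a wkhtmltopdf flag the original options do not set.

```python
def build_pdf_options(margin_mm=15, page_size="A4", quiet=True):
    """Build a pdfkit options dict; keys are wkhtmltopdf flags without '--'."""
    margin = f"{margin_mm}mm"
    options = {
        "encoding": "UTF-8",
        "enable-local-file-access": "",  # flag without a value
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
    }
    if quiet:
        options["quiet"] = ""  # suppress wkhtmltopdf's console output
    return options


if __name__ == "__main__":
    print(build_pdf_options(margin_mm=20, page_size="Letter"))
```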

---

## 🔹 Step 3: Complete Conversion Function
Combine everything into a single workflow.

def epub_to_pdf(epub_path, output_pdf, temp_dir="temp_epub"):
    try:
        print(f"Converting {epub_path} to PDF...")
        
        # Step 1: Extract EPUB
        print("Extracting EPUB contents...")
        html_files = extract_epub(epub_path, temp_dir)
        
        # Step 2: Convert to PDF
        print("Generating PDF...")
        html_to_pdf(html_files, output_pdf, temp_dir)
        
        print(f"Success! PDF saved to {output_pdf}")
        return True
    
    except Exception as e:
        print(f"Conversion failed: {str(e)}")
        return False
    finally:
        # Clean up temporary files
        if os.path.exists(temp_dir):
            import shutil
            shutil.rmtree(temp_dir)
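
A fixed folder name like `temp_epub` can collide with an existing folder or with a second conversion running at the same time. One possible alternative is `tempfile.TemporaryDirectory`, sketched below. The `extract` and `render` parameters are placeholders standing in for `extract_epub` and `html_to_pdf`; they are passed in so the sketch stays self-contained.

```python
import tempfile


def convert_in_temp_dir(epub_path, output_pdf, extract, render):
    """Run extract/render inside a fresh temp directory that is removed afterwards.

    extract(epub_path, temp_dir) -> list of HTML file names
    render(html_files, output_pdf, temp_dir) -> None
    """
    with tempfile.TemporaryDirectory(prefix="epub_") as temp_dir:
        html_files = extract(epub_path, temp_dir)
        render(html_files, output_pdf, temp_dir)
    # The directory is deleted on leaving the with-block, even after an exception
    return output_pdf
```

Called as `convert_in_temp_dir("input.epub", "output.pdf", extract_epub, html_to_pdf)`, this removes the need for the manual `finally` cleanup.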

---

## 🔹 Advanced Options
### 1. Custom Styling
Add CSS to improve PDF appearance:

def html_to_pdf(html_files, output_pdf, base_dir):
    # Write the stylesheet first so wkhtmltopdf can find it
    css = """
    body { font-family: "Times New Roman", serif; font-size: 12pt; }
    img { max-width: 100%; height: auto; }
    """
    css_path = os.path.abspath(os.path.join(base_dir, 'styles.css'))
    with open(css_path, 'w', encoding='utf-8') as f:
        f.write(css)

    options = {
        # ... previous options ...
        'user-style-sheet': css_path,  # Custom CSS (absolute path)
    }

    pdfkit.from_file(
        [os.path.join(base_dir, f) for f in html_files],
        output_pdf,
        options=options
    )

### 2. Handling Complex EPUBs
For problematic EPUBs, try this pre-processing:

def clean_html(html_file):
    with open(html_file, 'r+', encoding='utf-8') as f:
        content = f.read()
        soup = BeautifulSoup(content, 'html.parser')
        
        # Remove problematic elements
        for element in soup(['script', 'iframe', 'object']):
            element.decompose()
            
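        # Fix image paths (leave remote URLs and data: URIs untouched)
        for img in soup.find_all('img', src=True):
            src = img['src']
            if not src.startswith(('http://', 'https://', 'data:')) and not os.path.isabs(src):
                img['src'] = os.path.abspath(os.path.join(os.path.dirname(html_file), src))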
        
        # Write back cleaned HTML
        f.seek(0)
        f.write(str(soup))
        f.truncate()
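
Image `src` values in EPUBs are URL references, so they may be percent-encoded (`my%20pic.png`) or carry a fragment. The sketch below shows one way to turn such a reference into a filesystem path using only the standard library; `local_image_path` is a hypothetical helper, and returning `None` for non-local sources is a choice made for this example.

```python
import os
from urllib.parse import urlsplit, unquote


def local_image_path(html_file, src):
    """Return the absolute path for an <img src>, or None for non-local sources."""
    parts = urlsplit(src)
    if parts.scheme or parts.netloc:
        return None  # http:, https:, data:, protocol-relative URLs, ...
    # Drop any query/fragment and decode percent-escapes such as %20
    relative = unquote(parts.path)
    if not relative:
        return None
    return os.path.abspath(os.path.join(os.path.dirname(html_file), relative))
```

Inside `clean_html`, the result could be assigned back to `img['src']` whenever it is not `None`.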

---

## 🔹 Full Usage Example

if __name__ == "__main__":
    import argparse
    import sys

    parser = argparse.ArgumentParser(description='Convert EPUB to PDF')
    parser.add_argument('epub_file', help='Input EPUB file path')
    parser.add_argument('pdf_file', help='Output PDF file path')
    args = parser.parse_args()

    success = epub_to_pdf(args.epub_file, args.pdf_file)
    # sys.exit is always available; the bare exit() helper is not guaranteed
    sys.exit(0 if success else 1)

Run from command line:

python epub_to_pdf.py input.epub output.pdf

---

## 🔹 Troubleshooting Common Issues
| Problem | Solution |
|---------|----------|
| Missing images | Ensure enable-local-file-access is set |
| Broken CSS paths | Use absolute paths in CSS references |
| Encoding issues | Specify UTF-8 in both HTML and pdfkit options |
| Large file sizes | Optimize images before conversion |
| Layout problems | Add CSS media queries for print |
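
For the layout row, here is a rough sketch of what print-specific CSS might look like, written out from Python so it can be passed to the `user-style-sheet` option. The rules, the file name `print.css`, and the helper `write_print_css` are illustrative assumptions. Note that wkhtmltopdf renders with the screen media type by default, so `@media print` rules only apply when the `print-media-type` flag is also set in the options.

```python
import os

# Example rules only; adjust selectors to the markup of the EPUB at hand
PRINT_CSS = """
@media print {
    h1 { page-break-before: always; }
    img, table, pre { page-break-inside: avoid; }
    a { color: inherit; text-decoration: none; }
}
"""


def write_print_css(directory, filename="print.css"):
    """Write PRINT_CSS into directory and return its absolute path."""
    path = os.path.abspath(os.path.join(directory, filename))
    with open(path, "w", encoding="utf-8") as f:
        f.write(PRINT_CSS)
    return path
```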

---

## 🔹 Alternative Libraries
If pdfkit doesn't meet your needs:

1. WeasyPrint (pure Python)

   pip install weasyprint

2. PyMuPDF (fitz)

   pip install pymupdf

3. Calibre's ebook-convert CLI

   ebook-convert input.epub output.pdf
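
The Calibre command above can also be driven from Python. This is a minimal sketch assuming `ebook-convert` is installed and on PATH; both function names are made up for this example.

```python
import shutil
import subprocess


def build_calibre_command(epub_path, pdf_path, executable="ebook-convert"):
    """Return the argument list for Calibre's converter."""
    return [executable, epub_path, pdf_path]


def convert_with_calibre(epub_path, pdf_path):
    """Run ebook-convert; True on success, False if it is missing or fails."""
    if shutil.which("ebook-convert") is None:
        print("ebook-convert not found on PATH")
        return False
    completed = subprocess.run(build_calibre_command(epub_path, pdf_path))
    return completed.returncode == 0
```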

---

## 🔹 Best Practices
1. Always clean temporary files after conversion
2. Validate input EPUBs before processing
3. Handle metadata (title, author, etc.)
4. Batch process multiple files with threading
5. Log conversion results for debugging
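
For point 2, a cheap structural check is possible without any third-party library, since an EPUB is a ZIP archive containing a `mimetype` entry and `META-INF/container.xml`. The sketch below is not a full validator (it does not parse the package document), and `looks_like_epub` is a name chosen for this example.

```python
import zipfile


def looks_like_epub(path):
    """Cheap check: ZIP archive with the EPUB mimetype and container file."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        if "mimetype" not in names or "META-INF/container.xml" not in names:
            return False
        mimetype = zf.read("mimetype").decode("ascii", errors="replace").strip()
    return mimetype == "application/epub+zip"
```

Calling this before `epub_to_pdf` lets the script reject obviously wrong inputs with a clear message instead of a parser traceback.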

---

### 📚 Final Notes
This solution preserves:
✔️ Embedded images (wkhtmltopdf may downscale or recompress them; see its image-dpi and image-quality options)
✔️ Chapter structure and formatting
✔️ Text encoding and special characters

For production use, consider adding:
- Progress tracking
- Parallel conversion of chapters
- EPUB metadata preservation
- Custom cover page support
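
As a starting point for progress tracking and parallelism, here is a sketch that converts several books at once with a thread pool and prints a running count. It parallelises across books rather than chapters, and `convert_many` is a hypothetical helper. The `convert` argument stands in for `epub_to_pdf`. One caveat is that `epub_to_pdf` uses a shared default `temp_dir`, so each parallel job would need its own temporary folder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def convert_many(jobs, convert, max_workers=4):
    """Convert (epub_path, pdf_path) pairs in parallel and report progress.

    convert(epub_path, pdf_path) -> bool
    Returns a dict mapping each epub_path to True/False.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert, src, dst): src for src, dst in jobs}
        for done, future in enumerate(as_completed(futures), start=1):
            src = futures[future]
            try:
                results[src] = bool(future.result())
            except Exception as exc:
                # A failed job should not stop the remaining conversions
                print(f"{src}: {exc}")
                results[src] = False
            print(f"[{done}/{len(futures)}] {src}")
    return results
```

Threads are sufficient here because the heavy work happens in the external wkhtmltopdf process, not in Python code.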

#PythonAutomation #EbookTools #PDFConversion 🚀

Try enhancing this script by:
1. Adding a progress bar
2. Preserving table of contents
3. Supporting custom cover pages
4. Creating a GUI version

https://t.iss.one/CodeProgrammer ❤️

Forwarded from Machine Learning with Python

This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

✅ https://t.iss.one/addlist/8_rRW2scgfRhOTc0

✅ https://t.iss.one/Codeprogrammer

30 NumPy MCQs with solutions

Are you ready??

Let's start: https://codeprogrammer.notion.site/30-NumPy-MCQs-with-solutions-23ccd3a4dba9803e8fafe39a110a3f9e?source=copy_link

📚 JaidedAI/EasyOCR — an open-source Python library for Optical Character Recognition (OCR) that's easy to use and supports over 80 languages out of the box.

### 🔍 Key Features:

🔸 Extracts text from images and scanned documents — including handwritten notes and unusual fonts
🔸 Supports a wide range of languages like English, Russian, Chinese, Arabic, and more
🔸 Built on PyTorch — uses modern deep learning models (not the old-school Tesseract)
🔸 Simple to integrate into your Python projects

### ✅ Example Usage:

import easyocr

reader = easyocr.Reader(['en', 'ru'])  # Choose supported languages
result = reader.readtext('image.png')
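
In its default detail mode, `readtext` returns a list of entries of the form (bounding box, text, confidence). Here is a small sketch of filtering such a list by confidence; the helper `confident_text`, the 0.5 threshold, and the sample data are made up for illustration and are not real OCR output.

```python
def confident_text(results, min_confidence=0.5):
    """Keep the text of detections whose confidence is at least min_confidence."""
    return [text for _bbox, text, confidence in results if confidence >= min_confidence]


# Hand-written sample in the shape readtext returns; not real OCR output
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Hello", 0.98),
    ([[10, 50], [120, 50], [120, 80], [10, 80]], "w0rld", 0.31),
]

print(confident_text(sample))  # prints ['Hello']
```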

### 📌 Ideal For:

✅ Text extraction from photos, scans, and documents
✅ Embedding OCR capabilities in apps (e.g. automated data entry)

🔗 GitHub: https://github.com/JaidedAI/EasyOCR

👉 Follow us for more: @DataScienceN

#Python #OCR #MachineLearning #ComputerVision #EasyOCR

Transformer Lesson - Part 1/7: Introduction and Architecture

Let's start:
https://hackmd.io/@husseinsheikho/transformers

🔥 Master Vision Transformers with 65+ MCQs! 🔥

Are you preparing for AI interviews or want to test your knowledge in Vision Transformers (ViT)?

🧠 Dive into 65+ curated Multiple Choice Questions covering the fundamentals, architecture, training, and applications of ViT — all with answers!

🌐 Explore Now: https://hackmd.io/@husseinsheikho/vit-mcq

🔹 Table of Contents
Basic Concepts (Q1–Q15)
Architecture & Components (Q16–Q30)
Attention & Transformers (Q31–Q45)
Training & Optimization (Q46–Q55)
Advanced & Real-World Applications (Q56–Q65)
Answer Key & Explanations

🧹 ObjectClear: an AI-powered tool for removing objects from images effortlessly.

⚙️ What It Can Do:

🖼️ Upload any image
🎯 Select the object you want to remove
🌟 The model automatically erases the object and intelligently reconstructs the background

⚡️ Under the Hood:

— Uses Segment Anything (SAM) by Meta for object segmentation
— Leverages Inpaint-Anything for realistic background generation
— Works in your browser with an intuitive Gradio UI

✔️ Fully open-source and can be run locally.

📎 GitHub: https://github.com/zjx0101/ObjectClear

🚀 Comprehensive Tutorial: Build a Folder Monitoring & Intruder Detection System in Python

In this comprehensive, step-by-step tutorial, you will learn how to build a real-time folder monitoring and intruder detection system using Python.

🔐 Your Goal:
Create a background program that:
- Monitors a specific folder on your computer.
- Instantly captures a photo using the webcam whenever someone opens that folder.
- Saves the photo with a timestamp in a secure folder.
- Runs automatically when Windows starts.
- Keeps running until you manually stop it (e.g., via Task Manager or a hotkey).

Read and get code: https://hackmd.io/@husseinsheikho/Build-a-Folder-Monitoring

🚀 Comprehensive Guide: How to Prepare for an Image Processing Job Interview – 500 Most Common Interview Questions

Let's start: https://hackmd.io/@husseinsheikho/IP

#ImageProcessing #ComputerVision #OpenCV #Python #InterviewPrep #DigitalImageProcessing #MachineLearning #AI #SignalProcessing #ComputerGraphics

A useful find on GitHub: CheatSheets-for-Developers

LINK: https://github.com/crescentpartha/CheatSheets-for-Developers

This is a huge collection of cheat sheets for a wide variety of technologies. It is conveniently structured, so you can quickly find the topic you need.

Save it and use it 🔥

👉 @DATASCIENCEN

Join our WhatsApp channel

There are dedicated resources only for WhatsApp users

https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A

🚀 Comprehensive Guide: How to Prepare for a Graph Neural Networks (GNN) Job Interview – 350 Most Common Interview Questions

Read: https://hackmd.io/@husseinsheikho/GNN-interview

#GNN #GraphNeuralNetworks #MachineLearning #DeepLearning #AI #DataScience #PyTorchGeometric #DGL #NodeClassification #LinkPrediction #GraphML

This repo is awesome. It features RAG, AI Agents, Multi-agent Teams, MCP, Voice Agents, and more.

✅ link: https://github.com/Shubhamsaboo/awesome-llm-apps

500 Essential Web Scraping Interview Questions

Start: https://hackmd.io/@husseinsheikho/WS-Interview

500 Essential Web Scraping Interview Questions with Answers - part 1 (1 to 238)

Link: https://hackmd.io/@husseinsheikho/WS-Ans238

500 Essential Web Scraping Interview Questions with Answers - part 2 (239 to 386)

Link: https://hackmd.io/@husseinsheikho/WS-Ans386

500 Essential Web Scraping Interview Questions with Answers - part 3 (387 to 500)

Link: https://hackmd.io/@husseinsheikho/WS-Ans500

https://t.iss.one/DataScienceQ

✅

This repository contains a collection of everything needed to work with libraries related to AI and LLM.

More than 120 libraries, sorted by stages of LLM development:

→ Training, fine-tuning, and evaluation of LLM models
→ Integration and deployment of applications with LLM and RAG
→ Fast and scalable model launching
→ Working with data: extraction, structuring, and synthetic generation
→ Creating autonomous agents based on LLM
→ Prompt optimization and ensuring safe use in production

🌟 link: https://github.com/Shubhamsaboo/awesome-llm-apps

👉 @codeprogrammer
