Convert PDF to structured JSON — in a couple of lines and without hassle! 📄✨
Today, we'll create a mini-service that takes a PDF document, extracts its text, and asks GPT to organize the content into structured fields: title, author, date, and a list of sections. 🚀
First, let's import the necessary libraries and create the API client:
import os
from PyPDF2 import PdfReader
from openai import OpenAI
# The API key is read from the OPENAI_API_KEY environment variable
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Now, let's extract the text from the PDF. We'll loop through all the pages and combine them into a single string:
reader = PdfReader("document.pdf")
# extract_text() can yield nothing for a page without a text layer, so fall back to ""
text = "\n".join(page.extract_text() or "" for page in reader.pages)
Next, we'll send the extracted text to GPT. We'll ask the model to return structured JSON with the necessary fields:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": (
            "You are a PDF parser. Return a JSON with the fields: title, author, date, sections. "
            "Each section is an object with name and summary."
        )},
        {"role": "user", "content": text},
    ],
)
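Long PDFs can exceed the model's context window, so it may help to cap the text before the request above is sent. A minimal sketch, assuming a hypothetical helper `clip_text` and an arbitrary character budget (neither comes from the original post; the real limit depends on the model and is counted in tokens, not characters):

```python
def clip_text(text: str, max_chars: int = 100_000) -> str:
    """Return text unchanged if it fits, else cut it at the last line break within max_chars."""
    if len(text) <= max_chars:
        return text
    # Look for the last newline inside the budget so a line is not split in half
    cut = text.rfind("\n", 0, max_chars)
    # Fall back to a hard cut when no line break is available
    return text[: cut if cut > 0 else max_chars]
```

With this in place, the user message could carry `clip_text(text)` instead of `text`. Cutting the tail loses content, so for very long documents splitting into several requests may be a better fit.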
Output the result:
structured = response.choices[0].message.content.strip()
print(structured)
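The reply is still a plain string at this point. To use it as data, it can be parsed and checked for the requested fields. A minimal sketch, assuming a hypothetical helper `parse_reply` (not part of the original post) and assuming the model may occasionally wrap its JSON in a Markdown code fence:

```python
import json

# Field names taken from the system prompt above
REQUIRED_FIELDS = ("title", "author", "date", "sections")
FENCE = "`" * 3  # Markdown code fence marker


def parse_reply(reply: str) -> dict:
    """Parse the model's reply into a dict and verify the expected top-level fields."""
    cleaned = reply.strip()
    # Drop a surrounding Markdown fence (optionally labelled "json") if present
    if cleaned.startswith(FENCE):
        cleaned = cleaned.strip("`").strip()
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    data = json.loads(cleaned)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

For example, `data = parse_reply(structured)` would give a dictionary whose `data["sections"]` can then be iterated over.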
🔥 Suitable for contracts, reports, methodologies, and other text-based PDFs: the output is JSON ready for use.
#PDF #JSON #Python #GPT #Automation #DataScience
✨ Join Best TG Channels https://t.iss.one/addlist/0f6vfFbEMdAwODBk
⭐️ Join Our WhatsApp Channel https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
🚀 Level up your AI & Data Science skills with HelloEncyclo — a growing all-in-one platform featuring hands-on courses in LLMs, Deep Learning, MLOps, Data Engineering, and more.
✅ 13 courses live + 40+ coming soon
🎯 One access, lifetime updates
🔑 Use code: PRESALE-BOOK-WAVE-2GFG
👉 https://helloencyclo.com/?ref=HUSSEINSHEIKHO