Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

🤖🧠 LandingAI ADE Python SDK: Streamlining AI-Powered Document Understanding

🗓️ 22 Oct 2025
📚 AI News & Trends

In the age of AI automation, extracting structured data from documents has become a key part of many business workflows. From invoices and contracts to identity documents and research papers, organizations are relying on AI models to interpret and process information accurately. LandingAI’s ADE Python SDK – an official API client for the LandingAI ADE ...

#AIPowered #DocumentUnderstanding #LandingAI #ADEPythonSDK #AIAutomation #DataExtraction
✨ olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

📝 Summary:
olmOCR is an open-source toolkit that uses a fine-tuned vision language model to convert PDFs into clean, structured text. It enables large-scale, cost-effective extraction of trillions of tokens for training language models.

🔹 Publication Date: Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• GitHub: https://github.com/allenai/olmocr

✨ Datasets citing this paper:
• https://huggingface.co/datasets/davanstrien/test-olmocr2
• https://huggingface.co/datasets/davanstrien/newspapers-olmocr2
• https://huggingface.co/datasets/stckmn/ocr-output-Directive017-1761355297

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#OCR #VLMs #LLM #DataExtraction #OpenSource