ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
🤖🧠 PaddleOCR-VL: Redefining Multilingual Document Parsing with a 0.9B Vision-Language Model

🗓️ 20 Oct 2025
📚 AI News & Trends

In an era where information is predominantly digital, the ability to extract, interpret, and organize data from documents is crucial. From invoices and research papers to multilingual contracts and handwritten notes, document parsing stands at the intersection of vision and language. Traditional Optical Character Recognition (OCR) systems have made impressive strides, but they often fall ...

#PaddleOCR-VL #Multilingual #DocumentParsing #VisionLanguageModel #OCR #AI
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

📝 Summary:
PaddleOCR-VL is a new 0.9B vision-language model for document parsing. It uses a NaViT-style visual encoder and ERNIE-4.5, achieving state-of-the-art performance across 109 languages with minimal resources and fast inference. This model is highly suitable for practical deployment.
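
🔹 Minimal inference sketch: the snippet below loads the Hub checkpoint through a generic transformers-style interface. The class choices, prompt string, and trust_remote_code path are assumptions rather than the documented PaddleOCR-VL API; consult the model card before relying on it.

```python
# Hedged sketch: load PaddleOCR-VL from the Hub via trust_remote_code.
# Class choices and the "OCR:" prompt are assumptions; see the model card.
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

model_id = "PaddlePaddle/PaddleOCR-VL"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("invoice.png").convert("RGB")  # any document image
inputs = processor(text="OCR:", images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```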

🔹 Publication Date: Published on Oct 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.14528
• PDF: https://arxiv.org/pdf/2510.14528
• GitHub: https://github.com/PaddlePaddle/PaddleOCR

🔹 Models citing this paper:
https://huggingface.co/PaddlePaddle/PaddleOCR-VL
https://huggingface.co/PaddlePaddle/PP-DocLayoutV2
https://huggingface.co/lvyufeng/PaddleOCR-VL-0.9B

🔹 Spaces citing this paper:
https://huggingface.co/spaces/PaddlePaddle/PaddleOCR-VL_Online_Demo
https://huggingface.co/spaces/markobinario/PaddleOCR-VL_Online_Demo
https://huggingface.co/spaces/waytoAGI/PaddleOCR-VL_Online_Demo

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#OCR #VisionLanguageModel #DocumentAI #DeepLearning #AI
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

📝 Summary:
MinerU2.5 is a new 1.2B-parameter VLM for document parsing. It uses a coarse-to-fine, two-stage strategy: global layout analysis on downsampled images, then targeted content recognition on native-resolution crops. This achieves state-of-the-art accuracy efficiently for high-resolution documents.
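
🔹 A schematic of the coarse-to-fine pipeline described above. vlm_layout and vlm_recognize are hypothetical stand-ins for the model's layout-analysis and recognition passes, not MinerU's real API:

```python
# Conceptual two-stage parse: cheap global layout on a small thumbnail,
# then precise recognition on full-resolution crops of each region.
from PIL import Image

def parse_document(page: Image.Image, max_side: int = 1024) -> list[dict]:
    # Stage 1: global layout analysis on a downsampled page.
    scale = min(1.0, max_side / max(page.size))
    thumb = page.resize((int(page.width * scale), int(page.height * scale)))
    regions = vlm_layout(thumb)  # hypothetical: [{"bbox": ..., "type": ...}]

    # Stage 2: content recognition on native-resolution crops.
    results = []
    for region in regions:
        x0, y0, x1, y1 = (int(c / scale) for c in region["bbox"])
        crop = page.crop((x0, y0, x1, y1))  # bbox mapped back to native res
        results.append({"type": region["type"], "content": vlm_recognize(crop)})
    return results
```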

🔹 Publication Date: Published on Sep 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22186
• PDF: https://arxiv.org/pdf/2509.22186
• Project Page: https://opendatalab.github.io/MinerU/
• GitHub: https://github.com/opendatalab/MinerU

🔹 Models citing this paper:
https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
https://huggingface.co/freakynit/MinerU2.5-2509-1.2B
https://huggingface.co/Mungert/MinerU2.5-2509-1.2B-GGUF

🔹 Spaces citing this paper:
https://huggingface.co/spaces/opendatalab/MinerU
https://huggingface.co/spaces/xiaoye-winters/MinerU-API
https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModel #DocumentAI #DeepLearning #ComputerVision #AIResearch
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

📝 Summary:
Lumine introduces an open recipe for building generalist agents in 3D open worlds. This vision-language-model-based agent operates directly on pixels to perform complex, hours-long missions with human-level efficiency and demonstrates strong zero-shot generalization across diverse games such as Genshin Impact and Honkai: Star Rail.
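
🔹 The pixels-in, actions-out loop implied by the summary looks roughly like this; capture_screen, vlm_policy, and execute are hypothetical placeholders, not Lumine's published interfaces:

```python
# Sketch of a generalist-agent control loop over raw screen pixels.
def run_mission(instruction: str, max_steps: int = 10_000) -> None:
    history = []  # running action context fed back to the VLM
    for _ in range(max_steps):
        frame = capture_screen()                          # raw pixels
        action = vlm_policy(instruction, frame, history)  # e.g. key/mouse op
        if action == "DONE":                              # mission finished
            break
        execute(action)
        history.append(action)
```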

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08892
• PDF: https://arxiv.org/pdf/2511.08892
• Project Page: https://www.lumine-ai.org/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GeneralistAI #VisionLanguageModel #3DWorlds #AIagents #GamingAI
Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset

📝 Summary:
Researchers introduce Instruction-Guided Lesion Segmentation (ILS) for chest X-rays (CXRs), allowing diverse lesion segmentation from simple instructions. They developed MIMIC-ILS, a large-scale automatically generated dataset, and ROSALIA, a vision-language model. ROSALIA accurately segments various lesions and provides textual explanations.
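
🔹 The interface the summary implies, a free-text instruction plus a CXR in, a mask plus rationale out, might look like this; the rosalia handle and its segment signature are hypothetical:

```python
# Illustrative call shape for instruction-guided lesion segmentation.
import numpy as np
from PIL import Image

cxr = np.asarray(Image.open("chest_xray.png").convert("L"))
mask, explanation = rosalia.segment(          # hypothetical handle/method
    image=cxr,
    instruction="Segment the pleural effusion in the left lung.",
)
print(explanation)             # model's textual rationale
print(mask.shape, mask.dtype)  # binary lesion mask, e.g. (1024, 1024) bool
```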

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15186
• PDF: https://arxiv.org/pdf/2511.15186

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MedicalAI #LesionSegmentation #ChestXray #VisionLanguageModel #DeepLearning
Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding

📝 Summary:
Hulu-Med is a transparent medical vision-language model unifying diverse data modalities like text, 2D/3D images, and video. It achieves state-of-the-art performance across 30 clinical benchmarks with efficient training, promoting accessible AI.
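
🔹 A unified multimodal request, mirroring the text/2D/3D/video coverage claimed above, could be expressed as a single chat message; this schema is patterned on common VLM chat APIs, not Hulu-Med's documented format:

```python
# Hypothetical unified request mixing modalities in one conversation turn.
conversation = [
    {"role": "user", "content": [
        {"type": "text", "text": "Describe any abnormality and its location."},
        {"type": "image", "path": "ct_slice.png"},    # 2D modality
        {"type": "video", "path": "ultrasound.mp4"},  # video modality
        # A 3D volume could be passed as an ordered stack of slices.
    ]},
]
```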

🔹 Publication Date: Published on Oct 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.08668
• PDF: https://arxiv.org/pdf/2510.08668
• GitHub: https://github.com/ZJUI-AI4H/Hulu-Med

🔹 Models citing this paper:
https://huggingface.co/ZJU-AI4H/Hulu-Med-32B
https://huggingface.co/ZJU-AI4H/Hulu-Med-7B
https://huggingface.co/ZJU-AI4H/Hulu-Med-14B

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MedicalAI #VisionLanguageModel #MultimodalAI #HealthcareAI #AIResearch
HunyuanOCR Technical Report

📝 Summary:
HunyuanOCR is a lightweight vision-language model for OCR with a unified end-to-end architecture (ViT + LLM). It achieves state-of-the-art performance across diverse tasks, outperforming larger models and commercial APIs, powered by data-driven and RL strategies.
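
🔹 The unified ViT + LLM design reduces to the wiring below: patch features from a vision encoder are projected into the LLM's embedding space and decoded as text. Module names and the embed hook are illustrative, not HunyuanOCR's code:

```python
# Toy wiring of an end-to-end ViT + LLM OCR model.
import torch
import torch.nn as nn

class TinyOCRVLM(nn.Module):
    def __init__(self, vit: nn.Module, projector: nn.Module, llm: nn.Module):
        super().__init__()
        self.vit = vit              # image patches -> visual features
        self.projector = projector  # visual features -> LLM embedding space
        self.llm = llm              # decodes text from mixed embeddings

    def forward(self, pixels: torch.Tensor, prompt_ids: torch.Tensor):
        vis = self.projector(self.vit(pixels))  # (B, N, d_llm)
        txt = self.llm.embed(prompt_ids)        # (B, T, d_llm); embed() assumed
        return self.llm(torch.cat([vis, txt], dim=1))
```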

🔹 Publication Date: Published on Nov 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19575
• PDF: https://arxiv.org/pdf/2511.19575
• GitHub: https://github.com/Tencent-Hunyuan/HunyuanOCR

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#OCR #VisionLanguageModel #LLM #AI #MachineLearning
Qwen3-VL Technical Report

📝 Summary:
Qwen3-VL is a highly capable vision-language model, achieving superior performance across multimodal benchmarks. It supports 256K interleaved contexts and offers strong text understanding, robust long-context comprehension, and advanced multimodal reasoning through key architectural upgrades.
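
🔹 A hedged inference sketch through the transformers image-text-to-text interface; the checkpoint name is a guess and class availability depends on your transformers version, so verify against the official model card:

```python
# Assumed transformers-style usage; confirm model id and classes first.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"  # hypothetical checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/chart.png"},
    {"type": "text", "text": "Summarize this chart."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
print(processor.decode(model.generate(**inputs, max_new_tokens=128)[0]))
```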

🔹 Publication Date: Published on Nov 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21631
• PDF: https://arxiv.org/pdf/2511.21631

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModel #MultimodalAI #AI #DeepLearning #LLM
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household environments.
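
🔹 A toy state-aware scene graph of the kind the summary names: nodes carry mutable object states and edges encode relations a planner can query. The schema is illustrative, not MomaGraph's:

```python
# Minimal state-aware scene graph for embodied task planning.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    state: dict = field(default_factory=dict)  # e.g. {"open": False}

@dataclass
class Edge:
    src: str
    rel: str  # e.g. "inside", "on_top_of"
    dst: str

graph = {
    "nodes": [Node("fridge", {"open": False}), Node("milk")],
    "edges": [Edge("milk", "inside", "fridge")],
}
# A planner can read off a precondition: to grasp "milk",
# first set fridge.state["open"] = True.
```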

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics