✨ Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
📝 Summary:
Qianfan-OCR is a 4B-parameter vision-language model that unifies document parsing, layout analysis, and document understanding. It features Layout-as-Thought to improve accuracy on complex layouts and achieves state-of-the-art performance across multiple OCR and document intelligence benchmarks.
🔹 Publication Date: Mar 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13398
• PDF: https://arxiv.org/pdf/2603.13398
• Project Page: https://github.com/baidubce/Qianfan-VL
• GitHub: https://github.com/baidubce/Qianfan-VL
🔹 Models citing this paper:
• https://huggingface.co/baidu/Qianfan-OCR
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#OCR #DocumentIntelligence #VisionLanguageModel #AI #MachineLearning