ML Research Hub
32.6K subscribers
3.38K photos
132 videos
23 files
3.61K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

📝 Summary:
olmOCR is an open-source toolkit that uses a fine-tuned vision language model to convert PDFs into clean, structured text. It enables large-scale, cost-effective extraction of trillions of tokens for training language models.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• Github: https://github.com/allenai/olmocr

Datasets citing this paper:
https://huggingface.co/datasets/davanstrien/test-olmocr2
https://huggingface.co/datasets/davanstrien/newspapers-olmocr2
https://huggingface.co/datasets/stckmn/ocr-output-Directive017-1761355297

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#OCR #VLMs #LLM #DataExtraction #OpenSource
MedRAX: Medical Reasoning Agent for Chest X-ray

📝 Summary:
MedRAX is a new AI agent that integrates CXR analysis tools and multimodal large language models. It answers complex medical queries without extra training, achieving state-of-the-art performance.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.02673
• PDF: https://arxiv.org/pdf/2502.02673
• Github: https://github.com/bowang-lab/medrax

Spaces citing this paper:
https://huggingface.co/spaces/asbamit/MedRAX-main

#AI #MedicalAI #LLM #Radiology #DeepLearning
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

📝 Summary:
Mem0 is a memory-centric architecture with graph-based memory that enhances long-term conversational coherence in LLMs by efficiently extracting and consolidating information. It outperforms existing memory systems in accuracy, achieving a 26% improvement over OpenAI, and significantly reduces comp...

🔹 Publication Date: Published on Apr 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.19413
• PDF: https://arxiv.org/pdf/2504.19413
• Github: https://github.com/mem0ai/mem0

#AI #LLM #AIAgents #LongTermMemory #GraphMemory
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

📝 Summary:
IndexTTS builds on XTTS and Tortoise, improving naturalness and zero-shot voice cloning. It features hybrid character-pinyin modeling for Chinese and optimized vector quantization, resulting in more controllable usage, faster inference, and better performance than other systems.

🔹 Publication Date: Published on Feb 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.05512
• PDF: https://arxiv.org/pdf/2502.05512
• Github: https://github.com/index-tts/index-tts

🔹 Models citing this paper:
https://huggingface.co/IndexTeam/IndexTTS-2
https://huggingface.co/IndexTeam/Index-TTS
https://huggingface.co/Toxzic/indextts-colab

Spaces citing this paper:
https://huggingface.co/spaces/IndexTeam/IndexTTS
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
https://huggingface.co/spaces/jairwaal/image

#TextToSpeech #ZeroShotLearning #VoiceCloning #AI #MachineLearning
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

📝 Summary:
MinerU2.5 is a new 1.2B-parameter VLM for document parsing. It uses a coarse-to-fine, two-stage strategy: global layout analysis on downsampled images, then targeted content recognition on native-resolution crops. This achieves state-of-the-art accuracy efficiently for high-resolution documents.
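
The coarse-to-fine idea above can be illustrated with a small coordinate-mapping sketch. This is not MinerU2.5's code: the `Box` type, the function names, and the `max_side` value are hypothetical, and the layout boxes are assumed to come from some stage-one model run on the downsampled page. The sketch only shows how boxes found on a thumbnail could be mapped back to native-resolution crop regions for stage two.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


def downsample_scale(width: int, height: int, max_side: int) -> float:
    """Scale factor (at most 1.0) that fits the longer page side within max_side."""
    return min(1.0, max_side / max(width, height))


def to_native(box: Box, scale: float, width: int, height: int) -> Box:
    """Map a box from thumbnail coordinates to native pixels, clamped to the page."""
    return Box(
        left=max(0, math.floor(box.left / scale)),
        top=max(0, math.floor(box.top / scale)),
        right=min(width, math.ceil(box.right / scale)),
        bottom=min(height, math.ceil(box.bottom / scale)),
    )


def plan_crops(layout_boxes, width: int, height: int, max_side: int):
    """Stage 1 output (thumbnail boxes) -> stage 2 input (native-resolution crop regions)."""
    scale = downsample_scale(width, height, max_side)
    return [to_native(box, scale, width, height) for box in layout_boxes]


# Hypothetical example: a 4000x3000 page analysed on a thumbnail with longer side 1000.
crops = plan_crops([Box(100, 50, 300, 120)], width=4000, height=3000, max_side=1000)
```

Only the cropped regions would then be passed to the recognizer at full resolution, so the expensive step never sees the whole high-resolution page at once.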

🔹 Publication Date: Published on Sep 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22186
• PDF: https://arxiv.org/pdf/2509.22186
• Project Page: https://opendatalab.github.io/MinerU/
• Github: https://github.com/opendatalab/MinerU

🔹 Models citing this paper:
https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
https://huggingface.co/freakynit/MinerU2.5-2509-1.2B
https://huggingface.co/Mungert/MinerU2.5-2509-1.2B-GGUF

Spaces citing this paper:
https://huggingface.co/spaces/opendatalab/MinerU
https://huggingface.co/spaces/xiaoye-winters/MinerU-API
https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test

#VisionLanguageModel #DocumentAI #DeepLearning #ComputerVision #AIResearch
PyTorch Distributed: Experiences on Accelerating Data Parallel Training

📝 Summary:
This paper details PyTorch's distributed data parallel module, which accelerates large-scale model training. It uses techniques such as gradient bucketing and overlapping computation with communication to achieve near-linear scalability using 256 GPUs.
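
Gradient bucketing can be sketched in a few lines of plain Python. This is a simplified illustration, not PyTorch's implementation: `assign_buckets` and the toy parameter sizes are hypothetical. It assumes gradients become ready in roughly the reverse of parameter registration order, so parameters are grouped from last to first until a size cap is reached.

```python
def assign_buckets(param_sizes, bucket_cap_bytes):
    """Group parameters into buckets, walking them in reverse registration order.

    param_sizes: list of (name, size_in_bytes) in registration order.
    A parameter larger than the cap ends up alone in its own bucket.
    """
    buckets, current, current_bytes = [], [], 0
    for name, size in reversed(param_sizes):
        # Close the current bucket if adding this parameter would exceed the cap.
        if current and current_bytes + size > bucket_cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets


# Toy sizes in bytes, chosen only for illustration.
params = [("embed", 40), ("layer1", 10), ("layer2", 10), ("head", 5)]
buckets = assign_buckets(params, bucket_cap_bytes=25)
```

In a real run, each bucket would be all-reduced as soon as all of its gradients are ready, so communication for early buckets overlaps with the backward computation still in progress for later ones.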

🔹 Publication Date: Published on Jun 28, 2020

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2006.15704
• PDF: https://arxiv.org/pdf/2006.15704
• Github: https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/distributed.py

#PyTorch #DistributedTraining #DeepLearning #Scalability #HPC
MinerU: An Open-Source Solution for Precise Document Content Extraction

📝 Summary:
MinerU is an open-source tool that provides high-precision document content extraction. It uses fine-tuned models and pre/postprocessing rules to consistently achieve high performance across diverse document types.

🔹 Publication Date: Published on Sep 27, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2409.18839
• PDF: https://arxiv.org/pdf/2409.18839
• Github: https://github.com/opendatalab/MinerU

Spaces citing this paper:
https://huggingface.co/spaces/opendatalab/MinerU
https://huggingface.co/spaces/xiaoye-winters/MinerU-API
https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test

#DocumentExtraction #OpenSource #DataScience #NLP #AI
Scaling Agents via Continual Pre-training

📝 Summary:
Current agentic LLMs underperform because of optimization tensions during post-training. This paper proposes Agentic Continual Pre-training (Agentic CPT) to build powerful agentic foundation models. The resulting AgentFounder model achieves state-of-the-art performance on benchmarks while retaining strong tool-use ability.

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13310
• PDF: https://arxiv.org/pdf/2509.13310
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

#LLMAgents #ContinualPretraining #FoundationModels #AIResearch #ToolUse
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

📝 Summary:
WebWeaver is a dual-agent framework addressing open-ended deep research challenges. It combines dynamic planning, which interleaves evidence acquisition with outline optimization, and hierarchical, targeted writing to overcome long-context issues. This approach produces state-of-the-art, high-quality, reliab...

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13312
• PDF: https://arxiv.org/pdf/2509.13312
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

#AI #Research #AgentSystems #LLM #KnowledgeManagement
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

📝 Summary:
WebSailor is a post-training method that teaches open-source AI models to systematically reduce uncertainty in complex information-seeking tasks. Using synthetic high-uncertainty tasks and an RL algorithm, it enables open-source agents to match the performance of proprietary systems.

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13305
• PDF: https://arxiv.org/pdf/2509.13305
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

#AI #ReinforcementLearning #OpenSourceAI #AIAgents #MachineLearning
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

📝 Summary:
ReSum enhances LLM-based web agents by overcoming context window limitations through periodic context summarization. This novel paradigm converts interaction histories into compact reasoning states, enabling indefinite exploration for complex tasks. ReSum improves performance by 4.5% over ReAct, ...
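
Periodic context summarization can be sketched as a loop that collapses the interaction history whenever it outgrows a budget. This is a minimal sketch, not the paper's implementation: `run_with_summaries`, the whitespace-based token count, and the `step` and `summarize` callables are all hypothetical stand-ins for a real agent step and a real summarization model.

```python
from typing import Callable, List, Optional


def count_tokens(messages: List[str]) -> int:
    """Crude token estimate: whitespace-separated words (a stand-in for a real tokenizer)."""
    return sum(len(message.split()) for message in messages)


def run_with_summaries(
    question: str,
    step: Callable[[List[str]], Optional[str]],
    summarize: Callable[[List[str]], str],
    budget: int,
    max_steps: int,
) -> List[str]:
    """Run an agent loop, replacing the history with a summary when it exceeds the budget."""
    history = [question]
    for _ in range(max_steps):
        observation = step(history)
        if observation is None:  # the agent has nothing more to do
            break
        history.append(observation)
        if count_tokens(history) > budget:
            # Compress everything so far into one compact state and continue from it.
            history = [question, summarize(history)]
    return history
```

Because the history is reset to the question plus a summary, the context stays bounded no matter how many steps the agent takes.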

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13313
• PDF: https://arxiv.org/pdf/2509.13313
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

#LLM #AI #ContextSummarization #WebAgents #Research
WebDancer: Towards Autonomous Information Seeking Agency

📝 Summary:
WebDancer proposes a four-stage framework for building autonomous information seeking agents. This approach combines data construction, trajectory sampling, supervised fine-tuning, and reinforcement learning, demonstrating strong performance on challenging benchmarks.

🔹 Publication Date: Published on May 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.22648
• PDF: https://arxiv.org/pdf/2505.22648
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
https://huggingface.co/Alibaba-NLP/WebDancer-32B

Spaces citing this paper:
https://huggingface.co/spaces/frucht/Alibaba-NLP-WebDancer-32B

#AI #AutonomousAgents #ReinforcementLearning #MachineLearning #WebAgents
WebSailor: Navigating Super-human Reasoning for Web Agent

📝 Summary:
WebSailor is a post-training method that enhances open-source LLMs with sophisticated reasoning to tackle complex web information-seeking tasks. It teaches models to systematically reduce extreme uncertainty, achieving performance comparable to proprietary AI agents.

🔹 Publication Date: Published on Jul 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.02592
• PDF: https://arxiv.org/pdf/2507.02592
• Project Page: https://github.com/Alibaba-NLP/WebAgent
• Github: https://github.com/Alibaba-NLP/WebAgent

#LLMs #WebAgents #AI #MachineLearning #Reasoning
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

📝 Summary:
WebWatcher is a multimodal agent that enhances vision-language reasoning for complex information retrieval. It uses synthetic trajectories, tools, and RL for training and outperforms existing agents on multimodal information-seeking tasks.

🔹 Publication Date: Published on Aug 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05748
• PDF: https://arxiv.org/pdf/2508.05748
• Project Page: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
https://huggingface.co/Alibaba-NLP/WebWatcher-32B
https://huggingface.co/Alibaba-NLP/WebWatcher-7B

#VisionLanguage #MultimodalAI #DeepLearning #AIagents #InformationRetrieval
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

📝 Summary:
WebShaper synthesizes information-seeking datasets to address data scarcity for LLM agents. It uses a formalization-driven framework based on set theory and Knowledge Projections, enabling precise control over reasoning structure. This leads to state-of-the-art performance on open-sourced benchma...

🔹 Publication Date: Published on Jul 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.15061
• PDF: https://arxiv.org/pdf/2507.15061
• Project Page: https://huggingface.co/papers?q=Knowledge%20Projections%20(KP)
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
https://huggingface.co/Alibaba-NLP/WebShaper-32B

Datasets citing this paper:
https://huggingface.co/datasets/Alibaba-NLP/WebShaper

#LLM #AIAgents #DataGeneration #FormalMethods #NLP
DeepAgent: A General Reasoning Agent with Scalable Toolsets

📝 Summary:
DeepAgent is an end-to-end deep reasoning agent that autonomously thinks, discovers tools, and executes actions. It uses memory folding for long interactions and ToolPO reinforcement learning for general tool use. DeepAgent consistently outperforms baselines on eight diverse benchmarks.

🔹 Publication Date: Published on Oct 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21618
• PDF: https://arxiv.org/pdf/2510.21618
• Github: https://github.com/RUC-NLPIR/DeepAgent

#AI #ReasoningAgents #ReinforcementLearning #ToolLearning #DeepLearning
Zep: A Temporal Knowledge Graph Architecture for Agent Memory

📝 Summary:
Zep is a new AI agent memory service using a temporal knowledge graph for dynamic knowledge integration. It outperforms MemGPT in benchmarks and significantly improves temporal reasoning and cross-session synthesis for enterprise applications, reducing latency.
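
The idea of a temporal knowledge graph for agent memory can be illustrated with a toy store in which facts carry validity intervals. This is a sketch under stated assumptions, not Zep's or Graphiti's API: the `Fact` and `TemporalGraph` names are hypothetical, relations are assumed to be single-valued, and facts are assumed to arrive in chronological order. A newer fact closes the older one's interval instead of deleting it, so past states stay queryable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_at: int                     # time the fact became true
    invalid_at: Optional[int] = None  # time it stopped being true, if known


class TemporalGraph:
    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def add(self, subject: str, relation: str, obj: str, at: int) -> None:
        """Record a new fact and close the interval of any fact it supersedes."""
        for fact in self.facts:
            if (fact.subject == subject and fact.relation == relation
                    and fact.invalid_at is None):
                fact.invalid_at = at
        self.facts.append(Fact(subject, relation, obj, at))

    def query(self, subject: str, relation: str, at: int) -> Optional[str]:
        """Return the object that was valid for (subject, relation) at time `at`."""
        for fact in self.facts:
            if (fact.subject == subject and fact.relation == relation
                    and fact.valid_at <= at
                    and (fact.invalid_at is None or at < fact.invalid_at)):
                return fact.obj
        return None


graph = TemporalGraph()
graph.add("alice", "works_at", "Acme", at=1)
graph.add("alice", "works_at", "Globex", at=5)
```

Querying at time 3 returns the earlier employer and querying at time 5 returns the newer one, which is the kind of temporal reasoning a flat key-value memory cannot answer.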

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.13956
• PDF: https://arxiv.org/pdf/2501.13956
• Github: https://github.com/getzep/graphiti

#AIAgents #KnowledgeGraphs #TemporalReasoning #AIArchitecture #ArtificialIntelligence