ML Research Hub

✨olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

📝 Summary:
olmOCR is an open-source toolkit that uses a fine-tuned vision language model to convert PDFs into clean, structured text. It enables large-scale, cost-effective extraction of trillions of tokens for training language models.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• Github: https://github.com/allenai/olmocr

✨ Datasets citing this paper:
• https://huggingface.co/datasets/davanstrien/test-olmocr2
• https://huggingface.co/datasets/davanstrien/newspapers-olmocr2
• https://huggingface.co/datasets/stckmn/ocr-output-Directive017-1761355297

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#OCR #VLMs #LLM #DataExtraction #OpenSource

65 views · 05:56

✨ Explore Data Science 📝 Write your paper

✨MedRAX: Medical Reasoning Agent for Chest X-ray

📝 Summary:
MedRAX is a new AI agent that integrates CXR analysis tools and multimodal large language models. It answers complex medical queries without extra training, achieving state-of-the-art performance.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.02673
• PDF: https://arxiv.org/pdf/2502.02673
• Github: https://github.com/bowang-lab/medrax

✨ Spaces citing this paper:
• https://huggingface.co/spaces/asbamit/MedRAX-main

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #MedicalAI #LLM #Radiology #DeepLearning

71 views · 05:57

✨Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

📝 Summary:
Mem0 is a memory-centric architecture with graph-based memory that enhances long-term conversational coherence in LLMs by efficiently extracting and consolidating information. It outperforms existing memory systems in accuracy, achieving 26% improvement over OpenAI, and significantly reduces comp...

🔹 Publication Date: Published on Apr 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.19413
• PDF: https://arxiv.org/pdf/2504.19413
• Github: https://github.com/mem0ai/mem0

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #LLM #AIAgents #LongTermMemory #GraphMemory

68 views · 05:57

✨IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

📝 Summary:
IndexTTS is a TTS system that builds on XTTS and Tortoise, improving naturalness and zero-shot voice cloning. It features hybrid character-pinyin modeling for Chinese and optimized vector quantization, resulting in more controllable usage, faster inference, and superior performance compared to other systems.

🔹 Publication Date: Published on Feb 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.05512
• PDF: https://arxiv.org/pdf/2502.05512
• Github: https://github.com/index-tts/index-tts

🔹 Models citing this paper:
• https://huggingface.co/IndexTeam/IndexTTS-2
• https://huggingface.co/IndexTeam/Index-TTS
• https://huggingface.co/Toxzic/indextts-colab

✨ Spaces citing this paper:
• https://huggingface.co/spaces/IndexTeam/IndexTTS
• https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
• https://huggingface.co/spaces/jairwaal/image

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#TextToSpeech #ZeroShotLearning #VoiceCloning #AI #MachineLearning

57 views · 05:57

✨PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

📝 Summary:
PyTorch FSDP is an industry-grade solution for efficient and scalable large model training. It enables significantly larger models with near-linear TFLOPS scalability, making advanced capabilities more accessible.

🔹 Publication Date: Published on Apr 21, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2304.11277
• PDF: https://arxiv.org/pdf/2304.11277
• Github: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/fully_sharded_data_parallel.py

🔹 Models citing this paper:
• https://huggingface.co/databricks/dbrx-instruct
• https://huggingface.co/databricks/dbrx-base
• https://huggingface.co/Undi95/dbrx-base

✨ Spaces citing this paper:
• https://huggingface.co/spaces/nanotron/ultrascale-playbook
• https://huggingface.co/spaces/Ki-Seki/ultrascale-playbook-zh-cn
• https://huggingface.co/spaces/Gantrol/ultrascale-playbook-zh-cn

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#PyTorch #FSDP #DeepLearning #DistributedTraining #LargeModels

53 views · 05:57

✨MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

📝 Summary:
MinerU2.5 is a new 1.2B-parameter VLM for document parsing. It uses a coarse-to-fine, two-stage strategy: global layout analysis on downsampled images, then targeted content recognition on native-resolution crops. This achieves state-of-the-art accuracy efficiently for high-resolution documents.
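
The hand-off between the two stages in the summary above can be sketched in plain Python. This is only an illustration under stated assumptions: the names `to_native` and `plan_crops`, the page sizes, and the layout boxes are hypothetical, and the layout model and the recognizer themselves are left out. The sketch covers one step only: boxes found on the downsampled page are scaled back to native-resolution pixel coordinates so that content recognition can run on full-detail crops.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Size = Tuple[int, int]           # (width, height) in pixels


def to_native(box: Box, thumb_size: Size, native_size: Size) -> Box:
    """Map a box found on the downsampled page to native-resolution pixels."""
    sx = native_size[0] / thumb_size[0]  # horizontal scale factor
    sy = native_size[1] / thumb_size[1]  # vertical scale factor
    left, top, right, bottom = box
    return (round(left * sx), round(top * sy), round(right * sx), round(bottom * sy))


def plan_crops(layout_boxes: List[Box], thumb_size: Size, native_size: Size) -> List[Box]:
    """Stage 1 output (layout boxes on the thumbnail) -> stage 2 input (native crops)."""
    return [to_native(b, thumb_size, native_size) for b in layout_boxes]


# Hypothetical page: a 100x200 thumbnail of a 400x800 scan with two layout regions.
crops = plan_crops([(10, 20, 90, 60), (10, 70, 90, 180)], (100, 200), (400, 800))
print(crops)
```

Only the cropped regions reach the recognizer at full resolution, which is where the efficiency claim in the summary would come from.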

🔹 Publication Date: Published on Sep 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22186
• PDF: https://arxiv.org/pdf/2509.22186
• Project Page: https://opendatalab.github.io/MinerU/
• Github: https://github.com/opendatalab/MinerU

🔹 Models citing this paper:
• https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
• https://huggingface.co/freakynit/MinerU2.5-2509-1.2B
• https://huggingface.co/Mungert/MinerU2.5-2509-1.2B-GGUF

✨ Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/MinerU
• https://huggingface.co/spaces/xiaoye-winters/MinerU-API
• https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VisionLanguageModel #DocumentAI #DeepLearning #ComputerVision #AIResearch

45 views · 05:58

✨PyTorch Distributed: Experiences on Accelerating Data Parallel Training

📝 Summary:
This paper describes PyTorch's distributed data parallel module, which accelerates large-scale model training. It uses techniques such as gradient bucketing and overlapping computation with communication to achieve near-linear scalability on 256 GPUs.
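
A minimal sketch of the gradient bucketing idea, written in plain Python with no PyTorch dependency. The function name `assign_buckets`, the layer names, and the byte sizes are illustrative assumptions, not PyTorch's API. Gradients become ready roughly in reverse order of the forward pass, so the sketch walks the parameters in reverse and groups them into size-capped buckets; each closed bucket could then be all-reduced while the backward pass continues on earlier layers.

```python
from typing import List, Tuple


def assign_buckets(param_sizes: List[Tuple[str, int]], cap_bytes: int) -> List[List[str]]:
    """Group parameters into buckets of at most cap_bytes, in reverse order.

    param_sizes lists (name, gradient size in bytes) in forward order.
    A parameter larger than the cap ends up in a bucket of its own.
    """
    buckets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in reversed(param_sizes):
        # Close the current bucket if this gradient would push it past the cap.
        if current and current_bytes + size > cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets


# Hypothetical model: four layers with made-up gradient sizes in bytes.
sizes = [("layer1", 10), ("layer2", 20), ("layer3", 15), ("layer4", 5)]
buckets = assign_buckets(sizes, cap_bytes=25)
print(buckets)
```

Fewer, larger buckets mean fewer communication calls, while smaller buckets let communication start earlier; the cap is the knob that trades one against the other.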

🔹 Publication Date: Published on Jun 28, 2020

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2006.15704
• PDF: https://arxiv.org/pdf/2006.15704
• Github: https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/distributed.py

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#PyTorch #DistributedTraining #DeepLearning #Scalability #HPC

52 views · 05:58

✨MinerU: An Open-Source Solution for Precise Document Content Extraction

📝 Summary:
MinerU is an open-source tool that provides high-precision document content extraction. It uses fine-tuned models and pre/postprocessing rules to consistently achieve high performance across diverse document types.

🔹 Publication Date: Published on Sep 27, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2409.18839
• PDF: https://arxiv.org/pdf/2409.18839
• Github: https://github.com/opendatalab/MinerU

✨ Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/MinerU
• https://huggingface.co/spaces/xiaoye-winters/MinerU-API
• https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#DocumentExtraction #OpenSource #DataScience #NLP #AI

53 views · 05:58

✨Scaling Agents via Continual Pre-training

📝 Summary:
Current agentic LLMs underperform because of tensions in their training. This paper proposes Agentic Continual Pre-training (Agentic CPT) to build powerful agentic foundation models. The resulting AgentFounder model achieves state-of-the-art performance on benchmarks and shows strong tool use.

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13310
• PDF: https://arxiv.org/pdf/2509.13310
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLMAgents #ContinualPretraining #FoundationModels #AIResearch #ToolUse

54 views · 05:58

✨WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

📝 Summary:
WebWeaver is a dual-agent framework for open-ended deep research. Its planner interleaves evidence acquisition with outline optimization, and its writer composes the report hierarchically, section by section, to overcome long-context issues. This approach produces state-of-the-art, high-quality, reliab...

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13312
• PDF: https://arxiv.org/pdf/2509.13312
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #Research #AgentSystems #LLM #KnowledgeManagement

77 views · 05:59

✨WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

📝 Summary:
WebSailor is a post-training method that teaches open-source AI models to systematically reduce uncertainty in complex information-seeking tasks. Using synthetic high-uncertainty tasks and an RL algorithm, it enables open-source agents to match the performance of proprietary systems.

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13305
• PDF: https://arxiv.org/pdf/2509.13305
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #ReinforcementLearning #OpenSourceAI #AIAgents #MachineLearning

52 views · 05:59

✨ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

📝 Summary:
ReSum enhances LLM-based web agents by overcoming context window limitations through periodic context summarization. This novel paradigm converts interaction histories into compact reasoning states, enabling indefinite exploration for complex tasks. ReSum improves performance by 4.5% over ReAct, ...

🔹 Publication Date: Published on Sep 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13313
• PDF: https://arxiv.org/pdf/2509.13313
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLM #AI #ContextSummarization #WebAgents #Research
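
The periodic context summarization described in this post's summary can be sketched as a simple loop. This is a toy illustration, not the paper's implementation: `run_with_resum`, `toy_summarize`, and the item-count budget `max_items` are hypothetical names, and a real agent would call an LLM summarizer and measure the budget in tokens rather than list entries.

```python
from typing import Callable, List


def run_with_resum(
    steps: List[str],
    summarize: Callable[[List[str]], str],
    max_items: int,
) -> List[str]:
    """Append each step to the context; compress it once it exceeds max_items."""
    context: List[str] = []
    for step in steps:
        context.append(step)
        if len(context) > max_items:
            # Replace the whole interaction history with one compact state,
            # then keep exploring from that state.
            context = [summarize(context)]
    return context


def toy_summarize(history: List[str]) -> str:
    # Stand-in for an LLM summarizer: joins the entries into a single string.
    return "summary(" + "+".join(history) + ")"


final_context = run_with_resum(["a", "b", "c", "d", "e"], toy_summarize, max_items=3)
print(final_context)
```

Because the context is reset to a single summary whenever the budget is exceeded, its length stays bounded no matter how many steps the agent takes.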

73 views · 05:59

✨WebDancer: Towards Autonomous Information Seeking Agency

📝 Summary:
WebDancer proposes a four-stage framework for building autonomous information seeking agents. This approach combines data construction, trajectory sampling, supervised fine-tuning, and reinforcement learning, demonstrating strong performance on challenging benchmarks.

🔹 Publication Date: Published on May 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.22648
• PDF: https://arxiv.org/pdf/2505.22648
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebDancer-32B

✨ Spaces citing this paper:
• https://huggingface.co/spaces/frucht/Alibaba-NLP-WebDancer-32B

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #AutonomousAgents #ReinforcementLearning #MachineLearning #WebAgents

45 views · 05:59

✨WebSailor: Navigating Super-human Reasoning for Web Agent

📝 Summary:
WebSailor is a post-training method that enhances open-source LLMs with sophisticated reasoning to tackle complex web information-seeking tasks. It teaches models to systematically reduce extreme uncertainty, achieving performance comparable to proprietary AI agents.

🔹 Publication Date: Published on Jul 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.02592
• PDF: https://arxiv.org/pdf/2507.02592
• Project Page: https://github.com/Alibaba-NLP/WebAgent
• Github: https://github.com/Alibaba-NLP/WebAgent

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLMs #WebAgents #AI #MachineLearning #Reasoning

72 views · 06:00

✨WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

📝 Summary:
WebWatcher is a multimodal agent that enhances vision-language reasoning for complex information retrieval. It is trained with synthetic trajectories, tool use, and RL, and it outperforms existing agents on multimodal information-seeking tasks.

🔹 Publication Date: Published on Aug 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05748
• PDF: https://arxiv.org/pdf/2508.05748
• Project Page: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebWatcher-32B
• https://huggingface.co/Alibaba-NLP/WebWatcher-7B

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VisionLanguage #MultimodalAI #DeepLearning #AIagents #InformationRetrieval

47 views · 06:00

✨WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

📝 Summary:
WebShaper synthesizes information-seeking datasets to address data scarcity for LLM agents. It uses a formalization-driven framework based on set theory and Knowledge Projections, enabling precise control over reasoning structure. This leads to state-of-the-art performance on open-sourced benchma...

🔹 Publication Date: Published on Jul 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.15061
• PDF: https://arxiv.org/pdf/2507.15061
• Project Page: https://huggingface.co/papers?q=Knowledge%20Projections%20(KP)
• Github: https://github.com/Alibaba-NLP/WebAgent

🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebShaper-32B

✨ Datasets citing this paper:
• https://huggingface.co/datasets/Alibaba-NLP/WebShaper

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLM #AIAgents #DataGeneration #FormalMethods #NLP

46 views · 06:00

✨DeepAgent: A General Reasoning Agent with Scalable Toolsets

📝 Summary:
DeepAgent is an end-to-end deep reasoning agent that autonomously thinks, discovers tools, and executes actions. It uses memory folding for long interactions and ToolPO reinforcement learning for general tool use. DeepAgent consistently outperforms baselines on eight diverse benchmarks.

🔹 Publication Date: Published on Oct 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21618
• PDF: https://arxiv.org/pdf/2510.21618
• Github: https://github.com/RUC-NLPIR/DeepAgent

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #ReasoningAgents #ReinforcementLearning #ToolLearning #DeepLearning

73 views · 06:00

✨Zep: A Temporal Knowledge Graph Architecture for Agent Memory

📝 Summary:
Zep is a new AI agent memory service that uses a temporal knowledge graph for dynamic knowledge integration. It outperforms MemGPT on benchmarks and improves temporal reasoning and cross-session synthesis for enterprise applications while reducing latency.
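
To make "temporal knowledge graph" concrete, here is a toy sketch of facts that carry validity intervals. It is an illustration under stated assumptions, not Zep's or Graphiti's actual data model or API: the `Fact` and `TemporalGraph` classes, the integer timestamps, and the rule that each subject has one value per relation are all hypothetical simplifications. A new fact closes the old one instead of deleting it, so the graph can still answer questions about earlier points in time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: int                 # time the fact became true
    valid_to: Optional[int] = None  # None while the fact still holds


class TemporalGraph:
    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject: str, relation: str, obj: str, at: int) -> None:
        # Close any open fact for the same subject and relation; keep it as history.
        for f in self.facts:
            if f.subject == subject and f.relation == relation and f.valid_to is None:
                f.valid_to = at
        self.facts.append(Fact(subject, relation, obj, at))

    def query(self, subject: str, relation: str, at: int) -> Optional[str]:
        # Return the object whose validity interval [valid_from, valid_to) covers `at`.
        for f in self.facts:
            if (f.subject == subject and f.relation == relation
                    and f.valid_from <= at
                    and (f.valid_to is None or at < f.valid_to)):
                return f.obj
        return None


g = TemporalGraph()
g.assert_fact("alice", "works_at", "Acme", at=1)
g.assert_fact("alice", "works_at", "Globex", at=5)
print(g.query("alice", "works_at", at=3), g.query("alice", "works_at", at=7))
```

Keeping superseded facts with a closed interval is what allows temporal questions such as "where did Alice work at time 3?" to be answered after the graph has been updated.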

🔹 Publication Date: Published on Jan 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.13956
• PDF: https://arxiv.org/pdf/2501.13956
• Github: https://github.com/getzep/graphiti

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AIAgents #KnowledgeGraphs #TemporalReasoning #AIArchitecture #ArtificialIntelligence

56 views · 06:01
