✨olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
📝 Summary:
olmOCR is an open-source toolkit that uses a fine-tuned vision language model to convert PDFs into clean, structured text. It enables large-scale, cost-effective extraction of trillions of tokens for training language models.
🔹 Publication Date: Published on Feb 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.18443
• PDF: https://arxiv.org/pdf/2502.18443
• Github: https://github.com/allenai/olmocr
✨ Datasets citing this paper:
• https://huggingface.co/datasets/davanstrien/test-olmocr2
• https://huggingface.co/datasets/davanstrien/newspapers-olmocr2
• https://huggingface.co/datasets/stckmn/ocr-output-Directive017-1761355297
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#OCR #VLMs #LLM #DataExtraction #OpenSource
✨MedRAX: Medical Reasoning Agent for Chest X-ray
📝 Summary:
MedRAX is a new AI agent that integrates CXR analysis tools and multimodal large language models. It answers complex medical queries without extra training, achieving state-of-the-art performance.
🔹 Publication Date: Published on Feb 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.02673
• PDF: https://arxiv.org/pdf/2502.02673
• Github: https://github.com/bowang-lab/medrax
✨ Spaces citing this paper:
• https://huggingface.co/spaces/asbamit/MedRAX-main
==================================
#AI #MedicalAI #LLM #Radiology #DeepLearning
✨Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
📝 Summary:
Mem0 is a memory-centric architecture with graph-based memory that enhances long-term conversational coherence in LLMs by efficiently extracting and consolidating information. It outperforms existing memory systems in accuracy, achieving a 26% improvement over OpenAI, and significantly reduces comp...
🔹 Publication Date: Published on Apr 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.19413
• PDF: https://arxiv.org/pdf/2504.19413
• Github: https://github.com/mem0ai/mem0
==================================
#AI #LLM #AIAgents #LongTermMemory #GraphMemory
✨IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
📝 Summary:
IndexTTS is a TTS system that builds on XTTS and Tortoise, improving naturalness and zero-shot voice cloning. It features hybrid character-pinyin modeling for Chinese and optimized vector quantization, resulting in more controllable usage, faster inference, and superior performance compared to other systems.
🔹 Publication Date: Published on Feb 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.05512
• PDF: https://arxiv.org/pdf/2502.05512
• Github: https://github.com/index-tts/index-tts
🔹 Models citing this paper:
• https://huggingface.co/IndexTeam/IndexTTS-2
• https://huggingface.co/IndexTeam/Index-TTS
• https://huggingface.co/Toxzic/indextts-colab
✨ Spaces citing this paper:
• https://huggingface.co/spaces/IndexTeam/IndexTTS
• https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
• https://huggingface.co/spaces/jairwaal/image
==================================
#TextToSpeech #ZeroShotLearning #VoiceCloning #AI #MachineLearning
✨PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
📝 Summary:
PyTorch FSDP is an industry-grade solution for efficient and scalable large model training. It enables significantly larger models with near-linear TFLOPS scalability, making advanced capabilities more accessible.
🔹 Publication Date: Published on Apr 21, 2023
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2304.11277
• PDF: https://arxiv.org/pdf/2304.11277
• Github: https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/fully_sharded_data_parallel.py
🔹 Models citing this paper:
• https://huggingface.co/databricks/dbrx-instruct
• https://huggingface.co/databricks/dbrx-base
• https://huggingface.co/Undi95/dbrx-base
✨ Spaces citing this paper:
• https://huggingface.co/spaces/nanotron/ultrascale-playbook
• https://huggingface.co/spaces/Ki-Seki/ultrascale-playbook-zh-cn
• https://huggingface.co/spaces/Gantrol/ultrascale-playbook-zh-cn
==================================
#PyTorch #FSDP #DeepLearning #DistributedTraining #LargeModels
✨MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
📝 Summary:
MinerU2.5 is a new 1.2B-parameter VLM for document parsing. It uses a coarse-to-fine, two-stage strategy: global layout analysis on downsampled images, then targeted content recognition on native-resolution crops. This achieves state-of-the-art accuracy efficiently for high-resolution documents.
🔹 Publication Date: Published on Sep 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.22186
• PDF: https://arxiv.org/pdf/2509.22186
• Project Page: https://opendatalab.github.io/MinerU/
• Github: https://github.com/opendatalab/MinerU
🔹 Models citing this paper:
• https://huggingface.co/opendatalab/MinerU2.5-2509-1.2B
• https://huggingface.co/freakynit/MinerU2.5-2509-1.2B
• https://huggingface.co/Mungert/MinerU2.5-2509-1.2B-GGUF
✨ Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/MinerU
• https://huggingface.co/spaces/xiaoye-winters/MinerU-API
• https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test
==================================
#VisionLanguageModel #DocumentAI #DeepLearning #ComputerVision #AIResearch
✨PyTorch Distributed: Experiences on Accelerating Data Parallel Training
📝 Summary:
This paper details PyTorch's distributed data parallel module, which accelerates large-scale model training. It uses techniques like gradient bucketing and computation-communication overlap to achieve near-linear scalability on 256 GPUs.
🔹 Publication Date: Published on Jun 28, 2020
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2006.15704
• PDF: https://arxiv.org/pdf/2006.15704
• Github: https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/distributed.py
==================================
#PyTorch #DistributedTraining #DeepLearning #Scalability #HPC
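The gradient bucketing mentioned in the summary can be illustrated with a minimal sketch. This is not the code in torch/nn/parallel/distributed.py: the helper name assign_buckets, the capacity measured in element counts, and the simple greedy packing are assumptions made for illustration. The idea shown is that parameters are grouped into fixed-capacity buckets in reverse order, so that one collective communication can be launched per bucket as soon as its gradients are ready, overlapping with the rest of the backward pass.

```python
from typing import List, Sequence


def assign_buckets(param_sizes: Sequence[int], bucket_cap: int) -> List[List[int]]:
    """Hypothetical helper: greedily pack parameter indices into buckets.

    Parameters are visited in reverse order because gradients of later
    layers tend to become ready first during the backward pass.
    A parameter larger than bucket_cap gets a bucket of its own.
    """
    buckets: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for idx in reversed(range(len(param_sizes))):
        size = param_sizes[idx]
        # Close the current bucket if adding this parameter would overflow it.
        if current and current_size + size > bucket_cap:
            buckets.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        buckets.append(current)
    return buckets


if __name__ == "__main__":
    # Five parameters with made-up sizes and a capacity of 8 elements.
    print(assign_buckets([4, 4, 4, 10, 2], bucket_cap=8))
```

Larger buckets mean fewer communication calls but a later start for the first one; smaller buckets start earlier but pay more per-call overhead.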
✨MinerU: An Open-Source Solution for Precise Document Content Extraction
📝 Summary:
MinerU is an open-source tool that provides high-precision document content extraction. It uses fine-tuned models and pre/postprocessing rules to consistently achieve high performance across diverse document types.
🔹 Publication Date: Published on Sep 27, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2409.18839
• PDF: https://arxiv.org/pdf/2409.18839
• Github: https://github.com/opendatalab/MinerU
✨ Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/MinerU
• https://huggingface.co/spaces/xiaoye-winters/MinerU-API
• https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test
==================================
#DocumentExtraction #OpenSource #DataScience #NLP #AI
✨Scaling Agents via Continual Pre-training
📝 Summary:
Current agentic LLMs underperform due to training tensions. This paper proposes Agentic Continual Pre-training (Agentic CPT) to build powerful agentic foundation models. The resulting AgentFounder model achieves state-of-the-art performance on benchmarks with strong tool use.
🔹 Publication Date: Published on Sep 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13310
• PDF: https://arxiv.org/pdf/2509.13310
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/
==================================
#LLMAgents #ContinualPretraining #FoundationModels #AIResearch #ToolUse
✨WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
📝 Summary:
WebWeaver is a dual-agent framework addressing open-ended deep research challenges. It combines dynamic planning, which interleaves evidence acquisition with outline optimization, and hierarchical, targeted writing to overcome long-context issues. This approach produces state-of-the-art, high-quality, reliab...
🔹 Publication Date: Published on Sep 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13312
• PDF: https://arxiv.org/pdf/2509.13312
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/
==================================
#AI #Research #AgentSystems #LLM #KnowledgeManagement
✨WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
📝 Summary:
WebSailor is a post-training method that teaches open-source AI models to systematically reduce uncertainty in complex information-seeking tasks. Using synthetic high-uncertainty tasks and an RL algorithm, it enables open-source agents to match the performance of proprietary systems.
🔹 Publication Date: Published on Sep 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13305
• PDF: https://arxiv.org/pdf/2509.13305
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/
==================================
#AI #ReinforcementLearning #OpenSourceAI #AIAgents #MachineLearning
✨ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
📝 Summary:
ReSum enhances LLM-based web agents by overcoming context window limitations through periodic context summarization. This novel paradigm converts interaction histories into compact reasoning states, enabling indefinite exploration for complex tasks. ReSum improves performance by 4.5% over ReAct, ...
🔹 Publication Date: Published on Sep 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13313
• PDF: https://arxiv.org/pdf/2509.13313
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/
==================================
#LLM #AI #ContextSummarization #WebAgents #Research
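The context summarization idea in the summary can be sketched in a few lines. This is a toy illustration, not the ReSum implementation: the function name run_with_resum, the character-based budget, and the summarize callback are all assumptions, and a real agent would call an LLM to produce the compact reasoning state. The sketch shows the core loop: keep appending observations, and once the history exceeds the budget, replace it with a single summary and continue from there.

```python
from typing import Callable, List


def run_with_resum(
    steps: List[str],
    summarize: Callable[[List[str]], str],
    max_context_chars: int,
) -> List[str]:
    """Hypothetical loop: compress the history whenever it outgrows the budget."""
    context: List[str] = []
    for observation in steps:
        context.append(observation)
        # When the accumulated history is too long, collapse it into
        # one compact state and keep exploring from that state.
        if sum(len(item) for item in context) > max_context_chars:
            context = [summarize(context)]
    return context


if __name__ == "__main__":
    # Stand-in summarizer that only records how many items it compressed.
    def fake_summary(history: List[str]) -> str:
        return "S%d" % len(history)

    print(run_with_resum(["aaaa", "bbbb", "cccc"], fake_summary, 6))
```

Because the history is replaced rather than truncated, the loop can run for an arbitrary number of steps while the context stays bounded by the budget plus one observation.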
✨WebDancer: Towards Autonomous Information Seeking Agency
📝 Summary:
WebDancer proposes a four-stage framework for building autonomous information seeking agents. This approach combines data construction, trajectory sampling, supervised fine-tuning, and reinforcement learning, demonstrating strong performance on challenging benchmarks.
🔹 Publication Date: Published on May 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.22648
• PDF: https://arxiv.org/pdf/2505.22648
• Github: https://github.com/Alibaba-NLP/WebAgent
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebDancer-32B
✨ Spaces citing this paper:
• https://huggingface.co/spaces/frucht/Alibaba-NLP-WebDancer-32B
==================================
#AI #AutonomousAgents #ReinforcementLearning #MachineLearning #WebAgents
✨WebSailor: Navigating Super-human Reasoning for Web Agent
📝 Summary:
WebSailor is a post-training method that enhances open-source LLMs with sophisticated reasoning to tackle complex web information-seeking tasks. It teaches models to systematically reduce extreme uncertainty, achieving performance comparable to proprietary AI agents.
🔹 Publication Date: Published on Jul 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.02592
• PDF: https://arxiv.org/pdf/2507.02592
• Project Page: https://github.com/Alibaba-NLP/WebAgent
• Github: https://github.com/Alibaba-NLP/WebAgent
==================================
#LLMs #WebAgents #AI #MachineLearning #Reasoning
✨WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
📝 Summary:
WebWatcher is a multimodal agent that enhances vision-language reasoning for complex information retrieval. It is trained with synthetic trajectories, tools, and RL, and outperforms existing agents on multimodal information-seeking tasks.
🔹 Publication Date: Published on Aug 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05748
• PDF: https://arxiv.org/pdf/2508.05748
• Project Page: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/
• Github: https://github.com/Alibaba-NLP/WebAgent
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebWatcher-32B
• https://huggingface.co/Alibaba-NLP/WebWatcher-7B
==================================
#VisionLanguage #MultimodalAI #DeepLearning #AIagents #InformationRetrieval
✨WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
📝 Summary:
WebShaper synthesizes information-seeking datasets to address data scarcity for LLM agents. It uses a formalization-driven framework based on set theory and Knowledge Projections, enabling precise control over reasoning structure. This leads to state-of-the-art performance on open-sourced benchma...
🔹 Publication Date: Published on Jul 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.15061
• PDF: https://arxiv.org/pdf/2507.15061
• Project Page: https://huggingface.co/papers?q=Knowledge%20Projections%20(KP)
• Github: https://github.com/Alibaba-NLP/WebAgent
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-NLP/WebShaper-32B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Alibaba-NLP/WebShaper
==================================
#LLM #AIAgents #DataGeneration #FormalMethods #NLP
✨DeepAgent: A General Reasoning Agent with Scalable Toolsets
📝 Summary:
DeepAgent is an end-to-end deep reasoning agent that autonomously thinks, discovers tools, and executes actions. It uses memory folding for long interactions and ToolPO reinforcement learning for general tool use. DeepAgent consistently outperforms baselines on eight diverse benchmarks.
🔹 Publication Date: Published on Oct 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.21618
• PDF: https://arxiv.org/pdf/2510.21618
• Github: https://github.com/RUC-NLPIR/DeepAgent
==================================
#AI #ReasoningAgents #ReinforcementLearning #ToolLearning #DeepLearning
✨Zep: A Temporal Knowledge Graph Architecture for Agent Memory
📝 Summary:
Zep is a new AI agent memory service that uses a temporal knowledge graph for dynamic knowledge integration. It outperforms MemGPT in benchmarks and significantly improves temporal reasoning and cross-session synthesis for enterprise applications while reducing latency.
🔹 Publication Date: Published on Jan 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.13956
• PDF: https://arxiv.org/pdf/2501.13956
• Github: https://github.com/getzep/graphiti
==================================
#AIAgents #KnowledgeGraphs #TemporalReasoning #AIArchitecture #ArtificialIntelligence