🔹 Title: C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
🔹 Publication Date: Published on Jul 30
🔹 Abstract: A benchmark dataset for Spoken Dialogue Models (SDMs) in English and Chinese is presented to evaluate their performance in understanding and emulating human spoken conversations, addressing challenges like ambiguity and context-dependency. AI-generated summary: Spoken Dialogue Models (SDMs) have recently attracted significant attention for their ability to generate voice responses directly to users' spoken queries. Despite their increasing popularity, there exists a gap in research focused on comprehensively understanding their practical effectiveness in comprehending and emulating human conversations, especially compared to text-based Large Language Models (LLMs), which benefit from extensive benchmarking. Human voice interactions are inherently more complex than text due to characteristics unique to spoken dialogue. Ambiguity poses one challenge, stemming from semantic factors like polysemy as well as phonological aspects such as heterographs, heteronyms, and stress patterns. Additionally, context-dependency, including omission, coreference, and multi-turn interaction, adds further complexity to human conversational dynamics. To illuminate the current state of SDM development and to address these challenges, we present a benchmark dataset comprising 1,079 instances in English and Chinese. Accompanied by an LLM-based evaluation method that closely aligns with human judgment, this dataset facilitates a comprehensive exploration of the performance of SDMs in tackling these practical challenges.
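The abstract's key methodological claim is an LLM-based evaluation that closely aligns with human judgment. Below is a minimal sketch of how such an LLM-as-judge scorer could be wired up; the prompt wording, the 1-5 scale, and the `call_llm` placeholder are illustrative assumptions, not the protocol from the C3 paper or its repository.
```python
# Hypothetical LLM-as-judge scorer for spoken-dialogue responses.
# The prompt, the 1-5 scale, and `call_llm` are assumptions for
# illustration, not the C3 paper's actual evaluation protocol.
import re

JUDGE_PROMPT = """You are grading a spoken-dialogue response.
Question (possibly ambiguous or context-dependent): {question}
Reference answer: {reference}
Model response: {response}
Rate correctness on a 1-5 scale. Reply with the number only."""

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion client here.
    return "4"

def judge(question: str, reference: str, response: str) -> int:
    """Return a 1-5 correctness score parsed from the judge's reply."""
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, response=response))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else 1  # conservative fallback

# "bass" is a heteronym of the kind the benchmark targets:
print(judge("Where did you leave the bass?", "By the river (the fish).",
            "I left the bass guitar on stage."))
```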
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.22968
• PDF: https://arxiv.org/pdf/2507.22968
• Project Page: https://step-out.github.io/C3-web/
• Github: https://github.com/step-out/C3
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/ChengqianMa/C3
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: EvoC2Rust: A Skeleton-guided Framework for Project-Level C-to-Rust Translation
🔹 Publication Date: Published on Aug 6
🔹 Abstract: EvoC2Rust is an automated framework that translates entire C projects to Rust using a skeleton-guided approach, combining rule-based and LLM-based methods to improve syntax, semantics, and safety. AI-generated summary: Rust's compile-time safety guarantees make it ideal for safety-critical systems, creating demand for translating legacy C codebases to Rust. While various approaches have emerged for this task, they face inherent trade-offs: rule-based solutions struggle to meet code safety and idiomaticity requirements, while LLM-based solutions often fail to generate semantically equivalent Rust code due to the heavy dependencies among modules across the entire codebase. Recent studies have revealed that both kinds of solutions are limited to small-scale programs. In this paper, we propose EvoC2Rust, an automated framework for converting entire C projects to equivalent Rust ones. EvoC2Rust employs a skeleton-guided translation strategy for project-level translation. The pipeline consists of three evolutionary stages: 1) it first decomposes the C project into functional modules, employs a feature-mapping-enhanced LLM to transform definitions and macros, and generates type-checked function stubs, which together form a compilable Rust skeleton; 2) it then incrementally translates each function, replacing the corresponding stub placeholder; 3) finally, it repairs compilation errors by integrating LLM-based and static-analysis techniques. Through evolutionary augmentation, EvoC2Rust combines the advantages of both rule-based and LLM-based solutions. Our evaluation on open-source benchmarks and six industrial projects demonstrates EvoC2Rust's superior performance in project-level C-to-Rust translation. On average, it achieves 17.24% and 14.32% improvements in syntax and semantic accuracy over LLM-based approaches, along with a 96.79% higher code safety rate than rule-based tools. At the module level, EvoC2Rust reaches 92.25% compilation and 89.53% test pass rates on industrial projects, even for complex codebases and long functions.
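As a rough mental model of the skeleton-guided strategy described above, here is a conceptual Python sketch: stage 1 builds a compilable skeleton of stubs, stage 2 swaps each stub for an LLM translation, and stage 3 repairs errors so the project stays buildable. Every callable and the dict-based project representation are hypothetical stand-ins, not the authors' code.
```python
# Conceptual sketch of a skeleton-guided C-to-Rust pipeline in the
# spirit of EvoC2Rust; all components here are invented stand-ins.
from typing import Callable

def skeleton_guided_translate(
    modules: dict[str, dict[str, str]],           # module -> {fn: C source}
    llm_translate: Callable[[str], str],          # C source -> Rust source
    compile_rust: Callable[[dict], list[str]],    # project -> error list
    static_fix: Callable[[dict, list[str]], dict],
) -> dict:
    # Stage 1: a compilable skeleton; every function body is a stub.
    project = {m: {fn: "todo!()" for fn in fns} for m, fns in modules.items()}
    # Stage 2: incrementally replace each stub with a real translation.
    for m, fns in modules.items():
        for fn, c_src in fns.items():
            project[m][fn] = llm_translate(c_src)
            # Stage 3: repair compile errors before moving on, so the
            # project remains buildable throughout translation.
            errors = compile_rust(project)
            if errors:
                project = static_fix(project, errors)
    return project

# Tiny demo with trivially faked components:
demo = skeleton_guided_translate(
    {"util": {"add": "int add(int a, int b) { return a + b; }"}},
    llm_translate=lambda c: "pub fn add(a: i32, b: i32) -> i32 { a + b }",
    compile_rust=lambda p: [],   # pretend everything compiles
    static_fix=lambda p, e: p,
)
print(demo["util"]["add"])
```
The point of the stub-first design is that the skeleton gives the compiler a whole-project context from day one, so each incremental translation is type-checked against real signatures instead of in isolation.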
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.04295
• PDF: https://arxiv.org/pdf/2508.04295
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: R-Zero: Self-Evolving Reasoning LLM from Zero Data
🔹 Publication Date: Published on Aug 7
🔹 Abstract: R-Zero is a self-evolving framework that autonomously generates and learns from its own training data, improving reasoning capabilities in LLMs without human-curated tasks. AI-generated summary: Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver's capability, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger. This process yields a targeted, self-improving curriculum without any pre-existing tasks or labels. Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.
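A toy simulation can make the Challenger/Solver coupling concrete: if the Challenger's reward peaks when the Solver succeeds about half the time, task difficulty naturally tracks the edge of the Solver's ability. The success model and update rules below are invented for illustration and are not the paper's training procedure.
```python
# Toy simulation of R-Zero-style Challenger/Solver co-evolution;
# the dynamics here are invented, not the paper's RL training code.
import random

def solver_succeeds(difficulty: float, skill: float) -> bool:
    p = max(0.0, min(1.0, 1.0 - difficulty + skill))
    return random.random() < p

skill, difficulty = 0.0, 0.1
for step in range(200):
    wins = sum(solver_succeeds(difficulty, skill) for _ in range(32))
    rate = wins / 32
    # Challenger: reward peaks at ~50% solve rate, so difficulty is
    # pushed toward the edge of the Solver's current ability.
    difficulty += 0.05 if rate > 0.5 else -0.05
    # Solver: learns most from tasks it sometimes solves; the gain
    # term 4*rate*(1-rate) is maximal exactly at rate = 0.5.
    skill += 0.01 * 4 * rate * (1 - rate)
print(f"final skill={skill:.2f}, difficulty={difficulty:.2f}")
```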
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05004
• PDF: https://arxiv.org/pdf/2508.05004
• Project Page: https://chengsong-huang.github.io/R-Zero.github.io/
• Github: https://github.com/Chengsong-Huang/R-Zero
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
🔹 Publication Date: Published on Aug 5
🔹 Abstract: Double-Bench is a large-scale, multilingual, and multimodal evaluation system for document Retrieval-Augmented Generation (RAG) systems, addressing limitations in current benchmarks and providing comprehensive assessments of system components. AI-generated summary: Retrieval-Augmented Generation (RAG) systems using Multimodal Large Language Models (MLLMs) show great promise for complex document understanding, yet their development is critically hampered by inadequate evaluation. Current benchmarks often focus on a specific part of the document RAG system and use synthetic data with incomplete ground-truth and evidence labels, therefore failing to reflect real-world bottlenecks and challenges. To overcome these limitations, we introduce Double-Bench: a new large-scale, multilingual, and multimodal evaluation system that is able to produce fine-grained assessments of each component within document RAG systems. It comprises 3,276 documents (72,880 pages) and 5,168 single- and multi-hop queries across 6 languages and 4 document types, with streamlined dynamic update support for potential data contamination issues. Queries are grounded in exhaustively scanned evidence pages and verified by human experts to ensure maximum quality and completeness. Our comprehensive experiments across 9 state-of-the-art embedding models, 4 MLLMs, and 4 end-to-end document RAG frameworks demonstrate that the gap between text and visual embedding models is narrowing, highlighting the need to build stronger document retrieval models. Our findings also reveal an over-confidence dilemma within current document RAG frameworks, which tend to provide answers even without evidence support. We hope our fully open-source Double-Bench provides a rigorous foundation for future research in advanced document RAG systems. We plan to retrieve timely corpora and release new benchmarks on an annual basis.
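Because every query is grounded in verified evidence pages, component-level metrics such as evidence recall@k become straightforward to compute for the retrieval stage in isolation. The sketch below shows the idea; the field names are assumptions, not the benchmark's actual schema.
```python
# Sketch of an evidence-grounded retrieval metric in the spirit of
# Double-Bench's component-level evaluation; field names are assumed.
def recall_at_k(retrieved: list[str], evidence: set[str], k: int) -> float:
    """Fraction of labeled evidence pages found in the top-k results."""
    if not evidence:
        return 0.0
    hits = sum(1 for page in retrieved[:k] if page in evidence)
    return hits / len(evidence)

queries = [
    {"retrieved": ["p3", "p7", "p1"], "evidence": {"p3", "p9"}},  # multi-hop
    {"retrieved": ["p2", "p5", "p8"], "evidence": {"p5"}},
]
score = sum(recall_at_k(q["retrieved"], q["evidence"], k=3)
            for q in queries) / len(queries)
print(f"mean recall@3 = {score:.2f}")  # 0.75 on this toy data
```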
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03644
• PDF: https://arxiv.org/pdf/2508.03644
• Project Page: https://double-bench.github.io/
• Github: https://github.com/Episoode/Double-Bench
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
🔹 Publication Date: Published on Aug 2
🔹 Abstract: RoboMemory, a brain-inspired multi-memory framework, enhances lifelong learning in physical robots by integrating cognitive neuroscience principles and achieving state-of-the-art performance in real-world tasks. AI-generated summary: We present RoboMemory, a brain-inspired multi-memory framework for lifelong learning in physical embodied systems, addressing critical challenges in real-world environments: continuous learning, multi-module memory latency, task correlation capture, and infinite-loop mitigation in closed-loop planning. Grounded in cognitive neuroscience, it integrates four core modules: the Information Preprocessor (thalamus-like), the Lifelong Embodied Memory System (hippocampus-like), the Closed-Loop Planning Module (prefrontal-lobe-like), and the Low-Level Executer (cerebellum-like) to enable long-term planning and cumulative learning. The Lifelong Embodied Memory System, central to the framework, alleviates inference-speed issues in complex memory frameworks via parallelized updates/retrieval across Spatial, Temporal, Episodic, and Semantic submodules. It incorporates a dynamic Knowledge Graph (KG) and a consistent architectural design to enhance memory consistency and scalability. Evaluations on EmbodiedBench show RoboMemory outperforms the open-source baseline (Qwen2.5-VL-72B-Ins) by 25% in average success rate and surpasses the closed-source state-of-the-art (Claude3.5-Sonnet) by 5%, establishing a new SOTA. Ablation studies validate key components (critic, spatial memory, long-term memory), while real-world deployment confirms its lifelong learning capability with significantly improved success rates across repeated tasks. RoboMemory alleviates high-latency challenges with scalability, serving as a foundational reference for integrating multimodal memory systems in physical robots.
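The latency argument above is that querying the Spatial, Temporal, Episodic, and Semantic submodules in parallel bounds retrieval time by the slowest store rather than the sum of all four. A minimal sketch, with an invented store layout and query API:
```python
# Sketch of parallel retrieval across RoboMemory-style submodules.
# The in-memory stores and query API are invented for illustration.
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

MEMORIES: dict[str, dict[str, str]] = {
    "spatial":  {"kitchen": "mug is on the counter"},
    "temporal": {"kitchen": "visited two steps ago"},
    "episodic": {"kitchen": "grasp attempt failed last time"},
    "semantic": {"kitchen": "mugs are graspable objects"},
}

def retrieve(name: str, query: str) -> tuple[str, Optional[str]]:
    return name, MEMORIES[name].get(query)

def parallel_recall(query: str) -> dict[str, str]:
    # All four submodules are queried concurrently, so total latency
    # is bounded by the slowest store rather than the sum of all four.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lambda name: retrieve(name, query), MEMORIES)
    return {name: hit for name, hit in results if hit is not None}

print(parallel_recall("kitchen"))
```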
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01415
• PDF: https://arxiv.org/pdf/2508.01415
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
🔹 Publication Date: Published on Jul 31
🔹 Abstract: Softmax attention is more expressive than linear attention due to its recurrent form, which can be analyzed using RNN components. AI-generated summary: Since its introduction, softmax attention has become the backbone of modern transformer architectures due to its expressiveness and scalability across a wide range of tasks. However, the main drawback of softmax attention is the quadratic memory requirement and computational complexity with respect to the sequence length. By replacing the softmax nonlinearity, linear attention and similar methods have been introduced to avoid the quadratic bottleneck of softmax attention. Despite these linear forms of attention being derived from the original softmax formulation, they typically lag behind in downstream accuracy. While strong intuition about the softmax nonlinearity on the query-key inner product suggests that it has desirable properties compared to other nonlinearities, the question of why this discrepancy exists remains unanswered. This work demonstrates that linear attention is an approximation of softmax attention by deriving the recurrent form of softmax attention. Using this form, each part of softmax attention can be described in the language of recurrent neural networks (RNNs). Describing softmax attention as an RNN allows for the ablation of its components to understand the importance of each part and how they interact. In this way, our work helps explain why softmax attention is more expressive than its counterparts.
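For context, the standard derivation behind the abstract's claim: causal softmax attention is a ratio of sums over all past positions, and linear attention is the approximation obtained by replacing exp(q^T k) with a factorizable feature map, which collapses those sums into an RNN-style state. This is textbook notation, not necessarily the paper's.
```latex
% Causal softmax attention at step t (standard form):
o_t \;=\; \frac{\sum_{i=1}^{t} \exp(q_t^\top k_i)\, v_i}
               {\sum_{i=1}^{t} \exp(q_t^\top k_i)}
% Linear attention replaces \exp(q^\top k) with \phi(q)^\top \phi(k),
% which factorizes both sums into an RNN-style recurrent state:
S_t = S_{t-1} + \phi(k_t)\, v_t^\top, \qquad
z_t = z_{t-1} + \phi(k_t), \qquad
o_t = \frac{S_t^\top \phi(q_t)}{z_t^\top \phi(q_t)}
```
The exact softmax form cannot be factorized this way because exp(q_t^T k_i) couples the query to every past key, which is precisely the gap the paper analyzes through its recurrent formulation.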
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23632
• PDF: https://arxiv.org/pdf/2507.23632
• Github: https://github.com/gmongaras/On-the-Expressiveness-of-Softmax-Attention-A-Recurrent-Neural-Network-Perspective
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability
🔹 Publication Date: Published on Aug 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07050
• PDF: https://arxiv.org/pdf/2508.07050
• Project Page: https://github.com/8421BCD/ReasonRank
• Github: https://github.com/8421BCD/ReasonRank
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/liuwenhan/reasonrank_data_13k
• https://huggingface.co/datasets/liuwenhan/reasonrank_data_rl
• https://huggingface.co/datasets/liuwenhan/reasonrank_data_sft
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: WideSearch: Benchmarking Agentic Broad Info-Seeking
🔹 Publication Date: Published on Aug 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07999
• PDF: https://arxiv.org/pdf/2508.07999
• Project Page: https://widesearch-seed.github.io/
• Github: https://widesearch-seed.github.io/
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/ByteDance-Seed/WideSearch
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
🔹 Publication Date: Published on Aug 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07981
• PDF: https://arxiv.org/pdf/2508.07981
• Project Page: https://amap-ml.github.io/Omni-Effects.github.io/
• Github: https://github.com/AMAP-ML/Omni-Effects
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/GD-ML/Omni-VFX
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
🔹 Publication Date: Published on Aug 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07629
• PDF: https://arxiv.org/pdf/2508.07629
• Project Page: https://github.com/suu990901/KlearReasoner
• Github: https://github.com/suu990901/KlearReasoner
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: UserBench: An Interactive Gym Environment for User-Centric Agents
🔹 Publication Date: Published on Jul 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.22034
• PDF: https://arxiv.org/pdf/2507.22034
• Github: https://github.com/SalesforceAIResearch/UserBench
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
🔹 Publication Date: Published on Aug 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.06600
• PDF: https://arxiv.org/pdf/2508.06600
• Project Page: https://texttron.github.io/BrowseComp-Plus/
• Github: https://github.com/texttron/BrowseComp-Plus
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/Tevatron/browsecomp-plus-corpus
• https://huggingface.co/datasets/Tevatron/browsecomp-plus
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/Tevatron/BrowseComp-Plus
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks
🔹 Publication Date: Published on Aug 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05614
• PDF: https://arxiv.org/pdf/2508.05614
• Project Page: https://zju-real.github.io/OmniEmbodied/
• Github: https://zju-real.github.io/OmniEmbodied/
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/wangzx1210/OmniEAR
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: MolmoAct: Action Reasoning Models that can Reason in Space
🔹 Publication Date: Published on Aug 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07917
• PDF: https://arxiv.org/pdf/2508.07917
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future
🔹 Publication Date: Published on Aug 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.06026
• PDF: https://arxiv.org/pdf/2508.06026
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Reinforcement Learning in Vision: A Survey
🔹 Publication Date: Published on Aug 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.08189
• PDF: https://arxiv.org/pdf/2508.08189
• Github: https://github.com/weijiawu/Awesome-Visual-Reinforcement-Learning
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens
🔹 Publication Date: Published on Aug 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05305
• PDF: https://arxiv.org/pdf/2508.05305
• Github: https://github.com/FusionBrainLab/SONAR-LLM/tree/main
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
🔹 Publication Date: Published on Aug 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.08221
• PDF: https://arxiv.org/pdf/2508.08221
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
🔹 Publication Date: Published on Aug 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.08134
• PDF: https://arxiv.org/pdf/2508.08134
• Github: https://github.com/mayuelala/FollowYourShape
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title: Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning
🔹 Publication Date: Published on Aug 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07101
• PDF: https://arxiv.org/pdf/2508.07101
• Github: https://github.com/DerrickYLJ/LessIsMore
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT