Data Science | Machine Learning with Python for Researchers
ads: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

Buy ads: https://telega.io/c/dataScienceT
🔹 Title: R-Zero: Self-Evolving Reasoning LLM from Zero Data

🔹 Publication Date: Published on Aug 7

🔹 AI-generated summary: R-Zero is a self-evolving framework that autonomously generates and learns from its own training data, improving reasoning capabilities in LLMs without human-curated tasks.

🔹 Abstract: Self-evolving Large Language Models (LLMs) offer a scalable path toward super-intelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for training such models still rely heavily on vast human-curated tasks and labels, typically via fine-tuning or reinforcement learning, which poses a fundamental bottleneck to advancing AI systems toward capabilities beyond human intelligence. To overcome this limitation, we introduce R-Zero, a fully autonomous framework that generates its own training data from scratch. Starting from a single base LLM, R-Zero initializes two independent models with distinct roles, a Challenger and a Solver. These models are optimized separately and co-evolve through interaction: the Challenger is rewarded for proposing tasks near the edge of the Solver capability, and the Solver is rewarded for solving increasingly challenging tasks posed by the Challenger. This process yields a targeted, self-improving curriculum without any pre-existing tasks and labels. Empirically, R-Zero substantially improves reasoning capability across different backbone LLMs, e.g., boosting the Qwen3-4B-Base by +6.49 on math-reasoning benchmarks and +7.54 on general-domain reasoning benchmarks.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05004

• PDF: https://arxiv.org/pdf/2508.05004

• Project Page: https://chengsong-huang.github.io/R-Zero.github.io/

• Github: https://github.com/Chengsong-Huang/R-Zero

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
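
The Challenger's incentive in the R-Zero abstract (tasks "near the edge of the Solver capability") can be illustrated with a toy reward. This is a minimal sketch, not the authors' implementation: the function name `challenger_reward` and the specific formula (a reward that peaks when the Solver's sampled answers agree about half the time) are assumptions made for illustration.

```python
from collections import Counter


def challenger_reward(solver_answers):
    """Toy reward for one proposed task (hypothetical formula).

    solver_answers: answers sampled from the Solver for the same task.
    The reward is highest when the majority answer appears about half
    the time, i.e. the task is neither trivial nor hopeless.
    """
    if not solver_answers:
        return 0.0
    majority_count = Counter(solver_answers).most_common(1)[0][1]
    agreement = majority_count / len(solver_answers)  # in (0, 1]
    return 1.0 - 2.0 * abs(agreement - 0.5)


# The Solver always gives the same answer: no learning signal.
print(challenger_reward(["42"] * 8))
# A 50/50 split: the task sits at the edge of the Solver's ability.
print(challenger_reward(["42"] * 4 + ["41"] * 4))
```

Agreement between sampled answers is used here as a label-free stand-in for accuracy, in line with the abstract's claim that no pre-existing labels are needed.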
==================================

For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title: AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

🔹 Publication Date: Published on Jul 26

🔹 AI-generated summary: AgentTTS, an LLM-agent-based framework, optimizes compute allocation for multi-stage complex tasks, improving performance and robustness compared to traditional methods.

🔹 Abstract: Test-time scaling (TTS) enhances the performance of large language models (LLMs) by allocating additional compute resources during inference. However, existing research primarily investigates TTS in single-stage tasks, while many real-world problems are multi-stage complex tasks, composed of a sequence of heterogeneous subtasks, each requiring an LLM with a specific capability. Therefore, we study a novel problem: test-time compute-optimal scaling in multi-stage complex tasks, aiming to select suitable models and allocate budgets per subtask to maximize overall performance. TTS in multi-stage tasks introduces two fundamental challenges: (i) The combinatorial search space of model and budget allocations, combined with the high cost of inference, makes brute-force search impractical. (ii) The optimal model and budget allocations across subtasks are interdependent, increasing the complexity of the compute-optimal search. To address this gap, we conduct extensive pilot experiments on four tasks across six datasets, deriving three empirical insights characterizing the behavior of LLMs in multi-stage complex tasks. Informed by these insights, we propose AgentTTS, an LLM-agent-based framework that autonomously searches for compute-optimal allocations through iterative feedback-driven interactions with the execution environment. Experimental results demonstrate that AgentTTS significantly outperforms traditional and other LLM-based baselines in search efficiency, and shows improved robustness to varying training set sizes and enhanced interpretability.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.00890

• PDF: https://arxiv.org/pdf/2508.00890

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
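
To make the "combinatorial search space" point in the AgentTTS abstract concrete, here is a small sketch of the brute-force baseline on a toy problem. Every name and number in it (the model costs, the sample counts, the subtask names, and the scoring function) is made up for illustration; it does not reproduce the paper's setup or the agent-based search itself.

```python
from itertools import product

# Hypothetical toy setup: every name and number here is illustrative.
MODELS = {"small": 1.0, "large": 4.0}   # cost per sample
SAMPLES = [1, 4, 16]                     # test-time samples per subtask
SUBTASKS = ["retrieve", "answer"]


def toy_score(subtask, model, n_samples):
    """Made-up utility with diminishing returns in the number of samples."""
    base = {"small": 0.5, "large": 0.7}[model]
    bonus = 0.1 if (subtask == "answer" and model == "large") else 0.0
    return min(1.0, base + bonus + 0.05 * (n_samples ** 0.5))


def brute_force(budget):
    """Enumerate every (model, samples) choice per subtask within the budget."""
    choices = list(product(MODELS, SAMPLES))
    best, best_score = None, -1.0
    for plan in product(choices, repeat=len(SUBTASKS)):
        cost = sum(MODELS[m] * n for m, n in plan)
        if cost > budget:
            continue
        # Overall quality is modelled as the product of per-subtask scores.
        score = 1.0
        for subtask, (m, n) in zip(SUBTASKS, plan):
            score *= toy_score(subtask, m, n)
        if score > best_score:
            best, best_score = plan, score
    return best, best_score


search_space = (len(MODELS) * len(SAMPLES)) ** len(SUBTASKS)
print(search_space)  # 36 plans for 2 subtasks; grows exponentially with subtasks
print(brute_force(budget=20.0))
```

With real models each evaluated plan costs actual inference, which is why exhaustive enumeration stops being practical as the number of subtasks, models, and budget levels grows.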
==================================

🔹 Title: Flow Equivariant Recurrent Neural Networks

🔹 Publication Date: Published on Jul 20

🔹 AI-generated summary: Equivariant neural network architectures are extended to handle time-parameterized transformations, improving performance in sequence models like RNNs.

🔹 Abstract: Data arrives at our senses as a continuous stream, smoothly transforming from one instant to the next. These smooth transformations can be viewed as continuous symmetries of the environment that we inhabit, defining equivalence relations between stimuli over time. In machine learning, neural network architectures that respect symmetries of their data are called equivariant and have provable benefits in terms of generalization ability and sample efficiency. To date, however, equivariance has been considered only for static transformations and feed-forward networks, limiting its applicability to sequence models, such as recurrent neural networks (RNNs), and corresponding time-parameterized sequence transformations. In this work, we extend equivariant network theory to this regime of 'flows' -- one-parameter Lie subgroups capturing natural transformations over time, such as visual motion. We begin by showing that standard RNNs are generally not flow equivariant: their hidden states fail to transform in a geometrically structured manner for moving stimuli. We then show how flow equivariance can be introduced, and demonstrate that these models significantly outperform their non-equivariant counterparts in terms of training speed, length generalization, and velocity generalization, on both next step prediction and sequence classification. We present this work as a first step towards building sequence models that respect the time-parameterized symmetries which govern the world around us.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.14793

• PDF: https://arxiv.org/pdf/2507.14793

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
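
As background for the term "equivariant" used in the abstract above, the following sketch checks the static case only: a circular cross-correlation commutes with cyclic shifts of its input. It does not implement the paper's flow-equivariant RNN; the helper names are illustrative.

```python
def circular_correlate(signal, kernel):
    """Cross-correlate with wrap-around boundary conditions."""
    n = len(signal)
    return [
        sum(w * signal[(i + k) % n] for k, w in enumerate(kernel))
        for i in range(n)
    ]


def shift(signal, steps):
    """Cyclically shift a sequence to the right by `steps` positions."""
    n = len(signal)
    return [signal[(i - steps) % n] for i in range(n)]


x = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0]
w = [1.0, -1.0]

# Equivariance: transforming the input and then applying the layer gives the
# same result as applying the layer and then transforming the output.
lhs = circular_correlate(shift(x, 2), w)
rhs = shift(circular_correlate(x, w), 2)
print(lhs == rhs)
```

The abstract's point is that this kind of guarantee has so far been studied for fixed transformations like the shift above, not for transformations that keep evolving over the time steps of a sequence.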
==================================

🔹 Title: LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

🔹 Publication Date: Published on Aug 5

🔹 AI-generated summary: LongVie, an end-to-end autoregressive framework, addresses temporal consistency and visual degradation in ultra-long video generation through unified noise initialization, global control signal normalization, multi-modal control, and degradation-aware training.

🔹 Abstract: Controllable ultra-long video generation is a fundamental yet challenging task. Although existing methods are effective for short clips, they struggle to scale due to issues such as temporal inconsistency and visual degradation. In this paper, we initially investigate and identify three key factors: separate noise initialization, independent control signal normalization, and the limitations of single-modality guidance. To address these issues, we propose LongVie, an end-to-end autoregressive framework for controllable long video generation. LongVie introduces two core designs to ensure temporal consistency: 1) a unified noise initialization strategy that maintains consistent generation across clips, and 2) global control signal normalization that enforces alignment in the control space throughout the entire video. To mitigate visual degradation, LongVie employs 3) a multi-modal control framework that integrates both dense (e.g., depth maps) and sparse (e.g., keypoints) control signals, complemented by 4) a degradation-aware training strategy that adaptively balances modality contributions over time to preserve visual quality. We also introduce LongVGenBench, a comprehensive benchmark consisting of 100 high-resolution videos spanning diverse real-world and synthetic environments, each lasting over one minute. Extensive experiments show that LongVie achieves state-of-the-art performance in long-range controllability, consistency, and quality.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03694

• PDF: https://arxiv.org/pdf/2508.03694

• Project Page: https://vchitect.github.io/LongVie-project

• Github: https://github.com/Vchitect/LongVie

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
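
The first two LongVie designs (unified noise initialization and global control signal normalization) can be illustrated with toy numbers. This is a sketch under assumptions: the abstract does not give the exact formulation, so min-max scaling and the function names below are stand-ins chosen for illustration.

```python
import random


def normalize(values, lo, hi):
    """Min-max scale values into [0, 1] for a given range."""
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def per_clip_normalize(clips):
    """Each clip uses its own min/max, so scales drift between clips."""
    return [normalize(c, min(c), max(c)) for c in clips]


def global_normalize(clips):
    """All clips share one min/max computed over the whole video."""
    flat = [v for c in clips for v in c]
    lo, hi = min(flat), max(flat)
    return [normalize(c, lo, hi) for c in clips]


def shared_noise(num_clips, size, seed=0):
    """Reuse one noise sample for every clip instead of resampling."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(size)]
    return [list(noise) for _ in range(num_clips)]


# Toy depth values: the same surface at depth 5.0 appears in both clips.
clips = [[1.0, 5.0, 9.0], [5.0, 7.0, 25.0]]
print(per_clip_normalize(clips))  # depth 5.0 maps to different values
print(global_normalize(clips))    # depth 5.0 maps to the same value
```

With independent normalization, the same physical depth gets a different control value in each clip, which is one plausible source of the cross-clip inconsistency the abstract describes.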
==================================

🔹 Title: The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models

🔹 Publication Date: Published on Jul 31

🔹 AI-generated summary: Transformer-based text-to-image diffusion models show varying degrees of content-style separation in generated artworks, as revealed by cross-attention heatmaps.

🔹 Abstract: Text-to-image diffusion models have demonstrated remarkable capabilities in generating artistic content by learning from billions of images, including popular artworks. However, the fundamental question of how these models internally represent concepts, such as content and style in paintings, remains unexplored. Traditional computer vision assumes content and style are orthogonal, but diffusion models receive no explicit guidance about this distinction during training. In this work, we investigate how transformer-based text-to-image diffusion models encode content and style concepts when generating artworks. We leverage cross-attention heatmaps to attribute pixels in generated images to specific prompt tokens, enabling us to isolate image regions influenced by content-describing versus style-describing tokens. Our findings reveal that diffusion models demonstrate varying degrees of content-style separation depending on the specific artistic prompt and style requested. In many cases, content tokens primarily influence object-related regions while style tokens affect background and texture areas, suggesting an emergent understanding of the content-style distinction. These insights contribute to our understanding of how large-scale generative models internally represent complex artistic concepts without explicit supervision. We share the code and dataset, together with an exploratory tool for visualizing attention maps at https://github.com/umilISLab/artistic-prompt-interpretation.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.23313

• PDF: https://arxiv.org/pdf/2507.23313

• Project Page: https://thecowofrembrandt.islab.di.unimi.it/

• Github: https://github.com/umilISLab/artistic-prompt-interpretation

🔹 Datasets citing this paper:
https://huggingface.co/datasets/sergiopicascia/thecowofrembrandt

🔹 Spaces citing this paper:
No spaces found
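
The attribution step in the abstract above (assigning pixels to content-describing or style-describing prompt tokens) can be sketched on made-up data. Extracting cross-attention maps from a real diffusion model is not shown; the heatmap values, token grouping, and function names are hypothetical, and the authors' repository should be consulted for their actual procedure.

```python
def aggregate(attention, token_ids):
    """Sum the per-token heatmaps for a group of prompt tokens."""
    height, width = len(attention[0]), len(attention[0][0])
    total = [[0.0] * width for _ in range(height)]
    for t in token_ids:
        for y in range(height):
            for x in range(width):
                total[y][x] += attention[t][y][x]
    return total


def dominant_group(attention, content_ids, style_ids):
    """Label each pixel 'content' or 'style' by the larger aggregated weight."""
    content = aggregate(attention, content_ids)
    style = aggregate(attention, style_ids)
    return [
        ["content" if c >= s else "style" for c, s in zip(c_row, s_row)]
        for c_row, s_row in zip(content, style)
    ]


# Hypothetical 2x2 heatmaps for the prompt tokens ["cow", "Rembrandt"].
attention = [
    [[0.9, 0.2], [0.8, 0.1]],  # token 0: "cow"
    [[0.1, 0.7], [0.3, 0.6]],  # token 1: "Rembrandt"
]
print(dominant_group(attention, content_ids=[0], style_ids=[1]))
```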
==================================

🔹 Title: DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning

🔹 Publication Date: Published on Aug 7

🔹 AI-generated summary: DeepPHY evaluates Vision Language Models' physical reasoning and control through simulated environments with varying difficulty levels.

🔹 Abstract: Although Vision Language Models (VLMs) exhibit strong perceptual abilities and impressive visual reasoning, they struggle with attention to detail and precise action planning in complex, dynamic environments, leading to subpar performance. Real-world tasks typically require complex interactions, advanced spatial reasoning, long-term planning, and continuous strategy refinement, usually necessitating understanding the physics rules of the target scenario. However, evaluating these capabilities in real-world scenarios is often prohibitively expensive. To bridge this gap, we introduce DeepPHY, a novel benchmark framework designed to systematically evaluate VLMs' understanding and reasoning about fundamental physical principles through a series of challenging simulated environments. DeepPHY integrates multiple physical reasoning environments of varying difficulty levels and incorporates fine-grained evaluation metrics. Our evaluation finds that even state-of-the-art VLMs struggle to translate descriptive physical knowledge into precise, predictive control.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05405

• PDF: https://arxiv.org/pdf/2508.05405

• Project Page: https://github.com/XinrunXu/DeepPHY

• Github: https://github.com/XinrunXu/DeepPHY

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

https://t.iss.one/addlist/8_rRW2scgfRhOTc0

https://t.iss.one/Codeprogrammer
🔹 Title: OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets

🔹 Publication Date: Published on Aug 3

🔹 AI-generated summary: OpenMed NER, a suite of open-source transformer models using DAPT and LoRA, achieves state-of-the-art performance on diverse biomedical NER benchmarks with high efficiency and low computational cost.

🔹 Abstract: Named-entity recognition (NER) is fundamental to extracting structured information from the >80% of healthcare data that resides in unstructured clinical notes and biomedical literature. Despite recent advances with large language models, achieving state-of-the-art performance across diverse entity types while maintaining computational efficiency remains a significant challenge. We introduce OpenMed NER, a suite of open-source, domain-adapted transformer models that combine lightweight domain-adaptive pre-training (DAPT) with parameter-efficient Low-Rank Adaptation (LoRA). Our approach performs cost-effective DAPT on a 350k-passage corpus compiled from ethically sourced, publicly available research repositories and de-identified clinical notes (PubMed, arXiv, and MIMIC-III) using DeBERTa-v3, PubMedBERT, and BioELECTRA backbones. This is followed by task-specific fine-tuning with LoRA, which updates less than 1.5% of model parameters. We evaluate our models on 12 established biomedical NER benchmarks spanning chemicals, diseases, genes, and species. OpenMed NER achieves new state-of-the-art micro-F1 scores on 10 of these 12 datasets, with substantial gains across diverse entity types. Our models advance the state-of-the-art on foundational disease and chemical benchmarks (e.g., BC5CDR-Disease, +2.70 pp), while delivering even larger improvements of over 5.3 and 9.7 percentage points on more specialized gene and clinical cell line corpora. This work demonstrates that strategically adapted open-source models can surpass closed-source solutions. This performance is achieved with remarkable efficiency: training completes in under 12 hours on a single GPU with a low carbon footprint (

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01630

• PDF: https://arxiv.org/pdf/2508.01630

• Project Page: https://huggingface.co/OpenMed

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/OpenMed/openmed-clinical-ner

https://huggingface.co/spaces/lz-12/space
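
To show why LoRA fine-tuning touches only a small share of parameters, as the OpenMed NER abstract states, the sketch below counts parameters for a single frozen weight matrix with a low-rank update and evaluates the standard LoRA forward pass, W x + (alpha / r) B A x. The layer size and rank are illustrative assumptions; the abstract does not state the rank or which layers were adapted.

```python
def lora_trainable_fraction(d_out, d_in, rank):
    """Fraction of parameters trained when a d_out x d_in weight W is frozen
    and only the low-rank factors B (d_out x rank) and A (rank x d_in) learn.
    """
    frozen = d_out * d_in
    trainable = rank * (d_out + d_in)
    return trainable / (frozen + trainable)


def lora_forward(x, W, A, B, alpha):
    """Compute (W + (alpha / rank) * B @ A) @ x with plain Python lists."""
    rank = len(A)
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    BAx = [sum(b * v for b, v in zip(row, Ax)) for row in B]
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [wx + (alpha / rank) * bax for wx, bax in zip(Wx, BAx)]


# Illustrative sizes only; the abstract does not state the rank used.
print(lora_trainable_fraction(d_out=768, d_in=768, rank=8))
```

The fraction for a whole model also depends on how many of its weight matrices receive adapters, so the per-matrix figure printed here is not directly comparable to the paper's "less than 1.5%".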
==================================

🔹 Title: SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

🔹 Publication Date: Published on Aug 3

🔹 AI-generated summary: A new training paradigm and situated embedding models (SitEmb) enhance retrieval performance by conditioning short text chunks on broader context windows, outperforming state-of-the-art models with fewer parameters.

🔹 Abstract: Retrieval-augmented generation (RAG) over long documents typically involves splitting the text into smaller chunks, which serve as the basic units for retrieval. However, due to dependencies across the original document, contextual information is often essential for accurately interpreting each chunk. To address this, prior work has explored encoding longer context windows to produce embeddings for longer chunks. Despite these efforts, gains in retrieval and downstream tasks remain limited. This is because (1) longer chunks strain the capacity of embedding models due to the increased amount of information they must encode, and (2) many real-world applications still require returning localized evidence due to constraints on model or human bandwidth. We propose an alternative approach to this challenge by representing short chunks in a way that is conditioned on a broader context window to enhance retrieval performance -- i.e., situating a chunk's meaning within its context. We further show that existing embedding models are not well-equipped to encode such situated context effectively, and thus introduce a new training paradigm and develop the situated embedding models (SitEmb). To evaluate our method, we curate a book-plot retrieval dataset specifically designed to assess situated retrieval capabilities. On this benchmark, our SitEmb-v1 model based on BGE-M3 substantially outperforms state-of-the-art embedding models, including several with up to 7-8B parameters, with only 1B parameters. Our 8B SitEmb-v1.5 model further improves performance by over 10% and shows strong results across different languages and several downstream applications.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.01959

• PDF: https://arxiv.org/pdf/2508.01959

• Project Page: https://huggingface.co/SituatedEmbedding

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
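
The idea of "situating" a short chunk within a broader context window can be sketched at the data-preparation level. The snippet below only builds (chunk, context) pairs; how the SitEmb models consume such pairs is not described in the abstract, and the function name, sentence-level chunking, and parameters are assumptions for illustration.

```python
def situated_inputs(sentences, chunk_size, context_radius):
    """Pair each short chunk with a wider window of surrounding text.

    Returns a list of (chunk, context) tuples, where `context` contains the
    chunk plus up to `context_radius` sentences on each side.
    """
    pairs = []
    for start in range(0, len(sentences), chunk_size):
        end = min(start + chunk_size, len(sentences))
        ctx_start = max(0, start - context_radius)
        ctx_end = min(len(sentences), end + context_radius)
        chunk = " ".join(sentences[start:end])
        context = " ".join(sentences[ctx_start:ctx_end])
        pairs.append((chunk, context))
    return pairs


story = ["Ann hid the key.", "Years passed.", "She returned.", "It was gone."]
for chunk, context in situated_inputs(story, chunk_size=1, context_radius=1):
    print(repr(chunk), "<-", repr(context))
```

The retrieval unit stays short ("It was gone."), while the context supplies what "it" refers to, which matches the abstract's requirement of returning localized evidence.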
==================================

🔹 Title: On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification

🔹 Publication Date: Published on Aug 7

🔹 AI-generated summary: Dynamic Fine-Tuning (DFT) improves the generalization of Large Language Models (LLMs) by dynamically rescaling gradients, outperforming standard Supervised Fine-Tuning (SFT) and showing competitive results in offline reinforcement learning.

🔹 Abstract: We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for the Large Language Model (LLM), addressing its limited generalization compared to reinforcement learning (RL). Through mathematical analysis, we reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization capabilities of the model. To rectify this, we propose Dynamic Fine-Tuning (DFT), stabilizing gradient updates for each token by dynamically rescaling the objective function with the probability of this token. Remarkably, this single-line code change significantly outperforms standard SFT across multiple challenging benchmarks and base models, demonstrating greatly improved generalization. Additionally, our approach shows competitive results in offline RL settings, offering an effective yet simpler alternative. This work bridges theoretical insight and practical solutions, substantially advancing SFT performance. The code will be available at https://github.com/yongliang-wu/DFT.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05629

• PDF: https://arxiv.org/pdf/2508.05629

• Github: https://github.com/yongliang-wu/DFT

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
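
The rescaling described in the DFT abstract can be illustrated by comparing per-token gradients with respect to the logits. This is a sketch of one reading of the abstract, not the authors' code: it assumes the token probability is used as a constant weight (a stop-gradient) on the usual cross-entropy term, and it writes the gradients out analytically instead of using an autodiff framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_grad(logits, target):
    """Gradient of -log p[target] with respect to the logits."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def dft_grad(logits, target):
    """The same gradient, rescaled by the target token's probability.

    The weight p[target] is treated as a constant (stop-gradient), which is
    an assumption about how the rescaling is applied.
    """
    weight = softmax(logits)[target]
    return [weight * g for g in sft_grad(logits, target)]


logits = [2.0, 0.0, -1.0]
print(sft_grad(logits, target=2))  # large update for a low-probability token
print(dft_grad(logits, target=2))  # the same direction, scaled down
```

Under this reading, tokens the model already finds unlikely contribute much smaller updates than in plain SFT, which is one way to interpret the abstract's "stabilizing gradient updates for each token".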
==================================

🔹 Title: CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

🔹 Publication Date: Published on Aug 5

🔹 AI-generated summary: CompassVerifier is a lightweight, robust model for verifying LLM outputs across various domains, supported by VerifierBench, a comprehensive benchmark dataset.

🔹 Abstract: Answer verification is crucial not only for evaluating large language models (LLMs) by matching their unstructured outputs against standard answers, but also serves as the reward model to guide LLM optimization. Most evaluation frameworks rely on regularized matching or employ general LLMs for answer verification, which demands extensive, repetitive customization for regex rules or evaluation prompts. Two fundamental limitations persist in current methodologies: 1) the absence of comprehensive benchmarks that systematically evaluate verification capabilities across different LLMs; and 2) the nascent stage of verifier development, where existing approaches lack both the robustness to handle complex edge cases and the generalizability across different domains. In this work, we develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward. It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types, including multi-subproblems, formulas, and sequence answers, while effectively identifying abnormal/invalid responses. We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier. We anticipate that CompassVerifier and VerifierBench will facilitate answer verification, evaluation protocols, and reinforcement learning research. Code and dataset are available at https://github.com/open-compass/CompassVerifier.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03686

• PDF: https://arxiv.org/pdf/2508.03686

• Github: https://github.com/open-compass/CompassVerifier

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
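
For context on the rule-based matching that the CompassVerifier abstract contrasts itself with, here is a small sketch of a regex-based answer checker. It is not taken from the paper or its repository; the patterns and function names are hypothetical and chosen only to show how such rules miss equivalent answers.

```python
import re

BOXED = re.compile(r"\\boxed\{([^{}]*)\}")
FINAL = re.compile(r"final answer is[:\s]*([^\n]+)", re.IGNORECASE)


def extract_answer(response):
    """Pull a final answer out of free-form text with hand-written rules."""
    match = BOXED.search(response) or FINAL.search(response)
    if match is None:
        return None
    return match.group(1).strip().rstrip(".")


def rule_based_verify(response, gold):
    """Exact string match after light normalisation."""
    answer = extract_answer(response)
    if answer is None:
        return False
    return answer.replace(" ", "").lower() == gold.replace(" ", "").lower()


print(rule_based_verify(r"So the result is \boxed{1/2}", "1/2"))
print(rule_based_verify("The final answer is 0.5", "1/2"))
```

The second call is rejected even though 0.5 and 1/2 are the same value, and each new answer format would need another hand-written rule.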
==================================

🔹 Title: Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models

🔹 Publication Date: Published on Aug 4

🔹 AI-generated summary: Research on efficient reasoning methods for Large Reasoning Models (LRMs) aims to reduce reasoning path length without sacrificing performance, through single-model optimization and model collaboration.

🔹 Abstract: Recently, Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. Among them, DeepSeek R1 has garnered significant attention for its exceptional performance and open-source nature, driving advancements in the research of R1-style LRMs. Unlike traditional Large Language Models (LLMs), these models enhance logical deduction and decision-making capabilities during reasoning by incorporating mechanisms such as long chain-of-thought and self-reflection through reinforcement learning. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Specifically, when generating answers, these models often construct excessively long reasoning chains with redundant or repetitive steps, which leads to reduced reasoning efficiency and may affect the accuracy of the final answer. To this end, various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability. By reviewing the current research advancements in the field of efficient reasoning methods systematically, we categorize existing works into two main directions based on the lens of single-model optimization versus model collaboration: (1) Efficient Reasoning with Single Model, which focuses on improving the reasoning efficiency of individual models; and (2) Efficient Reasoning with Model Collaboration, which explores optimizing reasoning paths through collaboration among multiple models. Besides, we maintain a public GitHub repository that tracks the latest progress in efficient reasoning methods.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02120

• PDF: https://arxiv.org/pdf/2508.02120

• Project Page: https://github.com/yuelinan/Awesome-Efficient-R1-style-LRMs

• Github: https://github.com/yuelinan/Awesome-Efficient-R1-style-LRMs

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

🔹 Publication Date: Published on Aug 7

🔹 AI-generated summary: Genie Envisioner integrates policy learning, evaluation, and simulation using a video diffusion model and neural simulator for instruction-driven robotic manipulation.

🔹 Abstract: We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight, flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05635

• PDF: https://arxiv.org/pdf/2508.05635

• Project Page: https://genie-envisioner.github.io/

• Github: https://genie-envisioner.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

🔹 Publication Date: Published on Aug 4

🔹 AI-generated summary: A self-supervised RL framework enhances instruction following in reasoning models without external supervision, maintaining reasoning performance and offering scalability and cost-effectiveness.

🔹 Abstract: Reasoning models excel in complex problem solving but exhibit a concerning trade-off between reasoning capabilities and instruction following abilities. Existing approaches for improving instruction following rely on stronger external models, creating methodological bottlenecks and practical limitations including increased costs and accessibility constraints. We propose a self-supervised RL framework that leverages reasoning models' own internal signals to improve instruction following capabilities without external supervision. Extensive experiments demonstrate that our framework significantly improves instruction following capabilities while maintaining reasoning performance, offering a scalable and cost-effective approach to enhance instruction following in reasoning models. The data and code are publicly available at https://github.com/Rainier-rq/verl-if.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02150

• PDF: https://arxiv.org/pdf/2508.02150

• Github: https://github.com/Rainier-rq/verl-if

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

🔹 Publication Date: Published on Aug 4

🔹 AI-generated summary: CRINN, a reinforcement learning-based approach, optimizes approximate nearest-neighbor search algorithms for speed while maintaining accuracy, outperforming state-of-the-art methods on several benchmarks.

🔹 Abstract: Approximate nearest-neighbor search (ANNS) algorithms have become increasingly critical for recent AI applications, particularly in retrieval-augmented generation (RAG) and agent-based LLM applications. In this paper, we present CRINN, a new paradigm for ANNS algorithms. CRINN treats ANNS optimization as a reinforcement learning problem where execution speed serves as the reward signal. This approach enables the automatic generation of progressively faster ANNS implementations while maintaining accuracy constraints. Our experimental evaluation demonstrates CRINN's effectiveness across six widely-used NNS benchmark datasets. When compared against state-of-the-art open-source ANNS algorithms, CRINN achieves best performance on three of them (GIST-960-Euclidean, MNIST-784-Euclidean, and GloVe-25-angular), and ties for first place on two of them (SIFT-128-Euclidean and GloVe-25-angular). The implications of CRINN's success reach well beyond ANNS optimization: it validates that LLMs augmented with reinforcement learning can function as an effective tool for automating sophisticated algorithmic optimizations that demand specialized knowledge and labor-intensive manual refinement. Code can be found at https://github.com/deepreinforce-ai/CRINN
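To make "execution speed as the reward under an accuracy constraint" concrete, here is a minimal sketch. The recall floor of 0.9, the zero reward for violating it, and all function and candidate names are assumptions for illustration; the abstract does not specify how CRINN actually shapes its reward.

```python
def recall_at_k(retrieved, ground_truth):
    """Fraction of the true nearest neighbours that appear in the retrieved list."""
    if not ground_truth:
        return 0.0
    return len(set(retrieved) & set(ground_truth)) / len(ground_truth)

def speed_reward(queries_per_second, recall, min_recall=0.9):
    """Hypothetical reward: throughput counts only when the recall floor is met."""
    return queries_per_second if recall >= min_recall else 0.0

def pick_best(candidates, min_recall=0.9):
    """Return the name of the highest-reward candidate.

    Each candidate is a (name, queries_per_second, recall) tuple.
    """
    best = max(candidates, key=lambda c: speed_reward(c[1], c[2], min_recall))
    return best[0]

# Made-up measurements for three hypothetical implementations.
candidates = [
    ("baseline", 1000.0, 0.95),
    ("fast_but_sloppy", 5000.0, 0.50),
    ("tuned", 1800.0, 0.92),
]
winner = pick_best(candidates)
```

With these made-up numbers the fastest candidate is rejected because it misses the recall floor, which is the behaviour an accuracy-constrained speed reward is meant to enforce.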

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.02091

• PDF: https://arxiv.org/pdf/2508.02091

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Hi3DEval: Advancing 3D Generation Evaluation with Hierarchical Validity

🔹 Publication Date: Published on Aug 7

🔹 AI-generated summary: Hi3DEval is a hierarchical evaluation framework for 3D generative content that combines object-level and part-level assessments, including material realism, using a large-scale dataset and hybrid 3D representations.

🔹 Abstract: Despite rapid advances in 3D content generation, quality assessment for the generated 3D assets remains challenging. Existing methods mainly rely on image-based metrics and operate solely at the object level, limiting their ability to capture spatial coherence, material authenticity, and high-fidelity local details. 1) To address these challenges, we introduce Hi3DEval, a hierarchical evaluation framework tailored for 3D generative content. It combines both object-level and part-level evaluation, enabling holistic assessments across multiple dimensions as well as fine-grained quality analysis. Additionally, we extend texture evaluation beyond aesthetic appearance by explicitly assessing material realism, focusing on attributes such as albedo, saturation, and metallicness. 2) To support this framework, we construct Hi3DBench, a large-scale dataset comprising diverse 3D assets and high-quality annotations, accompanied by a reliable multi-agent annotation pipeline. We further propose a 3D-aware automated scoring system based on hybrid 3D representations. Specifically, we leverage video-based representations for object-level and material-subject evaluations to enhance modeling of spatio-temporal consistency and employ pretrained 3D features for part-level perception. Extensive experiments demonstrate that our approach outperforms existing image-based metrics in modeling 3D characteristics and achieves superior alignment with human preference, providing a scalable alternative to manual evaluations. The project page is available at https://zyh482.github.io/Hi3DEval/.

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.05609

• PDF: https://arxiv.org/pdf/2508.05609

• Project Page: https://zyh482.github.io/Hi3DEval/

• Github: https://zyh482.github.io/Hi3DEval/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
https://huggingface.co/spaces/3DTopia/3DGen-Leaderboard
==================================

🔹 Title: Are Today's LLMs Ready to Explain Well-Being Concepts?

🔹 Publication Date: Published on Aug 6

🔹 AI-generated summary: LLMs can be fine-tuned to generate high-quality, audience-tailored explanations of well-being concepts using Supervised Fine-Tuning and Direct Preference Optimization.

🔹 Abstract: Well-being encompasses mental, physical, and social dimensions essential to personal growth and informed life decisions. As individuals increasingly consult Large Language Models (LLMs) to understand well-being, a key challenge emerges: Can LLMs generate explanations that are not only accurate but also tailored to diverse audiences? High-quality explanations require both factual correctness and the ability to meet the expectations of users with varying expertise. In this work, we construct a large-scale dataset comprising 43,880 explanations of 2,194 well-being concepts, generated by ten diverse LLMs. We introduce a principle-guided LLM-as-a-judge evaluation framework, employing dual judges to assess explanation quality. Furthermore, we show that fine-tuning an open-source LLM using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) can significantly enhance the quality of generated explanations. Our results reveal: (1) The proposed LLM judges align well with human evaluations; (2) explanation quality varies significantly across models, audiences, and categories; and (3) DPO- and SFT-finetuned models outperform their larger counterparts, demonstrating the effectiveness of preference-based learning for specialized explanation tasks.
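For readers unfamiliar with DPO, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities of a preferred and a dispreferred explanation. The inputs are plain floats, and the beta value of 0.1 is an assumption; the paper's actual training setup is not given in the abstract.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin)))."""
    # How much more likely each answer is under the policy than under the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals log(1 + exp(-z)); both branches avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# When the policy equals the reference, the loss is log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# When the policy favours the preferred explanation, the loss drops below log(2).
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

In practice the log-probabilities would be summed token log-probabilities from the fine-tuned model and a frozen reference copy, and the loss would be averaged over a batch of preference pairs.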

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03990

• PDF: https://arxiv.org/pdf/2508.03990

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

🔹 Title: Tool-integrated Reinforcement Learning for Repo Deep Search

🔹 Publication Date: Published on Aug 5

🔹 AI-generated summary: ToolTrain, a two-stage training framework combining supervised fine-tuning and reinforcement learning, enhances LLMs for issue localization by integrating repository retrieval tools, achieving state-of-the-art performance.

🔹 Abstract: Issue localization, the process of identifying code locations that need modification to resolve software issues, is a critical yet challenging task in software development. The semantic gap between natural language issue descriptions and faulty code requires complex multi-hop reasoning through code dependencies. Existing LLM-based agents attempt to address this by integrating repository retrieval tools. However, this transforms issue localization into a demanding task we call Repo Deep Search, which requires the LLM to effectively utilize various repository retrieval tools throughout a multi-step reasoning and navigation process. To tackle this challenge, we present ToolTrain, a two-stage tool-integrated training framework combining rejection-sampled supervised fine-tuning and tool-integrated reinforcement learning to enhance LLMs' ability to use retrieval tools for issue localization. Experimental results show that ToolTrain-trained models achieve state-of-the-art performance, with our 32B model even surpassing Claude-3.7 on function-level localization. The results also show that improved localization performance translates to better end-to-end issue resolution performance. This further demonstrates that training for issue localization is a viable and effective strategy for improving automated software development.
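The first stage, rejection-sampled supervised fine-tuning, can be pictured as a filter over sampled tool-use trajectories. The sketch below keeps only trajectories whose final prediction covers every ground-truth function. The acceptance rule, the dictionary layout, and the example function names are assumptions for illustration; the abstract does not describe ToolTrain's actual criterion.

```python
def rejection_sample(trajectories, ground_truth_functions):
    """Keep trajectories whose predicted functions cover all ground-truth functions."""
    truth = set(ground_truth_functions)
    kept = []
    for traj in trajectories:
        # Hypothetical acceptance rule: every ground-truth function must be predicted.
        if truth and truth.issubset(traj["predicted_functions"]):
            kept.append(traj)
    return kept

# Made-up sampled trajectories for one issue.
sampled = [
    {"id": 0, "predicted_functions": ["parser.parse", "lexer.next_token"]},
    {"id": 1, "predicted_functions": ["utils.format"]},
    {"id": 2, "predicted_functions": ["parser.parse"]},
]
accepted = rejection_sample(sampled, ["parser.parse"])
accepted_ids = [traj["id"] for traj in accepted]
```

The accepted trajectories would then serve as supervised fine-tuning examples before the reinforcement learning stage.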

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03012

• PDF: https://arxiv.org/pdf/2508.03012

• Github: https://github.com/Mizersy/RepoDeepSearch

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================
