🔹 Title:
Video-T1: Test-Time Scaling for Video Generation
🔹 Publication Date: Published on Mar 24
🔹 Abstract:
Test-Time Scaling (TTS) in video generation improves video quality by adaptively sampling from noise space with feedback mechanisms, particularly demonstrated with the Tree-of-Frames method.
AI-generated summary: With the scaling of training data, model size, and computational cost, video generation has achieved impressive results in digital creation, enabling users to express creativity across various domains. Recently, researchers in Large Language Models (LLMs) have expanded scaling to test time, which can significantly improve LLM performance by using more inference-time computation. Instead of scaling up video foundation models through expensive training, we explore the power of Test-Time Scaling (TTS) in video generation, aiming to answer the question: if a video generation model is allowed to use a non-trivial amount of inference-time compute, how much can it improve generation quality given a challenging text prompt? In this work, we reinterpret the test-time scaling of video generation as a search problem for sampling better trajectories from the Gaussian noise space to the target video distribution. Specifically, we build the search space with test-time verifiers to provide feedback and heuristic algorithms to guide the search process. Given a text prompt, we first explore an intuitive linear search strategy that increases the number of noise candidates at inference time. As full-step denoising of all frames simultaneously requires heavy test-time computation, we further design a more efficient TTS method for video generation called Tree-of-Frames (ToF) that adaptively expands and prunes video branches in an autoregressive manner. Extensive experiments on text-conditioned video generation benchmarks demonstrate that increasing test-time compute consistently leads to significant improvements in video quality. Project page: https://liuff19.github.io/Video-T1
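The linear search strategy described above is essentially best-of-N sampling over initial noises, scored by a test-time verifier. A minimal sketch of that loop, with `generate_video` and `verifier_score` as hypothetical placeholders for the video model and the verifier (not the paper's actual components):

```python
import numpy as np

def generate_video(prompt: str, noise: np.ndarray) -> np.ndarray:
    """Placeholder for a text-to-video model's full denoising run from a given noise."""
    rng = np.random.default_rng(abs(hash((prompt, noise.tobytes()))) % (2**32))
    return rng.random((16, 64, 64, 3))  # frames x H x W x C

def verifier_score(prompt: str, video: np.ndarray) -> float:
    """Placeholder for a test-time verifier (e.g., a vision-language reward model)."""
    return float(video.mean())

def linear_search_tts(prompt: str, num_candidates: int = 8, seed: int = 0) -> np.ndarray:
    """Best-of-N test-time scaling: denoise several noise candidates, keep the best-scoring video."""
    rng = np.random.default_rng(seed)
    best_video, best_score = None, -np.inf
    for _ in range(num_candidates):
        noise = rng.standard_normal((16, 64, 64, 3))  # one Gaussian noise seed per candidate
        video = generate_video(prompt, noise)
        score = verifier_score(prompt, video)
        if score > best_score:
            best_video, best_score = video, score
    return best_video

clip = linear_search_tts("a cat surfing a wave at sunset", num_candidates=4)
print(clip.shape)
```

Tree-of-Frames extends this idea by scoring and pruning partial frame prefixes instead of only complete videos, which is what makes the search affordable.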
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2503.18942
• PDF: https://arxiv.org/pdf/2503.18942
• Project Page: https://liuff19.github.io/Video-T1/
• Github: https://github.com/liuff19/Video-T1
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Reinforcement Pre-Training
🔹 Publication Date: Published on Jun 9
🔹 Abstract:
Reinforcement Pre-Training (RPT) improves language model accuracy through reinforcement learning and offers a scalable method for leveraging text data for general-purpose RL.
AI-generated summary: In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained with RL, where the model receives verifiable rewards for correctly predicting the next token of a given context. RPT offers a scalable method to leverage vast amounts of text data for general-purpose RL, rather than relying on domain-specific annotated answers. By incentivizing the capability of next-token reasoning, RPT significantly improves language modeling accuracy in predicting the next tokens. Moreover, RPT provides a strong pre-trained foundation for further reinforcement fine-tuning. The scaling curves show that increased training compute consistently improves next-token prediction accuracy. The results position RPT as an effective and promising scaling paradigm to advance language model pre-training.
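The key mechanical piece is the verifiable reward: the model reasons about the context, commits to a next token, and is rewarded only if that token matches the actual corpus continuation. A minimal sketch of such a reward function, assuming a hypothetical "ANSWER:" output convention (the paper's prompt format is not reproduced here):

```python
def next_token_reward(model_output: str, ground_truth_token: str) -> float:
    """Verifiable reward for next-token reasoning in the spirit of RPT.

    Assumes the (hypothetical) convention that the model ends its reasoning with a
    final line 'ANSWER: <token>'; reward is 1.0 iff the predicted token matches the
    corpus continuation exactly, else 0.0.
    """
    for line in reversed(model_output.strip().splitlines()):
        if line.startswith("ANSWER:"):
            predicted = line[len("ANSWER:"):].strip()
            return 1.0 if predicted == ground_truth_token.strip() else 0.0
    return 0.0  # malformed output: no parsable prediction, no reward

print(next_token_reward("The capital of France is...\nANSWER: Paris", " Paris"))  # 1.0
```

Because the reward comes directly from the next token in raw text, no human-annotated answers are needed, which is what makes the recipe scale with ordinary pre-training corpora.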
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.08007
• PDF: https://arxiv.org/pdf/2506.08007
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
🔹 Publication Date: Published on Jun 6
🔹 Abstract:
STARFlow, a generative model combining normalizing flows with autoregressive Transformers, achieves competitive image synthesis performance with innovations in architecture and latent space modeling.
AI-generated summary: We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model's representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.
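Because TARFlow is an affine autoregressive flow, training can use the exact change-of-variables log-likelihood. A minimal sketch of that computation, where `mu_fn` and `log_sigma_fn` stand in for the deep-shallow Transformer stack (hypothetical interfaces, not the paper's implementation):

```python
import numpy as np

def affine_ar_flow_logprob(x: np.ndarray, mu_fn, log_sigma_fn) -> np.ndarray:
    """Exact log-likelihood of an affine autoregressive flow block.

    z_i = (x_i - mu_i(x_{<i})) / sigma_i(x_{<i}) with a standard normal base density, so
    log p(x) = log N(z; 0, I) - sum_i log sigma_i(x_{<i}).
    mu_fn / log_sigma_fn are stand-ins for the Transformer predicting per-dimension
    shift and log-scale from the causal prefix.
    """
    B, D = x.shape
    z = np.zeros_like(x)
    log_det = np.zeros(B)
    for i in range(D):
        prefix = x[:, :i]
        mu, log_sigma = mu_fn(prefix, i), log_sigma_fn(prefix, i)
        z[:, i] = (x[:, i] - mu) * np.exp(-log_sigma)
        log_det -= log_sigma
    base = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum(axis=1)  # standard normal log-density
    return base + log_det

# Toy check with constant shift/scale "networks"
mu_fn = lambda prefix, i: 0.0
log_sigma_fn = lambda prefix, i: 0.0
x = np.random.randn(4, 8)
print(affine_ar_flow_logprob(x, mu_fn, log_sigma_fn))
```

In STARFlow this likelihood is computed over latents from a pretrained autoencoder rather than over pixels, which is innovation (2) above.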
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.06276
• PDF: https://arxiv.org/pdf/2506.06276
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Article Date: 30 Apr 2025
Article Description:
Failure attribution in LLM multi-agent systems (identifying the agent and step responsible for task failures) provides crucial clues for systems debugging but remains underexplored and labor-intensive. In this paper, we propose and formulate a new research area: automated failure attribution for LLM multi-agent systems. To support this initiative, we introduce the Who&When dataset, comprising extensive failure logs from 127 LLM multi-agent systems with fine-grained annotations linking failures to specific agents and decisive error steps. Using Who&When, we develop and evaluate three automated failure attribution methods, summarizing their corresponding pros and cons. The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps, with some methods performing below random. Even SOTA reasoning models, such as OpenAI o1 and DeepSeek R1, fail to achieve practical usability. These results highlight the task's complexity and the need for further research in this area. Code and dataset are available at https://github.com/mingyin1/Agents_Failure_Attribution
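The two headline numbers (agent-level vs. step-level accuracy) suggest a simple evaluation shape. A minimal sketch of such metrics, assuming step accuracy also requires the correct agent (the benchmark's exact matching rules may differ):

```python
from dataclasses import dataclass

@dataclass
class Attribution:
    agent: str  # agent judged responsible for the failure
    step: int   # index of the decisive error step in the log

def attribution_accuracy(predictions: list[Attribution],
                         ground_truth: list[Attribution]) -> tuple[float, float]:
    """Agent-level and step-level accuracy for failure attribution predictions.

    Mirrors the two numbers reported above (e.g., 53.5% agent / 14.2% step) under a
    simplifying assumption about how a correct step is counted.
    """
    assert len(predictions) == len(ground_truth)
    agent_hits = sum(p.agent == g.agent for p, g in zip(predictions, ground_truth))
    step_hits = sum(p.agent == g.agent and p.step == g.step
                    for p, g in zip(predictions, ground_truth))
    n = len(ground_truth)
    return agent_hits / n, step_hits / n

preds = [Attribution("planner", 3), Attribution("coder", 7)]
gold  = [Attribution("planner", 4), Attribution("coder", 7)]
print(attribution_accuracy(preds, gold))  # (1.0, 0.5)
```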
PDF Download Link:
https://arxiv.org/pdf/2505.00212v3.pdf
GitHub:
• https://github.com/mingyin1/agents_failure_attribution
Datasets:
• GAIA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Search-o1: Agentic Search-Enhanced Large Reasoning Models
🔹 Publication Date: Published on Jan 9
🔹 Abstract:
Search-o1 enhances large reasoning models with an agentic retrieval-augmented generation mechanism and a Reason-in-Documents module to improve performance on complex reasoning tasks.
AI-generated summary: Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving a coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems. The code is available at https://github.com/sunnynexus/Search-o1.
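The agentic workflow can be summarized as: reason until a search request is emitted, retrieve, condense the documents with Reason-in-Documents, inject the distilled knowledge, and continue. A minimal sketch of that loop, with hypothetical tags and callables rather than the paper's actual prompts:

```python
import re

SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.S)

def search_o1_loop(prompt: str, llm, retrieve, reason_in_documents, max_turns: int = 4) -> str:
    """Sketch of an agentic retrieval-augmented reasoning loop in the spirit of Search-o1.

    `llm`, `retrieve`, and `reason_in_documents` are hypothetical callables: the model
    reasons until it emits a <search>query</search> tag, retrieved documents are
    condensed before being appended to the context, and reasoning then resumes.
    """
    context = prompt
    for _ in range(max_turns):
        output = llm(context)
        match = SEARCH_TAG.search(output)
        if match is None:
            return output  # reasoning finished without needing more external knowledge
        query = match.group(1).strip()
        docs = retrieve(query)
        distilled = reason_in_documents(query, docs)  # filter verbose documents first
        context += output[:match.end()] + f"\n[knowledge] {distilled}\n"
    return llm(context)  # final attempt after the search budget is spent

# Tiny smoke test with stub components
echo_llm = lambda ctx: "final answer: 42" if "[knowledge]" in ctx else "<search>what is 6*7</search>"
print(search_o1_loop("Q: compute 6*7", echo_llm, lambda q: ["6*7=42"], lambda q, d: d[0]))
```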
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2501.05366
• PDF: https://arxiv.org/pdf/2501.05366
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/mangopy/autotools
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Surface-based parcellation and vertex-wise analysis of ultra high-resolution ex vivo 7 tesla MRI in Alzheimer's disease and related dementias
🔹 Publication Date: Published on Mar 28, 2024
🔹 Abstract:
A new dataset of ex vivo MRI brain scans and an automated pipeline using the DKT atlas enable high-resolution, vertex-wise analysis for linking morphometry with histology in Alzheimer's disease research.
AI-generated summary: Magnetic resonance imaging (MRI) is the standard modality to understand human brain structure and function in vivo (antemortem). Decades of research in human neuroimaging has led to the widespread development of methods and tools that provide automated volume-based segmentations and surface-based parcellations, which help localize brain functions to specialized anatomical regions. Recently, ex vivo (postmortem) imaging of the brain has opened up avenues to study brain structure at sub-millimeter, ultra high resolution, revealing details not possible to observe with in vivo MRI. Unfortunately, there has been limited methodological development in ex vivo MRI, primarily due to a lack of datasets and the limited number of centers with such imaging resources. Therefore, in this work, we present a one-of-its-kind dataset of 82 ex vivo T2w whole-brain-hemisphere MRI scans at 0.3 mm isotropic resolution spanning Alzheimer's disease and related dementias. We adapted and developed a fast and easy-to-use automated surface-based pipeline to parcellate, for the first time, ultra high-resolution ex vivo brain tissue at the native subject-space resolution using the Desikan-Killiany-Tourville (DKT) brain atlas. This allows us to perform vertex-wise analysis in the template space and thereby link morphometry measures with pathology measurements derived from histology. We will open-source our dataset, docker container, and Jupyter notebooks with a ready-to-use, out-of-the-box set of tools and command-line options to advance ex vivo MRI clinical brain imaging research on the project webpage.
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2403.19497
• PDF: https://arxiv.org/pdf/2403.19497
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Seedance 1.0: Exploring the Boundaries of Video Generation Models
🔹 Publication Date: Published on Jun 10
🔹 Abstract:
Seedance 1.0 offers high-performance video generation by integrating advanced data curation, efficient architecture, post-training optimization, and model acceleration, resulting in superior quality and speed.
AI-generated summary: Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundation models still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture design with a proposed training paradigm, which natively supports multi-shot generation and joint learning of both text-to-video and image-to-video tasks; (iii) carefully optimized post-training approaches leveraging fine-grained supervised fine-tuning and video-specific RLHF with multi-dimensional reward mechanisms for comprehensive performance improvements; and (iv) excellent model acceleration achieving ~10x inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds (on an NVIDIA L20). Compared to state-of-the-art video generation models, Seedance 1.0 stands out for high-quality and fast video generation, with superior spatiotemporal fluidity and structural stability, precise instruction adherence in complex multi-subject contexts, and native multi-shot narrative coherence with consistent subject representation.
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09113
• PDF: https://arxiv.org/pdf/2506.09113
• Project Page: https://seed.bytedance.com/seedance
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
GOLFer: Smaller LM-Generated Documents Hallucination Filter & Combiner for Query Expansion in Information Retrieval
Article Date: 5 Jun 2025
Article Description:
Large language model (LLM)-based query expansion for information retrieval augments queries with hypothetical documents generated by LLMs. However, its performance relies heavily on the scale of the language models (LMs), necessitating larger, more advanced LLMs. This approach is costly, computationally intensive, and often has limited accessibility. To address these limitations, we introduce GOLFer (Smaller LMs-Generated Documents Hallucination Filter & Combiner), a novel method leveraging smaller open-source LMs for query expansion. GOLFer comprises two modules: a hallucination filter and a documents combiner. The former detects and removes non-factual and inconsistent sentences in generated documents, a common issue with smaller LMs, while the latter combines the filtered content with the query using a weight vector to balance their influence. We evaluate GOLFer alongside dominant LLM-based query expansion methods on three web search and ten low-resource datasets. Experimental results demonstrate that GOLFer consistently outperforms other methods using smaller LMs and maintains competitive performance against methods using large-size LLMs, demonstrating its effectiveness.
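The two modules compose naturally: filter the generated document sentence by sentence, then combine what survives with the original query under a weight vector. A minimal sketch, where the consistency check is a placeholder and the weight vector is approximated by term repetition (an illustrative simplification, not the paper's exact combiner):

```python
import re

def golfer_style_expand(query: str, generated_doc: str, is_consistent, weights=(1.0, 0.4)) -> str:
    """Sketch of a filter-then-combine query expansion step.

    Hallucination filter: keep only sentences that `is_consistent(query, sentence)`
    judges factual/consistent (placeholder for the actual detector).
    Combiner: for sparse retrievers, a query/expansion weight vector can be
    approximated by repeating the query proportionally more than the filtered text.
    """
    sentences = re.split(r"(?<=[.!?])\s+", generated_doc.strip())
    kept = [s for s in sentences if s and is_consistent(query, s)]
    q_reps = max(1, round(weights[0] / weights[1]))  # relative weight as repetition count
    return " ".join([query] * q_reps + kept)

doc = "Paris is the capital of France. Paris has 90 million residents."
keep_plausible = lambda q, s: "million" not in s  # toy stand-in for a hallucination check
print(golfer_style_expand("capital of France", doc, keep_plausible))
```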
PDF Download Link:
https://arxiv.org/pdf/2506.04762v1.pdf
GitHub:
• https://github.com/liuliuyuan6/GOLFer
Datasets:
• MS MARCO
• BEIR
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
🔹 Publication Date: Published on Jan 8
🔹 Abstract:
rStar-Math enhances small language models' math reasoning capabilities through Monte Carlo Tree Search and self-evolution, achieving state-of-the-art performance on various benchmarks without distillation from larger models.
AI-generated summary: We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. rStar-Math achieves this by exercising "deep thinking" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges in training the two SLMs: (1) a novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning trajectories used to train the policy SLM; (2) a novel process reward model training method that avoids naive step-level score annotation, yielding a more effective process preference model (PPM); (3) a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities. Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math boosts SLMs' math reasoning to state-of-the-art levels. On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad (AIME), rStar-Math solves an average of 53.3% (8/15) of problems, ranking among the top 20% of the brightest high school math students. Code and data will be available at https://github.com/microsoft/rStar.
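At inference time the policy SLM proposes candidate next steps and the process preference model (PPM) scores them, so the search keeps high-reward partial solutions. A stripped-down, step-level beam search in that spirit (the full system uses MCTS with code-augmented verification, which this sketch omits; `propose_steps`, `ppm_score`, and `is_final` are hypothetical stand-ins):

```python
def step_beam_search(problem, propose_steps, ppm_score, is_final, beam_width=4, max_steps=8):
    """Greedy step-level search guided by a process reward/preference model.

    Each beam entry is (list_of_steps, cumulative_score). This is only the
    'verifier-guided step search' skeleton, not the full MCTS pipeline.
    """
    beams = [([], 0.0)]
    for _ in range(max_steps):
        candidates = []
        for steps, score in beams:
            if steps and is_final(steps[-1]):
                candidates.append((steps, score))  # keep finished solutions as-is
                continue
            for nxt in propose_steps(problem, steps):
                candidates.append((steps + [nxt], score + ppm_score(problem, steps, nxt)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(steps and is_final(steps[-1]) for steps, _ in beams):
            break
    return beams[0][0]  # highest-scoring trajectory

# Toy run: "solve" by accumulating digits toward a target sum of 7
target = 7
propose = lambda prob, steps: ["1", "2", "3"]
score = lambda prob, steps, nxt: -abs(target - (sum(map(int, steps)) + int(nxt)))
print(step_beam_search(target, propose, score, lambda step: step == "DONE", beam_width=2, max_steps=3))
```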
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2501.04519
• PDF: https://arxiv.org/pdf/2501.04519
• Github: https://github.com/microsoft/rStar
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
🔹 Publication Date: Published on Jun 9
🔹 Abstract:
PolyVivid is a multi-subject video customization framework that uses text-image fusion, 3D-RoPE enhancement, attention-inherited identity injection, and MLLM-based data processing to ensure identity consistency and realistic video generation.
AI-generated summary: Despite recent advances in video generation, existing models still lack fine-grained controllability, especially for multi-subject customization with consistent identity and interaction. In this paper, we propose PolyVivid, a multi-subject video customization framework that enables flexible and identity-consistent generation. To establish accurate correspondences between subject images and textual entities, we design a VLLM-based text-image fusion module that embeds visual identities into the textual space for precise grounding. To further enhance identity preservation and subject interaction, we propose a 3D-RoPE-based enhancement module that enables structured bidirectional fusion between text and image embeddings. Moreover, we develop an attention-inherited identity injection module to effectively inject fused identity features into the video generation process, mitigating identity drift. Finally, we construct an MLLM-based data pipeline that combines MLLM-based grounding, segmentation, and a clique-based subject consolidation strategy to produce high-quality multi-subject data, effectively enhancing subject distinction and reducing ambiguity in downstream video generation. Extensive experiments demonstrate that PolyVivid achieves superior performance in identity fidelity, video realism, and subject alignment, outperforming existing open-source and commercial baselines.
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.07848
• PDF: https://arxiv.org/pdf/2506.07848
• Project Page: https://sjtuplayer.github.io/projects/PolyVivid/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
🔹 Publication Date: Published on Jun 10
🔹 Abstract:
Autoregressive Semantic Visual Reconstruction (ASVR) improves multimodal understanding by focusing on semantic reconstruction rather than raw visual appearance, enhancing performance across various benchmarks.
AI-generated summary: Typical large vision-language models (LVLMs) apply autoregressive supervision solely to textual sequences, without fully incorporating the visual modality into the learning process. This results in three key limitations: (1) an inability to utilize images without accompanying captions, (2) the risk that captions omit critical visual details, and (3) the challenge that certain vision-centric content cannot be adequately conveyed through text. As a result, current LVLMs often prioritize vision-to-language alignment while potentially overlooking fine-grained visual information. While some prior works have explored autoregressive image generation, effectively leveraging autoregressive visual supervision to enhance image understanding remains an open challenge. In this paper, we introduce Autoregressive Semantic Visual Reconstruction (ASVR), which enables joint learning of visual and textual modalities within a unified autoregressive framework. We show that autoregressively reconstructing the raw visual appearance of images does not enhance, and may even impair, multimodal understanding. In contrast, autoregressively reconstructing the semantic representation of images consistently improves comprehension. Notably, we find that even when models are given continuous image features as input, they can effectively reconstruct discrete semantic tokens, resulting in stable and consistent improvements across a wide range of multimodal understanding benchmarks. Our approach delivers significant performance gains across varying data scales (556k-2M) and types of LLM backbones. Specifically, ASVR improves LLaVA-1.5 by 5% in average scores across 14 multimodal benchmarks. The code is available at https://github.com/AlenjandroWang/ASVR.
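Since ASVR keeps a single autoregressive objective over both modalities, the training loss can be pictured as a text next-token term plus a term for predicting discrete semantic visual tokens. A minimal PyTorch sketch, with assumed tensor shapes and an illustrative weighting (not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def asvr_style_loss(text_logits, text_targets, vis_sem_logits, vis_sem_targets,
                    vis_weight: float = 1.0, ignore_index: int = -100):
    """Joint objective sketch: next-token text loss plus autoregressive prediction of
    discrete *semantic* visual tokens (indices from a semantic tokenizer), rather than
    reconstruction of raw pixels.

    Assumed shapes: text_logits [B, T, V_text], vis_sem_logits [B, N, V_vis];
    targets are the corresponding index tensors.
    """
    text_loss = F.cross_entropy(text_logits.flatten(0, 1), text_targets.flatten(),
                                ignore_index=ignore_index)
    vis_loss = F.cross_entropy(vis_sem_logits.flatten(0, 1), vis_sem_targets.flatten(),
                               ignore_index=ignore_index)
    return text_loss + vis_weight * vis_loss

# Smoke test with random tensors
B, T, N, Vt, Vv = 2, 5, 4, 100, 64
loss = asvr_style_loss(torch.randn(B, T, Vt), torch.randint(0, Vt, (B, T)),
                       torch.randn(B, N, Vv), torch.randint(0, Vv, (B, N)))
print(float(loss))
```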
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09040
• PDF: https://arxiv.org/pdf/2506.09040
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
SkyReels-V2: Infinite-length Film Generative Model
Article Date: 17 Apr 2025
Article Description:
Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming from general-purpose MLLMs' inability to interpret cinematic grammar, such as shot composition, actor expressions, and camera motions. These intertwined limitations hinder realistic long-form synthesis and professional film-style generation. To address these limitations, we propose SkyReels-V2, an Infinite-length Film Generative Model that synergizes a Multi-modal Large Language Model (MLLM), Multi-stage Pretraining, Reinforcement Learning, and a Diffusion Forcing Framework. Firstly, we design a comprehensive structural representation of video that combines general descriptions from the Multi-modal LLM with detailed shot language from sub-expert models. Aided by human annotation, we then train a unified Video Captioner, named SkyCaptioner-V1, to efficiently label the video data. Secondly, we establish progressive-resolution pretraining for fundamental video generation, followed by a four-stage post-training enhancement: initial concept-balanced Supervised Fine-Tuning (SFT) improves baseline quality; motion-specific Reinforcement Learning (RL) training with human-annotated and synthetic distortion data addresses dynamic artifacts; our diffusion forcing framework with non-decreasing noise schedules enables long-video synthesis in an efficient search space; and final high-quality SFT refines visual fidelity. All the code and models are available at https://github.com/SkyworkAI/SkyReels-V2.
PDF Download Link:
https://arxiv.org/pdf/2504.13074v3.pdf
GitHub:
• https://github.com/skyworkai/skyreels-v2
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
🔹 Publication Date: Published on Jun 11
🔹 Abstract:
ComfyUI-R1, a large reasoning model for automated workflow generation, demonstrates superior performance in creating AI art workflows through long chain-of-thought reasoning and reinforcement learning.
AI-generated summary: AI-generated content has evolved from monolithic models to modular workflows, particularly on platforms like ComfyUI, enabling customization in creative pipelines. However, crafting effective workflows requires great expertise to orchestrate numerous specialized components, presenting a steep learning curve for users. To address this challenge, we introduce ComfyUI-R1, the first large reasoning model for automated workflow generation. Starting with our curated dataset of 4K workflows, we construct long chain-of-thought (CoT) reasoning data, including node selection, workflow planning, and code-level workflow representation. ComfyUI-R1 is trained through a two-stage framework: (1) CoT fine-tuning for cold start, adapting models to the ComfyUI domain; (2) reinforcement learning for incentivizing reasoning capability, guided by a fine-grained rule-metric hybrid reward ensuring format validity, structural integrity, and node-level fidelity. Experiments show that our 7B-parameter model achieves a 97% format validity rate, along with high pass rate and node-level and graph-level F1 scores, significantly surpassing prior state-of-the-art methods that employ leading closed-source models such as GPT-4o and the Claude series. Further analysis highlights the critical role of the reasoning process and the advantage of transforming workflows into code. Qualitative comparison reveals our strength in synthesizing intricate workflows with diverse nodes, underscoring the potential of long CoT reasoning in AI art creation.
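The rule-metric hybrid reward can be thought of as a hard format check stacked with softer graph metrics. A minimal sketch under an assumed JSON node-graph schema and assumed weights (neither is the paper's actual format):

```python
import json

def workflow_reward(generated: str, reference_nodes: set[str]) -> float:
    """Sketch of a rule-metric hybrid reward for workflow generation.

    Components (with illustrative weights): format validity (does the output parse as
    a JSON node graph?), structural integrity (do all edges connect declared nodes?),
    and node-level fidelity (F1 of predicted node types vs. a reference set).
    """
    try:
        wf = json.loads(generated)
        nodes = {n["id"]: n["type"] for n in wf["nodes"]}
        edges = wf.get("edges", [])
    except (ValueError, KeyError, TypeError):
        return 0.0  # format check fails: nothing else is scored
    structural = all(e["from"] in nodes and e["to"] in nodes for e in edges)
    pred_types = set(nodes.values())
    tp = len(pred_types & reference_nodes)
    prec = tp / len(pred_types) if pred_types else 0.0
    rec = tp / len(reference_nodes) if reference_nodes else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return 0.3 + 0.2 * structural + 0.5 * f1  # 0.3 for valid format, rest for fidelity

wf = '{"nodes": [{"id": "1", "type": "CheckpointLoader"}, {"id": "2", "type": "KSampler"}], "edges": [{"from": "1", "to": "2"}]}'
print(workflow_reward(wf, {"CheckpointLoader", "KSampler", "VAEDecode"}))
```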
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09790
• PDF: https://arxiv.org/pdf/2506.09790
• Project Page: https://github.com/AIDC-AI/ComfyUI-Copilot
• Github: https://github.com/AIDC-AI/ComfyUI-Copilot
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
🔹 Publication Date: Published on Jun 11
🔹 Abstract:
ReasonMed, a large medical reasoning dataset, enhances the accuracy of medical question answering models by combining detailed reasoning paths with concise summaries, setting new benchmarks for model performance.
AI-generated summary: Though reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by various LLMs. ReasonMed is constructed through a multi-agent verification and refinement process, where we design an Error Refiner to enhance the reasoning paths by identifying and correcting error-prone steps flagged by a verifier. Leveraging ReasonMed, we systematically investigate best practices for training medical reasoning models and find that combining detailed Chain-of-Thought (CoT) reasoning with concise answer summaries yields the most effective fine-tuning strategy. Based on this strategy, we train ReasonMed-7B, which sets a new benchmark for sub-10B models, outperforming the prior best by 4.17% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60%.
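The multi-agent curation step can be pictured as a verify-then-refine loop over candidate reasoning paths. A minimal sketch, where `generate_cot`, `verify_steps`, and `refine_step` are hypothetical stand-ins for the LLM agents and the retry budget is illustrative:

```python
def build_reasoning_example(question, answer, generate_cot, verify_steps, refine_step,
                            max_rounds: int = 3):
    """Sketch of a verification-and-refinement loop for curating reasoning paths.

    `generate_cot` drafts a list of steps, `verify_steps` returns indices of steps
    flagged as erroneous, and `refine_step` (the Error Refiner role) rewrites a
    flagged step. Paths that never pass verification are discarded.
    """
    steps = generate_cot(question)
    for _ in range(max_rounds):
        flagged = verify_steps(question, answer, steps)
        if not flagged:
            return {"question": question, "cot": steps, "answer": answer}
        for idx in flagged:
            steps[idx] = refine_step(question, answer, steps, idx)
    return None  # drop examples whose reasoning cannot be repaired within budget
```

Pairing each retained path with a concise answer summary then gives the fine-tuning mix the abstract reports as most effective.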
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09513
• PDF: https://arxiv.org/pdf/2506.09513
• Github: https://github.com/YuSun-Work/ReasonMed
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/YuSun-AI/ReasonMed
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence
🔹 Publication Date: Published on Jun 12
🔹 Abstract:
EmbodiedGen is a platform that generates high-quality, photorealistic 3D assets at low cost, enabling scalable and realistic embodied AI research through generative AI techniques. AI-generated summary Constructing a physically realistic and accurately scaled simulated 3D world is crucial for the training and evaluation of embodied intelligence tasks. The diversity, realism, low cost, and accessibility of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D computer graphics assets that are manually created and annotated, which suffer from high production costs and limited realism. These limitations significantly hinder the scalability of data-driven approaches. We present EmbodiedGen, a foundational platform for interactive 3D world generation. It enables the scalable generation of high-quality, controllable, and photorealistic 3D assets with accurate physical properties and real-world scale in the Unified Robotics Description Format (URDF) at low cost. These assets can be directly imported into various physics simulation engines for fine-grained physical control, supporting downstream training and evaluation tasks. EmbodiedGen is an easy-to-use, full-featured toolkit composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation. EmbodiedGen produces diverse, interactive 3D worlds composed of generative 3D assets, leveraging generative AI to address the generalization and evaluation challenges of embodied-intelligence research. Code is available at https://horizonrobotics.github.io/robot_lab/embodied_gen/index.html.
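Because the assets are exported as URDF, they can in principle be dropped into a standard physics engine. The sketch below loads a hypothetical generated asset into PyBullet; the asset path `generated_asset/model.urdf` is an assumption, not EmbodiedGen's actual export layout.

```python
# Minimal sketch of consuming a URDF asset produced by a generative pipeline
# such as EmbodiedGen in a physics simulator (PyBullet used here for illustration).
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                   # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

plane_id = p.loadURDF("plane.urdf")                   # ground plane shipped with pybullet_data
asset_id = p.loadURDF("generated_asset/model.urdf",   # hypothetical generated asset
                      basePosition=[0, 0, 0.5])

for _ in range(240):                                  # one simulated second at 240 Hz
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(asset_id)
print("settled pose:", pos, orn)
p.disconnect()
```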
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.10600
• PDF: https://arxiv.org/pdf/2506.10600
• Project Page: https://horizonrobotics.github.io/robot_lab/embodied_gen/index.html
• Github: https://github.com/HorizonRobotics/EmbodiedGen.git
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D
• https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen
• https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Branched Schrödinger Bridge Matching
🔹 Publication Date: Published on Jun 10
🔹 Abstract:
BranchSBM, a novel generative modeling framework, extends Schrödinger Bridge Matching to model branched stochastic paths and multi-path evolution from a single initial distribution to multiple outcomes. AI-generated summary Predicting the intermediate trajectories between an initial and target distribution is a central problem in generative modeling. Existing approaches, such as flow matching and Schrödinger Bridge Matching, effectively learn mappings between two distributions by modeling a single stochastic path. However, these methods are inherently limited to unimodal transitions and cannot capture branched or divergent evolution from a common origin to multiple distinct outcomes. To address this, we introduce Branched Schrödinger Bridge Matching (BranchSBM), a novel framework that learns branched Schrödinger bridges. BranchSBM parameterizes multiple time-dependent velocity fields and growth processes, enabling the representation of population-level divergence into multiple terminal distributions. We show that BranchSBM is not only more expressive but also essential for tasks involving multi-path surface navigation, modeling cell fate bifurcations from homogeneous progenitor states, and simulating diverging cellular responses to perturbations.
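To make "multiple time-dependent velocity fields" concrete, here is a toy PyTorch sketch with one small network per branch mapping (x, t) to a velocity. The architecture, sizes, and the omission of the growth processes are simplifications for illustration, not the paper's model.

```python
# Toy sketch of a branch-per-network velocity parameterization: each branch is a
# time-conditioned MLP producing dx/dt for that branch.
import torch
import torch.nn as nn

class BranchedVelocityFields(nn.Module):
    def __init__(self, dim: int, n_branches: int, hidden: int = 64):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(dim + 1, hidden), nn.SiLU(),
                          nn.Linear(hidden, dim))
            for _ in range(n_branches)
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim), t: (batch, 1); returns (n_branches, batch, dim)
        xt = torch.cat([x, t], dim=-1)
        return torch.stack([net(xt) for net in self.branches], dim=0)

model = BranchedVelocityFields(dim=2, n_branches=3)
x, t = torch.randn(8, 2), torch.rand(8, 1)
print(model(x, t).shape)  # torch.Size([3, 8, 2])
```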
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09007
• PDF: https://arxiv.org/pdf/2506.09007
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Article Date: 9 Apr 2024
Article Description:
The burgeoning interest in developing Large Language Models (LLMs) with up to a trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants, which not only excel in their respective categories but also demonstrate capabilities on par with 7B-13B LLMs. While focusing on SLMs, our approach exhibits scalability in both model and data dimensions for future LLM research. Regarding model scaling, we employ extensive model wind tunnel experiments for stable and optimal scaling. For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation. We present an in-depth analysis of the intriguing training dynamics that occur in the WSD LRS. With the WSD LRS, we can now efficiently study the data-model scaling law without extensive retraining experiments on both the model and data axes, from which we derive a much higher compute-optimal data-model ratio than Chinchilla Optimal. Additionally, we introduce the MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE, and MiniCPM-128K, whose excellent performance further cements MiniCPM's foundation in diverse SLM applications. MiniCPM models are publicly available at https://github.com/OpenBMB/MiniCPM.
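The WSD schedule shape (linear warmup, a long constant "stable" phase, then a decay to a small floor) can be sketched in a few lines. The cosine decay below is an illustrative choice; the paper's exact decay function and hyperparameters may differ.

```python
# Minimal sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule shape.
import math

def wsd_lr(step: int, max_lr: float, warmup: int, stable: int, decay: int,
           min_lr: float = 0.0) -> float:
    if step < warmup:                                # warmup: ramp 0 -> max_lr
        return max_lr * (step + 1) / warmup
    if step < warmup + stable:                       # stable: hold max_lr
        return max_lr
    if step < warmup + stable + decay:               # decay: max_lr -> min_lr
        frac = (step - warmup - stable) / decay
        return min_lr + (max_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * frac))
    return min_lr                                    # after the schedule ends

schedule = [wsd_lr(s, 6e-4, warmup=100, stable=800, decay=100) for s in range(1000)]
print(schedule[0], schedule[500], schedule[-1])      # warmup start, stable plateau, end of decay
```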
PDF Download Link:
https://arxiv.org/pdf/2404.06395v3.pdf
GitHub:
• https://github.com/openbmb/minicpm
• https://github.com/pwc-1/Paper-9/tree/main/2/minicpm
• https://github.com/pwc-1/Paper-5/tree/main/minicpm
Datasets:
• MML
• MMLU
• GSM8K
• MATH
• HumanEval
• HellaSwag
• C4
• MBPP
• MT-Bench
• BBH
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
Article Date: 20 Dec 2023
Article Description:
Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic "prompt engineering". We introduce LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. We integrate our constructs into the recent DSPy programming model for LMs, and present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems. We also propose strategies to use assertions at inference time for automatic self-refinement with LMs. We report on four diverse case studies for text generation and find that LM Assertions improve not only compliance with imposed rules but also downstream task performance, passing constraints up to 164% more often and generating up to 37% more high-quality responses. Our reference implementation of LM Assertions is integrated into DSPy at https://github.com/stanfordnlp/dspy
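The core loop, generate, check a constraint, and retry with feedback when it is violated, can be sketched generically as below. This mirrors the concept of assertion-driven self-refinement but is not the DSPy API; `call_lm` is a stand-in for any text-generation call.

```python
# Conceptual sketch of assertion-driven self-refinement (not the DSPy API):
# generate an output, check a constraint, and on failure re-prompt with the
# violation message so the model can repair its answer.
from typing import Callable

def generate_with_assertion(call_lm: Callable[[str], str], prompt: str,
                            constraint: Callable[[str], bool], message: str,
                            max_retries: int = 3) -> str:
    attempt_prompt = prompt
    for _ in range(max_retries):
        output = call_lm(attempt_prompt)
        if constraint(output):
            return output
        # Backtrack: re-prompt with explicit feedback about the violated rule.
        attempt_prompt = (f"{prompt}\n\nPrevious answer violated a constraint: "
                          f"{message}\nPlease fix it.")
    return output  # best effort after exhausting retries

# Example usage with a dummy LM and a length constraint.
dummy_lm = lambda p: "A short answer." if "fix" in p else "An answer that is far too long " * 10
print(generate_with_assertion(dummy_lm, "Summarize X in one sentence.",
                              constraint=lambda s: len(s) < 100,
                              message="the summary must be under 100 characters"))
```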
PDF Download Link:
https://arxiv.org/pdf/2312.13382v2.pdf
GitHub:
• https://github.com/stanfordnlp/dspy
Datasets:
• HotpotQA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Reparameterized LLM Training via Orthogonal Equivalence Transformation
🔹 Publication Date: Published on Jun 9
🔹 Abstract:
A new reParameterized training algorithm named POET uses Orthogonal Equivalence Transformation to optimize neurons, providing stable optimization and improved generalization for training large-scale neural networks, including LLMs. AI-generated summary While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of the spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
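A toy rendering of the reparameterization, an effective weight W = R_left · W0 · R_right with W0 fixed at random and the two factors kept orthogonal during training, is sketched below in PyTorch. The shapes and the omission of the paper's efficient approximations make this an illustration under stated assumptions, not POET's implementation.

```python
# Toy sketch: effective weight W = R_left @ W0 @ R_right, with W0 a fixed random
# buffer and R_left, R_right constrained to stay orthogonal (which preserves the
# singular values of W0).
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class POETLinearSketch(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Fixed random core weight (registered as a buffer, so it is not trained).
        self.register_buffer("w0", torch.randn(out_features, in_features) / in_features ** 0.5)
        # Learnable square matrices kept orthogonal via PyTorch's parametrization.
        self.r_left = orthogonal(nn.Linear(out_features, out_features, bias=False))
        self.r_right = orthogonal(nn.Linear(in_features, in_features, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.r_left.weight @ self.w0 @ self.r_right.weight
        return x @ w.T

layer = POETLinearSketch(16, 8)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 8])
```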
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.08001
• PDF: https://arxiv.org/pdf/2506.08001
• Project Page: https://spherelab.ai/poet/
• Github: https://github.com/Sphere-AI-Lab/poet
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT