Article Title:
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Article Date: 27 Apr 2025
Article Description:
Hallucinations are a persistent problem with Large Language Models (LLMs). As these models are increasingly used in high-stakes domains such as healthcare and finance, effective hallucination detection is crucial. To this end, we propose a versatile framework for zero-resource hallucination detection that practitioners can apply to real-world use cases. We adapt a variety of existing uncertainty quantification (UQ) techniques, including black-box UQ, white-box UQ, and LLM-as-a-Judge, transforming them as necessary into standardized response-level confidence scores ranging from 0 to 1. To enhance flexibility, we introduce a tunable ensemble approach that incorporates any combination of the individual confidence scores, enabling practitioners to optimize the ensemble for a specific use case. To streamline implementation, the full suite of scorers is offered in this paper's companion Python toolkit, UQLM. We evaluate the scorers through an extensive set of experiments on several LLM question-answering benchmarks and find that our tunable ensemble typically surpasses its individual components and outperforms existing hallucination detection methods. Our results demonstrate the benefits of customized hallucination detection strategies for improving the accuracy and reliability of LLMs.
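A minimal sketch of the tunable-ensemble idea, assuming each scorer emits a confidence in [0, 1]; this is illustrative, not the UQLM API, and the scorer names and grid search are hypothetical:

```python
# Illustrative sketch (not the UQLM API): combine response-level confidence
# scores from several scorers into a tunable weighted ensemble.
import numpy as np
from itertools import product

def ensemble_confidence(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-scorer confidences, each in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total

def tune_weights(score_rows, labels, grid=(0.0, 0.5, 1.0)):
    """Grid-search scorer weights against graded responses (1 = correct)."""
    names = list(score_rows[0].keys())
    best_w, best_acc = None, -1.0
    for combo in product(grid, repeat=len(names)):
        if sum(combo) == 0:
            continue
        w = dict(zip(names, combo))
        preds = [ensemble_confidence(r, w) >= 0.5 for r in score_rows]
        acc = float(np.mean([p == bool(y) for p, y in zip(preds, labels)]))
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

rows = [{"black_box": 0.9, "white_box": 0.8, "judge": 0.7},
        {"black_box": 0.2, "white_box": 0.4, "judge": 0.3}]
print(tune_weights(rows, labels=[1, 0]))
```

Tuning against a small graded set of responses is what lets the ensemble adapt to a specific use case rather than using fixed weights.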
PDF Download Link:
https://arxiv.org/pdf/2504.19254v2.pdf
GitHub:
• https://github.com/cvs-health/uqlm
Datasets:
• GSM8K
• SVAMP
• PopQA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
s3: You Don't Need That Much Data to Train a Search Agent via RL
Article Date: 20 May 2025
Article Description:
Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility, or fine-tune the entire LLM to jointly reason and retrieve, entangling retrieval with generation and limiting real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
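A hedged sketch of the Gain Beyond RAG reward as described above; `generate` and the toy `accuracy` check are stand-ins, not the paper's actual components:

```python
# Gain Beyond RAG (sketch): reward the searcher by how much its retrieved
# context improves a frozen generator over naive RAG retrieval.

def accuracy(answer: str, gold: str) -> float:
    # Toy containment check; the paper would use a proper QA metric.
    return float(gold.lower() in answer.lower())

def gain_beyond_rag(question, gold, naive_docs, searched_docs, generate) -> float:
    base = accuracy(generate(question, naive_docs), gold)     # naive RAG baseline
    ours = accuracy(generate(question, searched_docs), gold)  # searcher's context
    return ours - base  # positive only when the search policy adds real utility
```

This reward feeds an RL update for the searcher policy while the generator stays frozen, which is what keeps the framework compatible with proprietary LLMs.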
PDF Download Link:
https://arxiv.org/pdf/2505.14146v1.pdf
GitHub:
• https://github.com/pat-jj/s3
Datasets:
• Natural Questions
• TriviaQA
• HotpotQA
• MedQA
• PubMedQA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Vision as LoRA
Article Date: 26 Mar 2025
Article Description:
We introduce Vision as LoRA (VoRA), a novel paradigm for transforming an LLM into an MLLM. Unlike prevalent MLLM architectures that rely on external vision modules for vision encoding, VoRA internalizes visual capabilities by integrating vision-specific LoRA layers directly into the LLM. This design allows the added parameters to be seamlessly merged into the LLM during inference, eliminating structural complexity and minimizing computational overhead. Moreover, inheriting the LLM's ability to handle flexible context, VoRA can process inputs at arbitrary resolutions. To further strengthen VoRA's visual capabilities, we introduce a block-wise distillation method that transfers visual priors from a pre-trained ViT into the LoRA layers, effectively accelerating training by injecting visual knowledge. Additionally, we apply bi-directional attention masks to better capture the context information of an image. We demonstrate that with additional pre-training data, VoRA can perform comparably to conventional encoder-based MLLMs. All training data, code, and model weights will be released at https://github.com/Hon-Wong/VoRA.
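The merge step that makes this possible is the standard LoRA weight fold; a minimal sketch (not VoRA's code) of folding a low-rank update into a base linear layer:

```python
# Merging a LoRA update into a base weight: W' = W + (alpha / rank) * B @ A.
# After the merge, no extra modules remain at inference time.
import torch

def merge_lora(weight: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
               alpha: float, rank: int) -> torch.Tensor:
    return weight + (alpha / rank) * (B @ A)

d_out, d_in, r = 64, 64, 8
W = torch.randn(d_out, d_in)
A = torch.randn(r, d_in)   # down-projection
B = torch.zeros(d_out, r)  # up-projection (typically zero-initialized)
W_merged = merge_lora(W, A, B, alpha=16.0, rank=r)
assert torch.allclose(W, W_merged)  # zero-init B: no change before training
```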
PDF Download Link:
https://arxiv.org/pdf/2503.20680v1.pdf
GitHub:
• https://github.com/hon-wong/vora
Datasets:
• MM-Vet
• Google Landmarks Dataset v2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Harnessing the Universal Geometry of Embeddings
Article Date: 18 May 2025
Article Description:
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity across model pairs with different architectures, parameter counts, and training datasets. The ability to translate unknown embeddings into a different space while preserving their geometry has serious implications for the security of vector databases. An adversary with access only to embedding vectors can extract sensitive information about the underlying documents, sufficient for classification and attribute inference.
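A schematic of the translation setup, assuming adapters that route through a shared latent space; the architecture below is illustrative, not the paper's exact networks or losses:

```python
# Sketch: map embeddings from space A into a shared latent and out to space B,
# then score alignment with cosine similarity.
import torch
import torch.nn as nn

class Translator(nn.Module):
    def __init__(self, d_a: int, d_b: int, d_latent: int = 256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_a, d_latent), nn.SiLU(),
                                 nn.Linear(d_latent, d_latent))
        self.dec = nn.Sequential(nn.Linear(d_latent, d_latent), nn.SiLU(),
                                 nn.Linear(d_latent, d_b))

    def forward(self, x_a: torch.Tensor) -> torch.Tensor:
        return self.dec(self.enc(x_a))  # A -> universal latent -> B

model = Translator(d_a=384, d_b=768)
x = torch.randn(4, 384)                  # embeddings from model A
y_hat = model(x)                         # translated into model B's space
y = torch.randn(4, 768)                  # targets (unpaired in the real method)
print(float(nn.functional.cosine_similarity(y_hat, y).mean()))
```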
PDF Download Link:
https://arxiv.org/pdf/2505.12540v2.pdf
GitHub:
• https://github.com/rjha18/vec2vec
• https://github.com/zhaoolee/garss
Datasets:
• Natural Questions
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
MTGS: Multi-Traversal Gaussian Splatting
Article Date: 16 Mar 2025
Article Description:
Multi-traversal data, commonly collected through daily commutes or by self-driving fleets, provides multiple viewpoints for scene reconstruction within a road block. This data offers significant potential for high-quality novel view synthesis, which is crucial for applications such as autonomous vehicle simulators. However, inherent challenges in multi-traversal data often result in suboptimal reconstruction quality, including variations in appearance and the presence of dynamic objects. To address these issues, we propose Multi-Traversal Gaussian Splatting (MTGS), a novel approach that reconstructs high-quality driving scenes from arbitrarily collected multi-traversal data by modeling a shared static geometry while separately handling dynamic elements and appearance variations. Our method employs a multi-traversal dynamic scene graph with a shared static node and traversal-specific dynamic nodes, complemented by color correction nodes with learnable spherical harmonics coefficient residuals. This approach enables high-fidelity novel view synthesis and provides the flexibility to navigate any viewpoint. We conduct extensive experiments on a large-scale driving dataset, nuPlan, with multi-traversal data. Our results demonstrate that MTGS improves LPIPS by 23.5% and geometry accuracy by 46.3% compared to single-traversal baselines. The code and data will be made available to the public.
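To make the scene-graph layout concrete, here is an illustrative data sketch of one shared static node plus per-traversal dynamic and color-correction nodes; the class and field names are assumptions, not the repo's actual structures:

```python
# Illustrative layout of the multi-traversal scene graph described above.
from dataclasses import dataclass, field

@dataclass
class GaussianSet:
    num_points: int  # stand-in for positions, scales, rotations, opacities, SH

@dataclass
class TraversalNode:
    dynamic_objects: list[GaussianSet]  # per-traversal moving elements
    sh_residuals: list[float]           # learnable color-correction SH residuals

@dataclass
class MultiTraversalSceneGraph:
    static_node: GaussianSet            # geometry shared by all traversals
    traversals: dict[str, TraversalNode] = field(default_factory=dict)

scene = MultiTraversalSceneGraph(static_node=GaussianSet(num_points=1_000_000))
scene.traversals["monday_commute"] = TraversalNode(
    dynamic_objects=[GaussianSet(5_000)], sh_residuals=[0.0] * 9)
```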
PDF Download Link:
https://arxiv.org/pdf/2503.12552v3.pdf
GitHub:
• https://github.com/OpenDriveLab/MTGS
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
ImgEdit: A Unified Image Editing Dataset and Benchmark
Article Date: 26 May 2025
Article Description:
Recent advancements in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily due to limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks. To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model, alongside task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language model to process the reference image and editing prompt, which outperforms existing open-source models on multiple tasks, highlighting the value of ImgEdit and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate both open-source and proprietary models, as well as ImgEdit-E1, providing deep analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available on https://github.com/PKU-YuanGroup/ImgEdit.
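To make the single-turn versus multi-turn distinction concrete, a hypothetical record layout for an edit pair; the field names are assumptions for exposition, not the released dataset schema:

```python
# Hypothetical record layout for ImgEdit-style edit pairs and multi-turn tasks.
from dataclasses import dataclass

@dataclass
class EditPair:
    source_image: str   # path or URL to the original image
    edited_image: str   # path or URL to the edited result
    instruction: str    # natural-language edit prompt
    task_type: str      # e.g., "add", "remove", "replace", "style"

@dataclass
class MultiTurnTask:
    turns: list[EditPair]  # each turn edits the previous turn's output

pair = EditPair("cat.jpg", "cat_hat.jpg", "put a red hat on the cat", "add")
session = MultiTurnTask(turns=[pair])
```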
PDF Download Link:
https://arxiv.org/pdf/2505.20275v1.pdf
GitHub:
• https://github.com/pku-yuangroup/imgedit
Datasets:
• MagicBrush
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Article Date: 24 May 2025
Article Description:
Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy decoupling style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art model GPT-4o.
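A toy schematic of the plug-and-play composition: a consistency module trained once (with style weights frozen), then combined with any style LoRA at inference. The classes below are stand-ins, not the released architecture:

```python
# Schematic of composing an arbitrary style LoRA with a style-agnostic
# consistency plugin, as the two-stage design above decouples them.
import torch
import torch.nn as nn

class StyleLoRA(nn.Module):
    def __init__(self, dim: int, rank: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)

    def forward(self, h):
        return self.up(self.down(h))  # low-rank style delta

class ConsistencyPlugin(nn.Module):
    """Trained in stage 2 on aligned pairs, with style weights frozen."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, h, h_ref):
        return h + self.proj(h_ref)  # condition on reference-image features

dim = 64
h, h_ref = torch.randn(2, dim), torch.randn(2, dim)
styled = h + StyleLoRA(dim)(h)               # arbitrary style LoRA
out = ConsistencyPlugin(dim)(styled, h_ref)  # style-agnostic consistency
```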
PDF Download Link:
https://arxiv.org/pdf/2505.18445v1.pdf
GitHub:
• https://github.com/showlab/omniconsistency
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
SWE-bench Goes Live
🖥 Github: https://github.com/microsoft/swe-bench-live
📕 Paper: https://arxiv.org/abs/2505.23419v1
🔗 Tasks: https://paperswithcode.com/dataset/humaneval
For more data science resources:
✓ https://t.iss.one/DataScienceT
Important resource: a comprehensive cheat sheet for working with Polars, explaining the library concisely with real examples and practical projects.
https://www.linkedin.com/posts/hussein-sheikho-4a8187246_a-comprehensive-cheat-sheet-for-working-with-activity-7337103606531186688-Nn0q
Article Title:
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
Article Date: 21 Apr 2025
Article Description:
Camera and human motion controls have been extensively studied for video generation, but existing approaches typically address them separately, suffering from limited data with high-quality annotations for both aspects. To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise control of both camera and human motion in video generation. Uni3C includes two key contributions. First, we propose a plug-and-play control module trained with a frozen video generative backbone, PCDController, which utilizes unprojected point clouds from monocular depth to achieve accurate camera control. By leveraging the strong 3D priors of point clouds and the powerful capabilities of video foundation models, PCDController shows impressive generalization, performing well regardless of whether the inference backbone is frozen or fine-tuned. This flexibility enables different modules of Uni3C to be trained in specific domains, i.e., either camera control or human motion control, reducing the dependency on jointly annotated data. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters to unify the control signals for camera and human motion, respectively. Extensive experiments confirm that PCDController enjoys strong robustness in driving camera motion for fine-tuned backbones of video generation. Uni3C substantially outperforms competitors in both camera controllability and human motion quality. Additionally, we collect tailored validation sets featuring challenging camera movements and human actions to validate the effectiveness of our method.
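The point clouds PCDController conditions on come from standard pinhole unprojection of a monocular depth map; a minimal sketch of that step (not the repo's code):

```python
# Pinhole unprojection: (H, W) metric depth -> (H*W, 3) camera-frame points.
import numpy as np

def unproject_depth(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.full((480, 640), 2.0, dtype=np.float32)  # toy plane at 2 m
points = unproject_depth(depth, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```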
PDF Download Link:
https://arxiv.org/pdf/2504.14899v1.pdf
GitHub:
• https://github.com/ewrfcas/uni3c
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Article Date: 22 Feb 2025
Article Description:
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
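A hedged schematic of how the three rigor components could gate an agent's experiment loop; the function names are illustrative, not Curie's actual interfaces:

```python
# Sketch: rigor modules as gates around each experiment step.
def run_experiment(plan, agent_step, validate_step, log_knowledge, max_steps=10):
    results = []
    for step in plan[:max_steps]:
        action = agent_step(step)            # intra-agent rigor: constrained action
        if not validate_step(step, action):  # inter-agent rigor: methodical control
            continue                         # reject actions that break the protocol
        results.append(action)
        log_knowledge(step, action)          # knowledge module: interpretability
    return results
```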
PDF Download Link:
https://arxiv.org/pdf/2502.16069v1.pdf
GitHub:
• https://github.com/just-curieous/curie
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality
Article Date: 23 May 2025
Article Description:
In Transformer architectures, tokens (discrete units derived from raw data) are formed by segmenting inputs into fixed-length chunks. Each token is then mapped to an embedding, enabling parallel attention computations while preserving the input's essential information. Due to the quadratic computational complexity of transformer self-attention mechanisms, token reduction has primarily been used as an efficiency strategy. This is especially true in single vision and language domains, where it helps balance computational costs, memory usage, and inference latency. Despite these advances, this paper argues that token reduction should transcend its traditional efficiency-oriented role in the era of large generative models. Instead, we position it as a fundamental principle in generative modeling, critically influencing both model architecture and broader applications. Specifically, we contend that across vision, language, and multimodal systems, token reduction can: (i) facilitate deeper multimodal integration and alignment, (ii) mitigate "overthinking" and hallucinations, (iii) maintain coherence over long inputs, and (iv) enhance training stability, etc. We reframe token reduction as more than an efficiency measure. By doing so, we outline promising future directions, including algorithm design, reinforcement learning-guided token reduction, token optimization for in-context learning, and broader ML and scientific domains. We highlight its potential to drive new model architectures and learning strategies that improve robustness, increase interpretability, and better align with the objectives of generative modeling.
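For readers unfamiliar with the mechanics, a minimal example of one common reduction primitive: keep the k tokens that receive the most attention mass. This is illustrative of the family the paper surveys, not a specific method it proposes:

```python
# Token pruning by attention received: tokens (N, D), attn (N, N) -> (k, D).
import torch

def prune_tokens(tokens: torch.Tensor, attn: torch.Tensor, k: int) -> torch.Tensor:
    importance = attn.mean(dim=0)                    # avg attention each key receives
    keep = importance.topk(k).indices.sort().values  # preserve original token order
    return tokens[keep]

tokens = torch.randn(16, 32)
attn = torch.softmax(torch.randn(16, 16), dim=-1)
print(prune_tokens(tokens, attn, k=4).shape)  # torch.Size([4, 32])
```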
PDF Download Link:
https://arxiv.org/pdf/2505.18227v1.pdf
GitHub:
• https://github.com/zlkong/awesome-token-compression-reduction
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
Article Date: 29 May 2025
Article Description:
Today's AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The advance of AI could itself be automated. If done safely, that would accelerate AI development and allow us to reap its benefits much sooner. Meta-learning can automate the discovery of novel algorithms, but is limited by first-order improvements and the human design of a suitable search space. The Gödel machine proposed a theoretical alternative: a self-improving AI that repeatedly modifies itself in a provably beneficial manner. Unfortunately, proving that most changes are net beneficial is impossible in practice. We introduce the Darwin Gödel Machine (DGM), a self-improving system that iteratively modifies its own code (thereby also improving its ability to modify its own codebase) and empirically validates each change using coding benchmarks. Inspired by Darwinian evolution and open-endedness research, the DGM maintains an archive of generated coding agents. It grows the archive by sampling an agent from it and using a foundation model to create a new, interesting version of the sampled agent. This open-ended exploration forms a growing tree of diverse, high-quality agents and allows the parallel exploration of many different paths through the search space. Empirically, the DGM automatically improves its coding capabilities (e.g., better code editing tools, long-context window management, peer-review mechanisms), increasing performance on SWE-bench from 20.0% to 50.0%, and on Polyglot from 14.2% to 30.7%. Furthermore, the DGM significantly outperforms baselines without self-improvement or open-ended exploration. All experiments were done with safety precautions (e.g., sandboxing, human oversight). The DGM is a significant step toward self-improving AI, capable of gathering its own stepping stones along paths that unfold into endless innovation.
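A hedged sketch of the archive-driven loop described above; `mutate_with_fm` and `score_on_benchmark` are placeholders for the foundation-model edit step and the sandboxed benchmark evaluation:

```python
# DGM-style loop (sketch): sample from an open-ended archive, mutate, validate.
import random

def dgm_loop(seed_agent, mutate_with_fm, score_on_benchmark, iterations=100):
    archive = [(seed_agent, score_on_benchmark(seed_agent))]
    for _ in range(iterations):
        parent, _ = random.choice(archive)   # sample from the archive
        child = mutate_with_fm(parent)       # FM proposes a code modification
        score = score_on_benchmark(child)    # empirical validation, not proofs
        archive.append((child, score))       # keep even mediocre agents as
                                             # stepping stones for later branches
    return max(archive, key=lambda p: p[1])  # best agent found
```

Keeping low-scoring agents in the archive is the "open-ended" part: they preserve diverse paths through the search space that a greedy loop would discard.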
PDF Download Link:
https://arxiv.org/pdf/2505.22954v1.pdf
GitHub:
• https://github.com/jennyzzt/dgm
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Article Date: 30 May 2025
Article Description:
Reinforcement learning (RL) has become a dominant paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where rollouts in each training batch are generated by the same model. This approach stabilizes RL training but suffers from severe system-level inefficiency: generation must wait until the longest output in the batch is completed before model updates, resulting in GPU underutilization. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is collected. AReaL also incorporates a collection of system-level optimizations, leading to substantially higher GPU utilization. To stabilize RL training, AReaL balances the workload of rollout and training workers to control data staleness, and adopts a staleness-enhanced PPO variant to better handle outdated training samples. Extensive experiments on math and code reasoning benchmarks show that AReaL achieves up to 2.77× training speedup compared to synchronous systems with the same number of GPUs and matched or improved final performance. The code of AReaL is available at https://github.com/inclusionAI/AReaL/.
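A toy sketch of the decoupling idea: rollout workers push trajectories into a queue while the trainer consumes batches, with a cap on how stale the generating model version may be. This is a schematic of the mechanism, not AReaL's actual system code:

```python
# Async rollout/training decoupling with a staleness cap (toy sketch).
import queue
import threading

MAX_STALENESS = 4
buf: "queue.Queue[tuple[int, str]]" = queue.Queue()
model_version = 0

def rollout_worker(n: int):
    for i in range(n):
        buf.put((model_version, f"trajectory-{i}"))  # never waits for training

def trainer(batches: int, batch_size: int = 2):
    global model_version
    for _ in range(batches):
        batch = [buf.get() for _ in range(batch_size)]
        # Drop samples whose generating model version is too old.
        batch = [t for v, t in batch if model_version - v <= MAX_STALENESS]
        model_version += 1  # a staleness-aware PPO update would go here

w = threading.Thread(target=rollout_worker, args=(8,))
w.start()
trainer(batches=4)
w.join()
```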
PDF Download Link:
https://arxiv.org/pdf/2505.24298v2.pdf
GitHub:
• https://github.com/inclusionai/areal
Datasets:
• MATH
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT