Data Science | Machine Learning with Python for Researchers
Admin: @HusseinSheikho

The Data Science and Python channel is for researchers and advanced programmers

Article Title:
AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment

Article Date: 30 Mar 2021

Article Description:
Alphas are stock prediction models capturing trading signals in a stock market. A set of effective alphas can generate weakly correlated high returns to diversify the risk. Existing alphas can be categorized into two classes: Formulaic alphas are simple algebraic expressions of scalar features, and thus can generalize well and be mined into a weakly correlated set. Machine learning alphas are data-driven models over vector and matrix features. They are more predictive than formulaic alphas, but are too complex to mine into a weakly correlated set. In this paper, we introduce a new class of alphas to model scalar, vector, and matrix features which possess the strengths of these two existing classes. The new alphas predict returns with high accuracy and can be mined into a weakly correlated set. In addition, we propose a novel alpha mining framework based on AutoML, called AlphaEvolve, to generate the new alphas. To this end, we first propose operators for generating the new alphas and selectively injecting relational domain knowledge to model the relations between stocks. We then accelerate the alpha mining by proposing a pruning technique for redundant alphas. Experiments show that AlphaEvolve can evolve initial alphas into the new alphas with high returns and weak correlations.
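
To make the "formulaic alpha" idea concrete, here is a minimal sketch of evaluating a hand-written alpha and checking how correlated it is with an existing pool. The expression and feature names are made up for illustration; AlphaEvolve itself searches for such expressions automatically.

```python
# Illustrative only: a toy formulaic alpha plus the two checks mentioned in the
# abstract (predictive accuracy via rank IC, and weak correlation with a pool).
import numpy as np
import pandas as pd

def formulaic_alpha(df: pd.DataFrame) -> pd.Series:
    """Toy alpha: short-term mean reversion scaled by a volume z-score."""
    ret_5d = df["close"].pct_change(5)
    vol_z = (df["volume"] - df["volume"].rolling(20).mean()) / df["volume"].rolling(20).std()
    return -ret_5d * vol_z  # negative sign: bet against recent winners

def information_coefficient(alpha: pd.Series, fwd_ret: pd.Series) -> float:
    """Rank correlation between today's alpha and the next-day return."""
    return alpha.rank().corr(fwd_ret.rank())

def max_corr_with_pool(alpha: pd.Series, pool: list[pd.Series]) -> float:
    """A new alpha should stay weakly correlated with the existing set."""
    return max(abs(alpha.corr(a)) for a in pool) if pool else 0.0

# Usage with any OHLCV dataframe indexed by date:
# a = formulaic_alpha(df)
# ic = information_coefficient(a, df["close"].pct_change().shift(-1))
# print(ic, max_corr_with_pool(a, existing_alphas))
```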

PDF Download Link:
https://arxiv.org/pdf/2103.16196v2.pdf

GitHub:
https://github.com/codelion/openevolve

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Aligning Multimodal LLM with Human Preference: A Survey

Article Date: 18 Mar 2025

Article Description:
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed. This gap has spurred the emergence of various alignment algorithms, each targeting different application scenarios and optimization goals. Recent studies have shown that alignment algorithms are a powerful approach to resolving the aforementioned challenges. In this paper, we aim to provide a comprehensive and systematic review of alignment algorithms for MLLMs. Specifically, we explore four key aspects: (1) the application scenarios covered by alignment algorithms, including general image understanding, multi-image, video, and audio, and extended multimodal applications; (2) the core factors in constructing alignment datasets, including data sources, model responses, and preference annotations; (3) the benchmarks used to evaluate alignment algorithms; and (4) a discussion of potential future directions for the development of alignment algorithms. This work seeks to help researchers organize current advancements in the field and inspire better alignment methods. The project page of this paper is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment.

PDF Download Link:
https://arxiv.org/pdf/2503.14504v1.pdf

GitHub:
https://github.com/bradyfu/awesome-multimodal-large-language-models

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
A Survey of LLM × DATA

Article Date: 24 May 2025

Article Description:
The integration of large language model (LLM) and data management (DATA) is rapidly redefining both domains. In this survey, we comprehensively review the bidirectional relationships. On the one hand, DATA4LLM, spanning large-scale data processing, storage, and serving, feeds LLMs with high quality, diversity, and timeliness of data required for stages like pre-training, post-training, retrieval-augmented generation, and agentic workflows: (i) Data processing for LLMs includes scalable acquisition, deduplication, filtering, selection, domain mixing, and synthetic augmentation; (ii) Data Storage for LLMs focuses on efficient data and model formats, distributed and heterogeneous storage hierarchies, KV-cache management, and fault-tolerant checkpointing; (iii) Data serving for LLMs tackles challenges in RAG (e.g., knowledge post-processing), LLM inference (e.g., prompt compression, data provenance), and training strategies (e.g., data packing and shuffling). On the other hand, in LLM4DATA, LLMs are emerging as general-purpose engines for data management. We review recent advances in (i) data manipulation, including automatic data cleaning, integration, discovery; (ii) data analysis, covering reasoning over structured, semi-structured, and unstructured data, and (iii) system optimization (e.g., configuration tuning, query rewriting, anomaly diagnosis), powered by LLM techniques like retrieval-augmented prompting, task-specialized fine-tuning, and multi-agent collaboration.
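
As a small, self-contained illustration of one DATA4LLM step named above (corpus deduplication), the sketch below drops exact duplicates via content hashing and near-duplicates via n-gram Jaccard similarity. Production pipelines use scalable variants such as MinHash/LSH; this is just the idea in miniature.

```python
# Toy corpus deduplication: exact dedup by hash, near-dedup by n-gram Jaccard.
import hashlib

def ngrams(text: str, n: int = 5) -> set[str]:
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(max(len(toks) - n + 1, 1))}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / max(len(a | b), 1)

def deduplicate(docs: list[str], near_dup_threshold: float = 0.8) -> list[str]:
    seen_hashes, kept, kept_ngrams = set(), [], []
    for doc in docs:
        h = hashlib.sha1(doc.encode("utf-8")).hexdigest()
        if h in seen_hashes:                       # exact duplicate
            continue
        if any(jaccard(ngrams(doc), g) >= near_dup_threshold for g in kept_ngrams):
            continue                               # near duplicate
        seen_hashes.add(h)
        kept.append(doc)
        kept_ngrams.append(ngrams(doc))
    return kept
```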

PDF Download Link:
https://arxiv.org/pdf/2505.18458v3.pdf

GitHub:
https://github.com/weaidb/awsome-data-llm
https://github.com/weaidb/awesome-data-llm

Datasets:
• HumanEval
• C4
• CCNet
• MassiveText
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
VACE: All-in-One Video Creation and Editing

Article Date: 10 Mar 2025

Article Description:
Diffusion Transformer has demonstrated powerful capability and scalability in generating high-quality images and videos. Further pursuing the unification of generation and editing tasks has yielded significant progress in the domain of image content creation. However, due to the intrinsic demands for consistency across both temporal and spatial dynamics, achieving a unified approach for video synthesis remains challenging. We introduce VACE, which enables users to perform Video tasks within an All-in-one framework for Creation and Editing. These tasks include reference-to-video generation, video-to-video editing, and masked video-to-video editing. Specifically, we effectively integrate the requirements of various tasks by organizing video task inputs, such as editing, reference, and masking, into a unified interface referred to as the Video Condition Unit (VCU). Furthermore, by utilizing a Context Adapter structure, we inject different task concepts into the model using formalized representations of temporal and spatial dimensions, allowing it to handle arbitrary video synthesis tasks flexibly. Extensive experiments demonstrate that the unified model of VACE achieves performance on par with task-specific models across various subtasks. Simultaneously, it enables diverse applications through versatile task combinations. Project page: https://ali-vilab.github.io/VACE-Page/.
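
A minimal sketch of what a "Video Condition Unit"-style interface could look like: one structure that bundles the inputs of different video tasks so a single model entry point can serve them all. The field names here are assumptions for illustration; the paper defines its own VCU format.

```python
# Illustrative data structure only, not VACE's actual implementation.
from dataclasses import dataclass, field
from typing import Optional
import numpy as np

@dataclass
class VideoConditionUnit:
    """Bundles the inputs of different video tasks into one structure."""
    prompt: str                                   # textual instruction
    frames: Optional[np.ndarray] = None           # (T, H, W, 3) source video, if editing
    masks: Optional[np.ndarray] = None            # (T, H, W) region to re-synthesize
    references: list[np.ndarray] = field(default_factory=list)  # reference images

def as_reference_to_video(prompt: str, ref_img: np.ndarray) -> VideoConditionUnit:
    return VideoConditionUnit(prompt=prompt, references=[ref_img])

def as_masked_edit(prompt: str, video: np.ndarray, mask: np.ndarray) -> VideoConditionUnit:
    return VideoConditionUnit(prompt=prompt, frames=video, masks=mask)

# A unified model then needs only one entry point, e.g. model.generate(vcu),
# regardless of which task the VCU was built for.
```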

PDF Download Link:
https://arxiv.org/pdf/2503.07598v1.pdf

GitHub:
https://github.com/wan-video/wan2.1
https://github.com/ali-vilab/vace

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Article Date: Dec 2024

Article Description:
We address the problem of gaze target estimation, which aims to predict where a person is looking in a scene. Predicting a person's gaze target requires reasoning both about the person's appearance and the contents of the scene. Prior works have developed increasingly complex, hand-crafted pipelines for gaze target estimation that carefully fuse features from separate scene encoders, head encoders, and auxiliary models for signals like depth and pose. Motivated by the success of general-purpose feature extractors on a variety of visual tasks, we propose Gaze-LLE, a novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. We extract a single feature representation for the scene, and apply a person-specific positional prompt to decode gaze with a lightweight module. We demonstrate state-of-the-art performance across several gaze benchmarks and provide extensive analysis to validate our design choices. Our code is available at: https://github.com/fkryan/gazelle. (CVPR 2025)
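
The overall recipe (frozen backbone features, a person-specific positional prompt, a light decoder) can be sketched in a few lines of PyTorch. Layer sizes and the prompt mechanism below are made up for illustration; see the official repo for the real architecture.

```python
# Minimal PyTorch sketch of the idea, not the Gaze-LLE implementation.
import torch
import torch.nn as nn

class TinyGazeDecoder(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.head_prompt = nn.Parameter(torch.zeros(feat_dim))  # added at the head position
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.to_heatmap = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, scene_tokens: torch.Tensor, head_token_idx: torch.Tensor) -> torch.Tensor:
        # scene_tokens: (B, N, D) frozen DINOv2 patch features; head_token_idx: (B,) long
        x = scene_tokens.clone()
        x[torch.arange(x.size(0)), head_token_idx] += self.head_prompt  # positional prompt
        x = self.blocks(x)
        return self.to_heatmap(x).squeeze(-1)  # (B, N) gaze heatmap over patch tokens
```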

PDF Download Link:
https://arxiv.org/pdf/2412.09586v1.pdf

GitHub:
https://github.com/fkryan/gazelle

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models

🔹 Publication Date: Published on Jun 8, 2025

🔹 Abstract:
Frame Guidance offers a training-free method for controlling video generation using frame-level signals, reducing memory usage and enhancing globally coherent video output. Advancements in diffusion models have significantly improved video quality, directing attention to fine-grained controllability. However, many existing methods depend on fine-tuning large-scale video models for specific tasks, which becomes increasingly impractical as model sizes continue to grow. In this work, we present Frame Guidance, a training-free guidance for controllable video generation based on frame-level signals, such as keyframes, style reference images, sketches, or depth maps. For practical training-free guidance, we propose a simple latent processing method that dramatically reduces memory usage, and apply a novel latent optimization strategy designed for globally coherent video generation. Frame Guidance enables effective control across diverse tasks, including keyframe guidance, stylization, and looping, without any training, compatible with any video models. Experimental results show that Frame Guidance can produce high-quality controlled videos for a wide range of tasks and input signals.
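
The general shape of training-free, frame-level guidance is a gradient step on the latent at each denoising iteration. The sketch below is schematic: `denoiser`, `scheduler`, `decode_frame`, and the loss are all placeholders (assumptions), and the paper's latent processing and optimization strategy differ in the details.

```python
# Schematic guidance step: nudge the latent toward a frame-level target signal.
import torch

@torch.enable_grad()
def guided_step(latents, t, denoiser, scheduler, decode_frame, target, frame_idx, scale=1.0):
    latents = latents.detach().requires_grad_(True)
    noise_pred = denoiser(latents, t)                       # standard denoising prediction
    x0_pred = scheduler.predict_x0(latents, noise_pred, t)  # placeholder: clean-latent estimate
    frame = decode_frame(x0_pred, frame_idx)                # only decode the guided frame(s)
    loss = torch.nn.functional.mse_loss(frame, target)      # e.g. match a keyframe or depth map
    grad = torch.autograd.grad(loss, latents)[0]
    with torch.no_grad():
        latents = latents - scale * grad                    # latent optimization step
        latents = scheduler.step(noise_pred, t, latents)    # placeholder: usual sampler update
    return latents
```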

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.07177
• PDF: https://arxiv.org/pdf/2506.07177
• Project Page: https://frame-guidance-video.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling

🔹 Publication Date: Published on Jun 26, 2025

🔹 Abstract:
A physics-based skinning and rigging framework called PhysRig uses volumetric representation and continuum mechanics for more realistic and physically plausible animations. Skinning and rigging are fundamental components in animation, articulated object reconstruction, motion transfer, and 4D generation. Existing approaches predominantly rely on Linear Blend Skinning (LBS), due to its simplicity and differentiability. However, LBS introduces artifacts such as volume loss and unnatural deformations, and it fails to model elastic materials like soft tissues, fur, and flexible appendages (e.g., elephant trunks, ears, and fatty tissues). In this work, we propose PhysRig: a differentiable physics-based skinning and rigging framework that overcomes these limitations by embedding the rigid skeleton into a volumetric representation (e.g., a tetrahedral mesh), which is simulated as a deformable soft-body structure driven by the animated skeleton. Our method leverages continuum mechanics and discretizes the object as particles embedded in an Eulerian background grid to ensure differentiability with respect to both material properties and skeletal motion. Additionally, we introduce material prototypes, significantly reducing the learning space while maintaining high expressiveness. To evaluate our framework, we construct a comprehensive synthetic dataset using meshes from Objaverse, The Amazing Animals Zoo, and MixaMo, covering diverse object categories and motion patterns. Our method consistently outperforms traditional LBS-based approaches, generating more realistic and physically plausible results. Furthermore, we demonstrate the applicability of our framework in the pose transfer task, highlighting its versatility for articulated object modeling.
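
One ingredient mentioned above, embedding a surface in a volumetric (tetrahedral) representation so that simulating the volume moves the surface, reduces to barycentric interpolation. The sketch below shows that generic step only; it is not PhysRig's continuum-mechanics pipeline.

```python
# Generic tetrahedral embedding via barycentric coordinates.
import numpy as np

def barycentric_coords(p: np.ndarray, tet: np.ndarray) -> np.ndarray:
    """Barycentric coordinates of point p (3,) w.r.t. tet vertices (4, 3)."""
    T = np.column_stack([tet[1] - tet[0], tet[2] - tet[0], tet[3] - tet[0]])  # 3x3
    w = np.linalg.solve(T, p - tet[0])          # weights for vertices 1..3
    return np.concatenate([[1.0 - w.sum()], w])

def deform_point(bary: np.ndarray, deformed_tet: np.ndarray) -> np.ndarray:
    """Reconstruct the embedded point from the deformed tetrahedron."""
    return bary @ deformed_tet                   # (4,) @ (4, 3) -> (3,)

# Each surface vertex stores its tet index and barycentric weights once at rest pose;
# the simulated soft body then drives the surface through deform_point.
```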

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.20936
• PDF: https://arxiv.org/pdf/2506.20936
• Project Page: https://physrig.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
Scaling Test-time Compute for LLM Agents

🔹 Publication Date: Published on Jun 15, 2025

🔹 Abstract:
Systematic exploration of test-time scaling methods in large language agents reveals that computational scaling improves performance, especially through parallel sampling, sequential revision, effective verification, and increased rollout diversity. Scaling test-time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which it improves their effectiveness. Specifically, we explore different test-time scaling strategies, including: (1) parallel sampling algorithms; (2) sequential revision strategies; (3) verifiers and merging methods; (4) strategies for diversifying rollouts. We carefully analyze and ablate the impact of different design strategies on applying test-time scaling on language agents, and have the following findings: 1. Scaling test-time compute could improve the performance of agents. 2. Knowing when to reflect is important for agents. 3. Among different verification and result merging approaches, the list-wise method performs best. 4. Increasing diversified rollouts exerts a positive effect on the agent's task performance.
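
Two of the strategies listed above, parallel sampling plus a list-wise verifier that ranks all rollouts jointly, combine into a simple best-of-N loop. In the sketch, `generate` and `rank_rollouts` stand in for real LLM calls and are assumptions, not the paper's implementation.

```python
# Toy best-of-N with list-wise verification.
import random
from typing import Callable

def best_of_n(task: str,
              generate: Callable[[str, float], str],
              rank_rollouts: Callable[[str, list[str]], list[int]],
              n: int = 8) -> str:
    # 1) Parallel sampling: draw n diverse rollouts (here, via temperature jitter).
    rollouts = [generate(task, 0.7 + 0.1 * random.random()) for _ in range(n)]
    # 2) List-wise verification: rank the whole list at once instead of scoring
    #    each rollout independently, then return the top-ranked one.
    order = rank_rollouts(task, rollouts)
    return rollouts[order[0]]

# Example wiring with your own callables:
# best_of_n("Book a table for two", generate=my_agent_call, rank_rollouts=my_verifier)
```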

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.12928
• PDF: https://arxiv.org/pdf/2506.12928

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
This channel is for programmers, coders, and software engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

https://t.iss.one/addlist/8_rRW2scgfRhOTc0

https://t.iss.one/Codeprogrammer
Article Title:
SymbolicAI: A framework for logic-based approaches combining generative models and solvers

Article Date: 1 Feb 2024

Article Description:
We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. SymbolicAI enables the seamless integration of generative models with a diverse range of solvers by treating large language models (LLMs) as semantic parsers that execute tasks based on both natural and formal language instructions, thus bridging the gap between symbolic reasoning and generative AI. We leverage probabilistic programming principles to tackle complex tasks, and utilize differentiable and classical programming paradigms with their respective strengths. The framework introduces a set of polymorphic, compositional, and self-referential operations for multi-modal data that connects multi-step generative processes and aligns their outputs with user objectives in complex workflows. As a result, we can transition between the capabilities of various foundation models with in-context learning capabilities and specialized, fine-tuned models or solvers proficient in addressing specific problems. Through these operations based on in-context learning our framework enables the creation and evaluation of explainable computational graphs. Finally, we introduce a quality measure and its empirical score for evaluating these computational graphs, and propose a benchmark that compares various state-of-the-art LLMs across a set of complex workflows. We refer to the empirical score as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. The framework codebase and benchmark are linked below.
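
To illustrate the "LLM as semantic parser" idea, here is a tiny Symbol wrapper whose operators expand to natural-language instructions executed by a backend model. This only mirrors the spirit of the framework; the operator semantics below are assumptions, and the real API lives in ExtensityAI/symbolicai.

```python
# Illustrative compositional operations over a generic LLM backend.
from typing import Callable

class Symbol:
    def __init__(self, value: str, backend: Callable[[str], str]):
        self.value, self.backend = value, backend

    def query(self, instruction: str) -> "Symbol":
        prompt = f"{instruction}\n\nInput: {self.value}"
        return Symbol(self.backend(prompt), self.backend)

    def __truediv__(self, criterion: str) -> "Symbol":   # hypothetical: split/filter
        return self.query(f"Split the input by: {criterion}")

    def __or__(self, other: "Symbol") -> "Symbol":        # hypothetical: composition
        return Symbol(self.value + "\n" + other.value, self.backend)

# Usage with any LLM callable, e.g. backend = lambda prompt: my_llm(prompt):
# summary = (Symbol(doc, backend) | Symbol(notes, backend)).query("Summarize in one sentence")
```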

PDF Download Link:
https://arxiv.org/pdf/2402.00854v4.pdf

GitHub:
https://github.com/ExtensityAI/symbolicai
https://github.com/extensityai/benchmark
https://github.com/xpitfire/symbolicai

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Smaller But Better: Unifying Layout Generation with Smaller Large Language Models

Article Date: 19 Feb 2025

Article Description:
We propose LGGPT, an LLM-based model tailored for unified layout generation. First, we propose Arbitrary Layout Instruction (ALI) and Universal Layout Response (ULR) as the uniform I/O template. ALI accommodates arbitrary layout generation task inputs across multiple layout domains, enabling LGGPT to unify both task-generic and domain-generic layout generation hitherto unexplored. Collectively, ALI and ULR boast a succinct structure that forgoes superfluous tokens typically found in existing HTML-based formats, facilitating efficient instruction tuning and boosting unified generation performance. In addition, we propose an Interval Quantization Encoding (IQE) strategy that compresses ALI into a more condensed structure. IQE precisely preserves valid layout clues while eliminating the less informative placeholders, facilitating LGGPT to capture complex and variable layout generation conditions during the unified training process. Experimental results demonstrate that LGGPT achieves superior or on par performance compared to existing methods. Notably, LGGPT strikes a prominent balance between proficiency and efficiency with a compact 1.5B parameter LLM, which beats prior 7B or 175B models even in the most extensive and challenging unified scenario. Furthermore, we underscore the necessity of employing LLMs for unified layout generation and suggest that 1.5B could be an optimal parameter size by comparing LLMs of varying scales. Code is available at https://github.com/NiceRingNode/LGGPT.
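
As a generic illustration of quantizing layout boxes into compact token sequences for an LLM (in the spirit of the instruction templates above), the sketch below bins normalized coordinates and drops padding placeholders. The bin count and token format are assumptions, not LGGPT's exact ALI/ULR or IQE specification.

```python
# Generic coordinate quantization for layout-to-text encoding.
def quantize(v: float, n_bins: int = 128) -> int:
    """Map a normalized coordinate in [0, 1] to a discrete bin index."""
    return min(int(v * n_bins), n_bins - 1)

def encode_layout(elements: list[dict], n_bins: int = 128) -> str:
    """elements: [{'type': 'title', 'box': (x0, y0, x1, y1)}, ...] with normalized boxes."""
    parts = []
    for el in elements:
        bins = [quantize(c, n_bins) for c in el["box"]]
        parts.append(f"{el['type']} {' '.join(map(str, bins))}")
    return " | ".join(parts)          # compact: no HTML tags or padding placeholders

# encode_layout([{"type": "title", "box": (0.1, 0.05, 0.9, 0.12)}])
# -> 'title 12 6 115 15'
```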

PDF Download Link:
https://arxiv.org/pdf/2502.14005v1.pdf

GitHub:
https://github.com/niceringnode/lggpt

Datasets:
• PubLayNet
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science

Article Date: 4 Jun 2025

Article Description:
Contemporary approaches to assisted scientific discovery use language models to automatically generate large numbers of potential hypotheses to test, while also automatically generating code-based experiments to test those hypotheses. While hypotheses can be comparatively inexpensive to generate, automated experiments can be costly, particularly when run at scale (i.e. thousands of experiments). Developing the capacity to filter hypotheses based on their feasibility would allow discovery systems to run at scale, while increasing their likelihood of making significant discoveries. In this work we introduce Matter-of-Fact, a challenge dataset for determining the feasibility of hypotheses framed as claims. Matter-of-Fact includes 8.4k claims extracted from scientific articles spanning four high-impact contemporary materials science topics, including superconductors, semiconductors, batteries, and aerospace materials, while including qualitative and quantitative claims from theoretical, experimental, and code/simulation results. We show that strong baselines that include retrieval augmented generation over scientific literature and code generation fail to exceed 72% performance on this task (chance performance is 50%), while domain-expert verification suggests nearly all are solvable -- highlighting both the difficulty of this task for current models, and the potential to accelerate scientific discovery by making near-term progress.

PDF Download Link:
https://arxiv.org/pdf/2506.04410v1.pdf

GitHub:
https://github.com/cognitiveailab/matter-of-fact

Datasets:
• COVID-Fact
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs

Article Date: 23 Aug 2023

Article Description:
Despite the progress of foundation models, knowledge-based reasoning remains a persistent challenge due to their limited capacity for knowledge recall and inference. Existing methods primarily focus on encouraging these models to plan and solve problems or extensively sample reasoning chains independently. However, these methods often overlook conceptual errors and inferential fallacies, inevitably leading to a series of notorious issues such as misleading conclusions, cognitive biases, and reduced decision quality. While explicit modeling of causality is argued to hold promise in addressing these issues, contemporary research efforts have thus far fallen short in achieving causality-based foundation models. Drawing inspiration from the orchestration of diverse specialized agents collaborating to tackle intricate tasks, we propose a framework named Causal-Consistency Chain-of-Thought (CaCo-CoT) that harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models, involving a set of reasoners and evaluators. These agents collaboratively work within a reasoning-and-consensus paradigm to improve faithfulness. The reasoners are tasked with generating reasoning chains for knowledge-intensive problems by mimicking human causal reasoning. Meanwhile, the evaluator scrutinizes the causal consistency of a reasoner's reasoning chain from a non-causal and a counterfactual perspective. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations across text-based and multi-modal knowledge reasoning tasks (e.g., science question answering and commonsense reasoning).
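
A schematic reasoner/evaluator loop in the spirit of CaCo-CoT might look like the sketch below. The two roles are plain LLM callables here, and the prompts and consensus rule are assumptions rather than the paper's exact protocol.

```python
# Schematic multi-agent reasoning with a consistency-checking evaluator.
from collections import Counter
from typing import Callable

def caco_cot_like(question: str,
                  reason: Callable[[str], tuple[str, str]],   # -> (chain, answer)
                  evaluate: Callable[[str, str], bool],       # causal-consistency check
                  n_reasoners: int = 3) -> str:
    accepted = []
    for _ in range(n_reasoners):
        chain, answer = reason(question)
        # The evaluator inspects the chain (e.g. from a non-causal / counterfactual
        # angle) and rejects chains whose reasoning is not causally consistent.
        if evaluate(question, chain):
            accepted.append(answer)
    if not accepted:  # fall back to plain majority vote over fresh rollouts
        accepted = [reason(question)[1] for _ in range(n_reasoners)]
    return Counter(accepted).most_common(1)[0][0]
```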

PDF Download Link:
https://arxiv.org/pdf/2308.11914v4.pdf

GitHub:
https://github.com/hcplab-sysu/causalvlr
https://github.com/hcplab-sysu/causal-vlreasoning

Datasets:
• BoolQ
• ScienceQA
• Com2Sense
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models

🔹 Publication Date: Published on Jun 24, 2025

🔹 Abstract:
Outlier-Safe Pre-Training improves large language model quantization performance by preventing extreme activation outliers through innovative training techniques. Extreme activation outliers in Large Language Models (LLMs) critically degrade quantization performance, hindering efficient on-device deployment. While channel-wise operations and adaptive gradient scaling are recognized causes, practical mitigation remains challenging. We introduce Outlier-Safe Pre-Training (OSP), a practical guideline that proactively prevents outlier formation rather than relying on post-hoc mitigation. OSP combines three key innovations: (1) the Muon optimizer, eliminating privileged bases while maintaining training efficiency; (2) Single-Scale RMSNorm, preventing channel-wise amplification; and (3) a learnable embedding projection, redistributing activation magnitudes originating from embedding matrices. We validate OSP by training a 1.4B-parameter model on 1 trillion tokens, which is the first production-scale LLM trained without such outliers. Under aggressive 4-bit quantization, our OSP model achieves a 35.7 average score across 10 benchmarks (compared to 26.5 for an Adam-trained model), with only a 2% training overhead. Remarkably, OSP models exhibit near-zero excess kurtosis (0.04) compared to extreme values (1818.56) in standard models, fundamentally altering LLM quantization behavior. Our work demonstrates that outliers are not inherent to LLMs but are consequences of training strategies, paving the way for more efficient LLM deployment. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Outlier-Safe-Pre-Training.
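
To see what a "single-scale" normalization could mean in contrast to standard RMSNorm, the sketch below replaces the per-channel gain vector with one shared learnable scalar, so no individual channel can be amplified. Treat this as an illustration under that assumption; the exact formulation is defined in the OSP paper and code.

```python
# Standard RMSNorm vs. a single-shared-gain variant (illustrative).
import torch
import torch.nn as nn

class RMSNorm(nn.Module):                       # standard: per-channel gain
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight, self.eps = nn.Parameter(torch.ones(dim)), eps
    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

class SingleScaleRMSNorm(nn.Module):            # one shared gain for all channels
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.weight, self.eps = nn.Parameter(torch.ones(())), eps
    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight
```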

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.19697
• Summary: https://arxivexplained.com/papers/outlier-safe-pre-training-for-robust-4-bit-quantization-of-large-language-models
• PDF: https://arxiv.org/pdf/2506.19697
• Hugging Face Papers search: https://huggingface.co/papers?q=learnable%20embedding%20projection
• Github: https://github.com/dmis-lab/Outlier-Safe-Pre-Training

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games

Article Date: 5 Jun 2025

Article Description:
LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are inherently asynchronous. For example, in group chats, online team meetings, or social games, there is no inherent notion of turns; therefore, the decision of when to speak forms a crucial part of the participant's decision making. In this work, we develop an adaptive asynchronous LLM-agent which, in addition to determining what to say, also decides when to say it. To evaluate our agent, we collect a unique dataset of online Mafia games, including both human participants, as well as our asynchronous agent. Overall, our agent performs on par with human players, both in game performance, as well as in its ability to blend in with the other human players. Our analysis shows that the agent's behavior in deciding when to speak closely mirrors human patterns, although differences emerge in message content. We release all our data and code to support and encourage further research for more realistic asynchronous communication between LLM agents. This work paves the way for integration of LLMs into realistic human group settings, from assistance in team discussions to educational and professional environments where complex social dynamics must be navigated.
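
The asynchronous "when to speak" decision sits on top of the usual "what to say" generation; a minimal loop could look like the sketch below. The scoring function, threshold, and polling interval are assumptions, not the paper's mechanism.

```python
# Toy asynchronous agent loop: decide whether to talk before deciding what to say.
import time
from typing import Callable

def async_agent_loop(get_new_messages: Callable[[], list[str]],
                     should_speak: Callable[[list[str]], float],  # LLM-scored urgency in [0, 1]
                     speak: Callable[[list[str]], str],
                     post: Callable[[str], None],
                     threshold: float = 0.6,
                     poll_seconds: float = 2.0) -> None:
    history: list[str] = []
    while True:
        history.extend(get_new_messages())
        # "When to speak": only generate and post if the urgency score clears the threshold.
        if history and should_speak(history) >= threshold:
            post(speak(history))
        time.sleep(poll_seconds)
```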

PDF Download Link:
https://arxiv.org/pdf/2506.05309v1.pdf

GitHub:
https://github.com/niveck/LLMafia

Datasets:
• LLMafia
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Article Date: 28 May 2025

Article Description:
Audio-driven human animation methods, such as talking head and talking body generation, have made remarkable progress in generating synchronized facial movements and appealing visual quality videos. However, existing methods primarily focus on single human animation and struggle with multi-stream audio inputs, facing incorrect binding problems between audio and persons. Additionally, they exhibit limitations in instruction-following capabilities. To solve this problem, in this paper, we propose a novel task: Multi-Person Conversational Video Generation, and introduce a new framework, MultiTalk, to address the challenges during multi-person generation. Specifically, for audio injection, we investigate several schemes and propose the Label Rotary Position Embedding (L-RoPE) method to resolve the audio and person binding problem. Furthermore, during training, we observe that partial parameter training and multi-task training are crucial for preserving the instruction-following ability of the base model. MultiTalk achieves superior performance compared to other methods on several datasets, including talking head, talking body, and multi-person datasets, demonstrating the powerful generation capabilities of our approach.
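
A simplified way to picture a label-aware rotary embedding: tokens from the same person or audio stream share a label offset, so their queries and keys rotate consistently and bind to each other in attention. The offset scheme below is an assumption for illustration, not the L-RoPE definition from the paper.

```python
# Simplified rotary embedding with a per-token label offset (illustrative only).
import torch

def rotary(x: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    """x: (..., d) with even d; pos: angle source broadcastable to x[..., 0]."""
    d = x.shape[-1]
    freqs = 10000 ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)   # (d/2,)
    ang = pos[..., None] * freqs                                    # (..., d/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Rotate each 2-D pair; applying the same layout to queries and keys keeps
    # their dot products consistent with relative position.
    return torch.cat([x1 * ang.cos() - x2 * ang.sin(),
                      x1 * ang.sin() + x2 * ang.cos()], dim=-1)

def label_rope(x: torch.Tensor, pos: torch.Tensor, label: torch.Tensor, offset: float = 1000.0):
    """Shift the rotary 'position' by a per-token label so streams stay separable."""
    return rotary(x, pos + offset * label)
```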

PDF Download Link:
https://arxiv.org/pdf/2505.22647v1.pdf

GitHub:
https://github.com/meigen-ai/multitalk

Datasets:
• CelebV-HQ
==================================

For more data science resources:

https://t.iss.one/DataScienceT
🔹 Title:
SAFE: Multitask Failure Detection for Vision-Language-Action Models

🔹 Publication Date: Published on Jun 11, 2025

🔹 Abstract:
SAFE is a failure detector for vision-language-action models that generalizes to unseen tasks by learning from high-level internal features of the models. While vision-language-action models (VLAs) have shown promising robotic behaviors across a diverse set of manipulation tasks, they achieve limited success rates when deployed on novel tasks out-of-the-box. To allow these policies to safely interact with their environments, we need a failure detector that gives a timely alert such that the robot can stop, backtrack, or ask for help. However, existing failure detectors are trained and tested only on one or a few specific tasks, while VLAs require the detector to generalize and detect failures also in unseen tasks and novel environments. In this paper, we introduce the multitask failure detection problem and propose SAFE, a failure detector for generalist robot policies such as VLAs. We analyze the VLA feature space and find that VLAs have sufficient high-level knowledge about task success and failure, which is generic across different tasks. Based on this insight, we design SAFE to learn from VLA internal features and predict a single scalar indicating the likelihood of task failure. SAFE is trained on both successful and failed rollouts, and is evaluated on unseen tasks. SAFE is compatible with different policy architectures. We test it on OpenVLA, pi_0, and pi_0-FAST in both simulated and real-world environments extensively. We compare SAFE with diverse baselines and show that SAFE achieves state-of-the-art failure detection performance and the best trade-off between accuracy and detection time using conformal prediction. More qualitative results can be found at https://vla-safe.github.io/.
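
The core recipe described above, a small probe mapping a policy's internal features to a scalar failure score plus a calibrated threshold for timely alerts, can be sketched as follows. Feature extraction from the VLA is abstracted away, and the probe architecture and calibration rule are assumptions.

```python
# Illustrative failure probe + conformal-style threshold calibration.
import numpy as np
import torch
import torch.nn as nn

class FailureProbe(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, feats: torch.Tensor) -> torch.Tensor:   # (B, feat_dim) -> (B,)
        return self.net(feats).squeeze(-1)

def conformal_threshold(cal_scores_success: np.ndarray, alpha: float = 0.1) -> float:
    """Pick a threshold so that at most ~alpha of successful calibration rollouts
    would be wrongly flagged as failures."""
    n = len(cal_scores_success)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(cal_scores_success, q))

# At run time: raise an alert if probe(features_t) > threshold at any step t of a rollout.
```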

🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.09937
• PDF: https://arxiv.org/pdf/2506.09937
• Project Page: https://vla-safe.github.io/

🔹 Datasets citing this paper:
No datasets found

🔹 Spaces citing this paper:
No spaces found
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Not Available (the paper introduces DLNet, a direction-aware lane detection framework)

Article Date: Not Available

Article Description:
The rapid advancement of autonomous driving systems has created a pressing need for accurate and robust lane detection to ensure driving safety and reliability. However, lane detection still faces several critical challenges in real-world scenarios: (1) severe occlusions caused by urban traffic and complex road layouts; (2) the difficulty of handling sharp curves and large curvature variations; and (3) varying lighting conditions that blur or degrade lane markings. To address these challenges, we propose DLNet, a novel direction-aware feature integration framework that integrates both low-level geometric details and high-level semantic cues. In particular, the approach includes:
(i) a Multi-Skip Feature Attention Block (MSFAB) to refine local lane features by adaptively fusing multi-scale representations,
(ii) a Context-Aware Feature Pyramid Network (CAFPN) to enhance global context modeling under adverse conditions, and
(iii) a Directional Lane IoU (DLIoU) loss function that explicitly encodes lane directionality and curvature, providing more accurate lane overlap estimation. Extensive experiments conducted on two benchmark datasets, CULane and CurveLanes, show that DLNet achieves new state-of-the-art results, with F1@50 and F1@75 scores of 81.23% and 64.75% on CULane, an F1@50 score of 86.51% on CurveLanes, and a high F1 score of 97.62 on the TUSimple dataset. The source code and pretrained models will be made publicly available at
https://github.com/RDXiaoLu/DLNet.git.
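
For context, the sketch below shows a generic row-wise lane IoU, the quantity that direction- and curvature-aware losses such as DLIoU build on: each lane is a set of x-coordinates sampled at fixed image rows, widened into intervals before computing intersection-over-union. DLIoU's directional terms are not reproduced here.

```python
# Generic row-wise lane IoU (not the DLIoU loss itself).
import numpy as np

def lane_iou(xs_pred: np.ndarray, xs_gt: np.ndarray, width: float = 7.5) -> float:
    """xs_pred, xs_gt: x-coordinates (pixels) at the same row positions; NaN = no point."""
    valid = ~np.isnan(xs_pred) & ~np.isnan(xs_gt)
    if not valid.any():
        return 0.0
    lo = np.maximum(xs_pred[valid] - width, xs_gt[valid] - width)
    hi = np.minimum(xs_pred[valid] + width, xs_gt[valid] + width)
    inter = np.clip(hi - lo, 0.0, None)
    union = 4 * width - inter      # two 2*width intervals minus their overlap
    return float(inter.sum() / union.sum())
```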

PDF Download Link:
Not Available

GitHub:
https://github.com/RDXiaoLu/DLNet
https://github.com/RDXiaoLu/DLNet.git
https://github.com/RDXiaoLu/DLNet/tree/main

Datasets:
• CULane
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding

Article Date: 9 Jun 2025

Article Description:
Large Language Models (LLMs) require alignment with human preferences to avoid generating offensive, false, or meaningless content. Recently, low-resource methods for LLM alignment have been popular, while still facing challenges in obtaining both high-quality and aligned content. Motivated by the observation that the difficulty of generating aligned responses is concentrated at the beginning of decoding, we propose a novel framework, Weak-to-Strong Decoding (WSD), to enhance the alignment ability of base models by the guidance of a small aligned model. The small model first drafts well-aligned beginnings, followed by the large base model to continue the rest, controlled by a well-designed auto-switch mechanism. We also collect a new dataset, GenerAlign, to fine-tune a small-sized Pilot-3B as the draft model, which effectively enhances different base models under the WSD framework to outperform all baseline methods, while avoiding degradation on downstream tasks, termed as the alignment tax. Extensive experiments are further conducted to examine the impact of different settings and time efficiency, as well as analyses on the intrinsic mechanisms of WSD in depth.
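
The draft-then-continue idea with an auto-switch can be sketched model-agnostically. Both models are plain callables here, and the switch criterion (a fixed draft budget plus a sentence boundary) is an assumption standing in for the paper's mechanism.

```python
# Schematic weak-to-strong decoding: small aligned model drafts the beginning,
# large base model finishes once a simple switch condition fires.
from typing import Callable

def weak_to_strong_decode(prompt: str,
                          draft_step: Callable[[str], str],     # small aligned model, one token
                          continue_gen: Callable[[str], str],   # large base model, finishes text
                          max_draft_tokens: int = 48) -> str:
    text = prompt
    for i in range(max_draft_tokens):
        text += draft_step(text)
        # Auto-switch (illustrative rule): once the aligned beginning is long enough
        # and ends a sentence, hand the continuation over to the stronger base model.
        if i > 8 and text.rstrip().endswith((".", "!", "?")):
            break
    return continue_gen(text)
```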

PDF Download Link:
https://arxiv.org/pdf/2506.07434v1.pdf

GitHub:
https://github.com/F2-Song/Weak-to-Strong-Decoding

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training

Article Date: 30 Sep 2022

Article Description:
Recent Vision-Language Pre-trained (VLP) models based on dual encoder have attracted extensive attention from academia and industry due to their superior performance on various cross-modal tasks and high computational efficiency. They attempt to learn cross-modal representation using contrastive learning on image-text pairs, however, the built inter-modal correlations only rely on a single view for each modality. Actually, an image or a text contains various potential views, just as humans could capture a real-world scene via diverse descriptions or photos. In this paper, we propose ERNIE-ViL 2.0, a Multi-View Contrastive learning framework to build intra-modal and inter-modal correlations between diverse views simultaneously, aiming at learning a more robust cross-modal representation. Specifically, we construct multiple views within each modality to learn the intra-modal correlation for enhancing the single-modal representation. Besides the inherent visual/textual views, we construct sequences of object tags as a special textual view to narrow the cross-modal semantic gap on noisy image-text pairs. Pre-trained with 29M publicly available datasets, ERNIE-ViL 2.0 achieves competitive results on English cross-modal retrieval. Additionally, to generalize our method to Chinese cross-modal tasks, we train ERNIE-ViL 2.0 through scaling up the pre-training datasets to 1.5B Chinese image-text pairs, resulting in significant improvements compared to previous SOTA results on Chinese cross-modal retrieval. We release our pre-trained models in https://github.com/PaddlePaddle/ERNIE.
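
A minimal sketch of multi-view contrastive training: besides the usual image-text pair, extra views (for example a second caption, or an object-tag sequence treated as a textual view) are pulled together with a symmetric InfoNCE loss over all view pairs. The encoders are placeholders; this is the generic loss, not ERNIE-ViL 2.0's training code.

```python
# Symmetric InfoNCE over every pair of views (illustrative).
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def multi_view_loss(views: dict[str, torch.Tensor]) -> torch.Tensor:
    """views: name -> (B, D) embeddings, e.g. {'image', 'caption', 'object_tags'}."""
    names = list(views)
    pairs = [(i, j) for i in range(len(names)) for j in range(i + 1, len(names))]
    return sum(info_nce(views[names[i]], views[names[j]]) for i, j in pairs) / len(pairs)
```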

PDF Download Link:
https://arxiv.org/pdf/2209.15270v1.pdf

GitHub:
https://github.com/PaddlePaddle/ERNIE

Datasets:
• COCO (Common Objects in Context)
• Flickr30k
• CC12M
• COCO-CN
==================================

For more data science resources:

https://t.iss.one/DataScienceT