The Data Science and Python channel is for researchers and advanced programmers

Article Title:
Zero-Shot Vision Encoder Grafting via LLM Surrogates

Article Date: 28 May 2025

Article Description:
Vision language models (VLMs) typically pair a modestly sized vision encoder with a large language model (LLM), e.g., Llama-70B, making the decoder the primary computational burden during training. To reduce costs, a promising strategy is to first train the vision encoder using a small language model before transferring it to the large one. We construct small "surrogate models" that share the same embedding space and representation language as the large target LLM by directly inheriting its shallow layers. Vision encoders trained on the surrogate can then be directly transferred to the larger model, a process we call zero-shot grafting: when plugged directly into the full-size target LLM, the grafted pair surpasses the encoder-surrogate pair and, on some benchmarks, even performs on par with full decoder training with the target LLM. Furthermore, our surrogate training approach reduces overall VLM training costs by ~45% when using Llama-70B as the decoder.
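Below is a minimal PyTorch sketch of the surrogate idea, assuming a toy decoder: the surrogate inherits the target LLM's embedding table, first K transformer blocks, and output head, so a vision encoder trained against it already speaks the target's embedding language. All module names and sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ToyLLM(nn.Module):
    def __init__(self, vocab=1000, dim=64, n_layers=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(dim, vocab)

    def forward(self, x):
        h = self.embed(x)
        for blk in self.blocks:
            h = blk(h)
        return self.head(h)

def make_surrogate(target: ToyLLM, k: int) -> ToyLLM:
    """Surrogate shares the target's embeddings, first k blocks, and head."""
    surrogate = ToyLLM(n_layers=k)
    surrogate.embed = target.embed              # shared, not copied
    surrogate.blocks = nn.ModuleList(target.blocks[:k])
    surrogate.head = target.head
    return surrogate

target = ToyLLM(n_layers=8)
surrogate = make_surrogate(target, k=2)         # cheap decoder for encoder training

# A vision encoder would be trained to emit embeddings the surrogate accepts;
# because the embedding space is inherited from the target, the same encoder
# outputs can later be fed to `target` directly ("zero-shot grafting").
```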

PDF Download Link:
https://arxiv.org/pdf/2505.22664v1.pdf

GitHub:
https://github.com/facebookresearch/zero

Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment

Article Date: 30 Mar 2021

Article Description:
Alphas are stock prediction models capturing trading signals in a stock market. A set of effective alphas can generate weakly correlated high returns to diversify the risk. Existing alphas can be categorized into two classes: Formulaic alphas are simple algebraic expressions of scalar features, and thus can generalize well and be mined into a weakly correlated set. Machine learning alphas are data-driven models over vector and matrix features. They are more predictive than formulaic alphas, but are too complex to mine into a weakly correlated set. In this paper, we introduce a new class of alphas to model scalar, vector, and matrix features which possess the strengths of these two existing classes. The new alphas predict returns with high accuracy and can be mined into a weakly correlated set. In addition, we propose a novel alpha mining framework based on AutoML, called AlphaEvolve, to generate the new alphas. To this end, we first propose operators for generating the new alphas and selectively injecting relational domain knowledge to model the relations between stocks. We then accelerate the alpha mining by proposing a pruning technique for redundant alphas. Experiments show that AlphaEvolve can evolve initial alphas into the new alphas with high returns and weak correlations.
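To make the "formulaic alpha" and "weakly correlated set" terminology concrete, here is a small NumPy illustration (not the AlphaEvolve framework itself): two hand-written formulaic alphas over a placeholder price matrix, and the cross-sectional correlation one would check when mining a diversified set.

```python
import numpy as np

rng = np.random.default_rng(0)
# 60 trading days x 50 stocks of synthetic prices, purely as placeholder data.
prices = rng.lognormal(mean=0.0, sigma=0.02, size=(60, 50)).cumprod(axis=0)

def alpha_reversal(p, lookback=5):
    """Formulaic alpha: negative recent return (short-term reversal signal)."""
    return -(p[-1] / p[-1 - lookback] - 1.0)

def alpha_momentum(p, lookback=20):
    """Formulaic alpha: longer-horizon return (momentum signal)."""
    return p[-1] / p[-1 - lookback] - 1.0

a1 = alpha_reversal(prices)
a2 = alpha_momentum(prices)

# A useful alpha set keeps correlations between alphas low so risk diversifies.
corr = np.corrcoef(a1, a2)[0, 1]
print(f"cross-sectional correlation between alphas: {corr:+.3f}")
```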

PDF Download Link:
https://arxiv.org/pdf/2103.16196v2.pdf

GitHub:
https://github.com/codelion/openevolve

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models

Article Date: 16 May 2025

Article Description:
We propose the first unified adversarial attack benchmark for Genomic Foundation Models (GFMs), named GenoArmory. Unlike existing GFM benchmarks, GenoArmory offers the first comprehensive evaluation framework to systematically assess the vulnerability of GFMs to adversarial attacks. Methodologically, we evaluate the adversarial robustness of five state-of-the-art GFMs using four widely adopted attack algorithms and three defense strategies. Importantly, our benchmark provides an accessible and comprehensive framework to analyze GFM vulnerabilities with respect to model architecture, quantization schemes, and training datasets. Additionally, we introduce GenoAdv, a new adversarial sample dataset designed to improve GFM safety. Empirically, classification models exhibit greater robustness to adversarial perturbations compared to generative models, highlighting the impact of task type on model vulnerability. Moreover, adversarial attacks frequently target biologically significant genomic regions, suggesting that these models effectively capture meaningful sequence features.
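As a concrete illustration of the attack family such a benchmark evaluates, here is a hedged sketch of a greedy single-nucleotide substitution attack. `score_fn` stands in for a genomic foundation model's true-class probability; the toy GC-content scorer at the bottom is purely illustrative and not part of GenoArmory.

```python
BASES = "ACGT"

def greedy_substitution_attack(seq, score_fn, budget=3):
    """Greedily substitute up to `budget` nucleotides to lower the true-class score."""
    seq = list(seq)
    for _ in range(budget):
        current = score_fn("".join(seq))
        best = None  # (new_score, position, base)
        for i, orig in enumerate(seq):
            for b in BASES:
                if b == orig:
                    continue
                seq[i] = b
                s = score_fn("".join(seq))
                if best is None or s < best[0]:
                    best = (s, i, b)
                seq[i] = orig
        if best is None or best[0] >= current:
            break  # no single substitution lowers the score further
        _, i, b = best
        seq[i] = b
    return "".join(seq)

# Toy scorer: the "true class" likes GC-rich sequences, so the attack removes G/C.
toy_score = lambda s: (s.count("G") + s.count("C")) / len(s)
adv = greedy_substitution_attack("ACGTGGCCAT", toy_score, budget=2)
print(adv, toy_score(adv))
```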

PDF Download Link:
https://arxiv.org/pdf/2505.10983v1.pdf

GitHub:
https://github.com/MAGICS-LAB/GenoArmory

Datasets:
• GenoAdv
• GUE
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Article Date: 6 May 2025

Article Description:
With the growing requirement for natural human-computer interaction, speech-based systems receive increasing attention as speech is one of the most common forms of daily communication. However, existing speech models still experience high latency when generating the first audio token during streaming, which poses a significant bottleneck for deployment. To address this issue, we propose VITA-Audio, an end-to-end large speech model with fast audio-text token generation. Specifically, we introduce a lightweight Multiple Cross-modal Token Prediction (MCTP) module that efficiently generates multiple audio tokens within a single model forward pass, which not only accelerates inference but also significantly reduces the latency for generating the first audio token in streaming scenarios. In addition, a four-stage progressive training strategy is explored to achieve model acceleration with minimal loss of speech quality. To our knowledge, VITA-Audio is the first multi-modal large language model capable of generating audio output during the first forward pass, enabling real-time conversational capabilities with minimal latency. VITA-Audio is fully reproducible and is trained on open-source data only. Experimental results demonstrate that our model not only achieves an inference speedup of 3-5x at the 7B parameter scale but also significantly outperforms open-source models of similar size on multiple benchmarks for automatic speech recognition (ASR), text-to-speech (TTS), and spoken question answering (SQA) tasks.
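The key efficiency idea is that one decoder forward pass yields several audio tokens. The sketch below shows one plausible way to realize that with a single projection head; the shapes and the single-linear-head design are assumptions for illustration, not the paper's exact MCTP module.

```python
import torch
import torch.nn as nn

class MultiTokenPredictionHead(nn.Module):
    def __init__(self, hidden_dim=512, audio_vocab=4096, n_future=4):
        super().__init__()
        self.n_future = n_future
        self.audio_vocab = audio_vocab
        # One projection that emits logits for n_future audio tokens at once.
        self.proj = nn.Linear(hidden_dim, n_future * audio_vocab)

    def forward(self, hidden):                      # hidden: (batch, hidden_dim)
        logits = self.proj(hidden)                  # (batch, n_future * vocab)
        return logits.view(-1, self.n_future, self.audio_vocab)

head = MultiTokenPredictionHead()
hidden = torch.randn(2, 512)                        # last hidden state of the LLM
audio_tokens = head(hidden).argmax(dim=-1)          # 4 audio tokens per forward pass
print(audio_tokens.shape)                           # torch.Size([2, 4])
```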

PDF Download Link:
https://arxiv.org/pdf/2505.03739v1.pdf

GitHub:
https://github.com/vita-mllm/vita-audio

Datasets:
• LibriSpeech
• TriviaQA
• LibriTTS
• AISHELL-1
• FLEURS
• VoxPopuli
• LIMA
• GigaSpeech
• Multilingual LibriSpeech
• AISHELL-2
• WenetSpeech
• MathInstruct
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video

Article Date: 27 Mar 2025

Article Description:
This paper presents a unified approach to understanding dynamic scenes from casual videos. Large pretrained vision foundation models, such as vision-language, video depth prediction, motion tracking, and segmentation models, offer promising capabilities. However, training a single model for comprehensive 4D understanding remains challenging. We introduce Uni4D, a multi-stage optimization framework that harnesses multiple pretrained models to advance dynamic 3D modeling, including static/dynamic reconstruction, camera pose estimation, and dense 3D motion tracking. Our results show state-of-the-art performance in dynamic 4D modeling with superior visual quality. Notably, Uni4D requires no retraining or fine-tuning, highlighting the effectiveness of repurposing visual foundation models for 4D understanding.

PDF Download Link:
https://arxiv.org/pdf/2503.21761v1.pdf

GitHub:
https://github.com/Davidyao99/uni4d

Datasets:
• KITTI
• DAVIS
• TUM RGB-D
• MPI Sintel
• Bonn RGB-D Dynamic
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers

Article Date: 27 Apr 2025

Article Description:
Hallucinations are a persistent problem with Large Language Models (LLMs). As these models become increasingly used in high-stakes domains, such as healthcare and finance, the need for effective hallucination detection is crucial. To this end, we propose a versatile framework for zero-resource hallucination detection that practitioners can apply to real-world use cases. To achieve this, we adapt a variety of existing uncertainty quantification (UQ) techniques, including black-box UQ, white-box UQ, and LLM-as-a-Judge, transforming them as necessary into standardized response-level confidence scores ranging from 0 to 1. To enhance flexibility, we introduce a tunable ensemble approach that incorporates any combination of the individual confidence scores. This approach enables practitioners to optimize the ensemble for a specific use case for improved performance. To streamline implementation, the full suite of scorers is offered in this paper's companion Python toolkit, UQLM. To evaluate the performance of the various scorers, we conduct an extensive set of experiments using several LLM question-answering benchmarks. We find that our tunable ensemble typically surpasses its individual components and outperforms existing hallucination detection methods. Our results demonstrate the benefits of customized hallucination detection strategies for improving the accuracy and reliability of LLMs.
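The tunable ensemble reduces to a weighted combination of per-scorer confidence values in [0, 1]. The snippet below sketches that arithmetic generically; it is not the UQLM toolkit's API, and the scorer names are made up for illustration.

```python
import numpy as np

def ensemble_confidence(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of individual confidence scorers; weights are renormalized."""
    keys = list(scores)
    w = np.array([weights.get(k, 0.0) for k in keys], dtype=float)
    if w.sum() == 0:
        raise ValueError("at least one scorer must have nonzero weight")
    w /= w.sum()
    s = np.array([scores[k] for k in keys], dtype=float)
    return float(np.clip(s, 0.0, 1.0) @ w)

# Hypothetical scorer outputs for one LLM response and a tuned weighting.
conf = ensemble_confidence(
    scores={"black_box_consistency": 0.72, "token_probability": 0.65, "llm_judge": 0.90},
    weights={"black_box_consistency": 0.4, "token_probability": 0.3, "llm_judge": 0.3},
)
print(f"ensemble confidence: {conf:.3f}")   # a tunable 0-1 hallucination-risk proxy
```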

PDF Download Link:
https://arxiv.org/pdf/2504.19254v2.pdf

GitHub:
https://github.com/cvs-health/uqlm

Datasets:
• GSM8K
• SVAMP
• PopQA
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
s3: You Don't Need That Much Data to Train a Search Agent via RL

Article Date: 20 May 2025

Article Description:
Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility, or fine-tune the entire LLM to jointly reason and retrieve, entangling retrieval with generation and limiting real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
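The Gain Beyond RAG reward is simply the improvement in a generation-quality metric over the naive-RAG baseline, as in this sketch (the accuracy numbers are hypothetical):

```python
def gain_beyond_rag(accuracy_with_searcher: float, accuracy_naive_rag: float) -> float:
    """Reward = improvement in generation accuracy over the naive-RAG baseline."""
    return accuracy_with_searcher - accuracy_naive_rag

# Example: the frozen generator answers correctly 62% of the time with the
# trained searcher's context vs. 55% with naive top-k retrieval.
print(gain_beyond_rag(0.62, 0.55))   # 0.07 -> positive reward for the searcher
```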

PDF Download Link:
https://arxiv.org/pdf/2505.14146v1.pdf

GitHub:
https://github.com/pat-jj/s3

Datasets:
• Natural Questions
• TriviaQA
• HotpotQA
• MedQA
• PubMedQA
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Telegram update: job seekers can now advertise their expertise and job opportunities directly, without having to go through the channel owner.

Article Title:
Vision as LoRA

Article Date: 26 Mar 2025

Article Description:
We introduce Vision as LoRA (VoRA), a novel paradigm for transforming an LLM into an MLLM. Unlike prevalent MLLM architectures that rely on external vision modules for vision encoding, VoRA internalizes visual capabilities by integrating vision-specific LoRA layers directly into the LLM. This design allows the added parameters to be seamlessly merged into the LLM during inference, eliminating structural complexity and minimizing computational overhead. Moreover, inheriting the LLM's ability to handle flexible context, VoRA can process inputs at arbitrary resolutions. To further strengthen VoRA's visual capabilities, we introduce a block-wise distillation method that transfers visual priors from a pre-trained ViT into the LoRA layers, effectively accelerating training by injecting visual knowledge. Additionally, we apply bi-directional attention masks to better capture the context information of an image. We successfully demonstrate that with additional pre-training data, VoRA can perform comparably with conventional encoder-based MLLMs. All training data, code, and model weights will be released at https://github.com/Hon-Wong/VoRA.
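The claim that VoRA's extra parameters add no inference overhead follows from the standard LoRA merge identity: a low-rank update can be folded into the frozen weight after training. A small PyTorch check of that identity (toy shapes, not VoRA's actual layers):

```python
import torch

d_out, d_in, rank = 64, 64, 8
W = torch.randn(d_out, d_in)                 # frozen base weight
A = torch.randn(rank, d_in) * 0.01           # trained LoRA factors
B = torch.randn(d_out, rank) * 0.01
scale = 1.0

x = torch.randn(5, d_in)
y_lora = x @ W.T + scale * (x @ A.T @ B.T)   # LoRA path kept separate during training
W_merged = W + scale * (B @ A)               # merge once after training
y_merged = x @ W_merged.T                    # plain linear layer at inference

print(torch.allclose(y_lora, y_merged, atol=1e-5))   # True: no structural overhead remains
```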

PDF Download Link:
https://arxiv.org/pdf/2503.20680v1.pdf

GitHub:
https://github.com/hon-wong/vora

Datasets:
• MM-Vet
• Google Landmarks Dataset v2
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
Harnessing the Universal Geometry of Embeddings

Article Date: 18 May 2025

Article Description:
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity across model pairs with different architectures, parameter counts, and training datasets. The ability to translate unknown embeddings into a different space while preserving their geometry has serious implications for the security of vector databases. An adversary with access only to embedding vectors can extract sensitive information about the underlying documents, sufficient for classification and attribute inference.
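The reported metric is cosine similarity between translated embeddings and the target model's own embeddings of the same texts. The sketch below computes that metric with a random placeholder mapping and random vectors; the paper's actual translator is learned without paired data, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(100, 384))          # embeddings of 100 texts from model A
emb_b = rng.normal(size=(100, 768))          # embeddings of the same texts from model B
translator = rng.normal(size=(384, 768)) / np.sqrt(384)   # placeholder linear map

def mean_cosine(x, y):
    """Average cosine similarity between row-aligned embedding matrices."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return float(np.mean(np.sum(x * y, axis=1)))

translated = emb_a @ translator
print(f"mean cosine similarity: {mean_cosine(translated, emb_b):.3f}")   # ~0 for a random map
```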

PDF Download Link:
https://arxiv.org/pdf/2505.12540v2.pdf

GitHub:
https://github.com/rjha18/vec2vec
https://github.com/zhaoolee/garss

Datasets:
• Natural Questions
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
MTGS: Multi-Traversal Gaussian Splatting

Article Date: 16 Mar 2025

Article Description:
Multi-traversal data, commonly collected through daily commutes or by self-driving fleets, provides multiple viewpoints for scene reconstruction within a road block. This data offers significant potential for high-quality novel view synthesis, which is crucial for applications such as autonomous vehicle simulators. However, inherent challenges in multi-traversal data often result in suboptimal reconstruction quality, including variations in appearance and the presence of dynamic objects. To address these issues, we propose Multi-Traversal Gaussian Splatting (MTGS), a novel approach that reconstructs high-quality driving scenes from arbitrarily collected multi-traversal data by modeling a shared static geometry while separately handling dynamic elements and appearance variations. Our method employs a multi-traversal dynamic scene graph with a shared static node and traversal-specific dynamic nodes, complemented by color correction nodes with learnable spherical harmonics coefficient residuals. This approach enables high-fidelity novel view synthesis and provides the flexibility to navigate any viewpoint. We conduct extensive experiments on a large-scale driving dataset, nuPlan, with multi-traversal data. Our results demonstrate that MTGS improves LPIPS by 23.5% and geometry accuracy by 46.3% compared to single-traversal baselines. The code and data will be made available to the public.
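A structural sketch of the multi-traversal scene graph described above: one shared static node plus per-traversal dynamic nodes and color-correction residuals. Field names and contents are assumptions for illustration, not the released implementation.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class GaussianNode:
    means: np.ndarray        # (N, 3) Gaussian centers
    sh_coeffs: np.ndarray    # (N, K) spherical-harmonics color coefficients

@dataclass
class TraversalNode:
    dynamic: GaussianNode    # objects specific to this traversal
    color_residual: np.ndarray   # learnable SH residual correcting appearance shifts

@dataclass
class MultiTraversalSceneGraph:
    static: GaussianNode                                  # geometry shared by all traversals
    traversals: dict[str, TraversalNode] = field(default_factory=dict)

    def render_inputs(self, traversal_id: str):
        """Gather the Gaussians and appearance correction used to render one traversal."""
        t = self.traversals[traversal_id]
        return self.static, t.dynamic, t.color_residual
```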

PDF Download Link:
https://arxiv.org/pdf/2503.12552v3.pdf

GitHub:
https://github.com/OpenDriveLab/MTGS

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
ImgEdit: A Unified Image Editing Dataset and Benchmark

Article Date: 26 May 2025

Article Description:
Recent advancements in generative models have enabled high-fidelity text-to-image generation. However, open-source image-editing models still lag behind their proprietary counterparts, primarily due to limited high-quality data and insufficient benchmarks. To overcome these limitations, we introduce ImgEdit, a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs, which contain both novel and complex single-turn edits, as well as challenging multi-turn tasks. To ensure data quality, we employ a multi-stage pipeline that integrates a cutting-edge vision-language model, a detection model, and a segmentation model, alongside task-specific in-painting procedures and strict post-processing. ImgEdit surpasses existing datasets in both task novelty and data quality. Using ImgEdit, we train ImgEdit-E1, an editing model that uses a vision-language model to process the reference image and editing prompt; it outperforms existing open-source models on multiple tasks, highlighting the value of ImgEdit and the model design. For comprehensive evaluation, we introduce ImgEdit-Bench, a benchmark designed to evaluate image-editing performance in terms of instruction adherence, editing quality, and detail preservation. It includes a basic test suite, a challenging single-turn suite, and a dedicated multi-turn suite. We evaluate both open-source and proprietary models, as well as ImgEdit-E1, providing deep analysis and actionable insights into the current behavior of image-editing models. The source data are publicly available at https://github.com/PKU-YuanGroup/ImgEdit.

PDF Download Link:
https://arxiv.org/pdf/2505.20275v1.pdf

GitHub:
https://github.com/pku-yuangroup/imgedit

Datasets:
• MagicBrush
==================================

For more data science resources:

https://t.iss.one/DataScienceT
Article Title:
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Article Date: 24 May 2025

Article Description:
Diffusion models have advanced image stylization significantly, yet two core challenges persist: (1) maintaining consistent stylization in complex scenes, particularly identity, composition, and fine details, and (2) preventing style degradation in image-to-image pipelines with style LoRAs. GPT-4o's exceptional stylization consistency highlights the performance gap between open-source methods and proprietary models. To bridge this gap, we propose OmniConsistency, a universal consistency plugin leveraging large-scale Diffusion Transformers (DiTs). OmniConsistency contributes: (1) an in-context consistency learning framework trained on aligned image pairs for robust generalization; (2) a two-stage progressive learning strategy decoupling style learning from consistency preservation to mitigate style degradation; and (3) a fully plug-and-play design compatible with arbitrary style LoRAs under the Flux framework. Extensive experiments show that OmniConsistency significantly enhances visual coherence and aesthetic quality, achieving performance comparable to the commercial state-of-the-art model GPT-4o.

PDF Download Link:
https://arxiv.org/pdf/2505.18445v1.pdf

GitHub:
https://github.com/showlab/omniconsistency

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT
This channel is for Programmers, Coders, and Software Engineers.

0️⃣ Python
1️⃣ Data Science
2️⃣ Machine Learning
3️⃣ Data Visualization
4️⃣ Artificial Intelligence
5️⃣ Data Analysis
6️⃣ Statistics
7️⃣ Deep Learning
8️⃣ Programming Languages

https://t.iss.one/addlist/8_rRW2scgfRhOTc0

https://t.iss.one/Codeprogrammer
Article Title:
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

Article Date: 21 Apr 2025

Article Description:
Camera and human motion controls have been extensively studied for video generation, but existing approaches typically address them separately, suffering from limited data with high-quality annotations for both aspects. To overcome this, we present Uni3C, a unified 3D-enhanced framework for precise control of both camera and human motion in video generation. Uni3C includes two key contributions. First, we propose a plug-and-play control module trained with a frozen video generative backbone, PCDController, which utilizes unprojected point clouds from monocular depth to achieve accurate camera control. By leveraging the strong 3D priors of point clouds and the powerful capacities of video foundation models, PCDController shows impressive generalization, performing well regardless of whether the inference backbone is frozen or fine-tuned. This flexibility enables different modules of Uni3C to be trained in specific domains, i.e., either camera control or human motion control, reducing the dependency on jointly annotated data. Second, we propose a jointly aligned 3D world guidance for the inference phase that seamlessly integrates both scenic point clouds and SMPL-X characters to unify the control signals for camera and human motion, respectively. Extensive experiments confirm that PCDController enjoys strong robustness in driving camera motion for fine-tuned backbones of video generation. Uni3C substantially outperforms competitors in both camera controllability and human motion quality. Additionally, we collect tailored validation sets featuring challenging camera movements and human actions to validate the effectiveness of our method.
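PCDController's camera control builds on unprojecting a monocular depth map into a point cloud with the camera intrinsics, a standard pinhole-camera operation sketched below (the intrinsics and depth values here are synthetic placeholders).

```python
import numpy as np

def unproject_depth(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Convert an (H, W) depth map to an (H*W, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.full((480, 640), 2.0)                     # flat plane 2 m from the camera
points = unproject_depth(depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(points.shape)                                   # (307200, 3)
```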

PDF Download Link:
https://arxiv.org/pdf/2504.14899v1.pdf

GitHub:
https://github.com/ewrfcas/uni3c

Datasets:
• No datasets information available
==================================

For more data science resources:

https://t.iss.one/DataScienceT