🔹 Title:
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
🔹 Publication Date: Jun 6, 2025
🔹 Abstract:
AI-generated summary: A heterogeneous Mixture-of-Adapters (MoA) approach enhances parameter-efficient fine-tuning in LLMs by integrating diverse adapter experts, outperforming homogeneous MoE-LoRA methods.

Recent studies integrate Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) to further enhance the performance of parameter-efficient fine-tuning (PEFT) methods in Large Language Model (LLM) applications. Existing methods employ homogeneous MoE-LoRA architectures composed of LoRA experts with either similar or identical structures and capacities. However, these approaches often suffer from representation collapse and expert load imbalance, which negatively impact the potential of LLMs. To address these challenges, we propose a heterogeneous Mixture-of-Adapters (MoA) approach. This method dynamically integrates PEFT adapter experts with diverse structures, leveraging their complementary representational capabilities to foster expert specialization, thereby enhancing the effective transfer of pre-trained knowledge to downstream tasks. MoA supports two variants: (i) Soft MoA achieves fine-grained integration by performing a weighted fusion of all expert outputs; (ii) Sparse MoA activates adapter experts sparsely based on their contribution, achieving this with negligible performance degradation. Experimental results demonstrate that heterogeneous MoA outperforms homogeneous MoE-LoRA methods in both performance and parameter efficiency. Our project is available at https://github.com/DCDmllm/MoA.
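To make the two variants concrete, here is a minimal PyTorch sketch (not the authors' code) of a heterogeneous adapter mixture: a router softmax weights structurally different experts (Soft MoA), and an optional top-k mask sparsifies the gate (Sparse MoA). The expert types, ranks, and gating details are illustrative assumptions.

```python
# Toy sketch of a heterogeneous Mixture-of-Adapters; not the official MoA code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAAdapter(nn.Module):
    def __init__(self, d, r=8):
        super().__init__()
        self.down = nn.Linear(d, r, bias=False)
        self.up = nn.Linear(r, d, bias=False)
    def forward(self, x):
        return self.up(self.down(x))

class BottleneckAdapter(nn.Module):  # a structurally different expert type
    def __init__(self, d, r=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, r), nn.GELU(), nn.Linear(r, d))
    def forward(self, x):
        return self.net(x)

class HeterogeneousMoA(nn.Module):
    def __init__(self, d, top_k=None):
        super().__init__()
        # Experts with diverse structures and capacities (assumed mix).
        self.experts = nn.ModuleList(
            [LoRAAdapter(d, r=8), LoRAAdapter(d, r=4), BottleneckAdapter(d)])
        self.router = nn.Linear(d, len(self.experts))
        self.top_k = top_k  # None -> Soft MoA; int -> Sparse MoA

    def forward(self, x):                            # x: (B, T, d)
        logits = self.router(x)                      # (B, T, E)
        if self.top_k is not None:                   # Sparse MoA: mask non-top-k
            topv, topi = logits.topk(self.top_k, dim=-1)
            logits = torch.full_like(logits, float("-inf")).scatter(-1, topi, topv)
        w = F.softmax(logits, dim=-1)                # per-token expert weights
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, d, E)
        return (outs * w.unsqueeze(-2)).sum(-1)      # weighted fusion of experts
```

(For brevity the sparse variant still evaluates every expert; a real implementation would dispatch only to the selected ones.)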
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.05928
• PDF: https://arxiv.org/pdf/2506.05928
• Github: https://github.com/DCDmllm/MoA
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
A Multimodal Automated Interpretability Agent
🔹 Publication Date: Apr 22, 2024
🔹 Abstract:
AI-generated summary: MAIA, a multimodal automated interpretability agent, uses neural models to perform feature interpretation and failure mode discovery for other models, demonstrating results comparable to human experimenters and aiding in reducing sensitivity to spurious features and identifying potential misclassifications.

This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery. It equips a pre-trained vision-language model with a set of tools that support iterative experimentation on subcomponents of other models to explain their behavior. These include tools commonly used by human interpretability researchers: for synthesizing and editing inputs, computing maximally activating exemplars from real-world datasets, and summarizing and describing experimental results. Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior. We evaluate applications of MAIA to computer vision models. We first characterize MAIA's ability to describe (neuron-level) features in learned representations of images. Across several trained models and a novel dataset of synthetic vision neurons with paired ground-truth descriptions, MAIA produces descriptions comparable to those generated by expert human experimenters. We then show that MAIA can aid in two additional interpretability tasks: reducing sensitivity to spurious features, and automatically identifying inputs likely to be mis-classified.
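The core loop is easy to picture: the VLM proposes an experiment, a tool executes it, and the observation feeds the next proposal. The sketch below is a hedged illustration of that pattern; every function name (`synthesize_image`, `max_activating_exemplars`, the `vlm.*` calls) is a hypothetical placeholder, not MAIA's actual API.

```python
# Illustrative agent loop in the MAIA style; all tool and model calls are
# hypothetical stand-ins, not the paper's implementation.
def interpret_neuron(vlm, neuron, max_steps=5):
    tools = {
        "synthesize_image": synthesize_image,        # hypothetical: text -> image
        "edit_image": edit_image,                    # hypothetical: counterfactual edits
        "max_activating": max_activating_exemplars,  # hypothetical: dataset search
    }
    log = []
    for _ in range(max_steps):
        action = vlm.propose_experiment(neuron, log)       # pick a tool + arguments
        stimulus = tools[action.tool](**action.args)
        log.append((action, neuron.activation(stimulus)))  # record the evidence
    return vlm.summarize(log)  # natural-language description of the feature
```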
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2404.14394
• PDF: https://arxiv.org/pdf/2404.14394
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Scaling Test-time Compute for LLM Agents
🔹 Publication Date: Jun 15, 2025
🔹 Abstract:
AI-generated summary: Systematic exploration of test-time scaling methods for large language agents reveals that computational scaling improves performance, especially through parallel sampling, sequential revision, effective verification, and increased rollout diversity.

Scaling test-time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which it improves their effectiveness. Specifically, we explore different test-time scaling strategies, including: (1) parallel sampling algorithms; (2) sequential revision strategies; (3) verifiers and merging methods; (4) strategies for diversifying rollouts. We carefully analyze and ablate the impact of different design strategies on applying test-time scaling to language agents, and report the following findings: 1. Scaling test-time compute can improve the performance of agents. 2. Knowing when to reflect is important for agents. 3. Among different verification and result-merging approaches, the list-wise method performs best. 4. Increasing the diversity of rollouts exerts a positive effect on the agent's task performance.
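As a concrete illustration of finding 3, the sketch below combines parallel sampling with a list-wise verifier that ranks all candidates jointly; `agent_rollout` and `llm_rank` are hypothetical stand-ins for an agent trajectory and a ranking LLM call, not the paper's code.

```python
# Minimal best-of-N sketch with list-wise verification (hypothetical helpers).
def scale_test_time(task, n=8, temperature=1.0):
    # (1) Parallel sampling: n independent rollouts with diversified seeds.
    candidates = [agent_rollout(task, temperature=temperature, seed=i)
                  for i in range(n)]
    # (3) List-wise verification: the verifier sees all candidates at once
    # and returns an ordering, which the paper found beats point/pair-wise.
    ranking = llm_rank(task, candidates)
    return candidates[ranking[0]]
```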
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.12928
• PDF: https://arxiv.org/pdf/2506.12928
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Learning Compact Vision Tokens for Efficient Large Multimodal Models
Article Date: 8 Jun 2025
Article Description:
Large multimodal models (LMMs) face significant computational challenges due to the high cost of Large Language Models (LLMs) and the quadratic complexity of processing long vision token sequences. In this paper, we explore the spatial redundancy among vision tokens and shorten the length of vision token sequences for inference acceleration. Specifically, we propose a Spatial Token Fusion (STF) method to learn compact vision tokens for short vision token sequences, where spatially adjacent tokens are fused into one. Meanwhile, a weight-frozen vision encoder cannot adapt well to the demands of diverse downstream vision-language tasks. To this end, we further introduce a Multi-Block Token Fusion (MBTF) module to supplement multi-granularity features for the reduced token sequence. Overall, we combine the STF and MBTF modules to balance token reduction and information preservation, thereby improving inference efficiency without sacrificing multimodal reasoning capabilities. Experimental results demonstrate that our method, based on LLaVA-1.5, achieves comparable or even superior performance to the baseline on 8 popular vision-language benchmarks with only 25% of the baseline's vision tokens. The source code and trained weights are available at https://github.com/visresearch/LLaVA-STF.
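A minimal sketch of the fusion idea, assuming 2x2 spatial neighbors are concatenated and projected back to one token (the paper's exact fusion design may differ); this is what yields a sequence 25% of the original length.

```python
# Toy spatial token fusion: fuse each 2x2 window of vision tokens into one.
import torch
import torch.nn as nn

class SpatialTokenFusion(nn.Module):
    def __init__(self, dim, window=2):
        super().__init__()
        self.window = window
        self.proj = nn.Linear(dim * window * window, dim)  # 4 tokens -> 1

    def forward(self, tokens, h, w):
        # tokens: (B, h*w, D) vision tokens arranged on an h x w grid
        B, _, D = tokens.shape
        k = self.window
        x = tokens.view(B, h // k, k, w // k, k, D).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B, (h // k) * (w // k), k * k * D)
        return self.proj(x)  # (B, h*w/4, D): 25% of the original tokens
```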
PDF Download Link:
https://arxiv.org/pdf/2506.07138v1.pdf
GitHub:
• https://github.com/visresearch/LLaVA-STF
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
🔹 Publication Date: Jun 19, 2025
🔹 Abstract:
AI-generated summary: GRPO-CARE, a reinforcement learning framework optimizing for consistency and correctness, outperforms standard GRPO on a new video understanding benchmark, SEED-Bench-R1, improving both performance and logical coherence in multimodal large language models.

Recent reinforcement learning approaches, such as outcome-supervised GRPO, have advanced Chain-of-Thought reasoning in large language models (LLMs), yet their adaptation to multimodal LLMs (MLLMs) is unexplored. To address the lack of rigorous evaluation for MLLM post-training methods, we introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasoning. It offers a large training set and evaluates generalization across three escalating challenges: in-distribution, cross-environment, and cross-environment-task scenarios. Using SEED-Bench-R1, we find that standard GRPO, while improving answer accuracy, often reduces logical coherence between reasoning steps and answers, with only a 57.9% consistency rate. This stems from reward signals focusing solely on final answers, encouraging shortcuts, and strict KL penalties limiting exploration. To address this, we propose GRPO-CARE, a consistency-aware RL framework optimizing both answer correctness and reasoning coherence without explicit supervision. GRPO-CARE introduces a two-tiered reward: (1) a base reward for answer correctness, and (2) an adaptive consistency bonus, computed by comparing the model's reasoning-to-answer likelihood (via a slowly-evolving reference model) against group peers. This dual mechanism amplifies rewards for reasoning paths that are both correct and logically consistent. Replacing KL penalties with this adaptive bonus, GRPO-CARE outperforms standard GRPO on SEED-Bench-R1, achieving a 6.7% performance gain on the hardest evaluation level and a 24.5% improvement in consistency. It also shows strong transferability, improving model performance across diverse video understanding benchmarks. Our work contributes a systematically designed benchmark and a generalizable post-training framework, advancing the development of more interpretable and robust MLLMs.
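A hedged sketch of the two-tiered reward: base correctness plus a bonus for rollouts whose reasoning makes the answer unusually likely under the reference model, relative to group peers. The z-score normalization and scaling below are assumptions; see the repo for the actual formulation.

```python
# Illustrative two-tiered reward (assumed details, not the official code).
import numpy as np

def grpo_care_reward(correct, ref_logprobs, i, bonus_scale=0.5):
    """correct: per-rollout answer correctness within one sampled group;
    ref_logprobs: reference model's log p(answer | reasoning) per rollout;
    i: index of the rollout being scored."""
    base = 1.0 if correct[i] else 0.0
    # Adaptive consistency bonus: compare this rollout's reasoning-to-answer
    # likelihood against its group peers (here, a simple z-score).
    z = (ref_logprobs[i] - np.mean(ref_logprobs)) / (np.std(ref_logprobs) + 1e-6)
    bonus = bonus_scale * max(z, 0.0) if correct[i] else 0.0
    return base + bonus  # amplifies paths that are correct AND coherent
```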
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.16141
• PDF: https://arxiv.org/pdf/2506.16141
• Github: https://github.com/TencentARC/GRPO-CARE
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment
Article Date: 30 Mar 2021
Article Description:
Alphas are stock prediction models capturing trading signals in a stock market. A set of effective alphas can generate weakly correlated high returns to diversify the risk. Existing alphas can be categorized into two classes: Formulaic alphas are simple algebraic expressions of scalar features, and thus can generalize well and be mined into a weakly correlated set. Machine learning alphas are data-driven models over vector and matrix features. They are more predictive than formulaic alphas, but are too complex to mine into a weakly correlated set. In this paper, we introduce a new class of alphas to model scalar, vector, and matrix features which possess the strengths of these two existing classes. The new alphas predict returns with high accuracy and can be mined into a weakly correlated set. In addition, we propose a novel alpha mining framework based on AutoML, called AlphaEvolve, to generate the new alphas. To this end, we first propose operators for generating the new alphas and selectively injecting relational domain knowledge to model the relations between stocks. We then accelerate the alpha mining by proposing a pruning technique for redundant alphas. Experiments show that AlphaEvolve can evolve initial alphas into the new alphas with high returns and weak correlations.
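The "weakly correlated set" requirement can be made concrete with a greedy filter: admit a candidate alpha only if its return series stays below a correlation cap against everything already kept. This is an illustration of the selection criterion, not the paper's pruning technique (which prunes redundant alphas during mining).

```python
# Greedy selection of a weakly correlated alpha set (illustrative only).
import numpy as np

def select_weakly_correlated(alpha_returns, max_corr=0.3):
    """alpha_returns: dict of alpha name -> 1-D array of period returns."""
    kept = {}
    for name, r in alpha_returns.items():
        if all(abs(np.corrcoef(r, kr)[0, 1]) < max_corr for kr in kept.values()):
            kept[name] = r  # weakly correlated with every kept alpha
    return kept  # diversified set: weakly correlated, individually predictive
```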
PDF Download Link:
https://arxiv.org/pdf/2103.16196v2.pdf
GitHub:
• https://github.com/codelion/openevolve
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Aligning Multimodal LLM with Human Preference: A Survey
Article Date: 18 Mar 2025
Article Description:
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed. This gap has spurred the emergence of various alignment algorithms, each targeting different application scenarios and optimization goals. Recent studies have shown that alignment algorithms are a powerful approach to resolving the aforementioned challenges. In this paper, we aim to provide a comprehensive and systematic review of alignment algorithms for MLLMs. Specifically, we explore four key aspects: (1) the application scenarios covered by alignment algorithms, including general image understanding, multi-image, video, and audio, and extended multimodal applications; (2) the core factors in constructing alignment datasets, including data sources, model responses, and preference annotations; (3) the benchmarks used to evaluate alignment algorithms; and (4) a discussion of potential future directions for the development of alignment algorithms. This work seeks to help researchers organize current advancements in the field and inspire better alignment methods. The project page of this paper is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment.
PDF Download Link:
https://arxiv.org/pdf/2503.14504v1.pdf
GitHub:
• https://github.com/bradyfu/awesome-multimodal-large-language-models
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
A Survey of LLM $\times$ DATA
Article Date: 24 May 2025
Article Description:
The integration of large language model (LLM) and data management (DATA) is rapidly redefining both domains. In this survey, we comprehensively review the bidirectional relationships. On the one hand, DATA4LLM, spanning large-scale data processing, storage, and serving, feeds LLMs with high quality, diversity, and timeliness of data required for stages like pre-training, post-training, retrieval-augmented generation, and agentic workflows: (i) Data processing for LLMs includes scalable acquisition, deduplication, filtering, selection, domain mixing, and synthetic augmentation; (ii) Data storage for LLMs focuses on efficient data and model formats, distributed and heterogeneous storage hierarchies, KV-cache management, and fault-tolerant checkpointing; (iii) Data serving for LLMs tackles challenges in RAG (e.g., knowledge post-processing), LLM inference (e.g., prompt compression, data provenance), and training strategies (e.g., data packing and shuffling). On the other hand, in LLM4DATA, LLMs are emerging as general-purpose engines for data management. We review recent advances in (i) data manipulation, including automatic data cleaning, integration, discovery; (ii) data analysis, covering reasoning over structured, semi-structured, and unstructured data, and (iii) system optimization (e.g., configuration tuning, query rewriting, anomaly diagnosis), powered by LLM techniques like retrieval-augmented prompting, task-specialized fine-tuning, and multi-agent collaboration.
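To ground the deduplication-and-filtering stage of DATA4LLM, here is a toy near-duplicate filter using shingle Jaccard similarity; the production pipelines the survey covers operate at scale with MinHash/LSH, so treat this as a conceptual sketch.

```python
# Toy near-duplicate filter for pre-training corpora (conceptual sketch).
def shingles(text, k=5):
    toks = text.split()
    return {" ".join(toks[i:i + k]) for i in range(max(1, len(toks) - k + 1))}

def dedup(docs, threshold=0.8):
    kept, sigs = [], []
    for doc in docs:
        s = shingles(doc)
        # Keep the doc only if its Jaccard overlap with every kept doc is low.
        if all(len(s & t) / max(1, len(s | t)) < threshold for t in sigs):
            kept.append(doc)
            sigs.append(s)
    return kept
```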
PDF Download Link:
https://arxiv.org/pdf/2505.18458v3.pdf
GitHub:
• https://github.com/weaidb/awsome-data-llm
• https://github.com/weaidb/awesome-data-llm
Datasets:
• HumanEval
• C4
• CCNet
• MassiveText
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
VACE: All-in-One Video Creation and Editing
Article Date: 10 Mar 2025
Article Description:
Diffusion Transformer has demonstrated powerful capability and scalability in generating high-quality images and videos. Further pursuing the unification of generation and editing tasks has yielded significant progress in the domain of image content creation. However, due to the intrinsic demands for consistency across both temporal and spatial dynamics, achieving a unified approach for video synthesis remains challenging. We introduce VACE, which enables users to perform Video tasks within an All-in-one framework for Creation and Editing. These tasks include reference-to-video generation, video-to-video editing, and masked video-to-video editing. Specifically, we effectively integrate the requirements of various tasks by organizing video task inputs, such as editing, reference, and masking, into a unified interface referred to as the Video Condition Unit (VCU). Furthermore, by utilizing a Context Adapter structure, we inject different task concepts into the model using formalized representations of temporal and spatial dimensions, allowing it to handle arbitrary video synthesis tasks flexibly. Extensive experiments demonstrate that the unified model of VACE achieves performance on par with task-specific models across various subtasks. Simultaneously, it enables diverse applications through versatile task combinations. Project page: https://ali-vilab.github.io/VACE-Page/.
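The unifying move is the input interface: every task becomes a tuple of text, frames, reference images, and masks. A hedged sketch of what such a Video Condition Unit could look like (field names are assumptions read off the abstract, not VACE's code):

```python
# Assumed shape of a unified Video Condition Unit (illustrative only).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VideoConditionUnit:
    text: str                            # task prompt
    frames: Optional[List] = None        # source video (video-to-video editing)
    reference: Optional[List] = None     # reference images (reference-to-video)
    mask: Optional[List] = None          # per-frame masks (masked editing)

# One interface covers all tasks: reference-to-video passes only `reference`;
# masked video editing passes `frames` + `mask`; pure generation passes text.
```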
PDF Download Link:
https://arxiv.org/pdf/2503.07598v1.pdf
GitHub:
• https://github.com/wan-video/wan2.1
• https://github.com/ali-vilab/vace
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Article Date: Dec 2024
Article Description:
We address the problem of gaze target estimation, which aims to predict where a person is looking in a scene. Predicting a person's gaze target requires reasoning both about the person's appearance and the contents of the scene. Prior works have developed increasingly complex, hand-crafted pipelines for gaze target estimation that carefully fuse features from separate scene encoders, head encoders, and auxiliary models for signals like depth and pose. Motivated by the success of general-purpose feature extractors on a variety of visual tasks, we propose Gaze-LLE, a novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. We extract a single feature representation for the scene, and apply a person-specific positional prompt to decode gaze with a lightweight module. We demonstrate state-of-the-art performance across several gaze benchmarks and provide extensive analysis to validate our design choices. Our code is available at: https://github.com/fkryan/gazelle. (CVPR 2025)
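A minimal sketch of the architecture's shape, assuming a frozen encoder output, a learned prompt added at the head's token position, and a small transformer decoder; dimensions and layer counts are placeholders, not the repo's values.

```python
# Sketch of the Gaze-LLE idea (assumed sizes; see fkryan/gazelle for the real model).
import torch
import torch.nn as nn

class GazeLLESketch(nn.Module):
    def __init__(self, d=768, n_layers=3):
        super().__init__()
        self.head_prompt = nn.Parameter(torch.zeros(d))  # person-specific positional prompt
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)  # lightweight module
        self.to_score = nn.Linear(d, 1)

    def forward(self, scene_tokens, head_index):
        # scene_tokens: (B, N, d) from a *frozen* DINOv2 -- one scene forward
        # pass, reusable for every person by moving only the prompt position.
        x = scene_tokens.clone()
        x[torch.arange(x.size(0)), head_index] += self.head_prompt
        x = self.decoder(x)
        return self.to_score(x).squeeze(-1)  # (B, N) per-patch gaze-target scores
```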
PDF Download Link:
https://arxiv.org/pdf/2412.09586v1.pdf
GitHub:
• https://github.com/fkryan/gazelle
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
🔹 Publication Date: Jun 8, 2025
🔹 Abstract:
AI-generated summary: Frame Guidance offers a training-free method for controlling video generation using frame-level signals, reducing memory usage and enhancing globally coherent video output.

Advancements in diffusion models have significantly improved video quality, directing attention to fine-grained controllability. However, many existing methods depend on fine-tuning large-scale video models for specific tasks, which becomes increasingly impractical as model sizes continue to grow. In this work, we present Frame Guidance, a training-free guidance method for controllable video generation based on frame-level signals, such as keyframes, style reference images, sketches, or depth maps. For practical training-free guidance, we propose a simple latent processing method that dramatically reduces memory usage, and apply a novel latent optimization strategy designed for globally coherent video generation. Frame Guidance enables effective control across diverse tasks, including keyframe guidance, stylization, and looping, without any training, and is compatible with any video model. Experimental results show that Frame Guidance can produce high-quality controlled videos for a wide range of tasks and input signals.
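A hedged sketch of one guidance step: estimate the clean latent, decode the constrained frame, and push the latent down the gradient of a frame-level loss. `predict_x0` and `decode_frame` are hypothetical stand-ins; the paper's latent processing is specifically designed to keep this step memory-cheap.

```python
# One training-free guidance step toward a keyframe target (illustrative).
import torch
import torch.nn.functional as F

def guidance_step(z, t, keyframe, frame_idx, denoiser, scale=1.0):
    z = z.detach().requires_grad_(True)
    z0_hat = denoiser.predict_x0(z, t)           # hypothetical one-step x0 estimate
    frame_hat = decode_frame(z0_hat, frame_idx)  # hypothetical cheap frame decode
    loss = F.mse_loss(frame_hat, keyframe)       # frame-level signal (keyframe here)
    grad = torch.autograd.grad(loss, z)[0]
    return (z - scale * grad).detach()           # steer sampling, no training
```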
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.07177
• PDF: https://arxiv.org/pdf/2506.07177
• Project Page: https://frame-guidance-video.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling
🔹 Publication Date: Jun 26, 2025
🔹 Abstract:
AI-generated summary: A physics-based skinning and rigging framework called PhysRig uses volumetric representation and continuum mechanics for more realistic and physically plausible animations.

Skinning and rigging are fundamental components in animation, articulated object reconstruction, motion transfer, and 4D generation. Existing approaches predominantly rely on Linear Blend Skinning (LBS), due to its simplicity and differentiability. However, LBS introduces artifacts such as volume loss and unnatural deformations, and it fails to model elastic materials like soft tissues, fur, and flexible appendages (e.g., elephant trunks, ears, and fatty tissues). In this work, we propose PhysRig: a differentiable physics-based skinning and rigging framework that overcomes these limitations by embedding the rigid skeleton into a volumetric representation (e.g., a tetrahedral mesh), which is simulated as a deformable soft-body structure driven by the animated skeleton. Our method leverages continuum mechanics and discretizes the object as particles embedded in an Eulerian background grid to ensure differentiability with respect to both material properties and skeletal motion. Additionally, we introduce material prototypes, significantly reducing the learning space while maintaining high expressiveness. To evaluate our framework, we construct a comprehensive synthetic dataset using meshes from Objaverse, The Amazing Animals Zoo, and Mixamo, covering diverse object categories and motion patterns. Our method consistently outperforms traditional LBS-based approaches, generating more realistic and physically plausible results. Furthermore, we demonstrate the applicability of our framework in the pose transfer task, highlighting its versatility for articulated object modeling.
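The particle/grid coupling the abstract mentions is the classic material-point pattern: scatter particle mass and momentum to an Eulerian grid, solve there, then gather back. Below is a deliberately tiny particle-to-grid sketch with nearest-node weights; real simulators (presumably PhysRig included) use smooth B-spline kernels, which is part of what keeps the pipeline differentiable.

```python
# Toy particle-to-grid transfer for a soft-body step (nearest-node weights).
import numpy as np

def particles_to_grid(xp, vp, mp, grid_res, dx):
    """xp: (N,3) positions; vp: (N,3) velocities; mp: (N,) masses."""
    mass = np.zeros((grid_res,) * 3)
    momentum = np.zeros((grid_res,) * 3 + (3,))
    idx = np.clip((xp / dx).astype(int), 0, grid_res - 1)
    for p in range(len(xp)):
        i, j, k = idx[p]
        mass[i, j, k] += mp[p]
        momentum[i, j, k] += mp[p] * vp[p]
    vel = momentum / np.maximum(mass, 1e-12)[..., None]  # grid velocities
    return mass, vel
```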
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.20936
• PDF: https://arxiv.org/pdf/2506.20936
• Project Page: https://physrig.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Article Date: 1 Feb 2024
Article Description:
We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. SymbolicAI enables the seamless integration of generative models with a diverse range of solvers by treating large language models (LLMs) as semantic parsers that execute tasks based on both natural and formal language instructions, thus bridging the gap between symbolic reasoning and generative AI. We leverage probabilistic programming principles to tackle complex tasks, and utilize differentiable and classical programming paradigms with their respective strengths. The framework introduces a set of polymorphic, compositional, and self-referential operations for multi-modal data that connects multi-step generative processes and aligns their outputs with user objectives in complex workflows. As a result, we can transition between the capabilities of various foundation models with in-context learning capabilities and specialized, fine-tuned models or solvers proficient in addressing specific problems. Through these operations based on in-context learning our framework enables the creation and evaluation of explainable computational graphs. Finally, we introduce a quality measure and its empirical score for evaluating these computational graphs, and propose a benchmark that compares various state-of-the-art LLMs across a set of complex workflows. We refer to the empirical score as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. The framework codebase and benchmark are linked below.
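The paper's central pattern, an LLM acting as a semantic parser that routes work to classical or generative solvers, fits in a few lines. The sketch below is generic Python, not SymbolicAI's API; `llm` is a hypothetical completion function and the two solvers are toy stand-ins.

```python
# Generic "LLM as semantic parser + solver" routing (not SymbolicAI's API).
import json

def solve(task: str):
    # The LLM parses natural language into a formal dispatch decision.
    plan = json.loads(llm(
        'Return JSON {"solver": "arithmetic"|"search", "query": ...} for: ' + task))
    solvers = {
        "arithmetic": lambda q: eval(q, {"__builtins__": {}}),  # toy classical solver
        "search": lambda q: llm("Answer concisely: " + q),      # generative fallback
    }
    return solvers[plan["solver"]](plan["query"])

# e.g. solve("What is 17 * 23?") should route "17 * 23" to the arithmetic solver.
```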
PDF Download Link:
https://arxiv.org/pdf/2402.00854v4.pdf
GitHub:
• https://github.com/ExtensityAI/symbolicai
• https://github.com/extensityai/benchmark
• https://github.com/xpitfire/symbolicai
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
Article Date: 19 Feb 2025
Article Description:
We propose LGGPT, an LLM-based model tailored for unified layout generation. First, we propose Arbitrary Layout Instruction (ALI) and Universal Layout Response (ULR) as the uniform I/O template. ALI accommodates arbitrary layout generation task inputs across multiple layout domains, enabling LGGPT to unify both task-generic and domain-generic layout generation hitherto unexplored. Collectively, ALI and ULR boast a succinct structure that forgoes superfluous tokens typically found in existing HTML-based formats, facilitating efficient instruction tuning and boosting unified generation performance. In addition, we propose an Interval Quantization Encoding (IQE) strategy that compresses ALI into a more condensed structure. IQE precisely preserves valid layout clues while eliminating the less informative placeholders, facilitating LGGPT to capture complex and variable layout generation conditions during the unified training process. Experimental results demonstrate that LGGPT achieves superior or on-par performance compared to existing methods. Notably, LGGPT strikes a prominent balance between proficiency and efficiency with a compact 1.5B parameter LLM, which beats prior 7B or 175B models even in the most extensive and challenging unified scenario. Furthermore, we underscore the necessity of employing LLMs for unified layout generation and suggest that 1.5B could be an optimal parameter size by comparing LLMs of varying scales. Code is available at https://github.com/NiceRingNode/LGGPT.
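To make the interval-quantization idea tangible: map each layout coordinate into a small number of interval bins and emit one short token per bin, so the LLM never sees verbose HTML-style markup. The bin count and token format below are assumptions, not LGGPT's exact scheme.

```python
# Illustrative interval quantization of layout boxes into discrete tokens.
def encode_box(box, canvas_w, canvas_h, bins=128):
    x, y, w, h = box  # pixel coordinates on the canvas
    q = lambda v, size: min(bins - 1, int(v / size * bins))
    return [f"<{q(x, canvas_w)}>", f"<{q(y, canvas_h)}>",
            f"<{q(w, canvas_w)}>", f"<{q(h, canvas_h)}>"]

# encode_box((120, 40, 300, 60), 1024, 768) -> ['<15>', '<6>', '<37>', '<10>']
```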
PDF Download Link:
https://arxiv.org/pdf/2502.14005v1.pdf
GitHub:
• https://github.com/niceringnode/lggpt
Datasets:
• PubLayNet
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
Article Title:
Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science
Article Date: 4 Jun 2025
Article Description:
Contemporary approaches to assisted scientific discovery use language models to automatically generate large numbers of potential hypotheses to test, while also automatically generating code-based experiments to test those hypotheses. While hypotheses can be comparatively inexpensive to generate, automated experiments can be costly, particularly when run at scale (i.e., thousands of experiments). Developing the capacity to filter hypotheses based on their feasibility would allow discovery systems to run at scale while increasing their likelihood of making significant discoveries. In this work we introduce Matter-of-Fact, a challenge dataset for determining the feasibility of hypotheses framed as claims. Matter-of-Fact includes 8.4k claims extracted from scientific articles spanning four high-impact contemporary materials science topics (superconductors, semiconductors, batteries, and aerospace materials), covering qualitative and quantitative claims from theoretical, experimental, and code/simulation results. We show that strong baselines that include retrieval-augmented generation over scientific literature and code generation fail to exceed 72% performance on this task (chance performance is 50%), while domain-expert verification suggests nearly all claims are solvable -- highlighting both the difficulty of this task for current models and the potential to accelerate scientific discovery by making near-term progress.
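A minimal sketch of how such a claim-feasibility benchmark could be scored, assuming a hypothetical record format with a boolean `feasible` label; the dataset's actual schema may differ.

```python
# Scoring sketch for binary claim-feasibility prediction against the
# 50% chance baseline (hypothetical record format).
claims = [
    {"text": "Compound X superconducts at 300 K", "feasible": False},
    {"text": "Anode Y retains 80% capacity after 500 cycles", "feasible": True},
]

def predict(claim_text: str) -> bool:
    """Stand-in for a model (e.g., retrieval-augmented generation over the
    literature) that labels a claim as feasible or not."""
    return True  # trivial constant baseline

correct = sum(predict(c["text"]) == c["feasible"] for c in claims)
print(f"accuracy: {correct / len(claims):.2f} (chance is 0.50)")
```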
PDF Download Link:
https://arxiv.org/pdf/2506.04410v1.pdf
GitHub:
• https://github.com/cognitiveailab/matter-of-fact
Datasets:
• COVID-Fact
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
Article Title:
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Article Date: 23 Aug 2023
Article Description:
Despite the progress of foundation models, knowledge-based reasoning remains a persistent challenge due to their limited capacity for knowledge recall and inference. Existing methods primarily focus on encouraging these models to plan and solve problems or extensively sample reasoning chains independently. However, these methods often overlook conceptual errors and inferential fallacies, inevitably leading to a series of notorious issues such as misleading conclusions, cognitive biases, and reduced decision quality. While explicit modeling of causality is argued to hold promise in addressing these issues, contemporary research efforts have thus far fallen short in achieving causality-based foundation models. Drawing inspiration from the orchestration of diverse specialized agents collaborating to tackle intricate tasks, we propose a framework named Causal-Consistency Chain-of-Thought (CaCo-CoT) that harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models, involving a set of reasoners and evaluators. These agents collaboratively work within a reasoning-and-consensus paradigm to improve faithfulness. The reasoners are tasked with generating reasoning chains for knowledge-intensive problems by mimicking human causal reasoning. Meanwhile, the evaluator scrutinizes the causal consistency of a reasoner's reasoning chain from a non-causal and a counterfactual perspective. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations across text-based and multi-modal knowledge reasoning tasks (e.g., science question answering and commonsense reasoning).
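Structurally, the reasoning-and-consensus loop might look roughly like the sketch below, with stand-in `reasoner` and `evaluator` functions in place of the LLM agents; this is an illustration of the paradigm, not the authors' code.

```python
# Schematic reasoner/evaluator consensus loop in the spirit of CaCo-CoT
# (hypothetical structure; agents are toy stand-ins for LLM calls).
from collections import Counter

def reasoner(question: str, seed: int) -> str:
    """Stand-in for an LLM reasoner producing an answer via a causal chain."""
    return ["A", "A", "B"][seed % 3]  # toy divergent answers

def evaluator(question: str, answer: str) -> bool:
    """Stand-in for an evaluator checking the causal consistency of a chain,
    e.g., via non-causal and counterfactual probes."""
    return answer == "A"  # toy consistency check

def caco_cot(question: str, n_reasoners: int = 3) -> str:
    answers = [reasoner(question, s) for s in range(n_reasoners)]
    consistent = [a for a in answers if evaluator(question, a)]
    # Consensus over the causally consistent answers (fall back to all).
    return Counter(consistent or answers).most_common(1)[0][0]

print(caco_cot("Does boiling point decrease at altitude?"))  # -> "A"
```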
PDF Download Link:
https://arxiv.org/pdf/2308.11914v4.pdf
GitHub:
• https://github.com/hcplab-sysu/causalvlr
• https://github.com/hcplab-sysu/causal-vlreasoning
Datasets:
• BoolQ
• ScienceQA
• Com2Sense
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
🔹 Title:
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
🔹 Publication Date: Published on Jun 24
🔹 Abstract:
Outlier-Safe Pre-Training improves large language model quantization performance by preventing extreme activation outliers through innovative training techniques. AI-generated summary: Extreme activation outliers in Large Language Models (LLMs) critically degrade quantization performance, hindering efficient on-device deployment. While channel-wise operations and adaptive gradient scaling are recognized causes, practical mitigation remains challenging. We introduce Outlier-Safe Pre-Training (OSP), a practical guideline that proactively prevents outlier formation rather than relying on post-hoc mitigation. OSP combines three key innovations: (1) the Muon optimizer, eliminating privileged bases while maintaining training efficiency; (2) Single-Scale RMSNorm, preventing channel-wise amplification; and (3) a learnable embedding projection, redistributing activation magnitudes originating from embedding matrices. We validate OSP by training a 1.4B-parameter model on 1 trillion tokens, the first production-scale LLM trained without such outliers. Under aggressive 4-bit quantization, our OSP model achieves a 35.7 average score across 10 benchmarks (compared to 26.5 for an Adam-trained model), with only a 2% training overhead. Remarkably, OSP models exhibit near-zero excess kurtosis (0.04) compared to extreme values (1818.56) in standard models, fundamentally altering LLM quantization behavior. Our work demonstrates that outliers are not inherent to LLMs but are consequences of training strategies, paving the way for more efficient LLM deployment. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Outlier-Safe-Pre-Training.
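Of the three innovations, Single-Scale RMSNorm is the easiest to sketch: a single shared learnable gain replaces the usual per-channel gain vector, so no individual channel can be selectively amplified. The PyTorch module below is an assumption-based reading of the abstract, not the authors' exact module.

```python
# Sketch of the Single-Scale RMSNorm idea: one shared scalar gain instead
# of a per-channel gain vector (assumption based on the abstract above).
import torch
import torch.nn as nn

class SingleScaleRMSNorm(nn.Module):
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))  # single learnable scalar
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the feature dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return self.scale * x / rms  # the same gain applies to every channel

x = torch.randn(2, 8, 16)
print(SingleScaleRMSNorm()(x).shape)  # torch.Size([2, 8, 16])
```

Because the gain is shared, the normalization cannot create the channel-wise magnitude disparities that standard per-channel RMSNorm gains can, which is the amplification pathway the abstract says OSP closes.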
🔹 Links:
• arXiv Page: https://arxivexplained.com/papers/outlier-safe-pre-training-for-robust-4-bit-quantization-of-large-language-models
• PDF: https://arxiv.org/pdf/2506.19697
• Project Page: https://huggingface.co/papers?q=learnable%20embedding%20projection
• Github: https://github.com/dmis-lab/Outlier-Safe-Pre-Training
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
Article Title:
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games
Article Date: 5 Jun 2025
Article Description:
LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are inherently asynchronous. For example, in group chats, online team meetings, or social games, there is no inherent notion of turns; therefore, the decision of when to speak forms a crucial part of each participant's decision making. In this work, we develop an adaptive asynchronous LLM agent which, in addition to determining what to say, also decides when to say it. To evaluate our agent, we collect a unique dataset of online Mafia games, including both human participants and our asynchronous agent. Overall, our agent performs on par with human players, both in game performance and in its ability to blend in with the other human players. Our analysis shows that the agent's behavior in deciding when to speak closely mirrors human patterns, although differences emerge in message content. We release all our data and code to support and encourage further research into more realistic asynchronous communication between LLM agents. This work paves the way for the integration of LLMs into realistic human group settings, from assistance in team discussions to educational and professional environments where complex social dynamics must be navigated.
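A toy sketch of the what/when split described above, with a hypothetical timing heuristic standing in for the agent's learned decision of when to speak (the paper's agent uses an LLM for both decisions).

```python
# Toy sketch of an asynchronous agent that separately decides *when*
# to speak and *what* to say (hypothetical heuristics, not the paper's code).
import random

def should_speak(seconds_since_last_msg: float, was_addressed: bool) -> bool:
    """Decide when to speak: answer if addressed, otherwise chime in
    with growing probability as the conversation goes quiet."""
    if was_addressed:
        return True
    return random.random() < min(seconds_since_last_msg / 60.0, 0.5)

def compose_reply(history: list[str]) -> str:
    """Decide what to say (stand-in for an LLM call over the chat history)."""
    return f"Replying to: {history[-1]!r}"

history = ["anyone suspicious of player 3?"]
if should_speak(seconds_since_last_msg=20.0, was_addressed=True):
    print(compose_reply(history))
```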
PDF Download Link:
https://arxiv.org/pdf/2506.05309v1.pdf
GitHub:
• https://github.com/niveck/LLMafia
Datasets:
• LLMafia
==================================
For more data science resources:
β https://t.iss.one/DataScienceT