🔹 Title:
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
🔹 Publication Date: Jun 6, 2025
🔹 Abstract:
AI-generated summary: A heterogeneous Mixture-of-Adapters (MoA) approach enhances parameter-efficient fine-tuning in LLMs by integrating diverse adapter experts, outperforming homogeneous MoE-LoRA methods.

Recent studies integrate Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) to further enhance the performance of parameter-efficient fine-tuning (PEFT) methods in Large Language Model (LLM) applications. Existing methods employ homogeneous MoE-LoRA architectures composed of LoRA experts with either similar or identical structures and capacities. However, these approaches often suffer from representation collapse and expert load imbalance, which negatively impact the potential of LLMs. To address these challenges, we propose a heterogeneous Mixture-of-Adapters (MoA) approach. This method dynamically integrates PEFT adapter experts with diverse structures, leveraging their complementary representational capabilities to foster expert specialization, thereby enhancing the effective transfer of pre-trained knowledge to downstream tasks. MoA supports two variants: (i) Soft MoA achieves fine-grained integration by performing a weighted fusion of all expert outputs; (ii) Sparse MoA activates adapter experts sparsely based on their contribution, achieving this with negligible performance degradation. Experimental results demonstrate that heterogeneous MoA outperforms homogeneous MoE-LoRA methods in both performance and parameter efficiency. Our project is available at https://github.com/DCDmllm/MoA.
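To make the two variants concrete, here is a minimal PyTorch sketch (not the authors' code) of a heterogeneous adapter mixture: a router softmax weights structurally different experts (Soft MoA), and an optional top-k mask sparsifies the gate (Sparse MoA). The expert types, ranks, and gating details are illustrative assumptions.

```python
# Toy sketch of a heterogeneous Mixture-of-Adapters; not the official MoA code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAAdapter(nn.Module):
    def __init__(self, d, r=8):
        super().__init__()
        self.down = nn.Linear(d, r, bias=False)
        self.up = nn.Linear(r, d, bias=False)
    def forward(self, x):
        return self.up(self.down(x))

class BottleneckAdapter(nn.Module):  # a structurally different expert type
    def __init__(self, d, r=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, r), nn.GELU(), nn.Linear(r, d))
    def forward(self, x):
        return self.net(x)

class HeterogeneousMoA(nn.Module):
    def __init__(self, d, top_k=None):
        super().__init__()
        # Experts with diverse structures and capacities (assumed mix).
        self.experts = nn.ModuleList(
            [LoRAAdapter(d, r=8), LoRAAdapter(d, r=4), BottleneckAdapter(d)])
        self.router = nn.Linear(d, len(self.experts))
        self.top_k = top_k  # None -> Soft MoA; int -> Sparse MoA

    def forward(self, x):                            # x: (B, T, d)
        logits = self.router(x)                      # (B, T, E)
        if self.top_k is not None:                   # Sparse MoA: mask non-top-k
            topv, topi = logits.topk(self.top_k, dim=-1)
            logits = torch.full_like(logits, float("-inf")).scatter(-1, topi, topv)
        w = F.softmax(logits, dim=-1)                # per-token expert weights
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, d, E)
        return (outs * w.unsqueeze(-2)).sum(-1)      # weighted fusion of experts
```

(For brevity the sparse variant still evaluates every expert; a real implementation would dispatch only to the selected ones.)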
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.05928
• PDF: https://arxiv.org/pdf/2506.05928
• Github: https://github.com/DCDmllm/MoA
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
A Multimodal Automated Interpretability Agent
🔹 Publication Date: Apr 22, 2024
🔹 Abstract:
AI-generated summary: MAIA, a multimodal automated interpretability agent, uses neural models to perform feature interpretation and failure mode discovery for other models, demonstrating results comparable to human experimenters and aiding in reducing sensitivity to spurious features and identifying potential misclassifications.

This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery. It equips a pre-trained vision-language model with a set of tools that support iterative experimentation on subcomponents of other models to explain their behavior. These include tools commonly used by human interpretability researchers: for synthesizing and editing inputs, computing maximally activating exemplars from real-world datasets, and summarizing and describing experimental results. Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior. We evaluate applications of MAIA to computer vision models. We first characterize MAIA's ability to describe (neuron-level) features in learned representations of images. Across several trained models and a novel dataset of synthetic vision neurons with paired ground-truth descriptions, MAIA produces descriptions comparable to those generated by expert human experimenters. We then show that MAIA can aid in two additional interpretability tasks: reducing sensitivity to spurious features, and automatically identifying inputs likely to be mis-classified.
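The core loop is easy to picture: the VLM proposes an experiment, a tool executes it, and the observation feeds the next proposal. The sketch below is a hedged illustration of that pattern; every function name (`synthesize_image`, `max_activating_exemplars`, the `vlm.*` calls) is a hypothetical placeholder, not MAIA's actual API.

```python
# Illustrative agent loop in the MAIA style; all tool and model calls are
# hypothetical stand-ins, not the paper's implementation.
def interpret_neuron(vlm, neuron, max_steps=5):
    tools = {
        "synthesize_image": synthesize_image,        # hypothetical: text -> image
        "edit_image": edit_image,                    # hypothetical: counterfactual edits
        "max_activating": max_activating_exemplars,  # hypothetical: dataset search
    }
    log = []
    for _ in range(max_steps):
        action = vlm.propose_experiment(neuron, log)       # pick a tool + arguments
        stimulus = tools[action.tool](**action.args)
        log.append((action, neuron.activation(stimulus)))  # record the evidence
    return vlm.summarize(log)  # natural-language description of the feature
```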
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2404.14394
• PDF: https://arxiv.org/pdf/2404.14394
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Scaling Test-time Compute for LLM Agents
🔹 Publication Date: Jun 15, 2025
🔹 Abstract:
AI-generated summary: Systematic exploration of test-time scaling methods for large language agents reveals that computational scaling improves performance, especially through parallel sampling, sequential revision, effective verification, and increased rollout diversity.

Scaling test-time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which it improves their effectiveness. Specifically, we explore different test-time scaling strategies, including: (1) parallel sampling algorithms; (2) sequential revision strategies; (3) verifiers and merging methods; (4) strategies for diversifying rollouts. We carefully analyze and ablate the impact of different design strategies on applying test-time scaling to language agents, and report the following findings: 1. Scaling test-time compute can improve the performance of agents. 2. Knowing when to reflect is important for agents. 3. Among different verification and result-merging approaches, the list-wise method performs best. 4. Increasing the diversity of rollouts exerts a positive effect on the agent's task performance.
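As a concrete illustration of finding 3, the sketch below combines parallel sampling with a list-wise verifier that ranks all candidates jointly; `agent_rollout` and `llm_rank` are hypothetical stand-ins for an agent trajectory and a ranking LLM call, not the paper's code.

```python
# Minimal best-of-N sketch with list-wise verification (hypothetical helpers).
def scale_test_time(task, n=8, temperature=1.0):
    # (1) Parallel sampling: n independent rollouts with diversified seeds.
    candidates = [agent_rollout(task, temperature=temperature, seed=i)
                  for i in range(n)]
    # (3) List-wise verification: the verifier sees all candidates at once
    # and returns an ordering, which the paper found beats point/pair-wise.
    ranking = llm_rank(task, candidates)
    return candidates[ranking[0]]
```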
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.12928
• PDF: https://arxiv.org/pdf/2506.12928
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Learning Compact Vision Tokens for Efficient Large Multimodal Models
Article Date: 8 Jun 2025
Article Description:
Large multimodal models (LMMs) face significant computational challenges due to the high cost of Large Language Models (LLMs) and the quadratic complexity of processing long vision token sequences. In this paper, we explore the spatial redundancy among vision tokens and shorten the length of vision token sequences for inference acceleration. Specifically, we propose a Spatial Token Fusion (STF) method to learn compact vision tokens for short vision token sequences, where spatially adjacent tokens are fused into one. Meanwhile, a weight-frozen vision encoder cannot adapt well to the demands of diverse downstream vision-language tasks. To this end, we further introduce a Multi-Block Token Fusion (MBTF) module to supplement multi-granularity features for the reduced token sequence. Overall, we combine the STF and MBTF modules to balance token reduction and information preservation, thereby improving inference efficiency without sacrificing multimodal reasoning capabilities. Experimental results demonstrate that our method, based on LLaVA-1.5, achieves comparable or even superior performance to the baseline on 8 popular vision-language benchmarks with only 25% of the baseline's vision tokens. The source code and trained weights are available at https://github.com/visresearch/LLaVA-STF.
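A minimal sketch of the fusion idea, assuming 2x2 spatial neighbors are concatenated and projected back to one token (the paper's exact fusion design may differ); this is what yields a sequence 25% of the original length.

```python
# Toy spatial token fusion: fuse each 2x2 window of vision tokens into one.
import torch
import torch.nn as nn

class SpatialTokenFusion(nn.Module):
    def __init__(self, dim, window=2):
        super().__init__()
        self.window = window
        self.proj = nn.Linear(dim * window * window, dim)  # 4 tokens -> 1

    def forward(self, tokens, h, w):
        # tokens: (B, h*w, D) vision tokens arranged on an h x w grid
        B, _, D = tokens.shape
        k = self.window
        x = tokens.view(B, h // k, k, w // k, k, D).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B, (h // k) * (w // k), k * k * D)
        return self.proj(x)  # (B, h*w/4, D): 25% of the original tokens
```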
PDF Download Link:
https://arxiv.org/pdf/2506.07138v1.pdf
GitHub:
• https://github.com/visresearch/LLaVA-STF
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
🔹 Publication Date: Jun 19, 2025
🔹 Abstract:
AI-generated summary: GRPO-CARE, a reinforcement learning framework optimizing for consistency and correctness, outperforms standard GRPO on a new video understanding benchmark, SEED-Bench-R1, improving both performance and logical coherence in multimodal large language models.

Recent reinforcement learning approaches, such as outcome-supervised GRPO, have advanced Chain-of-Thought reasoning in large language models (LLMs), yet their adaptation to multimodal LLMs (MLLMs) is unexplored. To address the lack of rigorous evaluation for MLLM post-training methods, we introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasoning. It offers a large training set and evaluates generalization across three escalating challenges: in-distribution, cross-environment, and cross-environment-task scenarios. Using SEED-Bench-R1, we find that standard GRPO, while improving answer accuracy, often reduces logical coherence between reasoning steps and answers, with only a 57.9% consistency rate. This stems from reward signals focusing solely on final answers, encouraging shortcuts, and strict KL penalties limiting exploration. To address this, we propose GRPO-CARE, a consistency-aware RL framework optimizing both answer correctness and reasoning coherence without explicit supervision. GRPO-CARE introduces a two-tiered reward: (1) a base reward for answer correctness, and (2) an adaptive consistency bonus, computed by comparing the model's reasoning-to-answer likelihood (via a slowly-evolving reference model) against group peers. This dual mechanism amplifies rewards for reasoning paths that are both correct and logically consistent. Replacing KL penalties with this adaptive bonus, GRPO-CARE outperforms standard GRPO on SEED-Bench-R1, achieving a 6.7% performance gain on the hardest evaluation level and a 24.5% improvement in consistency. It also shows strong transferability, improving model performance across diverse video understanding benchmarks. Our work contributes a systematically designed benchmark and a generalizable post-training framework, advancing the development of more interpretable and robust MLLMs.
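A hedged sketch of the two-tiered reward: base correctness plus a bonus for rollouts whose reasoning makes the answer unusually likely under the reference model, relative to group peers. The z-score normalization and scaling below are assumptions; see the repo for the actual formulation.

```python
# Illustrative two-tiered reward (assumed details, not the official code).
import numpy as np

def grpo_care_reward(correct, ref_logprobs, i, bonus_scale=0.5):
    """correct: per-rollout answer correctness within one sampled group;
    ref_logprobs: reference model's log p(answer | reasoning) per rollout;
    i: index of the rollout being scored."""
    base = 1.0 if correct[i] else 0.0
    # Adaptive consistency bonus: compare this rollout's reasoning-to-answer
    # likelihood against its group peers (here, a simple z-score).
    z = (ref_logprobs[i] - np.mean(ref_logprobs)) / (np.std(ref_logprobs) + 1e-6)
    bonus = bonus_scale * max(z, 0.0) if correct[i] else 0.0
    return base + bonus  # amplifies paths that are correct AND coherent
```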
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.16141
• PDF: https://arxiv.org/pdf/2506.16141
• Github: https://github.com/TencentARC/GRPO-CARE
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment
Article Date: 30 Mar 2021
Article Description:
Alphas are stock prediction models capturing trading signals in a stock market. A set of effective alphas can generate weakly correlated high returns to diversify the risk. Existing alphas can be categorized into two classes: Formulaic alphas are simple algebraic expressions of scalar features, and thus can generalize well and be mined into a weakly correlated set. Machine learning alphas are data-driven models over vector and matrix features. They are more predictive than formulaic alphas, but are too complex to mine into a weakly correlated set. In this paper, we introduce a new class of alphas to model scalar, vector, and matrix features which possess the strengths of these two existing classes. The new alphas predict returns with high accuracy and can be mined into a weakly correlated set. In addition, we propose a novel alpha mining framework based on AutoML, called AlphaEvolve, to generate the new alphas. To this end, we first propose operators for generating the new alphas and selectively injecting relational domain knowledge to model the relations between stocks. We then accelerate the alpha mining by proposing a pruning technique for redundant alphas. Experiments show that AlphaEvolve can evolve initial alphas into the new alphas with high returns and weak correlations.
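The "weakly correlated set" requirement can be made concrete with a greedy filter: admit a candidate alpha only if its return series stays below a correlation cap against everything already kept. This is an illustration of the selection criterion, not the paper's pruning technique (which prunes redundant alphas during mining).

```python
# Greedy selection of a weakly correlated alpha set (illustrative only).
import numpy as np

def select_weakly_correlated(alpha_returns, max_corr=0.3):
    """alpha_returns: dict of alpha name -> 1-D array of period returns."""
    kept = {}
    for name, r in alpha_returns.items():
        if all(abs(np.corrcoef(r, kr)[0, 1]) < max_corr for kr in kept.values()):
            kept[name] = r  # weakly correlated with every kept alpha
    return kept  # diversified set: weakly correlated, individually predictive
```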
PDF Download Link:
https://arxiv.org/pdf/2103.16196v2.pdf
GitHub:
• https://github.com/codelion/openevolve
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Aligning Multimodal LLM with Human Preference: A Survey
Article Date: 18 Mar 2025
Article Description:
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed. This gap has spurred the emergence of various alignment algorithms, each targeting different application scenarios and optimization goals. Recent studies have shown that alignment algorithms are a powerful approach to resolving the aforementioned challenges. In this paper, we aim to provide a comprehensive and systematic review of alignment algorithms for MLLMs. Specifically, we explore four key aspects: (1) the application scenarios covered by alignment algorithms, including general image understanding, multi-image, video, and audio, and extended multimodal applications; (2) the core factors in constructing alignment datasets, including data sources, model responses, and preference annotations; (3) the benchmarks used to evaluate alignment algorithms; and (4) a discussion of potential future directions for the development of alignment algorithms. This work seeks to help researchers organize current advancements in the field and inspire better alignment methods. The project page of this paper is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment.
PDF Download Link:
https://arxiv.org/pdf/2503.14504v1.pdf
GitHub:
• https://github.com/bradyfu/awesome-multimodal-large-language-models
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
A Survey of LLM $\times$ DATA
Article Date: 24 May 2025
Article Description:
The integration of large language model (LLM) and data management (DATA) is rapidly redefining both domains. In this survey, we comprehensively review the bidirectional relationships. On the one hand, DATA4LLM, spanning large-scale data processing, storage, and serving, feeds LLMs with high quality, diversity, and timeliness of data required for stages like pre-training, post-training, retrieval-augmented generation, and agentic workflows: (i) Data processing for LLMs includes scalable acquisition, deduplication, filtering, selection, domain mixing, and synthetic augmentation; (ii) Data storage for LLMs focuses on efficient data and model formats, distributed and heterogeneous storage hierarchies, KV-cache management, and fault-tolerant checkpointing; (iii) Data serving for LLMs tackles challenges in RAG (e.g., knowledge post-processing), LLM inference (e.g., prompt compression, data provenance), and training strategies (e.g., data packing and shuffling). On the other hand, in LLM4DATA, LLMs are emerging as general-purpose engines for data management. We review recent advances in (i) data manipulation, including automatic data cleaning, integration, discovery; (ii) data analysis, covering reasoning over structured, semi-structured, and unstructured data, and (iii) system optimization (e.g., configuration tuning, query rewriting, anomaly diagnosis), powered by LLM techniques like retrieval-augmented prompting, task-specialized fine-tuning, and multi-agent collaboration.
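To ground the deduplication-and-filtering stage of DATA4LLM, here is a toy near-duplicate filter using shingle Jaccard similarity; the production pipelines the survey covers operate at scale with MinHash/LSH, so treat this as a conceptual sketch.

```python
# Toy near-duplicate filter for pre-training corpora (conceptual sketch).
def shingles(text, k=5):
    toks = text.split()
    return {" ".join(toks[i:i + k]) for i in range(max(1, len(toks) - k + 1))}

def dedup(docs, threshold=0.8):
    kept, sigs = [], []
    for doc in docs:
        s = shingles(doc)
        # Keep the doc only if its Jaccard overlap with every kept doc is low.
        if all(len(s & t) / max(1, len(s | t)) < threshold for t in sigs):
            kept.append(doc)
            sigs.append(s)
    return kept
```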
PDF Download Link:
https://arxiv.org/pdf/2505.18458v3.pdf
GitHub:
• https://github.com/weaidb/awsome-data-llm
• https://github.com/weaidb/awesome-data-llm
Datasets:
• HumanEval
• C4
• CCNet
• MassiveText
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
VACE: All-in-One Video Creation and Editing
Article Date: 10 Mar 2025
Article Description:
Diffusion Transformer has demonstrated powerful capability and scalability in generating high-quality images and videos. Further pursuing the unification of generation and editing tasks has yielded significant progress in the domain of image content creation. However, due to the intrinsic demands for consistency across both temporal and spatial dynamics, achieving a unified approach for video synthesis remains challenging. We introduce VACE, which enables users to perform Video tasks within an All-in-one framework for Creation and Editing. These tasks include reference-to-video generation, video-to-video editing, and masked video-to-video editing. Specifically, we effectively integrate the requirements of various tasks by organizing video task inputs, such as editing, reference, and masking, into a unified interface referred to as the Video Condition Unit (VCU). Furthermore, by utilizing a Context Adapter structure, we inject different task concepts into the model using formalized representations of temporal and spatial dimensions, allowing it to handle arbitrary video synthesis tasks flexibly. Extensive experiments demonstrate that the unified model of VACE achieves performance on par with task-specific models across various subtasks. Simultaneously, it enables diverse applications through versatile task combinations. Project page: https://ali-vilab.github.io/VACE-Page/.
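The unifying move is the input interface: every task becomes a tuple of text, frames, reference images, and masks. A hedged sketch of what such a Video Condition Unit could look like (field names are assumptions read off the abstract, not VACE's code):

```python
# Assumed shape of a unified Video Condition Unit (illustrative only).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VideoConditionUnit:
    text: str                            # task prompt
    frames: Optional[List] = None        # source video (video-to-video editing)
    reference: Optional[List] = None     # reference images (reference-to-video)
    mask: Optional[List] = None          # per-frame masks (masked editing)

# One interface covers all tasks: reference-to-video passes only `reference`;
# masked video editing passes `frames` + `mask`; pure generation passes text.
```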
PDF Download Link:
https://arxiv.org/pdf/2503.07598v1.pdf
GitHub:
• https://github.com/wan-video/wan2.1
• https://github.com/ali-vilab/vace
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Article Date: Dec 2024
Article Description:
We address the problem of gaze target estimation, which aims to predict where a person is looking in a scene. Predicting a person's gaze target requires reasoning both about the person's appearance and the contents of the scene. Prior works have developed increasingly complex, hand-crafted pipelines for gaze target estimation that carefully fuse features from separate scene encoders, head encoders, and auxiliary models for signals like depth and pose. Motivated by the success of general-purpose feature extractors on a variety of visual tasks, we propose Gaze-LLE, a novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. We extract a single feature representation for the scene, and apply a person-specific positional prompt to decode gaze with a lightweight module. We demonstrate state-of-the-art performance across several gaze benchmarks and provide extensive analysis to validate our design choices. Our code is available at: https://github.com/fkryan/gazelle. (CVPR 2025)
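A minimal sketch of the architecture's shape, assuming a frozen encoder output, a learned prompt added at the head's token position, and a small transformer decoder; dimensions and layer counts are placeholders, not the repo's values.

```python
# Sketch of the Gaze-LLE idea (assumed sizes; see fkryan/gazelle for the real model).
import torch
import torch.nn as nn

class GazeLLESketch(nn.Module):
    def __init__(self, d=768, n_layers=3):
        super().__init__()
        self.head_prompt = nn.Parameter(torch.zeros(d))  # person-specific positional prompt
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)  # lightweight module
        self.to_score = nn.Linear(d, 1)

    def forward(self, scene_tokens, head_index):
        # scene_tokens: (B, N, d) from a *frozen* DINOv2 -- one scene forward
        # pass, reusable for every person by moving only the prompt position.
        x = scene_tokens.clone()
        x[torch.arange(x.size(0)), head_index] += self.head_prompt
        x = self.decoder(x)
        return self.to_score(x).squeeze(-1)  # (B, N) per-patch gaze-target scores
```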
PDF Download Link:
https://arxiv.org/pdf/2412.09586v1.pdf
GitHub:
• https://github.com/fkryan/gazelle
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
🔹 Publication Date: Jun 8, 2025
🔹 Abstract:
AI-generated summary: Frame Guidance offers a training-free method for controlling video generation using frame-level signals, reducing memory usage and enhancing globally coherent video output.

Advancements in diffusion models have significantly improved video quality, directing attention to fine-grained controllability. However, many existing methods depend on fine-tuning large-scale video models for specific tasks, which becomes increasingly impractical as model sizes continue to grow. In this work, we present Frame Guidance, a training-free guidance method for controllable video generation based on frame-level signals, such as keyframes, style reference images, sketches, or depth maps. For practical training-free guidance, we propose a simple latent processing method that dramatically reduces memory usage, and apply a novel latent optimization strategy designed for globally coherent video generation. Frame Guidance enables effective control across diverse tasks, including keyframe guidance, stylization, and looping, without any training, and is compatible with any video model. Experimental results show that Frame Guidance can produce high-quality controlled videos for a wide range of tasks and input signals.
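A hedged sketch of one guidance step: estimate the clean latent, decode the constrained frame, and push the latent down the gradient of a frame-level loss. `predict_x0` and `decode_frame` are hypothetical stand-ins; the paper's latent processing is specifically designed to keep this step memory-cheap.

```python
# One training-free guidance step toward a keyframe target (illustrative).
import torch
import torch.nn.functional as F

def guidance_step(z, t, keyframe, frame_idx, denoiser, scale=1.0):
    z = z.detach().requires_grad_(True)
    z0_hat = denoiser.predict_x0(z, t)           # hypothetical one-step x0 estimate
    frame_hat = decode_frame(z0_hat, frame_idx)  # hypothetical cheap frame decode
    loss = F.mse_loss(frame_hat, keyframe)       # frame-level signal (keyframe here)
    grad = torch.autograd.grad(loss, z)[0]
    return (z - scale * grad).detach()           # steer sampling, no training
```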
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.07177
• PDF: https://arxiv.org/pdf/2506.07177
• Project Page: https://frame-guidance-video.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
🔹 Title:
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling
🔹 Publication Date: Jun 26, 2025
🔹 Abstract:
AI-generated summary: A physics-based skinning and rigging framework called PhysRig uses volumetric representation and continuum mechanics for more realistic and physically plausible animations.

Skinning and rigging are fundamental components in animation, articulated object reconstruction, motion transfer, and 4D generation. Existing approaches predominantly rely on Linear Blend Skinning (LBS), due to its simplicity and differentiability. However, LBS introduces artifacts such as volume loss and unnatural deformations, and it fails to model elastic materials like soft tissues, fur, and flexible appendages (e.g., elephant trunks, ears, and fatty tissues). In this work, we propose PhysRig: a differentiable physics-based skinning and rigging framework that overcomes these limitations by embedding the rigid skeleton into a volumetric representation (e.g., a tetrahedral mesh), which is simulated as a deformable soft-body structure driven by the animated skeleton. Our method leverages continuum mechanics and discretizes the object as particles embedded in an Eulerian background grid to ensure differentiability with respect to both material properties and skeletal motion. Additionally, we introduce material prototypes, significantly reducing the learning space while maintaining high expressiveness. To evaluate our framework, we construct a comprehensive synthetic dataset using meshes from Objaverse, The Amazing Animals Zoo, and Mixamo, covering diverse object categories and motion patterns. Our method consistently outperforms traditional LBS-based approaches, generating more realistic and physically plausible results. Furthermore, we demonstrate the applicability of our framework in the pose transfer task, highlighting its versatility for articulated object modeling.
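The particle/grid coupling the abstract mentions is the classic material-point pattern: scatter particle mass and momentum to an Eulerian grid, solve there, then gather back. Below is a deliberately tiny particle-to-grid sketch with nearest-node weights; real simulators (presumably PhysRig included) use smooth B-spline kernels, which is part of what keeps the pipeline differentiable.

```python
# Toy particle-to-grid transfer for a soft-body step (nearest-node weights).
import numpy as np

def particles_to_grid(xp, vp, mp, grid_res, dx):
    """xp: (N,3) positions; vp: (N,3) velocities; mp: (N,) masses."""
    mass = np.zeros((grid_res,) * 3)
    momentum = np.zeros((grid_res,) * 3 + (3,))
    idx = np.clip((xp / dx).astype(int), 0, grid_res - 1)
    for p in range(len(xp)):
        i, j, k = idx[p]
        mass[i, j, k] += mp[p]
        momentum[i, j, k] += mp[p] * vp[p]
    vel = momentum / np.maximum(mass, 1e-12)[..., None]  # grid velocities
    return mass, vel
```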
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.20936
• PDF: https://arxiv.org/pdf/2506.20936
• Project Page: https://physrig.github.io/
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Article Date: 1 Feb 2024
Article Description:
We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. SymbolicAI enables the seamless integration of generative models with a diverse range of solvers by treating large language models (LLMs) as semantic parsers that execute tasks based on both natural and formal language instructions, thus bridging the gap between symbolic reasoning and generative AI. We leverage probabilistic programming principles to tackle complex tasks, and utilize differentiable and classical programming paradigms with their respective strengths. The framework introduces a set of polymorphic, compositional, and self-referential operations for multi-modal data that connects multi-step generative processes and aligns their outputs with user objectives in complex workflows. As a result, we can transition between the capabilities of various foundation models with in-context learning capabilities and specialized, fine-tuned models or solvers proficient in addressing specific problems. Through these operations based on in-context learning our framework enables the creation and evaluation of explainable computational graphs. Finally, we introduce a quality measure and its empirical score for evaluating these computational graphs, and propose a benchmark that compares various state-of-the-art LLMs across a set of complex workflows. We refer to the empirical score as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. The framework codebase and benchmark are linked below.
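The paper's central pattern, an LLM acting as a semantic parser that routes work to classical or generative solvers, fits in a few lines. The sketch below is generic Python, not SymbolicAI's API; `llm` is a hypothetical completion function and the two solvers are toy stand-ins.

```python
# Generic "LLM as semantic parser + solver" routing (not SymbolicAI's API).
import json

def solve(task: str):
    # The LLM parses natural language into a formal dispatch decision.
    plan = json.loads(llm(
        'Return JSON {"solver": "arithmetic"|"search", "query": ...} for: ' + task))
    solvers = {
        "arithmetic": lambda q: eval(q, {"__builtins__": {}}),  # toy classical solver
        "search": lambda q: llm("Answer concisely: " + q),      # generative fallback
    }
    return solvers[plan["solver"]](plan["query"])

# e.g. solve("What is 17 * 23?") should route "17 * 23" to the arithmetic solver.
```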
PDF Download Link:
https://arxiv.org/pdf/2402.00854v4.pdf
GitHub:
• https://github.com/ExtensityAI/symbolicai
• https://github.com/extensityai/benchmark
• https://github.com/xpitfire/symbolicai
Datasets:
• No datasets information available
==================================
For more data science resources:
https://t.iss.one/DataScienceT
Article Title:
Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
Article Date: 19 Feb 2025
Article Description:
We propose LGGPT, an LLM-based model tailored for unified layout generation. First, we propose Arbitrary Layout Instruction (ALI) and Universal Layout Response (ULR) as the uniform I/O template. ALI accommodates arbitrary layout generation task inputs across multiple layout domains, enabling LGGPT to unify both task-generic and domain-generic layout generation hitherto unexplored. Collectively, ALI and ULR boast a succinct structure that forgoes superfluous tokens typically found in existing HTML-based formats, facilitating efficient instruction tuning and boosting unified generation performance. In addition, we propose an Interval Quantization Encoding (IQE) strategy that compresses ALI into a more condensed structure. IQE precisely preserves valid layout clues while eliminating the less informative placeholders, facilitating LGGPT to capture complex and variable layout generation conditions during the unified training process. Experimental results demonstrate that LGGPT achieves superior or on-par performance compared to existing methods. Notably, LGGPT strikes a prominent balance between proficiency and efficiency with a compact 1.5B parameter LLM, which beats prior 7B or 175B models even in the most extensive and challenging unified scenario. Furthermore, we underscore the necessity of employing LLMs for unified layout generation and suggest that 1.5B could be an optimal parameter size by comparing LLMs of varying scales. Code is available at https://github.com/NiceRingNode/LGGPT.
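To make the interval-quantization idea tangible: map each layout coordinate into a small number of interval bins and emit one short token per bin, so the LLM never sees verbose HTML-style markup. The bin count and token format below are assumptions, not LGGPT's exact scheme.

```python
# Illustrative interval quantization of layout boxes into discrete tokens.
def encode_box(box, canvas_w, canvas_h, bins=128):
    x, y, w, h = box  # pixel coordinates on the canvas
    q = lambda v, size: min(bins - 1, int(v / size * bins))
    return [f"<{q(x, canvas_w)}>", f"<{q(y, canvas_h)}>",
            f"<{q(w, canvas_w)}>", f"<{q(h, canvas_h)}>"]

# encode_box((120, 40, 300, 60), 1024, 768) -> ['<15>', '<6>', '<37>', '<10>']
```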
PDF Download Link:
https://arxiv.org/pdf/2502.14005v1.pdf
GitHub:
• https://github.com/niceringnode/lggpt
Datasets:
• PubLayNet
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
Article Title:
Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science
Article Date: 4 Jun 2025
Article Description:
Contemporary approaches to assisted scientific discovery use language models to automatically generate large numbers of potential hypotheses to test, while also automatically generating code-based experiments to test those hypotheses. While hypotheses can be comparatively inexpensive to generate, automated experiments can be costly, particularly when run at scale (i.e., thousands of experiments). Developing the capacity to filter hypotheses based on their feasibility would allow discovery systems to run at scale while increasing their likelihood of making significant discoveries. In this work we introduce Matter-of-Fact, a challenge dataset for determining the feasibility of hypotheses framed as claims. Matter-of-Fact includes 8.4k claims extracted from scientific articles spanning four high-impact contemporary materials science topics (superconductors, semiconductors, batteries, and aerospace materials), covering qualitative and quantitative claims from theoretical, experimental, and code/simulation results. We show that strong baselines that include retrieval-augmented generation over scientific literature and code generation fail to exceed 72% performance on this task (chance performance is 50%), while domain-expert verification suggests nearly all claims are solvable -- highlighting both the difficulty of this task for current models and the potential to accelerate scientific discovery by making near-term progress.
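A minimal sketch of how such a claim-feasibility benchmark could be scored, assuming a hypothetical record format with a boolean `feasible` label; the dataset's actual schema may differ.

```python
# Scoring sketch for binary claim-feasibility prediction against the
# 50% chance baseline (hypothetical record format).
claims = [
    {"text": "Compound X superconducts at 300 K", "feasible": False},
    {"text": "Anode Y retains 80% capacity after 500 cycles", "feasible": True},
]

def predict(claim_text: str) -> bool:
    """Stand-in for a model (e.g., retrieval-augmented generation over the
    literature) that labels a claim as feasible or not."""
    return True  # trivial constant baseline

correct = sum(predict(c["text"]) == c["feasible"] for c in claims)
print(f"accuracy: {correct / len(claims):.2f} (chance is 0.50)")
```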
PDF Download Link:
https://arxiv.org/pdf/2506.04410v1.pdf
GitHub:
• https://github.com/cognitiveailab/matter-of-fact
Datasets:
• COVID-Fact
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
Article Title:
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs
Article Date: 23 Aug 2023
Article Description:
Despite the progress of foundation models, knowledge-based reasoning remains a persistent challenge due to their limited capacity for knowledge recall and inference. Existing methods primarily focus on encouraging these models to plan and solve problems or extensively sample reasoning chains independently. However, these methods often overlook conceptual errors and inferential fallacies, inevitably leading to a series of notorious issues such as misleading conclusions, cognitive biases, and reduced decision quality. While explicit modeling of causality is argued to hold promise in addressing these issues, contemporary research efforts have thus far fallen short in achieving causality-based foundation models. Drawing inspiration from the orchestration of diverse specialized agents collaborating to tackle intricate tasks, we propose a framework named Causal-Consistency Chain-of-Thought (CaCo-CoT) that harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models, involving a set of reasoners and evaluators. These agents collaboratively work within a reasoning-and-consensus paradigm to improve faithfulness. The reasoners are tasked with generating reasoning chains for knowledge-intensive problems by mimicking human causal reasoning. Meanwhile, the evaluator scrutinizes the causal consistency of a reasoner's reasoning chain from a non-causal and a counterfactual perspective. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations across text-based and multi-modal knowledge reasoning tasks (e.g., science question answering and commonsense reasoning).
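Structurally, the reasoning-and-consensus loop might look roughly like the sketch below, with stand-in `reasoner` and `evaluator` functions in place of the LLM agents; this is an illustration of the paradigm, not the authors' code.

```python
# Schematic reasoner/evaluator consensus loop in the spirit of CaCo-CoT
# (hypothetical structure; agents are toy stand-ins for LLM calls).
from collections import Counter

def reasoner(question: str, seed: int) -> str:
    """Stand-in for an LLM reasoner producing an answer via a causal chain."""
    return ["A", "A", "B"][seed % 3]  # toy divergent answers

def evaluator(question: str, answer: str) -> bool:
    """Stand-in for an evaluator checking the causal consistency of a chain,
    e.g., via non-causal and counterfactual probes."""
    return answer == "A"  # toy consistency check

def caco_cot(question: str, n_reasoners: int = 3) -> str:
    answers = [reasoner(question, s) for s in range(n_reasoners)]
    consistent = [a for a in answers if evaluator(question, a)]
    # Consensus over the causally consistent answers (fall back to all).
    return Counter(consistent or answers).most_common(1)[0][0]

print(caco_cot("Does boiling point decrease at altitude?"))  # -> "A"
```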
PDF Download Link:
https://arxiv.org/pdf/2308.11914v4.pdf
GitHub:
• https://github.com/hcplab-sysu/causalvlr
• https://github.com/hcplab-sysu/causal-vlreasoning
Datasets:
• BoolQ
• ScienceQA
• Com2Sense
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
🔹 Title:
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
🔹 Publication Date: Published on Jun 24
🔹 Abstract:
Outlier-Safe Pre-Training improves large language model quantization performance by preventing extreme activation outliers through innovative training techniques. AI-generated summary: Extreme activation outliers in Large Language Models (LLMs) critically degrade quantization performance, hindering efficient on-device deployment. While channel-wise operations and adaptive gradient scaling are recognized causes, practical mitigation remains challenging. We introduce Outlier-Safe Pre-Training (OSP), a practical guideline that proactively prevents outlier formation rather than relying on post-hoc mitigation. OSP combines three key innovations: (1) the Muon optimizer, eliminating privileged bases while maintaining training efficiency; (2) Single-Scale RMSNorm, preventing channel-wise amplification; and (3) a learnable embedding projection, redistributing activation magnitudes originating from embedding matrices. We validate OSP by training a 1.4B-parameter model on 1 trillion tokens, the first production-scale LLM trained without such outliers. Under aggressive 4-bit quantization, our OSP model achieves a 35.7 average score across 10 benchmarks (compared to 26.5 for an Adam-trained model), with only a 2% training overhead. Remarkably, OSP models exhibit near-zero excess kurtosis (0.04) compared to extreme values (1818.56) in standard models, fundamentally altering LLM quantization behavior. Our work demonstrates that outliers are not inherent to LLMs but are consequences of training strategies, paving the way for more efficient LLM deployment. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Outlier-Safe-Pre-Training.
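Of the three innovations, Single-Scale RMSNorm is the easiest to sketch: a single shared learnable gain replaces the usual per-channel gain vector, so no individual channel can be selectively amplified. The PyTorch module below is an assumption-based reading of the abstract, not the authors' exact module.

```python
# Sketch of the Single-Scale RMSNorm idea: one shared scalar gain instead
# of a per-channel gain vector (assumption based on the abstract above).
import torch
import torch.nn as nn

class SingleScaleRMSNorm(nn.Module):
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))  # single learnable scalar
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the feature dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return self.scale * x / rms  # the same gain applies to every channel

x = torch.randn(2, 8, 16)
print(SingleScaleRMSNorm()(x).shape)  # torch.Size([2, 8, 16])
```

Because the gain is shared, the normalization cannot create the channel-wise magnitude disparities that standard per-channel RMSNorm gains can, which is the amplification pathway the abstract says OSP closes.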
🔹 Links:
• arXiv Page: https://arxivexplained.com/papers/outlier-safe-pre-training-for-robust-4-bit-quantization-of-large-language-models
• PDF: https://arxiv.org/pdf/2506.19697
• Project Page: https://huggingface.co/papers?q=learnable%20embedding%20projection
• Github: https://github.com/dmis-lab/Outlier-Safe-Pre-Training
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
β https://t.iss.one/DataScienceT
Article Title:
Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games
Article Date: 5 Jun 2025
Article Description:
LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are inherently asynchronous. For example, in group chats, online team meetings, or social games, there is no inherent notion of turns; therefore, the decision of when to speak forms a crucial part of each participant's decision making. In this work, we develop an adaptive asynchronous LLM agent which, in addition to determining what to say, also decides when to say it. To evaluate our agent, we collect a unique dataset of online Mafia games, including both human participants and our asynchronous agent. Overall, our agent performs on par with human players, both in game performance and in its ability to blend in with the other human players. Our analysis shows that the agent's behavior in deciding when to speak closely mirrors human patterns, although differences emerge in message content. We release all our data and code to support and encourage further research into more realistic asynchronous communication between LLM agents. This work paves the way for the integration of LLMs into realistic human group settings, from assistance in team discussions to educational and professional environments where complex social dynamics must be navigated.
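A toy sketch of the what/when split described above, with a hypothetical timing heuristic standing in for the agent's learned decision of when to speak (the paper's agent uses an LLM for both decisions).

```python
# Toy sketch of an asynchronous agent that separately decides *when*
# to speak and *what* to say (hypothetical heuristics, not the paper's code).
import random

def should_speak(seconds_since_last_msg: float, was_addressed: bool) -> bool:
    """Decide when to speak: answer if addressed, otherwise chime in
    with growing probability as the conversation goes quiet."""
    if was_addressed:
        return True
    return random.random() < min(seconds_since_last_msg / 60.0, 0.5)

def compose_reply(history: list[str]) -> str:
    """Decide what to say (stand-in for an LLM call over the chat history)."""
    return f"Replying to: {history[-1]!r}"

history = ["anyone suspicious of player 3?"]
if should_speak(seconds_since_last_msg=20.0, was_addressed=True):
    print(compose_reply(history))
```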
PDF Download Link:
https://arxiv.org/pdf/2506.05309v1.pdf
GitHub:
• https://github.com/niveck/LLMafia
Datasets:
• LLMafia
==================================
For more data science resources:
β https://t.iss.one/DataScienceT