Article Title:
MNN: A Universal and Efficient Inference Engine
Article Date: 27 Feb 2020
Article Description:
Deploying deep learning models on mobile devices draws more and more attention recently. However, designing an efficient inference engine on devices is under the great challenges of model compatibility, device diversity, and resource limitation. To deal with these challenges, we propose Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications. In this paper, the contributions of MNN include: (1) presenting a mechanism called pre-inference that manages to conduct runtime optimization; (2) delivering thorough kernel optimization on operators to achieve optimal computation performance; (3) introducing backend abstraction module which enables hybrid scheduling and keeps the engine lightweight. Extensive benchmark experiments demonstrate that MNN performs favorably against other popular lightweight deep learning frameworks. MNN is available to public at: https://github.com/alibaba/MNN.
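Below is a minimal, illustrative Python sketch of the backend-abstraction idea described above: a uniform operator interface per device plus hybrid scheduling across backends. The class and function names are invented for illustration; this is not MNN's actual C++ or Python API.

```python
# Conceptual sketch of a backend-abstraction layer with hybrid scheduling.
# Names (Backend, CpuBackend, GpuBackend, schedule) are illustrative, not MNN's API.
from abc import ABC, abstractmethod


class Backend(ABC):
    """One compute device (CPU, GPU, DSP, ...) behind a uniform interface."""

    @abstractmethod
    def supports(self, op: str) -> bool: ...

    @abstractmethod
    def run(self, op: str, *tensors): ...


class CpuBackend(Backend):
    def supports(self, op: str) -> bool:
        return True                      # CPU is the universal fallback

    def run(self, op: str, *tensors):
        return f"cpu:{op}"


class GpuBackend(Backend):
    FAST_OPS = {"conv2d", "matmul"}

    def supports(self, op: str) -> bool:
        return op in self.FAST_OPS

    def run(self, op: str, *tensors):
        return f"gpu:{op}"


def schedule(ops, backends):
    """Hybrid scheduling: send each operator to the first backend that supports it."""
    plan = []
    for op in ops:
        backend = next(b for b in backends if b.supports(op))
        plan.append((op, backend))
    return plan


if __name__ == "__main__":
    graph = ["conv2d", "softmax", "matmul"]
    for op, backend in schedule(graph, [GpuBackend(), CpuBackend()]):
        print(backend.run(op))          # gpu:conv2d, cpu:softmax, gpu:matmul
```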
PDF Download Link:
https://arxiv.org/pdf/2002.12418v1.pdf
GitHub:
• https://github.com/alibaba/MNN
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Efficient Part-level 3D Object Generation via Dual Volume Packing
Article Date: 11 Jun 2025
Article Description:
Recent progress in 3D object generation has greatly improved both the quality and efficiency. However, most existing methods generate a single mesh with all parts fused together, which limits the ability to edit or manipulate individual parts. A key challenge is that different objects may have a varying number of parts. To address this, we propose a new end-to-end framework for part-level 3D object generation. Given a single input image, our method generates high-quality 3D objects with an arbitrary number of complete and semantically meaningful parts. We introduce a dual volume packing strategy that organizes all parts into two complementary volumes, allowing for the creation of complete and interleaved parts that assemble into the final object. Experiments show that our model achieves better quality, diversity, and generalization than previous image-based part-level generation methods.
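As a rough illustration of the dual-volume idea (not the paper's actual algorithm), the sketch below greedily 2-colors a part-adjacency graph so that touching parts land in different "volumes", each volume then containing complete, non-overlapping parts.

```python
# Toy illustration of packing parts into two complementary volumes:
# greedily 2-color a part-adjacency graph so touching parts go to different volumes.
# Purely illustrative; the paper's actual packing strategy may differ.
from collections import deque

def pack_two_volumes(num_parts, touching_pairs):
    adj = {i: set() for i in range(num_parts)}
    for a, b in touching_pairs:
        adj[a].add(b)
        adj[b].add(a)

    volume = {}                       # part id -> 0 or 1
    for start in range(num_parts):
        if start in volume:
            continue
        volume[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in volume:
                    volume[v] = 1 - volume[u]
                    queue.append(v)
                # if v already has the same label, the contact graph is not
                # bipartite; a real packer would need extra splits or volumes

    return ([p for p in volume if volume[p] == 0],
            [p for p in volume if volume[p] == 1])

# Example: a 4-part object where part 0 touches 1 and 2, and 3 touches 1.
print(pack_two_volumes(4, [(0, 1), (0, 2), (3, 1)]))  # ([0, 3], [1, 2])
```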
PDF Download Link:
https://arxiv.org/pdf/2506.09980v1.pdf
GitHub:
• https://github.com/nvlabs/partpacker
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Transformers without Normalization
🔹 Publication Date: Published on Mar 13
🔹 Abstract:
Dynamic Tanh (DyT) replaces normalization layers in Transformers, achieving equivalent or superior performance without hyperparameter tuning across various tasks. AI-generated summary: Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple technique. We introduce Dynamic Tanh (DyT), an element-wise operation DyT(x) = tanh(alpha x), as a drop-in replacement for normalization layers in Transformers. DyT is inspired by the observation that layer normalization in Transformers often produces tanh-like, S-shaped input-output mappings. By incorporating DyT, Transformers without normalization can match or exceed the performance of their normalized counterparts, mostly without hyperparameter tuning. We validate the effectiveness of Transformers with DyT across diverse settings, ranging from recognition to generation, supervised to self-supervised learning, and computer vision to language models. These findings challenge the conventional understanding that normalization layers are indispensable in modern neural networks, and offer new insights into their role in deep networks.
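A minimal PyTorch sketch of DyT as a drop-in replacement for a normalization layer is shown below. The learnable scalar alpha and the per-channel scale/shift follow the formula quoted above; the initialization value and exact parameterization are assumptions.

```python
# Minimal sketch of Dynamic Tanh (DyT), DyT(x) = tanh(alpha * x), as a drop-in
# replacement for a normalization layer. Per-channel gamma/beta mirror common
# LayerNorm usage; the alpha init value is an assumption.
import torch
import torch.nn as nn


class DyT(nn.Module):
    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(init_alpha))  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))            # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))            # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta


# Drop-in usage: replace nn.LayerNorm(d_model) with DyT(d_model) inside a block.
x = torch.randn(2, 16, 512)
print(DyT(512)(x).shape)  # torch.Size([2, 16, 512])
```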
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2503.10622
• PDF: https://arxiv.org/pdf/2503.10622
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
🔹 Publication Date: Published on Jun 19
🔹 Abstract:
GRPO-CARE, a reinforcement learning framework optimizing for consistency and correctness, outperforms standard GRPO on a new video understanding benchmark, SEED-Bench-R1, improving both performance and logical coherence in multimodal large language models. AI-generated summary: Recent reinforcement learning approaches, such as outcome-supervised GRPO, have advanced Chain-of-Thought reasoning in large language models (LLMs), yet their adaptation to multimodal LLMs (MLLMs) is unexplored. To address the lack of rigorous evaluation for MLLM post-training methods, we introduce SEED-Bench-R1, a benchmark with complex real-world videos requiring balanced perception and reasoning. It offers a large training set and evaluates generalization across three escalating challenges: in-distribution, cross-environment, and cross-environment-task scenarios. Using SEED-Bench-R1, we find that standard GRPO, while improving answer accuracy, often reduces logical coherence between reasoning steps and answers, with only a 57.9% consistency rate. This stems from reward signals focusing solely on final answers, encouraging shortcuts, and strict KL penalties limiting exploration. To address this, we propose GRPO-CARE, a consistency-aware RL framework optimizing both answer correctness and reasoning coherence without explicit supervision. GRPO-CARE introduces a two-tiered reward: (1) a base reward for answer correctness, and (2) an adaptive consistency bonus, computed by comparing the model's reasoning-to-answer likelihood (via a slowly-evolving reference model) against group peers. This dual mechanism amplifies rewards for reasoning paths that are both correct and logically consistent. Replacing KL penalties with this adaptive bonus, GRPO-CARE outperforms standard GRPO on SEED-Bench-R1, achieving a 6.7% performance gain on the hardest evaluation level and a 24.5% improvement in consistency. It also shows strong transferability, improving model performance across diverse video understanding benchmarks. Our work contributes a systematically designed benchmark and a generalizable post-training framework, advancing the development of more interpretable and robust MLLMs.
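The sketch below illustrates the two-tiered reward in plain Python: a base correctness reward plus a consistency bonus for rollouts whose reasoning-to-answer likelihood (under a reference model) is high relative to their group peers. The likelihood inputs and the median-based cutoff are illustrative stand-ins, not the paper's exact formulation.

```python
# Hedged sketch of a two-tiered reward: base correctness plus an adaptive
# consistency bonus for samples whose reasoning-to-answer likelihood (under a
# slowly evolving reference model) beats their group peers. Values and the
# bonus rule are illustrative stand-ins, not GRPO-CARE's exact specification.
from statistics import median

def care_rewards(is_correct, ref_likelihood, bonus=0.5):
    """is_correct: list[bool]; ref_likelihood: list[float], one per group sample."""
    threshold = median(ref_likelihood)          # peer-relative cutoff within the group
    rewards = []
    for correct, lik in zip(is_correct, ref_likelihood):
        r = 1.0 if correct else 0.0             # tier 1: answer correctness
        if correct and lik >= threshold:        # tier 2: consistency bonus
            r += bonus
        rewards.append(r)
    return rewards

# One group of 4 rollouts for the same prompt:
print(care_rewards([True, True, False, True], [0.9, 0.2, 0.8, 0.6]))
# [1.5, 1.0, 0.0, 1.0]
```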
🔹 Links:
• arXiv Page: https://arxiv.org/pdf/2506.16141
• PDF: https://arxiv.org/pdf/2506.16141
• Github: https://github.com/TencentARC/GRPO-CARE
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
MoonCast: High-Quality Zero-Shot Podcast Generation
Article Date: 18 Mar 2025
Article Description:
Recent advances in text-to-speech synthesis have achieved notable success in generating high-quality short utterances for individual speakers. However, these systems still face challenges when extending their capabilities to long, multi-speaker, and spontaneous dialogues, typical of real-world scenarios such as podcasts. These limitations arise from two primary challenges: 1) long speech: podcasts typically span several minutes, exceeding the upper limit of most existing work; 2) spontaneity: podcasts are marked by their spontaneous, oral nature, which sharply contrasts with formal, written contexts; existing works often fall short in capturing this spontaneity. In this paper, we propose MoonCast, a solution for high-quality zero-shot podcast generation, aiming to synthesize natural podcast-style speech from text-only sources (e.g., stories, technical reports, news in TXT, PDF, or Web URL formats) using the voices of unseen speakers. To generate long audio, we adopt a long-context language model-based audio modeling approach utilizing large-scale long-context speech data. To enhance spontaneity, we utilize a podcast generation module to generate scripts with spontaneous details, which have been empirically shown to be as crucial as the text-to-speech modeling itself. Experiments demonstrate that MoonCast outperforms baselines, with particularly notable improvements in spontaneity and coherence.
PDF Download Link:
https://arxiv.org/pdf/2503.14345v2.pdf
GitHub:
• https://github.com/jzq2000/mooncast
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
CausalPFN: Amortized Causal Effect Estimation via In-Context Learning
Article Date: 9 Jun 2025
Article Description:
Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantial manual effort and domain expertise. We present CausalPFN, a single transformer that amortizes this workflow: trained once on a large library of simulated data-generating processes that satisfy ignorability, it infers causal effects for new observational datasets out-of-the-box. CausalPFN combines ideas from Bayesian causal inference with the large-scale training protocol of prior-fitted networks (PFNs), learning to map raw observations directly to causal effects without any task-specific adjustment. Our approach achieves superior average performance on heterogeneous and average treatment effect estimation benchmarks (IHDP, Lalonde, ACIC). Moreover, it shows competitive performance for real-world policy making on uplift modeling tasks. CausalPFN provides calibrated uncertainty estimates to support reliable decision-making based on Bayesian principles. This ready-to-use model does not require any further training or tuning and takes a step toward automated causal inference (https://github.com/vdblm/CausalPFN).
PDF Download Link:
https://arxiv.org/pdf/2506.07918v1.pdf
GitHub:
• https://github.com/vdblm/CausalPFN
Datasets:
• IHDP
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Use Property-Based Testing to Bridge LLM Code Generation and Validation
🔹 Publication Date: Published on Jun 23
🔹 Abstract:
A novel framework using Property-Based Testing and collaborative LLM-based agents improves code generation correctness and generalization. AI-generated summary: Large Language Models (LLMs) excel at code generation, but ensuring their outputs to be functionally correct, especially in complex programming tasks, is a persistent challenge. While traditional Test-Driven Development (TDD) offers a path for code refinement, its efficacy with LLMs is often undermined by the scarcity of high-quality test cases or the pitfalls of automated test generation, including biased tests or inaccurate output predictions that can misdirect the correction process. This paper introduces Property-Generated Solver, a novel framework that leverages Property-Based Testing (PBT) to validate high-level program properties or invariants, instead of relying on specific input-output examples. These properties are often simpler to define and verify than directly predicting exhaustive test oracles, breaking the "cycle of self-deception" where tests might share flaws with the code they are meant to validate. Property-Generated Solver employs two collaborative LLM-based agents: a Generator dedicated to code generation and iterative refinement, and a Tester that manages the PBT life-cycle and formulates semantically rich feedback from property violations. The resulting comprehensive and actionable feedback then guides the Generator in its refinement efforts. By establishing PBT as the core validation engine within this iterative, closed-loop paradigm, Property-Generated Solver provides a robust mechanism for steering LLMs towards more correct and generalizable code. Extensive experimental results on multiple code generation benchmarks demonstrate that Property-Generated Solver achieves substantial pass@1 improvements, ranging from 23.1% to 37.3% relative gains over established TDD methods.
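For intuition, here is a minimal property-based test in the spirit of the framework, using the hypothesis package: instead of fixed input-output pairs, it checks high-level properties of a candidate (e.g., LLM-generated) sort function. The candidate function and the chosen properties are illustrative, not taken from the paper.

```python
# A minimal property-based test: assert high-level properties of a candidate
# (e.g., LLM-generated) sort function instead of fixed input-output examples.
# Requires the `hypothesis` package.
from hypothesis import given, strategies as st


def candidate_sort(xs):          # stand-in for LLM-generated code under test
    return sorted(xs)


@given(st.lists(st.integers()))
def test_output_is_sorted(xs):
    out = candidate_sort(xs)
    assert all(a <= b for a, b in zip(out, out[1:]))


@given(st.lists(st.integers()))
def test_output_is_permutation(xs):
    assert sorted(candidate_sort(xs)) == sorted(xs)

# Run with `pytest`; property violations become counterexamples that a Tester
# agent could turn into feedback for the Generator.
```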
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.18315
• PDF: https://arxiv.org/pdf/2506.18315
• Github: https://github.com/HeLeHanPrivate/PBTwithCodeGen
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
TinyLlama: An Open-Source Small Language Model
Article Date: 4 Jan 2024
Article Description:
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
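A hedged usage example with Hugging Face transformers follows; the hub id "TinyLlama/TinyLlama-1.1B-Chat-v1.0" is an assumption and may differ from the exact checkpoint you want (base vs. chat, version).

```python
# Hedged example of running a TinyLlama checkpoint with Hugging Face transformers.
# The model id below is assumed; check the project's GitHub/HF page for the
# checkpoint you actually need.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```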
PDF Download Link:
https://arxiv.org/pdf/2401.02385v2.pdf
GitHub:
• https://github.com/Lightning-AI/lit-gpt
• https://github.com/jzhang38/tinyllama
Datasets:
• MML
• HellaSwag
• PIQA
• BoolQ
• WinoGrande
• DROP
• BIG-bench
• BBH
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models
🔹 Publication Date: Published on Jun 6
🔹 Abstract:
A heterogeneous Mixture-of-Adapters (MoA) approach enhances parameter-efficient fine-tuning in LLMs by integrating diverse adapter experts, outperforming homogeneous MoE-LoRA methods. AI-generated summary: Recent studies integrate Low-Rank Adaptation (LoRA) and Mixture-of-Experts (MoE) to further enhance the performance of parameter-efficient fine-tuning (PEFT) methods in Large Language Model (LLM) applications. Existing methods employ homogeneous MoE-LoRA architectures composed of LoRA experts with either similar or identical structures and capacities. However, these approaches often suffer from representation collapse and expert load imbalance, which negatively impact the potential of LLMs. To address these challenges, we propose a heterogeneous Mixture-of-Adapters (MoA) approach. This method dynamically integrates PEFT adapter experts with diverse structures, leveraging their complementary representational capabilities to foster expert specialization, thereby enhancing the effective transfer of pre-trained knowledge to downstream tasks. MoA supports two variants: (i) Soft MoA achieves fine-grained integration by performing a weighted fusion of all expert outputs; (ii) Sparse MoA activates adapter experts sparsely based on their contribution, achieving this with negligible performance degradation. Experimental results demonstrate that heterogeneous MoA outperforms homogeneous MoE-LoRA methods in both performance and parameter efficiency. Our project is available at https://github.com/DCDmllm/MoA.
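The following PyTorch sketch shows the Soft MoA idea: a router weights the outputs of two deliberately different adapter experts and fuses them on top of the frozen layer output. Expert types, router, and dimensions are illustrative, not the paper's exact modules.

```python
# Sketch of "Soft MoA": a router produces weights over heterogeneous adapter
# experts, and adapter outputs are fused as a weighted sum added to the input.
# Expert architectures and the router are illustrative stand-ins.
import torch
import torch.nn as nn


class SoftMoA(nn.Module):
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        # Two deliberately different ("heterogeneous") adapter experts.
        self.lora = nn.Sequential(nn.Linear(dim, rank, bias=False),
                                  nn.Linear(rank, dim, bias=False))
        self.bottleneck = nn.Sequential(nn.Linear(dim, rank), nn.GELU(),
                                        nn.Linear(rank, dim))
        self.router = nn.Linear(dim, 2)   # one logit per expert

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.router(h), dim=-1)          # (..., 2)
        experts = torch.stack([self.lora(h), self.bottleneck(h)], dim=-1)
        return h + (experts * weights.unsqueeze(-2)).sum(-1)      # weighted fusion


print(SoftMoA(64)(torch.randn(4, 10, 64)).shape)  # torch.Size([4, 10, 64])
```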
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.05928
• PDF: https://arxiv.org/pdf/2506.05928
• Github: https://github.com/DCDmllm/MoA
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
A Multimodal Automated Interpretability Agent
🔹 Publication Date: Published on Apr 22, 2024
🔹 Abstract:
MAIA, a multimodal automated interpretability agent, uses neural models to perform feature interpretation and failure mode discovery for other models, demonstrating comparable results to human experimenters and aiding in reducing sensitivity to spurious features and identifying potential misclassifications. AI-generated summary: This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery. It equips a pre-trained vision-language model with a set of tools that support iterative experimentation on subcomponents of other models to explain their behavior. These include tools commonly used by human interpretability researchers: for synthesizing and editing inputs, computing maximally activating exemplars from real-world datasets, and summarizing and describing experimental results. Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior. We evaluate applications of MAIA to computer vision models. We first characterize MAIA's ability to describe (neuron-level) features in learned representations of images. Across several trained models and a novel dataset of synthetic vision neurons with paired ground-truth descriptions, MAIA produces descriptions comparable to those generated by expert human experimenters. We then show that MAIA can aid in two additional interpretability tasks: reducing sensitivity to spurious features, and automatically identifying inputs likely to be mis-classified.
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2404.14394
• PDF: https://arxiv.org/pdf/2404.14394
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
🔹 Title:
Scaling Test-time Compute for LLM Agents
🔹 Publication Date: Published on Jun 15
🔹 Abstract:
Systematic exploration of test-time scaling methods in large language agents reveals that computational scaling improves performance, especially through parallel sampling, sequential revision, effective verification, and increased rollout diversity. AI-generated summary: Scaling test-time compute has shown remarkable success in improving the reasoning abilities of large language models (LLMs). In this work, we conduct the first systematic exploration of applying test-time scaling methods to language agents and investigate the extent to which it improves their effectiveness. Specifically, we explore different test-time scaling strategies, including: (1) parallel sampling algorithms; (2) sequential revision strategies; (3) verifiers and merging methods; (4) strategies for diversifying rollouts. We carefully analyze and ablate the impact of different design strategies on applying test-time scaling on language agents, and have the following findings: 1. Scaling test-time compute could improve the performance of agents. 2. Knowing when to reflect is important for agents. 3. Among different verification and result merging approaches, the list-wise method performs best. 4. Increasing diversified rollouts exerts a positive effect on the agent's task performance.
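A conceptual sketch of the recipe (parallel sampling plus list-wise verification) follows; run_agent and listwise_verifier are hypothetical stand-ins for an actual agent and verifier model, and the self-consistency scoring is only one possible merging rule.

```python
# Conceptual sketch of test-time scaling for an agent: sample several rollouts,
# score the whole candidate list with a verifier ("list-wise" merging), pick the best.
# run_agent and listwise_verifier are hypothetical stand-ins.
import random

def run_agent(task: str, seed: int) -> str:
    random.seed(seed)                               # stand-in rollout
    return f"answer-{random.randint(0, 3)}"

def listwise_verifier(task: str, candidates: list[str]) -> list[float]:
    # Stand-in: score by how often each answer recurs across rollouts (self-consistency).
    return [candidates.count(c) for c in candidates]

def scale_test_time(task: str, n_rollouts: int = 8) -> str:
    candidates = [run_agent(task, seed) for seed in range(n_rollouts)]   # parallel sampling
    scores = listwise_verifier(task, candidates)                         # list-wise verification
    return max(zip(scores, candidates))[1]

print(scale_test_time("count the widgets"))
```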
🔹 Links:
• arXiv Page: https://arxiv.org/abs/2506.12928
• PDF: https://arxiv.org/pdf/2506.12928
🔹 Datasets citing this paper:
No datasets found
🔹 Spaces citing this paper:
No spaces found
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Learning Compact Vision Tokens for Efficient Large Multimodal Models
Article Date: 8 Jun 2025
Article Description:
Large multimodal models (LMMs) suffer significant computational challenges due to the high cost of Large Language Models (LLMs) and the quadratic complexity of processing long vision token sequences. In this paper, we explore the spatial redundancy among vision tokens and shorten the length of vision token sequences for inference acceleration. Specifically, we propose a Spatial Token Fusion (STF) method to learn compact vision tokens for short vision token sequence, where spatial-adjacent tokens are fused into one. Meanwhile, weight-frozen vision encoder can not well adapt to the demand of extensive downstream vision-language tasks. To this end, we further introduce a Multi-Block Token Fusion (MBTF) module to supplement multi-granularity features for the reduced token sequence. Overall, we combine STF and MBTF module to balance token reduction and information preservation, thereby improving inference efficiency without sacrificing multimodal reasoning capabilities. Experimental results demonstrate that our method based on LLaVA-1.5 achieves comparable or even superior performance to the baseline on 8 popular vision-language benchmarks with only 25% of the baseline's vision tokens. The source code and trained weights are available at https://github.com/visresearch/LLaVA-STF.
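For intuition, the sketch below fuses each 2x2 neighborhood of vision tokens into one token by concatenating channels and projecting back down, cutting the sequence to a quarter of its length. The projection details and grid size are illustrative, not the paper's exact STF module.

```python
# Sketch of spatial token fusion: merge each 2x2 neighborhood of vision tokens
# into one token (concatenate the four tokens' channels, project back to C),
# reducing the sequence length to 25% of the original. Details are illustrative.
import torch
import torch.nn as nn


def fuse_spatial_tokens(tokens: torch.Tensor, proj: nn.Linear) -> torch.Tensor:
    """tokens: (B, H*W, C) on a square HxW grid; returns (B, H*W/4, C)."""
    B, N, C = tokens.shape
    H = W = int(N ** 0.5)
    x = tokens.view(B, H, W, C)
    # Group 2x2 spatial neighbors, concatenate their channels, project to C.
    x = x.view(B, H // 2, 2, W // 2, 2, C).permute(0, 1, 3, 2, 4, 5)
    x = x.reshape(B, (H // 2) * (W // 2), 4 * C)
    return proj(x)


tokens = torch.randn(1, 24 * 24, 1024)            # e.g. a 24x24 grid of ViT tokens
proj = nn.Linear(4 * 1024, 1024)
print(fuse_spatial_tokens(tokens, proj).shape)    # torch.Size([1, 144, 1024])
```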
PDF Download Link:
https://arxiv.org/pdf/2506.07138v1.pdf
GitHub:
• https://github.com/visresearch/LLaVA-STF
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
AlphaEvolve: A Learning Framework to Discover Novel Alphas in Quantitative Investment
Article Date: 30 Mar 2021
Article Description:
Alphas are stock prediction models capturing trading signals in a stock market. A set of effective alphas can generate weakly correlated high returns to diversify the risk. Existing alphas can be categorized into two classes: Formulaic alphas are simple algebraic expressions of scalar features, and thus can generalize well and be mined into a weakly correlated set. Machine learning alphas are data-driven models over vector and matrix features. They are more predictive than formulaic alphas, but are too complex to mine into a weakly correlated set. In this paper, we introduce a new class of alphas to model scalar, vector, and matrix features which possess the strengths of these two existing classes. The new alphas predict returns with high accuracy and can be mined into a weakly correlated set. In addition, we propose a novel alpha mining framework based on AutoML, called AlphaEvolve, to generate the new alphas. To this end, we first propose operators for generating the new alphas and selectively injecting relational domain knowledge to model the relations between stocks. We then accelerate the alpha mining by proposing a pruning technique for redundant alphas. Experiments show that AlphaEvolve can evolve initial alphas into the new alphas with high returns and weak correlations.
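As a toy example of what a formulaic alpha looks like (not one of the paper's evolved alphas), the snippet below computes a cross-sectional mean-reversion signal scaled by relative volume over synthetic price data.

```python
# A toy formulaic alpha over price/volume data (illustrative only):
# rank stocks each day by 5-day mean-reversion scaled by 20-day relative volume.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="B")
prices = pd.DataFrame(rng.lognormal(0, 0.01, (60, 3)).cumprod(axis=0),
                      index=dates, columns=["AAA", "BBB", "CCC"])
volume = pd.DataFrame(rng.integers(1_000, 5_000, (60, 3)),
                      index=dates, columns=prices.columns)

# alpha = -(close / mean(close, 5) - 1) * (volume / mean(volume, 20))
alpha = -(prices / prices.rolling(5).mean() - 1) * (volume / volume.rolling(20).mean())
signal = alpha.rank(axis=1, pct=True)   # cross-sectional rank -> portfolio weights
print(signal.tail())
```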
PDF Download Link:
https://arxiv.org/pdf/2103.16196v2.pdf
GitHub:
• https://github.com/codelion/openevolve
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
Aligning Multimodal LLM with Human Preference: A Survey
Article Date: 18 Mar 2025
Article Description:
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed. This gap has spurred the emergence of various alignment algorithms, each targeting different application scenarios and optimization goals. Recent studies have shown that alignment algorithms are a powerful approach to resolving the aforementioned challenges. In this paper, we aim to provide a comprehensive and systematic review of alignment algorithms for MLLMs. Specifically, we explore four key aspects: (1) the application scenarios covered by alignment algorithms, including general image understanding, multi-image, video, and audio, and extended multimodal applications; (2) the core factors in constructing alignment datasets, including data sources, model responses, and preference annotations; (3) the benchmarks used to evaluate alignment algorithms; and (4) a discussion of potential future directions for the development of alignment algorithms. This work seeks to help researchers organize current advancements in the field and inspire better alignment methods. The project page of this paper is available at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment.
PDF Download Link:
https://arxiv.org/pdf/2503.14504v1.pdf
GitHub:
• https://github.com/bradyfu/awesome-multimodal-large-language-models
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
A Survey of LLM $\times$ DATA
Article Date: 24 May 2025
Article Description:
The integration of large language model (LLM) and data management (DATA) is rapidly redefining both domains. In this survey, we comprehensively review the bidirectional relationships. On the one hand, DATA4LLM, spanning large-scale data processing, storage, and serving, feeds LLMs with high quality, diversity, and timeliness of data required for stages like pre-training, post-training, retrieval-augmented generation, and agentic workflows: (i) Data processing for LLMs includes scalable acquisition, deduplication, filtering, selection, domain mixing, and synthetic augmentation; (ii) Data Storage for LLMs focuses on efficient data and model formats, distributed and heterogeneous storage hierarchies, KV-cache management, and fault-tolerant checkpointing; (iii) Data serving for LLMs tackles challenges in RAG (e.g., knowledge post-processing), LLM inference (e.g., prompt compression, data provenance), and training strategies (e.g., data packing and shuffling). On the other hand, in LLM4DATA, LLMs are emerging as general-purpose engines for data management. We review recent advances in (i) data manipulation, including automatic data cleaning, integration, discovery; (ii) data analysis, covering reasoning over structured, semi-structured, and unstructured data, and (iii) system optimization (e.g., configuration tuning, query rewriting, anomaly diagnosis), powered by LLM techniques like retrieval-augmented prompting, task-specialized fine-tuning, and multi-agent collaboration.
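As a toy illustration of the DATA4LLM side (not code from the survey), the snippet below sketches exact deduplication of a small text corpus by hashing normalized documents; real pipelines typically add fuzzy matching such as MinHash/LSH, which this sketch does not implement:
```python
# Toy sketch (assumption, not from the survey): exact deduplication of
# documents by hashing their normalized text. Production pipelines usually
# add fuzzy matching (e.g., MinHash/LSH), omitted here for brevity.
import hashlib

def dedup_exact(docs):
    """Return documents with exact (whitespace/case-normalized) duplicates removed."""
    seen = set()
    unique = []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.lower().split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = [
    "The quick brown fox.",
    "the quick  brown fox.",        # duplicate after normalization
    "An entirely different sentence.",
]
print(dedup_exact(corpus))          # keeps 2 of the 3 documents
```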
PDF Download Link:
https://arxiv.org/pdf/2505.18458v3.pdf
GitHub:
• https://github.com/weaidb/awsome-data-llm
• https://github.com/weaidb/awesome-data-llm
Datasets:
• HumanEval
• C4
• CCNet
• MassiveText
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
Article Title:
VACE: All-in-One Video Creation and Editing
Article Date: 10 Mar 2025
Article Description:
Diffusion Transformer has demonstrated powerful capability and scalability in generating high-quality images and videos. Further pursuing the unification of generation and editing tasks has yielded significant progress in the domain of image content creation. However, due to the intrinsic demands for consistency across both temporal and spatial dynamics, achieving a unified approach for video synthesis remains challenging. We introduce VACE, which enables users to perform Video tasks within an All-in-one framework for Creation and Editing. These tasks include reference-to-video generation, video-to-video editing, and masked video-to-video editing. Specifically, we effectively integrate the requirements of various tasks by organizing video task inputs, such as editing, reference, and masking, into a unified interface referred to as the Video Condition Unit (VCU). Furthermore, by utilizing a Context Adapter structure, we inject different task concepts into the model using formalized representations of temporal and spatial dimensions, allowing it to handle arbitrary video synthesis tasks flexibly. Extensive experiments demonstrate that the unified model of VACE achieves performance on par with task-specific models across various subtasks. Simultaneously, it enables diverse applications through versatile task combinations. Project page: https://ali-vilab.github.io/VACE-Page/.
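The description says VACE folds editing, reference, and masking inputs into a single Video Condition Unit (VCU). Purely as a hedged illustration (the field names and array shapes below are assumptions, not the paper's actual interface), one might bundle such inputs as follows:
```python
# Hypothetical sketch of a VCU-like input bundle (field names/shapes are
# assumptions, not VACE's actual API): one container carrying the text prompt,
# source frames to edit, optional reference images, and per-frame masks.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class VideoConditionUnit:
    prompt: str                 # text instruction for the task
    frames: np.ndarray          # (T, H, W, 3) source video; zeros for pure generation
    masks: np.ndarray           # (T, H, W); 1 = region the model may rewrite
    references: List[np.ndarray] = field(default_factory=list)  # optional (H, W, 3) reference images

def make_masked_edit_vcu(video: np.ndarray, mask: np.ndarray, prompt: str) -> VideoConditionUnit:
    """Build a VCU-style bundle for masked video-to-video editing."""
    return VideoConditionUnit(prompt=prompt, frames=video, masks=mask)

# Example: a 16-frame 64x64 clip where only the lower half may be edited.
video = np.zeros((16, 64, 64, 3), dtype=np.float32)
mask = np.zeros((16, 64, 64), dtype=np.float32)
mask[:, 32:, :] = 1.0
vcu = make_masked_edit_vcu(video, mask, "replace the ground with snow")
print(vcu.frames.shape, vcu.masks.shape)
```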
PDF Download Link:
https://arxiv.org/pdf/2503.07598v1.pdf
GitHub:
• https://github.com/wan-video/wan2.1
• https://github.com/ali-vilab/vace
Datasets:
• No datasets information available
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT