✨Residual Stream Duality in Modern Transformer Architectures
📝 Summary:
The residual stream in Transformers can be viewed through a two-axis framework where sequence position and layer depth provide different pathways for information flow, with causal depth-wise residual ...
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16039
• PDF: https://arxiv.org/pdf/2603.16039
• Project Page: https://github.com/yifanzhang-pro/residual-stream-duality
• Github: https://github.com/yifanzhang-pro/residual-stream-duality
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning
📝 Summary:
A hierarchical reinforcement learning framework named ARISE employs a skill management system to improve mathematical reasoning in language models through reusable strategies and structured skill libr...
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16060
• PDF: https://arxiv.org/pdf/2603.16060
• Github: https://github.com/Skylanding/ARISE
==================================
✨MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models
📝 Summary:
MDM-Prime-v2 enhances masked diffusion language models with Binary Encoding and Index Shuffling. It is 21.8 times more compute-efficient than autoregressive models, achieving significantly better perplexity and zero-shot accuracy.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16077
• PDF: https://arxiv.org/pdf/2603.16077
• Project Page: https://chen-hao-chao.github.io/mdm-prime-v2/
• Github: https://github.com/chen-hao-chao/mdm-prime-v2
🔹 Models citing this paper:
• https://huggingface.co/chen-hao-chao/mdm-prime-v2-c4
• https://huggingface.co/chen-hao-chao/mdm-prime-v2-slimpajama
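A minimal sketch of what "binary encoding plus index shuffling" could look like. It assumes each token id is first passed through a fixed random permutation of the vocabulary and then written as ⌈log2 V⌉ binary sub-tokens, so a masked diffusion model could in principle mask individual bits rather than whole tokens. The function names and the permutation scheme are illustrative assumptions, not code from the MDM-Prime-v2 repository.

```python
import math
import random

def make_shuffle(vocab_size, seed=0):
    """Hypothetical index shuffling: a fixed random permutation of token ids and its inverse."""
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)  # deterministic for a given seed
    inverse = [0] * vocab_size
    for token_id, shuffled_id in enumerate(perm):
        inverse[shuffled_id] = token_id
    return perm, inverse

def num_bits(vocab_size):
    """Number of bits needed so every token id gets a distinct binary code."""
    return max(1, math.ceil(math.log2(vocab_size)))

def encode(token_id, perm, n_bits):
    """Token id -> list of n_bits binary sub-tokens, most significant bit first."""
    idx = perm[token_id]
    return [(idx >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]

def decode(bits, inverse):
    """Binary sub-tokens -> original token id (undoes the shuffle)."""
    idx = 0
    for bit in bits:
        idx = (idx << 1) | bit
    return inverse[idx]
```

For example, a 50,257-token vocabulary needs 16 binary sub-tokens per token under this scheme.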
==================================
✨Mamba: Linear-Time Sequence Modeling with Selective State Spaces
📝 Summary:
Mamba is a novel SSM whose selective state spaces (input-dependent SSM parameters) enable content-based reasoning. It offers 5x higher inference throughput than Transformers and scales linearly with sequence length, achieving state-of-the-art results across language, audio, and genomics; on language modeling, Mamba-3B outperforms Transformers of the same size and matches Transformers twice its size.
🔹 Publication Date: Published on Dec 1, 2023
🔹 Paper Links:
• arXiv Page: https://arxivexplained.com/papers/mamba-linear-time-sequence-modeling-with-selective-state-spaces
• PDF: https://arxiv.org/pdf/2312.00752
• Github: https://github.com/state-spaces/mamba
🔹 Models citing this paper:
• https://huggingface.co/tiiuae/falcon-mamba-7b
• https://huggingface.co/state-spaces/mamba-2.8b-slimpj
• https://huggingface.co/tiiuae/falcon-mamba-7b-instruct
✨ Datasets citing this paper:
• https://huggingface.co/datasets/huaXiaKyrie/up
• https://huggingface.co/datasets/Sherirto/BD4UI
✨ Spaces citing this paper:
• https://huggingface.co/spaces/FallnAI/Quantize-HF-Models
• https://huggingface.co/spaces/openfree/LLM_Quantization
• https://huggingface.co/spaces/seawolf2357/LLM_Quantization
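The idea of "selective state spaces" can be illustrated with a purely sequential scan. This sketch assumes a diagonal A, a single input channel, and scalar projections (w_delta, w_b and w_c are hypothetical names) that make the step size, B and C depend on the current input. The official state-spaces/mamba code uses a hardware-aware parallel scan instead, so treat this only as a reading aid.

```python
import math

def softplus(z):
    # log(1 + e^z): keeps the step size positive
    return math.log1p(math.exp(z))

def selective_scan(xs, a_diag, w_delta, w_b, w_c):
    """Recurrent selective SSM scan over one scalar input channel.

    xs:      input sequence x_1..x_T (floats)
    a_diag:  diagonal of the state matrix A (negative floats, length N)
    w_delta: hypothetical scalar weight producing the step size from x_t
    w_b/w_c: hypothetical weights (length N) producing B_t and C_t from x_t
    """
    h = [0.0] * len(a_diag)  # hidden state of fixed size N
    ys = []
    for x in xs:  # a single pass: cost grows linearly with T
        delta = softplus(w_delta * x)  # input-dependent step size
        b = [wb * x for wb in w_b]     # input-dependent B_t
        c = [wc * x for wc in w_c]     # input-dependent C_t
        # Discretize: A_bar = exp(delta * A) (zero-order hold), B_bar ~ delta * B
        h = [math.exp(delta * a) * h_i + delta * b_i * x
             for a, h_i, b_i in zip(a_diag, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))  # y_t = C_t h_t
    return ys
```

Because each step only updates a fixed-size state, per-token inference cost stays constant, which is where the linear scaling in sequence length comes from.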
==================================
Arxivexplained
Mamba: Linear-Time Sequence Modeling with Selective State Spaces - Explained Simply
By Albert Gu, Tri Dao.
Mamba: The AI Architecture That Could Replace Transformers
The Problem: Today's most powerful...
✨Unified Spatio-Temporal Token Scoring for Efficient Video VLMs
📝 Summary:
STTS is a lightweight module for efficiently pruning vision tokens across both the vision transformer and the language model of video VLMs. It achieves 62% efficiency gains with only a 0.7% performance drop by learning spatio-temporal token scoring without text conditioning.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18004
• PDF: https://arxiv.org/pdf/2603.18004
• Github: https://github.com/allenai/STTS
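The pruning step itself can be sketched as "score every vision token, keep the top fraction across all frames jointly". This assumes per-token scores are already available; the scorer, the function names and the keep ratio are hypothetical, and this is not the module from allenai/STTS.

```python
def prune_tokens(scores, keep_ratio):
    """Indices of the highest-scoring tokens, returned in original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # restore spatio-temporal order

def prune_video_tokens(score_grid, keep_ratio):
    """score_grid[t][p] scores patch p of frame t; returns kept (frame, patch) pairs.

    Ranking is joint over all frames, so frames whose tokens score low
    can lose more tokens than others.
    """
    n_patches = len(score_grid[0])
    flat = [s for frame in score_grid for s in frame]
    return [divmod(i, n_patches) for i in prune_tokens(flat, keep_ratio)]
```

Keeping the surviving tokens in their original order lets the downstream layers see the same positional layout, just with fewer entries.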
==================================
✨MosaicMem: Hybrid Spatial Memory for Controllable Video World Models
📝 Summary:
Video diffusion models use hybrid spatial memory to maintain consistency under camera motion and enable long-term scene editing and navigation.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17117
• PDF: https://arxiv.org/pdf/2603.17117
• Project Page: https://mosaicmem.github.io/mosaicmem/
• Github: https://mosaicmem.github.io/mosaicmem/
==================================
✨Stereo World Model: Camera-Guided Stereo Video Generation
📝 Summary:
StereoWorld is a camera-conditioned stereo world model that generates stereo videos end-to-end using RGB modality while maintaining geometric consistency and efficiency through novel attention mechani...
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17375
• PDF: https://arxiv.org/pdf/2603.17375
• Project Page: https://sunyangtian.github.io/StereoWorld-web/
• Github: https://github.com/SunYangtian/StereoWorld
==================================
✨When AI Navigates the Fog of War
📝 Summary:
Large language models demonstrate varying capabilities in reasoning about unfolding geopolitical conflicts, showing strategic realism in structured settings but inconsistent performance in complex pol...
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16642
• PDF: https://arxiv.org/pdf/2603.16642
• Project Page: https://www.war-forecast-arena.com/
• Github: https://github.com/xirui-li/war-test
✨ Datasets citing this paper:
• https://huggingface.co/datasets/AIcell/war-test-dataset
==================================
✨AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents
📝 Summary:
AdaMem is an adaptive memory framework for dialogue agents that organizes conversation history into multiple memory types and uses conditional retrieval to improve long-horizon reasoning and user mode...
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16496
• PDF: https://arxiv.org/pdf/2603.16496
==================================
✨LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition
📝 Summary:
LaDe is a latent diffusion framework that generates layered media designs with flexible layer counts and semantic meaning from natural language prompts, supporting text-to-image, text-to-layers, and m...
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17965
• PDF: https://arxiv.org/pdf/2603.17965
==================================
✨Complementary Reinforcement Learning
📝 Summary:
Complementary RL enables efficient agent learning by synchronizing experience extraction with policy optimization through dual objectives that evolve together during training.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17621
• PDF: https://arxiv.org/pdf/2603.17621
• Github: https://github.com/pUmpKin-Co/ComplementaryRL
==================================
✨MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
📝 Summary:
MetaClaw is a continual meta-learning framework for LLM agents that evolves policies and reusable skills. It enables zero-downtime skill adaptation and opportunistic policy optimization during inactive periods. This boosts agent accuracy and robustness, scaling to production LLMs.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17187
• PDF: https://arxiv.org/pdf/2603.17187
• Github: https://github.com/aiming-lab/MetaClaw
==================================
✨Efficient Exploration at Scale
📝 Summary:
An online learning algorithm for reinforcement learning from human feedback that achieves significant data efficiency improvements through incremental model updates, reward uncertainty modeling, and i...
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17378
• PDF: https://arxiv.org/pdf/2603.17378
==================================
✨LoST: Level of Semantics Tokenization for 3D Shapes
📝 Summary:
Level-of-Semantics Tokenization (LoST) improves 3D shape generation by ordering tokens based on semantic salience and using a novel relational alignment loss for better reconstruction and efficiency. ...
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17995
• PDF: https://arxiv.org/pdf/2603.17995
• Project Page: https://lost3d.github.io/
==================================
✨RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference
📝 Summary:
A reinforcement learning-based mixed-precision quantization method achieves superior compression efficiency and model performance for large language models through adaptive bit width assignment and nove...
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17891
• PDF: https://arxiv.org/pdf/2603.17891
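To make "adaptive bit width assignment" concrete, here is a minimal mixed-precision sketch assuming symmetric per-tensor uniform quantization and a fixed per-layer bit assignment. In RAMP an RL policy chooses the bit widths; that agent is not reproduced here, and all names are hypothetical.

```python
def quantize(weights, bits):
    """Symmetric per-tensor fake quantization to integer levels in [-qmax, qmax]."""
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0] * len(weights)
    scale = max_abs / qmax
    # Round to the nearest level, clamp, then map back to floats
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def evaluate_assignment(layers, bit_widths):
    """Return (average bits per weight, per-layer MSE) for one bit-width assignment.

    Such a pair could feed a reward in an RL setup; the policy that
    proposes bit_widths is not shown here.
    """
    n_total = sum(len(w) for w in layers)
    avg_bits = sum(b * len(w) for b, w in zip(bit_widths, layers)) / n_total
    errors = [mse(w, quantize(w, b)) for w, b in zip(layers, bit_widths)]
    return avg_bits, errors
```

The trade-off a bit-width policy has to navigate is visible directly: giving a large layer fewer bits lowers the average bits per weight but raises that layer's quantization error.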
==================================
✨PRISM: Demystifying Retention and Interaction in Mid-Training
📝 Summary:
Mid-training design choices significantly improve reasoning performance in large language models, with optimal results achieved when reinforcement learning is applied to models that have been pre-trai...
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17074
• PDF: https://arxiv.org/pdf/2603.17074
==================================
✨ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
📝 Summary:
ESPIRE is a diagnostic benchmark for embodied spatial reasoning that evaluates vision-language models on robotic tasks through a decomposed localization and execution framework, enabling fine-grained ...
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13033
• PDF: https://arxiv.org/pdf/2603.13033
• Project Page: https://spatigen.github.io/espire.io/
• Github: https://github.com/spatigen/espire
==================================
✨Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models
📝 Summary:
Video-supervised fine-tuning in multimodal large language models consistently enhances video performance while often degrading static image benchmarks, with frame sampling frequency determining the ex...
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17541
• PDF: https://arxiv.org/pdf/2603.17541
==================================