✨MedGemma 1.5 Technical Report
📝 Summary:
MedGemma 1.5 4B enhances medical AI capabilities through expanded multimodal support and improved performance across medical imaging, document understanding, and clinical reasoning tasks.
🔹 Publication Date: Published on Apr 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05081
• PDF: https://arxiv.org/pdf/2604.05081
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
📝 Summary:
ThinkTwice is a two-phase framework that jointly optimizes large language models for reasoning and self-refinement using Group Relative Policy Optimization, demonstrating improved performance on mathe...
🔹 Publication Date: Published on Apr 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01591
• PDF: https://arxiv.org/pdf/2604.01591
• Github: https://github.com/CSSLab/ThinkTwice
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
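For readers unfamiliar with Group Relative Policy Optimization (GRPO), the core trick is to score each sampled completion relative to the other completions for the same prompt, replacing a learned critic; a minimal sketch (plain Python, names illustrative, not the paper's code):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage: normalize each sampled answer's reward by
    the mean and std of its own sampling group, so no value network is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # degenerate group: all rewards equal
    return [(r - mean) / std for r in rewards]
```

For example, `grpo_advantages([1.0, 0.0, 1.0, 0.0])` yields `[1.0, -1.0, 1.0, -1.0]`: correct answers in the group are pushed up, incorrect ones down.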
✨Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
📝 Summary:
Paper Circle is a multi-agent system that automates the discovery and analysis of scientific literature through integrated retrieval and knowledge graph construction.
🔹 Publication Date: Published on Apr 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06170
• PDF: https://arxiv.org/pdf/2604.06170
• Project Page: https://papercircle.vercel.app/
• Github: https://github.com/MAXNORM8650/papercircle
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
📝 Summary:
Research demonstrates that skill utilization in LLM-based agents degrades significantly under realistic conditions where skills must be retrieved and refined rather than handcrafted, though targeted r...
🔹 Publication Date: Published on Apr 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.04323
• PDF: https://arxiv.org/pdf/2604.04323
• Project Page: https://github.com/UCSB-NLP-Chang/Skill-Usage
• Github: https://github.com/UCSB-NLP-Chang/Skill-Usage
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #AI #LLMAgents #Benchmarking #NLP
✨MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
📝 Summary:
MegaTrain trains large language models with over 100 billion parameters on a single GPU. It stores parameters in host memory and streams them to the GPU, using pipelined execution and stateless layer templates to overcome the host-to-GPU bandwidth bottleneck. This enables 120-billion-parameter training and outperforms other...
🔹 Publication Date: Published on Apr 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05091
• PDF: https://arxiv.org/pdf/2604.05091
• Github: https://github.com/DLYuanGod/MegaTrain
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
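The host-memory streaming described above amounts to double buffering: compute on one set of layer weights while the next set is copied in. A toy Python sketch of just the scheduling (real code would use pinned memory and CUDA streams; `stream_layers` is a hypothetical name, and the "copy" here is a plain list copy):

```python
def stream_layers(layer_weights, forward):
    """Double-buffered layer streaming: while one buffer holds the weights
    being computed on, the other receives the next layer's weights, so
    transfer can overlap compute (shown sequentially here for clarity)."""
    buffers = [None, None]
    buffers[0] = list(layer_weights[0])  # prefetch the first layer
    outputs = []
    for i in range(len(layer_weights)):
        if i + 1 < len(layer_weights):
            buffers[(i + 1) % 2] = list(layer_weights[i + 1])  # prefetch next layer
        outputs.append(forward(i, buffers[i % 2]))  # compute current layer
    return outputs
```

Only two layer-sized buffers ever live on the device, which is how model size can exceed GPU memory.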
✨GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
📝 Summary:
Large language models struggle with autonomous bug discovery in complex runtime environments, as demonstrated by a new game development benchmark that reveals limited effectiveness of current approach...
🔹 Publication Date: Published on Apr 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02648
• PDF: https://arxiv.org/pdf/2604.02648
• Github: https://github.com/camel-ai/GBQA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
🚀 Sber has released two open-source MoE models: GigaChat-3.1 Ultra and Lightning
Both code and weights are available under the MIT license on HuggingFace.
👉 Key details:
• Trained from scratch (not a finetune) on proprietary data and infrastructure
• Mixture-of-Experts (MoE) architecture
Models:
🧠 GigaChat-3.1 Ultra
• 702B MoE model for high-performance environments
• Outperforms DeepSeek-V3-0324 and Qwen3-235B on math and reasoning benchmarks
• Supports FP8 training and MTP (multi-token prediction)
⚡️ GigaChat-3.1 Lightning
• 10B model (1.8B active parameters)
• Outperforms Qwen3-4B and Gemma-3-4B on Sber benchmarks
• Efficient local inference
• Up to 256k context
Engineering highlights:
• Custom metric to detect and reduce generation loops
• DPO training moved to native FP8
• Improvements in post-training pipeline
• Identified and fixed a critical issue affecting evaluation quality
🌍 Trained on 14 languages (optimized for English and Russian)
Use cases:
• chatbots
• AI assistants
• copilots
• internal ML systems
Sber provides a solid open foundation for developers to build production-ready AI systems with lower infrastructure costs.
✨QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
📝 Summary:
PRepair tackles over-editing in AI program repair by maximizing correct code reuse. It combines controlled bug injection with policy optimization driven by an edit-aware reward. This framework significantly improves repair precision and decoding throughput.
🔹 Publication Date: Published on Apr 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05963
• PDF: https://arxiv.org/pdf/2604.05963
• Github: https://github.com/kcxain/QiMeng-PRepair
🔹 Models citing this paper:
• https://huggingface.co/kcxain/Prepair-Python-7B-EA
• https://huggingface.co/kcxain/Prepair-Verilog-7B-EA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ProgramRepair #AI #MachineLearning #ReinforcementLearning #SoftwareEngineering
✨Watch Before You Answer: Learning from Visually Grounded Post-Training
📝 Summary:
VLMs struggle with video understanding due to text biases in benchmarks and training data. VidGround uses only visually grounded questions for post-training to eliminate these biases. This improves VLM performance and emphasizes the need for high-quality, visually grounded data.
🔹 Publication Date: Published on Apr 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05117
• PDF: https://arxiv.org/pdf/2604.05117
• Project Page: https://vidground.etuagi.com
• Github: https://github.com/reacher-z/vidground
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLMs #VideoUnderstanding #AI #MachineLearning #ComputerVision
✨DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
📝 Summary:
Diffusion large language models are gaining attention as alternatives to autoregressive models, utilizing iterative denoising and parallel generation instead of sequential token processing, yet their ...
🔹 Publication Date: Published on Apr 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.04215
• PDF: https://arxiv.org/pdf/2604.04215
• Github: https://github.com/yjyddq/DARE
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Context-Value-Action Architecture for Value-Driven Large Language Model Agents
📝 Summary:
LLMs show rigid, polarized behavior that worsens with reasoning. The Context-Value-Action (CVA) architecture decouples actions from reasoning using a Value Verifier trained on human data, mitigating polarization and improving behavioral fidelity.
🔹 Publication Date: Published on Apr 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05939
• PDF: https://arxiv.org/pdf/2604.05939
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling?
📝 Summary:
TABLeT uses a 2D natural image autoencoder to tokenize fMRI volumes into compact continuous tokens, enabling efficient long-sequence spatiotemporal modeling with a simple Transformer encoder while mai...
🔹 Publication Date: Published on Apr 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.03619
• PDF: https://arxiv.org/pdf/2604.03619
• Project Page: https://concarne2.github.io/tablet_project_page/
• Github: https://github.com/beotborry/TABLeT
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents
📝 Summary:
A task-conditioned tool-output pruning model effectively reduces input tokens for coding agents. It achieves 0.86 recall and 0.80 F1, removing 92% of tokens, outperforming larger zero-shot models and heuristic baselines.
🔹 Publication Date: Published on Apr 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.04979
• PDF: https://arxiv.org/pdf/2604.04979
• Github: https://github.com/KRLabsOrg/squeez
🔹 Models citing this paper:
• https://huggingface.co/KRLabsOrg/squeez-2b
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#CodingAgents #LLM #TokenPruning #AI #MachineLearning
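As a reference point, numbers like the 0.86 recall and 92% removal above can be computed for a pruner that keeps a subset of a tool output. A generic sketch (the paper measures tokens, this toy uses line indices, and the helper name is illustrative):

```python
def prune_metrics(kept_lines, gold_lines, total_lines):
    """Precision/recall/F1 of a pruner over the task-relevant (gold) lines
    of a tool output, plus the fraction of the output it removed."""
    kept, gold = set(kept_lines), set(gold_lines)
    tp = len(kept & gold)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    removed = 1 - len(kept) / total_lines
    return precision, recall, f1, removed
```

High recall with a high removed fraction is the target: keep nearly all relevant lines while discarding most of the output.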
✨General Multimodal Protein Design Enables DNA-Encoding of Chemistry
📝 Summary:
DISCO is a multimodal deep generative model that co-designs protein sequences and 3D structures to create novel heme enzymes with unprecedented catalytic capabilities.
🔹 Publication Date: Published on Apr 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05181
• PDF: https://arxiv.org/pdf/2604.05181
• Project Page: https://disco-design.github.io/
• Github: https://github.com/DISCO-design/DISCO
🔹 Models citing this paper:
• https://huggingface.co/DISCO-Design/DISCO
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
📝 Summary:
Expert-choice routing improves diffusion language model mixture-of-experts by providing deterministic load balancing and adaptive computation allocation based on denoising steps.
🔹 Publication Date: Published on Apr 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01622
• PDF: https://arxiv.org/pdf/2604.01622
• Github: https://github.com/zhangshuibai/EC-DLM
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
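Expert-choice routing inverts the usual token-choice rule: instead of each token picking its top experts, each expert picks its top-capacity tokens, so perfect load balance holds by construction. A minimal sketch (plain Python, illustrative names):

```python
def expert_choice_route(scores, capacity):
    """Expert-choice routing: scores[t][e] is the router score of token t
    for expert e. Each expert selects its top-`capacity` tokens, so every
    expert processes exactly `capacity` tokens -- no auxiliary balance loss."""
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = []
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignment.append(ranked[:capacity])  # assignment[e] = tokens routed to e
    return assignment
```

A side effect, which the paper exploits, is that tokens receive a variable amount of compute: a token chosen by many experts gets more, one chosen by none is skipped.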
✨ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
📝 Summary:
ClawsBench evaluates LLM productivity agents in realistic workflows with mock services, assessing capability and safety. It shows agents achieve 39-64% task success while taking unsafe actions in 7-33% of cases, and identifies recurring failure patterns.
🔹 Publication Date: Published on Apr 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05172
• PDF: https://arxiv.org/pdf/2604.05172
• Project Page: https://clawsbench.com/
• Github: https://github.com/benchflow-ai/ClawsBench
✨ Datasets citing this paper:
• https://huggingface.co/datasets/benchflow/ClawsBench
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #AIAgents #AISafety #Benchmarking #AIResearch
✨CUE-R: Beyond the Final Answer in Retrieval-Augmented Generation
📝 Summary:
Researchers developed a framework to measure the operational utility of individual retrieved items in retrieval-augmented generation systems by perturbing evidence and analyzing changes in correctness...
🔹 Publication Date: Published on Apr 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05467
• PDF: https://arxiv.org/pdf/2604.05467
• Github: https://github.com/jainsid24/cue-r
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
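The perturb-and-measure idea in the summary can be sketched as a leave-one-out ablation over retrieved items: drop each item, regenerate, and check whether correctness flips (a schematic of the general approach, not the paper's exact utility metric):

```python
def evidence_utility(items, answer_fn, is_correct):
    """Perturbation-based utility: remove each retrieved item in turn and
    record whether the answer's correctness changes. Items whose removal
    breaks a correct answer carry positive operational utility."""
    base = is_correct(answer_fn(items))
    utilities = {}
    for i, item in enumerate(items):
        ablated = items[:i] + items[i + 1:]
        utilities[item] = int(base) - int(is_correct(answer_fn(ablated)))
    return utilities
```

Items scoring 0 were retrieved but never load-bearing, which is exactly the "beyond the final answer" signal that top-level accuracy hides.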
✨REAM: Merging Improves Pruning of Experts in LLMs
📝 Summary:
Router-weighted Expert Activation Merging (REAM) is proposed as a novel method for reducing memory requirements in Mixture-of-Experts large language models by grouping and merging expert weights inste...
🔹 Publication Date: Published on Apr 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.04356
• PDF: https://arxiv.org/pdf/2604.04356
• Project Page: https://bknyaz.github.io/blog/2026/moe/
• Github: https://github.com/SamsungSAILMontreal/ream
🔹 Models citing this paper:
• https://huggingface.co/bknyaz/Qwen3-Coder-Next-REAM
• https://huggingface.co/SamsungSAILMontreal/Qwen3-30B-A3B-Instruct-2507-REAM
• https://huggingface.co/bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
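The grouping-and-merging idea can be sketched as a router-mass-weighted average of expert weights: experts that the router used more contribute more to the merged expert (a schematic of the general approach, not the exact REAM rule):

```python
def merge_experts(expert_weights, router_mass, groups):
    """Merge each group of experts into one expert by averaging their weight
    vectors, weighted by the router probability mass each expert received,
    so heavily-used experts dominate the merge."""
    merged = []
    for group in groups:
        total = sum(router_mass[e] for e in group)
        dim = len(expert_weights[group[0]])
        merged.append([
            sum(router_mass[e] * expert_weights[e][d] for e in group) / total
            for d in range(dim)
        ])
    return merged
```

Unlike pruning, no expert's weights are discarded outright, which is why merging can retain more capability at the same memory budget.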
✨Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
📝 Summary:
Personalized RewardBench evaluates reward models' ability to capture individual user preferences, revealing significant challenges in current models and demonstrating superior correlation with downstr...
🔹 Publication Date: Published on Apr 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07343
• PDF: https://arxiv.org/pdf/2604.07343
• Project Page: https://huggingface.co/datasets/QiyaoMa/Personalized-RewardBench
• Github: https://github.com/Martin-qyma/Personalized-RewardBench
✨ Datasets citing this paper:
• https://huggingface.co/datasets/QiyaoMa/Personalized-RewardBench
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MARS: Enabling Autoregressive Models Multi-Token Generation
📝 Summary:
MARS is a fine-tuning method that enables autoregressive language models to predict multiple tokens per forward pass without architectural changes, maintaining accuracy while improving throughput and ...
🔹 Publication Date: Published on Apr 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07023
• PDF: https://arxiv.org/pdf/2604.07023
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research