ML Research Hub
32.5K subscribers
6K photos
385 videos
24 files
6.49K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Alignment Makes Language Models Normative, Not Descriptive

📝 Summary:
Aligned language models excel at predicting normative, rule-following behavior, but struggle to describe how humans actually behave in strategic interactions. Base models predict real human choices in these games better, revealing a trade-off in model optimization.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17218
• PDF: https://arxiv.org/pdf/2603.17218

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #AIAlignment #NormativeAI #GameTheory #AIBehavior
ACE-LoRA: Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models

📝 Summary:
ACE-LoRA adapts medical VLMs parameter-efficiently, improving zero-shot generalization. It combines LoRA with attention-based context enhancement to capture fine-grained diagnostic cues, outperforming state-of-the-art models across diverse medical tasks.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17079
• PDF: https://arxiv.org/pdf/2603.17079
• Github: https://github.com/icon-lab/ACE-LoRA

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MedicalAI #VisionLanguageModels #LoRA #DeepLearning #EfficientAI
FINER: MLLMs Hallucinate under Fine-grained Negative Queries

📝 Summary:
Multimodal language models hallucinate under fine-grained negative queries, a gap in existing benchmarks. This paper introduces FINER benchmarks and FINER-Tuning, a DPO method, to address this. It significantly reduces hallucinations and boosts general MLLM capabilities.
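FINER-Tuning is described as a DPO method. As a reminder of the mechanism it builds on, here is a minimal sketch of the standard DPO objective on a chosen/rejected pair — the paper's exact recipe (preference pairs from fine-grained negative queries) is not reproduced here, and the log-probability values are illustrative:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss on response log-probabilities.

    pi_* are log-probs under the policy being tuned; ref_* under the
    frozen reference model. Loss is small when the policy prefers the
    chosen response more strongly than the reference does.
    """
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Policy favors the chosen answer more than the reference does -> low loss
low = dpo_loss(pi_chosen=-2.0, pi_rejected=-8.0, ref_chosen=-4.0, ref_rejected=-4.0)
# Policy favors the rejected answer -> high loss
high = dpo_loss(pi_chosen=-8.0, pi_rejected=-2.0, ref_chosen=-4.0, ref_rejected=-4.0)
```

Applied to hallucinations, the "rejected" side would be a response that wrongly affirms a fine-grained negative query.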

🔹 Publication Date: Published on Mar 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17662
• PDF: https://arxiv.org/pdf/2603.17662
• Project Page: https://explainableml.github.io/finer-project/
• Github: https://github.com/ExplainableML/finer

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MLLMs #AIHallucinations #Benchmarking #DeepLearning #AIResearch
HeBA: Heterogeneous Bottleneck Adapters for Robust Vision-Language Models

📝 Summary:
HeBA introduces a heterogeneous bottleneck adapter framework for Vision-Language Models. It uses modality-specific processing like convolutions for images and linear projections for text, combined with a compression bottleneck and active gradient initialization. This design improves few-shot learning.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16653
• PDF: https://arxiv.org/pdf/2603.16653
• Github: https://github.com/Jahid12012021/VLM-HeBA

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModels #DeepLearning #AIResearch #ModelAdapters #FewShotLearning
Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass

📝 Summary:
CHROMM is a unified framework that jointly reconstructs cameras, scene point clouds, and human meshes from multi-person multi-view videos. It integrates strong priors, handles scale discrepancies, and uses multi-view fusion for faster, more robust human-scene reconstruction.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12789
• PDF: https://arxiv.org/pdf/2603.12789
• Project Page: https://nstar1125.github.io/chromm

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#3DReconstruction #ComputerVision #HumanSceneReconstruction #MultiViewVideo #AIResearch
Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA

📝 Summary:
Fanar-Sadiq is a bilingual multi-agent Islamic assistant addressing LLM inaccuracies in religious QA. It uses a tool-using architecture with specialized modules for diverse queries like scripture, fiqh, and calculations, ensuring grounded, accurate, and deterministic answers.
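The core idea — routing each query type to a specialized module — can be sketched as a simple dispatcher. The module names and keyword rules below are illustrative assumptions, not the paper's actual components:

```python
def route_query(query: str) -> str:
    """Toy dispatcher in the spirit of a tool-using multi-agent QA system:
    each query type is sent to a specialized module. In practice routing
    would be done by an LLM planner, not keyword matching."""
    rules = [
        (("verse", "surah", "hadith"), "scripture_retrieval"),   # grounded text lookup
        (("permissible", "ruling", "fiqh"), "fiqh_module"),      # jurisprudence QA
        (("zakat", "inheritance", "calculate"), "calculator"),   # deterministic math
    ]
    q = query.lower()
    for keywords, module in rules:
        if any(k in q for k in keywords):
            return module
    return "general_llm"  # fallback for everything else
```

Deterministic modules (like the calculator) are what make answers reproducible where a bare LLM would approximate.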

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08501
• PDF: https://arxiv.org/pdf/2603.08501

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides

📝 Summary:
PPTAgent, a two-stage approach, improves presentation generation by analyzing reference presentations and ensuring structural and content consistency, outperforming traditional methods across content,...

🔹 Publication Date: Published on Jan 7, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.03936
• PDF: https://arxiv.org/pdf/2501.03936
• Collection: https://huggingface.co/collections/ICIP/pptagent
• Github: https://github.com/icip-cas/PPTAgent

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Forceless/Zenodo10K

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing

📝 Summary:
Expert Threshold (ET) routing dynamically allocates computation in MoE models. Tokens are routed to experts whose individual scores exceed per-expert EMA thresholds, achieving load balance without auxiliary losses. ET lowers cross-entropy loss by 0.067 compared to token-choice MoE.
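A minimal sketch of the routing rule: each token goes to every expert whose affinity score beats that expert's running threshold, and thresholds are nudged by an EMA so load stays balanced. The exact threshold-update formula here is an assumption, not the paper's:

```python
def et_route(scores, thresholds):
    """Threshold routing: token i is sent to every expert e whose score
    exceeds that expert's threshold, instead of a fixed top-k.
    Returns a list of expert indices per token."""
    return [
        [e for e, s in enumerate(tok) if s > thresholds[e]]
        for tok in scores
    ]

def update_thresholds(thresholds, batch_scores, decay=0.99):
    """Illustrative EMA update toward each expert's mean batch score,
    so no auxiliary load-balancing loss is needed."""
    n = len(batch_scores)
    means = [sum(tok[e] for tok in batch_scores) / n for e in range(len(thresholds))]
    return [decay * t + (1 - decay) * m for t, m in zip(thresholds, means)]

scores = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.6]]  # 2 tokens x 3 experts
assignments = et_route(scores, thresholds=[0.5, 0.5, 0.5])
```

Note that, unlike top-k routing, the number of experts per token varies: the first token above activates one expert, the second activates two.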

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11535
• PDF: https://arxiv.org/pdf/2603.11535
• Github: https://github.com/MasterGodzilla/Expert-Threshold-Routing

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning

📝 Summary:
V-JEPA 2.1 is a self-supervised model learning dense visual representations for images and videos. It combines dense predictive loss, deep self-supervision, multi-modal tokenizers, and scaling to achieve state-of-the-art performance across various benchmarks, significantly advancing visual understanding.

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14482
• PDF: https://arxiv.org/pdf/2603.14482
• Project Page: https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks/
• Github: https://github.com/facebookresearch/vjepa2

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SelfSupervisedLearning #ComputerVision #DeepLearning #AI #VideoUnderstanding
From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning

📝 Summary:
DICE-RL refines pretrained generative robot policies via reinforcement learning distribution contraction. It boosts high-success behaviors, leading to stable, sample-efficient mastery of complex manipulation from pixels on real robots.

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10263
• PDF: https://arxiv.org/pdf/2603.10263
• Project Page: https://zhanyisun.github.io/dice.rl.2026/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers

📝 Summary:
AdapterTune introduces zero-initialized low-rank adapters for Vision Transformers, addressing optimization instability and capacity issues. This method prevents representation drift and significantly improves accuracy, often outperforming full fine-tuning with fewer parameters.
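The zero-initialization trick is the standard LoRA one: with the up-projection B set to zero, the adapted layer is exactly the frozen layer at step 0, so no representation drift can occur before training starts. A minimal sketch (AdapterTune's exact initialization scheme may differ):

```python
import random

def make_lora(d_in, d_out, rank, seed=0):
    """Low-rank adapter for a frozen weight W: output is W @ x + B @ (A @ x).
    B starts at zero, so the adapter contributes nothing initially;
    A gets a small random init so gradients can flow."""
    rng = random.Random(seed)
    A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]  # rank x d_in
    B = [[0.0] * rank for _ in range(d_out)]                              # d_out x rank, zeros
    return A, B

def lora_delta(A, B, x):
    """Adapter output B @ (A @ x), added to the frozen layer's output."""
    ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    return [sum(b_ij * ax_j for b_ij, ax_j in zip(row, ax)) for row in B]

A, B = make_lora(d_in=4, d_out=3, rank=2)
delta = lora_delta(A, B, x=[1.0, 2.0, 3.0, 4.0])  # all zeros before training
```

Because only A and B are trained, the update stays rank-limited while the ViT backbone remains frozen.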

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14706
• PDF: https://arxiv.org/pdf/2603.14706
• Github: https://github.com/salimkhazem/adaptertune

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
OSM-based Domain Adaptation for Remote Sensing VLMs

📝 Summary:
A self-contained domain adaptation framework for vision-language models in remote sensing uses OpenStreetMap data and optical character recognition to generate captions without requiring external teacher models.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11804
• PDF: https://arxiv.org/pdf/2603.11804

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

📝 Summary:
A video diffusion model is repurposed as a latent world simulator to enhance multimodal large language models with implicit 3D structural priors and physical laws through spatiotemporal feature extraction.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19235
• PDF: https://arxiv.org/pdf/2603.19235
• Github: https://github.com/H-EmbodVis/VEGA-3D

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

📝 Summary:
SAMA presents a factorized approach to video editing that separates semantic anchoring from motion modeling, enabling instruction-guided edits with preserved motion through pre-trained motion restoration.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19228
• PDF: https://arxiv.org/pdf/2603.19228
• Project Page: https://cynthiazxy123.github.io/SAMA/
• Github: https://github.com/Cynthiazxy123/SAMA

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

📝 Summary:
CubiD is a discrete generation model for high-dimensional representations that enables fine-grained masking and learns rich correlations across spatial positions while maintaining fixed generation steps.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19232
• PDF: https://arxiv.org/pdf/2603.19232
• Github: https://github.com/YuqingWang1029/CubiD

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Memento-Skills: Let Agents Design Agents

📝 Summary:
A generalist language model agent system autonomously designs and improves task-specific agents through memory-based reinforcement learning with stateful prompts and skill libraries.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18743
• PDF: https://arxiv.org/pdf/2603.18743
• Project Page: https://memento.run/
• Github: https://github.com/Memento-Teams/Memento-Skills

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

📝 Summary:
F2LLM-v2 is a multilingual embedding model family trained on 60 million samples across 200+ languages, achieving superior performance through LLM-based training, matryoshka learning, pruning, and distillation.
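Matryoshka learning trains one embedding so that nested prefixes of it are usable on their own, letting deployments trade dimensions for speed. A minimal sketch of the idea (dimensions and usage illustrative, not F2LLM-v2's actual configuration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def matryoshka_scores(emb_a, emb_b, dims=(64, 128, 256)):
    """Score the same embedding pair at several nested prefix lengths.
    Training sums a contrastive loss over each truncation, which is what
    keeps the short prefixes meaningful by themselves."""
    return {d: cosine(emb_a[:d], emb_b[:d]) for d in dims if d <= len(emb_a)}
```

At inference you pick one prefix length per index; no re-encoding is needed to switch between them.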

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19223
• PDF: https://arxiv.org/pdf/2603.19223
• Project Page: https://huggingface.co/collections/codefuse-ai/f2llm

🔹 Models citing this paper:
https://huggingface.co/codefuse-ai/F2LLM-v2-8B-Preview
https://huggingface.co/codefuse-ai/F2LLM-v2-0.6B-Preview
https://huggingface.co/codefuse-ai/F2LLM-v2-1.7B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/codefuse-ai/F2LLM-v2

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs

📝 Summary:
This long-form audio-visual comprehension benchmark reveals significant challenges for current omnimodal large language models in handling extended multi-modal inputs.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19217
• PDF: https://arxiv.org/pdf/2603.19217
• Project Page: https://kd-tao.github.io/LVOmniBench/
• Github: https://github.com/KD-TAO/LVOmniBench

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
FASTER: Rethinking Real-Time Flow VLAs

📝 Summary:
Fast Action Sampling for ImmediaTE Reaction (FASTER) reduces real-time reaction latency in Vision-Language-Action models by adapting sampling schedules to prioritize immediate actions while maintaining...

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19199
• PDF: https://arxiv.org/pdf/2603.19199
• Project Page: https://innovator-zero.github.io/FASTER
• Github: https://github.com/innovator-zero/FASTER

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

📝 Summary:
Nemotron-Cascade 2 is a 30B parameter Mixture-of-Experts model with 3B activated parameters that achieves exceptional reasoning and agentic capabilities, matching frontier open models despite its comp...

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19220
• PDF: https://arxiv.org/pdf/2603.19220

🔹 Models citing this paper:
https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning

📝 Summary:
Modulated Hazard-aware Policy Optimization introduces a Log-Fidelity Modulator and Decoupled Hazard Penalty to stabilize reinforcement learning by controlling importance ratios and regulating asymmetr...
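MHPO's stability story centers on controlling importance ratios. For context, here is the standard PPO-style clipped-ratio objective — the baseline mechanism such methods modulate, not the paper's Log-Fidelity Modulator itself:

```python
def clipped_ratio_objective(ratio, advantage, eps=0.2):
    """PPO-style surrogate: the importance ratio pi_new/pi_old is clipped
    to [1-eps, 1+eps], and the pessimistic minimum is taken, so a single
    sample cannot push the policy arbitrarily far."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Hard clipping zeroes gradients outside the trust region; a smoother modulator, as this paper proposes, keeps some signal while still bounding the update.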

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16929
• PDF: https://arxiv.org/pdf/2603.16929

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research