✨Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
📝 Summary:
VLA models struggle to integrate visual detail for action generation. DeepVision-VLA enhances visual representations via multi-level feature injection and action-guided pruning. This significantly boosts performance on robotic tasks.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15618
• PDF: https://arxiv.org/pdf/2603.15618
• Project Page: https://deepvision-vla.github.io/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLAModels #ComputerVision #Robotics #DeepLearning #FoundationModels
✨GigaWorld-Policy: An Efficient Action-Centered World-Action Model
📝 Summary:
GigaWorld-Policy is an action-centered World-Action Model that significantly improves robotic policy learning. It decouples visual and motion representations, using dual supervision from action prediction and video generation. This allows for 9x faster inference and 7% higher task success rates c...
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17240
• PDF: https://arxiv.org/pdf/2603.17240
• Project Page: https://gigaai-research.github.io/GigaWorld-Policy/
• Github: https://github.com/open-gigaai/giga-world-policy
==================================
#Robotics #MachineLearning #WorldModels #DeepLearning #PolicyLearning
✨Video-CoE: Reinforcing Video Event Prediction via Chain of Events
📝 Summary:
Video-CoE introduces a Chain of Events (CoE) paradigm to improve video event prediction. It addresses MLLM limitations in logical reasoning and visual utilization by constructing temporal event chains and using enhanced training. CoE achieves state-of-the-art performance on VEP benchmarks.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14935
• PDF: https://arxiv.org/pdf/2603.14935
==================================
#VideoEventPrediction #ChainOfEvents #MLLM #ComputerVision #AI
✨Alignment Makes Language Models Normative, Not Descriptive
📝 Summary:
Aligned language models excel at predicting normative, rule-based behavior but struggle to capture how humans actually act in complex strategic interactions. Base models predict real human choices in these settings better, revealing a trade-off in model optimization.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17218
• PDF: https://arxiv.org/pdf/2603.17218
==================================
#LLM #AIAlignment #NormativeAI #GameTheory #AIBehavior
✨ACE-LoRA: Graph-Attentive Context Enhancement for Parameter-Efficient Adaptation of Medical Vision-Language Models
📝 Summary:
ACE-LoRA adapts medical VLMs parameter-efficiently, enhancing zero-shot generalization. It integrates LoRA and attention-based context enhancement to capture fine-grained diagnostic cues, outperforming state-of-the-art models across diverse medical tasks.
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17079
• PDF: https://arxiv.org/pdf/2603.17079
• Github: https://github.com/icon-lab/ACE-LoRA
==================================
#MedicalAI #VisionLanguageModels #LoRA #DeepLearning #EfficientAI
✨FINER: MLLMs Hallucinate under Fine-grained Negative Queries
📝 Summary:
Multimodal language models hallucinate under fine-grained negative queries, a gap in existing benchmarks. This paper introduces FINER benchmarks and FINER-Tuning, a DPO method, to address this. It significantly reduces hallucinations and boosts general MLLM capabilities.
🔹 Publication Date: Published on Mar 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17662
• PDF: https://arxiv.org/pdf/2603.17662
• Project Page: https://explainableml.github.io/finer-project/
• Github: https://github.com/ExplainableML/finer
==================================
#MLLMs #AIHallucinations #Benchmarking #DeepLearning #AIResearch
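Since FINER-Tuning is described as a DPO method, the core preference loss can be sketched below. The log-probabilities and `beta` value are illustrative stand-ins, not numbers from the paper:

```python
import math

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair:
    penalize the policy when the chosen answer gains less log-probability
    over the reference model than the rejected answer does."""
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# chosen (non-hallucinated) answer gains log-prob vs. the reference,
# rejected answer loses some -> positive margin, small loss
loss = dpo_loss(logp_w=-1.0, logp_l=-3.0, ref_w=-1.5, ref_l=-2.5)
print(round(loss, 4))
```

Flipping chosen and rejected makes the margin negative and the loss larger, which is the gradient signal that pushes probability mass away from hallucinated answers.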
✨HeBA: Heterogeneous Bottleneck Adapters for Robust Vision-Language Models
📝 Summary:
HeBA introduces a heterogeneous bottleneck adapter framework for Vision-Language Models. It uses modality-specific processing like convolutions for images and linear projections for text, combined with a compression bottleneck and active gradient initialization. This design improves few-shot lear...
🔹 Publication Date: Published on Mar 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16653
• PDF: https://arxiv.org/pdf/2603.16653
• Project Page: https://huggingface.co/papers?q=dense%20linear%20projections
• Github: https://github.com/Jahid12012021/VLM-HeBA
==================================
#VisionLanguageModels #DeepLearning #AIResearch #ModelAdapters #FewShotLearning
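The compression-bottleneck idea can be sketched with a generic residual bottleneck adapter in NumPy; HeBA's modality-specific convolutions and its active gradient initialization are omitted here, and the dimensions and ReLU activation are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, b = 32, 8                            # feature dim, bottleneck width

def bottleneck_adapter(x, W_down, W_up):
    # compress -> nonlinearity -> expand, added residually to the input
    h = np.maximum(x @ W_down, 0.0)     # ReLU as a stand-in activation
    return x + h @ W_up

W_down = rng.standard_normal((d, b)) * 0.01
W_up = rng.standard_normal((b, d)) * 0.01
x = rng.standard_normal((4, d))         # e.g. text-branch features
y = bottleneck_adapter(x, W_down, W_up)
print(y.shape)
```

The residual connection means the adapter only needs to learn a small correction on top of the frozen backbone features.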
✨Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass
📝 Summary:
CHROMM is a unified framework that jointly reconstructs cameras, scene point clouds, and human meshes from multi-person multi-view videos. It integrates strong priors, handles scale discrepancies, and uses multi-view fusion for faster, more robust human-scene reconstruction.
🔹 Publication Date: Published on Mar 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12789
• PDF: https://arxiv.org/pdf/2603.12789
• Project Page: https://nstar1125.github.io/chromm
• Github: https://nstar1125.github.io/chromm/
==================================
#3DReconstruction #ComputerVision #HumanSceneReconstruction #MultiViewVideo #AIResearch
✨Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA
📝 Summary:
Fanar-Sadiq is a bilingual multi-agent Islamic assistant addressing LLM inaccuracies in religious QA. It uses a tool-using architecture with specialized modules for diverse queries like scripture, fiqh, and calculations, ensuring grounded, accurate, and deterministic answers.
🔹 Publication Date: Published on Mar 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08501
• PDF: https://arxiv.org/pdf/2603.08501
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
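The dispatch pattern behind such a tool-using architecture can be sketched with keyword routing and a deterministic fallback; the handlers and keywords below are hypothetical examples, not the system's actual modules:

```python
def route_query(query, handlers):
    """Dispatch to the first specialized module whose keywords match;
    a deterministic fallback avoids hallucinated free-form answers."""
    for keywords, handler in handlers:
        if any(k in query.lower() for k in keywords):
            return handler(query)
    return "No grounded source available for this query."

handlers = [
    (("verse", "quran"), lambda q: "scripture-lookup: " + q),
    (("zakat", "inheritance"), lambda q: "calculator: " + q),
]
print(route_query("How is zakat computed?", handlers))
```

Routing numeric questions to a calculator module rather than free generation is what makes the answers deterministic.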
✨PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
📝 Summary:
PPTAgent, a two-stage approach, improves presentation generation by analyzing reference presentations and ensuring structural and content consistency, outperforming traditional methods across content,...
🔹 Publication Date: Published on Jan 7, 2025
🔹 Paper Links:
• arXiv Page: https://huggingface.co/collections/ICIP/pptagent
• PDF: https://arxiv.org/pdf/2501.03936
• Project Page: https://github.com/icip-cas/PPTAgent
• Github: https://github.com/icip-cas/PPTAgent
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Forceless/Zenodo10K
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Expert Threshold Routing for Autoregressive Language Modeling with Dynamic Computation Allocation and Load Balancing
📝 Summary:
Expert Threshold (ET) routing dynamically allocates computation in MoE models. Tokens route to experts based on individual scores exceeding EMA thresholds, achieving load balance without auxiliary losses. ET lowers cross-entropy loss by 0.067 compared to token-choice MoE.
🔹 Publication Date: Published on Mar 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11535
• PDF: https://arxiv.org/pdf/2603.11535
• Github: https://github.com/MasterGodzilla/Expert-Threshold-Routing
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
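The thresholded routing idea can be sketched with NumPy; the EMA decay and the batch-mean threshold update used here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def et_route(scores, thresholds):
    """Route each token to every expert whose gate score exceeds
    that expert's threshold (boolean mask of shape tokens x experts)."""
    return scores > thresholds

def update_thresholds(thresholds, scores, decay=0.99):
    # EMA over the batch-mean score per expert: sustained high demand
    # raises an expert's bar, nudging routing back toward balance.
    return decay * thresholds + (1 - decay) * scores.mean(axis=0)

rng = np.random.default_rng(0)
scores = rng.random((8, 4))          # 8 tokens, 4 experts
thresholds = np.full(4, 0.5)

mask = et_route(scores, thresholds)
thresholds = update_thresholds(thresholds, scores)
print(mask.sum(), thresholds.shape)
```

Because each token-expert decision is a per-score comparison rather than a fixed top-k pick, the number of experts a token activates varies with its scores.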
✨V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning
📝 Summary:
V-JEPA 2.1 is a self-supervised model learning dense visual representations for images and videos. It combines dense predictive loss, deep self-supervision, multi-modal tokenizers, and scaling to achieve state-of-the-art performance across various benchmarks, significantly advancing visual unders...
🔹 Publication Date: Published on Mar 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14482
• PDF: https://arxiv.org/pdf/2603.14482
• Project Page: https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks/
• Github: https://github.com/facebookresearch/vjepa2
==================================
#SelfSupervisedLearning #ComputerVision #DeepLearning #AI #VideoUnderstanding
✨From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning
📝 Summary:
DICE-RL refines pretrained generative robot policies via distribution-contractive reinforcement learning finetuning. It boosts high-success behaviors, leading to stable, sample-efficient mastery of complex manipulation from pixels on real robots.
🔹 Publication Date: Published on Mar 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10263
• PDF: https://arxiv.org/pdf/2603.10263
• Project Page: https://zhanyisun.github.io/dice.rl.2026/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨AdapterTune: Zero-Initialized Low-Rank Adapters for Frozen Vision Transformers
📝 Summary:
AdapterTune introduces zero-initialized low-rank adapters for Vision Transformers, addressing optimization instability and capacity issues. This method prevents representation drift and significantly improves accuracy, often outperforming full fine-tuning with fewer parameters.
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14706
• PDF: https://arxiv.org/pdf/2603.14706
• Github: https://github.com/salimkhazem/adaptertune
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
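The zero-initialization idea behind such adapters can be sketched with the standard low-rank parameterization W + BA, where B starts at zero so the adapted model is exactly the frozen model at step one; the shapes and scales below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4                       # hidden size, adapter rank

W = rng.standard_normal((d, d))    # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))               # zero-init: adapter starts as a no-op

def adapted(x):
    # frozen path plus low-rank update; B = 0 => output == frozen output
    return x @ W.T + x @ A.T @ B.T

x = rng.standard_normal((2, d))
print(np.allclose(adapted(x), x @ W.T))  # no representation drift at init
```

Only once training moves B away from zero does the low-rank path contribute, which is what keeps early optimization stable.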
✨OSM-based Domain Adaptation for Remote Sensing VLMs
📝 Summary:
A self-contained domain adaptation framework for vision-language models in remote sensing uses OpenStreetMap data and optical character recognition to generate captions without requiring external teac...
🔹 Publication Date: Published on Mar 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11804
• PDF: https://arxiv.org/pdf/2603.11804
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
📝 Summary:
A video diffusion model is repurposed as a latent world simulator to enhance multimodal large language models with implicit 3D structural priors and physical laws through spatiotemporal feature extrac...
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19235
• PDF: https://arxiv.org/pdf/2603.19235
• Project Page: https://github.com/H-EmbodVis/VEGA-3D
• Github: https://github.com/H-EmbodVis/VEGA-3D
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing
📝 Summary:
SAMA presents a factorized approach to video editing that separates semantic anchoring from motion modeling, enabling instruction-guided edits with preserved motion through pre-trained motion restorat...
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19228
• PDF: https://arxiv.org/pdf/2603.19228
• Project Page: https://cynthiazxy123.github.io/SAMA/
• Github: https://github.com/Cynthiazxy123/SAMA
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
📝 Summary:
CubiD is a discrete generation model for high-dimensional representations that enables fine-grained masking and learns rich correlations across spatial positions while maintaining fixed generation ste...
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19232
• PDF: https://arxiv.org/pdf/2603.19232
• Github: https://github.com/YuqingWang1029/CubiD
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Memento-Skills: Let Agents Design Agents
📝 Summary:
A generalist language model agent system autonomously designs and improves task-specific agents through memory-based reinforcement learning with stateful prompts and skill libraries.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18743
• PDF: https://arxiv.org/pdf/2603.18743
• Project Page: https://memento.run/
• Github: https://github.com/Memento-Teams/Memento-Skills
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
📝 Summary:
F2LLM-v2 is a multilingual embedding model family trained on 60 million samples across 200+ languages, achieving superior performance through LLM-based training, matryoshka learning, pruning, and dist...
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19223
• PDF: https://arxiv.org/pdf/2603.19223
• Project Page: https://huggingface.co/collections/codefuse-ai/f2llm
🔹 Models citing this paper:
• https://huggingface.co/codefuse-ai/F2LLM-v2-8B-Preview
• https://huggingface.co/codefuse-ai/F2LLM-v2-0.6B-Preview
• https://huggingface.co/codefuse-ai/F2LLM-v2-1.7B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/codefuse-ai/F2LLM-v2
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
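The matryoshka-learning part can be sketched below: one trained vector serves several embedding sizes by truncating its leading coordinates and re-normalizing. The 1024-dimensional full size and 256-dimensional slice are assumptions for illustration:

```python
import numpy as np

def mrl_embed(full_vec, dim):
    """Matryoshka-style embedding: keep the first `dim` coordinates
    and re-normalize, so one model serves several embedding sizes."""
    v = full_vec[:dim]
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
e = rng.standard_normal(1024)      # full-size embedding
small = mrl_embed(e, 256)
print(small.shape)
```

Training puts the most important information in the leading coordinates, so the truncated vector trades a little accuracy for a 4x smaller index.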
✨LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs
📝 Summary:
A long-form audio-visual comprehension benchmark reveals significant challenges for current omnimodal large language models in handling extended multi-modal inputs.
🔹 Publication Date: Published on Mar 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19217
• PDF: https://arxiv.org/pdf/2603.19217
• Project Page: https://kd-tao.github.io/LVOmniBench/
• Github: https://github.com/KD-TAO/LVOmniBench
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research