ML Research Hub – Telegram

ML Research Hub

32.5K subscribers

5.88K photos

377 videos

24 files

6.36K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

✨Visual-ERM: Reward Modeling for Visual Equivalence

📝 Summary:
Visual-ERM is a multimodal generative reward model providing fine-grained visual feedback for vision-to-code tasks. It significantly improves reinforcement learning performance for chart, table, and SVG parsing, demonstrating that fine-grained visual supervision is essential.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13224
• PDF: https://arxiv.org/pdf/2603.13224
• Github: https://github.com/InternLM/Visual-ERM

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#ReinforcementLearning #ComputerVision #GenerativeAI #AI #DataScience

126 views · 03:02

✨SimRecon: SimReady Compositional Scene Reconstruction from Real Videos

📝 Summary:
SimRecon reconstructs cluttered scenes from real videos using a Perception-Generation-Simulation pipeline. It employs Active Viewpoint Optimization for visual fidelity and a Scene Graph Synthesizer for physical plausibility. This enables superior compositional scene representations for simulation...

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02133
• PDF: https://arxiv.org/pdf/2603.02133
• Project Page: https://xiac20.github.io/SimRecon/
• Github: https://github.com/xiac20/SimRecon

#SceneReconstruction #ComputerVision #AI #Simulation #3DReconstruction

120 views · 03:02

✨LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

📝 Summary:
LookaheadKV enhances KV cache eviction in LLMs by accurately predicting future importance scores. It uses parameter-efficient modules, avoiding costly draft generation while maintaining high accuracy. This lightweight method significantly reduces eviction overhead and speeds up inference.
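
A rough sketch of the eviction step alone may help here. It assumes the per-token importance scores are already available; in LookaheadKV they are predicted by the learned modules, which are not shown. The function name `evict_kv_cache` and the toy scores are hypothetical, not taken from the paper or its repository.

```python
from typing import List, Sequence


def evict_kv_cache(importance: Sequence[float], budget: int) -> List[int]:
    """Return the positions of cached tokens to keep under a fixed budget.

    `importance` holds one (hypothetical) predicted score per cached token.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if budget >= len(importance):
        return list(range(len(importance)))
    # Rank positions by score, highest first; ties favor the earlier position.
    ranked = sorted(range(len(importance)), key=lambda i: (-importance[i], i))
    # Keep the top `budget` positions, restored to their original order.
    return sorted(ranked[:budget])


scores = [0.9, 0.1, 0.4, 0.05, 0.7, 0.3]
print(evict_kv_cache(scores, budget=3))  # [0, 2, 4]
```

Everything outside the returned positions would be dropped from the cache. The quality of the result depends entirely on how well the scores anticipate future attention.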

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10899
• PDF: https://arxiv.org/pdf/2603.10899
• Github: https://github.com/SamsungLabs/LookaheadKV

#LLM #KVCache #ModelOptimization #DeepLearning #AI

134 views · 03:02

✨Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

📝 Summary:
Cheers is a unified multimodal model that decouples visual details from semantic representations for efficient joint optimization of understanding and generation. It employs a vision tokenizer, LLM-based Transformer, and cascaded flow matching. Cheers achieves state-of-the-art performance with 4x...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12793
• PDF: https://arxiv.org/pdf/2603.12793
• Project Page: https://huggingface.co/ai9stars/Cheers
• Github: https://github.com/AI9Stars/Cheers

🔹 Models citing this paper:
• https://huggingface.co/ai9stars/Cheers

#MultimodalAI #LLM #ComputerVision #GenerativeAI #AIResearch

109 views · 04:02

✨OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

📝 Summary:
OmniForcing transforms slow bidirectional audio-visual diffusion models into fast, real-time streaming generators. It tackles training instability and synchronization by using asymmetric alignment, a global prefix, and an audio sink token. This enables high-fidelity, synchronized generation at 25...

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11647
• PDF: https://arxiv.org/pdf/2603.11647
• Project Page: https://omniforcing.com/
• Github: https://github.com/OmniForcing/OmniForcing

#GenerativeAI #AudioVisual #RealtimeAI #DiffusionModels #DeepLearning

102 views · 04:03

✨HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration

📝 Summary:
HybridStitch accelerates text-to-image generation by intelligently combining large and small diffusion models. It uses the large model for complex image regions and the smaller model for simpler parts, even within a single denoising step. This approach speeds up generation by 1.83x on Stable Diff...
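
As a toy illustration of region-level routing inside one denoising step: the sketch below sends each position to a large or a small model depending on a complexity map. The per-pixel callables, the `complexity` map, and the `threshold` are all assumptions made for illustration; the summary does not say how HybridStitch decides which regions are complex.

```python
from typing import Callable, List

Grid = List[List[float]]


def stitch_step(
    latent: Grid,
    complexity: Grid,
    large_model: Callable[[float], float],
    small_model: Callable[[float], float],
    threshold: float = 0.5,
) -> Grid:
    """Route each position to the large or small model for one step."""
    out: Grid = []
    for lat_row, cx_row in zip(latent, complexity):
        out.append([
            # The expensive model is only called where complexity is high.
            large_model(v) if cx >= threshold else small_model(v)
            for v, cx in zip(lat_row, cx_row)
        ])
    return out


latent = [[1.0, 2.0], [3.0, 4.0]]
complexity = [[0.9, 0.1], [0.2, 0.8]]
result = stitch_step(latent, complexity, lambda v: v * 10, lambda v: v + 1)
print(result)  # [[10.0, 3.0], [4.0, 40.0]]
```

Real diffusion models operate on whole tensors rather than single values, so this only shows the routing idea, not the source of the reported speedup.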

🔹 Publication Date: Published on Mar 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.07815
• PDF: https://arxiv.org/pdf/2603.07815

#AI #DataScience #MachineLearning #HuggingFace #Research

109 views · 04:03

✨MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

📝 Summary:
The MM-CondChain benchmark evaluates multimodal large language models on deep compositional visual reasoning through multi-layer conditional workflows with mechanically verifiable conditions.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12266
• PDF: https://arxiv.org/pdf/2603.12266
• Project Page: https://accio-lab.github.io/MM-CondChain
• Github: https://accio-lab.github.io/MM-CondChain

#AI #DataScience #MachineLearning #HuggingFace #Research

109 views · 04:03

✨Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

📝 Summary:
A self-evolving framework for open-world embodied agents that couples execution diagnosis with knowledge distillation to improve long-horizon task performance through structured experience organizatio...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13131
• PDF: https://arxiv.org/pdf/2603.13131
• Github: https://github.com/xzw-ustc/Steve-Evolving

#AI #DataScience #MachineLearning #HuggingFace #Research

119 views · 04:03

✨V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

📝 Summary:
Video generative models can be adapted for image restoration tasks with minimal training data by treating restoration as a progressive generative process. Large-scale video genera...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13089
• PDF: https://arxiv.org/pdf/2603.13089
• Project Page: https://zhengsh123.github.io/V-Bridge/
• Github: https://github.com/Zhengsh123/V-Bridge

🔹 Models citing this paper:
• https://huggingface.co/desimfj/V-Bridge

#AI #DataScience #MachineLearning #HuggingFace #Research

116 views · 04:03

✨Multimodal OCR: Parse Anything from Documents

📝 Summary:
MOCR is a multimodal OCR approach that jointly parses text and graphics into unified representations, enabling structured document reconstruction and supporting end-to-end training with semantic relat...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13032
• PDF: https://arxiv.org/pdf/2603.13032

#AI #DataScience #MachineLearning #HuggingFace #Research

142 views · 04:03

✨Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

📝 Summary:
A novel detection framework called UCIP uses quantum statistical mechanics-inspired methods to distinguish autonomous agents with genuine continuation objectives from those pursuing continua...

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11382
• PDF: https://arxiv.org/pdf/2603.11382
• Project Page: https://lab.christopheraltman.com/
• Github: https://github.com/christopher-altman/persistence-signal-detector

#AI #DataScience #MachineLearning #HuggingFace #Research

185 views · 04:03

✨Can Vision-Language Models Solve the Shell Game?

📝 Summary:
Vision-Language Models struggle with tracking identical visual entities, performing poorly on the VET-Bench testbed. Researchers propose Spatiotemporal Grounded Chain-of-Thought (SGCoT) to generate object trajectories as intermediate states. This method achieves over 90% accuracy, showing VLMs can ...
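
To make "trajectories as intermediate states" concrete, here is a toy shell-game tracker that records the target's position after every swap instead of only the final answer. It is a sketch of the general idea under simple assumptions (integer cup positions, pairwise swaps); it is not the SGCoT output format, and `track_target` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def track_target(start: int, swaps: Sequence[Tuple[int, int]]) -> List[int]:
    """Return the target's position before the first swap and after each swap."""
    position = start
    trajectory = [position]
    for a, b in swaps:
        # The target only moves when its current cup takes part in the swap.
        if position == a:
            position = b
        elif position == b:
            position = a
        trajectory.append(position)
    return trajectory


# Target starts under cup 0; cups are then swapped three times.
print(track_target(0, [(0, 1), (1, 2), (0, 2)]))  # [0, 1, 2, 0]
```

Writing out each intermediate position makes every step checkable, which is the property the summary attributes to trajectory-based reasoning.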

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08436
• PDF: https://arxiv.org/pdf/2603.08436
• Project Page: https://vetbench.github.io/
• Github: https://github.com/liutiedong/shellgame

🔹 Models citing this paper:
• https://huggingface.co/tiedong/Molmo2-SGCoT

✨ Datasets citing this paper:
• https://huggingface.co/datasets/tiedong/vetbench
• https://huggingface.co/datasets/tiedong/Molmo2-SGCoT

#AI #DataScience #MachineLearning #HuggingFace #Research

208 views · 05:04

✨HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

📝 Summary:
HomeSafe-Bench presents a benchmark for vision-language models to detect unsafe actions by embodied agents in household settings. It also introduces HD-Guard, a hierarchical dual-brain architecture balancing real-time safety monitoring with detection accuracy.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11975
• PDF: https://arxiv.org/pdf/2603.11975
• Project Page: https://pujiayue.github.io/homesafe-bench.github.io/
• Github: https://github.com/pujiayue/HomeSafe-Bench

✨ Spaces citing this paper:
• https://huggingface.co/spaces/pujiayue/HomeSafe-Bench-Leaderboard

#VisionLanguageModels #EmbodiedAI #AISafety #Robotics #Benchmark

138 views · 08:04

✨NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval

📝 Summary:
NanoVDR improves visual document retrieval by distilling a large VLM teacher into a small 70M text-only query encoder. This decouples document indexing from query processing, achieving 50x lower latency and 32x fewer parameters with nearly identical quality.
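
The decoupling can be pictured as follows: document vectors are computed once offline and stored, and at query time only a small text encoder runs before a similarity search. The sketch below assumes the embeddings already exist and uses hand-written 2-D vectors; the file names, the vectors, and `rank_documents` are hypothetical and do not reflect the NanoVDR API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(
    query_vec: Sequence[float], doc_index: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score a query vector against precomputed document vectors, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Offline: document page embeddings (assumed to come from a larger model).
doc_index = {"invoice.png": [1.0, 0.0], "chart.png": [0.0, 1.0], "table.png": [0.7, 0.7]}
# Online: the query embedding (assumed to come from the small text encoder).
query_vec = [0.1, 0.9]
print([doc_id for doc_id, _ in rank_documents(query_vec, doc_index)])
# ['chart.png', 'table.png', 'invoice.png']
```

Because the index never changes at query time, swapping in a cheaper query encoder only pays off if it lands in the same embedding space, which is what the distillation is meant to achieve.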

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12824
• PDF: https://arxiv.org/pdf/2603.12824
• Project Page: https://huggingface.co/nanovdr

🔹 Models citing this paper:
• https://huggingface.co/nanovdr/NanoVDR-L
• https://huggingface.co/nanovdr/NanoVDR-S-Multi
• https://huggingface.co/nanovdr/NanoVDR-S

✨ Spaces citing this paper:
• https://huggingface.co/spaces/nanovdr/NanoVDR-Demo

#VisualDocumentRetrieval #ModelDistillation #VLM #InformationRetrieval #DeepLearning

172 views · 08:04

✨Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction

📝 Summary:
This paper presents a novel text-motion retrieval method. It maps joint-angle motion features into Vision Transformer-compatible pseudo-images and uses an enhanced late interaction mechanism. This achieves superior performance and offers interpretable fine-grained text-motion alignments.
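
For readers unfamiliar with late interaction, the standard form (as popularized by ColBERT-style MaxSim scoring) matches each text token to its best patch and sums those maxima. The sketch below shows that plain variant only; the paper's enhanced mechanism is not described in the summary, and the toy vectors and `maxsim_score` name are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def maxsim_score(
    text_tokens: Sequence[Vector], motion_patches: Sequence[Vector]
) -> Tuple[float, List[int]]:
    """Plain MaxSim: sum over tokens of the best dot product with any patch.

    Also returns, per token, the index of its best-matching patch.
    """
    if not motion_patches:
        raise ValueError("motion_patches must not be empty")
    total = 0.0
    alignment: List[int] = []
    for token in text_tokens:
        sims = [sum(t * p for t, p in zip(token, patch)) for patch in motion_patches]
        best = max(range(len(sims)), key=lambda i: sims[i])
        total += sims[best]
        alignment.append(best)
    return total, alignment


tokens = [[1.0, 0.0], [0.0, 1.0]]
patches = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
score, alignment = maxsim_score(tokens, patches)
print(alignment)  # [1, 2]
```

The per-token alignment indices are what make this family of scorers inspectable: each word can be traced to the patch it matched.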

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09930
• PDF: https://arxiv.org/pdf/2603.09930

#MotionRetrieval #DeepLearning #ComputerVision #AIResearch #NLP

193 views · 08:05

✨Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

📝 Summary:
Video Streaming Thinking (VST) is a novel paradigm for real-time video understanding, enabling AI to think while watching during streaming playback. It optimizes VideoLLMs for responsive, low-latency interaction, showing significantly faster responses and strong performance on various benchmarks.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12262
• PDF: https://arxiv.org/pdf/2603.12262
• Project Page: https://1ranguan.github.io/VST/
• Github: https://github.com/1ranGuan/VST

#VideoLLMs #RealTimeAI #VideoUnderstanding #AIResearch #MachineLearning

140 views · 11:05

✨CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

📝 Summary:
Researchers introduced CreativeBench, a benchmark for evaluating machine creativity in code generation using a quality-novelty metric. They found scaling improves combinatorial creativity but yields diminishing returns for exploration. They also proposed EvoRePE, an inference-time strategy to enh...

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11863
• PDF: https://arxiv.org/pdf/2603.11863
• Project Page: https://zethwang.github.io/creativebench.github.io/
• Github: https://github.com/ZethWang/CreativeBench

#MachineCreativity #CodeGeneration #AIBenchmark #GenerativeAI #AIResearch

133 views · 11:05

✨Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

📝 Summary:
Think While Watching is a memory-anchored framework enabling multimodal large language models to perform continuous multi-turn video reasoning. It maintains long-range dependencies and boosts efficiency for streaming, significantly outperforming existing approaches.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11896
• PDF: https://arxiv.org/pdf/2603.11896
• Github: https://github.com/wl666hhh/Think_While_Watching

#MLLM #VideoReasoning #StreamingAI #AIMemory #AIResearch

166 views · 11:06

✨Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection

📝 Summary:
AxonAD is an unsupervised anomaly detector for multivariate time series. It detects structural dependency shifts by analyzing predictable multi-head attention query evolution, combining reconstruction with a query mismatch score. It outperforms existing methods.
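
One simple way to combine two per-timestep signals is to standardize each and take a weighted sum. The sketch below does exactly that, purely as an illustration: the z-score normalization, the weight `alpha`, and the toy values are assumptions, since the summary only states that reconstruction is combined with a query mismatch score.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize values; returns zeros if the series is constant."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def anomaly_scores(
    recon_error: Sequence[float], query_mismatch: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Weighted sum of standardized reconstruction error and query mismatch."""
    if len(recon_error) != len(query_mismatch):
        raise ValueError("both signals need one value per timestep")
    r, q = zscore(recon_error), zscore(query_mismatch)
    return [alpha * ri + (1 - alpha) * qi for ri, qi in zip(r, q)]


recon = [0.1, 0.1, 0.1, 0.9]
mismatch = [0.2, 0.2, 0.8, 0.9]
scores = anomaly_scores(recon, mismatch)
print(scores.index(max(scores)))  # 3
```

In this toy series, timestep 2 has a raised mismatch but normal reconstruction error, so it scores above the first two steps yet below timestep 3, where both signals spike.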

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12916
• PDF: https://arxiv.org/pdf/2603.12916
• Github: https://github.com/iis-esslingen/AxonAD

#AnomalyDetection #TimeSeries #MachineLearning #DeepLearning #UnsupervisedLearning

150 views · 12:06

✨ECoLAD: Deployment-Oriented Evaluation for Automotive Time-Series Anomaly Detection

📝 Summary:
ECoLAD is a new framework for evaluating time-series anomaly detection under compute constraints, which are critical for in-vehicle systems. It uses efficiency reductions to assess feasibility. Findings show classical methods sustain performance, but deep learning often becomes infeasible before losing accuracy.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10926
• PDF: https://arxiv.org/pdf/2603.10926

#AnomalyDetection #TimeSeries #AutomotiveAI #EdgeAI #DeepLearning

178 views · 12:06

✨Can Fairness Be Prompted? Prompt-Based Debiasing Strategies in High-Stakes Recommendations

📝 Summary:
This study investigates prompt-based debiasing strategies for LLM recommenders to improve group fairness. It finds that instructing LLMs to be fair can boost fairness by up to 74% while maintaining recommendation effectiveness.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12935
• PDF: https://arxiv.org/pdf/2603.12935

#LLM #AI #Fairness #Debiasing #RecommenderSystems

161 views · 13:06
