ML Research Hub
32.5K subscribers
5.87K photos
377 videos
24 files
6.35K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration

📝 Summary:
HybridStitch accelerates text-to-image generation by intelligently combining large and small diffusion models. It uses the large model for complex image regions and the smaller model for simpler parts, even within a single denoising step. This approach speeds up generation by 1.83x on Stable Diff...
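
The pixel-level routing described above can be illustrated with a minimal sketch. It assumes a per-pixel complexity map and a fixed threshold; both are hypothetical, since the summary does not say how HybridStitch decides which regions are complex. The function name `stitch_step` is also illustrative and not taken from the paper.

```python
def stitch_step(large_pred, small_pred, complexity, threshold=0.5):
    """Stitch one denoising step per pixel.

    Takes the large model's prediction where the (assumed) complexity
    score exceeds the threshold, and the small model's prediction elsewhere.
    All inputs are 2D lists of the same shape.
    """
    stitched = []
    for l_row, s_row, c_row in zip(large_pred, small_pred, complexity):
        stitched.append(
            [l if c > threshold else s for l, s, c in zip(l_row, s_row, c_row)]
        )
    return stitched
```

This sketch takes both predictions as given for every pixel, so it only shows the selection rule. Any real speedup would have to come from running the large model on the selected regions only.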

🔹 Publication Date: Published on Mar 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.07815
• PDF: https://arxiv.org/pdf/2603.07815

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning

📝 Summary:
MM-CondChain is a benchmark that evaluates multimodal large language models on deep compositional visual reasoning through multi-layer conditional workflows with mechanically verifiable conditions.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12266
• PDF: https://arxiv.org/pdf/2603.12266
• Project Page: https://accio-lab.github.io/MM-CondChain
• Github: https://accio-lab.github.io/MM-CondChain

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation

📝 Summary:
A self-evolving framework for open-world embodied agents that couples execution diagnosis with knowledge distillation to improve long-horizon task performance through structured experience organizatio...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13131
• PDF: https://arxiv.org/pdf/2603.13131
• Github: https://github.com/xzw-ustc/Steve-Evolving

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration

📝 Summary:
Video generative models can be adapted for image restoration tasks with minimal training data by treating restoration as a progressive generative process.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13089
• PDF: https://arxiv.org/pdf/2603.13089
• Project Page: https://zhengsh123.github.io/V-Bridge/
• Github: https://github.com/Zhengsh123/V-Bridge

🔹 Models citing this paper:
https://huggingface.co/desimfj/V-Bridge

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Multimodal OCR: Parse Anything from Documents

📝 Summary:
MOCR is a multimodal OCR approach that jointly parses text and graphics into unified representations, enabling structured document reconstruction and supporting end-to-end training with semantic relat...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13032
• PDF: https://arxiv.org/pdf/2603.13032

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

📝 Summary:
A novel detection framework called UCIP uses quantum statistical mechanics-inspired methods to distinguish between autonomous agents with genuine continuation objectives versus those pursuing continua...

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11382
• PDF: https://arxiv.org/pdf/2603.11382
• Project Page: https://lab.christopheraltman.com/
• Github: https://github.com/christopher-altman/persistence-signal-detector

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Can Vision-Language Models Solve the Shell Game?

📝 Summary:
Vision-Language Models struggle with tracking identical visual entities, performing poorly on the VET-Bench testbed. Researchers propose Spatiotemporal Grounded Chain-of-Thought (SGCoT) to generate object trajectories as intermediate states. This method achieves over 90% accuracy, showing VLMs can ...

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08436
• PDF: https://arxiv.org/pdf/2603.08436
• Project Page: https://vetbench.github.io/
• Github: https://github.com/liutiedong/shellgame

🔹 Models citing this paper:
https://huggingface.co/tiedong/Molmo2-SGCoT

🔹 Datasets citing this paper:
https://huggingface.co/datasets/tiedong/vetbench
https://huggingface.co/datasets/tiedong/Molmo2-SGCoT

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios

📝 Summary:
HomeSafe-Bench presents a benchmark for vision-language models to detect unsafe actions by embodied agents in household settings. It also introduces HD-Guard, a hierarchical dual-brain architecture balancing real-time safety monitoring with detection accuracy.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11975
• PDF: https://arxiv.org/pdf/2603.11975
• Project Page: https://pujiayue.github.io/homesafe-bench.github.io/
• Github: https://github.com/pujiayue/HomeSafe-Bench

🔹 Spaces citing this paper:
https://huggingface.co/spaces/pujiayue/HomeSafe-Bench-Leaderboard

==================================

#VisionLanguageModels #EmbodiedAI #AISafety #Robotics #Benchmark
NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval

📝 Summary:
NanoVDR improves visual document retrieval by distilling a large VLM teacher into a small 70M text-only query encoder. This decouples document indexing from query processing, achieving 50x lower latency and 32x fewer parameters with nearly identical quality.
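
A rough sketch of the decoupling described above: documents are embedded once by the large teacher, while queries go through the small student, which is trained to land near the teacher's query embeddings. The loss form (mean of 1 minus cosine similarity) and the names `alignment_loss` and `retrieve` are assumptions for illustration; the summary does not state the actual training objective.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def alignment_loss(student_vecs, teacher_vecs):
    """Assumed distillation loss: mean (1 - cosine) over paired query embeddings."""
    pairs = list(zip(student_vecs, teacher_vecs))
    return sum(1.0 - cosine(s, t) for s, t in pairs) / len(pairs)


def retrieve(query_vec, doc_index, k=1):
    """Rank precomputed (teacher-side) document embeddings against a student query."""
    ranked = sorted(
        doc_index,
        key=lambda doc_id: cosine(query_vec, doc_index[doc_id]),
        reverse=True,
    )
    return ranked[:k]
```

Because `doc_index` is built offline, only the student encoder runs at query time, which is where a latency reduction of the kind reported would come from.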

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12824
• PDF: https://arxiv.org/pdf/2603.12824
• Project Page: https://huggingface.co/nanovdr

🔹 Models citing this paper:
https://huggingface.co/nanovdr/NanoVDR-L
https://huggingface.co/nanovdr/NanoVDR-S-Multi
https://huggingface.co/nanovdr/NanoVDR-S

🔹 Spaces citing this paper:
https://huggingface.co/spaces/nanovdr/NanoVDR-Demo

==================================

#VisualDocumentRetrieval #ModelDistillation #VLM #InformationRetrieval #DeepLearning
Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction

📝 Summary:
This paper presents a novel text-motion retrieval method. It maps joint-angle motion features into Vision Transformer-compatible pseudo-images and uses an enhanced late interaction mechanism. This achieves superior performance and offers interpretable fine-grained text-motion alignments.
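
For readers unfamiliar with late interaction, here is a minimal sketch of the standard MaxSim form: each text token is matched to its most similar patch, and the per-token maxima are summed. This is the generic mechanism, not the paper's enhanced variant, whose details are not given in the summary. The function name `maxsim_score` and the toy vectors are illustrative.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def maxsim_score(text_tokens, motion_patches):
    """Generic late-interaction score between token and patch embeddings.

    Returns (score, alignment), where alignment[i] is the index of the
    patch that best matches text token i.
    """
    score = 0.0
    alignment = []
    for token in text_tokens:
        sims = [_cosine(token, patch) for patch in motion_patches]
        best = max(range(len(sims)), key=sims.__getitem__)
        alignment.append(best)
        score += sims[best]
    return score, alignment
```

The returned `alignment` list is what makes this kind of scoring inspectable: it records which patch each token matched.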

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09930
• PDF: https://arxiv.org/pdf/2603.09930

==================================

#MotionRetrieval #DeepLearning #ComputerVision #AIResearch #NLP
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

📝 Summary:
Video Streaming Thinking (VST) is a novel paradigm for real-time video understanding, enabling AI to think while watching during streaming playback. It optimizes VideoLLMs for responsive, low-latency interaction, showing significantly faster responses and strong performance on various benchmarks.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12262
• PDF: https://arxiv.org/pdf/2603.12262
• Project Page: https://1ranguan.github.io/VST/
• Github: https://github.com/1ranGuan/VST

==================================

#VideoLLMs #RealTimeAI #VideoUnderstanding #AIResearch #MachineLearning
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

📝 Summary:
Researchers introduced CreativeBench, a benchmark for evaluating machine creativity in code generation using a quality-novelty metric. They found scaling improves combinatorial creativity but yields diminishing returns for exploration. They also proposed EvoRePE, an inference-time strategy to enh...

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11863
• PDF: https://arxiv.org/pdf/2603.11863
• Project Page: https://zethwang.github.io/creativebench.github.io/
• Github: https://github.com/ZethWang/CreativeBench

==================================

#MachineCreativity #CodeGeneration #AIBenchmark #GenerativeAI #AIResearch
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models

📝 Summary:
Think While Watching is a memory-anchored framework enabling multimodal large language models to perform continuous multi-turn video reasoning. It maintains long-range dependencies and boosts efficiency for streaming, significantly outperforming existing approaches.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11896
• PDF: https://arxiv.org/pdf/2603.11896
• Github: https://github.com/wl666hhh/Think_While_Watching

==================================

#MLLM #VideoReasoning #StreamingAI #AIMemory #AIResearch
Surprised by Attention: Predictable Query Dynamics for Time Series Anomaly Detection

📝 Summary:
AxonAD is an unsupervised anomaly detector for multivariate time series. It detects structural dependency shifts by analyzing predictable multi-head attention query evolution, combining reconstruction with a query mismatch score. It outperforms existing methods.
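
One way to picture combining reconstruction with a query mismatch score is sketched below. The mean squared reconstruction error, the cosine-based mismatch, and the weighting parameter `alpha` are all assumptions for illustration; the summary does not specify how AxonAD defines or fuses the two terms.

```python
import math


def query_mismatch(predicted_q, observed_q):
    """Assumed mismatch: 1 - cosine similarity between predicted and observed queries."""
    dot = sum(a * b for a, b in zip(predicted_q, observed_q))
    norm_p = math.sqrt(sum(a * a for a in predicted_q))
    norm_o = math.sqrt(sum(b * b for b in observed_q))
    cos = dot / (norm_p * norm_o) if norm_p and norm_o else 0.0
    return 1.0 - cos


def anomaly_score(x, x_hat, predicted_q, observed_q, alpha=0.5):
    """Weighted sum of reconstruction error and query mismatch (hypothetical fusion)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return (1.0 - alpha) * recon + alpha * query_mismatch(predicted_q, observed_q)
```

Under this sketch, a window that reconstructs perfectly can still score high if its attention query departs from the predicted one, which mirrors the idea of flagging dependency shifts that reconstruction alone would miss.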

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12916
• PDF: https://arxiv.org/pdf/2603.12916
• Github: https://github.com/iis-esslingen/AxonAD

==================================

#AnomalyDetection #TimeSeries #MachineLearning #DeepLearning #UnsupervisedLearning
ECoLAD: Deployment-Oriented Evaluation for Automotive Time-Series Anomaly Detection

📝 Summary:
ECoLAD is a new framework for evaluating time-series anomaly detection under compute constraints, which is critical for in-vehicle systems. It uses efficiency reductions to assess feasibility. Findings show classical methods sustain performance, but deep learning often becomes infeasible before losing accuracy.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10926
• PDF: https://arxiv.org/pdf/2603.10926

==================================

#AnomalyDetection #TimeSeries #AutomotiveAI #EdgeAI #DeepLearning
Can Fairness Be Prompted? Prompt-Based Debiasing Strategies in High-Stakes Recommendations

📝 Summary:
This study investigates prompt-based debiasing strategies for LLM recommenders to improve group fairness. It finds that instructing LLMs to be fair can boost fairness by up to 74% while maintaining recommendation effectiveness.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12935
• PDF: https://arxiv.org/pdf/2603.12935

==================================

#LLM #AI #Fairness #Debiasing #RecommenderSystems
EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery

📝 Summary:
EvoScientist is an evolving multi-agent AI framework that enhances scientific discovery. It uses persistent memory to continuously learn from past interactions, improving scientific idea generation and experimental execution success rates. Experiments show it outperforms state-of-the-art systems ...

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08127
• PDF: https://arxiv.org/pdf/2603.08127
• Github: https://github.com/EvoScientist/EvoScientist

==================================

#AI #MultiAgentSystems #ScientificDiscovery #EvolutionaryAI #AIResearch
SDF-Net: Structure-Aware Disentangled Feature Learning for Optical-SAR Ship Re-identification

📝 Summary:
SDF-Net improves optical-SAR ship re-identification by leveraging stable ship geometry despite radiometric differences. It extracts scale-invariant structural features and disentangles modality-invariant and modality-specific cues to enhance discrimination.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12588
• PDF: https://arxiv.org/pdf/2603.12588
• Github: https://github.com/cfrfree/SDF-Net

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Do You See What I Am Pointing At? Gesture-Based Egocentric Video Question Answering

📝 Summary:
EgoPointVQA presents a dataset and benchmark for gesture-grounded egocentric question answering, along with Hand Intent Tokens (HINT) that encode 3D hand keypoints to improve pointing intent interpret...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12533
• PDF: https://arxiv.org/pdf/2603.12533
• Project Page: https://yuuraa.github.io/papers/choi2026egovqa/
• Github: https://github.com/Yuuraa/EgoPointVQA

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information

📝 Summary:
Language models' preference for correct information follows from a 'Compression-Consistency Principle': next-token prediction favors shorter, more internally consistent data. Truth bias is a side effect of compression, not inherent truth-seeking, and it emerges when false alternatives are hard to compress.

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11749
• PDF: https://arxiv.org/pdf/2603.11749
• Github: https://github.com/Rai220/compression-drives-truth/blob/master/paper_v2.md

🔹 Datasets citing this paper:
https://huggingface.co/datasets/krestnikov/compression-drives-truth

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Taking Shortcuts for Categorical VQA Using Super Neurons

📝 Summary:
This paper introduces Super Neurons (SNs), scalar activations that replace Sparse Attention Vectors (SAVs) for Vision Language Model classification. SNs enable extreme early exiting from shallow layers, improving classification performance and achieving up to 5.10x speedup.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10781
• PDF: https://arxiv.org/pdf/2603.10781

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research