ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
A Mixed Diet Makes DINO An Omnivorous Vision Encoder

📝 Summary:
The Omnivorous Vision Encoder learns modality-agnostic features by aligning multi-modal scene inputs and distilling semantics from a frozen teacher model. This resolves poor cross-modal alignment in existing encoders, yielding consistent, powerful embeddings for various modalities.
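
💡 Illustrative sketch (not the paper's code): the distillation idea is that a student embedding for each modality is pulled toward a frozen teacher's embedding of the same scene, so all modalities land in one feature space. The shapes and the cosine-alignment loss here are assumptions for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def distillation_loss(student_feats, teacher_feats):
    """Cosine-alignment loss: pull per-modality student embeddings
    toward the frozen teacher's embedding of the same scene."""
    s = l2_normalize(student_feats)
    t = l2_normalize(teacher_feats)  # teacher is frozen: no gradient flows here
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 256))            # frozen teacher embeddings (RGB)
rgb_student = teacher + 0.01 * rng.normal(size=(4, 256))   # nearly aligned modality
depth_student = rng.normal(size=(4, 256))      # unaligned modality at initialization

# An aligned modality has near-zero loss; an unaligned one does not.
print(distillation_loss(rgb_student, teacher) < distillation_loss(depth_student, teacher))
```

Training would minimize this loss per modality, which is what yields the modality-agnostic embeddings the summary describes.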

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24181
• PDF: https://arxiv.org/pdf/2602.24181

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #ComputerVision #DeepLearning #SelfSupervisedLearning #AIResearch
HyPER-GAN: Hybrid Patch-Based Image-to-Image Translation for Real-Time Photorealism Enhancement

📝 Summary:
HyPER-GAN is a lightweight U-Net based model for real-time photorealism enhancement. Its hybrid training strategy, using real-world patches, improves visual realism, semantic consistency, and inference speed over state-of-the-art methods.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10604
• PDF: https://arxiv.org/pdf/2603.10604
• Github: https://github.com/stefanos50/HyPER-GAN

==================================

#GAN #ComputerVision #DeepLearning #ImageProcessing #Photorealism
Visual-ERM: Reward Modeling for Visual Equivalence

📝 Summary:
Visual-ERM is a multimodal generative reward model providing fine-grained visual feedback for vision-to-code tasks. It significantly improves reinforcement learning performance for chart, table, and SVG parsing, demonstrating that fine-grained visual supervision is essential.
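
💡 Illustrative sketch (not the paper's code): the key idea of fine-grained visual feedback is to reward partial correctness of the rendered output rather than a single pass/fail signal. The pixel-match reward below is a hypothetical stand-in for the model's learned reward.

```python
import numpy as np

def finegrained_visual_reward(pred_render, ref_render):
    """Hypothetical fine-grained reward: fraction of matching cells between
    the rendering of generated code and the reference image, giving dense
    credit instead of a binary success signal."""
    return float(np.mean(pred_render == ref_render))

ref = np.zeros((8, 8), dtype=int)
ref[:4] = 1                          # toy reference render (e.g. a chart region)
good = ref.copy()
good[0, 0] ^= 1                      # one cell wrong -> reward near 1
bad = np.zeros((8, 8), dtype=int)    # half the cells wrong -> reward 0.5
print(finegrained_visual_reward(good, ref), finegrained_visual_reward(bad, ref))
```

A dense reward like this gives the RL policy a gradient toward "almost right" code, which a binary reward cannot.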

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13224
• PDF: https://arxiv.org/pdf/2603.13224
• Github: https://github.com/InternLM/Visual-ERM

==================================

#ReinforcementLearning #ComputerVision #GenerativeAI #AI #DataScience
SimRecon: SimReady Compositional Scene Reconstruction from Real Videos

📝 Summary:
SimRecon reconstructs cluttered scenes from real videos using a Perception-Generation-Simulation pipeline. It employs Active Viewpoint Optimization for visual fidelity and a Scene Graph Synthesizer for physical plausibility. This enables superior compositional scene representations for simulation.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02133
• PDF: https://arxiv.org/pdf/2603.02133
• Project Page: https://xiac20.github.io/SimRecon/
• Github: https://github.com/xiac20/SimRecon

==================================

#SceneReconstruction #ComputerVision #AI #Simulation #3DReconstruction
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

📝 Summary:
Cheers is a unified multimodal model that decouples visual details from semantic representations for efficient joint optimization of understanding and generation. It employs a vision tokenizer, LLM-based Transformer, and cascaded flow matching. Cheers achieves state-of-the-art performance with 4x...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12793
• PDF: https://arxiv.org/pdf/2603.12793
• Project Page: https://huggingface.co/ai9stars/Cheers
• Github: https://github.com/AI9Stars/Cheers

🔹 Models citing this paper:
https://huggingface.co/ai9stars/Cheers

==================================

#MultimodalAI #LLM #ComputerVision #GenerativeAI #AIResearch
Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction

📝 Summary:
This paper presents a novel text-motion retrieval method. It maps joint-angle motion features into Vision Transformer-compatible pseudo-images and uses an enhanced late interaction mechanism. This achieves superior performance and offers interpretable fine-grained text-motion alignments.
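
💡 Illustrative sketch (not the paper's code): late interaction in the ColBERT style scores a query by letting each text token pick its best-matching patch of the motion pseudo-image and summing those maxima. The dimensions below are assumptions.

```python
import numpy as np

def late_interaction_score(text_tokens, patch_embs):
    """ColBERT-style MaxSim: each text token attends to its best-matching
    motion-image patch; per-token maxima are summed into one score."""
    t = text_tokens / np.linalg.norm(text_tokens, axis=-1, keepdims=True)
    p = patch_embs / np.linalg.norm(patch_embs, axis=-1, keepdims=True)
    sim = t @ p.T                        # (n_tokens, n_patches) cosine similarities
    return float(sim.max(axis=1).sum())  # best patch per token, summed

rng = np.random.default_rng(1)
patches = rng.normal(size=(49, 64))                            # 7x7 pseudo-image patches
matching_text = patches[:5] + 0.05 * rng.normal(size=(5, 64))  # tokens close to some patches
random_text = rng.normal(size=(5, 64))                         # unrelated tokens

print(late_interaction_score(matching_text, patches) > late_interaction_score(random_text, patches))
```

The per-token max also explains the interpretability claim: each text token can be traced to the specific patch (i.e., joint-angle region) it matched.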

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09930
• PDF: https://arxiv.org/pdf/2603.09930

==================================

#MotionRetrieval #DeepLearning #ComputerVision #AIResearch #NLP
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

📝 Summary:
SNCE is a novel training objective for large-codebook discrete image generators. It supervises models with a soft categorical distribution over neighboring tokens, based on embedding proximity, instead of hard one-hot targets. This approach significantly improves convergence speed and overall generation quality.
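
💡 Illustrative sketch (not the paper's code): the geometry-aware target can be built by softmaxing negative embedding distances to the ground-truth codebook entry, so nearby tokens receive probability mass instead of zero. The temperature and codebook size are assumptions.

```python
import numpy as np

def soft_neighbor_targets(codebook, target_idx, tau=0.1):
    """Geometry-aware soft target: softmax over codebook entries weighted by
    embedding proximity to the ground-truth token, replacing one-hot labels."""
    d = np.linalg.norm(codebook - codebook[target_idx], axis=-1)  # distances to target
    logits = -d / tau
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(2)
codebook = rng.normal(size=(16, 8))   # toy 16-entry codebook
t = soft_neighbor_targets(codebook, target_idx=3)
print(np.argmax(t) == 3, round(float(t.sum()), 6))  # peaks on the true token, sums to 1
```

The generator is then trained with cross-entropy against this soft distribution, so predicting a geometric neighbor of the true token is penalized less than predicting a distant one.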

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15150
• PDF: https://arxiv.org/pdf/2603.15150

==================================

#ImageGeneration #DeepLearning #ComputerVision #GeometryAware #AIResearch
Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

📝 Summary:
STALL is a training-free, model-agnostic detector for generated videos. It jointly models spatial and temporal evidence from real-data statistics within a probabilistic framework. STALL consistently outperforms prior image and video-based baselines, improving reliable detection.
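
💡 Illustrative sketch (not the paper's code): the training-free principle is to fit simple statistics on real data once, then score a clip by a spatial (per-frame) likelihood plus a temporal (frame-difference) likelihood. The diagonal-Gaussian model and feature shapes below are assumptions.

```python
import numpy as np

def gaussian_loglik(x, mu, var, eps=1e-6):
    """Diagonal-Gaussian log-likelihood per row of x."""
    return -0.5 * np.sum(np.log(2 * np.pi * (var + eps)) + (x - mu) ** 2 / (var + eps), axis=-1)

def st_score(video_feats, real_mu, real_var, diff_mu, diff_var):
    """Combine per-frame (spatial) likelihood under real-data statistics with
    frame-difference (temporal) likelihood; low scores flag generated video."""
    spatial = gaussian_loglik(video_feats, real_mu, real_var).mean()
    diffs = np.diff(video_feats, axis=0)
    temporal = gaussian_loglik(diffs, diff_mu, diff_var).mean()
    return float(spatial + temporal)

rng = np.random.default_rng(4)
real_videos = rng.normal(size=(100, 16, 32))      # real clips: clips x frames x feat-dim
mu, var = real_videos.mean((0, 1)), real_videos.var((0, 1))
d = np.diff(real_videos, axis=1).reshape(-1, 32)
dmu, dvar = d.mean(0), d.var(0)

real_clip = rng.normal(size=(16, 32))             # in-distribution clip
fake_clip = 3.0 * rng.normal(size=(16, 32))       # off-distribution stand-in for generated video
print(st_score(real_clip, mu, var, dmu, dvar) > st_score(fake_clip, mu, var, dmu, dvar))
```

No detector is trained: only statistics of real data are estimated, which is what makes the approach model-agnostic across video generators.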

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15026
• PDF: https://arxiv.org/pdf/2603.15026
• Project Page: https://omerbenhayun.github.io/stall-video/
• Github: https://github.com/OmerBenHayun/stall-video

==================================

#Deepfakes #VideoDetection #ComputerVision #AI #DigitalForensics
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

📝 Summary:
GlyphPrinter improves visual text rendering by addressing glyph accuracy. It introduces Region-Grouped DPO (R-GDPO) with region-level preferences from the GlyphCorrector dataset, significantly enhancing glyph precision and outperforming existing methods.
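
💡 Illustrative sketch (not the paper's code): a region-grouped DPO objective computes the standard DPO margin per text region (e.g. per rendered word) and averages the losses, rather than scoring the whole image once. The per-region log-probabilities below are made-up numbers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def region_grouped_dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Sketch of a region-grouped DPO objective: one preference margin per
    region between the preferred (glyph-correct) and rejected rendering,
    averaged over regions instead of pooled over the whole image."""
    margins = beta * ((logp_w - ref_w) - (logp_l - ref_l))  # one margin per region
    return float(np.mean(-np.log(sigmoid(margins))))

# Per-region log-probs for preferred vs rejected renderings under the policy,
# with a shared reference model.
logp_w = np.array([-1.0, -0.8, -1.2])
logp_l = np.array([-1.5, -2.0, -1.4])
ref = np.array([-1.2, -1.1, -1.3])
loss = region_grouped_dpo_loss(logp_w, logp_l, ref, ref)
print(loss)
```

Grouping by region means a single wrong glyph cannot hide behind an otherwise good image: its region contributes its own preference term.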

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15616
• PDF: https://arxiv.org/pdf/2603.15616
• Project Page: https://henghuiding.com/GlyphPrinter/
• Github: https://github.com/FudanCVL/GlyphPrinter

==================================

#GlyphRendering #DeepLearning #ComputerVision #AIResearch #TextRendering
Learning Latent Proxies for Controllable Single-Image Relighting

📝 Summary:
Single-image relighting is challenging due to unobserved geometry and materials. LightCtrl introduces a diffusion model guided by sparse, physically meaningful cues from a latent proxy encoder and lighting-aware masks. This enables photometrically faithful relighting with accurate control, outperforming prior approaches.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15555
• PDF: https://arxiv.org/pdf/2603.15555

==================================

#ImageRelighting #DiffusionModels #ComputerVision #DeepLearning #AIResearch
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training

📝 Summary:
IOMM is a data-efficient framework for UMM visual generation. It pre-trains on image-only data, then fine-tunes on mixed data, achieving SOTA performance while significantly reducing computational costs.
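
💡 Illustrative sketch (not the paper's code): masked-modeling pre-training hides a large fraction of image patches and computes the reconstruction loss only at masked positions, MAE-style. The mask ratio, patch grid, and zero-predictor below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_mask(n_patches, mask_ratio=0.75):
    """Sample a binary mask over image patches; only masked positions
    contribute to the reconstruction loss."""
    n_mask = int(n_patches * mask_ratio)
    idx = rng.permutation(n_patches)
    mask = np.zeros(n_patches, dtype=bool)
    mask[idx[:n_mask]] = True
    return mask

patches = rng.normal(size=(196, 32))   # 14x14 patch tokens of one image
mask = random_mask(len(patches))
pred = np.zeros_like(patches)          # stand-in for the decoder's output
loss = float(np.mean((pred[mask] - patches[mask]) ** 2))  # loss on masked patches only
print(mask.sum(), loss > 0.0)
```

Because the objective needs no text, this stage runs on image-only data, which is the source of the data efficiency the summary claims.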

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16139
• PDF: https://arxiv.org/pdf/2603.16139
• Github: https://github.com/LINs-lab/IOMM

==================================

#UMMVisualGeneration #MaskedModeling #EfficientAI #ComputerVision #GenerativeAI
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

📝 Summary:
Waypoint Diffusion Transformers (WiT) address trajectory conflicts in pixel-space flow matching using semantic waypoints from pre-trained vision models. WiT disentangles generation paths into segments, accelerating training convergence. It outperforms pixel-space baselines and speeds up JiT training.
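
💡 Illustrative sketch (not the paper's code): splitting the generation path at a waypoint replaces the single straight noise-to-image interpolation with two segments, noise → waypoint → image. The piecewise-linear path and equal split at t = 0.5 below are simplifying assumptions.

```python
import numpy as np

def waypoint_path(x0, w, x1, t):
    """Piecewise-linear generation path through a semantic waypoint w:
    the straight flow-matching interpolant from x0 to x1 is split into
    two segments, x0 -> w on t in [0, 0.5] and w -> x1 on t in (0.5, 1]."""
    t = np.asarray(t)
    first = x0 + (2 * t)[..., None] * (w - x0)
    second = w + (2 * t - 1)[..., None] * (x1 - w)
    return np.where(t[..., None] <= 0.5, first, second)

x0 = np.zeros(2)              # noise sample
w = np.array([1.0, 0.0])      # semantic waypoint from a pre-trained vision model
x1 = np.array([1.0, 1.0])     # data sample
print(waypoint_path(x0, w, x1, np.array([0.0, 0.5, 1.0])))
```

Routing different samples through different waypoints is what lets segments avoid crossing, which is the trajectory-conflict problem the summary refers to.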

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15132
• PDF: https://arxiv.org/pdf/2603.15132
• Project Page: https://hainuo-wang.github.io/WiT/
• Github: https://github.com/hainuo-wang/WiT

==================================

#DiffusionModels #Transformers #ComputerVision #DeepLearning #AI
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models

📝 Summary:
VLA models struggle to integrate visual detail for action generation. DeepVision-VLA enhances visual representations via multi-level feature injection and action-guided pruning. This significantly boosts performance on robotic tasks.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15618
• PDF: https://arxiv.org/pdf/2603.15618
• Project Page: https://deepvision-vla.github.io/

==================================

#VLAModels #ComputerVision #Robotics #DeepLearning #FoundationModels
Video-CoE: Reinforcing Video Event Prediction via Chain of Events

📝 Summary:
Video-CoE introduces a Chain of Events (CoE) paradigm to improve video event prediction. It addresses MLLM limitations in logical reasoning and visual utilization by constructing temporal event chains and using enhanced training. CoE achieves state-of-the-art performance on VEP benchmarks.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14935
• PDF: https://arxiv.org/pdf/2603.14935

==================================

#VideoEventPrediction #ChainOfEvents #MLLM #ComputerVision #AI
Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass

📝 Summary:
CHROMM is a unified framework that jointly reconstructs cameras, scene point clouds, and human meshes from multi-person multi-view videos. It integrates strong priors, handles scale discrepancies, and uses multi-view fusion for faster, more robust human-scene reconstruction.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12789
• PDF: https://arxiv.org/pdf/2603.12789
• Project Page: https://nstar1125.github.io/chromm

==================================

#3DReconstruction #ComputerVision #HumanSceneReconstruction #MultiViewVideo #AIResearch
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning

📝 Summary:
V-JEPA 2.1 is a self-supervised model learning dense visual representations for images and videos. It combines dense predictive loss, deep self-supervision, multi-modal tokenizers, and scaling to achieve state-of-the-art performance across various benchmarks, significantly advancing visual understanding.
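
💡 Illustrative sketch (not the paper's code): a dense predictive objective regresses target features at every masked spatio-temporal patch location, rather than one loss per clip. The tensor shapes, mask ratio, and MSE loss below are assumptions.

```python
import numpy as np

def dense_predictive_loss(pred_dense, target_dense, mask):
    """Dense predictive objective: mean squared error between predicted and
    target patch features, computed only at masked spatio-temporal locations."""
    diff = (pred_dense - target_dense) ** 2
    return float(diff[mask].mean())

rng = np.random.default_rng(5)
target = rng.normal(size=(8, 196, 64))     # frames x patches x feature-dim
pred = target + 0.1 * rng.normal(size=target.shape)  # stand-in predictor output
mask = rng.random((8, 196)) < 0.5          # predict only at masked locations
print(dense_predictive_loss(pred, target, mask))
```

Supervising every patch location is what produces the dense features the title refers to, as opposed to a single clip-level embedding.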

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14482
• PDF: https://arxiv.org/pdf/2603.14482
• Project Page: https://ai.meta.com/blog/v-jepa-2-world-model-benchmarks/
• Github: https://github.com/facebookresearch/vjepa2

==================================

#SelfSupervisedLearning #ComputerVision #DeepLearning #AI #VideoUnderstanding