ML Research Hub

✨Communication-Inspired Tokenization for Structured Image Representations

📝 Summary:
COMiT introduces a framework for learning structured, object-centric visual tokens through iterative encoding and flow-matching decoding. This single-transformer approach improves compositional generalization and relational reasoning by creating interpretable token structures.

🔹 Publication Date: Published on Feb 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.20731
• PDF: https://arxiv.org/pdf/2602.20731
• Project Page: https://araachie.github.io/comit/
• Github: https://github.com/araachie/comit

🔹 Models citing this paper:
• https://huggingface.co/cvg-unibe/comit-xl
• https://huggingface.co/cvg-unibe/comit-l
• https://huggingface.co/cvg-unibe/comit-b

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#ComputerVision #Transformers #ImageRecognition #RepresentationLearning #AIResearch

✨From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

📝 Summary:
PhysicEdit addresses physically implausible image editing by modeling edits as predictive physical state transitions. It uses a dual-thinking diffusion framework guided by a vision-language model, greatly enhancing physical realism.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21778
• PDF: https://arxiv.org/pdf/2602.21778
• Project Page: https://liangbingzhao.github.io/statics2dynamics/
• Github: https://github.com/liangbingzhao/PhysicEdit

✨ Datasets citing this paper:
• https://huggingface.co/datasets/metazlb/PhysicTran38K

==================================

#ImageEditing #DiffusionModels #ComputerVision #PhysicsAI #AIResearch

✨EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

📝 Summary:
EmbodMocap is a dual-iPhone system for in-the-wild 4D human-scene reconstruction. It unifies human and scene data in a metric world frame, improving accuracy. This supports embodied AI tasks like animation and robot control.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23205
• PDF: https://arxiv.org/pdf/2602.23205
• Project Page: https://wenjiawang0312.github.io/projects/embodmocap/
• Github: https://github.com/WenjiaWang0312/EmbodMocap

==================================

#EmbodiedAI #4DReconstruction #ComputerVision #Robotics #Animation

✨Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

📝 Summary:
MIGM-Shortcut accelerates masked image generation by learning a lightweight model that predicts feature evolution velocity from previous features and sampled tokens. This yields a speedup of over 4x on state-of-the-art models while maintaining quality.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23996
• PDF: https://arxiv.org/pdf/2602.23996
• Github: https://github.com/Kaiwen-Zhu/MIGM-Shortcut

==================================

#ImageGeneration #DeepLearning #GenerativeAI #ComputerVision #AI

✨SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

📝 Summary:
SenCache accelerates diffusion model inference by dynamically selecting cache timesteps based on model output sensitivity to input perturbations. This principled framework improves visual quality over existing heuristic methods within similar computational budgets.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24208
• PDF: https://arxiv.org/pdf/2602.24208
• Github: https://github.com/vita-epfl/SenCache

==================================

#DiffusionModels #AI #MachineLearning #InferenceAcceleration #ComputerVision
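
The SenCache summary above describes reusing cached outputs only at steps where the model output is insensitive to the change in its input. A minimal toy sketch of that idea follows. It is not the authors' implementation: the scalar `model`, the constant `sensitivity`, the `tolerance` threshold, and the function name `run_with_cache` are all hypothetical, and the paper's actual sensitivity measure is not given in this summary.

```python
from typing import Callable, List, Optional, Tuple


def run_with_cache(
    model: Callable[[float], float],
    inputs: List[float],
    sensitivity: float,
    tolerance: float,
) -> Tuple[List[float], int]:
    """Evaluate `model` along `inputs`, reusing the cached output whenever the
    first-order predicted output change stays within `tolerance`.

    `sensitivity` is an assumed bound on |d model / d input| (hypothetical).
    Returns the outputs and the number of real model calls.
    """
    outputs: List[float] = []
    calls = 0
    cached_in: Optional[float] = None
    cached_out = 0.0
    for x in inputs:
        if cached_in is None:
            recompute = True  # nothing cached yet
        else:
            # Predicted output drift since the last real evaluation.
            predicted_change = sensitivity * abs(x - cached_in)
            recompute = predicted_change > tolerance
        if recompute:
            cached_in, cached_out = x, model(x)
            calls += 1
        outputs.append(cached_out)
    return outputs, calls


# Toy usage: a linear "model" whose sensitivity is exactly 2.
outs, n_calls = run_with_cache(lambda x: 2.0 * x, [0.0, 0.01, 0.02, 0.5, 0.51], 2.0, 0.1)
print(outs, n_calls)
```

In this toy run only the first input and the large jump to 0.5 trigger a real evaluation; the remaining steps reuse the cache.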

✨Mode Seeking meets Mean Seeking for Fast Long Video Generation

📝 Summary:
This paper introduces a Decoupled Diffusion Transformer combining mode seeking and mean seeking for efficient long video generation. It leverages global flow matching for narrative coherence and local distribution matching against a short-video teacher for realism, effectively bridging the fideli...

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24289
• PDF: https://arxiv.org/pdf/2602.24289
• Project Page: https://primecai.github.io/mmm/
• Github: https://primecai.github.io/mmm/

==================================

#VideoGeneration #DiffusionModels #AIResearch #MachineLearning #ComputerVision

✨How to Take a Memorable Picture? Empowering Users with Actionable Feedback

📝 Summary:
This paper introduces Memorability Feedback (MemFeed), a new task that provides actionable natural-language guidance for improving photo memorability. The proposed method, MemCoach, uses MLLMs and a teacher-student strategy, demonstrating that memorability can be taught and instructed.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21877
• PDF: https://arxiv.org/pdf/2602.21877
• Project Page: https://laitifranz.github.io/MemCoach/
• Github: https://laitifranz.github.io/MemCoach/

✨ Datasets citing this paper:
• https://huggingface.co/datasets/laitifranz/MemBench-InternVL3.5-Eval
• https://huggingface.co/datasets/laitifranz/MemBench

==================================

#PhotoMemorability #MLLMs #ComputerVision #AIResearch #HumanComputerInteraction

✨WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

📝 Summary:
WorldStereo integrates camera-guided video generation and 3D reconstruction using geometric memory modules. These provide camera control and structural priors for multi-view consistent videos, enabling high-quality 3D scene reconstruction.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02049
• PDF: https://arxiv.org/pdf/2603.02049
• Project Page: https://3d.hunyuan.tencent.com/sceneTo3D
• Github: https://github.com/FuchengSu/WorldStereo

==================================

#VideoGeneration #3DReconstruction #ComputerVision #DeepLearning #NeuralRendering

✨Monocular Mesh Recovery and Body Measurement of Female Saanen Goats

📝 Summary:
This paper introduces a novel 3D body measurement system for Saanen goats. It uses a new parametric shape model and a multi-view RGBD dataset to enable accurate single-view 3D reconstruction and automated measurement of key body dimensions, improving precision livestock farming.

🔹 Publication Date: Published on Feb 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.19896
• PDF: https://arxiv.org/pdf/2602.19896
• Github: https://github.com/bojin-nwafu/Female-Saanen-Goats

==================================

#3DReconstruction #ComputerVision #PrecisionLivestock #AnimalScience #AgriTech

✨Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos

📝 Summary:
A large video scene graph dataset, SVG2, and a new model, TRaSER, are introduced. TRaSER generates spatio-temporal scene graphs, significantly improving relation, object, and attribute prediction, and boosting video question answering accuracy.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23543
• PDF: https://arxiv.org/pdf/2602.23543
• Project Page: https://uwgzq.github.io/papers/SVG2/

🔹 Models citing this paper:
• https://huggingface.co/UWGZQ/TRASER

✨ Datasets citing this paper:
• https://huggingface.co/datasets/UWGZQ/Synthetic_Visual_Genome2

==================================

#VideoSceneGraphs #SpatioTemporal #ComputerVision #VideoQA #DeepLearning

✨EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

📝 Summary:
EmbodiedSplat provides real-time 3D scene understanding, combining online 3D Gaussian Splatting with CLIP embeddings from streaming images. It simultaneously reconstructs and semantically comprehends 3D scenes using a novel sparse coefficients field and CLIP global codebook for efficiency and gen...

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04254
• PDF: https://arxiv.org/pdf/2603.04254
• Project Page: https://0nandon.github.io/EmbodiedSplat/
• Github: https://github.com/0nandon/EmbodiedSplat

==================================

#3DSceneUnderstanding #3DGaussianSplatting #ComputerVision #AI #NeuralRendering

✨GroupEnsemble: Efficient Uncertainty Estimation for DETR-based Object Detection

📝 Summary:
DETR models lack spatial uncertainty estimates, and current estimation methods are too costly. GroupEnsemble estimates uncertainty efficiently by running independent query groups in a single forward pass, kept separate by an attention mask. It outperforms Deep Ensembles at a fraction of the cost.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01847
• PDF: https://arxiv.org/pdf/2603.01847

==================================

#ObjectDetection #UncertaintyEstimation #DETR #ComputerVision #MachineLearning
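
The GroupEnsemble summary above mentions keeping query groups independent within one forward pass via an attention mask. One common way to do that is a block-diagonal mask over the decoder queries, sketched below under stated assumptions: the function name `group_attention_mask`, the equal group sizes, and the "True means attention is allowed" convention are illustrative choices, not details taken from the paper. How the per-group predictions are combined into an uncertainty estimate is not described in this summary and is left out.

```python
from typing import List


def group_attention_mask(num_groups: int, queries_per_group: int) -> List[List[bool]]:
    """Build a block-diagonal self-attention mask over decoder queries.

    Entry [i][j] is True when query i may attend to query j, which happens
    only if both queries belong to the same group. (Some libraries use the
    opposite convention, where True marks a blocked position.)
    """
    total = num_groups * queries_per_group
    return [
        [(i // queries_per_group) == (j // queries_per_group) for j in range(total)]
        for i in range(total)
    ]


# Toy usage: 2 groups of 2 queries; print 1 for allowed, 0 for blocked.
mask = group_attention_mask(num_groups=2, queries_per_group=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With this mask, queries in one group never see queries from another group, so each group behaves like a separate ensemble member while sharing the same forward pass.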

✨InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

📝 Summary:
This paper introduces InfinityStory, a novel framework, dataset, and model for long-form video generation. It tackles challenges in background consistency and seamless multi-subject transitions, achieving high consistency and smoother transitions on VBench.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03646
• PDF: https://arxiv.org/pdf/2603.03646

==================================

#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ComputerVision

✨Specificity-aware reinforcement learning for fine-grained open-world classification

📝 Summary:
SpeciaRL, a novel RL framework, improves large multimodal models on open-world fine-grained classification. It increases prediction specificity while maintaining correctness by using a dynamic verifier-based reward. Experiments show SpeciaRL achieves the best trade-off between specificity and correctness.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03197
• PDF: https://arxiv.org/pdf/2603.03197

==================================

#ReinforcementLearning #MachineLearning #ComputerVision #AI #MultimodalAI

✨HDINO: A Concise and Efficient Open-Vocabulary Detector

📝 Summary:
HDINO is an efficient open-vocabulary detector using a two-stage training strategy. It employs One-to-Many Semantic Alignment and lightweight feature fusion, avoiding manual data curation and complex feature extraction. HDINO achieves superior performance on COCO with less training data.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02924
• PDF: https://arxiv.org/pdf/2603.02924
• Github: https://github.com/HaoZ416/HDINO

==================================

#ObjectDetection #ComputerVision #OpenVocabulary #DeepLearning #AIResearch

✨STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

📝 Summary:
STMI is a novel multi-modal ReID framework that improves object re-identification. It uses segmentation-guided modulation for foreground enhancement, token reallocation for compact features, and cross-modal hypergraph interaction to capture high-order semantic relationships.

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00695
• PDF: https://arxiv.org/pdf/2603.00695

==================================

#ObjectReID #ComputerVision #DeepLearning #MultiModalAI #STMI
