ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Spanning the Visual Analogy Space with a Weight Basis of LoRAs

📝 Summary:
LoRWeB improves visual analogy learning by dynamically composing a basis of LoRA modules. It uses an encoder to select and weigh multiple LoRAs at inference time, rather than a single fixed module. This achieves state-of-the-art performance and significantly better generalization for image manipulation.
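The core idea of composing a weight basis of LoRAs can be sketched as a weighted sum of low-rank updates added to a frozen base weight. The dimensions, basis matrices, and encoder weights below are illustrative stand-ins, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, n_basis = 8, 2, 4  # feature dim, LoRA rank, basis size (toy values)

# A basis of LoRA modules: each contributes a low-rank update B_i @ A_i.
basis = [(rng.normal(size=(d, r)) * 0.1, rng.normal(size=(r, d)) * 0.1)
         for _ in range(n_basis)]

def compose_delta(weights):
    """Weighted sum of low-rank updates, as selected by some encoder."""
    return sum(w * (B @ A) for w, (B, A) in zip(weights, basis))

W0 = rng.normal(size=(d, d))             # frozen base weight
alphas = np.array([0.7, 0.0, 0.3, 0.0])  # stand-in for encoder output
W = W0 + compose_delta(alphas)           # effective weight at inference
```

With all coefficients at zero the base model is recovered unchanged, which is what lets a single frozen backbone serve many compositions.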

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15727
• PDF: https://arxiv.org/pdf/2602.15727
• Project Page: https://research.nvidia.com/labs/par/lorweb/
• Github: https://github.com/NVlabs/LoRWeB

Datasets citing this paper:
https://huggingface.co/datasets/hilamanor/LoRWeB_evalset

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LoRA #VisualAnalogies #DeepLearning #AI #ComputerVision
Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

📝 Summary:
This paper presents a conditional binary segmentation framework for robust cross-view object correspondence. It uses cycle-consistency training to create view-invariant representations without ground-truth annotations. This approach achieves state-of-the-art performance on relevant benchmarks.
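The cycle-consistency training signal can be illustrated with a toy round trip: warp a mask from view A to view B and back, then penalize disagreement with the original, requiring no annotations in view B. The permutation-based "warp" below is a simplifying stand-in for a learned correspondence:

```python
import numpy as np

def warp(mask, corr):
    """Transfer a flat binary mask between views via a pixel correspondence."""
    return mask[corr]

def cycle_loss(mask_a, a_to_b, b_to_a):
    """Penalize disagreement after the A -> B -> A round trip."""
    round_trip = warp(warp(mask_a, a_to_b), b_to_a)
    return np.mean(np.abs(mask_a - round_trip))

n = 16
rng = np.random.default_rng(1)
a_to_b = rng.permutation(n)
b_to_a = np.argsort(a_to_b)   # exact inverse: a perfect correspondence
mask = (rng.random(n) > 0.5).astype(float)
# A perfect cycle gives zero loss; errors in either direction raise it.
```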

🔹 Publication Date: Published on Feb 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18996
• PDF: https://arxiv.org/pdf/2602.18996
• Github: https://github.com/shannany0606/CCMP

==================================


#ComputerVision #MachineLearning #ObjectCorrespondence #ImageSegmentation #SelfSupervisedLearning
VLANeXt: Recipes for Building Strong VLA Models

📝 Summary:
This paper systematically analyzes Vision-Language-Action (VLA) models through a unified framework, distilling 12 key design principles. The resulting VLANeXt model achieves superior performance on benchmarks and strong real-world generalization.

🔹 Publication Date: Published on Feb 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18532
• PDF: https://arxiv.org/pdf/2602.18532
• Project Page: https://dravenalg.github.io/VLANeXt/
• Github: https://github.com/DravenALG/awesome-vla

🔹 Models citing this paper:
https://huggingface.co/DravenALG/VLANeXt

==================================


#VLANeXt #VLAModels #ComputerVision #Robotics #AIResearch
ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

📝 Summary:
ImplicitRDP is an end-to-end visual-force diffusion policy that integrates asynchronous vision and force sensing using structural slow-fast learning and virtual-target regularization. It improves reactivity and success in contact-rich manipulation tasks.
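The asynchronous slow-fast idea can be sketched by pairing each high-rate force sample with the most recent low-rate vision feature, so the fast control loop never waits on the slow perception stream. The rates, shapes, and fusion-by-concatenation below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def fuse(vision_feats, force_stream, v_hz=10, f_hz=100):
    """Pair each high-rate force sample with the latest low-rate vision feature."""
    ratio = f_hz // v_hz
    out = []
    for t, f in enumerate(force_stream):
        v = vision_feats[min(t // ratio, len(vision_feats) - 1)]
        out.append(np.concatenate([v, f]))
    return np.stack(out)

vision = np.zeros((2, 4))   # 2 frames at 10 Hz (stand-in features)
force = np.ones((20, 3))    # 20 samples at 100 Hz
fused = fuse(vision, force) # shape (20, 7): fast loop, slow visual context
```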

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10946
• PDF: https://arxiv.org/pdf/2512.10946
• Project Page: https://implicit-rdp.github.io
• Github: https://github.com/Chen-Wendi/ImplicitRDP

🔹 Models citing this paper:
https://huggingface.co/WendiChen/ImplicitRDP_model

Datasets citing this paper:
https://huggingface.co/datasets/WendiChen/ImplicitRDP_dataset

==================================


#Robotics #DiffusionPolicy #MachineLearning #Manipulation #ComputerVision
LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency

📝 Summary:
LaS-Comp is a zero-shot 3D shape completion method that leverages 3D foundation models. It uses a two-stage approach for faithful reconstruction and seamless boundary refinement. This training-free framework outperforms prior state-of-the-art methods.

🔹 Publication Date: Published on Feb 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18735
• PDF: https://arxiv.org/pdf/2602.18735
• Github: https://github.com/DavidYan2001/LaS-Comp

==================================


#3DCompletion #ZeroShotLearning #FoundationModels #ComputerVision #AI
Communication-Inspired Tokenization for Structured Image Representations

📝 Summary:
COMiT introduces a framework for learning structured, object-centric visual tokens through iterative encoding and flow-matching decoding. This single-transformer approach improves compositional generalization and relational reasoning by creating interpretable token structures.

🔹 Publication Date: Published on Feb 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.20731
• PDF: https://arxiv.org/pdf/2602.20731
• Project Page: https://araachie.github.io/comit/
• Github: https://github.com/araachie/comit

🔹 Models citing this paper:
https://huggingface.co/cvg-unibe/comit-xl
https://huggingface.co/cvg-unibe/comit-l
https://huggingface.co/cvg-unibe/comit-b

==================================


#ComputerVision #Transformers #ImageRecognition #RepresentationLearning #AIResearch
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

📝 Summary:
PhysicEdit addresses physically implausible image editing by modeling edits as predictive physical state transitions. It uses a dual-thinking diffusion framework guided by a vision-language model, greatly enhancing physical realism.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21778
• PDF: https://arxiv.org/pdf/2602.21778
• Project Page: https://liangbingzhao.github.io/statics2dynamics/
• Github: https://github.com/liangbingzhao/PhysicEdit

Datasets citing this paper:
https://huggingface.co/datasets/metazlb/PhysicTran38K

==================================


#ImageEditing #DiffusionModels #ComputerVision #PhysicsAI #AIResearch
EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

📝 Summary:
EmbodMocap is a dual-iPhone system for in-the-wild 4D human-scene reconstruction. It unifies human and scene data in a metric world frame, improving accuracy. This supports embodied AI tasks like animation and robot control.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23205
• PDF: https://arxiv.org/pdf/2602.23205
• Project Page: https://wenjiawang0312.github.io/projects/embodmocap/
• Github: https://github.com/WenjiaWang0312/EmbodMocap

==================================


#EmbodiedAI #4DReconstruction #ComputerVision #Robotics #Animation
Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

📝 Summary:
MIGM-Shortcut accelerates masked image generation by learning a lightweight model to predict feature evolution velocity from previous features and sampled tokens. This achieves over 4x speedup with maintained quality on state-of-the-art models.
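The shortcut idea, replacing several sampler steps with one jump along a predicted feature velocity, can be sketched with a toy Euler update. The linear-decay "velocity model" below is a stand-in; the paper learns this predictor from previous features and sampled tokens:

```python
import numpy as np

def shortcut_step(feat, velocity_fn, k):
    """Jump k sampler steps at once using a predicted feature velocity."""
    return feat + k * velocity_fn(feat)

# Stand-in "velocity model": toy dynamics that decay features toward zero.
v = lambda f: -0.1 * f

f0 = np.ones(4)
f_jump = shortcut_step(f0, v, k=4)  # one call replaces four unit steps
```

The speedup comes from evaluating the lightweight velocity model once per jump instead of the full generator at every step; fidelity then hinges on how well the velocity predictor tracks the true feature evolution.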

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23996
• PDF: https://arxiv.org/pdf/2602.23996
• Github: https://github.com/Kaiwen-Zhu/MIGM-Shortcut

==================================


#ImageGeneration #DeepLearning #GenerativeAI #ComputerVision #AI
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

📝 Summary:
SenCache accelerates diffusion model inference by dynamically selecting cache timesteps based on model output sensitivity to input perturbations. This principled framework improves visual quality over existing heuristic methods within similar computational budgets.
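The sensitivity-aware selection can be sketched with finite differences: measure how much a model's output moves under a small input perturbation at each timestep, then reuse cached outputs where sensitivity is lowest. The toy model and scoring rule below are illustrative assumptions, not SenCache's exact criterion:

```python
import numpy as np

def sensitivity(model, x, eps=1e-3, seed=0):
    """Output change under a small input perturbation (finite differences)."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=x.shape)
    d *= eps / np.linalg.norm(d)
    return np.linalg.norm(model(x + d) - model(x)) / eps

def select_cache_steps(model, states, budget):
    """Reuse cached outputs at the timesteps where the model is least sensitive."""
    scores = [sensitivity(model, s) for s in states]
    order = np.argsort(scores)  # lowest sensitivity first
    return sorted(int(i) for i in order[:budget])

model = lambda x: np.tanh(3.0 * x)  # toy denoiser stand-in
states = [np.full(4, c) for c in (0.0, 1.0, 2.0, 3.0)]
cached = select_cache_steps(model, states, budget=2)
```

For this saturating toy model the later (larger-magnitude) states are nearly flat, so they are the ones selected for caching; heuristic schedules would pick timesteps without consulting the model at all.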

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24208
• PDF: https://arxiv.org/pdf/2602.24208
• Github: https://github.com/vita-epfl/SenCache

==================================


#DiffusionModels #AI #MachineLearning #InferenceAcceleration #ComputerVision
Mode Seeking meets Mean Seeking for Fast Long Video Generation

📝 Summary:
This paper introduces a Decoupled Diffusion Transformer combining mode seeking and mean seeking for efficient long video generation. It leverages global flow matching for narrative coherence and local distribution matching against a short-video teacher for realism, effectively bridging the fidelity gap.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24289
• PDF: https://arxiv.org/pdf/2602.24289
• Project Page: https://primecai.github.io/mmm/

==================================


#VideoGeneration #DiffusionModels #AIResearch #MachineLearning #ComputerVision
How to Take a Memorable Picture? Empowering Users with Actionable Feedback

📝 Summary:
This paper introduces Memorability Feedback (MemFeed), a new task providing actionable natural language guidance to improve photo memorability. Their method, MemCoach, uses MLLMs and a teacher-student strategy, demonstrating that memorability can be taught and instructed.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21877
• PDF: https://arxiv.org/pdf/2602.21877
• Project Page: https://laitifranz.github.io/MemCoach/

Datasets citing this paper:
https://huggingface.co/datasets/laitifranz/MemBench-InternVL3.5-Eval
https://huggingface.co/datasets/laitifranz/MemBench

==================================


#PhotoMemorability #MLLMs #ComputerVision #AIResearch #HumanComputerInteraction
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

📝 Summary:
WorldStereo integrates camera-guided video generation and 3D reconstruction using geometric memory modules. These provide camera control and structural priors for multi-view consistent videos, enabling high-quality 3D scene reconstruction.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02049
• PDF: https://arxiv.org/pdf/2603.02049
• Project Page: https://3d.hunyuan.tencent.com/sceneTo3D
• Github: https://github.com/FuchengSu/WorldStereo

==================================


#VideoGeneration #3DReconstruction #ComputerVision #DeepLearning #NeuralRendering
Monocular Mesh Recovery and Body Measurement of Female Saanen Goats

📝 Summary:
This paper introduces a novel 3D body measurement system for Saanen goats. It uses a new parametric shape model and a multi-view RGBD dataset to enable accurate single-view 3D reconstruction and automated measurement of key body dimensions, improving precision livestock farming.

🔹 Publication Date: Published on Feb 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.19896
• PDF: https://arxiv.org/pdf/2602.19896
• Github: https://github.com/bojin-nwafu/Female-Saanen-Goats

==================================


#3DReconstruction #ComputerVision #PrecisionLivestock #AnimalScience #AgriTech
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos

📝 Summary:
A large video scene graph dataset, SVG2, and a new model, TRaSER, are introduced. TRaSER generates spatio-temporal scene graphs, significantly improving relation, object, and attribute prediction, and boosting video question answering accuracy.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23543
• PDF: https://arxiv.org/pdf/2602.23543
• Project Page: https://uwgzq.github.io/papers/SVG2/

🔹 Models citing this paper:
https://huggingface.co/UWGZQ/TRASER

Datasets citing this paper:
https://huggingface.co/datasets/UWGZQ/Synthetic_Visual_Genome2

==================================


#VideoSceneGraphs #SpatioTemporal #ComputerVision #VideoQA #DeepLearning
EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

📝 Summary:
EmbodiedSplat provides real-time 3D scene understanding, combining online 3D Gaussian Splatting with CLIP embeddings from streaming images. It simultaneously reconstructs and semantically comprehends 3D scenes using a novel sparse coefficients field and CLIP global codebook for efficiency and generalization.
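The memory win of a global codebook plus a sparse coefficients field can be sketched as follows: each Gaussian stores a handful of (index, coefficient) pairs instead of a full embedding, and its dense semantic feature is reconstructed on demand. The sizes and random codes below are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n_codes, feat_dim, n_gauss, k = 64, 32, 1000, 4

codebook = rng.normal(size=(n_codes, feat_dim))  # global CLIP-like codes

# Each Gaussian stores only k sparse (index, coefficient) pairs...
idx = rng.integers(0, n_codes, size=(n_gauss, k))
coef = rng.random(size=(n_gauss, k))
coef /= coef.sum(axis=1, keepdims=True)          # convex combination

# ...and its dense semantic feature is reconstructed on demand.
features = np.einsum('gk,gkd->gd', coef, codebook[idx])
```

Per Gaussian this stores 2k numbers instead of feat_dim, here 8 versus 32, and the gap widens sharply at real CLIP dimensions.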

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04254
• PDF: https://arxiv.org/pdf/2603.04254
• Project Page: https://0nandon.github.io/EmbodiedSplat/
• Github: https://github.com/0nandon/EmbodiedSplat

==================================


#3DSceneUnderstanding #3DGaussianSplatting #ComputerVision #AI #NeuralRendering
GroupEnsemble: Efficient Uncertainty Estimation for DETR-based Object Detection

📝 Summary:
DETR models lack spatial uncertainty and current estimation methods are too costly. GroupEnsemble efficiently estimates uncertainty by using independent query groups in a single forward pass with an attention mask. This outperforms Deep Ensembles at a fraction of the cost.
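The single-forward-pass trick can be sketched with a block-diagonal attention mask that keeps query groups independent, after which per-coordinate variance across the groups' box predictions serves as spatial uncertainty. The group sizes and box values below are illustrative, not from the paper:

```python
import numpy as np

def group_mask(n_queries, n_groups):
    """Block-diagonal attention mask: queries attend only within their group."""
    g = n_queries // n_groups
    mask = np.zeros((n_queries, n_queries), dtype=bool)
    for i in range(n_groups):
        mask[i * g:(i + 1) * g, i * g:(i + 1) * g] = True
    return mask

def group_uncertainty(boxes):
    """Spatial uncertainty = per-coordinate variance across group predictions."""
    return boxes.var(axis=0)

m = group_mask(8, 4)  # 4 independent groups of 2 queries each
preds = np.array([[10., 10., 50., 50.],   # the same object's box as
                  [11.,  9., 49., 51.],   # predicted by three different
                  [10., 11., 51., 49.]])  # query groups (toy values)
u = group_uncertainty(preds)
```

A Deep Ensemble would need one forward pass per member; here the mask makes the groups behave like ensemble members inside a single pass.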

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01847
• PDF: https://arxiv.org/pdf/2603.01847

==================================


#ObjectDetection #UncertaintyEstimation #DETR #ComputerVision #MachineLearning
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

📝 Summary:
This paper introduces InfinityStory, a novel framework, dataset, and model for long-form video generation. It tackles challenges in background consistency and seamless multi-subject transitions, achieving high consistency and smoother transitions on VBench.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03646
• PDF: https://arxiv.org/pdf/2603.03646

==================================


#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ComputerVision
Specificity-aware reinforcement learning for fine-grained open-world classification

📝 Summary:
A novel RL framework, SpeciaRL, improves large multimodal models for open-world fine-grained classification. It enhances prediction specificity while maintaining correctness using a dynamic verifier-based reward. Experiments show SpeciaRL achieves the best trade-off.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03197
• PDF: https://arxiv.org/pdf/2603.03197

==================================


#ReinforcementLearning #MachineLearning #ComputerVision #AI #MultimodalAI
HDINO: A Concise and Efficient Open-Vocabulary Detector

📝 Summary:
HDINO is an efficient open-vocabulary detector using a two-stage training strategy. It employs One-to-Many Semantic Alignment and lightweight feature fusion, avoiding manual data curation and complex feature extraction. HDINO achieves superior performance on COCO with less training data.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02924
• PDF: https://arxiv.org/pdf/2603.02924
• Github: https://github.com/HaoZ416/HDINO

==================================


#ObjectDetection #ComputerVision #OpenVocabulary #DeepLearning #AIResearch
STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

📝 Summary:
STMI is a novel multi-modal ReID framework that improves object re-identification. It uses segmentation-guided modulation for foreground enhancement, token reallocation for compact features, and cross-modal hypergraph interaction to capture high-order semantic relationships.

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00695
• PDF: https://arxiv.org/pdf/2603.00695

==================================


#ObjectReID #ComputerVision #DeepLearning #MultiModalAI #STMI