ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Specificity-aware reinforcement learning for fine-grained open-world classification

📝 Summary:
SpeciaRL, a novel reinforcement-learning framework, improves large multimodal models for open-world fine-grained classification. It enhances prediction specificity while maintaining correctness using a dynamic verifier-based reward. Experiments show SpeciaRL achieves the best trade-off between specificity and accuracy.
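As a rough intuition for a verifier-based, specificity-aware reward, here is a toy sketch: a prediction anywhere on the correct branch of a label hierarchy earns partial credit scaled by how specific it is, while a wrong branch is penalized. The hierarchy, reward shape, and penalty value are illustrative assumptions, not the paper's exact formulation.

```python
# Toy label hierarchy: child -> parent.
HIERARCHY = {"golden retriever": "dog", "dog": "animal", "tabby": "cat", "cat": "animal"}

def ancestors(label):
    """Return the label plus all of its ancestors in the toy hierarchy."""
    chain = [label]
    while label in HIERARCHY:
        label = HIERARCHY[label]
        chain.append(label)
    return chain

def depth(label):
    """Specificity proxy: distance from the hierarchy root."""
    return len(ancestors(label)) - 1

def specificity_reward(prediction, ground_truth, penalty=-1.0):
    """Verifier-style reward: a correctness gate times a specificity bonus."""
    if prediction not in ancestors(ground_truth):
        return penalty  # wrong branch of the hierarchy
    # Correct: scale reward by how specific the prediction is.
    return depth(prediction) / max(depth(ground_truth), 1)
```

Under this sketch, answering "dog" for a golden retriever is rewarded, but less than the fully specific answer, which is the trade-off the verifier-based reward is meant to balance.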

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03197
• PDF: https://arxiv.org/pdf/2603.03197

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ReinforcementLearning #MachineLearning #ComputerVision #AI #MultimodalAI
HDINO: A Concise and Efficient Open-Vocabulary Detector

📝 Summary:
HDINO is an efficient open-vocabulary detector using a two-stage training strategy. It employs One-to-Many Semantic Alignment and lightweight feature fusion, avoiding manual data curation and complex feature extraction. HDINO achieves superior performance on COCO with less training data.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02924
• PDF: https://arxiv.org/pdf/2603.02924
• Github: https://github.com/HaoZ416/HDINO

==================================


#ObjectDetection #ComputerVision #OpenVocabulary #DeepLearning #AIResearch
STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

📝 Summary:
STMI is a novel multi-modal ReID framework that improves object re-identification. It uses segmentation-guided modulation for foreground enhancement, token reallocation for compact features, and cross-modal hypergraph interaction to capture high-order semantic relationships.

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00695
• PDF: https://arxiv.org/pdf/2603.00695

==================================


#ObjectReID #ComputerVision #DeepLearning #MultiModalAI #STMI
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

📝 Summary:
Fast-FoundationStereo achieves real-time zero-shot stereo matching, bridging the gap between slow robust models and fast specialized ones. It employs distillation, architecture search, and pruning, running over 10x faster than prior foundation models at similar accuracy. This sets a new state-of-the-art.
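The distillation component can be pictured with a minimal numpy sketch: a fast student is trained to reproduce a slow teacher's disparity map on valid pixels. The arrays, mask, and L1 loss choice here are illustrative assumptions, not the paper's exact training objective.

```python
import numpy as np

def distillation_loss(student_disp, teacher_disp, valid_mask):
    """Mean absolute disparity error on pixels where the teacher is trusted."""
    diff = np.abs(student_disp - teacher_disp)
    return float((diff * valid_mask).sum() / max(valid_mask.sum(), 1))
```

Minimizing this loss over many stereo pairs lets the compact student inherit the teacher's zero-shot robustness without its inference cost.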

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11130
• PDF: https://arxiv.org/pdf/2512.11130
• Project Page: https://nvlabs.github.io/Fast-FoundationStereo/
• Github: https://github.com/NVlabs/Fast-FoundationStereo

==================================


#StereoMatching #ComputerVision #RealTimeAI #ZeroShotLearning #DeepLearning
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

📝 Summary:
PixARMesh reconstructs complete 3D indoor scene meshes from a single image. It uses a unified model with cross-attention and autoregressive generation to directly predict layout and geometry, producing high-quality, lightweight meshes.
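The autoregressive part of the pipeline can be sketched as a plain decoding loop: mesh tokens are emitted one at a time, each conditioned on the image embedding and the tokens so far. The `next_token_fn` stand-in and the token vocabulary are invented for illustration; the paper's model is a cross-attention transformer.

```python
def decode_mesh_tokens(next_token_fn, image_embedding, max_tokens=16, eos=-1):
    """Greedy autoregressive decoding: sample t_k given (image, t_<k)."""
    tokens = []
    while len(tokens) < max_tokens:
        tok = next_token_fn(image_embedding, tokens)  # p(t_k | image, t_<k)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens
```

Any callable with that signature works as a stand-in model, which makes the conditioning structure (image plus prefix, nothing else) explicit.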

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05888
• PDF: https://arxiv.org/pdf/2603.05888
• Project Page: https://mlpc-ucsd.github.io/PixARMesh/
• Github: https://github.com/mlpc-ucsd/PixARMesh

🔹 Models citing this paper:
https://huggingface.co/zx1239856/PixARMesh-EdgeRunner
https://huggingface.co/zx1239856/PixARMesh-BPT

Datasets citing this paper:
https://huggingface.co/datasets/zx1239856/3d-front-ar-packed
https://huggingface.co/datasets/zx1239856/PixARMesh-eval-data

==================================


#3DReconstruction #ComputerVision #DeepLearning #SingleView3D #MeshGeneration
Layer by layer, module by module: Choose both for optimal OOD probing of ViT

📝 Summary:
Intermediate ViT layers provide better representations for OOD probing than the final layer; the performance degradation in deeper layers is caused by distribution shift. The optimal probe location depends on shift magnitude: FFN activations under strong shift, MHA outputs under weak shift.
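Layer-wise probing itself is simple to sketch: fit a linear probe on features taken from one intermediate block and compare layers by probe accuracy. Feature extraction is mocked here with plain arrays; in practice you would hook the ViT's MHA output or FFN activation at the chosen depth. The closed-form least-squares fit is a simplification of a trained linear classifier.

```python
import numpy as np

def fit_linear_probe(features, labels):
    """Closed-form least-squares linear probe (features: N x D)."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    W, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return W

def probe_accuracy(W, features, labels):
    """Threshold the probe's output at 0.5 and score against binary labels."""
    X = np.hstack([features, np.ones((len(features), 1))])
    preds = (X @ W > 0.5).astype(float)
    return float((preds == labels).mean())
```

Running this per layer, on in-distribution versus shifted data, is the kind of sweep that surfaces which depth and which module to probe.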

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05280
• PDF: https://arxiv.org/pdf/2603.05280
• Github: https://github.com/ambroiseodt/vit-probing

==================================


#ViT #OOD #DeepLearning #RepresentationLearning #ComputerVision
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

📝 Summary:
EffectMaker is a unified framework for reference-based VFX customization. It uses a multimodal language model and diffusion transformer for semantic-visual guidance, generating high-quality effects consistently without per-effect fine-tuning. This is supported by a large synthetic dataset.

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.06014
• PDF: https://arxiv.org/pdf/2603.06014
• Project Page: https://effectmaker.github.io/

==================================


#VFX #GenerativeAI #DiffusionModels #MultimodalAI #ComputerVision
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

📝 Summary:
CoCo is a code-driven framework for text-to-image generation, using executable code for precise spatial layout and structured image creation. It significantly outperforms natural language CoT methods, enabling more controllable and accurate image synthesis.
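The "code as chain-of-thought" idea can be made concrete with a toy sketch: instead of free-form text reasoning, the model emits a small layout program that is executed to fix object positions before image synthesis. The placement-op API below is invented for illustration, not CoCo's actual program format.

```python
def run_layout_program(program):
    """Execute a layout program: a list of (name, x, y, w, h) placement ops."""
    canvas = {}
    for name, x, y, w, h in program:
        canvas[name] = {"bbox": (x, y, x + w, y + h)}
    return canvas

# A model-emitted "CoT program" for the prompt "a cat to the left of a dog":
program = [("cat", 0.05, 0.4, 0.3, 0.3), ("dog", 0.6, 0.4, 0.3, 0.3)]
layout = run_layout_program(program)
```

Because the program is executable, spatial constraints like "left of" become checkable properties of the resulting boxes rather than hopes about the generator's behavior.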

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08652
• PDF: https://arxiv.org/pdf/2603.08652

==================================


#TextToImage #GenerativeAI #AIResearch #CodeDrivenAI #ComputerVision
Concept-Guided Fine-Tuning: Steering ViTs away from Spurious Correlations to Improve Robustness

📝 Summary:
A novel fine-tuning method improves Vision Transformer robustness to distribution shifts. It aligns ViT attention with AI-generated concept masks, shifting focus from spurious correlations to semantic features. This boosts out-of-distribution performance and model interpretability.
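A minimal sketch of the attention-alignment idea: penalize attention mass that falls outside an AI-generated concept mask, nudging the model toward semantic regions. The loss form below is an assumption for illustration, not the paper's exact objective.

```python
import numpy as np

def attention_alignment_loss(attn, concept_mask):
    """attn: per-token attention weights (sums to 1); mask: 1 on concept tokens."""
    attn = attn / attn.sum()                  # normalize defensively
    on_concept = (attn * concept_mask).sum()  # attention mass inside the mask
    return float(1.0 - on_concept)            # 0 when attention is fully aligned
```

Added to the task loss during fine-tuning, a term like this discourages reliance on background tokens, which is where spurious correlations typically live.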

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08309
• PDF: https://arxiv.org/pdf/2603.08309
• Project Page: https://yonisgit.github.io/concept-ft/

==================================


#AI #ComputerVision #VisionTransformers #MLRobustness #ModelInterpretability
TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events

📝 Summary:
TAPFormer is a new transformer framework for robust arbitrary point tracking. It uses Transient Asynchronous Fusion to bridge low-rate frames and high-rate events, and Cross-modal Locally Weighted Fusion for adaptive attention. This method significantly outperforms existing trackers.
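The frame-event rate mismatch that the fusion module addresses can be sketched in one dimension: frame features are interpolated to each event timestamp, then blended with the event features. Timestamps, feature shapes, and the fixed mixing weight are illustrative assumptions; the paper learns the fusion adaptively.

```python
import numpy as np

def fuse_frames_events(frame_t, frame_feats, event_t, event_feats, alpha=0.5):
    """Interpolate low-rate frame features to event times, then blend."""
    interp = np.interp(event_t, frame_t, frame_feats)  # 1-D features for clarity
    return alpha * interp + (1 - alpha) * event_feats
```

The point of the sketch is the asynchrony: events arrive between frames, so the frame stream must be resampled to event time before the two modalities can be combined.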

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04989
• PDF: https://arxiv.org/pdf/2603.04989
• Project Page: https://tapformer.github.io/
• Github: https://github.com/ljx1002/TAPFormer

🔹 Models citing this paper:
https://huggingface.co/ljx1002/tapformer

Datasets citing this paper:
https://huggingface.co/datasets/ljx1002/tapformer

==================================


#PointTracking #Transformers #ComputerVision #EventCameras #DeepLearning
TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery

📝 Summary:
TALON is a test-time adaptation framework for on-the-fly category discovery. It dynamically updates prototypes and encoder parameters, while calibrating logits, to improve novel class recognition and prevent category explosion. This approach significantly outperforms existing methods.
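The prototype-update loop can be sketched directly: each incoming feature is matched to its nearest prototype by cosine similarity and pulls it along via an EMA update, while unmatched features spawn a new prototype (a discovered category). The similarity threshold and momentum values are illustrative, not TALON's calibrated settings.

```python
import numpy as np

def adapt(prototypes, feature, sim_threshold=0.8, momentum=0.9):
    """Assign a feature to the nearest prototype (cosine) or create a new one."""
    feature = feature / np.linalg.norm(feature)
    if prototypes:
        sims = [float(p @ feature) for p in prototypes]
        best = int(np.argmax(sims))
        if sims[best] >= sim_threshold:
            p = momentum * prototypes[best] + (1 - momentum) * feature
            prototypes[best] = p / np.linalg.norm(p)  # EMA update, renormalized
            return prototypes, best
    prototypes.append(feature)  # novel category discovered
    return prototypes, len(prototypes) - 1
```

The threshold is what prevents category explosion in this sketch: too low and everything collapses into one class, too high and every sample spawns a class, which is the calibration problem the paper's logit calibration targets.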

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08075
• PDF: https://arxiv.org/pdf/2603.08075
• Github: https://github.com/ynanwu/TALON

==================================


#MachineLearning #DeepLearning #CategoryDiscovery #TestTimeAdaptation #ComputerVision