ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Specificity-aware reinforcement learning for fine-grained open-world classification

📝 Summary:
SpeciaRL, a novel reinforcement-learning framework, improves large multimodal models for open-world fine-grained classification. It enhances prediction specificity while maintaining correctness using a dynamic verifier-based reward. Experiments show SpeciaRL achieves the best trade-off between specificity and accuracy.
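As a rough intuition for a verifier-based, specificity-aware reward, here is a toy sketch: a prediction anywhere on the correct branch of a label hierarchy earns partial credit scaled by how specific it is, while a wrong branch is penalized. The hierarchy, reward shape, and penalty value are illustrative assumptions, not the paper's exact formulation.

```python
# Toy label hierarchy: child -> parent.
HIERARCHY = {"golden retriever": "dog", "dog": "animal", "tabby": "cat", "cat": "animal"}

def ancestors(label):
    """Return the label plus all of its ancestors in the toy hierarchy."""
    chain = [label]
    while label in HIERARCHY:
        label = HIERARCHY[label]
        chain.append(label)
    return chain

def depth(label):
    """Specificity proxy: distance from the hierarchy root."""
    return len(ancestors(label)) - 1

def specificity_reward(prediction, ground_truth, penalty=-1.0):
    """Verifier-style reward: a correctness gate times a specificity bonus."""
    if prediction not in ancestors(ground_truth):
        return penalty  # wrong branch of the hierarchy
    # Correct: scale reward by how specific the prediction is.
    return depth(prediction) / max(depth(ground_truth), 1)
```

Under this sketch, answering "dog" for a golden retriever is rewarded, but less than the fully specific answer, which is the trade-off the verifier-based reward is meant to balance.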

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03197
• PDF: https://arxiv.org/pdf/2603.03197

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ReinforcementLearning #MachineLearning #ComputerVision #AI #MultimodalAI
HDINO: A Concise and Efficient Open-Vocabulary Detector

📝 Summary:
HDINO is an efficient open-vocabulary detector using a two-stage training strategy. It employs One-to-Many Semantic Alignment and lightweight feature fusion, avoiding manual data curation and complex feature extraction. HDINO achieves superior performance on COCO with less training data.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02924
• PDF: https://arxiv.org/pdf/2603.02924
• Github: https://github.com/HaoZ416/HDINO

==================================


#ObjectDetection #ComputerVision #OpenVocabulary #DeepLearning #AIResearch
STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

📝 Summary:
STMI is a novel multi-modal ReID framework that improves object re-identification. It uses segmentation-guided modulation for foreground enhancement, token reallocation for compact features, and cross-modal hypergraph interaction to capture high-order semantic relationships.

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00695
• PDF: https://arxiv.org/pdf/2603.00695

==================================


#ObjectReID #ComputerVision #DeepLearning #MultiModalAI #STMI
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

📝 Summary:
Fast-FoundationStereo achieves real-time zero-shot stereo matching, bridging the gap between slow robust models and fast specialized ones. It employs distillation, architecture search, and pruning, running over 10x faster than prior foundation models at similar accuracy. This sets a new state-of-the-art.
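The distillation component can be pictured with a minimal numpy sketch: a fast student is trained to reproduce a slow teacher's disparity map on valid pixels. The arrays, mask, and L1 loss choice here are illustrative assumptions, not the paper's exact training objective.

```python
import numpy as np

def distillation_loss(student_disp, teacher_disp, valid_mask):
    """Mean absolute disparity error on pixels where the teacher is trusted."""
    diff = np.abs(student_disp - teacher_disp)
    return float((diff * valid_mask).sum() / max(valid_mask.sum(), 1))
```

Minimizing this loss over many stereo pairs lets the compact student inherit the teacher's zero-shot robustness without its inference cost.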

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11130
• PDF: https://arxiv.org/pdf/2512.11130
• Project Page: https://nvlabs.github.io/Fast-FoundationStereo/
• Github: https://github.com/NVlabs/Fast-FoundationStereo

==================================


#StereoMatching #ComputerVision #RealTimeAI #ZeroShotLearning #DeepLearning
PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

📝 Summary:
PixARMesh reconstructs complete 3D indoor scene meshes from a single image. It uses a unified model with cross-attention and autoregressive generation to directly predict layout and geometry, producing high-quality, lightweight meshes.
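The autoregressive part of the pipeline can be sketched as a plain decoding loop: mesh tokens are emitted one at a time, each conditioned on the image embedding and the tokens so far. The `next_token_fn` stand-in and the token vocabulary are invented for illustration; the paper's model is a cross-attention transformer.

```python
def decode_mesh_tokens(next_token_fn, image_embedding, max_tokens=16, eos=-1):
    """Greedy autoregressive decoding: sample t_k given (image, t_<k)."""
    tokens = []
    while len(tokens) < max_tokens:
        tok = next_token_fn(image_embedding, tokens)  # p(t_k | image, t_<k)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens
```

Any callable with that signature works as a stand-in model, which makes the conditioning structure (image plus prefix, nothing else) explicit.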

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05888
• PDF: https://arxiv.org/pdf/2603.05888
• Project Page: https://mlpc-ucsd.github.io/PixARMesh/
• Github: https://github.com/mlpc-ucsd/PixARMesh

🔹 Models citing this paper:
https://huggingface.co/zx1239856/PixARMesh-EdgeRunner
https://huggingface.co/zx1239856/PixARMesh-BPT

Datasets citing this paper:
https://huggingface.co/datasets/zx1239856/3d-front-ar-packed
https://huggingface.co/datasets/zx1239856/PixARMesh-eval-data

==================================


#3DReconstruction #ComputerVision #DeepLearning #SingleView3D #MeshGeneration
Layer by layer, module by module: Choose both for optimal OOD probing of ViT

📝 Summary:
Intermediate ViT layers provide better representations for OOD probing than the final layer; the performance degradation in deeper layers is caused by distribution shift. The optimal probe location depends on shift magnitude: FFN activations under strong shift, MHA outputs under weak shift.
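Layer-wise probing itself is simple to sketch: fit a linear probe on features taken from one intermediate block and compare layers by probe accuracy. Feature extraction is mocked here with plain arrays; in practice you would hook the ViT's MHA output or FFN activation at the chosen depth. The closed-form least-squares fit is a simplification of a trained linear classifier.

```python
import numpy as np

def fit_linear_probe(features, labels):
    """Closed-form least-squares linear probe (features: N x D)."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    W, *_ = np.linalg.lstsq(X, labels, rcond=None)
    return W

def probe_accuracy(W, features, labels):
    """Threshold the probe's output at 0.5 and score against binary labels."""
    X = np.hstack([features, np.ones((len(features), 1))])
    preds = (X @ W > 0.5).astype(float)
    return float((preds == labels).mean())
```

Running this per layer, on in-distribution versus shifted data, is the kind of sweep that surfaces which depth and which module to probe.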

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05280
• PDF: https://arxiv.org/pdf/2603.05280
• Github: https://github.com/ambroiseodt/vit-probing

==================================


#ViT #OOD #DeepLearning #RepresentationLearning #ComputerVision
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

📝 Summary:
EffectMaker is a unified framework for reference-based VFX customization. It uses a multimodal language model and diffusion transformer for semantic-visual guidance, generating high-quality effects consistently without per-effect fine-tuning. This is supported by a large synthetic dataset.

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.06014
• PDF: https://arxiv.org/pdf/2603.06014
• Project Page: https://effectmaker.github.io/

==================================


#VFX #GenerativeAI #DiffusionModels #MultimodalAI #ComputerVision
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

📝 Summary:
CoCo is a code-driven framework for text-to-image generation, using executable code for precise spatial layout and structured image creation. It significantly outperforms natural language CoT methods, enabling more controllable and accurate image synthesis.
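The "code as chain-of-thought" idea can be made concrete with a toy sketch: instead of free-form text reasoning, the model emits a small layout program that is executed to fix object positions before image synthesis. The placement-op API below is invented for illustration, not CoCo's actual program format.

```python
def run_layout_program(program):
    """Execute a layout program: a list of (name, x, y, w, h) placement ops."""
    canvas = {}
    for name, x, y, w, h in program:
        canvas[name] = {"bbox": (x, y, x + w, y + h)}
    return canvas

# A model-emitted "CoT program" for the prompt "a cat to the left of a dog":
program = [("cat", 0.05, 0.4, 0.3, 0.3), ("dog", 0.6, 0.4, 0.3, 0.3)]
layout = run_layout_program(program)
```

Because the program is executable, spatial constraints like "left of" become checkable properties of the resulting boxes rather than hopes about the generator's behavior.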

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08652
• PDF: https://arxiv.org/pdf/2603.08652

==================================


#TextToImage #GenerativeAI #AIResearch #CodeDrivenAI #ComputerVision
Concept-Guided Fine-Tuning: Steering ViTs away from Spurious Correlations to Improve Robustness

📝 Summary:
A novel fine-tuning method improves Vision Transformer robustness to distribution shifts. It aligns ViT attention with AI-generated concept masks, shifting focus from spurious correlations to semantic features. This boosts out-of-distribution performance and model interpretability.
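A minimal sketch of the attention-alignment idea: penalize attention mass that falls outside an AI-generated concept mask, nudging the model toward semantic regions. The loss form below is an assumption for illustration, not the paper's exact objective.

```python
import numpy as np

def attention_alignment_loss(attn, concept_mask):
    """attn: per-token attention weights (sums to 1); mask: 1 on concept tokens."""
    attn = attn / attn.sum()                  # normalize defensively
    on_concept = (attn * concept_mask).sum()  # attention mass inside the mask
    return float(1.0 - on_concept)            # 0 when attention is fully aligned
```

Added to the task loss during fine-tuning, a term like this discourages reliance on background tokens, which is where spurious correlations typically live.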

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08309
• PDF: https://arxiv.org/pdf/2603.08309
• Project Page: https://yonisgit.github.io/concept-ft/

==================================


#AI #ComputerVision #VisionTransformers #MLRobustness #ModelInterpretability
TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events

📝 Summary:
TAPFormer is a new transformer framework for robust arbitrary point tracking. It uses Transient Asynchronous Fusion to bridge low-rate frames and high-rate events, and Cross-modal Locally Weighted Fusion for adaptive attention. This method significantly outperforms existing trackers.
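The frame-event rate mismatch that the fusion module addresses can be sketched in one dimension: frame features are interpolated to each event timestamp, then blended with the event features. Timestamps, feature shapes, and the fixed mixing weight are illustrative assumptions; the paper learns the fusion adaptively.

```python
import numpy as np

def fuse_frames_events(frame_t, frame_feats, event_t, event_feats, alpha=0.5):
    """Interpolate low-rate frame features to event times, then blend."""
    interp = np.interp(event_t, frame_t, frame_feats)  # 1-D features for clarity
    return alpha * interp + (1 - alpha) * event_feats
```

The point of the sketch is the asynchrony: events arrive between frames, so the frame stream must be resampled to event time before the two modalities can be combined.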

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04989
• PDF: https://arxiv.org/pdf/2603.04989
• Project Page: https://tapformer.github.io/
• Github: https://github.com/ljx1002/TAPFormer

🔹 Models citing this paper:
https://huggingface.co/ljx1002/tapformer

Datasets citing this paper:
https://huggingface.co/datasets/ljx1002/tapformer

==================================


#PointTracking #Transformers #ComputerVision #EventCameras #DeepLearning
TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery

📝 Summary:
TALON is a test-time adaptation framework for on-the-fly category discovery. It dynamically updates prototypes and encoder parameters, while calibrating logits, to improve novel class recognition and prevent category explosion. This approach significantly outperforms existing methods.
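The prototype-update loop can be sketched directly: each incoming feature is matched to its nearest prototype by cosine similarity and pulls it along via an EMA update, while unmatched features spawn a new prototype (a discovered category). The similarity threshold and momentum values are illustrative, not TALON's calibrated settings.

```python
import numpy as np

def adapt(prototypes, feature, sim_threshold=0.8, momentum=0.9):
    """Assign a feature to the nearest prototype (cosine) or create a new one."""
    feature = feature / np.linalg.norm(feature)
    if prototypes:
        sims = [float(p @ feature) for p in prototypes]
        best = int(np.argmax(sims))
        if sims[best] >= sim_threshold:
            p = momentum * prototypes[best] + (1 - momentum) * feature
            prototypes[best] = p / np.linalg.norm(p)  # EMA update, renormalized
            return prototypes, best
    prototypes.append(feature)  # novel category discovered
    return prototypes, len(prototypes) - 1
```

The threshold is what prevents category explosion in this sketch: too low and everything collapses into one class, too high and every sample spawns a class, which is the calibration problem the paper's logit calibration targets.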

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08075
• PDF: https://arxiv.org/pdf/2603.08075
• Github: https://github.com/ynanwu/TALON

==================================


#MachineLearning #DeepLearning #CategoryDiscovery #TestTimeAdaptation #ComputerVision