ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Concept-Guided Fine-Tuning: Steering ViTs away from Spurious Correlations to Improve Robustness

📝 Summary:
A novel fine-tuning method improves Vision Transformer robustness to distribution shifts. It aligns ViT attention with AI-generated concept masks, shifting focus from spurious correlations to semantic features. This boosts out-of-distribution performance and model interpretability.
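The attention-alignment idea can be sketched as a small auxiliary loss. Everything below is illustrative (the function names, the MSE form, and the weight `lam` are assumptions, not the paper's exact formulation):

```python
def normalize(weights):
    """Normalize non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def alignment_loss(attention, concept_mask):
    """Mean squared error between normalized per-patch attention and a
    normalized concept mask, pulling attention onto semantic regions."""
    a = normalize(attention)
    m = normalize(concept_mask)
    return sum((ai - mi) ** 2 for ai, mi in zip(a, m)) / len(a)

def total_loss(task_loss, attention, concept_mask, lam=0.1):
    # Fine-tuning objective: the usual task loss plus the alignment term.
    return task_loss + lam * alignment_loss(attention, concept_mask)

# Attention over 4 patches; the concept mask marks patches 0 and 1.
attn = [0.4, 0.4, 0.1, 0.1]
mask = [1.0, 1.0, 0.0, 0.0]
print(round(alignment_loss(attn, mask), 4))  # → 0.01
```

Minimizing the alignment term moves attention mass off background patches, which often carry the spurious correlations, and onto the masked concept regions.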

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08309
• PDF: https://arxiv.org/pdf/2603.08309
• Project Page: https://yonisgit.github.io/concept-ft/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #ComputerVision #VisionTransformers #MLRobustness #ModelInterpretability
TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events

📝 Summary:
TAPFormer is a new transformer framework for robust arbitrary point tracking. It uses Transient Asynchronous Fusion to bridge low-rate frames and high-rate events, and Cross-modal Locally Weighted Fusion for adaptive attention. This method significantly outperforms existing trackers.
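One plausible reading of bridging low-rate frames with high-rate events is temporal-distance weighting; the scheme below is an assumption for illustration, not the paper's actual fusion mechanism:

```python
import math

def fuse(samples, query_time, bandwidth=0.05):
    """Fuse (timestamp, feature) pairs from both streams; weights decay
    exponentially with temporal distance to the query timestamp, so
    samples nearest the query dominate the fused feature."""
    weights = [math.exp(-abs(ts - query_time) / bandwidth) for ts, _ in samples]
    z = sum(weights)
    fused = [0.0] * len(samples[0][1])
    for w, (_, feat) in zip(weights, samples):
        for i, f in enumerate(feat):
            fused[i] += (w / z) * f
    return fused

frames = [(0.00, [1.0]), (0.10, [3.0])]    # 10 Hz frame features
events = [(0.049, [2.0]), (0.051, [2.1])]  # high-rate event features
fused = fuse(frames + events, query_time=0.05)
print(round(fused[0], 2))
```

The event samples, nearly coincident with the query time, pull the fused feature toward ~2 even though the surrounding frames sit at 1 and 3.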

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04989
• PDF: https://arxiv.org/pdf/2603.04989
• Project Page: https://tapformer.github.io/
• Github: https://github.com/ljx1002/TAPFormer

🔹 Models citing this paper:
https://huggingface.co/ljx1002/tapformer

🔹 Datasets citing this paper:
https://huggingface.co/datasets/ljx1002/tapformer

==================================

#PointTracking #Transformers #ComputerVision #EventCameras #DeepLearning
TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery

📝 Summary:
TALON is a test-time adaptation framework for on-the-fly category discovery. It dynamically updates prototypes and encoder parameters, while calibrating logits, to improve novel class recognition and prevent category explosion. This approach significantly outperforms existing methods.
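The test-time loop can be sketched as prototype bookkeeping. The class below is a minimal illustration; the similarity threshold, EMA momentum, and gating rule are assumptions standing in for TALON's actual update and logit-calibration mechanics:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

class PrototypeBank:
    def __init__(self, threshold=0.7, momentum=0.9):
        self.prototypes = []        # one vector per discovered category
        self.threshold = threshold  # gate against category explosion
        self.momentum = momentum

    def assign(self, feature):
        """Assign a test feature to a category, spawning a new prototype
        only when no existing one is similar enough."""
        if self.prototypes:
            sims = [cosine(feature, p) for p in self.prototypes]
            best = max(range(len(sims)), key=lambda i: sims[i])
            if sims[best] >= self.threshold:
                # EMA update keeps the matched prototype current.
                m = self.momentum
                self.prototypes[best] = [
                    m * p + (1 - m) * f
                    for p, f in zip(self.prototypes[best], feature)
                ]
                return best
        self.prototypes.append(list(feature))
        return len(self.prototypes) - 1

bank = PrototypeBank()
print(bank.assign([1.0, 0.0]))  # → 0 (first sample founds category 0)
print(bank.assign([0.9, 0.1]))  # → 0 (similar sample reuses it)
print(bank.assign([0.0, 1.0]))  # → 1 (dissimilar sample founds category 1)
```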

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08075
• PDF: https://arxiv.org/pdf/2603.08075
• Github: https://github.com/ynanwu/TALON

==================================

#MachineLearning #DeepLearning #CategoryDiscovery #TestTimeAdaptation #ComputerVision
4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

📝 Summary:
4DEquine is a new framework for 4D equine reconstruction from monocular video. It disentangles motion, modeled with spatio-temporal transformers, from appearance, represented by 3D Gaussian avatars. Trained on synthetic data, it achieves state-of-the-art results on real-world datasets.

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10125
• PDF: https://arxiv.org/pdf/2603.10125
• Project Page: https://luoxue-star.github.io/4DEquine_Project_Page/
• Github: https://github.com/luoxue-star/4DEquine

==================================

#ComputerVision #4DReconstruction #DeepLearning #Equine #AI
A Mixed Diet Makes DINO An Omnivorous Vision Encoder

📝 Summary:
The Omnivorous Vision Encoder learns modality-agnostic features by aligning multi-modal scene inputs and distilling semantics from a frozen teacher model. This resolves poor cross-modal alignment in existing encoders, yielding consistent, powerful embeddings for various modalities.
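The distillation step can be sketched as pushing each modality's embedding toward a frozen teacher's embedding of the same scene. The `1 - cosine` loss below is an illustrative choice, not necessarily the paper's:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def distill_loss(student_by_modality, teacher_embedding):
    """Average (1 - cosine similarity) between each modality's student
    embedding and the frozen teacher's embedding of the same scene,
    driving all modalities into one shared semantic space."""
    losses = [1.0 - cosine(s, teacher_embedding)
              for s in student_by_modality.values()]
    return sum(losses) / len(losses)

teacher = [1.0, 0.0]                                 # frozen, never updated
students = {"rgb": [1.0, 0.0], "depth": [0.0, 1.0]}  # depth still misaligned
print(distill_loss(students, teacher))  # → 0.5
```

Driving this loss to zero is exactly what makes the resulting embeddings consistent across modalities.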

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24181
• PDF: https://arxiv.org/pdf/2602.24181

==================================

#MultimodalAI #ComputerVision #DeepLearning #SelfSupervisedLearning #AIResearch
HyPER-GAN: Hybrid Patch-Based Image-to-Image Translation for Real-Time Photorealism Enhancement

📝 Summary:
HyPER-GAN is a lightweight U-Net based model for real-time photorealism enhancement. Its hybrid training strategy, using real-world patches, improves visual realism, semantic consistency, and inference speed over state-of-the-art methods.

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.10604
• PDF: https://arxiv.org/pdf/2603.10604
• Github: https://github.com/stefanos50/HyPER-GAN

==================================

#GAN #ComputerVision #DeepLearning #ImageProcessing #Photorealism
Visual-ERM: Reward Modeling for Visual Equivalence

📝 Summary:
Visual-ERM is a multimodal generative reward model providing fine-grained visual feedback for vision-to-code tasks. It significantly improves reinforcement learning performance for chart, table, and SVG parsing, demonstrating that fine-grained visual supervision is essential.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13224
• PDF: https://arxiv.org/pdf/2603.13224
• Github: https://github.com/InternLM/Visual-ERM

==================================

#ReinforcementLearning #ComputerVision #GenerativeAI #AI #DataScience
SimRecon: SimReady Compositional Scene Reconstruction from Real Videos

📝 Summary:
SimRecon reconstructs cluttered scenes from real videos using a Perception-Generation-Simulation pipeline. It employs Active Viewpoint Optimization for visual fidelity and a Scene Graph Synthesizer for physical plausibility. This enables superior compositional scene representations for simulation...

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02133
• PDF: https://arxiv.org/pdf/2603.02133
• Project Page: https://xiac20.github.io/SimRecon/
• Github: https://github.com/xiac20/SimRecon

==================================

#SceneReconstruction #ComputerVision #AI #Simulation #3DReconstruction
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

📝 Summary:
Cheers is a unified multimodal model that decouples visual details from semantic representations for efficient joint optimization of understanding and generation. It employs a vision tokenizer, LLM-based Transformer, and cascaded flow matching. Cheers achieves state-of-the-art performance with 4x...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12793
• PDF: https://arxiv.org/pdf/2603.12793
• Project Page: https://huggingface.co/ai9stars/Cheers
• Github: https://github.com/AI9Stars/Cheers

🔹 Models citing this paper:
https://huggingface.co/ai9stars/Cheers

==================================

#MultimodalAI #LLM #ComputerVision #GenerativeAI #AIResearch
Fine-grained Motion Retrieval via Joint-Angle Motion Images and Token-Patch Late Interaction

📝 Summary:
This paper presents a novel text-motion retrieval method. It maps joint-angle motion features into Vision Transformer-compatible pseudo-images and uses an enhanced late interaction mechanism. This achieves superior performance and offers interpretable fine-grained text-motion alignments.
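The late-interaction score is ColBERT-style MaxSim in spirit: each text token picks its best-matching motion patch, and the per-token maxima are summed. This sketch shows the plain mechanism; the paper's enhanced variant may differ:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def late_interaction_score(text_tokens, motion_patches):
    """Sum over text tokens of the max similarity to any motion patch.
    The per-token argmax also yields interpretable token-patch alignments."""
    return sum(max(cosine(t, p) for p in motion_patches)
               for t in text_tokens)

tokens  = [[1.0, 0.0], [0.0, 1.0]]              # two text-token embeddings
patches = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # three pseudo-image patches
print(late_interaction_score(tokens, patches))  # → 2.0
```

Because each token keeps its own best match, the alignment is recoverable per token, which is where the fine-grained interpretability comes from.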

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09930
• PDF: https://arxiv.org/pdf/2603.09930

==================================

#MotionRetrieval #DeepLearning #ComputerVision #AIResearch #NLP
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

📝 Summary:
SNCE is a novel training objective for large-codebook discrete image generators. It supervises models with a soft categorical distribution over neighboring tokens, based on embedding proximity, instead of hard one-hot targets. This approach significantly improves convergence speed and overall gen...
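The soft-target idea is easy to sketch: replace the one-hot target with a softmax over negative embedding distances, so tokens near the true token in codebook space also receive probability mass. The names and temperature softmax below are illustrative:

```python
import math

def soft_targets(codebook, true_index, tau=1.0):
    """Soft categorical target: softmax over negative squared distances
    from each codebook embedding to the true token's embedding."""
    anchor = codebook[true_index]
    d2 = [sum((a - b) ** 2 for a, b in zip(anchor, e)) for e in codebook]
    w = [math.exp(-d / tau) for d in d2]
    z = sum(w)
    return [x / z for x in w]

def soft_cross_entropy(log_probs, targets):
    # Cross-entropy against the soft distribution instead of one-hot.
    return -sum(t * lp for t, lp in zip(targets, log_probs))

# Token 1 sits near token 0 in embedding space, token 2 far away.
codebook = [[0.0], [0.1], [5.0]]
print(soft_targets(codebook, true_index=0))
```

A nearby token gets nearly as much mass as the true token, while a distant one gets essentially none, so the model is no longer penalized at full strength for geometrically plausible mistakes.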

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15150
• PDF: https://arxiv.org/pdf/2603.15150

==================================

#ImageGeneration #DeepLearning #ComputerVision #GeometryAware #AIResearch
Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

📝 Summary:
STALL is a training-free, model-agnostic detector for generated videos. It jointly models spatial and temporal evidence from real-data statistics within a probabilistic framework. STALL consistently outperforms prior image- and video-based baselines, enabling more reliable detection.
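A toy version of the idea: estimate Gaussian statistics of a spatial feature and a temporal-difference feature from real videos only, then score candidates by joint log-likelihood, with no detector training at all. The specific features and the Gaussian model are stand-ins for the paper's actual statistics:

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit(values):
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

class RealVideoModel:
    def __init__(self, spatial_feats, temporal_feats):
        # Statistics come from real videos only; nothing is trained.
        self.spatial = fit(spatial_feats)
        self.temporal = fit(temporal_feats)

    def score(self, spatial, temporal):
        """Joint spatial-temporal log-likelihood under real-data stats;
        generated videos should score low on at least one factor."""
        return (gaussian_logpdf(spatial, *self.spatial) +
                gaussian_logpdf(temporal, *self.temporal))

model = RealVideoModel([0.9, 1.0, 1.1, 1.0], [0.1, 0.2, 0.15, 0.15])
print(model.score(1.0, 0.15) > model.score(3.0, 0.9))  # → True
```

Scoring both factors jointly is the point: a generated video that mimics real spatial statistics can still be caught by implausible temporal dynamics, and vice versa.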

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15026
• PDF: https://arxiv.org/pdf/2603.15026
• Project Page: https://omerbenhayun.github.io/stall-video/
• Github: https://github.com/OmerBenHayun/stall-video

==================================

#Deepfakes #VideoDetection #ComputerVision #AI #DigitalForensics
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

📝 Summary:
GlyphPrinter improves visual text rendering by addressing glyph accuracy. It introduces Region-Grouped DPO (R-GDPO), which learns from region-level preferences in the GlyphCorrector dataset, significantly enhancing precision and outperforming existing methods.
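The region-grouped objective can be sketched as standard DPO applied per glyph region rather than per image; the variable names and the averaging below are assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def region_grouped_dpo(regions, beta=0.1):
    """`regions` holds one (logp_preferred, logp_rejected) pair per glyph
    region; the DPO loss is computed per region and averaged, so each
    correctly rendered region is rewarded independently."""
    losses = [-math.log(sigmoid(beta * (lw - ll))) for lw, ll in regions]
    return sum(losses) / len(losses)

# Region 0: preferred rendering clearly better; region 1: a tie.
loss = region_grouped_dpo([(-1.0, -8.0), (-2.0, -2.0)])
print(round(loss, 4))
```

A tied region contributes exactly log 2, the DPO loss at zero preference margin, so only regions with a real quality gap move the average.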

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15616
• PDF: https://arxiv.org/pdf/2603.15616
• Project Page: https://henghuiding.com/GlyphPrinter/
• Github: https://github.com/FudanCVL/GlyphPrinter

==================================

#GlyphRendering #DeepLearning #ComputerVision #AIResearch #TextRendering
Learning Latent Proxies for Controllable Single-Image Relighting

📝 Summary:
Single-image relighting is challenging due to unobserved geometry and materials. LightCtrl introduces a diffusion model guided by sparse, physically meaningful cues from a latent proxy encoder and lighting-aware masks. This enables photometrically faithful relighting with accurate control, outper...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15555
• PDF: https://arxiv.org/pdf/2603.15555

==================================

#ImageRelighting #DiffusionModels #ComputerVision #DeepLearning #AIResearch
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training

📝 Summary:
IOMM is a data-efficient framework for UMM visual generation. It pre-trains with image-only data then fine-tunes with mixed data, achieving SOTA performance while significantly reducing computational costs.
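The image-only pre-training step reduces to masked modeling over image tokens. The mask ratio and sentinel handling below are illustrative:

```python
import random

MASK = -1  # sentinel id for a masked position (illustrative)

def mask_tokens(tokens, ratio, rng):
    """Replace `ratio` of the token ids with the mask sentinel; return
    the corrupted sequence and the indices the model must reconstruct."""
    n_mask = int(len(tokens) * ratio)
    idx = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    for i in idx:
        corrupted[i] = MASK
    return corrupted, sorted(idx)

rng = random.Random(0)
corrupted, targets = mask_tokens([11, 22, 33, 44, 55, 66, 77, 88], 0.5, rng)
print(corrupted.count(MASK))  # → 4
```

Since the objective needs only the image itself, no paired text is required until the later mixed-data fine-tuning stage, which is where the data efficiency comes from.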

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16139
• PDF: https://arxiv.org/pdf/2603.16139
• Github: https://github.com/LINs-lab/IOMM

==================================

#UMMVisualGeneration #MaskedModeling #EfficientAI #ComputerVision #GenerativeAI
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

📝 Summary:
Waypoint Diffusion Transformers (WiT) address trajectory conflicts in pixel-space flow matching using semantic waypoints from pre-trained vision models. WiT disentangles generation paths into segments, accelerating training convergence. It outperforms pixel-space baselines and speeds up JiT trainin...
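The waypoint idea can be pictured as replacing flow matching's straight noise-to-image path with a piecewise-linear path through semantic anchor points; this toy parameterization (equal time per segment) is an assumption for illustration:

```python
def path_point(points, t):
    """Position at time t in [0, 1] on the piecewise-linear path through
    `points`; each segment gets an equal share of the time axis, so a
    training target within one segment stays inside that segment."""
    n_seg = len(points) - 1
    if t >= 1.0:
        return list(points[-1])
    seg = int(t * n_seg)
    local = t * n_seg - seg
    a, b = points[seg], points[seg + 1]
    return [ai + local * (bi - ai) for ai, bi in zip(a, b)]

# Noise -> semantic waypoint -> image, in 1-D for readability.
path = [[0.0], [2.0], [3.0]]
print(path_point(path, 0.25))  # → [1.0]
print(path_point(path, 0.75))  # → [2.5]
```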

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15132
• PDF: https://arxiv.org/pdf/2603.15132
• Project Page: https://hainuo-wang.github.io/WiT/
• Github: https://github.com/hainuo-wang/WiT

==================================

#DiffusionModels #Transformers #ComputerVision #DeepLearning #AI