ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity

📝 Summary:
UnSAMv2 enables continuous control of segmentation granularity for the SAM model without human annotations. It uses self-supervised learning on unlabeled data to discover mask-granularity pairs and introduces a novel granularity control embedding. UnSAMv2 significantly enhances SAM-2's performance across various segmentation tasks.
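The control embedding is only named above; as a purely hypothetical sketch (the sinusoidal scheme and all names are assumptions, not the authors' design), a continuous granularity scalar could be mapped to a dense conditioning vector for a mask decoder like this:

```python
import numpy as np

def granularity_embedding(g: float, dim: int = 8) -> np.ndarray:
    """Map a continuous granularity scalar g in [0, 1] to a dense
    control embedding via sinusoidal features (hypothetical scheme)."""
    freqs = 2.0 ** np.arange(dim // 2)      # geometric frequency ladder
    angles = g * freqs * np.pi
    return np.concatenate([np.sin(angles), np.cos(angles)])

# Coarse vs. fine granularity settings yield distinct conditioning vectors,
# which a decoder could consume alongside its prompt tokens.
coarse = granularity_embedding(0.1)
fine = granularity_embedding(0.9)
```

Sliding g from 0 to 1 then traces a smooth path in embedding space, which is what makes "any granularity" a continuous dial rather than a discrete switch.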

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13714
• PDF: https://arxiv.org/pdf/2511.13714
• Project Page: https://yujunwei04.github.io/UnSAMv2-Project-Page/
• Github: https://github.com/yujunwei04/UnSAMv2

🔹 Spaces citing this paper:
https://huggingface.co/spaces/yujunwei04/UnSAMv2

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #ComputerVision #SelfSupervisedLearning #ImageSegmentation #DeepLearning
Φeat: Physically-Grounded Feature Representation

📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity like reflectance and mesostructure. It learns robust features invariant to external physical factors such as shape and lighting, promoting physics-aware perception.
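Learning features that are invariant to lighting and shape is commonly driven by a contrastive objective; a generic InfoNCE loss of that kind can be sketched as follows (a minimal numpy illustration, not Φeat's actual loss):

```python
import numpy as np

def info_nce(anchors: np.ndarray, positives: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE loss: row i of `anchors` should match row i of `positives`,
    e.g. two renderings of the same material under different lighting."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / tau                       # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))
# Correctly paired views give a lower loss than mismatched ones.
aligned = info_nce(feats, feats)
shuffled = info_nce(feats, feats[::-1])
```

Minimizing such a loss pulls together embeddings of the same material under different physical conditions, which is the invariance the summary describes.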

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI
EvoVLA: Self-Evolving Vision-Language-Action Model

📝 Summary:
EvoVLA is a self-supervised VLA framework that tackles stage hallucination in long-horizon robotic manipulation. It combines triplet contrastive learning, pose-based exploration, and memory to prevent shortcut behaviors. EvoVLA significantly improves success rates and sample efficiency, and reduces hallucination in simulation and real-world settings.
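The triplet contrastive component can be illustrated with a standard triplet margin loss, where the anchor observation should sit closer to an embedding of the true task stage than to a hallucinated one (a generic sketch, not EvoVLA's exact loss):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Margin loss pushing the anchor closer to the true-stage embedding
    than to a shortcut/hallucinated-stage embedding."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # embedding of the same manipulation stage
negative = np.array([-1.0, 0.5])  # embedding of a wrong (hallucinated) stage
loss = triplet_loss(anchor, positive, negative)  # 0.0: margin already satisfied
```

A zero loss means the representation already separates true from hallucinated stages by the margin; swapping positive and negative yields a positive loss that would drive learning.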

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16166
• PDF: https://arxiv.org/pdf/2511.16166
• Project Page: https://aigeeksgroup.github.io/EvoVLA/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #VisionLanguageAction #SelfSupervisedLearning #AI #DeepLearning
TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition

📝 Summary:
TRivia is a self-supervised fine-tuning method for vision-language models to learn table recognition from unlabeled data. It uses a question-answering reward mechanism to autonomously optimize the model. This open-source solution outperforms state-of-the-art systems on popular benchmarks.
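The question-answering reward is only summarized above; as a rough sketch (the table and probe format here are hypothetical simplifications, not TRivia's actual protocol), a reward could score how many probe questions the predicted table answers correctly:

```python
def qa_reward(pred_table: dict, qa_pairs: dict) -> float:
    """Fraction of (row, col) -> answer probes the predicted table gets right.
    `pred_table` maps (row, col) to predicted cell text; `qa_pairs` maps the
    same keys to reference answers (a hypothetical, simplified format)."""
    correct = sum(pred_table.get(k, "").strip().lower() == v.strip().lower()
                  for k, v in qa_pairs.items())
    return correct / len(qa_pairs)

pred = {(0, 0): "Model", (0, 1): "F1", (1, 0): "TRivia-3B", (1, 1): "0.91"}
probes = {(1, 0): "trivia-3b", (1, 1): "0.91", (2, 0): "baseline"}
reward = qa_reward(pred, probes)  # 2 of 3 probes answered correctly
```

Because the probes can be generated and checked automatically, such a reward needs no human-labeled table structure, which is the point of the self-supervised setup.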

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01248
• PDF: https://arxiv.org/pdf/2512.01248
• Github: https://github.com/opendatalab/TRivia

🔹 Models citing this paper:
https://huggingface.co/opendatalab/TRivia-3B

🔹 Spaces citing this paper:
https://huggingface.co/spaces/opendatalab/TRivia-3B

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#TableRecognition #VisionLanguageModels #SelfSupervisedLearning #AI #DeepLearning
🤖🧠 S3PRL Toolkit: Advancing Self-Supervised Speech Representation Learning

🗓️ 13 Dec 2025
📚 AI News & Trends

The field of speech technology has witnessed a transformative shift in recent years, powered by the rise of self-supervised learning (SSL). Instead of relying on large amounts of labeled data, self-supervised models learn from the patterns and structures inherent in raw audio, enabling powerful, general-purpose speech representations. At the forefront of this innovation stands the S3PRL toolkit.
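As a toy illustration of the SSL idea behind such speech models (not the API of the S3PRL toolkit itself), one can mask random frames of an audio feature sequence and measure how well they are reconstructed from surrounding context:

```python
import numpy as np

rng = np.random.default_rng(42)
frames = rng.normal(size=(100, 40))   # 100 frames of 40-dim log-mel-like features

# Mask 15% of frames; the SSL objective is to reconstruct them from context.
mask = rng.random(100) < 0.15
corrupted = frames.copy()
corrupted[mask] = 0.0

# A trivial stand-in for a context model: predict each masked frame as the
# mean of nearby unmasked frames (real SSL models learn this mapping).
pred = frames.copy()
for i in np.flatnonzero(mask):
    lo, hi = max(0, i - 2), min(100, i + 3)
    ctx = [j for j in range(lo, hi) if not mask[j]]
    pred[i] = corrupted[ctx].mean(axis=0) if ctx else 0.0

loss = float(np.mean((pred[mask] - frames[mask]) ** 2))
```

Replacing the neighbor-averaging step with a trained network is, loosely, what masked-prediction speech SSL does; no transcripts or labels appear anywhere in the objective.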

#S3PRL #SelfSupervisedLearning #SpeechTechnology #SSL #SpeechRepresentationLearning #AI
Puzzle Curriculum GRPO for Vision-Centric Reasoning

📝 Summary:
Puzzle Curriculum GRPO (PC-GRPO) improves VLM visual reasoning without annotations. It uses self-supervised puzzle environments for verifiable rewards and a difficulty-aware curriculum to enhance consistency and accuracy.
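Puzzle environments yield rewards that can be verified without labels, since shuffling the patches yourself provides the ground truth. A toy sketch of such a verifiable reward (names and format are illustrative assumptions, not the paper's implementation):

```python
def puzzle_reward(predicted_order: list, true_order: list) -> float:
    """Verifiable reward for a jigsaw-style puzzle: the fraction of patch
    positions the model restores correctly. No annotations are needed,
    because the ground truth comes from the shuffle itself."""
    assert len(predicted_order) == len(true_order)
    hits = sum(p == t for p, t in zip(predicted_order, true_order))
    return hits / len(true_order)

true_order = [0, 1, 2, 3]   # original patch layout
prediction = [0, 2, 1, 3]   # model swapped two patches
reward = puzzle_reward(prediction, true_order)  # 0.5
```

A difficulty-aware curriculum would then control parameters like the number of patches shuffled, presenting easier puzzles before harder ones.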

🔹 Publication Date: Published on Dec 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14944
• PDF: https://arxiv.org/pdf/2512.14944
• Project Page: https://pcgrpo.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VLM #VisualReasoning #SelfSupervisedLearning #ComputerVision #AI
DeFM: Learning Foundation Representations from Depth for Robotics

📝 Summary:
DeFM is a self-supervised foundation model for depth representation learning in robotics. It learns geometric and semantic features from 60M depth images, achieving state-of-the-art performance across diverse robotic tasks and strong sim-to-real generalization.

🔹 Publication Date: Published on Jan 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18923
• PDF: https://arxiv.org/pdf/2601.18923
• Project Page: https://de-fm.github.io/

🔹 Models citing this paper:
https://huggingface.co/leggedrobotics/defm

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #FoundationModels #SelfSupervisedLearning #ComputerVision #MachineLearning
OmniRad: A Radiological Foundation Model for Multi-Task Medical Image Analysis

📝 Summary:
OmniRad is a self-supervised radiological foundation model pretrained on 1.2 million medical images. It improves classification F1 by 2.05 percent and achieves better segmentation through representation reuse and cross-task transferability.
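"Representation reuse" is typically measured by fitting a very light classifier on frozen backbone features; a nearest-centroid probe is a minimal stand-in (synthetic features below, not OmniRad's evaluation setup):

```python
import numpy as np

def nearest_centroid_probe(train_feats, train_labels, test_feats):
    """Reuse frozen foundation-model features: classify each test point by
    the nearest class centroid in representation space (a minimal probe)."""
    classes = np.unique(train_labels)
    centroids = np.stack([train_feats[train_labels == c].mean(axis=0)
                          for c in classes])
    dists = np.linalg.norm(test_feats[:, None] - centroids[None], axis=2)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(1)
# Two well-separated clusters stand in for pretrained image embeddings.
feats = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(3, 0.1, (20, 8))])
labels = np.array([0] * 20 + [1] * 20)
preds = nearest_centroid_probe(feats, labels, feats)
accuracy = float((preds == labels).mean())
```

If frozen features already cluster by diagnosis this cheaply, the pretrained representation is doing the heavy lifting, which is what cross-task transferability claims.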

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04547
• PDF: https://arxiv.org/pdf/2602.04547
• Github: https://github.com/unica-visual-intelligence-lab/OmniRad

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MedicalAI #FoundationModels #Radiology #SelfSupervisedLearning #MedicalImaging
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention

📝 Summary:
OmniVideo-R1 is a reinforced framework that enhances audio-visual understanding. It uses self-supervised query-intention grounding and contrastive modality-attentive fusion. Experiments show OmniVideo-R1 consistently outperforms baselines.
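Modality-attentive fusion can be sketched as query-conditioned attention over per-modality features, so that the query's intent decides whether audio or video dominates (a minimal illustrative sketch, not the paper's architecture):

```python
import numpy as np

def modality_attentive_fusion(query, audio_feat, video_feat):
    """Fuse audio and video features with attention weights derived from
    the query vector: higher query-modality affinity -> larger weight."""
    feats = np.stack([audio_feat, video_feat])   # (2, d)
    scores = feats @ query                       # query-modality affinity
    weights = np.exp(scores - scores.max())      # stable softmax
    weights /= weights.sum()
    return weights @ feats, weights

audio = np.array([1.0, 0.0, 0.0, 0.0])
video = np.array([0.0, 1.0, 0.0, 0.0])
query = np.array([0.0, 5.0, 0.0, 0.0])           # a visually-oriented question
fused, w = modality_attentive_fusion(query, audio, video)
```

Here the query aligns with the video feature, so nearly all attention mass lands on the visual modality; an audio-oriented query would flip the weights.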

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05847
• PDF: https://arxiv.org/pdf/2602.05847

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AudioVisualAI #SelfSupervisedLearning #DeepLearning #MultimodalAI #AIResearch
Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

📝 Summary:
This paper presents a conditional binary segmentation framework for robust cross-view object correspondence. It uses cycle-consistency training to create view-invariant representations without ground-truth annotations. This approach achieves state-of-the-art performance on relevant benchmarks.
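Cycle-consistency of this kind can be illustrated with a toy check: warp a mask from view A to view B along a correspondence, warp it back, and score agreement with the original mask. Everything below (the permutation-based correspondence, the IoU score, all names) is a simplified assumption, not the paper's method:

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 1.0

def cycle_consistency(mask_a, fwd, bwd) -> float:
    """Warp a flat mask from view A to view B via index map `fwd`
    (A-pixel i lands at B-pixel fwd[i]), warp back via `bwd`, and score
    agreement with the original; no ground-truth labels are required."""
    flat = mask_a.ravel().astype(bool)
    mask_b = np.zeros(flat.size, dtype=bool)
    mask_b[fwd] = flat                 # warp A -> B
    recon = np.zeros(flat.size, dtype=bool)
    recon[bwd] = mask_b                # warp B -> A
    return iou(recon, flat)

rng = np.random.default_rng(0)
mask = rng.random(16) < 0.4
perm = rng.permutation(16)                  # toy A -> B correspondence
score_good = cycle_consistency(mask, perm, np.argsort(perm))   # true inverse
score_bad = cycle_consistency(mask, perm, rng.permutation(16)) # broken cycle
```

A correspondence whose round trip preserves the mask scores 1.0, so penalizing deviations from this score trains view-invariant representations without annotations.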

🔹 Publication Date: Published on Feb 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18996
• PDF: https://arxiv.org/pdf/2602.18996
• Github: https://github.com/shannany0606/CCMP

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ComputerVision #MachineLearning #ObjectCorrespondence #ImageSegmentation #SelfSupervisedLearning