✨UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity
📝 Summary:
UnSAMv2 enables continuous segmentation granularity control for the SAM model without human annotations. It uses self-supervised learning on unlabeled data to discover mask-granularity pairs and a novel control embedding. UnSAMv2 significantly enhances SAM-2's performance across various segmentati...
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13714
• PDF: https://arxiv.org/pdf/2511.13714
• Project Page: https://yujunwei04.github.io/UnSAMv2-Project-Page/
• Github: https://github.com/yujunwei04/UnSAMv2
✨ Spaces citing this paper:
• https://huggingface.co/spaces/yujunwei04/UnSAMv2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #ComputerVision #SelfSupervisedLearning #ImageSegmentation #DeepLearning
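The "control embedding" idea above can be sketched as conditioning a promptable mask decoder on a single granularity scalar. This is a hypothetical illustration, not the paper's implementation: the sinusoidal mapping, the embedding dimension, and the function name are all assumptions.

```python
import numpy as np

def granularity_embedding(g: float, dim: int = 256) -> np.ndarray:
    """Map a granularity scalar g in [0, 1] to a sinusoidal control
    embedding that could condition a promptable mask decoder.
    Hypothetical sketch; names and dimensions are not from the paper."""
    assert 0.0 <= g <= 1.0, "granularity must be in [0, 1]"
    half = dim // 2
    freqs = np.arange(half)
    # low g -> fine-grained parts, high g -> whole objects
    angles = g * (10000.0 ** (-freqs / half))
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb_fine = granularity_embedding(0.1)    # fine-grained masks
emb_coarse = granularity_embedding(0.9)  # coarse, object-level masks
```

Because the scalar is continuous, a user could sweep it to move smoothly from part-level to object-level masks with a single prompt.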
✨Φeat: Physically-Grounded Feature Representation
📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity like reflectance and mesostructure. It learns robust features invariant to external physical factors such as shape and lighting, promoting physics-aware perception.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI
✨EvoVLA: Self-Evolving Vision-Language-Action Model
📝 Summary:
EvoVLA is a self-supervised VLA framework tackling stage hallucination in long-horizon robotic manipulation. It uses triplet contrastive learning, pose-based exploration, and memory to prevent shortcuts. EvoVLA significantly improves success, sample efficiency, and reduces hallucination in sim an...
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16166
• PDF: https://arxiv.org/pdf/2511.16166
• Project Page: https://aigeeksgroup.github.io/EvoVLA/
• Github: https://aigeeksgroup.github.io/EvoVLA/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #VisionLanguageAction #SelfSupervisedLearning #AI #DeepLearning
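The triplet contrastive learning mentioned above can be sketched with the standard triplet margin loss: pull an anchor embedding toward the correct stage and push it from a hallucinated one. A minimal sketch, assuming L2 distance and a fixed margin; this is not EvoVLA's exact formulation.

```python
import numpy as np

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.2) -> float:
    """Standard triplet margin loss: encourages
    d(anchor, positive) + margin < d(anchor, negative).
    Hypothetical sketch, not EvoVLA's exact objective."""
    def d(a, b):
        return float(np.linalg.norm(a - b))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # embedding of the true task stage
n = np.array([1.0, 0.0])  # embedding of a hallucinated stage
loss = triplet_loss(a, p, n)
```

In a stage-hallucination setting, the positive/negative pair would come from whether the perceived stage matches the actual progress of the manipulation task.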
✨TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition
📝 Summary:
TRivia is a self-supervised fine-tuning method for vision-language models to learn table recognition from unlabeled data. It uses a question-answering reward mechanism to autonomously optimize the model. This open-source solution outperforms state-of-the-art systems on popular benchmarks.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01248
• PDF: https://arxiv.org/pdf/2512.01248
• Github: https://github.com/opendatalab/TRivia
🔹 Models citing this paper:
• https://huggingface.co/opendatalab/TRivia-3B
✨ Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/TRivia-3B
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#TableRecognition #VisionLanguageModels #SelfSupervisedLearning #AI #DeepLearning
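The question-answering reward mechanism above can be sketched as scoring a predicted table by how many probe questions it answers correctly. A minimal sketch under assumed inputs; the function name, string matching, and reward scale are illustrative, not TRivia's actual reward.

```python
def qa_reward(pred_answers: list[str], gold_answers: list[str]) -> float:
    """Reward a predicted table structure by the fraction of probe
    questions it answers correctly (case-insensitive exact match).
    Hypothetical sketch of a QA-based reward, not TRivia's exact one."""
    assert len(pred_answers) == len(gold_answers) > 0
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(pred_answers, gold_answers))
    return correct / len(gold_answers)

# e.g. questions like "What is the value in row 2, column 'Total'?"
reward = qa_reward(["3", "Yes"], ["3", "no"])  # one of two correct
```

Because the questions can be generated and checked automatically against the source table image, the model can be optimized on unlabeled data without human-written table annotations.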
🤖🧠 S3PRL Toolkit: Advancing Self-Supervised Speech Representation Learning
🗓️ 13 Dec 2025
📚 AI News & Trends
The field of speech technology has witnessed a transformative shift in recent years, powered by the rise of self-supervised learning (SSL). Instead of relying on large amounts of labeled data, self-supervised models learn from the patterns and structures inherent in raw audio, enabling powerful and general-purpose speech representations. At the forefront of this innovation stands ...
#S3PRL #SelfSupervisedLearning #SpeechTechnology #SSL #SpeechRepresentationLearning #AI
✨Puzzle Curriculum GRPO for Vision-Centric Reasoning
📝 Summary:
Puzzle Curriculum GRPO (PC-GRPO) improves VLM visual reasoning without annotations. It uses self-supervised puzzle environments for verifiable rewards and a difficulty-aware curriculum to enhance consistency and accuracy.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14944
• PDF: https://arxiv.org/pdf/2512.14944
• Project Page: https://pcgrpo.github.io/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLM #VisualReasoning #SelfSupervisedLearning #ComputerVision #AI
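The "verifiable rewards" idea above can be sketched with a jigsaw-style puzzle: shuffling tiles produces ground truth for free, so the reward needs no annotation. This is a hypothetical illustration of the principle, not PC-GRPO's actual reward.

```python
def puzzle_reward(pred_perm: list[int], true_perm: list[int]) -> float:
    """Verifiable reward for a jigsaw-style puzzle: the fraction of
    tile positions the model restored correctly. The target permutation
    is known from the shuffle itself, so no human labels are needed.
    Hypothetical sketch, not PC-GRPO's exact reward."""
    assert len(pred_perm) == len(true_perm) > 0
    hits = sum(p == t for p, t in zip(pred_perm, true_perm))
    return hits / len(true_perm)

# model predicts where each shuffled tile belongs; 2 of 4 are correct
reward = puzzle_reward([0, 2, 1, 3], [0, 1, 2, 3])
```

A difficulty-aware curriculum could then control puzzle hardness, e.g. by increasing the grid size as the model's average reward rises.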
✨DeFM: Learning Foundation Representations from Depth for Robotics
📝 Summary:
DeFM is a self-supervised foundation model for depth representation learning in robotics. It learns geometric and semantic features from 60M depth images, achieving state-of-the-art performance across diverse robotic tasks and strong sim-to-real generalization.
🔹 Publication Date: Published on Jan 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18923
• PDF: https://arxiv.org/pdf/2601.18923
• Project Page: https://de-fm.github.io/
🔹 Models citing this paper:
• https://huggingface.co/leggedrobotics/defm
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #FoundationModels #SelfSupervisedLearning #ComputerVision #MachineLearning
✨OmniRad: A Radiological Foundation Model for Multi-Task Medical Image Analysis
📝 Summary:
OmniRad is a self-supervised radiological foundation model pretrained on 1.2 million medical images. It improves classification F1 by 2.05 percent and achieves better segmentation through representation reuse and cross-task transferability.
🔹 Publication Date: Published on Feb 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04547
• PDF: https://arxiv.org/pdf/2602.04547
• Github: https://github.com/unica-visual-intelligence-lab/OmniRad
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MedicalAI #FoundationModels #Radiology #SelfSupervisedLearning #MedicalImaging
✨OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
📝 Summary:
OmniVideo-R1 is a reinforced framework that enhances audio-visual understanding. It uses self-supervised query-intention grounding and contrastive modality-attentive fusion. Experiments show OmniVideo-R1 consistently outperforms baselines, demonstrating its effectiveness.
🔹 Publication Date: Published on Feb 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05847
• PDF: https://arxiv.org/pdf/2602.05847
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AudioVisualAI #SelfSupervisedLearning #DeepLearning #MultimodalAI #AIResearch
✨Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction
📝 Summary:
This paper presents a conditional binary segmentation framework for robust cross-view object correspondence. It uses cycle-consistency training to create view-invariant representations without ground-truth annotations. This approach achieves state-of-the-art performance on relevant benchmarks.
🔹 Publication Date: Published on Feb 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18996
• PDF: https://arxiv.org/pdf/2602.18996
• Github: https://github.com/shannany0606/CCMP
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ComputerVision #MachineLearning #ObjectCorrespondence #ImageSegmentation #SelfSupervisedLearning
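The cycle-consistency training above can be sketched as a round trip: map a mask from view A to view B and back, then penalize disagreement with the original mask. A minimal sketch; `predict` stands in for a hypothetical cross-view mask predictor and is not the paper's API.

```python
import numpy as np

def cycle_consistency_loss(mask_a: np.ndarray, predict) -> float:
    """Round-trip a binary mask A -> B -> A through a cross-view
    predictor and score the mismatch as 1 - IoU with the original.
    `predict(mask, src_view, dst_view)` is a hypothetical stand-in."""
    mask_b = predict(mask_a, "A", "B")
    mask_a_rt = predict(mask_b, "B", "A")
    inter = np.logical_and(mask_a, mask_a_rt).sum()
    union = np.logical_or(mask_a, mask_a_rt).sum()
    return 1.0 - inter / max(union, 1)

def identity_predict(mask, src, dst):
    # toy predictor: a perfect round trip returns the mask unchanged
    return mask

m = np.zeros((4, 4), dtype=bool)
m[1:3, 1:3] = True
loss = cycle_consistency_loss(m, identity_predict)
```

Because a perfect round trip reproduces the original mask, this loss supervises the predictor without any ground-truth correspondence annotations.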