✨Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
📝 Summary:
Naive action fine-tuning degrades visual representations in Vision-Language-Action models. This study analyzes this degradation and introduces a simple method to align representations, improving out-of-distribution generalization.
🔹 Publication Date: Oct 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.25616
• PDF: https://arxiv.org/pdf/2510.25616
• Project Page: https://blind-vla-paper.github.io
• Github: https://github.com/CognitiveAISystems/BlindVLA
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLA #OODGeneralization #ComputerVision #MachineLearning #RepresentationLearning
✨Dynamic Reflections: Probing Video Representations with Text Alignment
📝 Summary:
This work presents the first comprehensive study of video-text representation alignment. It shows that alignment depends on data richness and correlates with downstream task performance, suggesting its value for general video understanding. This introduces video-text alignment as a zero-shot method ...
🔹 Publication Date: Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02767
• PDF: https://arxiv.org/pdf/2511.02767
• Project Page: https://video-prh.github.io/
#VideoUnderstanding #TextAlignment #VideoTextAI #ZeroShotLearning #RepresentationLearning