✨SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding
📝 Summary:
SIMS-V uses 3D simulators to generate diverse spatial video training data. This makes training multimodal language models efficient and sidesteps real-world data bottlenecks. A 7B model trained on the simulated data significantly outperforms larger baselines on real-world spatial reasoning tasks.
🔹 Publication Date: Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04668
• PDF: https://arxiv.org/pdf/2511.04668
• Project Page: https://ellisbrown.github.io/sims-v/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpatialAI #MultimodalLLM #SimulatedData #ComputerVision #DeepLearning
✨Visual Spatial Tuning
📝 Summary:
Visual Spatial Tuning (VST) is a framework that progressively trains Vision-Language Models (VLMs) using two specialized datasets: VST-P for spatial perception and VST-R for reasoning. VST achieves state-of-the-art results on spatial benchmarks without harming general VLM capabilities, leading to more phy...
🔹 Publication Date: Nov 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05491
• PDF: https://arxiv.org/pdf/2511.05491
• Project Page: https://yangr116.github.io/vst_project/
• Github: https://github.com/Yangr116/VST
#VisionLanguageModels #SpatialAI #ComputerVision #DeepLearning #AIResearch
✨Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
📝 Summary:
VR-Bench evaluates video models' spatial reasoning using maze-solving tasks. It demonstrates that video models excel in spatial perception and reasoning, outperforming VLMs, and benefit from diverse sampling during inference. These findings show the strong potential of reasoning via video for spa...
🔹 Publication Date: Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15065
• PDF: https://arxiv.org/pdf/2511.15065
• Project Page: https://imyangc7.github.io/VRBench_Web/
• Github: https://github.com/ImYangC7/VR-Bench
#VideoModels #AIReasoning #SpatialAI #ComputerVision #MachineLearning