✨SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding
📝 Summary:
SIMS-V uses 3D simulators to generate diverse spatial video training data, sidestepping the bottleneck of collecting and annotating real-world video for multimodal language models. A 7B model instruction-tuned on this simulated data significantly outperforms larger baselines on real-world spatial reasoning tasks.
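The core recipe is to render scenes in a simulator and turn its ground-truth metadata into question-answer supervision. Here is a minimal Python sketch of that idea; the SimObject class and make_qa_pair helper are hypothetical stand-ins for a real simulator's exported scene data, not the actual SIMS-V pipeline.

```python
# Illustrative sketch: building spatial-reasoning QA pairs from
# simulator ground truth. The classes below are hypothetical
# stand-ins for a 3D simulator's metadata, not the SIMS-V API.
import math
import random
from dataclasses import dataclass

@dataclass
class SimObject:
    name: str
    position: tuple[float, float, float]  # world coordinates, meters

def make_qa_pair(objects: list[SimObject]) -> dict:
    """Sample two objects and build a metric-distance question whose
    answer comes directly from simulator ground truth."""
    a, b = random.sample(objects, 2)
    dist = math.dist(a.position, b.position)
    return {
        "question": f"How far apart are the {a.name} and the {b.name}, in meters?",
        "answer": f"{dist:.1f}",
    }

# Example scene: the kind of per-frame layout a simulator can export.
scene = [
    SimObject("sofa", (0.0, 0.0, 2.0)),
    SimObject("lamp", (1.5, 0.0, 3.5)),
    SimObject("table", (-1.0, 0.0, 4.0)),
]
print(make_qa_pair(scene))
```

Because answers are computed from exact 3D positions rather than human labels, this kind of generator scales to arbitrarily many scenes and question types at negligible cost.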
🔹 Publication Date: Published on Nov 6, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04668
• PDF: https://arxiv.org/pdf/2511.04668
• Project Page: https://ellisbrown.github.io/sims-v/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpatialAI #MultimodalLLM #SimulatedData #ComputerVision #DeepLearning
✨SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
📝 Summary:
SpatialThinker is a new 3D-aware MLLM trained with reinforcement learning (RL) and dense spatial rewards to significantly improve spatial understanding. It combines structured spatial grounding with multi-step reasoning, outperforming existing models and GPT-4o on spatial VQA and real-world benchmarks.
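The distinguishing idea is that the reward is dense: the policy earns partial credit for grounding objects correctly even when the final answer is wrong, rather than a single sparse pass/fail signal. A minimal Python sketch of such a reward, assuming an IoU-based grounding term and a weighted sum; the terms and weights are illustrative, not the paper's exact formulation.

```python
# Illustrative sketch: a dense spatial reward for RL fine-tuning.
# The reward decomposition and weights are assumptions for
# illustration, not SpatialThinker's exact formulation.
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    def area(r: Box) -> float:
        return (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def spatial_reward(pred_answer: str, gold_answer: str,
                   pred_boxes: list[Box], gold_boxes: list[Box],
                   w_answer: float = 0.6, w_ground: float = 0.4) -> float:
    """Dense reward: exact-match answer credit plus partial credit for
    grounding each referenced object (mean best-IoU over gold boxes)."""
    answer_term = float(pred_answer.strip().lower() == gold_answer.strip().lower())
    if gold_boxes and pred_boxes:
        matched = [max(iou(g, p) for p in pred_boxes) for g in gold_boxes]
        ground_term = sum(matched) / len(gold_boxes)
    else:
        ground_term = 0.0
    return w_answer * answer_term + w_ground * ground_term

# Correct answer with a roughly right box: high but not maximal reward.
print(spatial_reward("the red mug", "The red mug",
                     [(10, 10, 50, 50)], [(12, 12, 48, 52)]))
```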
🔹 Publication Date: Published on Nov 10, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07403
• PDF: https://arxiv.org/pdf/2511.07403
• Github: https://github.com/hunarbatra/SpatialThinker
🔹 Models citing this paper:
• https://huggingface.co/OX-PIXL/SpatialThinker-3B
• https://huggingface.co/OX-PIXL/SpatialThinker-7B
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/OX-PIXL/STVQA-7K
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MultimodalLLM #3DReasoning #ReinforcementLearning #AIResearch #ComputerVision