Data Science | Machine Learning with Python for Researchers

✨SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding

📝 Summary:
SIMS-V uses 3D simulators to generate diverse spatial video training data. This simulated data trains multimodal language models efficiently and sidesteps the bottleneck of collecting real-world spatial annotations. A 7B model trained on it significantly outperforms larger baselines on real-world spatial reasoning tasks.
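
Why simulators help: the simulator already knows the exact 3D position of every object, so spatial question/answer pairs can be generated programmatically instead of being labeled by hand. The sketch below assumes a hypothetical `SimObject` record exported from a scene and a single hypothetical distance question template (`make_spatial_qa`). It illustrates the general idea only and is not the SIMS-V pipeline or its data format.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class SimObject:
    """Hypothetical object record exported from a 3D simulator scene."""
    name: str
    x: float  # world coordinates in meters (assumed)
    y: float
    z: float


def distance(a: SimObject, b: SimObject) -> float:
    """Euclidean distance between two object centers."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def make_spatial_qa(objects: list) -> list:
    """Build one distance question per object pair from ground-truth 3D metadata."""
    qa = []
    for a, b in combinations(objects, 2):
        qa.append({
            "question": f"How far apart are the {a.name} and the {b.name} (in meters)?",
            "answer": round(distance(a, b), 1),
        })
    return qa


scene = [
    SimObject("chair", 0.0, 0.0, 0.0),
    SimObject("table", 3.0, 4.0, 0.0),
    SimObject("lamp", 0.0, 0.0, 2.0),
]
for item in make_spatial_qa(scene):
    print(item)  # answers: 5.0, 2.0, 5.4
```

Because the answers come straight from simulator state, every generated label is exact, and adding new question templates (relative direction, object counts, room size) costs only a few lines each.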

🔹 Publication Date: Published on Nov 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04668
• PDF: https://arxiv.org/pdf/2511.04668
• Project Page: https://ellisbrown.github.io/sims-v/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#SpatialAI #MultimodalLLM #SimulatedData #ComputerVision #DeepLearning


✨SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards

📝 Summary:
SpatialThinker is a new 3D-aware MLLM that uses reinforcement learning (RL) with dense spatial rewards to significantly improve spatial understanding. It integrates structured spatial grounding with multi-step reasoning and outperforms existing models, including GPT-4o, on spatial VQA and real-world benchmarks.
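
To make "dense spatial rewards" concrete, here is a minimal sketch contrasting a sparse reward (credit only for a correct final answer) with a dense one that also grants partial credit for grounding, measured as bounding-box IoU. The function names, the box format `(x1, y1, x2, y2)` and the weights `w_answer=1.0` and `w_ground=0.5` are assumptions for illustration. This is not SpatialThinker's actual reward, which is defined in the paper and repo.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coords


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 2D boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sparse_reward(pred: str, gold: str) -> float:
    """Binary reward: credit only for a correct final answer (case-insensitive)."""
    return 1.0 if pred.strip().lower() == gold.strip().lower() else 0.0


def dense_spatial_reward(pred: str, gold: str,
                         pred_boxes: Sequence[Box], gold_boxes: Sequence[Box],
                         w_answer: float = 1.0, w_ground: float = 0.5) -> float:
    """Hypothetical dense reward: answer term plus partial credit for grounding.

    Each ground-truth box is scored by its best-matching predicted box, so a
    rollout with a wrong answer but roughly correct localization still earns
    a non-zero learning signal.
    """
    answer_term = sparse_reward(pred, gold)
    if gold_boxes:
        grounding_term = sum(
            max((iou(p, g) for p in pred_boxes), default=0.0) for g in gold_boxes
        ) / len(gold_boxes)
    else:
        grounding_term = 0.0
    return w_answer * answer_term + w_ground * grounding_term


print(sparse_reward("right", "left"))
print(dense_spatial_reward("right", "left", [(0, 0, 2, 2)], [(1, 1, 3, 3)]))
```

With a wrong answer but an overlapping box, `sparse_reward` returns 0.0 while `dense_spatial_reward` still returns about 0.071 (0.5 × 1/7), which is the kind of extra signal a dense reward is meant to give the RL optimizer.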

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07403
• PDF: https://arxiv.org/pdf/2511.07403
• Github: https://github.com/hunarbatra/SpatialThinker

🔹 Models citing this paper:
• https://huggingface.co/OX-PIXL/SpatialThinker-3B
• https://huggingface.co/OX-PIXL/SpatialThinker-7B

🔹 Datasets citing this paper:
• https://huggingface.co/datasets/OX-PIXL/STVQA-7K

#MultimodalLLM #3DReasoning #ReinforcementLearning #AIResearch #ComputerVision
