ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho
Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models

📝 Summary:
QTSplus is a query-aware token selector for long-video multimodal language models. It dynamically selects the most important visual tokens based on a text query, significantly compressing vision data and reducing latency. This method maintains overall accuracy and enhances temporal understanding ...
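To make the idea of query-aware selection concrete, here is a minimal sketch of generic top-k relevance filtering. It is not the QTSplus implementation: the function name `select_tokens`, the `keep_ratio` parameter, and the scaled dot-product scoring are all assumptions made for illustration, and the paper's actual selection mechanism may differ.

```python
import math


def select_tokens(visual_tokens, query, keep_ratio=0.25):
    """Keep the visual tokens most relevant to a text query (hypothetical helper).

    visual_tokens: list of equal-length float vectors, in temporal order.
    query: one float vector of the same length (e.g. a pooled query embedding).
    Returns the indices of the kept tokens, in their original temporal order.
    """
    if not visual_tokens:
        return []
    scale = math.sqrt(len(query))
    # Scaled dot-product relevance of each visual token to the query.
    scores = [
        sum(t * q for t, q in zip(tok, query)) / scale
        for tok in visual_tokens
    ]
    # Keep at least one token and at most all of them.
    k = max(1, min(len(visual_tokens), round(keep_ratio * len(visual_tokens))))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore temporal order so downstream layers still see a coherent sequence.
    return sorted(top)


# Toy example: four 2-d "visual tokens" and a query aligned with the first axis.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
query = [1.0, 0.0]
kept = select_tokens(tokens, query, keep_ratio=0.5)
```

In this toy run the two tokens closest to the query direction are kept and the other two are dropped, which is the sense in which the vision stream is compressed before it reaches the language model.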

🔹 Publication Date: Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11910
• Hugging Face Collection: https://huggingface.co/collections/AlpachinoNLP/qtsplus
• PDF: https://arxiv.org/pdf/2511.11910
• Project Page: https://qtsplus.github.io/
• Github: https://github.com/Siyou-Li/QTSplus

🔹 Models citing this paper:
• https://huggingface.co/AlpachinoNLP/QTSplus-3B
• https://huggingface.co/AlpachinoNLP/QTSplus-3B-FT

🔹 Spaces citing this paper:
• https://huggingface.co/spaces/AlpachinoNLP/QTSplus-3B

#MultimodalAI #VideoAI #LLM #Tokenization #ComputerVision