✨Factorized Learning for Temporally Grounded Video-Language Models
📝 Summary:
Video-language models struggle with temporal grounding when it is learned jointly with textual response generation. Our D^2VLM framework decouples grounding from textual response using evidence tokens. Factorized preference optimization then explicitly optimizes temporal grounding for both tasks.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24097
• PDF: https://arxiv.org/pdf/2512.24097
• Project Page: https://github.com/nusnlp/d2vlm
• Github: https://github.com/nusnlp/d2vlm
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
📝 Summary:
This paper presents JavisGPT, the first unified multimodal large language model (MLLM) for Joint Audio-Video (JAV) comprehension and generation. JavisGPT adopts a concise encoder-LLM-decoder architect...
🔹 Publication Date: Published on Dec 28, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22905
• PDF: https://arxiv.org/pdf/2512.22905
• Project Page: https://javisverse.github.io/JavisGPT-page/
• Github: https://github.com/JavisVerse/JavisGPT
🔹 Models citing this paper:
• https://huggingface.co/JavisVerse/JavisGPT-v0.1-7B-Instruct
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/JavisVerse/MM-PreTrain
• https://huggingface.co/datasets/JavisVerse/JavisUnd-Eval
• https://huggingface.co/datasets/JavisVerse/AV-FineTune
==================================
✨Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
📝 Summary:
The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. While foundat...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24385
• PDF: https://arxiv.org/pdf/2512.24385
• Github: https://github.com/worldbench/awesome-spatial-intelligence
==================================
✨Valori: A Deterministic Memory Substrate for AI Systems
📝 Summary:
Valori introduces a deterministic AI memory substrate using fixed-point arithmetic, ensuring bit-identical results across platforms. This eliminates non-determinism from floating-point operations in vector embeddings and search, making AI systems trustworthy and verifiable.
🔹 Publication Date: Published on Dec 25, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22280
• PDF: https://arxiv.org/pdf/2512.22280
• Project Page: https://valori.systems/
• Github: https://github.com/varshith-Git/Valori-Kernel
==================================
✨BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts
📝 Summary:
A framework called BEDA uses probabilistic constraints on belief estimation to improve strategic dialogue through formalized adversarial and alignment acts, outperforming baselines across multiple tas...
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24885
• PDF: https://arxiv.org/pdf/2512.24885
==================================
✨On the Role of Discreteness in Diffusion LLMs
📝 Summary:
This paper examines diffusion language models, highlighting five properties separating diffusion mechanics from language requirements. Existing approaches face structural trade-offs. Key issues identified are uniform corruption and token-wise marginal training, urging development of diffusion pro...
🔹 Publication Date: Published on Dec 27, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22630
• PDF: https://arxiv.org/pdf/2512.22630
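The summary does not define uniform corruption. Assuming it refers to the usual masked-diffusion forward process, where every position is masked with the same probability t regardless of how informative the token is, a minimal sketch looks like this. The function name and MASK string are hypothetical and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol


def uniform_mask_corruption(tokens: list[str], t: float, rng: random.Random) -> list[str]:
    """Masked-diffusion forward step: mask each token independently with probability t.

    Every position gets the same corruption rate, whether it is a function word
    or the only content word that carries the meaning.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [MASK if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)  # seeded so the example is reproducible
sentence = "the cat sat on the mat".split()
for t in (0.25, 0.5, 0.75):
    print(t, uniform_mask_corruption(sentence, t, rng))
```

A denoiser trained on such inputs typically predicts each masked position separately, which is presumably what the summary means by token-wise marginal training.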
==================================
✨DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
📝 Summary:
DiffThinker introduces a generative multimodal reasoning framework using diffusion models. It reframes vision-centric tasks as image-to-image generation for superior logical consistency and spatial precision. DiffThinker significantly outperforms existing MLLMs across various domains, showcasing ...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24165
• PDF: https://arxiv.org/pdf/2512.24165
• Project Page: https://diffthinker-project.github.io/
• Github: https://github.com/lcqysl/DiffThinker
==================================
✨Deep Delta Learning
📝 Summary:
The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly...
🔹 Publication Date: Published on Jan 1, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00417
• PDF: https://arxiv.org/pdf/2601.00417
• Github: https://github.com/yifanzhang-pro/deep-delta-learning
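The summary is truncated before it states the proposed change, so only its starting point can be written out here. That starting point is the standard residual update with an identity shortcut, and its Jacobian shows why the shortcut eases vanishing gradients. This is textbook background, not the paper's formulation.

```latex
\mathbf{h}_{l+1} = \mathbf{h}_l + F(\mathbf{h}_l),
\qquad
\frac{\partial \mathbf{h}_{l+1}}{\partial \mathbf{h}_l} = I + \frac{\partial F(\mathbf{h}_l)}{\partial \mathbf{h}_l}
```

The identity term I carries gradients back through each layer unchanged, even when the Jacobian of F is small.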
==================================
✨Fast-weight Product Key Memory
📝 Summary:
FwPKM introduces a dynamic, fast-weight episodic memory mechanism for sequence modeling that balances storage capacity and efficiency, achieving strong performance on long-context tasks like Needle in...
🔹 Publication Date: Published on Jan 2, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00671
• PDF: https://arxiv.org/pdf/2601.00671
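FwPKM builds on product key memory. The sketch below shows only the standard product-key read (Lample et al., 2019). The query is split in half, each half is scored against its own small set of sub-keys, and the top-k of the k×k combined candidates is read out. How FwPKM updates the memory as fast weights is not described in the summary and is not shown. All names here are hypothetical.

```python
import math


def _dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def product_key_read(query, subkeys1, subkeys2, values, k):
    """Read from an n*n-slot memory using two sets of n sub-keys (hypothetical helper).

    query: 2*d floats; subkeys1, subkeys2: n vectors of d floats each;
    values: n*n value vectors, slot (i, j) stored at index i * n + j.
    Returns (slot indices, softmax weights, weighted value).
    """
    d = len(query) // 2
    q1, q2 = query[:d], query[d:]
    s1 = [_dot(q1, key) for key in subkeys1]
    s2 = [_dot(q2, key) for key in subkeys2]
    top1 = sorted(range(len(s1)), key=lambda i: s1[i], reverse=True)[:k]
    top2 = sorted(range(len(s2)), key=lambda j: s2[j], reverse=True)[:k]
    # A top-k set of the n*n full keys always lies within these k*k pairs,
    # so the n*n full scores never need to be computed.
    n2 = len(subkeys2)
    candidates = [(s1[i] + s2[j], i * n2 + j) for i in top1 for j in top2]
    best = sorted(candidates, reverse=True)[:k]
    # Numerically stable softmax over the selected scores.
    m = best[0][0]
    exps = [math.exp(score - m) for score, _ in best]
    total = sum(exps)
    weights = [e / total for e in exps]
    slots = [slot for _, slot in best]
    dim = len(values[0])
    out = [sum(w * values[s][c] for w, s in zip(weights, slots)) for c in range(dim)]
    return slots, weights, out


sub = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # n = 3 sub-keys of size d = 2
vals = [[float(i)] for i in range(9)]        # 3 * 3 = 9 memory slots
print(product_key_read([1.0, 0.0, 0.0, 1.0], sub, sub, vals, k=1))  # → ([1], [1.0], [1.0])
```

With n sub-keys per half, the memory addresses n² slots while scoring only 2n sub-keys plus k² candidate pairs per query.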
==================================
✨NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
📝 Summary:
NextFlow is a unified decoder-only transformer enabling fast multimodal understanding and generation. It uses next-token prediction for text and next-scale prediction for images, generating 1024x1024 images in 5 seconds. It achieves state-of-the-art performance among unified models.
🔹 Publication Date: Published on Jan 5, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02204
• PDF: https://arxiv.org/pdf/2601.02204
• Github: https://github.com/ByteVisionLab/NextFlow
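Next-scale prediction, in the coarse-to-fine sense used for image generation, predicts an entire token map per step, conditioned on all coarser maps, instead of one token per step. Below is a back-of-the-envelope sketch of why this cuts sequential steps. It assumes a hypothetical 16×16 image-token grid and a power-of-two scale schedule, and neither is NextFlow's actual configuration.

```python
def sequential_steps(side: int, scales: tuple[int, ...]) -> tuple[int, int, int]:
    """Compare decoding steps for a side x side token grid (hypothetical helper).

    Returns (next-token steps, next-scale steps, total tokens predicted by next-scale).
    """
    if not scales or scales[-1] != side:
        raise ValueError("the last scale must equal the full grid side")
    next_token = side * side  # raster order: one token per forward pass
    next_scale = len(scales)  # one forward pass per scale; s*s tokens in parallel
    tokens = sum(s * s for s in scales)  # coarser maps add some extra tokens
    return next_token, next_scale, tokens


print(sequential_steps(16, (1, 2, 4, 8, 16)))  # → (256, 5, 341)
```

The trade-off is that next-scale predicts more tokens in total (341 vs 256 here) but needs far fewer sequential forward passes (5 vs 256). That is one plausible contributor to fast image generation.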
==================================