ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

📝 Summary:
Bi-directional Perceptual Shaping (BiPS) improves vision-language models by using question-conditioned masked views to shape perception during training. It employs two constraints: one ensures complete coverage of question-relevant pixels, the other enforces fine-grained visual reliance, preventing text-only shortcuts.
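
For intuition, a minimal PyTorch sketch of how the two constraints could be instantiated; `model`, `relevance_mask`, and the exact loss forms are illustrative assumptions, not the paper's formulation:

```python
import torch.nn.functional as F

def bips_style_losses(model, pixels, question, relevance_mask):
    """pixels: (B, C, H, W); relevance_mask: (B, 1, H, W) in [0, 1],
    e.g. derived from question-conditioned attention (hypothetical)."""
    full_logits = model(pixels, question)
    relevant_view = pixels * relevance_mask            # keep relevant pixels
    irrelevant_view = pixels * (1.0 - relevance_mask)  # complement view

    rel_logits = model(relevant_view, question)
    irr_logits = model(irrelevant_view, question)

    # Coverage: the relevant view alone should support the same answer
    # distribution as the full image.
    coverage = F.kl_div(F.log_softmax(rel_logits, -1),
                        F.softmax(full_logits, -1), reduction="batchmean")
    # Reliance: with relevant pixels removed the model should be uncertain;
    # minimizing negative entropy here blocks text-only shortcuts.
    probs = F.softmax(irr_logits, -1)
    reliance = (probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
    return coverage, reliance
```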

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22120
• PDF: https://arxiv.org/pdf/2512.22120
• Github: https://github.com/zss02/BiPS

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #VisionLanguageModels #MachineLearning #AIResearch #DeepLearning
ProEdit: Inversion-based Editing From Prompts Done Right

📝 Summary:
Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically inject source image information during the...

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22118
• PDF: https://arxiv.org/pdf/2512.22118
• Project Page: https://isee-laboratory.github.io/ProEdit/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
SVBench: Evaluation of Video Generation Models on Social Reasoning

📝 Summary:
Recent text-to-video generation models exhibit remarkable progress in visual realism, motion fidelity, and text-video alignment, yet they remain fundamentally limited in their ability to generate soci...

🔹 Publication Date: Published on Dec 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21507
• PDF: https://arxiv.org/pdf/2512.21507
• Github: https://github.com/Gloria2tt/SVBench-Evaluation

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...
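
As a toy illustration of test-time scaling with an execution-free reward model, the sketch below reranks sampled patches by reward score; `generate_patch` and `score_patch` are hypothetical stand-ins, not the paper's API:

```python
def best_of_n(issue: str, generate_patch, score_patch, n: int = 8):
    """Sample n candidate patches and return the one the reward model
    prefers. No test suite is executed at any point."""
    candidates = [generate_patch(issue) for _ in range(n)]
    scores = [score_patch(issue, patch) for patch in candidates]
    return max(zip(scores, candidates), key=lambda sc: sc[0])[1]
```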

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

📝 Summary:
MiA-RAG enhances RAG systems with global context awareness, inspired by human understanding. It uses hierarchical summarization to build a 'mindscape,' improving long-context retrieval and generation for better evidence-based understanding.
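
A minimal sketch of the hierarchical-summarization idea, assuming a generic `summarize` LLM call; the paper's prompting and tree construction may differ:

```python
def build_mindscape(chunks, summarize, fanout=4):
    """Recursively summarize chunks bottom-up. Level 0 holds the raw
    chunks; the last level is a single document-wide summary (the
    'mindscape' root) that retrieval and generation can condition on."""
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        groups = [prev[i:i + fanout] for i in range(0, len(prev), fanout)]
        levels.append([summarize("\n\n".join(group)) for group in groups])
    return levels

# Usage: prepend levels[-1][0] (the global summary) to the query so that
# chunk scoring is aware of document-wide context.
```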

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17220
• PDF: https://arxiv.org/pdf/2512.17220

🔹 Models citing this paper:
https://huggingface.co/MindscapeRAG/MiA-Emb-8B
https://huggingface.co/MindscapeRAG/MiA-Emb-4B
https://huggingface.co/MindscapeRAG/MiA-Emb-0.6B

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#RAG #LLM #NLP #GenerativeAI #ContextUnderstanding
TimeBill: Time-Budgeted Inference for Large Language Models

📝 Summary:
TimeBill is a time-budgeted inference framework for LLMs in time-critical systems. It predicts execution time and adaptively adjusts KV cache eviction to balance inference efficiency against response quality within a given time budget, improving task completion rates.
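
A toy sketch of decoding under a wall-clock budget in this spirit: latency is modeled as linear in KV cache length, and entries are evicted when the projected finish time exceeds the budget. The latency model and eviction rule are illustrative assumptions, not the paper's predictor:

```python
import time

def budgeted_decode(step_fn, evict_fn, max_new_tokens, budget_s,
                    per_token_s, per_entry_s, cache_len):
    """step_fn runs one decode step; evict_fn(k) drops k KV entries.
    per_token_s is the fixed cost per step, per_entry_s (> 0) the extra
    cost per cached entry per step; both are given constants here."""
    start = time.monotonic()
    for t in range(max_new_tokens):
        remaining = budget_s - (time.monotonic() - start)
        tokens_left = max_new_tokens - t
        # Projected finish time at the current cache length (growth ignored).
        projected = tokens_left * (per_token_s + per_entry_s * cache_len)
        if projected > remaining:
            # Largest cache length that still fits the remaining budget.
            target = int((remaining / tokens_left - per_token_s) / per_entry_s)
            target = max(0, min(target, cache_len))
            evict_fn(cache_len - target)
            cache_len = target
        step_fn()
        cache_len += 1  # each decode step appends one KV entry
```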

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21859
• PDF: https://arxiv.org/pdf/2512.21859

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #AI #RealTimeAI #InferenceOptimization #DeepLearning
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

📝 Summary:
UniPercept-Bench provides a unified framework and datasets for perceptual image understanding across aesthetics, quality, structure, and texture. The UniPercept model, trained with DAPT and T-ARL, outperforms MLLMs, generalizes across VR and VQA, and acts as a text-to-image reward model.

🔹 Publication Date: Published on Dec 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21675
• PDF: https://arxiv.org/pdf/2512.21675
• Project Page: https://thunderbolt215.github.io/Unipercept-project/
• Github: https://github.com/thunderbolt215/UniPercept

🔹 Models citing this paper:
https://huggingface.co/Thunderbolt215215/UniPercept

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Thunderbolt215215/UniPercept-Bench

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ImageUnderstanding #ComputerVision #AIResearch #PerceptualAI #DeepLearning
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

📝 Summary:
Omni-Weather is a new multimodal foundation model that unifies weather generation and understanding in a single architecture. It uses shared self-attention and a Chain-of-Thought dataset for interpretable, high-quality outputs, achieving state-of-the-art performance.

🔹 Publication Date: Published on Dec 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21643
• PDF: https://arxiv.org/pdf/2512.21643

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#WeatherGeneration #FoundationModels #MultimodalAI #AIResearch #DeepLearning
1
SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

📝 Summary:
SlideTailor generates personalized presentation slides for scientific papers by learning user preferences implicitly from example pairs and visual templates. It uses a chain-of-speech mechanism to align content with oral narration.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20292
• PDF: https://arxiv.org/pdf/2512.20292
• Github: https://github.com/nusnlp/SlideTailor

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SlideGeneration #ScientificCommunication #AI #NLP #ResearchTools
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

📝 Summary:
The Expert-Router Coupling (ERC) loss aligns MoE router decisions with expert capabilities. It uses proxy tokens and activation constraints to ensure experts specialize, improving performance and computational efficiency. ERC also allows tracking expert specialization during training.
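
A stripped-down sketch of proxy-token coupling, assuming one learnable proxy per expert and a linear router; the paper's actual constraints are richer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ERCAuxLoss(nn.Module):
    def __init__(self, num_experts: int, d_model: int):
        super().__init__()
        # One learnable proxy token per expert, trained with the model.
        self.proxies = nn.Parameter(torch.randn(num_experts, d_model) * 0.02)

    def forward(self, router: nn.Linear) -> torch.Tensor:
        logits = router(self.proxies)  # (num_experts, num_experts)
        # Penalize the router unless expert i's proxy routes to expert i,
        # coupling routing decisions to expert identity.
        targets = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, targets)

# Usage: loss = task_loss + lambda_erc * erc_aux(router_linear)
```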

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23447
• PDF: https://arxiv.org/pdf/2512.23447

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MixtureOfExperts #DeepLearning #MachineLearning #AI #NeuralNetworks
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

📝 Summary:
SmartSnap introduces proactive, in-situ self-verification for autonomous agents, moving away from passive, post-hoc task verification. Self-Verifying Agents complete tasks and curate minimal snapshot evidence to prove accomplishment, boosting scalability and performance for LLM-driven agents in G...
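
Schematically, a self-verifying loop of this kind might look as follows; `act`, `observe`, and `is_evidence` are hypothetical callables standing in for the learned components:

```python
def run_self_verifying(task, act, observe, is_evidence, max_steps=20):
    """Act on the task while curating minimal snapshot evidence."""
    evidence = []
    for _ in range(max_steps):
        action = act(task, evidence)   # policy step, sees evidence so far
        if action == "DONE":
            break                      # agent declares the task complete
        obs = observe(action)          # execute and capture, e.g. a screenshot
        if is_evidence(task, obs):     # proactive, in-situ self-check
            evidence.append(obs)       # keep only proof-relevant snapshots
    return evidence                    # returned alongside the task result
```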

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22322
• PDF: https://arxiv.org/pdf/2512.22322
• Project Page: https://huggingface.co/collections/yolay/smartsnap

🔹 Models citing this paper:
https://huggingface.co/yolay/SmartSnap-LLaMA3.1-8B
https://huggingface.co/yolay/SmartSnap-Qwen2.5-7B
https://huggingface.co/yolay/SmartSnap-Qwen3-8B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/yolay/SmartSnap-FT
https://huggingface.co/datasets/yolay/SmartSnap-RL

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #LLM #AutonomousAgents #AgentVerification #AIResearch
Yume-1.5: A Text-Controlled Interactive World Generation Model

📝 Summary:
Yume-1.5 is a novel framework that generates realistic, interactive, and continuous worlds from a single image or text prompt. It overcomes prior limitations in real-time performance and text control by using unified context compression, streaming acceleration, and text-controlled world events.

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22096
• PDF: https://arxiv.org/pdf/2512.22096
• Project Page: https://stdstu12.github.io/YUME-Project/
• Github: https://github.com/stdstu12/YUME

🔹 Models citing this paper:
https://huggingface.co/stdstu123/Yume-5B-720P

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #GenerativeAI #WorldGeneration #ComputerGraphics #DeepLearning
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

📝 Summary:
Transparent objects are hard for perception systems. This work observes that video diffusion models can synthesize transparent phenomena and repurposes one accordingly. The resulting DKT model, trained on a new dataset, achieves zero-shot SOTA for depth and normal estimation of transparent objects, showing that diffusion knows transparency.

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23705
• PDF: https://arxiv.org/pdf/2512.23705
• Project Page: https://daniellli.github.io/projects/DKT/
• Github: https://github.com/Daniellli/DKT

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ComputerVision #DiffusionModels #DepthEstimation #TransparentObjects #AIResearch
SpotEdit: Selective Region Editing in Diffusion Transformers

📝 Summary:
SpotEdit is a training-free framework for selective image editing in diffusion transformers. It avoids reprocessing stable regions by reusing their features, combining them with edited areas. This reduces computation and preserves unchanged regions, enhancing efficiency and precision.
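
The feature-reuse idea can be pictured with the small PyTorch sketch below. Token-level blending and all names are assumptions; the paper's efficiency gains come from actually skipping computation for cached regions, which this sketch omits:

```python
import torch

def spot_blend(block, x, cached_out, edit_mask):
    """x: (B, N, D) image tokens; cached_out: features saved from the
    source-image pass; edit_mask: (N,) bool, True where a token lies in
    the region being edited."""
    fresh = block(x)  # features for the current (editing) pass
    mask = edit_mask.view(1, -1, 1).to(fresh.dtype)
    # Edited tokens take fresh features; stable tokens reuse the cache,
    # so unchanged regions are preserved exactly across the edit.
    return mask * fresh + (1.0 - mask) * cached_out
```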

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22323
• PDF: https://arxiv.org/pdf/2512.22323
• Project Page: https://biangbiang0321.github.io/SpotEdit.github.io

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ImageEditing #DiffusionModels #ComputerVision #AIResearch #DeepLearning
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.

🔹 Publication Date: Published on Dec 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

📝 Summary:
GRAN-TED improves text encoders for diffusion models by addressing evaluation and adaptation challenges. It introduces TED-6K, an efficient text-only benchmark that predicts generation quality 750x faster. Using this, GRAN-TED develops a superior encoder via a two-stage training method, enhancing...

🔹 Publication Date: Published on Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15560
• PDF: https://arxiv.org/pdf/2512.15560

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #TextEmbeddings #AIResearch #MachineLearning #NLP
Act2Goal: From World Model To General Goal-conditioned Policy

📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robo...
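
Schematically, the plan-then-track loop might look like the sketch below; every callable is a hypothetical stand-in for the paper's learned components:

```python
def act_to_goal(obs, goal, plan_subgoals, policy, env_step, reached,
                num_subgoals=4, steps_per_subgoal=50):
    """Plan intermediate states with a world model, then track each one."""
    for subgoal in plan_subgoals(obs, goal, num_subgoals):
        for _ in range(steps_per_subgoal):
            obs = env_step(policy(obs, subgoal))  # goal-conditioned action
            if reached(obs, subgoal):
                break                             # advance to next subgoal
    return obs
```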

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning