ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

📝 Summary:
This report evaluated 7 frontier AI models for safety across language, vision-language, and image generation. It found varied safety performance, with GPT-5.2 consistently strong. All models showed significant vulnerability to adversarial attacks, highlighting the multidimensional nature of AI sa...

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10527
• PDF: https://arxiv.org/pdf/2601.10527

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

📝 Summary:
Text-to-image diffusion models enhanced with language model reasoning capabilities achieve improved factual consistency and semantic alignment through a think-then-generate paradigm with dual-gradient...

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10332
• PDF: https://arxiv.org/pdf/2601.10332
• Project Page: https://zhijie-group.github.io/Think-Then-Generate/
• Github: https://github.com/zhijie-group/Think-Then-Generate

==================================

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

📝 Summary:
Molmo2 is a new open-source video-language model family that achieves state-of-the-art performance through novel datasets and training methods, particularly excelling in video grounding tasks without ...

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10611
• PDF: https://arxiv.org/pdf/2601.10611

==================================

Inference-time Physics Alignment of Video Generative Models with Latent World Models

📝 Summary:
Latent world models enhance the physical plausibility of generated video through inference-time alignment and trajectory steering, achieving superior performance in challenging benchmarks.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10553
• PDF: https://arxiv.org/pdf/2601.10553

==================================

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

📝 Summary:
A large-scale Chinese image-text dataset called DanQing is introduced to advance vision-language pretraining, demonstrating superior performance in various downstream tasks through continual pretraini...

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10305
• PDF: https://arxiv.org/pdf/2601.10305
• Project Page: https://deepglint.github.io/DanQing/
• Github: https://github.com/deepglint/DanQing

==================================

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

📝 Summary:
Chain-of-Frame reasoning is integrated into text-to-image generation through progressive visual refinement with explicit intermediate steps, achieving superior performance on benchmark datasets.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10061
• PDF: https://arxiv.org/pdf/2601.10061
• Project Page: https://cof-t2i.github.io/
• Github: https://github.com/VisionChengzhuo/CoF-T2I

==================================

MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

📝 Summary:
MatchTIR enhances LLM reasoning by introducing fine-grained credit assignment through bipartite matching and dual-level advantage estimation for tool-integrated tasks.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10712
• PDF: https://arxiv.org/pdf/2601.10712
• Project Page: https://huggingface.co/collections/ChangleQu/matchtir
• Github: https://github.com/quchangle1/MatchTIR

🔹 Models citing this paper:
https://huggingface.co/ChangleQu/Qwen3-8B-MatchTIR-KM
https://huggingface.co/ChangleQu/Qwen3-8B-MatchTIR-OT

==================================

FlowAct-R1: Towards Interactive Humanoid Video Generation

📝 Summary:
FlowAct-R1 enables real-time interactive humanoid video generation with high-fidelity synthesis and low-latency responsiveness through MMDiT architecture and chunkwise diffusion forcing strategies.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10103
• PDF: https://arxiv.org/pdf/2601.10103

==================================

Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs

📝 Summary:
Reinforcement learning for large language models is enhanced by a rollout-level objective that rewards rare high-level reasoning strategies, improving diverse solution discovery without sacrificing in...

🔹 Publication Date: Published on Jan 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.08763
• PDF: https://arxiv.org/pdf/2601.08763

==================================

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

📝 Summary:
Multi-Agent Test-Time Reinforcement Learning (MATTRL) enhances multi-agent reasoning through structured textual experience injection and consensus-based decision making at inference time.

🔹 Publication Date: Published on Jan 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09667
• PDF: https://arxiv.org/pdf/2601.09667

==================================

EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge

📝 Summary:
EvasionBench introduces a large-scale benchmark for detecting evasive responses in earnings calls using a multi-model annotation framework that leverages disagreement between advanced language models ...

🔹 Publication Date: Published on Jan 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09142
• PDF: https://arxiv.org/pdf/2601.09142

🔹 Models citing this paper:
https://huggingface.co/FutureMa/Eva-4B

🔹 Spaces citing this paper:
https://huggingface.co/spaces/FutureMa/financial-evasion-detection

==================================

ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback

📝 Summary:
A guardrail model and reasoning framework are developed to detect and prevent unsafe tool invocations in LLM agents, improving both safety and task performance under adversarial conditions.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10156
• PDF: https://arxiv.org/pdf/2601.10156
• Github: https://github.com/MurrayTom/ToolSafe

==================================

Transition Matching Distillation for Fast Video Generation

📝 Summary:
Transition Matching Distillation enables efficient video generation by distilling diffusion models into few-step predictors using conditional flows and semantic representation decomposition.

🔹 Publication Date: Published on Jan 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09881
• PDF: https://arxiv.org/pdf/2601.09881

==================================

Action100M: A Large-scale Video Action Dataset

📝 Summary:
Action100M is a large-scale video action dataset constructed from internet instructional videos using automated pipelines with V-JEPA embeddings and GPT-based reasoning for structured annotations.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10592
• PDF: https://arxiv.org/pdf/2601.10592

==================================

STEP3-VL-10B Technical Report

📝 Summary:
STEP3-VL-10B is a lightweight 10B multimodal model that rivals much larger models and proprietary flagships in performance. It uses unified pre-training, scaled post-training, and Parallel Coordinated Reasoning for efficient visual reasoning. This open-source model sets a new standard for compact...

🔹 Publication Date: Published on Jan 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09668
• PDF: https://arxiv.org/pdf/2601.09668
• Project Page: https://stepfun-ai.github.io/Step3-VL-10B
• Github: https://github.com/stepfun-ai/Step3-VL-10B

🔹 Models citing this paper:
https://huggingface.co/stepfun-ai/Step3-VL-10B
https://huggingface.co/stepfun-ai/Step3-VL-10B-Base

==================================

PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary

📝 Summary:
Process Reward Learning decomposes reinforcement learning objectives into intermediate steps to provide fine-grained supervision for improving large language model reasoning abilities.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10201
• PDF: https://arxiv.org/pdf/2601.10201
• Github: https://github.com/MaxwellJryao/Process-Reward-Learning

==================================

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

📝 Summary:
LaViT addresses the perception gap in multimodal reasoning by aligning latent visual thoughts through autoregressive reconstruction of visual semantics and attention trajectories, improving visual gro...

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10129
• PDF: https://arxiv.org/pdf/2601.10129
• Github: https://github.com/Svardfox/LaViT

🔹 Models citing this paper:
https://huggingface.co/Svard/LaViT-3B

==================================

Deriving Character Logic from Storyline as Codified Decision Trees

📝 Summary:
Executable and interpretable decision trees are induced from narrative data to create robust behavioral profiles for role-playing agents, outperforming traditional methods in consistency and reliabili...

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10080
• PDF: https://arxiv.org/pdf/2601.10080

==================================

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

📝 Summary:
SocioReasoner, a vision-language AI, performs urban socio-semantic segmentation of social entities. It simulates human reasoning using reinforcement learning on a new dataset. This approach outperforms state-of-the-art models, achieving strong zero-shot generalization.

🔹 Publication Date: Published on Jan 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10477
• PDF: https://arxiv.org/pdf/2601.10477
• Github: https://github.com/AMAP-ML/SocioReasoner

==================================

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

📝 Summary:
A logic-structured training framework explicitly models instruction logic through constraint-aware reward mechanisms, improving instruction-following and reasoning capabilities in large language model...

🔹 Publication Date: Published on Jan 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06431
• PDF: https://arxiv.org/pdf/2601.06431

==================================

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

📝 Summary:
A novel framework injects semantic intent into Mixture-of-Experts routing for image generation and editing, resolving task interference through hierarchical task annotation and predictive alignment re...

🔹 Publication Date: Published on Jan 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.08881
• PDF: https://arxiv.org/pdf/2601.08881
• Project Page: https://yuci-gpt.github.io/TAG-MoE/

==================================
