ML Research Hub
32.8K subscribers
5.58K photos
354 videos
24 files
6.04K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Using Songs to Improve Kazakh Automatic Speech Recognition

📝 Summary:
This study improves automatic speech recognition (ASR) for Kazakh, a low-resource language, by using songs as a novel data source. Fine-tuning models on song data, especially in combination with existing corpora, significantly boosts performance and offers meaningful adaptation gains.

🔹 Publication Date: Published on Mar 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00961
• PDF: https://arxiv.org/pdf/2603.00961

🔹 Datasets citing this paper:
https://huggingface.co/datasets/yeshpanovrustem/kazakh_songs_asr

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#KazakhASR #LowResourceNLP #SpeechRecognition #DataInnovation #MachineLearning
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos

📝 Summary:
A large video scene graph dataset, SVG2, and a new model, TRaSER, are introduced. TRaSER generates spatio-temporal scene graphs, significantly improving relation, object, and attribute prediction, and boosting video question answering accuracy.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23543
• PDF: https://arxiv.org/pdf/2602.23543
• Project Page: https://uwgzq.github.io/papers/SVG2/

🔹 Models citing this paper:
https://huggingface.co/UWGZQ/TRASER

🔹 Datasets citing this paper:
https://huggingface.co/datasets/UWGZQ/Synthetic_Visual_Genome2

==================================

#VideoSceneGraphs #SpatioTemporal #ComputerVision #VideoQA #DeepLearning
PhotoBench: Beyond Visual Matching Towards Personalized Intent-Driven Photo Retrieval

📝 Summary:
PhotoBench introduces a new benchmark for personalized, intent-driven photo retrieval from authentic albums, moving beyond visual matching. It shows current models struggle with non-visual constraints and multi-source fusion, stressing the need for robust agentic reasoning systems.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01493
• PDF: https://arxiv.org/pdf/2603.01493
• Github: https://github.com/LaVieEnRose365/PhotoBench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Unified Vision-Language Modeling via Concept Space Alignment

📝 Summary:
V-SONAR extends the text-only SONAR embedding space to support vision-language tasks through post-hoc alignment, enabling zero-shot visual concept understanding and outperforming state-of-the-art mode...

🔹 Publication Date: Published on Mar 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01096
• PDF: https://arxiv.org/pdf/2603.01096

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

📝 Summary:
Classroom Final Exam (CFE) is a multimodal benchmark that uses authentic university STEM exam problems to assess LLM reasoning. Frontier models achieve only ~60% accuracy, struggling with multi-step solutions and with maintaining intermediate states, which highlights significant room for improvement.

🔹 Publication Date: Published on Feb 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.19517
• PDF: https://arxiv.org/pdf/2602.19517
• Project Page: https://analogyai.ai/cfe_bench.html
• Github: https://github.com/Analogy-AI/CFE_Bench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Cryo-Bench: Benchmarking Foundation Models for Cryosphere Applications

📝 Summary:
Cryo-Bench benchmarks Geo-Foundation Models (GFMs) on cryosphere tasks, addressing a data gap. It evaluates 14 GFMs and finds that they adapt well despite limited pretraining. For optimal results, encoder fine-tuning with hyperparameter optimization is recommended.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01576
• PDF: https://arxiv.org/pdf/2603.01576
• Github: https://github.com/Sk-2103/Cryo-Bench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Planning from Observation and Interaction

📝 Summary:
This paper presents a planning-based Inverse Reinforcement Learning algorithm for real-world robot manipulation. It learns effectively from observation and interaction alone, without prior knowledge or pre-training. The approach demonstrates superior sample efficiency and enables online transfer ...

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24121
• PDF: https://arxiv.org/pdf/2602.24121
• Project Page: https://uwrobotlearning.github.io/mpail2/
• Github: https://github.com/UWRobotLearning/mpail2

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

📝 Summary:
CMI-RewardBench establishes a comprehensive ecosystem for evaluating music reward models under compositional multimodal instruction. It provides large-scale datasets, a unified benchmark for various alignment tasks, and CMI reward models that correlate strongly with human judgments.

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00610
• PDF: https://arxiv.org/pdf/2603.00610
• Github: https://github.com/Haiwen-Xia/CMI-RewardBench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

📝 Summary:
SteerEval is a new hierarchical benchmark to evaluate large language model controllability across language, sentiment, and personality. It shows that control often degrades at finer-grained levels, providing a framework for safer and more controllable LLM behavior.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02578
• PDF: https://arxiv.org/pdf/2603.02578

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

📝 Summary:
Generative Reward Models can be improved by structuring Chain-of-Thought reasoning into breadth and depth components and optimizing them through supervised fine-tuning and reinforcement learning with ...

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01571
• PDF: https://arxiv.org/pdf/2603.01571
• Project Page: https://huggingface.co/collections/DonJoey/mix-grm

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
DREAM: Where Visual Understanding Meets Text-to-Image Generation

📝 Summary:
DREAM is a unified multimodal framework that combines visual representation learning and text-to-image generation through progressive masking and semantically aligned decoding, achieving superior perf...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02667
• PDF: https://arxiv.org/pdf/2603.02667
• Github: https://github.com/chaoli-charlie/dream

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference

📝 Summary:
PRISM is a Process Reward Model-guided inference algorithm that enhances DEEPTHINK systems by using step-level verification to improve population refinement and solution aggregation, achieving strong ...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02479
• PDF: https://arxiv.org/pdf/2603.02479

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

📝 Summary:
Unified multimodal models generally underperform specialized VLMs in generation-to-understanding tasks. However, they show consistent enhancements in spatial intelligence, visual illusions, and multi-round reasoning. This highlights the need for diverse training data to unlock their full potential.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03241
• PDF: https://arxiv.org/pdf/2603.03241
• Project Page: https://nssmd.github.io/unig2u.github.io/
• Github: https://github.com/nssmd/UniG2U

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing

📝 Summary:
NOVA is a novel unpaired video editing framework that uses sparse semantic guidance and dense synthesis to achieve high-fidelity editing with improved motion preservation and temporal coherence.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02802
• PDF: https://arxiv.org/pdf/2603.02802
• Github: https://github.com/WeChatCV/NovaEdit

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

📝 Summary:
A scalable data generation pipeline creates high-fidelity video editing training data, and a unified architecture enables improved instruction-following and reference fidelity in controllable video ed...

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02175
• PDF: https://arxiv.org/pdf/2603.02175
• Project Page: https://showlab.github.io/Kiwi-Edit/
• Github: https://github.com/showlab/Kiwi-Edit

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Kling-MotionControl Technical Report

📝 Summary:
Kling-MotionControl is a DiT-based framework for character animation that combines heterogeneous motion representations, adaptive identity-agnostic learning, and advanced acceleration techniques to ac...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03160
• PDF: https://arxiv.org/pdf/2603.03160

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HateMirage: An Explainable Multi-Dimensional Dataset for Decoding Faux Hate and Subtle Online Abuse

📝 Summary:
HateMirage is a new dataset designed to advance research on hate speech embedded in misinformation by providing multi-dimensional annotations for target, intent, and implication, offering a benchmark ...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02684
• PDF: https://arxiv.org/pdf/2603.02684
• Github: https://github.com/Sai-Kartheek-Reddy/HateMirage

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SGDC: Structurally-Guided Dynamic Convolution for Medical Image Segmentation

📝 Summary:
Structure-Guided Dynamic Convolution enhances medical image segmentation by using explicit structural guidance to preserve fine-grained details lost through traditional average pooling methods.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23496
• PDF: https://arxiv.org/pdf/2602.23496
• Github: https://github.com/solstice0621/SGDC

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Token Reduction via Local and Global Contexts Optimization for Efficient Video Large Language Models

📝 Summary:
The AOT framework reduces video token redundancy through local-global optimal transport to preserve informative contexts while achieving efficient spatiotemporal compression in video large language models...

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01400
• PDF: https://arxiv.org/pdf/2603.01400
• Project Page: https://tyroneli.github.io/AOT/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels

📝 Summary:
A feedforward model called Track4World enables efficient holistic 3D tracking of every pixel in a video by utilizing a global 3D scene representation and a novel 3D correlation scheme for dense flow est...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02573
• PDF: https://arxiv.org/pdf/2603.02573
• Project Page: https://jiah-cloud.github.io/Track4World.github.io/
• Github: https://github.com/TencentARC/Track4World

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Beyond Language Modeling: An Exploration of Multimodal Pretraining

📝 Summary:
Controlled multimodal pretraining experiments reveal key insights about unified visual representations, data complementarity, world modeling emergence, and efficient scaling through mixture-of-experts...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03276
• PDF: https://arxiv.org/pdf/2603.03276

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research