ML Research Hub
32.7K subscribers
5.64K photos
358 videos
24 files
6.09K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Helios: Real Real-Time Long Video Generation Model

📝 Summary:
Helios is a 14B autoregressive diffusion model that achieves real-time minute-scale video generation at 19.5 FPS on a single GPU. It overcomes long-video drifting and real-time performance challenges without relying on conventional acceleration or anti-drifting techniques. Helios supports T2V, ...
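
A quick back-of-the-envelope check of what 19.5 FPS means in practice. The helper names are ours, and the clip-length helper assumes the video plays back at the same 19.5 FPS it is generated at, which the summary does not state.

```python
def per_frame_budget_ms(fps: float) -> float:
    """Average wall-clock time available per generated frame at a given throughput."""
    return 1000.0 / fps


def frames_in_clip(fps: float, seconds: float) -> int:
    """Number of frames in a clip of the given length, assuming playback at `fps`."""
    return round(fps * seconds)


# At 19.5 FPS each frame gets roughly 51 ms on average,
# and a one-minute clip at that rate holds 1170 frames.
print(f"{per_frame_budget_ms(19.5):.1f} ms/frame, {frames_in_clip(19.5, 60)} frames/min")
```

In other words, "real-time" here means the model must produce each frame in about 51 ms on average, sustained for the whole minute.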

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04379
• PDF: https://arxiv.org/pdf/2603.04379
• Github: https://pku-yuangroup.github.io/Helios-Page/

🔹 Models citing this paper:
https://huggingface.co/BestWishYsh/Helios-Base
https://huggingface.co/BestWishYsh/Helios-Distilled
https://huggingface.co/BestWishYsh/Helios-Mid

🔹 Spaces citing this paper:
https://huggingface.co/spaces/multimodalart/Helios-Distilled

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

📝 Summary:
ArtHOI synthesizes articulated human-object interactions by formulating the task as 4D reconstruction from monocular video priors, using optical flow for part segmentation and a decoupled reconstruction pipeline ...

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04338
• PDF: https://arxiv.org/pdf/2603.04338
• Project Page: https://arthoi.github.io/
• Github: https://github.com/Inso-13/ArtHOI

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

📝 Summary:
A memory mechanism called Memex enables large language model agents to handle long-horizon tasks more effectively by maintaining compact context through structured summaries while storing full interac...
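
A minimal sketch of the indexed-memory idea, assuming (as the truncated summary suggests) that full interactions are archived outside the context window and referenced by an index key, while only short summaries stay in the prompt. The class, its methods, and the `mem-N` key format are hypothetical, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedMemory:
    """Hypothetical indexed experience memory for an LLM agent."""

    archive: dict[str, str] = field(default_factory=dict)  # full records, kept out of the prompt
    summaries: list[tuple[str, str]] = field(default_factory=list)  # (key, short summary)

    def store(self, full_record: str, summary: str) -> str:
        key = f"mem-{len(self.archive)}"
        self.archive[key] = full_record
        self.summaries.append((key, summary))
        return key

    def context(self) -> str:
        # Only the compact summaries and their keys are placed in the LLM context.
        return "\n".join(f"[{key}] {summary}" for key, summary in self.summaries)

    def recall(self, key: str) -> str:
        # The agent dereferences a key when it needs the full details again.
        return self.archive[key]


memory = IndexedMemory()
key = memory.store("<full 2,000-line test log>", "ran test suite: 3 failures in parser")
print(memory.context())  # [mem-0] ran test suite: 3 failures in parser
```

The design choice being illustrated: context length grows with the number of summaries, not with the size of the raw interactions.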

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04257
• PDF: https://arxiv.org/pdf/2603.04257

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Phi-4-reasoning-vision-15B Technical Report

📝 Summary:
A compact open-weight multimodal reasoning model is presented that achieves competitive performance through careful architecture design, high-quality data curation, and a hybrid approach combining dir...

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03975
• PDF: https://arxiv.org/pdf/2603.03975
• Project Page: https://www.microsoft.com/en-us/research/blog/phi-4-reasoning-vision-and-the-lessons-of-training-a-multimodal-reasoning-model/
• Github: https://github.com/microsoft/Phi-4-reasoning-vision-15B

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Heterogeneous Agent Collaborative Reinforcement Learning

📝 Summary:
HACRL enables collaborative reinforcement learning where heterogeneous agents share verified rollouts during training to improve collectively while maintaining independent operation at inference time,...
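
A toy sketch of sharing verified rollouts across heterogeneous agents during training. The `Rollout` type, the `verify` callback, and the off-policy flag are illustrative assumptions; the summary does not say how HACRL weights rollouts that come from other agents. At inference time each agent would simply run on its own, so no sharing logic is involved there.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rollout:
    agent: str  # which (heterogeneous) agent produced this trajectory
    prompt: str
    answer: str


def share_verified_rollouts(
    rollouts_by_agent: dict[str, list[Rollout]],
    verify: Callable[[Rollout], bool],
) -> dict[str, list[tuple[Rollout, bool]]]:
    """Pool every agent's rollouts, keep only verified ones, and give each agent
    the whole pool. Each entry is flagged True (off-policy) when another agent
    produced it, since a real trainer would need to correct for that."""
    pool = [r for rs in rollouts_by_agent.values() for r in rs if verify(r)]
    return {agent: [(r, r.agent != agent) for r in pool] for agent in rollouts_by_agent}
```

With this structure, a weaker agent can learn from a stronger agent's verified solutions (and vice versa) without the two ever being merged into one model.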

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02604
• PDF: https://arxiv.org/pdf/2603.02604
• Project Page: https://zzx-peter.github.io/hacrl/
• Github: https://zzx-peter.github.io/hacrl/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

📝 Summary:
The Structure-of-Thought prompting technique enhances language model performance by guiding explicit intermediate text structuring across diverse tasks, while the T2S-Bench benchmark evaluates and improves te...
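
A hypothetical prompt template in the spirit of Structure-of-Thought prompting: the model is first asked to build an explicit intermediate structure, then to answer from it. The triple format and the exact wording are our assumptions, not the template from the paper.

```python
def structure_of_thought_prompt(text: str, question: str) -> str:
    """Hypothetical two-step prompt: structure the text first, then answer from the structure."""
    return (
        "Step 1: Rewrite the text below as a list of (subject, relation, object) triples.\n"
        "Step 2: Using only those triples, answer the question.\n\n"
        f"Text:\n{text}\n\n"
        f"Question: {question}\n"
    )


print(structure_of_thought_prompt(
    "Ada Lovelace worked with Charles Babbage.",
    "Who did Ada Lovelace work with?",
))
```

Other intermediate structures (tables, trees, outlines) would slot into Step 1 the same way.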

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03790
• PDF: https://arxiv.org/pdf/2603.03790
• Project Page: https://t2s-bench.github.io/T2S-Bench-Page/
• Github: https://t2s-bench.github.io/T2S-Bench-Page/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

📝 Summary:
CubeComposer is a spatio-temporal autoregressive diffusion model that generates high-resolution 360° panoramic videos by decomposing them into cubemap representations and using efficient autoregressiv...
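
The summary mentions cubemap representations. As background only (this is not code from the paper), the standard OpenGL-style cube-map lookup below maps a 3D viewing direction to one of the six cube faces and to (u, v) coordinates on that face, which is how a 360° frame can be split into six perspective-like tiles.

```python
def direction_to_cube_face(x: float, y: float, z: float) -> tuple[str, float, float]:
    """Map a 3D direction to (face, u, v) with u, v in [0, 1], OpenGL cube-map style."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == ay == az == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:  # X is the major axis
        face, major = ("+x" if x > 0 else "-x"), ax
        sc, tc = (-z if x > 0 else z), -y
    elif ay >= az:  # Y is the major axis
        face, major = ("+y" if y > 0 else "-y"), ay
        sc, tc = x, (z if y > 0 else -z)
    else:  # Z is the major axis
        face, major = ("+z" if z > 0 else "-z"), az
        sc, tc = (x if z > 0 else -x), -y
    # Project onto the face plane and rescale from [-1, 1] to [0, 1].
    return face, (sc / major + 1) / 2, (tc / major + 1) / 2
```

The direction pointing straight down an axis lands at the centre of that face, e.g. `(1, 0, 0)` maps to `("+x", 0.5, 0.5)`.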

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04291
• PDF: https://arxiv.org/pdf/2603.04291
• Project Page: https://lg-li.github.io/project/cubecomposer
• Github: https://github.com/TencentARC/CubeComposer

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

📝 Summary:
Proact-VL is a multimodal framework that enables real-time interactive AI companions for gaming scenarios with low-latency responses and strong video understanding capabilities.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03447
• PDF: https://arxiv.org/pdf/2603.03447

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

📝 Summary:
MemSifter is a framework that uses a small proxy model to offload memory retrieval from large language models, employing reinforcement learning with task-performance rewards and training techniques li...
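
A toy sketch of offloading retrieval to a proxy: a small scorer ranks memories, only the top-k reach the large model, and the proxy's reward depends solely on whether the task was solved. `overlap_score` is a hypothetical word-overlap stand-in for the trained proxy model, and the binary reward is our simplification of the task-performance reward described in the summary.

```python
from typing import Callable


def overlap_score(query: str, memory: str) -> float:
    """Toy stand-in for the small proxy model: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(memory.lower().split())))


def sift_memories(
    query: str, memories: list[str], score: Callable[[str, str], float], k: int = 2
) -> list[str]:
    """Proxy-side retrieval: rank memories by score and pass only the top-k to the large LLM."""
    return sorted(memories, key=lambda m: score(query, m), reverse=True)[:k]


def outcome_reward(task_solved: bool) -> float:
    """Outcome-driven reward for the proxy: only the large LLM's task result counts."""
    return 1.0 if task_solved else 0.0
```

The point of the outcome-driven reward is that the proxy is never told which memory was "relevant"; it only learns from whether the downstream answer was right.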

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03379
• PDF: https://arxiv.org/pdf/2603.03379
• Github: https://github.com/plageon/MemSifter

🔹 Models citing this paper:
https://huggingface.co/zstanjj/MemSifter-4B-Thinking

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

📝 Summary:
MUSE is an open-source platform for evaluating multimodal safety in large language models, incorporating automated cross-modal attack generation and a dual-metric framework to assess alignment across ...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02482
• PDF: https://arxiv.org/pdf/2603.02482

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

📝 Summary:
SWE-CI presents a repository-level benchmark for evaluating code generation agents' ability to maintain code quality through long-term software evolution cycles.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03823
• PDF: https://arxiv.org/pdf/2603.03823

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RIVER: A Real-Time Interaction Benchmark for Video LLMs

📝 Summary:
RIVER Bench is introduced to evaluate real-time video comprehension through retrospective memory, live-perception, and proactive anticipation tasks. The benchmark reveals that current offline models struggle with real-time processing, long-term memory, and future perception, highlighting the need for...

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03985
• PDF: https://arxiv.org/pdf/2603.03985
• Github: https://github.com/OpenGVLab/RIVER

🔹 Datasets citing this paper:
https://huggingface.co/datasets/nanamma/RIVER

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

📝 Summary:
EmbodiedSplat provides real-time 3D scene understanding, combining online 3D Gaussian Splatting with CLIP embeddings from streaming images. It simultaneously reconstructs and semantically comprehends 3D scenes using a novel sparse coefficients field and CLIP global codebook for efficiency and gen...
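
One plausible reading of the "sparse coefficients field and CLIP global codebook" idea, sketched under assumptions: each Gaussian stores a few weights over a shared codebook of CLIP-like embeddings instead of a full feature vector, and an open-vocabulary query compares the reconstructed feature with a text embedding by cosine similarity. The function names and the toy 2-D vectors are hypothetical; real CLIP embeddings have hundreds of dimensions.

```python
import math


def gaussian_feature(sparse_coeffs: dict[int, float], codebook: list[list[float]]) -> list[float]:
    """Rebuild one Gaussian's semantic feature as a sparse weighted sum of shared
    codebook vectors, then L2-normalize it."""
    dim = len(codebook[0])
    feat = [0.0] * dim
    for idx, weight in sparse_coeffs.items():
        for d in range(dim):
            feat[d] += weight * codebook[idx][d]
    norm = math.sqrt(sum(v * v for v in feat)) or 1.0  # avoid dividing by zero
    return [v / norm for v in feat]


def relevance(feature: list[float], text_embedding: list[float]) -> float:
    """Cosine similarity between a unit-length Gaussian feature and a text query embedding."""
    dot = sum(a * b for a, b in zip(feature, text_embedding))
    norm = math.sqrt(sum(b * b for b in text_embedding)) or 1.0
    return dot / norm
```

Storing a handful of (index, weight) pairs per Gaussian instead of a full embedding is what would make such a representation cheap enough to update online.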

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04254
• PDF: https://arxiv.org/pdf/2603.04254
• Project Page: https://0nandon.github.io/EmbodiedSplat/
• Github: https://github.com/0nandon/EmbodiedSplat

==================================

#3DSceneUnderstanding #3DGaussianSplatting #ComputerVision #AI #NeuralRendering
GroupEnsemble: Efficient Uncertainty Estimation for DETR-based Object Detection

📝 Summary:
DETR models provide no spatial uncertainty estimates, and current estimation methods are too costly. GroupEnsemble estimates uncertainty efficiently by running independent query groups in a single forward pass, kept separate by an attention mask. It outperforms Deep Ensembles at a fraction of the cost.
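
A minimal sketch of the grouping idea as described in the summary, assuming a DETR-style decoder whose query self-attention accepts a boolean mask. The function names and the use of per-coordinate box variance as the uncertainty measure are our assumptions, not the paper's implementation.

```python
from statistics import mean, pvariance


def group_attention_mask(num_groups: int, queries_per_group: int) -> list[list[bool]]:
    """Boolean self-attention mask over all decoder queries.

    mask[i][j] is True when query i must NOT attend to query j, matching the
    boolean `attn_mask` convention of PyTorch's nn.MultiheadAttention. Queries
    only see their own group, so each group acts like a separate ensemble member.
    """
    n = num_groups * queries_per_group
    return [
        [i // queries_per_group != j // queries_per_group for j in range(n)]
        for i in range(n)
    ]


def box_uncertainty(group_boxes: list[list[float]]) -> tuple[list[float], list[float]]:
    """Mean box and per-coordinate variance for one object across query groups.

    Each row is one group's [cx, cy, w, h] prediction for the same object; matching
    predictions across groups is assumed to have happened already.
    """
    coords = list(zip(*group_boxes))
    return [mean(c) for c in coords], [pvariance(c) for c in coords]
```

Because the mask makes the groups independent, one forward pass yields several ensemble-like predictions per object, whose spread serves as the spatial uncertainty.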

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01847
• PDF: https://arxiv.org/pdf/2603.01847

==================================

#ObjectDetection #UncertaintyEstimation #DETR #ComputerVision #MachineLearning
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

📝 Summary:
This paper introduces InfinityStory, a novel framework, dataset, and model for long-form video generation. It tackles challenges in background consistency and seamless multi-subject transitions, achieving high consistency and smoother transitions on VBench.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03646
• PDF: https://arxiv.org/pdf/2603.03646

==================================

#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ComputerVision
BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

📝 Summary:
BeamPERL improved a compact LLM's beam statics performance by 66.7% using RL with verifiable rewards. However, the model learned procedural solution patterns rather than true physical reasoning, and it failed under topological shifts. This shows that verifiable rewards alone don't guarantee transferable scientific rea...
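
A minimal example of what a verifiable reward can look like in this setting, assuming a simply supported beam of span L with a single point load P at distance a from the left support A, for which statics gives R_A = P(L - a)/L and R_B = Pa/L. The answer format `R_A = <value>, R_B = <value>` and the 1% tolerance are hypothetical; the paper's reward may differ.

```python
import math
import re


def reactions_simply_supported(P: float, a: float, L: float) -> tuple[float, float]:
    """Support reactions (R_A, R_B) for point load P at distance a from support A."""
    return P * (L - a) / L, P * a / L


def verifiable_reward(completion: str, P: float, a: float, L: float, rel_tol: float = 1e-2) -> float:
    """Binary reward: 1.0 only if both reported reactions match the analytic answer."""
    # Parse answers such as "R_A = 6.0, R_B = 4" from the model's completion.
    found = dict(re.findall(r"(R_[AB])\s*=\s*(-?\d+(?:\.\d+)?)", completion))
    if set(found) != {"R_A", "R_B"}:
        return 0.0  # a missing or unparseable answer earns nothing
    r_a, r_b = reactions_simply_supported(P, a, L)
    correct = math.isclose(float(found["R_A"]), r_a, rel_tol=rel_tol) and math.isclose(
        float(found["R_B"]), r_b, rel_tol=rel_tol
    )
    return 1.0 if correct else 0.0
```

A reward like this checks only the final numbers, which is consistent with the finding above: a model can maximize it by learning a solution procedure for familiar beam layouts without acquiring reasoning that transfers to new topologies.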

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04124
• PDF: https://arxiv.org/pdf/2603.04124
• Project Page: https://huggingface.co/collections/lamm-mit/beamperl
• Github: https://github.com/lamm-mit/BeamPERL

🔹 Models citing this paper:
https://huggingface.co/lamm-mit/BeamPERL

🔹 Datasets citing this paper:
https://huggingface.co/datasets/lamm-mit/BeamRL-TrainData
https://huggingface.co/datasets/lamm-mit/BeamRL-EvalData

==================================

#LLM #ReinforcementLearning #BeamMechanics #AIResearch #DeepLearning
Qwen Technical Report

📝 Summary:
Qwen is a series of large language models encompassing base, chat, coding, and mathematics variants. These models consistently achieve superior performance across diverse tasks, significantly outperforming open-source counterparts. Qwen-Chat models also feature advanced tool-use and planning capa...

🔹 Publication Date: Published on Sep 28, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2309.16609
• PDF: https://arxiv.org/pdf/2309.16609
• Github: https://github.com/QwenLM/Qwen-7B

🔹 Models citing this paper:
https://huggingface.co/Qwen/Qwen-7B-Chat
https://huggingface.co/Qwen/Qwen-7B
https://huggingface.co/Qwen/Qwen-14B-Chat

🔹 Datasets citing this paper:
https://huggingface.co/datasets/huyxdang/qwen-medqa-tagged
https://huggingface.co/datasets/huyxdang/qwen-math-predictions

🔹 Spaces citing this paper:
https://huggingface.co/spaces/pliny-the-prompter/obliteratus
https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
https://huggingface.co/spaces/lhoestq/fake-data-generator-jsonl

==================================

#Qwen #LLM #AI #NLP #DeepLearning
MIBURI: Towards Expressive Interactive Gesture Synthesis

📝 Summary:
MIBURI is an online, real-time framework generating expressive full-body gestures and facial expressions for spoken dialogue. It uses body-part aware codecs and LLM embeddings to create natural, diverse, and contextually aligned motions causally, overcoming limitations of prior methods.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03282
• PDF: https://arxiv.org/pdf/2603.03282
• Project Page: https://vcai.mpi-inf.mpg.de/projects/MIBURI/
• Github: https://github.com/m-hamza-mughal/miburi

==================================

#GestureSynthesis #AI #HumanComputerInteraction #NLP #RealtimeTech