ML Research Hub
32.7K subscribers
5.64K photos
358 videos
24 files
6.09K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

📝 Summary:
Proact-VL is a multimodal framework that enables real-time interactive AI companions for gaming scenarios with low-latency responses and strong video understanding capabilities.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03447
• PDF: https://arxiv.org/pdf/2603.03447

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

📝 Summary:
MemSifter is a framework that uses a small proxy model to offload memory retrieval from large language models, employing reinforcement learning with task-performance rewards and training techniques li...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03379
• PDF: https://arxiv.org/pdf/2603.03379
• Github: https://github.com/plageon/MemSifter

🔹 Models citing this paper:
https://huggingface.co/zstanjj/MemSifter-4B-Thinking

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

📝 Summary:
MUSE is an open-source platform for evaluating multimodal safety in large language models, incorporating automated cross-modal attack generation and a dual-metric framework to assess alignment across ...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02482
• PDF: https://arxiv.org/pdf/2603.02482

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

📝 Summary:
SWE-CI presents a repository-level benchmark for evaluating code generation agents' ability to maintain code quality through long-term software evolution cycles.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03823
• PDF: https://arxiv.org/pdf/2603.03823

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RIVER: A Real-Time Interaction Benchmark for Video LLMs

📝 Summary:
RIVER Bench is introduced to evaluate real-time video comprehension through retrospective memory, live-perception, and proactive anticipation tasks. This benchmark reveals current offline models struggle with real-time processing, long-term memory, and future perception, highlighting the need for...

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03985
• PDF: https://arxiv.org/pdf/2603.03985
• Github: https://github.com/OpenGVLab/RIVER

🔹 Datasets citing this paper:
https://huggingface.co/datasets/nanamma/RIVER

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

📝 Summary:
EmbodiedSplat provides real-time 3D scene understanding, combining online 3D Gaussian Splatting with CLIP embeddings from streaming images. It simultaneously reconstructs and semantically comprehends 3D scenes using a novel sparse coefficients field and CLIP global codebook for efficiency and gen...

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04254
• PDF: https://arxiv.org/pdf/2603.04254
• Project Page: https://0nandon.github.io/EmbodiedSplat/
• Github: https://github.com/0nandon/EmbodiedSplat

==================================

#3DSceneUnderstanding #3DGaussianSplatting #ComputerVision #AI #NeuralRendering
GroupEnsemble: Efficient Uncertainty Estimation for DETR-based Object Detection

📝 Summary:
DETR models do not provide spatial uncertainty estimates, and existing estimation methods are too costly. GroupEnsemble estimates uncertainty efficiently by running independent query groups in a single forward pass, using an attention mask to keep the groups separate. It outperforms Deep Ensembles at a fraction of the cost.
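
The masking idea in the summary above can be illustrated with a minimal sketch. It assumes the mask is block-diagonal over decoder queries and that uncertainty is read off as the variance of box coordinates across groups; the summary does not spell out either detail, and the function names `group_attention_mask` and `box_mean_and_variance` are hypothetical.

```python
from statistics import mean, pvariance


def group_attention_mask(num_groups, queries_per_group):
    """Return a square boolean mask where True means "attention blocked".

    Queries may attend only to queries in their own group, so each group
    behaves like an independent ensemble member within one forward pass.
    (Assumed layout: groups are contiguous blocks of queries.)
    """
    group_ids = [g for g in range(num_groups) for _ in range(queries_per_group)]
    return [[gi != gj for gj in group_ids] for gi in group_ids]


def box_mean_and_variance(boxes):
    """Aggregate one (cx, cy, w, h) box per group for the same object.

    Returns the per-coordinate mean and population variance; using the
    variance as the spatial uncertainty is an assumption of this sketch.
    """
    coords = list(zip(*boxes))
    return [mean(c) for c in coords], [pvariance(c) for c in coords]


mask = group_attention_mask(num_groups=3, queries_per_group=2)
center, spread = box_mean_and_variance([(10, 20, 4, 4), (12, 20, 4, 4)])
```

With three groups of two queries, each query is blocked from the four queries outside its group, so only the 2x2 blocks on the diagonal remain open.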

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01847
• PDF: https://arxiv.org/pdf/2603.01847

==================================

#ObjectDetection #UncertaintyEstimation #DETR #ComputerVision #MachineLearning
InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

📝 Summary:
This paper introduces InfinityStory, a novel framework, dataset, and model for long-form video generation. It tackles challenges in background consistency and seamless multi-subject transitions, achieving high consistency and smoother transitions on VBench.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03646
• PDF: https://arxiv.org/pdf/2603.03646

==================================

#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ComputerVision
BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

📝 Summary:
BeamPERL improved a compact LLM's beam statics performance by 66.7% using RL with verifiable rewards. However, it learned procedural solution patterns rather than true physical reasoning, failing at topological shifts. This shows verifiable rewards alone don't guarantee transferable scientific rea...
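
To make "verifiable reward" concrete, here is a minimal sketch for the simplest beam statics case: a simply supported beam with a single point load, whose support reactions follow from moment equilibrium. The binary reward, the tolerance, and the function names are assumptions for illustration; the summary does not describe the reward actually used in the paper.

```python
import math


def simply_supported_reactions(span, load, load_pos):
    """Support reactions (R_A, R_B) for a point load on a simply supported beam.

    Taking moments about the left support A gives R_B = load * load_pos / span,
    and vertical force balance gives R_A = load - R_B.
    """
    r_b = load * load_pos / span
    r_a = load - r_b
    return r_a, r_b


def verifiable_reward(predicted, span, load, load_pos, rel_tol=0.01):
    """Return 1.0 if both predicted reactions match the analytic solution.

    The binary scoring and the 1% relative tolerance are assumptions
    made for this sketch.
    """
    expected = simply_supported_reactions(span, load, load_pos)
    ok = all(
        math.isclose(p, e, rel_tol=rel_tol, abs_tol=1e-9)
        for p, e in zip(predicted, expected)
    )
    return 1.0 if ok else 0.0


# A 10 m beam with a 100 N load placed 2.5 m from the left support.
reward_correct = verifiable_reward((75.0, 25.0), span=10.0, load=100.0, load_pos=2.5)
reward_wrong = verifiable_reward((50.0, 50.0), span=10.0, load=100.0, load_pos=2.5)
```

A checker like this only confirms the final numbers, which is consistent with the summary's point that a model can score well while following a procedure rather than reasoning about the physics.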

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04124
• PDF: https://arxiv.org/pdf/2603.04124
• Project Page: https://huggingface.co/collections/lamm-mit/beamperl
• Github: https://github.com/lamm-mit/BeamPERL

🔹 Models citing this paper:
https://huggingface.co/lamm-mit/BeamPERL

🔹 Datasets citing this paper:
https://huggingface.co/datasets/lamm-mit/BeamRL-TrainData
https://huggingface.co/datasets/lamm-mit/BeamRL-EvalData

==================================

#LLM #ReinforcementLearning #BeamMechanics #AIResearch #DeepLearning
Qwen Technical Report

📝 Summary:
Qwen is a series of large language models encompassing base, chat, coding, and mathematics variants. These models consistently achieve superior performance across diverse tasks, significantly outperforming open-source counterparts. Qwen-Chat models also feature advanced tool-use and planning capa...

🔹 Publication Date: Published on Sep 28, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2309.16609
• PDF: https://arxiv.org/pdf/2309.16609
• Github: https://github.com/QwenLM/Qwen-7B

🔹 Models citing this paper:
https://huggingface.co/Qwen/Qwen-7B-Chat
https://huggingface.co/Qwen/Qwen-7B
https://huggingface.co/Qwen/Qwen-14B-Chat

🔹 Datasets citing this paper:
https://huggingface.co/datasets/huyxdang/qwen-medqa-tagged
https://huggingface.co/datasets/huyxdang/qwen-math-predictions

🔹 Spaces citing this paper:
https://huggingface.co/spaces/pliny-the-prompter/obliteratus
https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
https://huggingface.co/spaces/lhoestq/fake-data-generator-jsonl

==================================

#Qwen #LLM #AI #NLP #DeepLearning
MIBURI: Towards Expressive Interactive Gesture Synthesis

📝 Summary:
MIBURI is an online, real-time framework generating expressive full-body gestures and facial expressions for spoken dialogue. It uses body-part aware codecs and LLM embeddings to create natural, diverse, and contextually aligned motions causally, overcoming limitations of prior methods.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03282
• PDF: https://arxiv.org/pdf/2603.03282
• Project Page: https://vcai.mpi-inf.mpg.de/projects/MIBURI/
• Github: https://github.com/m-hamza-mughal/miburi

==================================

#GestureSynthesis #AI #HumanComputerInteraction #NLP #RealtimeTech
Specificity-aware reinforcement learning for fine-grained open-world classification

📝 Summary:
A novel RL framework, SpeciaRL, improves large multimodal models for open-world fine-grained classification. It enhances prediction specificity while maintaining correctness using a dynamic verifier-based reward. Experiments show SpeciaRL achieves the best trade-off between the two.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03197
• PDF: https://arxiv.org/pdf/2603.03197

==================================

#ReinforcementLearning #MachineLearning #ComputerVision #AI #MultimodalAI
HDINO: A Concise and Efficient Open-Vocabulary Detector

📝 Summary:
HDINO is an efficient open-vocabulary detector using a two-stage training strategy. It employs One-to-Many Semantic Alignment and lightweight feature fusion, avoiding manual data curation and complex feature extraction. HDINO achieves superior performance on COCO with less training data.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02924
• PDF: https://arxiv.org/pdf/2603.02924
• Github: https://github.com/HaoZ416/HDINO

==================================

#ObjectDetection #ComputerVision #OpenVocabulary #DeepLearning #AIResearch
Qwen2.5 Technical Report

📝 Summary:
Qwen2.5, an enhanced series of large language models, demonstrates superior performance across various benchmarks and use cases through extensive pre-training and advanced post-training techniques.

🔹 Publication Date: Published on Dec 19, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2412.15115
• PDF: https://arxiv.org/pdf/2412.15115
• Github: https://github.com/QwenLM/Qwen2.5

🔹 Models citing this paper:
https://huggingface.co/Qwen/QwQ-32B
https://huggingface.co/Qwen/QwQ-32B-GGUF
https://huggingface.co/Qwen/QwQ-32B-AWQ

🔹 Datasets citing this paper:
https://huggingface.co/datasets/HuggingFaceTB/smoltalk2

🔹 Spaces citing this paper:
https://huggingface.co/spaces/modelscope/DocResearch
https://huggingface.co/spaces/ITHwangg/candle-qwen25-wasm-demo
https://huggingface.co/spaces/GuminiResearch/Gumini_sLLM_Report

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

📝 Summary:
This study empirically analyzes visual token pruning in LVLMs. It finds attention-based pruning is better for simple images, while diversity-based methods suit complex ones. These insights lead to improved adaptive pruning strategies that reduce hallucination.
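
The two pruning families contrasted in the summary above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: attention-based pruning is shown as top-k by a per-token attention score, diversity-based pruning as greedy farthest-point selection under cosine distance, and the `is_complex` flag is a hypothetical caller-supplied stand-in for whatever complexity signal the adaptive strategy uses. Feature vectors are assumed to be non-zero.

```python
import math


def prune_by_attention(attn_scores, keep):
    """Keep the indices of the `keep` tokens with the highest attention score."""
    ranked = sorted(range(len(attn_scores)), key=lambda i: attn_scores[i], reverse=True)
    return sorted(ranked[:keep])


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def prune_by_diversity(features, keep):
    """Greedy farthest-point selection: repeatedly keep the token that is
    farthest (in cosine distance) from every token kept so far."""
    keep = min(keep, len(features))
    if keep == 0:
        return []
    selected = [0]  # arbitrary seed token (an assumption of this sketch)
    min_dist = [cosine_distance(f, features[0]) for f in features]
    while len(selected) < keep:
        candidates = [i for i in range(len(features)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [
            min(d, cosine_distance(f, features[nxt]))
            for d, f in zip(min_dist, features)
        ]
    return sorted(selected)


def prune_adaptive(attn_scores, features, keep, is_complex):
    """Diversity-based pruning for complex images, attention-based otherwise."""
    if is_complex:
        return prune_by_diversity(features, keep)
    return prune_by_attention(attn_scores, keep)


attn = [0.1, 0.9, 0.3, 0.7]
feats = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0), (-1.0, 0.0)]
kept_simple = prune_adaptive(attn, feats, keep=2, is_complex=False)
kept_complex = prune_adaptive(attn, feats, keep=2, is_complex=True)
```

On this toy input the attention route keeps the two highest-scoring tokens, while the diversity route keeps two tokens pointing in opposite directions and drops the near-duplicate.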

🔹 Publication Date: Published on Mar 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.01236
• PDF: https://arxiv.org/pdf/2603.01236
• Project Page: https://paper.pnu-cvsp.com/AgilePruner/
• Github: https://github.com/cvsp-lab/AgilePruner

==================================

#LVLMs #VisualTokenPruning #AdaptiveAI #HallucinationReduction #DeepLearning
V_1: Unifying Generation and Self-Verification for Parallel Reasoners

📝 Summary:
V_1 unifies generation and verification for complex reasoning tasks. It exploits the observation that models are better at pairwise self-verification than at scoring candidates independently, improving performance and efficiency in code generation and math.
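
As a rough illustration of selecting among parallel candidates by pairwise comparison rather than independent scores, the sketch below runs a round-robin tournament and picks the candidate with the most wins. The round-robin aggregation and the `prefers` callable are assumptions: in practice `prefers(a, b)` would be a model call that judges which of two solutions is better, and the paper may use a different comparison schedule.

```python
from itertools import combinations


def select_by_pairwise_verification(candidates, prefers):
    """Pick the candidate that wins the most head-to-head comparisons.

    `prefers(a, b)` is a hypothetical judge returning True when `a` is
    judged better than (or as good as) `b`.
    """
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if prefers(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    best = max(range(len(candidates)), key=lambda i: wins[i])
    return candidates[best], wins


def toy_judge(a, b):
    """Toy stand-in for the model: prefers the answer closer to 42."""
    return abs(a - 42) <= abs(b - 42)


answer, wins = select_by_pairwise_verification([40, 42, 50, 41], toy_judge)
```

A full round-robin needs n(n-1)/2 comparisons for n candidates, so a real system would likely limit or schedule the pairs it compares.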

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04304
• PDF: https://arxiv.org/pdf/2603.04304
• Project Page: https://harmandotpy.github.io/v1-verification/
• Github: https://github.com/HarmanDotpy/pairwise-self-verification

==================================

#AI #LLMs #MachineLearning #CodeGeneration #AIReasoning
Underwater Camouflaged Object Tracking Meets Vision-Language SAM2

📝 Summary:
A new large-scale multi-modal underwater camouflaged object tracking dataset, UW-COT220, was introduced. Evaluations showed SAM2 improved tracking performance over SAM. A novel vision-language framework, VL-SAM2, achieved state-of-the-art results on both underwater and open-air object tracking da...

🔹 Publication Date: Published on Sep 25, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2409.16902
• PDF: https://arxiv.org/pdf/2409.16902
• Github: https://github.com/983632847/awesome-multimodal-object-tracking

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research