ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

📝 Summary:
FlowScene is a generative model that uses multimodal graph conditioning and rectified flow to create realistic, style-consistent indoor scenes. It offers fine-grained control over object shapes, textures, and relations, surpassing prior methods.
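
To make the rectified flow idea concrete, here is a minimal toy sketch in plain Python. It is not FlowScene's model: the function names are hypothetical, and the velocity field is the ideal straight-line one rather than a learned, graph-conditioned network. It only shows the generic recipe, which is to interpolate linearly between noise and data, regress the constant velocity between them, and sample by integrating that velocity.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target used in rectified flow training: the constant x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, steps=4):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy usage: with the ideal (assumed) velocity field, sampling lands on x1.
noise = [0.0, 0.0]
data = [1.0, 2.0]
v_star = target_velocity(noise, data)
sample = euler_sample(noise, lambda x, t: v_star, steps=4)
```

Because the ideal paths are straight, very few integration steps are needed here; a trained model would replace the lambda with a network conditioned on the scene graph.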

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19598
• PDF: https://arxiv.org/pdf/2603.19598

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GenerativeAI #3DSceneGeneration #MultimodalAI #DeepLearning #ComputerGraphics
TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

📝 Summary:
TerraScope is a new VLM for Earth Observation enabling pixel-grounded geospatial reasoning. It offers modality-flexible and multi-temporal capabilities, outperforming existing models on a new benchmark for accurate and interpretable results.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19039
• PDF: https://arxiv.org/pdf/2603.19039
• Project Page: https://shuyansy.github.io/terrascope/
• Github: https://github.com/shuyansy/Earth-Observation-VLMs

#EarthObservation #VLM #Geospatial #RemoteSensing #ComputerVision
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering

📝 Summary:
HiMu is a training-free framework for long video QA. It efficiently selects relevant frames using hierarchical query decomposition with lightweight multimodal experts, preserving temporal and cross-modal structure. HiMu advances the efficiency-accuracy Pareto front.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18558
• PDF: https://arxiv.org/pdf/2603.18558
• Project Page: https://danbenami.github.io/HiMu.io/
• Github: https://github.com/DanBenAmi/HiMu

#VideoQA #MultimodalAI #ComputerVision #MachineLearning #AI
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

📝 Summary:
V-JEPA 2 uses self-supervised learning on web videos and minimal robot data. It excels at video understanding, anticipation, Q&A, and zero-shot robotic planning. This approach yields a powerful world model for physical world planning.

🔹 Publication Date: Published on Jun 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.09985
• PDF: https://arxiv.org/pdf/2506.09985
• Github: https://github.com/facebookresearch/vjepa2

Datasets citing this paper:
https://huggingface.co/datasets/ckadirt/vjxla

Spaces citing this paper:
https://huggingface.co/spaces/vselvarajijay/vjepa2-latent-prediction
https://huggingface.co/spaces/aavi21458/vjepa2-latent-prediction

#AI #SelfSupervisedLearning #VideoAI #Robotics #WorldModels
LoopRPT: Reinforcement Pre-Training for Looped Language Models

📝 Summary:
LoopRPT is a reinforcement pre-training framework for looped language models. It directly shapes intermediate representations by assigning reinforcement signals to latent steps, improving latent reasoning. This leads to better accuracy-computation trade-offs and enhanced early-stage reasoning.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19714
• PDF: https://arxiv.org/pdf/2603.19714

#ReinforcementLearning #LanguageModels #AI #NLP #DeepLearning
Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas

📝 Summary:
This paper uses LLMs to synthesize agent policies for multi-agent environments. Dense feedback including social metrics consistently outperforms sparse reward-only feedback, guiding LLMs toward effective cooperative strategies in social dilemmas.
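
As a rough illustration of what "dense feedback including social metrics" could look like, the sketch below computes two metrics commonly used for sequential social dilemmas (mean return and a Gini-based equality score) and formats them as text for a policy-writing LLM. The function names, the choice of metrics, and the report format are assumptions for illustration, not the paper's actual feedback; returns are assumed to be non-negative.

```python
def efficiency(returns):
    """Utilitarian efficiency: mean episode return across agents."""
    return sum(returns) / len(returns)


def equality(returns):
    """1 minus the Gini coefficient of returns; 1.0 means perfectly equal.

    Assumes non-negative returns.
    """
    n = len(returns)
    total = sum(returns)
    if total == 0:
        return 1.0
    diff_sum = sum(abs(a - b) for a in returns for b in returns)
    return 1.0 - diff_sum / (2 * n * total)


def dense_feedback(returns):
    """Bundle reward and social metrics into a text report (hypothetical format)."""
    return (
        f"mean return: {efficiency(returns):.2f}; "
        f"equality: {equality(returns):.2f}; "
        f"per-agent returns: {returns}"
    )


# A sparse signal would report only the mean return; the dense report also
# exposes that one agent captured everything.
report = dense_feedback([20, 0])
```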

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19453
• PDF: https://arxiv.org/pdf/2603.19453
• Github: https://github.com/vicgalle/llm-policies-social-dilemmas

#LLMs #MultiAgentSystems #SocialDilemmas #ReinforcementLearning #AIResearch
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models

📝 Summary:
This paper introduces ProactiveBench to measure if MLLMs can proactively ask for user help on challenging tasks. It finds MLLMs generally lack this proactiveness, and conversational history can even hinder it. However, reinforcement learning shows promise for teaching models this crucial collabor...

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19466
• PDF: https://arxiv.org/pdf/2603.19466
• Project Page: https://huggingface.co/datasets/tdemin16/ProactiveBench
• Github: https://github.com/tdemin16/proactivebench

Datasets citing this paper:
https://huggingface.co/datasets/tdemin16/ProactiveBench

#MLLMs #AIProactiveness #BenchmarkingAI #ReinforcementLearning #LargeLanguageModels
The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus

📝 Summary:
λ-RLM replaces open-ended recursive code generation in LLMs with a typed functional runtime based on λ-calculus. This provides formal guarantees and improves long-context reasoning by outperforming standard RLMs in accuracy and latency.
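
For readers unfamiliar with the fixed-point combinator in the title, the sketch below shows the general idea in Python: recursion over a long input expressed without any function referring to itself by name. This is not the λ-RLM runtime. `make_reducer`, `leaf_fn`, and `combine_fn` are hypothetical names, and the "model call" is replaced by a simple character count.

```python
# Z combinator: the strict-evaluation variant of the Y combinator. The plain
# Y combinator would recurse forever in an eagerly evaluated language.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))


def make_reducer(leaf_fn, combine_fn, max_len):
    """Build a recursive split-and-combine function via Z (requires max_len >= 1)."""
    def step(recurse):
        def run(text):
            if len(text) <= max_len:
                return leaf_fn(text)  # base case: chunk is small enough
            mid = len(text) // 2
            return combine_fn(recurse(text[:mid]), recurse(text[mid:]))
        return run
    return Z(step)


# Hypothetical stand-in for a per-chunk model call: count the letter "a".
count_a = make_reducer(lambda s: s.count("a"), lambda l, r: l + r, max_len=4)
result = count_a("banana bandana")
```

The recursion structure is fixed by the combinator and the `step` function, so only the leaf and combine operations vary; this is the kind of constraint that a typed runtime can check, in contrast to open-ended generated code.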

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20105
• PDF: https://arxiv.org/pdf/2603.20105
• Github: https://github.com/lambda-calculus-LLM/lambda-RLM

#LLMs #LambdaCalculus #AI #NaturalLanguageProcessing #DeepLearning
Versatile Editing of Video Content, Actions, and Dynamics without Training

📝 Summary:
DynaEdit is a training-free method for versatile video editing using pretrained text-to-video models. It addresses limitations in handling complex edits, actions, and object interactions by solving technical issues like misalignment and jitter, achieving state-of-the-art results.

🔹 Publication Date: Published on Mar 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17989
• PDF: https://arxiv.org/pdf/2603.17989
• Project Page: https://dynaedit.github.io

#VideoEditing #TextToVideo #GenerativeAI #ComputerVision #AIResearch
Deep Tabular Research via Continual Experience-Driven Execution

📝 Summary:
This paper introduces Deep Tabular Research (DTR), an agentic framework for complex tabular reasoning. It constructs a hierarchical meta-graph, uses expectation-aware path selection, and refines iteratively via siamese structured memory, highlighting the importance of separating planning from execu...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09151
• PDF: https://arxiv.org/pdf/2603.09151

#DeepLearning #TabularData #AI #MachineLearning #AIagents
CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management

📝 Summary:
CurveStream enhances streaming video understanding in MLLMs via a curvature-aware hierarchical memory framework. It dynamically routes frames based on semantic intensity to prevent Out-of-Memory errors and achieve over 10 percent performance gains.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19571
• PDF: https://arxiv.org/pdf/2603.19571
• Github: https://github.com/streamingvideos/CurveStream

#MLLMs #StreamingVideo #VideoUnderstanding #MemoryManagement #AI
s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs

📝 Summary:
s2n-bignum-bench is a new benchmark evaluating LLMs on formal proof synthesis for industrial cryptographic assembly routines. It bridges the gap between competition math and real-world verification by requiring LLMs to generate HOL Light proofs for AWS s2n-bignum library code.

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14628
• PDF: https://arxiv.org/pdf/2603.14628
• Project Page: https://kings-crown.github.io/s2n-bignum-leaderboard/
• Github: https://github.com/kings-crown/s2n-bignum-bench

#AI #DataScience #MachineLearning #HuggingFace #Research
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

📝 Summary:
State space models demonstrate competitive performance as vision backbones for vision-language models, matching or exceeding transformer-based architectures while operating at smaller scales and requi...

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19209
• PDF: https://arxiv.org/pdf/2603.19209
• Project Page: https://lab-spell.github.io/vlm-ssm-vision-encoders/
• Github: https://github.com/raykuo18/vlm-ssm-vision-encoders

#AI #DataScience #MachineLearning #HuggingFace #Research
TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

📝 Summary:
TAPESTRY generates high-fidelity 360-degree turntable videos conditioned on 3D geometry, enabling consistent texture synthesis and neural rendering for complete 3D asset creation.

🔹 Publication Date: Published on Mar 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17735
• PDF: https://arxiv.org/pdf/2603.17735
• Project Page: https://zerone182.github.io/TAPESTRY/

#AI #DataScience #MachineLearning #HuggingFace #Research
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

📝 Summary:
This paper reformulates efficient LLM reasoning as a lossy compression problem using the Conditional Information Bottleneck. This models reasoning as a computational bridge containing only essential information, maximizing task reward while compressing completions. The method prunes cognitive blo...

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08462
• PDF: https://arxiv.org/pdf/2603.08462

#AI #DataScience #MachineLearning #HuggingFace #Research
Probing Cultural Signals in Large Language Models through Author Profiling

📝 Summary:
Large language models exhibit systematic cultural biases when performing author profiling from song lyrics, with varying degrees of ethnic alignment across different models.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16749
• PDF: https://arxiv.org/pdf/2603.16749
• Github: https://github.com/ValentinLafargue/CulturalProbingLLM

Datasets citing this paper:
https://huggingface.co/datasets/ValentinLAFARGUE/AuthorProfilingResults

#AI #DataScience #MachineLearning #HuggingFace #Research
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination

📝 Summary:
ReLi3D is a unified pipeline that reconstructs 3D geometry, materials, and illumination from multi-view images. It uses a transformer and two-path prediction to disentangle these elements, enabling near-instantaneous generation of relightable 3D assets.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19753
• PDF: https://arxiv.org/pdf/2603.19753
• Project Page: https://reli3d.jdihlmann.com/
• Github: https://github.com/Stability-AI/ReLi3D

🔹 Models citing this paper:
https://huggingface.co/StabilityLabs/ReLi3D

#AI #DataScience #MachineLearning #HuggingFace #Research
DROID-SLAM in the Wild

📝 Summary:
A real-time RGB SLAM system handles dynamic and cluttered environments. It estimates per-pixel uncertainty from multi-view visual features via differentiable bundle adjustment. This enables state-of-the-art performance at real-time speeds.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19076
• PDF: https://arxiv.org/pdf/2603.19076
• Project Page: https://moyangli00.github.io/droid-w/
• Github: https://github.com/MoyangLi00/DROID-W

#AI #DataScience #MachineLearning #HuggingFace #Research
ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization

📝 Summary:
Reducing latency and energy consumption is critical to improving the efficiency of memory systems in modern computin...

🔹 Publication Date: Published on Mar 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17309
• PDF: https://arxiv.org/pdf/2603.17309
• Project Page: https://github.com/Chirag-Sai-Panuganti/ReLMXEL
• Github: https://github.com/Chirag-Sai-Panuganti/ReLMXEL

#AI #DataScience #MachineLearning #HuggingFace #Research