ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising

📝 Summary:
DreamPartGen generates 3D objects by modeling part geometry and appearance with Duplex Part Latents. It captures inter-part relationships using Relational Semantic Latents for improved text-shape alignment. A co-denoising process ensures consistency and achieves state-of-the-art results.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19216
• PDF: https://arxiv.org/pdf/2603.19216
• Project Page: https://plan-lab.github.io/dreampartgen

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#3DGeneration #GenerativeAI #DeepLearning #ComputerVision #TextTo3D
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

📝 Summary:
F1 is a new VLA model integrating visual foresight generation into decision making using a Mixture-of-Transformer architecture. It predicts future visual states to guide actions, significantly improving task success and generalization in dynamic environments.
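
The loop described above, predicting a future visual state and then conditioning the action on it, can be sketched in a few lines. The names and shapes below are illustrative assumptions, not the paper's API:

```python
def act_with_foresight(obs_emb, foresight_model, policy):
    """Foresight-conditioned acting: predict the future, then decide.

    foresight_model: maps the current observation embedding to a predicted
    future state embedding (the "visual foresight" step; hypothetical name).
    policy: maps (current, predicted future) to an action.
    """
    future_emb = foresight_model(obs_emb)   # imagine what comes next
    return policy(obs_emb, future_emb)      # act toward that prediction
```

In F1 both pieces live inside one Mixture-of-Transformer backbone; splitting them here only makes the data flow visible.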

🔹 Publication Date: Published on Sep 8, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.06951
• PDF: https://arxiv.org/pdf/2509.06951
• Project Page: https://aopolin-lv.github.io/F1-VLA/
• Github: https://github.com/InternRobotics/F1-VLA

🔹 Models citing this paper:
https://huggingface.co/InternRobotics/F1-VLA

==================================


#VLA #AI #Robotics #Transformers #DeepLearning
A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

📝 Summary:
LLM agents struggle with long-horizon web navigation due to planning issues and sparse RL rewards. This paper proposes subgoal decomposition for online planning and MiRA, a milestone-based RL framework with dense rewards. This significantly boosts success rates for both proprietary and open LLM agents.
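
The milestone idea, replacing one sparse end-of-episode reward with partial credit for intermediate subgoals, can be sketched as simple reward shaping. Names and weights below are our assumptions, not the paper's exact formulation:

```python
def milestone_rewards(events, milestones, final_success, step_penalty=0.01):
    """Convert a sparse success signal into dense per-step rewards.

    events: one event string per agent step (hypothetical representation).
    milestones: ordered subgoals; each is rewarded once, on first completion.
    """
    reached = set()
    rewards = []
    for ev in events:
        r = -step_penalty                  # small cost discourages wandering
        if ev in milestones and ev not in reached:
            reached.add(ev)
            r += 1.0 / len(milestones)     # partial credit per milestone
        rewards.append(r)
    if final_success:
        rewards[-1] += 1.0                 # the original terminal reward remains
    return rewards
```

The agent now receives a learning signal at every milestone instead of only at episode end, which is what makes long-horizon credit assignment tractable.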

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19685
• PDF: https://arxiv.org/pdf/2603.19685

==================================


#LLMAgents #ReinforcementLearning #LargeLanguageModels #AIResearch #WebNavigation
WorldAgents: Can Foundation Image Models be Agents for 3D World Models?

📝 Summary:
This research investigates if 2D foundation image models inherently possess 3D world modeling capabilities. It proposes an agentic framework to leverage this, demonstrating that 2D models can synthesize expansive, consistent 3D worlds.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19708
• PDF: https://arxiv.org/pdf/2603.19708
• Project Page: https://ziyaerkoc.com/worldagents/

==================================


#AI #ComputerVision #3DWorldModels #GenerativeAI #FoundationModels
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

📝 Summary:
LumosX enhances text-to-video generation by improving face-attribute alignment and subject consistency. It uses a new data pipeline to infer subject dependencies and Relational Attention mechanisms to explicitly link subjects with attributes, achieving state-of-the-art personalized multi-subject video generation.
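
Explicitly linking subjects to their attributes can be pictured as an additive attention bias on known subject-attribute token pairs. This is a generic sketch of the idea, not the paper's Relational Attention:

```python
import numpy as np

def relational_attention(q, k, v, links, bias=4.0):
    """Scaled dot-product attention with an additive bias on linked pairs.

    links: set of (i, j) index pairs tying a subject token to its attribute
    tokens, steering attention toward the intended subject-attribute pairing.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    for i, j in links:
        scores[i, j] += bias               # strengthen the known relation
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With the bias, the linked attribute tokens receive strictly more attention mass than they would under plain dot-product attention.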

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20192
• PDF: https://arxiv.org/pdf/2603.20192
• Project Page: https://jiazheng-xing.github.io/lumosx-home/
• Github: https://github.com/alibaba-damo-academy/Lumos-Custom

==================================


#TextToVideo #VideoGeneration #PersonalizedAI #ComputerVision #DeepLearning
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

📝 Summary:
Discrete Moment Matching Distillation (D-MMD) enables effective distillation of discrete diffusion models by adapting continuous-domain techniques, achieving superior performance compared to previous approaches.
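
The continuous-domain building block being adapted here is the standard MMD estimator, shown below over embeddings of sampled sequences. This is the generic estimator only; the paper's discrete formulation differs in its details:

```python
import numpy as np

def mmd2(x, y, gamma=1.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel.

    x, y: arrays of shape (n, d) and (m, d), e.g. embeddings of token
    sequences sampled from a teacher and a student model.
    """
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        return np.exp(-gamma * d2)
    # Unbiased-style estimator: within-sample similarity minus cross similarity.
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

Distillation then minimizes this quantity between teacher and student samples, so matching distributions drives the estimate toward zero.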

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20155
• PDF: https://arxiv.org/pdf/2603.20155

==================================


#AI #DataScience #MachineLearning #HuggingFace #Research
How Well Does Generative Recommendation Generalize?

📝 Summary:
Generative recommendation models excel at generalization tasks while item ID-based models perform better at memorization; a complementary approach that combines the two improves recommendation performance.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19809
• PDF: https://arxiv.org/pdf/2603.19809
• Github: https://github.com/Jamesding000/MemGen-GR

==================================


#AI #DataScience #MachineLearning #HuggingFace #Research
Hyperagents

📝 Summary:
Hyperagents are self-referential AI systems that integrate task and meta-agents into a single editable program. They enable metacognitive self-modification, improving both task-solving and the improvement process itself, for open-ended, self-accelerating progress across domains.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19461
• PDF: https://arxiv.org/pdf/2603.19461
• Github: https://github.com/facebookresearch/Hyperagents

==================================


#AI #Metacognition #SelfModifyingAI #AutonomousSystems #AGI
Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

📝 Summary:
XBridge combines LLMs with translation models to boost multilingual performance, especially for low-resource languages. It keeps the LLM as an English knowledge core, bridging model misalignment with lightweight mapping layers for semantic consistency without retraining the LLM.
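
The lightweight mapping layers can be sketched as a single linear bridge that projects encoder hidden states into the frozen LLM's embedding space. Dimensions and initialization below are placeholders, not the paper's:

```python
import numpy as np

class LinearBridge:
    """Map translation-encoder hidden states into the LLM's embedding space.

    One learned projection (random and untrained here) stands in for the
    lightweight mapping layers; the LLM itself stays frozen throughout.
    """
    def __init__(self, d_src, d_llm, rng):
        self.W = rng.normal(scale=d_src ** -0.5, size=(d_src, d_llm))
        self.b = np.zeros(d_llm)

    def __call__(self, h):
        # h: (num_tokens, d_src) -> (num_tokens, d_llm)
        return h @ self.W + self.b
```

Because only `W` and `b` would be trained, new languages can be bolted on without touching the English knowledge core.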

🔹 Publication Date: Published on Mar 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17512
• PDF: https://arxiv.org/pdf/2603.17512
• Github: https://github.com/ictnlp/XBridge

==================================


#LLM #MultilingualAI #NLP #LowResourceLanguages #AIResearch
Teaching an Agent to Sketch One Part at a Time

📝 Summary:
Researchers developed an agent that generates vector sketches incrementally, one part at a time. It uses a multi-modal language model and process-reward reinforcement learning with a new part-annotated dataset. This enables controllable and editable text-to-vector sketch generation.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19500
• PDF: https://arxiv.org/pdf/2603.19500

==================================


#AI #GenerativeAI #MachineLearning #ComputerVision #ReinforcementLearning
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

📝 Summary:
HopChain is a framework that synthesizes multi-hop vision-language reasoning data to improve VLMs. This data features logically dependent reasoning chains, addressing VLMs' struggle with complex reasoning. Training with HopChain data significantly enhances generalizable VLM performance across diverse benchmarks.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17024
• PDF: https://arxiv.org/pdf/2603.17024

==================================


#VLMs #DataSynthesis #MultiHopReasoning #AIResearch #ComputerVision
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

📝 Summary:
Astrolabe is an efficient online reinforcement learning framework for distilled autoregressive video models. It improves generation quality using a forward-process RL formulation and streaming training with a multi-reward objective, avoiding expensive re-distillation or reverse-process optimization.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17051
• PDF: https://arxiv.org/pdf/2603.17051
• Project Page: https://franklinz233.github.io/projects/astrolabe/
• Github: https://github.com/franklinz233/Astrolabe

==================================


#ReinforcementLearning #VideoGeneration #DeepLearning #AI #ModelOptimization
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection

📝 Summary:
BEAVER is a training-free framework that improves long-context LLM inference using structure-aware hierarchical selection and dense tensor mapping. It maintains semantic integrity, achieves comparable performance to SOTA methods, and significantly reduces latency by 26.4x on large contexts.
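
The overall select-then-concatenate shape of page-based prompt compression looks roughly like this. The naive token-overlap scoring below stands in for BEAVER's structure-aware hierarchical selection:

```python
from collections import Counter

def select_pages(context, query, page_size=200, keep=3):
    """Split a long context into pages and keep those most relevant to a query.

    Scoring here is simple token overlap, an illustrative placeholder for the
    paper's structure-aware method; only the select-then-concatenate pattern
    is the point.
    """
    words = context.split()
    pages = [" ".join(words[i:i + page_size])
             for i in range(0, len(words), page_size)]
    q = Counter(query.lower().split())
    scores = [sum(q[w] for w in p.lower().split()) for p in pages]
    top = sorted(sorted(range(len(pages)), key=lambda i: -scores[i])[:keep])
    return " ".join(pages[i] for i in top)   # original page order is preserved
```

The LLM then sees only the kept pages, which is where the latency savings on large contexts come from.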

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19635
• PDF: https://arxiv.org/pdf/2603.19635
• Project Page: https://cslikai.cn/BEAVER/
• Github: https://github.com/JusperLee/BEAVER

==================================


#LLM #AI #PromptEngineering #DeepLearning #ModelOptimization
AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

📝 Summary:
The AgentDS benchmark evaluates AI agents and human-AI collaboration on domain-specific data science tasks, revealing the continued necessity of human expertise despite advances in large language models and AI agents.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19005
• PDF: https://arxiv.org/pdf/2603.19005
• Project Page: https://agentds.org/

Datasets citing this paper:
https://huggingface.co/datasets/lainmn/AgentDS-Insurance
https://huggingface.co/datasets/lainmn/AgentDS-RetailBanking
https://huggingface.co/datasets/lainmn/AgentDS-Manufacturing

==================================


#AI #DataScience #MachineLearning #HuggingFace #Research
EgoForge: Goal-Directed Egocentric World Simulator

📝 Summary:
EgoForge is an egocentric, goal-directed world simulator that generates coherent first-person video rollouts from minimal static inputs, using trajectory-level reward-guided refinement during diffusion sampling.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20169
• PDF: https://arxiv.org/pdf/2603.20169
• Project Page: https://plan-lab.github.io/projects/egoforge

==================================


#AI #DataScience #MachineLearning #HuggingFace #Research
FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

📝 Summary:
FlowScene is a generative model that uses multimodal graph conditioning and rectified flow to create realistic, style-consistent indoor scenes. It offers fine-grained control over object shapes, textures, and relations, surpassing prior methods.
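
Rectified flow itself trains a model to predict the constant velocity along a straight noise-to-data path. A minimal training-loss sketch, without FlowScene's multimodal graph conditioning:

```python
import numpy as np

def rectified_flow_loss(model, x0, x1, rng):
    """One rectified-flow training step on paired noise/data samples.

    The model is regressed onto the constant velocity (x1 - x0) along the
    straight interpolation path x_t = (1 - t) * x0 + t * x1.
    """
    t = rng.uniform(size=(x0.shape[0], 1))   # random time per sample
    xt = (1 - t) * x0 + t * x1               # point on the straight path
    v_pred = model(xt, t)
    return ((v_pred - (x1 - x0)) ** 2).mean()
```

Straight paths are what let rectified-flow models sample in very few integration steps compared with curved diffusion trajectories.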

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19598
• PDF: https://arxiv.org/pdf/2603.19598

==================================


#GenerativeAI #3DSceneGeneration #MultimodalAI #DeepLearning #ComputerGraphics
TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

📝 Summary:
TerraScope is a new VLM for Earth Observation enabling pixel-grounded geospatial reasoning. It offers modality-flexible and multi-temporal capabilities, outperforming existing models on a new benchmark for accurate and interpretable results.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19039
• PDF: https://arxiv.org/pdf/2603.19039
• Project Page: https://shuyansy.github.io/terrascope/
• Github: https://github.com/shuyansy/Earth-Observation-VLMs

==================================


#EarthObservation #VLM #Geospatial #RemoteSensing #ComputerVision
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering

📝 Summary:
HiMu is a training-free framework for long video QA. It efficiently selects relevant frames using hierarchical query decomposition with lightweight multimodal experts, preserving temporal and cross-modal structure. HiMu advances the efficiency-accuracy Pareto front.
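
The selection step (decompose the query, score frames per sub-query, union the winners) can be sketched as follows; expert scores are assumed given, and this is not HiMu's actual pipeline:

```python
def select_frames(frame_scores, keep_per_query=2):
    """Union of top-scoring frames across decomposed sub-queries.

    frame_scores: {sub_query: [score per frame]}, e.g. produced by
    lightweight multimodal experts (assumed available here).
    Returns frame indices sorted to preserve temporal order.
    """
    chosen = set()
    for scores in frame_scores.values():
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        chosen.update(ranked[:keep_per_query])   # winners for this sub-query
    return sorted(chosen)                        # keep temporal structure
```

Only the selected frames are passed to the heavy VLM, which is how the framework trades a little selection overhead for large QA-time savings.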

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18558
• PDF: https://arxiv.org/pdf/2603.18558
• Project Page: https://danbenami.github.io/HiMu.io/
• Github: https://github.com/DanBenAmi/HiMu

==================================


#VideoQA #MultimodalAI #ComputerVision #MachineLearning #AI
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

📝 Summary:
V-JEPA 2 uses self-supervised learning on web videos and minimal robot data. It excels at video understanding, anticipation, Q&A, and zero-shot robotic planning. This approach yields a powerful world model for physical world planning.
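
The JEPA training signal, predicting target embeddings from context embeddings with no pixel reconstruction, reduces to a latent-space regression loss. A bare-bones sketch; V-JEPA 2 adds masking, an EMA target encoder, and much more:

```python
import numpy as np

def jepa_loss(context_emb, predictor, target_emb):
    """L2 regression in representation space.

    predictor maps context embeddings to predicted target embeddings;
    target_emb would come from a separate (stop-gradient) target encoder.
    """
    pred = predictor(context_emb)
    return ((pred - target_emb) ** 2).mean()
```

Supervising in latent space rather than pixel space is the core JEPA idea: the model learns what is predictable about the world without modeling every pixel.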

🔹 Publication Date: Published on Jun 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.09985
• PDF: https://arxiv.org/pdf/2506.09985
• Github: https://github.com/facebookresearch/vjepa2

Datasets citing this paper:
https://huggingface.co/datasets/ckadirt/vjxla

Spaces citing this paper:
https://huggingface.co/spaces/vselvarajijay/vjepa2-latent-prediction
https://huggingface.co/spaces/aavi21458/vjepa2-latent-prediction

==================================


#AI #SelfSupervisedLearning #VideoAI #Robotics #WorldModels