ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho
ReCode: Unify Plan and Action for Universal Granularity Control

📝 Summary:
ReCode unifies planning and action in LLM agents via recursive code generation. It treats plans as abstract functions recursively decomposed into primitive actions, enabling dynamic decision granularity. This significantly improves performance and data efficiency.
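
The recursive decomposition described in this summary can be illustrated with a toy sketch. Everything in it is an assumption for illustration only: the `PRIMITIVES` set, the hard-coded `EXPANSIONS` table (a stand-in for the LLM call that would generate finer-grained code for an abstract step), and the `max_depth` guard are hypothetical and not taken from the ReCode repository.

```python
from typing import Dict, List

# Hypothetical primitive actions the environment can execute directly.
PRIMITIVES = {"goto", "pick", "place", "open"}

# Stand-in for the LLM: maps an abstract step to a list of finer-grained steps.
EXPANSIONS: Dict[str, List[str]] = {
    "put apple in fridge": ["fetch apple", "store in fridge"],
    "fetch apple": ["goto table", "pick apple"],
    "store in fridge": ["goto fridge", "open fridge", "place apple"],
}


def is_primitive(step: str) -> bool:
    """A step is primitive when its first word names an executable action."""
    return step.split()[0] in PRIMITIVES


def expand(step: str, depth: int = 0, max_depth: int = 5) -> List[str]:
    """Recursively decompose a step until only primitive actions remain."""
    if is_primitive(step):
        return [step]
    if depth >= max_depth or step not in EXPANSIONS:
        raise ValueError(f"cannot decompose step: {step!r}")
    actions: List[str] = []
    for sub_step in EXPANSIONS[step]:
        actions.extend(expand(sub_step, depth + 1, max_depth))
    return actions


plan = expand("put apple in fridge")
print(plan)
```

In this sketch the granularity is decided per step: a step that is already primitive is returned unchanged, while an abstract one is expanded further.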

🔹 Publication Date: Published on Oct 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.23564
• PDF: https://arxiv.org/pdf/2510.23564
• Github: https://github.com/FoundationAgents/ReCode

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLMAgents #AI #CodeGeneration #Planning #GranularityControl
LongCat-Video Technical Report

📝 Summary:
LongCat-Video is a 13.6B-parameter Diffusion Transformer model for efficient, high-quality long video generation. It handles tasks such as Text-to-Video within a single unified architecture and uses coarse-to-fine generation for efficiency. The authors present it as a step toward world models.

🔹 Publication Date: Published on Oct 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22200
• PDF: https://arxiv.org/pdf/2510.22200
• Github: https://github.com/meituan-longcat/LongCat-Video

🔹 Models citing this paper:
https://huggingface.co/meituan-longcat/LongCat-Video

🔹 Spaces citing this paper:
https://huggingface.co/spaces/multimodalart/LongCat-Video
https://huggingface.co/spaces/rahul7star/LongCat-Video
https://huggingface.co/spaces/armaishere/meituan-longcat-LongCat-Video

#VideoGeneration #DiffusionModels #Transformers #AI #TextToVideo
RAG-Anything: All-in-One RAG Framework

📝 Summary:
RAG-Anything is a unified framework extending RAG to all modalities, not just text. It integrates cross-modal relationships and semantic matching via dual-graph construction and hybrid retrieval. This significantly improves performance on complex multimodal benchmarks.
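
As a rough illustration of hybrid retrieval, the sketch below combines an embedding similarity score with a bonus for items connected to the query's entities in a graph. The additive scoring, the `graph_bonus` value, the one-hop neighborhood, and all item names are assumptions made for this example; they are not taken from the RAG-Anything code.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(
    query_vec: Sequence[float],
    query_entities: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
    graph: Dict[str, Set[str]],
    top_k: int = 2,
    graph_bonus: float = 0.7,
) -> List[str]:
    """Rank items by semantic similarity plus a structural (graph) bonus."""
    # Structural side: the query entities and their one-hop graph neighbors.
    reachable = set(query_entities)
    for entity in query_entities:
        reachable |= graph.get(entity, set())

    scores = {}
    for item, vec in embeddings.items():
        score = cosine(query_vec, vec)   # semantic side
        if item in reachable:
            score += graph_bonus         # hypothetical additive combination
        scores[item] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy multimodal items: a figure caption, a table, and a text paragraph.
embeddings = {"fig1_caption": [1.0, 0.0], "table2": [0.0, 1.0], "para3": [0.6, 0.8]}
graph = {"fig1_caption": {"table2"}}  # the figure is linked to the table

with_graph = hybrid_retrieve([1.0, 0.0], ["fig1_caption"], embeddings, graph)
without_graph = hybrid_retrieve([1.0, 0.0], ["fig1_caption"], embeddings, graph, graph_bonus=0.0)
```

With the graph bonus, the linked table outranks the paragraph even though its embedding is less similar to the query; with the bonus set to zero, ranking falls back to similarity alone.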

🔹 Publication Date: Published on Oct 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.12323
• PDF: https://arxiv.org/pdf/2510.12323
• Github: https://github.com/HKUDS/RAG-Anything

#RAG #MultimodalAI #MachineLearning #InformationRetrieval #GraphAI
PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold

📝 Summary:
PokeeResearch-7B is a 7B-parameter deep research agent that achieves state-of-the-art results using Reinforcement Learning from AI Feedback (RLAIF). Its chain-of-thought reasoning scaffold enhances robustness and alignment, producing an efficient, resilient, research-grade agent.

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15862
• PDF: https://arxiv.org/pdf/2510.15862
• Github: https://github.com/Pokee-AI/PokeeResearchOSS

🔹 Models citing this paper:
https://huggingface.co/PokeeAI/pokee_research_7b
https://huggingface.co/Mungert/pokee_research_7b-GGUF

#AI #ReinforcementLearning #LLM #MachineLearning #AIResearch
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

📝 Summary:
FAPO improves LLM reasoning by penalizing flawed-positive rollouts, i.e. rollouts that reach a correct answer through unreliable reasoning. This secures early gains while shifting optimization toward reliable reasoning later in training, enhancing correctness and stability.
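
A minimal sketch of the reward-shaping idea: a rollout with a correct final answer but flawed reasoning (a "flawed positive") gets a reduced reward. The base reward values of +1 and -1, the fixed penalty `lam`, and the `Rollout` fields are assumptions for illustration; in practice the flaw label would come from a separate detector model, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    answer_correct: bool    # does the final answer match the reference?
    reasoning_flawed: bool  # hypothetical flag, assumed to come from a detector


def shaped_reward(rollout: Rollout, lam: float = 1.0) -> float:
    """Outcome reward with an extra penalty for flawed-positive rollouts."""
    base = 1.0 if rollout.answer_correct else -1.0
    if rollout.answer_correct and rollout.reasoning_flawed:
        return base - lam  # correct answer, unreliable reasoning
    return base


rewards = [
    shaped_reward(Rollout(answer_correct=True, reasoning_flawed=False)),
    shaped_reward(Rollout(answer_correct=True, reasoning_flawed=True)),
    shaped_reward(Rollout(answer_correct=False, reasoning_flawed=True)),
]
print(rewards)
```

With `lam=1.0`, a flawed positive scores between a clean positive and a wrong answer, so it is still preferred over failure but less than a reliable solution.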

🔹 Publication Date: Published on Oct 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22543
• PDF: https://arxiv.org/pdf/2510.22543
• Project Page: https://fapo-rl.github.io/

🔹 Models citing this paper:
https://huggingface.co/dyyyyyyyy/FAPO-32B
https://huggingface.co/dyyyyyyyy/FAPO-GenRM-4B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/dyyyyyyyy/FAPO-Critic
https://huggingface.co/datasets/dyyyyyyyy/FAPO-Reasoning-Dataset

#LLM #AI #ReinforcementLearning #DeepLearning #Reasoning
The Unreasonable Effectiveness of Scaling Agents for Computer Use

📝 Summary:
Behavior Best-of-N (bBoN) improves computer-use agent (CUA) reliability by generating multiple rollouts and selecting among them using behavior narratives. This method achieves state-of-the-art performance on OSWorld and generalizes across operating systems, demonstrating effective CUA scaling.
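
The selection step can be sketched as a generic best-of-N routine: each rollout is summarized into a narrative, and a judge picks one narrative. The function name `behavior_best_of_n`, the `narrate` and `judge` callables, and the toy keyword-based judge are hypothetical; in the paper's setting both roles would be played by models, which this sketch does not attempt to reproduce.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def behavior_best_of_n(
    rollouts: Sequence[T],
    narrate: Callable[[T], str],
    judge: Callable[[List[str]], int],
) -> T:
    """Summarize each rollout, let the judge pick a narrative, return that rollout."""
    if not rollouts:
        raise ValueError("need at least one rollout")
    narratives = [narrate(r) for r in rollouts]
    best_index = judge(narratives)
    return rollouts[best_index]


# Toy rollouts: lists of action descriptions from two attempts at one task.
rollouts = [
    ["open editor", "type text"],
    ["open editor", "type text", "saved file"],
]


def narrate(actions: List[str]) -> str:
    """Toy narrative: join the action descriptions into one string."""
    return "; ".join(actions)


def judge(narratives: List[str]) -> int:
    """Toy judge: prefer the narrative that reports the file was saved."""
    return max(range(len(narratives)), key=lambda i: "saved file" in narratives[i])


best = behavior_best_of_n(rollouts, narrate, judge)
```

The judge only sees narratives, not raw trajectories, which is the point of the intermediate summarization step.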

🔹 Publication Date: Published on Oct 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.02250
• PDF: https://arxiv.org/pdf/2510.02250
• Project Page: https://www.simular.ai/articles/agent-s3
• Github: https://github.com/simular-ai/Agent-S

#AIAgents #AIScaling #OperatingSystems #BehavioralAI #AIResearch
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

📝 Summary:
Agent S2 is a compositional framework for computer use agents that delegates tasks across generalist and specialist models. Using Mixture-of-Grounding and Proactive Hierarchical Planning, it achieves state-of-the-art performance on diverse benchmarks and operating systems.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.00906
• PDF: https://arxiv.org/pdf/2504.00906
• Project Page: https://www.simular.ai/articles/agent-s2-technical-review
• Github: https://github.com/simular-ai/Agent-S

#AIAgents #MachineLearning #AI #GeneralistSpecialist #AutonomousSystems
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

📝 Summary:
Pico-Banana-400K is a new 400K-image dataset for text-guided image editing, built from real photos. It offers diverse edit types, high quality, and specialized subsets for multi-turn, preference-based, and long-short instruction editing, enabling comprehensive model development.

🔹 Publication Date: Published on Oct 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19808
• PDF: https://arxiv.org/pdf/2510.19808
• Github: https://github.com/apple/pico-banana-400k

🔹 Models citing this paper:
https://huggingface.co/eigen-ai-labs/eigen-banana-qwen-image-edit

#ImageEditing #TextGuidedEditing #Dataset #ComputerVision #AI
MIRIX: Multi-Agent Memory System for LLM-Based Agents

📝 Summary:
MIRIX is a modular multi-agent memory system for LLM-based agents that integrates diverse memory types and a dynamic framework. It significantly enhances memory capabilities for multimodal and long-form conversations. MIRIX achieves superior performance on challenging benchmarks, outperforming ex...

🔹 Publication Date: Published on Jul 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.07957
• PDF: https://arxiv.org/pdf/2507.07957
• Project Page: https://mirix.io/
• Github: https://github.com/Mirix-AI/MIRIX

#LLM #MultiAgentSystems #AISystems #MemorySystems #AI
Cache-to-Cache: Direct Semantic Communication Between Large Language Models

📝 Summary:
C2C enables direct semantic communication between LLMs by projecting and fusing their KV-caches, overcoming text-based communication limits. This method preserves rich semantics, improving accuracy by 3-5% and achieving a 2x speedup over traditional text communication.
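
To make "projecting and fusing KV-caches" concrete, here is a deliberately simplified sketch: the source model's cache entries are mapped into the target model's dimension with a linear projection, then blended with the target's own entries. The linear `proj` matrix, the scalar `gate`, and the assumption that both caches cover the same token positions are illustrative choices only; the actual C2C fuser is a learned neural module and is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse_cache(target_kv: Matrix, source_kv: Matrix, proj: Matrix, gate: float) -> Matrix:
    """Project source cache rows to the target width, then blend per element."""
    projected = matmul(source_kv, proj)  # (tokens, d_src) @ (d_src, d_tgt)
    return [
        [(1.0 - gate) * t + gate * p for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(target_kv, projected)
    ]


# Two token positions; the target cache has width 2, the source cache width 3.
target_kv = [[1.0, 0.0], [0.0, 1.0]]
source_kv = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
proj = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical projection weights

fused = fuse_cache(target_kv, source_kv, proj, gate=0.5)
print(fused)
```

A gate of 0 keeps the target cache unchanged and a gate of 1 replaces it with the projected source cache; values in between mix the two.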

🔹 Publication Date: Published on Oct 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.03215
• PDF: https://arxiv.org/pdf/2510.03215
• Project Page: https://fuvty.github.io/C2C_Project_Page/
• Github: https://github.com/thu-nics/C2C

🔹 Models citing this paper:
https://huggingface.co/nics-efc/C2C_Fuser

🔹 Spaces citing this paper:
https://huggingface.co/spaces/fuvty/C2C_demo
https://huggingface.co/spaces/nics-efc/C2C_demo

#LLM #SemanticCommunication #AI #DeepLearning #NLP
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

📝 Summary:
Skyfall-GS synthesizes large-scale, explorable 3D urban scenes by combining satellite imagery for geometry and diffusion models for realistic textures. This framework offers improved cross-view consistent geometry and photorealistic appearances without needing costly 3D annotations.

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15869
• PDF: https://arxiv.org/pdf/2510.15869
• Project Page: https://skyfall-gs.jayinnn.dev/
• Github: https://github.com/jayin92/skyfall-gs

🔹 Models citing this paper:
https://huggingface.co/jayinnn/Skyfall-GS-ply

🔹 Datasets citing this paper:
https://huggingface.co/datasets/jayinnn/Skyfall-GS-eval
https://huggingface.co/datasets/jayinnn/Skyfall-GS-datasets

#3DReconstruction #ComputerVision #SatelliteImagery #DiffusionModels #UrbanModeling
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

📝 Summary:
Easy Dataset is a framework that synthesizes LLM fine-tuning data from unstructured documents using a GUI and LLMs. It generates domain-specific question-answer pairs with human oversight. This improves LLM performance in specific domains while retaining general knowledge.

🔹 Publication Date: Published on Jul 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.04009
• PDF: https://arxiv.org/pdf/2507.04009
• Github: https://github.com/ConardLi/easy-dataset

#LLM #DataSynthesis #FineTuning #AI #NLP
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

📝 Summary:
InternVL3 introduces a native multimodal pre-training paradigm, jointly learning from multimodal and text data to overcome conventional alignment challenges. This unified approach, combined with advanced techniques, achieves state-of-the-art performance on multimodal tasks, rivaling proprietary m...

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.10479
• PDF: https://arxiv.org/pdf/2504.10479
• Project Page: https://internvl.github.io/blog/2025-04-11-InternVL-3.0/

🔹 Models citing this paper:
https://huggingface.co/OpenGVLab/InternVL3-78B
https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B
https://huggingface.co/OpenGVLab/InternVL3-8B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2-prompts

🔹 Spaces citing this paper:
https://huggingface.co/spaces/AntResearchNLP/ViLaBench
https://huggingface.co/spaces/TIGER-Lab/MEGA-Bench
https://huggingface.co/spaces/prithivMLmods/Tiny-VLMs-Lab

#MultimodalAI #DeepLearning #AIResearch #OpenSourceAI #GenerativeAI
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

📝 Summary:
RLinf-VLA is a unified framework for scalable reinforcement learning training of vision-language-action models, overcoming supervised fine-tuning limitations. It offers a 1.6x-1.8x speedup, supports diverse architectures and algorithms, and shows strong generalization in simulation and on a real ...

🔹 Publication Date: Published on Oct 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.06710
• PDF: https://arxiv.org/pdf/2510.06710
• Project Page: https://rlinf.readthedocs.io/en/latest/
• Github: https://github.com/RLinf/RLinf

🔹 Models citing this paper:
https://huggingface.co/RLinf/RLinf-math-7B

#ReinforcementLearning #VLA #Robotics #AIResearch #MachineLearning
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation

📝 Summary:
ChronoEdit ensures physical consistency in image editing by reframing it as a video generation problem. It uses pretrained video models and temporal reasoning tokens to imagine plausible physical transformations between edited images. This approach significantly improves realism and visual fideli...

🔹 Publication Date: Published on Oct 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.04290
• PDF: https://arxiv.org/pdf/2510.04290
• Project Page: https://research.nvidia.com/labs/toronto-ai/chronoedit
• Github: https://github.com/nv-tlabs/ChronoEdit

🔹 Models citing this paper:
https://huggingface.co/nvidia/ChronoEdit-14B-Diffusers
https://huggingface.co/vantagewithai/ChronoEdit-GGUF
https://huggingface.co/vantagewithai/ChronoEdit-fp8-scaled

🔹 Spaces citing this paper:
https://huggingface.co/spaces/nvidia/ChronoEdit
https://huggingface.co/spaces/JarlJarle/nvidia-ChronoEdit-14B-Diffusers

#ImageEditing #VideoGeneration #TemporalReasoning #ComputerVision #AIResearch