ML Research Hub
32.4K subscribers
6.16K photos
407 videos
24 files
6.68K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

📝 Summary:
CroBo is a visual state representation framework that learns what-is-where composition for robotics. It uses global-to-local reconstruction to encode scene element identities and spatial locations in a compact token. This enables tracking scene dynamics for sequential decision making.

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13904
• PDF: https://arxiv.org/pdf/2603.13904
• Project Page: https://seokminlee-chris.github.io/CroBo-ProjectPage/
• Github: https://github.com/SeokminLee-Chris/CroBo

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #ComputerVision #SceneUnderstanding #AI #StateRepresentation
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

📝 Summary:
VFIG is a vision-language model that converts raster images into Scalable Vector Graphics (SVG). It employs a 66K dataset and hierarchical training for high-fidelity conversion, outperforming open-source models and matching proprietary ones.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24575
• PDF: https://arxiv.org/pdf/2603.24575
• Project Page: https://vfig-proj.github.io/
• Github: https://github.com/RAIVNLab/VFig

🔹 Models citing this paper:
https://huggingface.co/XunmeiLiu/VFIG-4B

🔹 Spaces citing this paper:
https://huggingface.co/spaces/allenai/VFig-Image2SVG-Demo

#VisionLanguageModels #SVG #VectorGraphics #AI #ComputerVision
Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

📝 Summary:
ScratchMath introduces a benchmark for analyzing errors in students' handwritten math. It reveals that MLLMs lag significantly behind human experts in visual and logical reasoning, but proprietary models show potential for error explanation.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24961
• PDF: https://arxiv.org/pdf/2603.24961
• Project Page: https://bbsngg.github.io/ScratchMath/
• Github: https://github.com/ai-for-edu/ScratchMath

#AI #DataScience #MachineLearning #HuggingFace #Research
Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

📝 Summary:
Language models typically give one answer, but many tasks have multiple solutions. This paper presents multi-answer RL, allowing LMs to generate multiple plausible answers with confidence in a single pass, improving diversity, accuracy, and computational efficiency.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24844
• PDF: https://arxiv.org/pdf/2603.24844
• Project Page: https://multi-answer-rl.github.io/
• Github: https://github.com/ishapuri/multi_answer_rl

#AI #DataScience #MachineLearning #HuggingFace #Research
AVO: Agentic Variation Operators for Autonomous Evolutionary Search

📝 Summary:
Agentic variation operators enable autonomous discovery of performance-critical micro-architectural optimizations for attention kernels, outperforming state-of-the-art implementations on advanced GPU ...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24517
• PDF: https://arxiv.org/pdf/2603.24517

#AI #DataScience #MachineLearning #HuggingFace #Research
WAFT-Stereo: Warping-Alone Field Transforms for Stereo Matching

📝 Summary:
WAFT-Stereo achieves state-of-the-art stereo matching performance by replacing cost volumes with warping techniques, demonstrating superior efficiency and accuracy on major benchmarks.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24836
• PDF: https://arxiv.org/pdf/2603.24836
• Github: https://github.com/princeton-vl/WAFT-Stereo

🔹 Models citing this paper:
https://huggingface.co/MemorySlices/WAFT-Stereo

#AI #DataScience #MachineLearning #HuggingFace #Research
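
As a rough illustration of the warping idea mentioned in the WAFT-Stereo summary: in a rectified stereo pair, a left-view pixel at column x with disparity d corresponds to column x - d in the right image, so the right image can be resampled into the left view and compared directly, without building a cost volume. The sketch below is not the paper's implementation; the function name `warp_right_to_left`, the `fill` value, and the toy inputs are all assumptions made for this example.

```python
import math

def warp_right_to_left(right, disparity, fill=0.0):
    """Resample the right image into the left view.

    For each left-view pixel (y, x), read the right image at column
    x - disparity[y][x] using linear interpolation along the row.
    Pixels whose source column falls outside the image get `fill`.
    """
    height, width = len(right), len(right[0])
    warped = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src = x - disparity[y][x]
            if src < 0 or src > width - 1:
                continue  # no valid correspondence inside the right image
            x0 = int(math.floor(src))
            x1 = min(x0 + 1, width - 1)
            w = src - x0  # fractional part used as the interpolation weight
            warped[y][x] = (1 - w) * right[y][x0] + w * right[y][x1]
    return warped

# Toy one-row example (hypothetical intensities and disparities).
right_row = [[10.0, 20.0, 30.0, 40.0]]
disp_row = [[0.0, 1.0, 1.0, 1.5]]
warped_row = warp_right_to_left(right_row, disp_row)
```

An iterative matcher could compare the warped image (or warped features) with the left view and use the residual to refine the disparity estimate; how WAFT-Stereo actually does this is described in the linked paper.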
QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading

📝 Summary:
QuantAgent is a multi-agent LLM framework for high-frequency trading. It uses specialized agents for indicators, patterns, trends, and risk to make rapid decisions. It outperforms existing neural and rule-based systems in accuracy and returns.

🔹 Publication Date: Published on Sep 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.09995
• PDF: https://arxiv.org/pdf/2509.09995
• Project Page: https://Y-Research-SBU.github.io/QuantAgent/
• Github: https://github.com/Y-Research-SBU/QuantAgent

#LLM #MultiAgent #HighFrequencyTrading #FinTech #AlgorithmicTrading
🚀 Master Data Science & Programming!

Unlock your potential with this curated list of Telegram channels. Whether you need books, datasets, interview prep, or project ideas, we have the perfect resource for you. Join the community today!


🔰 Machine Learning with Python
Learn Machine Learning with hands-on Python tutorials, real-world code examples, and clear explanations for researchers and developers.
https://t.iss.one/CodeProgrammer

🔖 Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.
https://t.iss.one/DataScienceM

🧠 Code With Python
This channel delivers clear, practical content for developers, covering Python, Django, and Data Structures and Algorithms (DSA) – perfect for learning, coding, and mastering key programming skills.
https://t.iss.one/DataScience4

🎯 PyData Careers | Quiz
Python Data Science jobs, interview tips, and career insights for aspiring professionals.
https://t.iss.one/DataScienceQ

💾 Kaggle Data Hub
Your go-to hub for Kaggle datasets – explore, analyze, and leverage data for Machine Learning and Data Science projects.
https://t.iss.one/datasets1

🧑‍🎓 Udemy Coupons | Courses
The first channel on Telegram to offer free Udemy coupons.
https://t.iss.one/DataScienceC

😀 ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.
https://t.iss.one/DataScienceT

💬 Data Science Chat
An active community group for discussing data challenges and networking with peers.
https://t.iss.one/DataScience9

🐍 Python Arab| بايثون عربي
The largest Arabic-speaking group for Python developers to share knowledge and help.
https://t.iss.one/PythonArab

🖊 Data Science Jupyter Notebooks
Explore the world of Data Science through Jupyter Notebooks—insights, tutorials, and tools to boost your data journey. Code, analyze, and visualize smarter with every post.
https://t.iss.one/DataScienceN

📺 Free Online Courses | Videos
Free online courses covering data science, machine learning, analytics, programming, and essential skills for learners.
https://t.iss.one/DataScienceV

📈 Data Analytics
Dive into the world of Data Analytics – uncover insights, explore trends, and master data-driven decision making.
https://t.iss.one/DataAnalyticsX

🎧 Learn Python Hub
Master Python with step-by-step courses – from basics to advanced projects and practical applications.
https://t.iss.one/Python53

⭐️ Research Papers
Professional Academic Writing & Simulation Services
https://t.iss.one/DataScienceY

━━━━━━━━━━━━━━━━━━
Admin: @HusseinSheikho
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

📝 Summary:
ShotStream enables real-time interactive multi-shot video generation via a novel causal architecture. It uses dual-cache memory for visual consistency and two-stage distillation to reduce latency and error. This achieves high-quality, coherent videos at 16 FPS, paving the way for dynamic storytel...

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25746
• PDF: https://arxiv.org/pdf/2603.25746
• Project Page: https://luo0207.github.io/ShotStream/
• Github: https://github.com/KlingAIResearch/ShotStream

🔹 Models citing this paper:
https://huggingface.co/KlingTeam/ShotStream

#VideoGeneration #GenerativeAI #RealTimeAI #DeepLearning #AIStorytelling
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

📝 Summary:
Hybrid Memory improves video world models by consistently tracking dynamic subjects during occlusion. It combines static background archiving with active dynamic subject tracking. This ensures motion continuity and outperforms existing methods in generation quality.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25716
• PDF: https://arxiv.org/pdf/2603.25716
• Project Page: https://kj-chen666.github.io/Hybrid-Memory-in-Video-World-Models/
• Github: https://github.com/H-EmbodVis/HyDRA

🔹 Models citing this paper:
https://huggingface.co/H-EmbodVis/HyDRA

#VideoWorldModels #ComputerVision #AI #MachineLearning #GenerativeAI
Know3D: Prompting 3D Generation with Knowledge from Vision-Language Models

📝 Summary:
Know3D integrates vision-language models into 3D generation via latent hidden-state injection. This enables language-controlled synthesis of unseen back-views, transforming stochastic hallucination into a semantically guided process for 3D assets.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22782
• PDF: https://arxiv.org/pdf/2603.22782
• Project Page: https://xishuxishu.github.io/Know3D.github.io/
• Github: https://github.com/xishuxishu/Know3D

🔹 Spaces citing this paper:
https://huggingface.co/spaces/xishushu/Know3D

#3DGeneration #VisionLanguageModels #GenerativeAI #DeepLearning #AIResearch
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

📝 Summary:
Full-duplex speech models need high-quality multi-speaker conversational data, which is scarce and difficult to process due to natural dialogue dynamics. This paper introduces Sommelier, a robust, scalable, open-source data processing pipeline to address this data bottleneck.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25750
• PDF: https://arxiv.org/pdf/2603.25750
• Project Page: https://kyudan1.github.io/sommelier.github.io/

#SpeechAI #AudioProcessing #DataProcessing #OpenSource #NLP
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

📝 Summary:
Trace2Skill generates transferable LLM agent skills by analyzing diverse execution traces in parallel and consolidating them via inductive reasoning. This framework significantly improves performance, transfers across LLM scales, and generalizes to new settings without model updates.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25158
• PDF: https://arxiv.org/pdf/2603.25158

#LLM #AgentAI #TransferLearning #MachineLearning #AIResearch
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

📝 Summary:
PackForcing enables efficient, long-video generation via hierarchical KV-cache management and spatiotemporal compression, overcoming memory and consistency issues. It generates 2-minute coherent videos on a single GPU, demonstrating that short-video training suffices for high-quality long-video s...

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25730
• PDF: https://arxiv.org/pdf/2603.25730
• Github: https://github.com/ShandaAI/PackForcing

#VideoGeneration #GenerativeAI #DeepLearning #ModelEfficiency #LongContext
Diffutron: A Masked Diffusion Language Model for Turkish Language

📝 Summary:
Diffutron introduces a compact masked diffusion language model for Turkish. It uses resource-efficient LoRA-based pre-training and progressive instruction tuning. The model achieves competitive performance for non-autoregressive Turkish text generation despite its small size.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20466
• PDF: https://arxiv.org/pdf/2603.20466

🔹 Models citing this paper:
https://huggingface.co/diffutron/DiffutronLM-0.3B-Instruct
https://huggingface.co/diffutron/DiffutronLM-0.3B-Base
https://huggingface.co/diffutron/DiffutronLM-0.3B-1st-Stage

🔹 Datasets citing this paper:
https://huggingface.co/datasets/diffutron/DiffutronLM-Pretraining-Corpus

#LanguageModels #TurkishNLP #DiffusionModels #NLP #AI
MedOpenClaw: Auditable Medical Imaging Agents Reasoning over Uncurated Full Studies

📝 Summary:
MedOpenClaw and MedFlowBench enable evaluating medical VLMs in interactive 3D environments rather than on static 2D images. Surprisingly, top VLMs struggle with professional tools due to poor spatial grounding. This work highlights a critical gap for auditable, full-study medical agents.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24649
• PDF: https://arxiv.org/pdf/2603.24649
• Project Page: https://jakobshen.github.io/MedOpenClaw

#MedicalAI #VLMs #MedicalImaging #AuditableAI #3DImaging
LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset

📝 Summary:
This paper introduces KITScenes LongTail, a new dataset for long-tail driving events. It offers multi-view video, trajectories, and multilingual expert reasoning traces. This resource improves few-shot generalization and evaluates multimodal models' instruction-following capabilities.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23607
• PDF: https://arxiv.org/pdf/2603.23607
• Project Page: https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail

🔹 Datasets citing this paper:
https://huggingface.co/datasets/KIT-MRT/KITScenes-LongTail

#AutonomousDriving #ComputerVision #Datasets #LongTailLearning #MultimodalAI
Natural-Language Agent Harnesses

📝 Summary:
Natural-Language Agent Harnesses (NLAHs) and the Intelligent Harness Runtime (IHR) enable portable, executable agent harness design through natural language. This externalizes control logic from code, making harnesses easier to transfer, compare, and study.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25723
• PDF: https://arxiv.org/pdf/2603.25723

#NaturalLanguageProcessing #AI #AIAgents #SoftwareEngineering #CodePortability
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

📝 Summary:
RealChart2Code is a new benchmark assessing VLM ability to generate complex, multi-panel charts from real data. It reveals significant performance gaps between proprietary and open-weight models, highlighting VLM struggles with intricate plots.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25804
• PDF: https://arxiv.org/pdf/2603.25804
• Project Page: https://huggingface.co/datasets/zjj1233/RealChart2Code
• Github: https://github.com/Speakn0w/RealChart2Code

#VLM #ChartToCode #Benchmark #AI #DataScience
GenMask: Adapting DiT for Segmentation via Direct Mask

📝 Summary:
GenMask directly trains a DiT for joint image generation and segmentation using a novel timestep sampling strategy. This strategy emphasizes extreme noise for masks, enabling harmonious training. It outperforms indirect adaptation, simplifying the workflow.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23906
• PDF: https://arxiv.org/pdf/2603.23906

#Segmentation #ImageGeneration #DiT #DeepLearning #ComputerVision
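
To make the GenMask summary's "timestep sampling strategy" a little more concrete, here is a minimal sketch of a sampler that draws timesteps uniformly for images but skews them toward high noise for masks. The summary does not give the paper's actual distribution, so everything here is an assumption for illustration: the convention that t = 1 means pure noise, the power-law skew, and the names `sample_timestep` and `skew`.

```python
import random

def sample_timestep(rng, is_mask, skew=4.0):
    """Draw a diffusion timestep in [0, 1), where values near 1 mean heavy noise.

    Images use a plain uniform draw. Masks use u ** (1 / skew), which
    concentrates samples near 1 when skew > 1 (an assumed, illustrative choice).
    """
    u = rng.random()
    if is_mask:
        return u ** (1.0 / skew)
    return u

rng = random.Random(0)  # fixed seed so the sketch is reproducible
mask_ts = [sample_timestep(rng, is_mask=True) for _ in range(10000)]
image_ts = [sample_timestep(rng, is_mask=False) for _ in range(10000)]

mask_mean = sum(mask_ts) / len(mask_ts)
image_mean = sum(image_ts) / len(image_ts)
```

With `skew=4.0` the expected mask timestep is 0.8, against 0.5 for images, so mask training examples mostly land in the high-noise regime. The linked paper should be consulted for the strategy GenMask actually uses.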