ML Research Hub – Telegram

ML Research Hub

32.9K subscribers

5.32K photos

332 videos

24 files

5.75K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

ML Research Hub

✨Learning Image-based Tree Crown Segmentation from Enhanced Lidar-based Pseudo-labels

📝 Summary:
This study trains deep learning models to segment individual tree crowns from aerial imagery. It uses enhanced pseudo-labels derived from ALS data, improved by SAM 2, to eliminate manual annotation. This method produces superior, domain-specific segmentation models.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13022
• PDF: https://arxiv.org/pdf/2602.13022

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#DeepLearning #ImageSegmentation #RemoteSensing #Forestry #ComputerVision

312 views, 09:05

ML Research Hub

✨Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

📝 Summary:
Favia is an agent-based framework that identifies vulnerability-fixing commits by combining scalable ranking with deep semantic reasoning via LLM agents. It uses specialized tools and environmental context to robustly identify complex fixes, outperforming existing methods and achieving better pre...

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12500
• PDF: https://arxiv.org/pdf/2602.12500
• Github: https://github.com/andstor/agentic-security-patch-classification-replication-package

#Cybersecurity #LLMAgents #VulnerabilityManagement #SoftwareSecurity #AIResearch

287 views, 14:06

ML Research Hub

✨Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching

📝 Summary:
UniDFlow is a unified discrete flow-matching framework for multimodal understanding, generation, and editing. It decouples understanding and generation via low-rank adapters and improves tasks with reference-based alignment without retraining. This achieves SOTA performance and strong zero-shot g...

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12221
• PDF: https://arxiv.org/pdf/2602.12221
• Project Page: https://plan-lab.github.io/unidflow

#MultimodalAI #GenerativeAI #FlowMatching #MachineLearning #DeepLearning

225 views, 16:06

ML Research Hub

✨SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

📝 Summary:
SQuTR is a new robustness benchmark for spoken query to text retrieval. It uses 37k diverse queries, real speaker profiles, and 17 noise categories to test systems. Experiments show all systems struggle under extreme noise, making robustness a key bottleneck.
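
A common way to build noisy test queries like the ones described above is to add a noise recording to clean speech at a chosen signal-to-noise ratio (SNR). The post does not say how SQuTR mixes its noise, so the following is only a minimal sketch under that assumption; the function names and the synthetic sine-wave "speech" are hypothetical.

```python
import math
import random

def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent")
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise RMS.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]

def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 20 * math.log10(rms(speech) / rms(residual))

# Synthetic stand-ins for real audio (assumption: 1 second at 16 kHz).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Lowering `snr_db` (for example to 0 or below) makes the noise as loud as or louder than the speech, which is the kind of extreme condition a robustness benchmark would probe.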

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12783
• PDF: https://arxiv.org/pdf/2602.12783
• Github: https://github.com/ttoyekk1a/SQuTR-Spoken-Query-to-Text-Retrieval

✨ Datasets citing this paper:
• https://huggingface.co/datasets/SLLMCommunity/SQuTR

#SQuTR #Robustness #NLP #SpeechRecognition #Benchmarking

👍1

222 views, 17:06

ML Research Hub

✨OpenLID-v3: Improving the Precision of Closely Related Language Identification -- An Experience Report

📝 Summary:
OpenLID-v3 improves language identification for closely related and low-resource languages. It uses enhanced training data, cluster merging, and noise detection, which significantly boosts precision over prior tools.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13139
• PDF: https://arxiv.org/pdf/2602.13139
• Project Page: https://huggingface.co/HPLT/OpenLID-v3
• Github: https://github.com/hplt-project/openlid

🔹 Models citing this paper:
• https://huggingface.co/HPLT/OpenLID-v3

#LanguageIdentification #NLP #LowResourceLanguages #MachineLearning #AIResearch

👍1

216 views, 18:07

ML Research Hub

✨A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

📝 Summary:
This survey explores self-evolving AI agents that adapt to dynamic environments by automatically improving themselves from interaction data and feedback. It provides a unified framework, reviews techniques, and discusses safety and ethics, aiming to advance adaptive lifelong agentic systems.

🔹 Publication Date: Published on Aug 10, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07407
• PDF: https://arxiv.org/pdf/2508.07407
• Project Page: https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents
• Github: https://github.com/EvoAgentX/Awesome-Self-Evolving-Agents

✨ Spaces citing this paper:
• https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents

#AIAgents #SelfEvolvingAI #FoundationModels #LifelongLearning #AIResearch

209 views, 19:07

ML Research Hub

✨SemanticMoments: Training-Free Motion Similarity via Third Moment Features

📝 Summary:
Existing video models struggle with semantic motion and are often biased toward appearance. SemanticMoments addresses this with a training-free method that applies temporal statistics to semantic features, and it consistently outperforms other approaches on motion-centric video understanding.
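
To make "temporal statistics on semantic features" concrete, here is a minimal sketch that summarizes a clip by the per-dimension mean, variance, and third central moment of its frame features over time. This is an assumption about the general idea suggested by the title, not the paper's exact recipe; the toy two-dimensional features and function names are hypothetical.

```python
import math

def temporal_moments(frames):
    """frames: list of T feature vectors, each a list of D floats.
    Returns [means..., variances..., third central moments...] (length 3 * D)."""
    t = len(frames)
    d = len(frames[0])
    means = [sum(f[j] for f in frames) / t for j in range(d)]
    var = [sum((f[j] - means[j]) ** 2 for f in frames) / t for j in range(d)]
    third = [sum((f[j] - means[j]) ** 3 for f in frames) / t for j in range(d)]
    return means + var + third

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

# Two toy clips with the same average feature (similar "appearance")
# but different behavior over time.
static_clip = [[1.0, 2.0]] * 4
moving_clip = [[0.0, 2.0], [0.0, 2.0], [0.0, 2.0], [4.0, 2.0]]

static_desc = temporal_moments(static_clip)
moving_desc = temporal_moments(moving_clip)
similarity = cosine(static_desc, moving_desc)
```

Both clips share the same mean, so a descriptor built only from averaged features could not tell them apart; the variance and third-moment entries are what separate them here.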

🔹 Publication Date: Published on Feb 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.09146
• PDF: https://arxiv.org/pdf/2602.09146
• Project Page: https://x.com/HubermanSaar/status/2023485404280672498?s=20

#SemanticMoments #VideoUnderstanding #ComputerVision #MachineLearning #MotionAnalysis

❤1

222 views, 20:07

ML Research Hub

✨Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation

📝 Summary:
This paper introduces a novel framework for generating high-quality synthetic data for LLMs in recommender systems. This synthetic data significantly outperforms real data and enables the first robust power-law scaling for LLMs in recommendation, allowing for predictable capability development.
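
A power-law scaling relation of the form loss = a * N^(-b) becomes a straight line in log-log space, so it can be fitted with ordinary least squares. The sketch below shows only that generic fitting step; the data points are synthetic values generated from a known law for illustration, not results from the paper, and `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-b) in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope

# Synthetic points generated from a known law (a = 5.0, b = 0.1).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, losses)

# Extrapolate to a larger scale, which is what makes scaling laws useful.
predicted = a * 1e10 ** -b
```

In practice the measured points are noisy, so the quality of the fit (for example the residuals in log space) matters as much as the fitted exponent.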

🔹 Publication Date: Published on Feb 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.07298
• PDF: https://arxiv.org/pdf/2602.07298

#AI #DataScience #MachineLearning #HuggingFace #Research

❤1

183 views, 23:08

ML Research Hub

✨scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

📝 Summary:
scPilot enables large language models to directly analyze single-cell RNA-seq data through omics-native reasoning. This framework improves accuracy in cell-type annotation and developmental trajectory reconstruction via step-by-step reasoning, providing auditable and interpretable analyses.

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.11609
• PDF: https://arxiv.org/pdf/2602.11609
• Github: https://github.com/maitrix-org/scPilot

#AI #DataScience #MachineLearning #HuggingFace #Research

149 views, 02:00

ML Research Hub

✨Steer2Edit: From Activation Steering to Component-Level Editing

📝 Summary:
Steer2Edit transforms LLM steering signals into training-free, component-level weight edits. This method precisely targets attention heads and MLP neurons, improving safety, truthfulness, and efficiency with better attribute-utility trade-offs than global steering.

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.09870
• PDF: https://arxiv.org/pdf/2602.09870

#AI #DataScience #MachineLearning #HuggingFace #Research

142 views, 03:00

ML Research Hub

✨Qute: Towards Quantum-Native Database

📝 Summary:
This paper envisions a quantum database (Qute) that treats quantum computation as a first-class execution option. Unlike prior simulation-based methods that either run quantum algorithms on classical ...

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14699
• PDF: https://arxiv.org/pdf/2602.14699
• Github: https://github.com/weAIDB/Qute

#AI #DataScience #MachineLearning #HuggingFace #Research

99 views, 04:00

ML Research Hub

✨REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

📝 Summary:
REDSearcher presents a unified framework for optimizing search agents through improved task synthesis, tool-augmented queries, midtraining capability enhancement, and simulated environments to address...

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14234
• PDF: https://arxiv.org/pdf/2602.14234
• Project Page: https://redsearchagent.github.io/index/

#AI #DataScience #MachineLearning #HuggingFace #Research

71 views, 04:01

ML Research Hub

✨Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings

📝 Summary:
A reasoning-driven universal multimodal embedding framework integrates embedder-guided reinforcement learning with traceability chain-of-thought to enhance cross-modal semantic consistency and retriev...

🔹 Publication Date: Published on Feb 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13823
• PDF: https://arxiv.org/pdf/2602.13823
• Github: https://github.com/ZoengHN/Embed-RL

#AI #DataScience #MachineLearning #HuggingFace #Research

93 views, 04:01

ML Research Hub

✨UniWeTok: An Unified Binary Tokenizer with Codebook Size 2^{128} for Unified Multimodal Large Language Model

📝 Summary:
UniWeTok introduces a unified discrete tokenizer with a massive binary codebook and novel training techniques to achieve superior performance in image generation and multimodal tasks while reducing co...

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14178
• PDF: https://arxiv.org/pdf/2602.14178
• Github: https://github.com/shallowdream204/BitDance

#AI #DataScience #MachineLearning #HuggingFace #Research

71 views, 04:01

ML Research Hub

✨LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models

📝 Summary:
LaViDa-R1 is a multimodal reasoning diffusion language model that unifies supervised fine-tuning and multi-task reinforcement learning with novel training techniques for enhanced performance across vi...

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14147
• PDF: https://arxiv.org/pdf/2602.14147

#AI #DataScience #MachineLearning #HuggingFace #Research

69 views, 04:01

ML Research Hub

✨BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

📝 Summary:
A new benchmark called BrowseComp-V3 challenges multimodal large language models with complex, multi-hop reasoning tasks requiring deep search across text and visual modalities, revealing significant ...

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12876
• PDF: https://arxiv.org/pdf/2602.12876

#AI #DataScience #MachineLearning #HuggingFace #Research

76 views, 04:01

ML Research Hub

✨FireRed-Image-Edit-1.0 Technical Report

📝 Summary:
FireRed-Image-Edit uses a diffusion transformer with optimized data curation and training methods to achieve state-of-the-art performance in instruction-based image editing, supported by a comprehensi...

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13344
• PDF: https://arxiv.org/pdf/2602.13344
• Project Page: https://huggingface.co/spaces/FireRedTeam/FireRed-Image-Edit-1.0
• Github: https://github.com/FireRedTeam/FireRed-Image-Edit

#AI #DataScience #MachineLearning #HuggingFace #Research

80 views, 04:01

ML Research Hub

✨AIDev: Studying AI Coding Agents on GitHub

📝 Summary:
AIDev is a large-scale dataset of agent-authored pull requests from real-world GitHub repositories that captures AI coding agent usage in practical software development scenarios.

🔹 Publication Date: Published on Feb 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.15003
• PDF: https://arxiv.org/pdf/2602.09185
• Project Page: https://huggingface.co/datasets/hao-li/AIDev

#AI #DataScience #MachineLearning #HuggingFace #Research

104 views, 04:01

ML Research Hub

✨A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't)

📝 Summary:
Targeted instruction selection for LLM fine-tuning can be improved by systematically analyzing data representation and selection algorithms, with gradient-based representations and greedy round-robin ...

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14696
• PDF: https://arxiv.org/pdf/2602.14696
• Github: https://github.com/dcml-lab/targeted-instruction-selection

#AI #DataScience #MachineLearning #HuggingFace #Research

150 views, 04:02

ML Research Hub

✨BitDance: Scaling Autoregressive Generative Models with Binary Tokens

📝 Summary:
BitDance is a scalable autoregressive image generator using binary visual tokens and a binary diffusion head. It introduces next-patch diffusion for parallel token prediction, significantly improving inference speed and achieving state-of-the-art performance with fewer parameters.
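
The difference between a binary token and a codebook index can be illustrated with a small sketch. It assumes each latent channel is turned into one bit by its sign, which is a common lookup-free quantization choice but is not confirmed by this post; the helper names and the four-channel latent are hypothetical.

```python
def binarize(latent):
    """Map each latent channel to one bit by its sign (>= 0 becomes 1)."""
    return [1 if x >= 0 else 0 for x in latent]

def bits_to_index(bits):
    """Interpret the bits (most significant first) as the equivalent codebook index."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

latent = [0.7, -1.2, 0.1, -0.3]   # hypothetical 4-channel latent for one token
bits = binarize(latent)           # the binary token
index = bits_to_index(bits)       # the index a codebook-based model would predict
vocab_size = 2 ** len(bits)       # number of distinct tokens with this many bits
```

With `d` bits per token the implicit vocabulary has `2 ** d` entries, so for large `d` a softmax over indices becomes impractical, while predicting the `d` bits themselves stays cheap.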

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14041
• PDF: https://arxiv.org/pdf/2602.14041
• Github: https://github.com/shallowdream204/BitDance

🔹 Models citing this paper:
• https://huggingface.co/shallowdream204/BitDance-14B-16x
• https://huggingface.co/shallowdream204/BitDance-14B-64x
• https://huggingface.co/shallowdream204/BitDance-ImageNet

✨ Spaces citing this paper:
• https://huggingface.co/spaces/shallowdream204/BitDance-14B-64x

#AI #DataScience #MachineLearning #HuggingFace #Research

BitDance: Scaling Autoregressive Generative Models with Binary Tokens

We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token...

114 views, 05:02