ML Research Hub
32.9K subscribers
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning

📝 Summary:
GeneralVLA is a hierarchical vision-language-action model that enables zero-shot robotic manipulation through knowledge-guided trajectory planning. It requires no real-world data collection, outperforms existing methods, and also generates robust training data.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04315
• PDF: https://arxiv.org/pdf/2602.04315
• Project Page: https://aigeeksgroup.github.io/GeneralVLA
• Github: https://aigeeksgroup.github.io/GeneralVLA

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

📝 Summary:
Feature Activation Coverage measures data diversity in an interpretable feature space and enables diversity-driven data synthesis that improves downstream performance across multiple language model ar...
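
As a rough illustration of what a coverage-style diversity score could look like, the sketch below counts the fraction of features that fire on at least one sample. This is an assumption made for illustration only: the summary above is truncated and does not give the paper's actual definition, and the function name, the `threshold` parameter, and the (samples x features) layout are all hypothetical.

```python
import numpy as np

def feature_activation_coverage(activations, threshold=0.0):
    """Hypothetical coverage score, not the paper's definition.

    activations: (N, F) array of feature activations for N samples.
    Returns the fraction of the F features whose activation exceeds
    `threshold` on at least one sample.
    """
    active = (np.asarray(activations, dtype=float) > threshold).any(axis=0)
    return float(active.mean())

# Two samples that together activate 2 of 4 features.
acts = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0, 0.0]])
base_coverage = feature_activation_coverage(acts)

# Adding a sample that fires a previously unused feature raises coverage.
extra = np.array([[0.0, 0.0, 3.0, 0.0]])
new_coverage = feature_activation_coverage(np.vstack([acts, extra]))
```

Under this toy definition, a synthesized sample is useful for diversity only if it activates features the existing data leaves untouched.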

🔹 Publication Date: Published on Feb 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10388
• PDF: https://arxiv.org/pdf/2602.10388
• Project Page: https://website-sigma-three-35.vercel.app/
• Github: https://github.com/Zhongzhi660/FAC-Synthesis

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

📝 Summary:
Reinforcement learning (RL) with verifiable rewards has become a standard post-training stage for boosting visual reasoning in vision-language models, yet it remains unclear what capabilities RL actua...

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12395
• PDF: https://arxiv.org/pdf/2602.12395
• Project Page: https://github.com/tianyi-lab/Frankenstein
• Github: https://github.com/tianyi-lab/Frankenstein

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

📝 Summary:
Video Language Models (VideoLMs) empower AI systems to understand temporal dynamics in videos. To fit within the maximum context window, current methods use keyframe sampling, which can miss bot...

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13191
• PDF: https://arxiv.org/pdf/2602.13191
• Project Page: https://sayands.github.io/cope/
• Github: https://sayands.github.io/cope/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models

📝 Summary:
Bit-Plane Decomposition Quantization (BPDQ) improves low-bit quantization by using variable quantization grids derived from bit-planes and scalar coefficients, achieving better accuracy than tradition...
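
To make the idea of a grid built from bit-planes and scalar coefficients concrete, here is a minimal sketch. It assumes each quantized value is a sum of per-plane coefficients gated by binary bits. That reading is an assumption based only on the summary above, not the paper's algorithm, and the function names and example coefficients are hypothetical.

```python
import itertools
import numpy as np

def grid_from_coefficients(coeffs):
    """All values reachable as sum_k coeffs[k] * bit_k with bit_k in {0, 1}."""
    levels = {
        sum(c * b for c, b in zip(coeffs, bits))
        for bits in itertools.product((0, 1), repeat=len(coeffs))
    }
    return np.array(sorted(levels), dtype=float)

def quantize_to_grid(weights, grid):
    """Map each weight to its nearest grid level."""
    weights = np.asarray(weights, dtype=float)
    idx = np.abs(weights[:, None] - grid[None, :]).argmin(axis=1)
    return grid[idx]

# Power-of-two coefficients reproduce the usual uniform 3-bit grid 0..7.
uniform_grid = grid_from_coefficients([1.0, 2.0, 4.0])

# Free coefficients give unevenly spaced levels with the same bit budget.
variable_grid = grid_from_coefficients([0.5, 1.5, 4.0])

quantized = quantize_to_grid([0.4, 3.1, 5.8], variable_grid)
```

With the same three bits per weight, changing the coefficients moves the grid levels, which is the sense in which the grid is variable in this sketch.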

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04163
• PDF: https://arxiv.org/pdf/2602.04163

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

📝 Summary:
A reinforcement learning-based sim-real co-training framework improves vision-language-action policy performance through interactive simulation and real-world data anchoring.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12628
• PDF: https://arxiv.org/pdf/2602.12628

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

📝 Summary:
Diffusion large language models (dLLMs) for CUDA kernel generation achieve superior performance through a specialized dataset and a reinforcement learning framework.

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.11715
• PDF: https://arxiv.org/pdf/2602.11715
• Project Page: https://deadlykitten4.github.io/DICE/
• Github: https://github.com/deadlykitten4/DICE

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback

📝 Summary:
Self-EvolveRec improves recommender system design via a directional feedback loop. It uses a User Simulator for qualitative critiques and a Model Diagnosis Tool for quantitative verification, with adaptive evaluation. It outperforms existing methods.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12612
• PDF: https://arxiv.org/pdf/2602.12612
• Github: https://github.com/Sein-Kim/self_evolverec

==================================

#RecommenderSystems #LLM #MachineLearning #ArtificialIntelligence #DeepLearning
AI-Trader: Benchmarking Autonomous Agents in Real-Time Financial Markets

📝 Summary:
AI-Trader introduces the first fully automated live benchmark for evaluating LLM agents in financial decision-making. It reveals that general AI does not ensure trading success, with most agents showing poor returns and weak risk management. Risk control proves crucial, and liquid markets offer b...

🔹 Publication Date: Published on Dec 1, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10971
• PDF: https://arxiv.org/pdf/2512.10971
• Project Page: https://ai4trade.ai/
• Github: https://github.com/HKUDS/AI-Trader

Datasets citing this paper:
https://huggingface.co/datasets/T1anyu/AI-Trader

==================================

#AI #LLMAgents #FinTech #AlgorithmicTrading #FinancialAI
Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost

📝 Summary:
Quantized LLMs are difficult to fine-tune directly using existing methods. Quantized Evolution Strategies (QES) enables full-parameter fine-tuning of quantized LLMs. It uses error feedback and seed replay for high-precision optimization at low memory cost, outperforming prior methods.
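
The two ingredients named in the summary, error feedback and seed replay, can be sketched on a toy problem. The code below is a generic evolution-strategies step written for illustration, not the paper's implementation: the uniform rounding quantizer, the hyperparameters, and all function names are assumptions.

```python
import numpy as np

def noise_from_seed(seed, shape):
    """Seed replay: regenerate a perturbation from its seed instead of storing it."""
    return np.random.default_rng(seed).standard_normal(shape)

def quantize(x, step):
    """Round to a uniform grid with spacing `step` (a stand-in for a real quantizer)."""
    return np.round(x / step) * step

def es_step(w_q, residual, loss_fn, seeds, sigma=0.1, lr=0.05, step=0.25):
    """One toy ES update on quantized weights with an error-feedback residual."""
    # Evaluate each perturbed candidate; only (seed, loss) pairs need to be kept.
    losses = np.array(
        [loss_fn(w_q + sigma * noise_from_seed(s, w_q.shape)) for s in seeds]
    )
    advantages = losses - losses.mean()
    # Replay the seeds to rebuild the noise and form the gradient estimate.
    grad = sum(
        a * noise_from_seed(s, w_q.shape) for a, s in zip(advantages, seeds)
    ) / (len(seeds) * sigma)
    # Error feedback: add the residual carried over from the last rounding
    # before re-quantizing, so small updates are not silently lost.
    target = w_q - lr * grad + residual
    new_w_q = quantize(target, step)
    return new_w_q, target - new_w_q

w_q = quantize(np.array([1.0, -0.5, 0.3]), 0.25)
residual = np.zeros_like(w_q)
new_w_q, new_residual = es_step(
    w_q, residual, lambda v: float(np.sum(v ** 2)), seeds=[1, 2, 3, 4]
)
```

The memory saving in this sketch comes from storing integer seeds rather than full noise tensors, while the residual keeps the part of each update that rounding would otherwise discard.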

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03120
• PDF: https://arxiv.org/pdf/2602.03120
• Github: https://github.com/dibbla/Quantized-Evolution-Strategies

==================================

#LLM #Quantization #FineTuning #EvolutionStrategies #AI
Learning Image-based Tree Crown Segmentation from Enhanced Lidar-based Pseudo-labels

📝 Summary:
This study trains deep learning models to segment individual tree crowns from aerial imagery. It uses enhanced pseudo-labels derived from ALS data, improved by SAM 2, to eliminate manual annotation. This method produces superior, domain-specific segmentation models.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13022
• PDF: https://arxiv.org/pdf/2602.13022

==================================

#DeepLearning #ImageSegmentation #RemoteSensing #Forestry #ComputerVision
Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

📝 Summary:
Favia is an agent-based framework that identifies vulnerability-fixing commits by combining scalable ranking with deep semantic reasoning via LLM agents. It uses specialized tools and environmental context to robustly identify complex fixes, outperforming existing methods and achieving better pre...

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12500
• PDF: https://arxiv.org/pdf/2602.12500
• Github: https://github.com/andstor/agentic-security-patch-classification-replication-package

==================================

#Cybersecurity #LLMAgents #VulnerabilityManagement #SoftwareSecurity #AIResearch
Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching

📝 Summary:
UniDFlow is a unified discrete flow-matching framework for multimodal understanding, generation, and editing. It decouples understanding and generation via low-rank adapters and improves tasks with reference-based alignment without retraining. This achieves SOTA performance and strong zero-shot g...

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12221
• PDF: https://arxiv.org/pdf/2602.12221
• Project Page: https://plan-lab.github.io/unidflow

==================================

#MultimodalAI #GenerativeAI #FlowMatching #MachineLearning #DeepLearning
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

📝 Summary:
SQuTR is a new robustness benchmark for spoken query to text retrieval. It uses 37k diverse queries, real speaker profiles, and 17 noise categories to test systems. Experiments show all systems struggle under extreme noise, making robustness a key bottleneck.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12783
• PDF: https://arxiv.org/pdf/2602.12783
• Github: https://github.com/ttoyekk1a/SQuTR-Spoken-Query-to-Text-Retrieval

Datasets citing this paper:
https://huggingface.co/datasets/SLLMCommunity/SQuTR

==================================

#SQuTR #Robustness #NLP #SpeechRecognition #Benchmarking
OpenLID-v3: Improving the Precision of Closely Related Language Identification -- An Experience Report

📝 Summary:
OpenLID-v3 improves language identification for closely related and low-resource languages. It uses enhanced training data, cluster merging, and noise detection. This significantly boosts precision over prior tools.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13139
• PDF: https://arxiv.org/pdf/2602.13139
• Project Page: https://huggingface.co/HPLT/OpenLID-v3
• Github: https://github.com/hplt-project/openlid

🔹 Models citing this paper:
https://huggingface.co/HPLT/OpenLID-v3

==================================

#LanguageIdentification #NLP #LowResourceLanguages #MachineLearning #AIResearch
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

📝 Summary:
This survey explores self-evolving AI agents that adapt to dynamic environments through automatic enhancement using interaction data and feedback. It provides a unified framework, reviews techniques, and discusses safety and ethics, aiming to advance adaptive lifelong agentic systems.

🔹 Publication Date: Published on Aug 10, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07407
• PDF: https://arxiv.org/pdf/2508.07407
• Project Page: https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents
• Github: https://github.com/EvoAgentX/Awesome-Self-Evolving-Agents

Spaces citing this paper:
https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents

==================================

#AIAgents #SelfEvolvingAI #FoundationModels #LifelongLearning #AIResearch
SemanticMoments: Training-Free Motion Similarity via Third Moment Features

📝 Summary:
Existing video models struggle with semantic motion and are often biased toward appearance. SemanticMoments addresses this with a training-free method that applies temporal statistics to semantic features and consistently outperforms other approaches on motion-centric video understanding.
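
A minimal sketch of a temporal third-moment descriptor follows, assuming per-frame semantic features are already available as a (frames x dimensions) array. The choice of the central moment, the cosine comparison, and the function names are assumptions for illustration; the paper's exact statistics and normalization may differ.

```python
import numpy as np

def third_moment_descriptor(features):
    """features: (T, D) per-frame features.

    Returns a (D,) vector holding the third central moment of each
    feature dimension along the time axis.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0, keepdims=True)
    return (centered ** 3).mean(axis=0)

def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two descriptor vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

rng = np.random.default_rng(0)
clip = rng.standard_normal((16, 8))   # stand-in for real frame features
shifted_clip = clip + 5.0             # same dynamics, constant feature offset

descriptor = third_moment_descriptor(clip)
shifted_descriptor = third_moment_descriptor(shifted_clip)
similarity = cosine_similarity(descriptor, shifted_descriptor)
```

Because the mean over time is subtracted first, a constant offset shared by every frame drops out of the descriptor, which is one simple way a temporal statistic can downweight static appearance in favor of motion.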

🔹 Publication Date: Published on Feb 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.09146
• PDF: https://arxiv.org/pdf/2602.09146
• Project Page: https://x.com/HubermanSaar/status/2023485404280672498?s=20

==================================

#SemanticMoments #VideoUnderstanding #ComputerVision #MachineLearning #MotionAnalysis
Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation

📝 Summary:
This paper introduces a novel framework for generating high-quality synthetic data for LLMs in recommender systems. This synthetic data significantly outperforms real data and enables the first robust power-law scaling for LLMs in recommendation, allowing for predictable capability development.

🔹 Publication Date: Published on Feb 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.07298
• PDF: https://arxiv.org/pdf/2602.07298

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

📝 Summary:
scPilot enables large language models to directly analyze single-cell RNA-seq data through omics-native reasoning. This framework improves accuracy in cell-type annotation and developmental trajectory reconstruction via step-by-step reasoning, providing auditable and interpretable analyses.

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.11609
• PDF: https://arxiv.org/pdf/2602.11609
• Github: https://github.com/maitrix-org/scPilot

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Steer2Edit: From Activation Steering to Component-Level Editing

📝 Summary:
Steer2Edit transforms LLM steering signals into training-free, component-level weight edits. This method precisely targets attention heads and MLP neurons, improving safety, truthfulness, and efficiency with better attribute-utility trade-offs than global steering.

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.09870
• PDF: https://arxiv.org/pdf/2602.09870

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research