ML Research Hub – Telegram

ML Research Hub

32.9K subscribers

5.35K photos

332 videos

24 files

5.78K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

ML Research Hub

✨Proxy Compression for Language Modeling

📝 Summary:
Proxy compression trains language models on both raw bytes and compressed views. This allows efficient training on compressed inputs while keeping robust, end-to-end raw-byte inference. It improves training efficiency and eventually matches the performance of tokenizer-based approaches.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04289
• PDF: https://arxiv.org/pdf/2602.04289
• Github: https://github.com/LZhengisme/proxy-compression
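
A minimal sketch of what "raw bytes and compressed views" of one text could look like. zlib stands in for the compressor purely as an assumption; the summary does not say which compressor the paper uses, and `make_views` is a hypothetical helper, not part of the linked repository.

```python
import zlib


def make_views(text: str) -> dict:
    """Return two byte-level views of the same text (illustrative only)."""
    raw = text.encode("utf-8")
    # Assumption: zlib is a stand-in; the paper's compressor may differ.
    compressed = zlib.compress(raw, level=9)
    return {"raw": list(raw), "compressed": list(compressed)}


sample = "the cat sat on the mat. " * 20
views = make_views(sample)

# Fraction of the raw length that the compressed view occupies.
ratio = len(views["compressed"]) / len(views["raw"])
print(f"raw: {len(views['raw'])} bytes, compressed: {len(views['compressed'])} bytes")
```

Both views are plain sequences of integers in 0..255, so in principle the same byte-level model could consume either one; the compressed view is simply shorter for redundant text.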

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LanguageModels #Compression #MachineLearning #AI #Efficiency

147 views · 10:59

ML Research Hub

✨3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation

📝 Summary:
3DiMo enables view-agnostic human motion control in video generation by training a motion encoder alongside a pretrained video generator to distill driving frames into compact motion tokens that align...

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03796
• PDF: https://arxiv.org/pdf/2602.03796
• Project Page: https://hjrphoebus.github.io/3DiMo/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

134 views · 11:00

ML Research Hub

✨Skin Tokens: A Learned Compact Representation for Unified Autoregressive Rigging

📝 Summary:
Generative 3D models face challenges in animation rigging, which this work addresses by introducing SkinTokens—a learned discrete representation for skinning weights—and TokenRig, a unified autoregres...

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04805
• PDF: https://arxiv.org/pdf/2602.04805
• Project Page: https://zjp-shadow.github.io/works/SkinTokens/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

178 views · 11:00

ML Research Hub

✨FASA: Frequency-aware Sparse Attention

📝 Summary:
FASA tackles the KV cache memory cost of long-context LLMs by dynamically predicting token importance. It leverages functional sparsity in RoPE's frequency chunks to identify critical tokens for focused attention. This significantly reduces memory and computation while maintaining high performance.

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03152
• PDF: https://arxiv.org/pdf/2602.03152

==================================

#LLM #SparseAttention #MemoryEfficiency #DeepLearning #NLP

173 views · 11:59

ML Research Hub

✨AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

📝 Summary:
AutoFigure is an agentic AI framework that automatically generates publication-ready scientific illustrations from long-form text. It uses extensive thinking and validation to ensure structural soundness and aesthetic appeal. Supported by FigureBench, a large new benchmark, AutoFigure surpasses b...

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03828
• PDF: https://arxiv.org/pdf/2602.03828
• Github: https://github.com/ResearAI/AutoFigure-Edit

✨ Datasets citing this paper:
• https://huggingface.co/datasets/WestlakeNLP/FigureBench

==================================

#AI #GenerativeAI #ScientificIllustrations #ResearchTools #AcademicPublishing

196 views · 11:59

ML Research Hub

✨D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

📝 Summary:
D-CORE is a two-stage training framework improving large reasoning models' task decomposition and reasoning. It overcomes Lazy Reasoning using self-distillation and diversity-aware reinforcement learning. D-CORE achieves superior tool-use performance, setting new state-of-the-art results even wit...

🔹 Publication Date: Published on Feb 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02160
• PDF: https://arxiv.org/pdf/2602.02160
• Github: https://github.com/alibaba/EfficientAI

🔹 Models citing this paper:
• https://huggingface.co/bowiehsu/D-CORE-8B

==================================

#LLM #TaskDecomposition #ToolUse #ReinforcementLearning #AIResearch

149 views · 14:00

ML Research Hub

✨No One-Size-Fits-All: Building Systems For Translation to Bashkir, Kazakh, Kyrgyz, Tatar and Chuvash Using Synthetic And Original Data

📝 Summary:
This paper explores machine translation for five Turkic languages using LoRA fine-tuning of NLLB-200 on synthetic data and prompt-based methods. It reports varied chrF++ scores across language pairs and releases the dataset and model weights.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04442
• PDF: https://arxiv.org/pdf/2602.04442

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

125 views · 14:00

ML Research Hub

✨SAFE: Stable Alignment Finetuning with Entropy-Aware Predictive Control for RLHF

📝 Summary:
A new reinforcement learning algorithm for language model alignment that improves stability and performance over PPO through enhanced KL divergence control and adaptive reward management.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04651
• PDF: https://arxiv.org/pdf/2602.04651

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

156 views · 14:00

ML Research Hub

✨SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization

📝 Summary:
SkeletonGaussian enables editable 4D generation by decomposing motion into rigid skeleton-driven and non-rigid fine-grained components using hexplane-based refinement.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04271
• PDF: https://arxiv.org/pdf/2602.04271
• Project Page: https://wusar.github.io/projects/skeletongaussian/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

192 views · 14:00

ML Research Hub

✨Reward-free Alignment for Conflicting Objectives

📝 Summary:
This paper introduces RACO, a reward-free alignment framework for LLMs facing multiple conflicting objectives. It uses a novel clipped conflict-averse gradient descent to resolve gradient conflicts directly from pairwise preferences. Experiments show RACO consistently achieves superior Pareto tra...

🔹 Publication Date: Published on Feb 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02495
• PDF: https://arxiv.org/pdf/2602.02495

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

❤2

213 views · 15:00

ML Research Hub

✨FOTBCD: A Large-Scale Building Change Detection Benchmark from French Orthophotos and Topographic Data

📝 Summary:
A large-scale building change detection dataset named FOTBCD is introduced, covering 28 French departments with high-resolution imagery and comprehensive annotations for both binary and instance-level...

🔹 Publication Date: Published on Jan 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.22596
• PDF: https://arxiv.org/pdf/2601.22596
• Github: https://github.com/abdelpy/FOTBCD-datasets

✨ Datasets citing this paper:
• https://huggingface.co/datasets/retgenai/FOTBCD-Binary

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

❤2

297 views · 15:01

ML Research Hub

✨"I May Not Have Articulated Myself Clearly": Diagnosing Dynamic Instability in LLM Reasoning at Inference Time

📝 Summary:
An instability signal from LLM token log probabilities and entropy predicts reasoning failures. This signal, combining distributional shift and uncertainty, reliably forecasts wrong answers. Early instability can be corrective, but late instability more often leads to failure, indicating timing i...

🔹 Publication Date: Published on Feb 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02863
• PDF: https://arxiv.org/pdf/2602.02863
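
A rough sketch of how a per-step signal combining distributional shift and uncertainty might be computed from token log probabilities. The summary does not give the paper's formula, so everything here is an assumption: Jensen-Shannon divergence between consecutive next-token distributions stands in for "distributional shift", Shannon entropy for "uncertainty", and the unweighted sum is arbitrary. The function names are hypothetical.

```python
import math


def entropy(logprobs):
    """Shannon entropy (nats) of a distribution given as log probabilities."""
    return -sum(math.exp(lp) * lp for lp in logprobs if lp != float("-inf"))


def js_divergence(logp, logq):
    """Jensen-Shannon divergence between two distributions over the same vocabulary."""
    p = [math.exp(x) for x in logp]
    q = [math.exp(x) for x in logq]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def instability_trace(step_logprobs):
    """One score per decoding step after the first: shift from previous step plus entropy."""
    return [
        js_divergence(prev, cur) + entropy(cur)
        for prev, cur in zip(step_logprobs, step_logprobs[1:])
    ]


# Toy 4-token vocabulary: two confident steps, then a sudden flat distribution.
confident = [math.log(x) for x in (0.97, 0.01, 0.01, 0.01)]
uniform = [math.log(0.25)] * 4
scores = instability_trace([confident, confident, uniform])
print(scores)
```

In this toy trace the score jumps at the step where the distribution flattens. Real APIs usually expose only top-k log probabilities, so the distributions would need to be renormalized over a shared token set first.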

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

❤1

301 views · 17:01

ML Research Hub

✨AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

📝 Summary:
AgentArk distills multi-agent reasoning into a single LLM to overcome the high computational cost of multi-agent systems. This framework enables a single agent to achieve multi-agent intelligence, offering efficient yet powerful reasoning, self-correction, and robustness across diverse tasks.

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03955
• PDF: https://arxiv.org/pdf/2602.03955
• Github: https://github.com/AIFrontierLab/AgentArk

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

212 views · 23:03

ML Research Hub

✨HalluHard: A Hard Multi-Turn Hallucination Benchmark

📝 Summary:
Large language models continue to generate plausible but ungrounded factual claims in multi-turn dialogue, with hallucinations remaining significant even when utilizing web search for verification acr...

🔹 Publication Date: Published on Feb 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.01031
• PDF: https://arxiv.org/pdf/2602.01031

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

170 views · 23:03

ML Research Hub

✨Trust The Typical

📝 Summary:
Trust The Typical (T3) frames LLM safety as an out-of-distribution detection problem, learning what is safe in semantic space. It achieves state-of-the-art performance without training on harmful examples, drastically reducing false positives and generalizing across languages with low overhead.

🔹 Publication Date: Published on Feb 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.04581
• PDF: https://arxiv.org/pdf/2602.04581
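
A toy illustration of the general idea of treating safety as out-of-distribution detection: model only the "safe" embeddings, then flag inputs that fall far from them. This is not the paper's method, which the summary does not describe. The diagonal Gaussian, the 2-D embeddings, the max-score threshold, and all function names are assumptions made for the sketch.

```python
from statistics import fmean, pvariance


def fit_typical(safe_embeddings):
    """Fit a per-dimension mean and variance on safe examples only."""
    dims = list(zip(*safe_embeddings))
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) + 1e-6 for d in dims]  # small floor avoids division by zero
    return means, variances


def atypicality(x, model):
    """Variance-normalized squared distance from the safe region's center."""
    means, variances = model
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances))


# Hypothetical 2-D embeddings of safe prompts (real embeddings have many more dimensions).
safe = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.2], [0.8, 0.0], [1.2, -0.2]]
model = fit_typical(safe)

# Assumption: threshold at the most atypical safe example seen during fitting.
threshold = max(atypicality(e, model) for e in safe)


def is_flagged(x):
    return atypicality(x, model) > threshold


print(is_flagged([1.0, 0.0]), is_flagged([-3.0, 4.0]))
```

No harmful examples are used to fit the model; anything sufficiently unlike the safe set is flagged, which is the property the summary highlights.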

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

161 views · 01:00

ML Research Hub

✨Learning to Repair Lean Proofs from Compiler Feedback

📝 Summary:
A new dataset, APRIL, pairs erroneous Lean proofs with compiler feedback, corrected proofs, and natural language diagnoses. Training language models on APRIL substantially improves proof repair accuracy and feedback-conditioned reasoning, outperforming existing baselines.

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02990
• PDF: https://arxiv.org/pdf/2602.02990

✨ Datasets citing this paper:
• https://huggingface.co/datasets/uw-math-ai/APRIL

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

148 views · 02:00

ML Research Hub

✨MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling

📝 Summary:
MeKi enables efficient large language model deployment on edge devices by injecting pre-stored semantic knowledge through token-level memory experts and re-parameterization techniques.

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03359
• PDF: https://arxiv.org/pdf/2602.03359

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

156 views · 02:00

ML Research Hub

✨Semantic Search over 9 Million Mathematical Theorems

📝 Summary:
A large-scale semantic theorem retrieval system demonstrates superior performance over existing baselines using a 9.2 million theorem corpus, with systematic analysis of representation context, language ...

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05216
• PDF: https://arxiv.org/pdf/2602.05216

✨ Datasets citing this paper:
• https://huggingface.co/datasets/uw-math-ai/theorem-search-dataset

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

154 views · 03:00

ML Research Hub

✨RISE-Video: Can Video Generators Decode Implicit World Rules?

📝 Summary:
RISE-Video presents a novel benchmark for evaluating text-image-to-video synthesis models based on cognitive reasoning rather than visual fidelity, using a multi-dimensional metric system and automate...

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05986
• PDF: https://arxiv.org/pdf/2602.05986
• Github: https://github.com/VisionXLab/Rise-Video

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

113 views · 04:01

ML Research Hub

✨Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

📝 Summary:
This research analyzes the impact of RLVR algorithms on response length in LLMs and VLMs, proposing LUSPO to eliminate length bias and improve reasoning performance.

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05261
• PDF: https://arxiv.org/pdf/2602.05261
• Github: https://github.com/murphy4122/LUSPO

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

88 views · 04:01

ML Research Hub

✨SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

📝 Summary:
SwimBird is a reasoning-switchable multimodal large language model that dynamically selects between text-only, vision-only, and interleaved vision-text reasoning modes based on input queries, achievin...

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06040
• PDF: https://arxiv.org/pdf/2602.06040
• Project Page: https://accio-lab.github.io/SwimBird
• Github: https://github.com/Accio-Lab/SwimBird

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

98 views · 04:01