ML Research Hub – Telegram

ML Research Hub

32.9K subscribers

5.35K photos

332 videos

24 files

5.78K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

✨A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't)

📝 Summary:
Targeted instruction selection for LLM fine-tuning can be improved by systematically analyzing data representation and selection algorithms, with gradient-based representations and greedy round-robin ...
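A minimal sketch of greedy round-robin selection. It assumes each candidate instruction and each target example is already embedded as a vector (for instance a projected gradient, in line with the gradient-based representations mentioned above) and that cosine similarity is the scoring rule. The function name and details are illustrative assumptions, not code from the linked repository.

```python
import numpy as np


def round_robin_select(candidates: np.ndarray, targets: np.ndarray, budget: int) -> list:
    """Greedily pick up to `budget` candidate indices, cycling over the target examples.

    On each turn a target claims its most similar candidate that has not been picked yet.
    (Hypothetical helper for illustration; not the paper's implementation.)
    """
    if len(targets) == 0:
        return []
    # Row-normalize so that dot products are cosine similarities.
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sims = t @ c.T  # shape: (num_targets, num_candidates)
    rankings = np.argsort(-sims, axis=1)  # per target: candidates, most similar first
    pointers = [0] * len(targets)
    selected, chosen = [], set()
    budget = min(budget, len(candidates))
    while len(selected) < budget:
        for ti in range(len(targets)):
            if len(selected) >= budget:
                break
            # Skip candidates that another target has already claimed.
            while int(rankings[ti, pointers[ti]]) in chosen:
                pointers[ti] += 1
            idx = int(rankings[ti, pointers[ti]])
            selected.append(idx)
            chosen.add(idx)
    return selected
```

Cycling over targets keeps a few target examples from claiming the whole budget, which is the main difference from a plain top-k ranking by average similarity.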

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14696
• PDF: https://arxiv.org/pdf/2602.14696
• Github: https://github.com/dcml-lab/targeted-instruction-selection

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

✨BitDance: Scaling Autoregressive Generative Models with Binary Tokens

📝 Summary:
BitDance is a scalable autoregressive image generator using binary visual tokens and a binary diffusion head. It introduces next-patch diffusion for parallel token prediction, significantly improving inference speed and achieving state-of-the-art performance with fewer parameters.
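To illustrate what a binary visual token can look like, here is a minimal sketch that assumes each latent channel is binarized by its sign (a lookup-free style quantizer; BitDance's actual tokenizer may differ). A token of d bits indexes one of 2**d states without an explicit codebook table, so 16 bits already cover 65,536 states. The function names are hypothetical.

```python
import numpy as np


def binarize(latents: np.ndarray) -> np.ndarray:
    """Turn each latent channel into one bit by its sign (> 0 -> 1, otherwise 0)."""
    return (latents > 0).astype(np.int64)


def bits_to_index(bits: np.ndarray) -> np.ndarray:
    """Read each row of d bits as an integer in [0, 2**d), most significant bit first."""
    d = bits.shape[-1]
    weights = 2 ** np.arange(d - 1, -1, -1, dtype=np.int64)
    return bits @ weights
```

Predicting d bits instead of one choice among 2**d codebook entries avoids a softmax over an exponentially large vocabulary; how BitDance actually models those bits (its binary diffusion head) is beyond this sketch.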

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14041
• PDF: https://arxiv.org/pdf/2602.14041
• Github: https://github.com/shallowdream204/BitDance

🔹 Models citing this paper:
• https://huggingface.co/shallowdream204/BitDance-14B-16x
• https://huggingface.co/shallowdream204/BitDance-14B-64x
• https://huggingface.co/shallowdream204/BitDance-ImageNet

✨ Spaces citing this paper:
• https://huggingface.co/spaces/shallowdream204/BitDance-14B-64x

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token...

✨WebWorld: A Large-Scale World Model for Web Agent Training

📝 Summary:
WebWorld is an open-web simulator trained on over one million interactions that supports long-horizon reasoning and multi-format data, achieving performance comparable to advanced models like Gemini-3...

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14721
• PDF: https://arxiv.org/pdf/2602.14721

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation

📝 Summary:
MoRL is a unified multimodal motion model using reinforcement learning with verifiable rewards. It significantly improves human motion understanding and generation through enhanced semantic alignment, reasoning, and physical plausibility, outperforming baselines.

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14534
• PDF: https://arxiv.org/pdf/2602.14534
• Project Page: https://aigeeksgroup.github.io/MoRL/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Preliminary sonification of ENSO using traditional Javanese gamelan scales

📝 Summary:
Parameter-mapping sonification of El Niño-Southern Oscillation (ENSO) data preserves dynamical signatures under acoustic phase-space analysis, revealing distinct coupling regimes across traditional musical scales.
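A minimal parameter-mapping sketch. It assumes slendro is idealized as five equal steps per octave (240 cents each) and that each ENSO index value is mapped linearly onto the nearest scale degree. Real gamelan tunings differ between ensembles, and the paper's actual mapping may not be linear; the function names are hypothetical.

```python
import numpy as np


def slendro_frequencies(base_hz: float = 220.0, octaves: int = 2) -> np.ndarray:
    """Idealized slendro: 5 equal steps of 240 cents per octave, starting at base_hz."""
    steps = np.arange(5 * octaves + 1)
    return base_hz * 2.0 ** (steps * 240.0 / 1200.0)


def map_to_scale(values, scale_hz: np.ndarray) -> np.ndarray:
    """Map each data value linearly onto the nearest scale degree (parameter mapping)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # Normalize to [0, 1]; a constant series maps entirely to the lowest degree.
    norm = (values - lo) / (hi - lo) if hi > lo else np.zeros_like(values)
    idx = np.rint(norm * (len(scale_hz) - 1)).astype(int)
    return scale_hz[idx]
```

For example, an index series spanning -2 to +2 would be spread over two octaves, with the neutral value landing on the middle degree.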

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14560
• PDF: https://arxiv.org/pdf/2602.14560
• Project Page: https://doi.org/10.17605/OSF.IO/QY82M
• Github: https://github.com/sandyherho/suppl-enso-javanese-sonification

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Query as Anchor: Scenario-Adaptive User Representation via Large Language Model

📝 Summary:
Query-as-Anchor is a novel framework shifting user modeling from static encoding to dynamic query-aware synthesis using large language models. It employs specialized architecture and training, achieving state-of-the-art performance and efficient deployment in industrial settings.

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14492
• PDF: https://arxiv.org/pdf/2602.14492
• Github: https://github.com/JhCircle/Q-Anchor

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Acoustivision Pro: An Open-Source Interactive Platform for Room Impulse Response Analysis and Acoustic Characterization

📝 Summary:
Room acoustics analysis plays a central role in architectural design, audio engineering, speech intelligibility assessment, and hearing research. Despite the availability of standardized metrics such ...
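Reverberation time is one of the standardized room-acoustic metrics (ISO 3382) and is classically estimated from an impulse response by Schroeder backward integration. The sketch below shows that textbook method with a T30 line fit (the -5 dB to -35 dB span, extrapolated to -60 dB). It is not Acoustivision Pro's implementation, and the function names are hypothetical.

```python
import numpy as np


def schroeder_decay_db(ir: np.ndarray) -> np.ndarray:
    """Energy decay curve in dB, from Schroeder backward integration of the squared IR."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # remaining energy from each sample onward
    ratio = np.maximum(edc / edc[0], np.finfo(float).tiny)  # guard against log(0)
    return 10.0 * np.log10(ratio)


def rt60_from_t30(ir: np.ndarray, fs: float) -> float:
    """Fit a line to the -5..-35 dB part of the decay and extrapolate to a 60 dB drop."""
    edc_db = schroeder_decay_db(ir)
    t = np.arange(len(edc_db)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -35.0)
    if mask.sum() < 2:
        raise ValueError("decay does not cover the -5 to -35 dB range")
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second (negative)
    return -60.0 / slope
```

Other ISO 3382 quantities such as EDT (0 to -10 dB fit) follow the same pattern with a different fitting range.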

🔹 Publication Date: Published on Feb 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12299
• PDF: https://arxiv.org/pdf/2602.12299
• Project Page: https://huggingface.co/spaces/mandipgoswami/acoustivision-pro

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision

📝 Summary:
Conversational image segmentation addresses functional and physical reasoning tasks by introducing a new benchmark and a model that combines segmentation priors with language understanding.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13195
• PDF: https://arxiv.org/pdf/2602.13195
• Project Page: https://glab-caltech.github.io/converseg/
• Github: https://github.com/AadSah/ConverSeg

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Experiential Reinforcement Learning

📝 Summary:
Experiential Reinforcement Learning (ERL) addresses challenges in sparse-reward environments by embedding an explicit experience-reflection-consolidation loop. This process converts feedback into structured behavioral revision, significantly improving learning efficiency and performance without add...

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13949
• PDF: https://arxiv.org/pdf/2602.13949

==================================

#ReinforcementLearning #MachineLearning #AI #ERL #SparseRewards

✨Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks

📝 Summary:
A study reveals prefill attacks as a critical, underexplored vulnerability in open-weight language models. These attacks, which predefine initial response tokens, consistently compromise major models, necessitating urgent defense development.

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14689
• PDF: https://arxiv.org/pdf/2602.14689

==================================

#PrefillAttacks #LLMSecurity #AIvulnerability #OpenWeightModels #LanguageModels

✨InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem

📝 Summary:
InnoEval offers a new framework for evaluating research ideas, addressing the limitations of current methods. It uses knowledge-grounded, multi-perspective reasoning, employing deep knowledge search and an innovation review board for multi-dimensional assessment. It outperforms baselines and alig...

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14367
• PDF: https://arxiv.org/pdf/2602.14367
• Project Page: https://innoeval.zjukg.cn/
• Github: https://github.com/zjunlp/InnoEval

==================================

#ResearchEvaluation #KnowledgeReasoning #AI #Innovation #NLP

✨Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation

📝 Summary:
This paper introduces the first systematic benchmark for evaluating knowledge-extraction attacks and defenses on Retrieval-Augmented Generation systems. It standardizes testing across diverse models and strategies to enable comparable evaluation and help build privacy-preserving RAG.

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.09319
• PDF: https://arxiv.org/pdf/2602.09319

==================================

#RAG #KnowledgeExtraction #Cybersecurity #AIPrivacy #Benchmarking

✨Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation

📝 Summary:
LLM judges show bias, increasingly preferring AI-generated summaries over human ones as similarity to human references decreases. This widespread bias across models suggests LLM-as-a-judge needs more sophisticated evaluation beyond simple comparison.
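Similarity to a human reference is often measured by n-gram overlap. Below is a minimal ROUGE-1-style sketch (unigram F1 with naive whitespace tokenization), assuming this kind of overlap score; the paper's exact similarity measure may differ, and the function name is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Binning summary pairs by this score and comparing the judge's preference rate for the human summary in each bin is one way to probe the trend described above.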

🔹 Publication Date: Published on Feb 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.07673
• PDF: https://arxiv.org/pdf/2602.07673

==================================

#LLM #AIbias #AIEvaluation #NLP #AIethics

✨Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training

📝 Summary:
Data Darwinism introduces a ten-level taxonomy for data-model co-evolution. Advanced processing of scientific text, like generative refinement, significantly improves foundation model performance on domain-aligned tasks. This systematic approach unlocks latent data value.

🔹 Publication Date: Published on Feb 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.07824
• PDF: https://arxiv.org/pdf/2602.07824
• Github: https://github.com/GAIR-NLP/Data-Darwinism

🔹 Models citing this paper:
• https://huggingface.co/GAIR/daVinci-origin-3B
• https://huggingface.co/GAIR/daVinci-origin-7B

✨ Datasets citing this paper:
• https://huggingface.co/datasets/GAIR/Darwin-Science

==================================

#DataScience #FoundationModels #Pretraining #GenerativeAI #ScientificData

✨Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

📝 Summary:
Nanbeige4.1-3B is a 3B-parameter model excelling in agentic behavior, code generation, and reasoning. It outperforms larger models through advanced reward modeling and training, demonstrating broad competence for a small language model.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13367
• PDF: https://arxiv.org/pdf/2602.13367
• Project Page: https://huggingface.co/Nanbeige/Nanbeige4.1-3B

🔹 Models citing this paper:
• https://huggingface.co/Nanbeige/Nanbeige4.1-3B

✨ Spaces citing this paper:
• https://huggingface.co/spaces/PioTio/AIMan

==================================

#LLM #AI #SmallLanguageModels #AgenticAI #CodeGeneration

✨DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

📝 Summary:
DeepImageSearch introduces an agentic image retrieval paradigm that enables multi-step reasoning over visual histories, moving beyond isolated semantic matching. It uses contextual cues for autonomous exploration. The DISBench benchmark shows current models struggle, proving agentic reasoning is ...

🔹 Publication Date: Published on Feb 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10809
• PDF: https://arxiv.org/pdf/2602.10809
• Github: https://github.com/RUC-NLPIR/DeepImageSearch

✨ Spaces citing this paper:
• https://huggingface.co/spaces/RUC-NLPIR/DISBench-Leaderboard

==================================

#ImageRetrieval #AgenticAI #MultimodalAI #ComputerVision #AIResearch

✨AutoDev: Automated AI-Driven Development

📝 Summary:
AutoDev is an automated AI framework that uses autonomous agents to perform diverse software engineering tasks like coding, testing, and git operations in a secure Docker environment. It achieved high performance on HumanEval, significantly advancing AI-driven development.

🔹 Publication Date: Published on Mar 13, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2403.08299
• PDF: https://arxiv.org/pdf/2403.08299
• Github: https://github.com/vxcontrol/pentagi

==================================

#AI #SoftwareEngineering #AutomatedDevelopment #AutonomousAgents #GenAI

✨Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models

📝 Summary:
McDiffuSE uses Monte Carlo Tree Search to optimize the slot-infilling order in Masked Diffusion Models, enhancing reasoning performance. It achieved significant gains, revealing that non-sequential generation and broader exploration are key to overcoming model biases.
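A generic Monte Carlo Tree Search over slot-filling orders, using the standard UCT rule for selection. In McDiffuSE the reward would come from the diffusion model's output; here `reward` is a hypothetical stand-in that scores a complete order, and the class and function names are illustrative, not the paper's code.

```python
import math
import random


class Node:
    """A partial order: `order` holds slots already filled, `remaining` the masked ones."""

    def __init__(self, order, remaining, parent=None):
        self.order = order
        self.remaining = remaining
        self.parent = parent
        self.children = {}  # slot id -> child Node
        self.visits = 0
        self.value = 0.0  # sum of rollout rewards seen through this node


def uct_child(node, c=1.4):
    """Child with the highest UCT score: mean reward plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts_order(slots, reward, iterations=2000, seed=0):
    """Search for a high-reward slot-filling order; `reward(order)` scores a full order."""
    rng = random.Random(seed)
    root = Node([], list(slots))
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is non-terminal and fully expanded.
        while node.remaining and len(node.children) == len(node.remaining):
            node = uct_child(node)
        # Expansion: attach one untried slot as a new child.
        if node.remaining:
            untried = [s for s in node.remaining if s not in node.children]
            s = rng.choice(untried)
            rest = [r for r in node.remaining if r != s]
            child = Node(node.order + [s], rest, parent=node)
            node.children[s] = child
            node = child
        # Rollout: fill the remaining slots in random order and score the result.
        tail = node.remaining[:]
        rng.shuffle(tail)
        r = reward(node.order + tail)
        # Backpropagation: update visit counts and reward sums up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Read out the most-visited path, completing it if the tree stops early.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.order + node.remaining
```

The exploration constant `c` trades off revisiting promising orders against trying new ones, which loosely mirrors the summary's point that larger exploration helps.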

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12586
• PDF: https://arxiv.org/pdf/2602.12586

==================================

#MonteCarloTreeSearch #DiffusionModels #NLP #LanguageModels #AI

✨LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts

📝 Summary:
LM-Lexicon improves definition modeling using data clustering and a sparse mixture-of-experts architecture. It trains specialized semantic experts, achieving substantial improvements in definition quality and higher BLEU scores. This advances efficient language models for semantic applications.
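A minimal top-k gating sketch for a sparse mixture of experts, assuming standard softmax routing over the selected experts. The summary does not describe LM-Lexicon's router (it may, for example, be tied to the data clusters), so the shapes and names here are hypothetical.

```python
import numpy as np


def top_k_moe(x, gate_w, experts, k=2):
    """Send each token to its k highest-scoring experts and mix their outputs.

    x: (tokens, d_in); gate_w: (d_in, num_experts); experts: callables mapping (d_in,) -> (d_out,).
    """
    logits = x @ gate_w  # (tokens, num_experts) router scores
    top = np.argsort(-logits, axis=1)[:, :k]  # indices of the k best experts per token
    outputs = []
    for i, token in enumerate(x):
        scores = logits[i, top[i]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()  # softmax over the selected experts only
        outputs.append(sum(w * experts[e](token) for w, e in zip(weights, top[i])))
    return np.stack(outputs)
```

Only k experts run per token, which is what keeps a sparse mixture cheaper than evaluating every expert.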

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14060
• PDF: https://arxiv.org/pdf/2602.14060
• Project Page: https://lm-lexicon.github.io
• Github: https://github.com/jacklanda/LMLexicon

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research

✨DHPLT: large-scale multilingual diachronic corpora and word representations for semantic change modelling

📝 Summary:
In this resource paper, we present DHPLT, an open collection of diachronic corpora in 41 diverse languages. DHPLT is based on the web-crawled HPLT datasets; we use web crawl timestamps as the approxim...
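A minimal sketch of bucketing documents into time periods by crawl year. It assumes each document carries an ISO-format crawl timestamp that stands in for its date, as the summary suggests, and that periods are given as sorted start years. The boundaries used in the test are hypothetical, not DHPLT's actual splits.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def bin_by_crawl_year(docs, boundaries):
    """Assign (iso_timestamp, text) documents to periods keyed by their start year.

    boundaries: sorted start years, e.g. [2011, 2015, 2021] gives the periods
    [2011, 2015), [2015, 2021) and [2021, ...). Documents before the first
    boundary are dropped.
    """
    bins = defaultdict(list)
    for ts, text in docs:
        year = datetime.fromisoformat(ts).year
        i = bisect_right(boundaries, year) - 1  # last boundary <= year
        if i >= 0:
            bins[boundaries[i]].append(text)
    return dict(bins)
```

Per-period corpora like these are what diachronic word representations are then trained or compared on.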

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.11968
• PDF: https://arxiv.org/pdf/2602.11968
• Project Page: https://data.hplt-project.org/three/diachronic/
• Github: https://github.com/ltgoslo/scdisc_hplt

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
