ML Research Hub

✨Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation

📝 Summary:
Effective multilingual teacher models for synthetic data generation are identified through systematic evaluation of data quality metrics rather than model size alone, with findings showing that prompt...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11290
• PDF: https://arxiv.org/pdf/2604.11290
• Github: https://github.com/ljvmiranda921/polyglot-teachers

🔹 Models citing this paper:
• https://huggingface.co/ljvmiranda921/Polyglot-Gemma3-4B-SFT-ar
• https://huggingface.co/ljvmiranda921/Polyglot-OLMo3-7B-SFT-ar
• https://huggingface.co/ljvmiranda921/Polyglot-OLMo3-7B-SFT-cs

✨ Datasets citing this paper:
• https://huggingface.co/datasets/ljvmiranda921/PolyglotTeachers-SFT-Synth

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models

📝 Summary:
A retrieval-augmented LLM framework improves financial sentiment analysis by tuning LLMs for sentiment prediction and augmenting them with external context, outperforming traditional models and other ...

🔹 Publication Date: Published on Oct 6, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2310.04027
• PDF: https://arxiv.org/pdf/2310.04027
• Github: https://github.com/AI4Finance-Foundation/FinGPT

#AI #DataScience #MachineLearning #HuggingFace #Research

✨QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

📝 Summary:
QuanBench+ evaluates large language models on quantum code generation across multiple frameworks using functional testing and repair-based feedback, revealing significant progress but persistent depen...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08570
• PDF: https://arxiv.org/pdf/2604.08570
• Github: https://github.com/JawadKotaichh/quanbench-plus

#AI #DataScience #MachineLearning #HuggingFace #Research

✨BMdataset: A Musicologically Curated LilyPond Dataset

📝 Summary:
A curated LilyPond dataset and adapted CodeBERT model demonstrate that expert-curated small datasets can outperform large noisy corpora for music understanding tasks.

🔹 Publication Date: Published on Apr 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10628
• PDF: https://arxiv.org/pdf/2604.10628
• Project Page: https://zenodo.org/records/18723290
• Github: https://github.com/CSCPadova/lilybert

🔹 Models citing this paper:
• https://huggingface.co/csc-unipd/lilybert

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series

📝 Summary:
The Bielik v3 PL series achieves improved language-specific performance through specialized Polish tokenization, FOCUS-based embeddings, and multi-stage training with supervised fine-tuning, direct pr...

🔹 Publication Date: Published on Apr 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10799
• PDF: https://arxiv.org/pdf/2604.10799
• Project Page: https://bielik.ai/

🔹 Models citing this paper:
• https://huggingface.co/speakleash/Bielik-PL-11B-v3.0-Instruct
• https://huggingface.co/speakleash/Bielik-PL-Minitron-7B-v3.0-Instruct

#AI #DataScience #MachineLearning #HuggingFace #Research

✨The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

📝 Summary:
MEDS improves RL for LLMs by addressing reduced sampling diversity. It uses historical behavioral signals and clustering to identify and penalize recurrent error patterns, encouraging broader exploration. This framework consistently boosts performance and behavioral diversity during sampling.

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11297
• PDF: https://arxiv.org/pdf/2604.11297
• Github: https://github.com/Linxi000/MEDS

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

📝 Summary:
Mobile GUI agents neglect user privacy personalization, because varied execution trajectories hinder standard optimization. This paper proposes Trajectory Induced Preference Optimization (TIPO) to address this challenge. TIPO improves persona alignment and task executability, outperforming existing meth...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11259
• PDF: https://arxiv.org/pdf/2604.11259

#MobileAI #PrivacyTech #Personalization #GUIAgents #MachineLearning

✨SHARE: Social-Humanities AI for Research and Education

📝 Summary:
SHARE models are causal language models pre-trained specifically for social sciences and humanities that match general-purpose model performance while MIRROR provides a text review interface that pres...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11152
• PDF: https://arxiv.org/pdf/2604.11152
• Github: https://github.com/Joaoffg/SHARE

🔹 Models citing this paper:
• https://huggingface.co/Joaoffg/SHARE-4B-Base-2604
• https://huggingface.co/Joaoffg/SHARE-14B-Base-2604

✨ Datasets citing this paper:
• https://huggingface.co/datasets/Joaoffg/Cloze-SSH

#AI #DataScience #MachineLearning #HuggingFace #Research

✨SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting

📝 Summary:
SCOPE enhances on-policy distillation by adapting supervision paths based on trajectory correctness, using teacher-perplexity-weighted KL distillation for incorrect trajectories and student-perplexity...

🔹 Publication Date: Published on Apr 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10688
• PDF: https://arxiv.org/pdf/2604.10688

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Learning Long-term Motion Embeddings for Efficient Kinematics Generation

📝 Summary:
Efficient motion generation is achieved through compressed motion embeddings and conditional flow-matching models that produce realistic long-term motions from text prompts or spatial inputs.

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11737
• PDF: https://arxiv.org/pdf/2604.11737
• Project Page: https://compvis.github.io/long-term-motion/
• Github: https://github.com/CompVis/long-term-motion

🔹 Models citing this paper:
• https://huggingface.co/CompVis/ZipMo

#AI #DataScience #MachineLearning #HuggingFace #Research

✨How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

📝 Summary:
The study reveals that policy routing in alignment-trained language models involves attention gates and amplifier heads that control safety responses, with the routing mechanism being early-committing...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.04385
• PDF: https://arxiv.org/pdf/2604.04385
• Github: https://github.com/gregfrank/how-alignment-routes

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Counting to Four is still a Chore for VLMs

📝 Summary:
Vision-language models exhibit counting failures due to reduced visual evidence utilization in later language layers, which can be mitigated through modality attention share interventions.

🔹 Publication Date: Published on Apr 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10039
• PDF: https://arxiv.org/pdf/2604.10039
• Project Page: https://huggingface.co/papers?q=modality%20projection%20stage
• Github: https://github.com/leduy99/-CVPRW26-Modality-Attention-Share

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Panoptic Pairwise Distortion Graph

📝 Summary:
Researchers introduce a novel approach to image assessment by representing image pairs as structured distortion graphs that capture region-level degradation information, challenging existing multimoda...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11004
• PDF: https://arxiv.org/pdf/2604.11004
• Project Page: https://aismartperception.github.io/distortion-graph/
• Github: https://github.com/AISmartPerception/distortion-graphs

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

📝 Summary:
AggAgent enables efficient parallel test-time scaling for long-horizon agentic tasks by aggregating trajectories through a lightweight agent that navigates and synthesizes information on demand.

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11753
• PDF: https://arxiv.org/pdf/2604.11753

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Efficient RL Training for LLMs with Experience Replay

📝 Summary:
Experience replay techniques for large language model post-training balance staleness variance and computational costs while maintaining performance and policy entropy.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08706
• PDF: https://arxiv.org/pdf/2604.08706

#AI #DataScience #MachineLearning #HuggingFace #Research

✨TRACE: Capability-Targeted Agentic Training

📝 Summary:
TRACE improves LLM agents by identifying capability gaps from trajectory comparisons. It then creates targeted training environments for specific skills, using LoRA adapters for efficient, environment-specific self-improvement. This boosts performance on customer service and tool use tasks, outpe...

🔹 Publication Date: Published on Apr 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05336
• PDF: https://arxiv.org/pdf/2604.05336
• Project Page: https://scalingintelligence.stanford.edu/blogs/trace/
• Github: https://github.com/ScalingIntelligence/TRACE

#LLMAgents #AI #MachineLearning #LoRA #DeepLearning

✨IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

📝 Summary:
IceCache is a novel KV cache management strategy for long-sequence LLMs that uses semantic token clustering with PagedAttention. It significantly improves memory efficiency while maintaining high accuracy, reducing the KV cache budget by 75% and outperforming other offloading methods.
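
To put the 75% figure in perspective, here is a back-of-the-envelope sketch of KV-cache sizing. It uses the common per-sequence estimate (keys plus values, for every layer and KV head). The model dimensions are hypothetical placeholders, not values from the paper, and reading "reducing the budget by 75%" as keeping a quarter of the full cache is an assumption about the summary's wording. The code does not implement IceCache's clustering or paging.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV-cache size for one sequence: K and V tensors for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model shape and context length (assumed for illustration only).
full = kv_cache_bytes(seq_len=131072, n_layers=32, n_kv_heads=8, head_dim=128)

# Assumed reading: a 75% reduction leaves 25% of the full cache as the budget.
budget = full * (1 - 0.75)

GIB = 1024 ** 3
print(f"full cache: {full / GIB:.1f} GiB, reduced budget: {budget / GIB:.1f} GiB")
# prints "full cache: 16.0 GiB, reduced budget: 4.0 GiB"
```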

🔹 Publication Date: Published on Apr 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10539
• PDF: https://arxiv.org/pdf/2604.10539
• Project Page: https://yuzhenmao.github.io/IceCache/
• Github: https://github.com/yuzhenmao/IceCache

#AI #DataScience #MachineLearning #HuggingFace #Research

✨ATANT: An Evaluation Framework for AI Continuity

📝 Summary:
ATANT presents an open framework for evaluating AI system continuity through a 10-checkpoint methodology using a 250-story corpus across 6 life domains, achieving 100% accuracy in cumulative testing. ...

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06710
• PDF: https://arxiv.org/pdf/2604.06710
• Project Page: https://kenoticlabs.com
• Github: https://github.com/Kenotic-Labs/ATANT

✨ Datasets citing this paper:
• https://huggingface.co/datasets/Kenotic-Labs/ATANTV1.0-corpus

#AI #AIEvaluation #AIConsistency #Research #DataScience

✨Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

📝 Summary:
Audio-Omni presents the first framework unifying audio generation and editing across diverse audio domains. It combines a multimodal LLM and diffusion transformer, introduces AudioEdit, and achieves state-of-the-art results.

🔹 Publication Date: Published on Apr 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.10708
• PDF: https://arxiv.org/pdf/2604.10708
• Project Page: https://zeyuet.github.io/Audio-Omni/
• Github: https://github.com/ZeyueT/Audio-Omni

#AI #DataScience #MachineLearning #HuggingFace #Research

✨Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory

📝 Summary:
RoMem introduces a temporal knowledge graph module that uses semantic speed gates and continuous phase rotation to distinguish persistent from evolving facts, achieving superior performance in tempora...

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11544
• PDF: https://arxiv.org/pdf/2604.11544

#TemporalKnowledgeGraphs #AgenticMemory #PhaseRotation #AIResearch #MachineLearning

✨Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

📝 Summary:
On-policy distillation dynamics in large language models depend on compatible thinking patterns between teacher and student models, with successful distillation characterized by alignment on high-prob...

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13016
• PDF: https://arxiv.org/pdf/2604.13016
• Github: https://github.com/thunlp/OPD

#AI #DataScience #MachineLearning #HuggingFace #Research
