ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
ReCode: Unify Plan and Action for Universal Granularity Control

📝 Summary:
ReCode unifies planning and action in LLM agents via recursive code generation. It treats plans as abstract functions recursively decomposed into primitive actions, enabling dynamic decision granularity. This significantly improves performance and data efficiency.
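A minimal sketch of the recursive idea, not the authors' implementation: an abstract step is expanded again and again until only primitive actions remain. The action names and the DECOMPOSITIONS table are hypothetical; the table stands in for the LLM call that would generate each expansion.

```python
from typing import Dict, List

# Hypothetical primitive actions the environment can execute directly.
PRIMITIVES = {"goto", "pick", "place"}

# Stand-in for the LLM: maps an abstract step to finer-grained sub-steps.
# In a real agent this lookup would be a model call that emits code.
DECOMPOSITIONS: Dict[str, List[str]] = {
    "tidy_desk": ["move_cup_to_sink"],
    "move_cup_to_sink": ["goto desk", "pick cup", "goto sink", "place cup"],
}


def expand(step: str, depth: int = 0, max_depth: int = 5) -> List[str]:
    """Recursively expand a step until only primitive actions remain."""
    name = step.split()[0]
    if name in PRIMITIVES:
        return [step]  # already executable, stop recursing
    if depth >= max_depth or name not in DECOMPOSITIONS:
        raise ValueError(f"cannot decompose step: {step!r}")
    actions: List[str] = []
    for sub_step in DECOMPOSITIONS[name]:
        actions.extend(expand(sub_step, depth + 1, max_depth))
    return actions


plan = expand("tidy_desk")
print(plan)
```

The depth limit is an assumption added here to keep the recursion bounded; the granularity of a decision is simply how deep the expansion goes before it reaches a primitive.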

🔹 Publication Date: Oct 27, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.23564
• PDF: https://arxiv.org/pdf/2510.23564
• Github: https://github.com/FoundationAgents/ReCode

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLMAgents #AI #CodeGeneration #Planning #GranularityControl
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

📝 Summary:
PaperCoder is a multi-agent LLM framework that automates converting machine learning papers into functional code repositories. It uses planning, analysis, and generation stages with specialized agents. Evaluations show it effectively creates high-quality implementations, outperforming strong base...

🔹 Publication Date: Apr 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.17192
• PDF: https://arxiv.org/pdf/2504.17192
• Project Page: https://huggingface.co/papers/2504.15080
• Github: https://github.com/going-doer/Paper2Code

Datasets citing this paper:
https://huggingface.co/datasets/iaminju/paper2code

==================================

#CodeGeneration #MachineLearning #LLM #AI #Automation
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

📝 Summary:
This study develops a two-stage reinforcement learning method for competitive code generation. It uses tailored data curation and a hard-focus curriculum, achieving state-of-the-art performance on competitive programming benchmarks.
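One way to picture a hard-focus curriculum, as a rough sketch only: after a first training stage, keep the problems the current policy rarely solves and train on those next. The Problem fields, the pass-rate threshold, and the sample data are all assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    name: str
    pass_rate: float  # fraction of sampled solutions that passed the tests


def hard_focus_subset(problems: List[Problem], max_pass_rate: float = 0.25) -> List[Problem]:
    """Keep problems the current policy rarely solves, hardest first."""
    hard = [p for p in problems if p.pass_rate <= max_pass_rate]
    return sorted(hard, key=lambda p: p.pass_rate)


# Hypothetical pool with pass rates measured after the first stage.
pool = [
    Problem("two_sum", 0.90),
    Problem("min_cut", 0.10),
    Problem("suffix_automaton", 0.00),
    Problem("interval_dp", 0.25),
]
stage_two = hard_focus_subset(pool)
stage_two_names = [p.name for p in stage_two]
print(stage_two_names)
```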

🔹 Publication Date: Nov 9, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06307
• PDF: https://arxiv.org/pdf/2511.06307

==================================

#ReinforcementLearning #CodeGeneration #DataCuration #MachineLearning #AIResearch
UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

📝 Summary:
UI2Code^N is a visual language model trained for interactive UI-to-code generation, editing, and polishing. It uses multi-turn feedback to achieve state-of-the-art performance among open-source models, comparable to leading closed-source solutions.
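The multi-turn feedback loop can be sketched generically: draft code, score the result, ask for a revision, and keep the best candidate seen so far. The score and refine functions below are toy stand-ins (a real system would render the page and call the model); their names and the REQUIRED tag list are hypothetical.

```python
from typing import Callable, Tuple

REQUIRED = ["<nav>", "<main>", "<footer>"]  # toy stand-in for the target UI


def score(html: str) -> float:
    """Toy feedback signal: fraction of required tags present."""
    return sum(tag in html for tag in REQUIRED) / len(REQUIRED)


def refine(html: str) -> str:
    """Toy reviser: append the first missing tag (stands in for a model call)."""
    for tag in REQUIRED:
        if tag not in html:
            return html + tag
    return html


def refine_loop(
    draft: str,
    score_fn: Callable[[str], float],
    refine_fn: Callable[[str], str],
    max_turns: int = 4,
) -> Tuple[str, float]:
    """Run up to max_turns revisions and return the best-scoring candidate."""
    best, best_score = draft, score_fn(draft)
    current = draft
    for _ in range(max_turns):
        current = refine_fn(current)
        current_score = score_fn(current)
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


final_code, final_score = refine_loop("<nav>", score, refine)
print(final_code, final_score)
```

Raising max_turns is the test-time knob in this sketch: more turns cost more compute and give the loop more chances to improve the draft.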

🔹 Publication Date: Nov 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08195
• PDF: https://arxiv.org/pdf/2511.08195
• Project Page: https://zheny2751-dotcom.github.io/ui2code-n.github.io/

🔹 Models citing this paper:
https://huggingface.co/zai-org/UI2Code_N

Spaces citing this paper:
https://huggingface.co/spaces/zai-org/UI2Code_N-demo-case

==================================

#UI2Code #VisualLanguageModels #CodeGeneration #AI #SoftwareEngineering
Code2Video: A Code-centric Paradigm for Educational Video Generation

📝 Summary:
Code2Video is a code-centric agent framework generating educational videos via executable Python code. It uses three collaborative agents to improve coherence and interpretability, outperforming direct code generation by 40% and matching human-crafted tutorials.

🔹 Publication Date: Oct 1, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.01174
• PDF: https://arxiv.org/pdf/2510.01174
• Project Page: https://showlab.github.io/Code2Video/
• Github: https://github.com/showlab/code2video

==================================

#AI #VideoGeneration #EducationalTech #CodeGeneration #DeepLearning
WizardCoder: Empowering Code Large Language Models with Evol-Instruct

📝 Summary:
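WizardCoder is a Code LLM fine-tuned on complex instructions produced by adapting Evol-Instruct to the code domain. It outperforms other open-source Code LLMs on HumanEval, HumanEval+, MBPP, and DS-1000, and surpasses the closed models Claude and Bard on HumanEval and HumanEval+.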
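As a rough illustration of instruction evolution, the sketch below wraps a seed task in a prompt that asks a model to make it harder in one chosen way. The directive wording is paraphrased for illustration and is not the paper's exact prompt set; build_evolution_prompt is a hypothetical helper, and the model call that would consume the prompt is left out.

```python
import random
from typing import List

# Illustrative rewrite directives; not the paper's exact prompts.
EVOLUTION_DIRECTIVES: List[str] = [
    "Add one new constraint or requirement.",
    "Require more reasoning steps to reach the solution.",
    "Ask for a stricter time or space complexity.",
]


def build_evolution_prompt(instruction: str, rng: random.Random) -> str:
    """Wrap a seed task in a prompt that asks for a harder variant."""
    directive = rng.choice(EVOLUTION_DIRECTIVES)
    return (
        "Rewrite the programming task below to make it harder.\n"
        f"Method: {directive}\n"
        f"Task: {instruction}"
    )


prompt = build_evolution_prompt("Reverse a linked list.", random.Random(0))
print(prompt)
```

Feeding each evolved task back in as the next seed would give progressively harder instructions for fine-tuning.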

🔹 Publication Date: Jun 14, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2306.08568
• PDF: https://arxiv.org/pdf/2306.08568
• Github: https://github.com/nlpxucan/WizardLM

🔹 Models citing this paper:
https://huggingface.co/WizardLMTeam/WizardCoder-Python-34B-V1.0
https://huggingface.co/WizardLMTeam/WizardCoder-15B-V1.0
https://huggingface.co/alpindale/WizardLM-2-8x22B

Datasets citing this paper:
https://huggingface.co/datasets/WizardLMTeam/WizardLM_evol_instruct_V2_196k
https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1
https://huggingface.co/datasets/WizardLMTeam/WizardLM_evol_instruct_70k

Spaces citing this paper:
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard
https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard
https://huggingface.co/spaces/FallnAI/Quantize-HF-Models

==================================

#CodeLLM #LLM #AI #CodeGeneration #EvolInstruct
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

📝 Summary:
SWE-Bench++ is an automated framework generating scalable, multilingual, repository-level coding tasks from live GitHub pull requests. It overcomes manual curation limits and static datasets, offering a benchmark to evaluate and improve code generation models across 11 languages.

🔹 Publication Date: Dec 19, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17419
• PDF: https://arxiv.org/pdf/2512.17419
• Project Page: https://research.turing.com/swebench

==================================

#SoftwareEngineering #CodeGeneration #AIBenchmarking #MachineLearning #OpenSource
SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models

📝 Summary:
SecureCode v2.0 is a production-grade dataset of 1215 security-focused coding examples. It trains AI models to generate secure code by providing real-incident examples with vulnerable and secure implementations, attacks, defense, and operational security context across 11 languages, using a conve...

🔹 Publication Date: Dec 20, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18542
• PDF: https://arxiv.org/pdf/2512.18542
• Project Page: https://perfecxion.ai/
• Github: https://github.com/scthornton/securecode-v2

==================================

#Cybersecurity #CodeSecurity #AI #CodeGeneration #Dataset