ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho
MiniCPM-V: A GPT-4V Level MLLM on Your Phone

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges still prevent MLLMs from being practical in real-world applications. The most notable is the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performance cloud servers, which greatly limits their use in mobile, offline, energy-sensitive, and privacy-protective scenarios.

In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining, and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) strong performance, outperforming GPT-4V-1106, Gemini Pro, and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong #OCR capability and 1.8M pixel high-resolution #image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: the model sizes needed to achieve usable (e.g., GPT-4V) level performance are rapidly decreasing, while end-side computation capacity is growing fast. Together, these trends show that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.

Paper: https://arxiv.org/pdf/2408.01800v1.pdf

Code:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v

Datasets: Video-MME

#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialIntelligence #SoftwareEngineering #GenAI #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics

https://t.iss.one/DataScienceT ❤️
CodeClash: Benchmarking Goal-Oriented Software Engineering

📝 Summary:
CodeClash is a benchmark evaluating language models on open-ended, goal-oriented code development through competitive tournaments. It shows LMs struggle with strategic reasoning and long-term codebase maintenance, performing poorly against human experts.

🔹 Publication Date: Nov 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00839
• PDF: https://arxiv.org/pdf/2511.00839

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LanguageModels #SoftwareEngineering #AIEvaluation #CodeDevelopment #Benchmarking
HAFixAgent: History-Aware Automated Program Repair Agent

📝 Summary:
HAFixAgent enhances automated program repair for complex multi-hunk bugs by incorporating repository history. It significantly improves bug-fixing effectiveness over existing agent-based systems while maintaining efficiency. This offers a practical approach for history-aware agentic APR.

🔹 Publication Date: Nov 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01047
• PDF: https://arxiv.org/pdf/2511.01047

#AutomatedProgramRepair #SoftwareEngineering #AI #BugFixing #CodeRepair
Agentic Refactoring: An Empirical Study of AI Coding Agents

📝 Summary:
A study of AI agent-generated refactoring in Java projects found agents frequently perform low-level consistency edits. Driven by maintainability and readability, these refactorings lead to small but significant improvements in code quality metrics like class size and complexity.

🔹 Publication Date: Nov 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.04824
• PDF: https://arxiv.org/pdf/2511.04824

#AIagents #CodeRefactoring #SoftwareEngineering #CodeQuality #AIResearch
UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

📝 Summary:
UI2Code^N is a visual language model trained for interactive UI-to-code generation, editing, and polishing. It uses multi-turn feedback to achieve state-of-the-art performance among open-source models, comparable to leading closed-source solutions.

🔹 Publication Date: Nov 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08195
• PDF: https://arxiv.org/pdf/2511.08195
• Project Page: https://zheny2751-dotcom.github.io/ui2code-n.github.io/
• Github: https://zheny2751-dotcom.github.io/ui2code-n.github.io/

🔹 Models citing this paper:
https://huggingface.co/zai-org/UI2Code_N

🔹 Spaces citing this paper:
https://huggingface.co/spaces/zai-org/UI2Code_N-demo-case

#UI2Code #VisualLanguageModels #CodeGeneration #AI #SoftwareEngineering
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?

📝 Summary:
Live-SWE-agent is the first live software engineering agent that autonomously and continuously evolves itself on-the-fly during runtime. It starts with basic tools and refines its own implementation while solving problems. It achieves 75.4% on SWE-bench Verified and 45.8% on SWE-Bench Pro, outper...

🔹 Publication Date: Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13646
• PDF: https://arxiv.org/pdf/2511.13646

#SoftwareEngineering #AI #AutonomousAgents #SelfEvolvingAI #LiveSWEagent
Agent READMEs: An Empirical Study of Context Files for Agentic Coding

📝 Summary:
This study analyzed 2303 agent context files, finding them complex and evolving like config code. Developers prioritize functional details but rarely specify non-functional requirements like security or performance. This suggests a gap in guardrails for agent-written code quality.

🔹 Publication Date: Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12884
• PDF: https://arxiv.org/pdf/2511.12884

#AIAgents #SoftwareEngineering #CodeQuality #LLMs #AIResearch