MiniCPM-V: A GPT-4V Level MLLM on Your Phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong performance, outperforming GPT-4V-1106, Gemini Pro and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong #OCR capability and 1.8M pixel high-resolution #image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: The model sizes for achieving usable (e.g., GPT-4V) level performance are rapidly decreasing, along with the fast growth of end-side computation capacity. This jointly shows that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.
Paper: https://arxiv.org/pdf/2408.01800v1.pdf
Codes:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v
Datasets: Video-MME
#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialInteligence #SoftwareEngineering #GenAI #deeplearning #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics
https://t.iss.one/DataScienceTβ€οΈ
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong performance, outperforming GPT-4V-1106, Gemini Pro and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong #OCR capability and 1.8M pixel high-resolution #image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: The model sizes for achieving usable (e.g., GPT-4V) level performance are rapidly decreasing, along with the fast growth of end-side computation capacity. This jointly shows that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.
Paper: https://arxiv.org/pdf/2408.01800v1.pdf
Codes:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v
Datasets: Video-MME
#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialInteligence #SoftwareEngineering #GenAI #deeplearning #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics
https://t.iss.one/DataScienceT
Please open Telegram to view this post
VIEW IN TELEGRAM
π3
π€π§ OpenAIβs AgentKit: Transforming How Developers Build and Deploy AI Agents
ποΈ 08 Oct 2025
π AI News & Trends
OpenAI continues to redefine the frontiers of artificial intelligence with the introduction of AgentKit a powerful all-in-one toolkit designed to simplify and accelerate how developers build and deploy AI agents. Unveiled during OpenAIβs Dev Day on October 6, 2025, AgentKit marks a transformative leap in agentic AI technology giving developers and organizations the ability to ...
#OpenAI #AgentKit #AIAgents #ArtificialIntelligence #DeveloperTools #AIInnovation
ποΈ 08 Oct 2025
π AI News & Trends
OpenAI continues to redefine the frontiers of artificial intelligence with the introduction of AgentKit a powerful all-in-one toolkit designed to simplify and accelerate how developers build and deploy AI agents. Unveiled during OpenAIβs Dev Day on October 6, 2025, AgentKit marks a transformative leap in agentic AI technology giving developers and organizations the ability to ...
#OpenAI #AgentKit #AIAgents #ArtificialIntelligence #DeveloperTools #AIInnovation
β€1
π€π§ Sora: OpenAIβs Breakthrough Text-to-Video Model Transforming Visual Creativity
ποΈ 18 Oct 2025
π AI News & Trends
Introduction Artificial Intelligence (AI) is rapidly transforming the creative world. From generating realistic images to composing music and writing code, AI has redefined how humans interact with technology. But one of the most revolutionary advancements in this domain is Sora, OpenAIβs text-to-video generative model that converts written prompts into hyper-realistic video clips. Ithas captured global ...
#Sora #OpenAI #TextToVideo #AI #VisualCreativity #GenerativeModel
ποΈ 18 Oct 2025
π AI News & Trends
Introduction Artificial Intelligence (AI) is rapidly transforming the creative world. From generating realistic images to composing music and writing code, AI has redefined how humans interact with technology. But one of the most revolutionary advancements in this domain is Sora, OpenAIβs text-to-video generative model that converts written prompts into hyper-realistic video clips. Ithas captured global ...
#Sora #OpenAI #TextToVideo #AI #VisualCreativity #GenerativeModel
β€3β€βπ₯1
π€π§ Free for 1 Year: ChatGPT Goβs Big Move in India
ποΈ 28 Oct 2025
π AI News & Trends
On 28 October 2025, OpenAI announced that its mid-tier subscription plan, ChatGPT Go, will be available free for one full year in India starting from 4 November. (www.ndtv.com) What is ChatGPT Go? Whatβs the deal? Why this matters ? Things to check / caveats What should users do? Broader implications This move by OpenAI indicates ...
#ChatGPTGo #OpenAI #India #FreeAccess #ArtificialIntelligence #TechNews
ποΈ 28 Oct 2025
π AI News & Trends
On 28 October 2025, OpenAI announced that its mid-tier subscription plan, ChatGPT Go, will be available free for one full year in India starting from 4 November. (www.ndtv.com) What is ChatGPT Go? Whatβs the deal? Why this matters ? Things to check / caveats What should users do? Broader implications This move by OpenAI indicates ...
#ChatGPTGo #OpenAI #India #FreeAccess #ArtificialIntelligence #TechNews
Instead of a rigidly trained classifier, the model takes your own security policy as input and reasons whether the message complies with this policy.
The result is not just "safe/unsafe," but a chain of reasoning that you can verify and improve.
The models are available in two sizes: 120B and 20B.
β’ gpt-oss-safeguard-120B
β’ gpt-oss-safeguard-20B
π‘ Why they are needed:
β’ Policies can be changed without retraining the model
β’ Suitable for niche or rapidly changing risks (e.g., cheating in games or fake reviews)
β’ Does not require thousands of labeled examples
β’ Ideal when explainability is important rather than minimal latency
Both are available under the Apache 2.0 license - they can be freely used, modified, and deployed.
π Official announcement
π€ Hugging Face
#openai #chatgpt #opensource
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM