ML Research Hub

🌟 BioNeMo: A Framework for Developing AI Models for Drug Design.

NVIDIA BioNeMo2 Framework is a set of tools, libraries, and models for computational drug discovery and design.

It accelerates the most time-consuming and expensive steps in building and adapting biomolecular AI models by providing optimized models and tools that integrate easily with GPU-based computing resources.

The framework supports building, training, and fine-tuning models, and its capabilities span a variety of workloads and therapeutic modalities: molecule generation, protein structure prediction, protein-ligand prediction, and representation learning.

In addition to pipeline code, scripts and utilities, BioNeMo2 Framework contains:

▶️ Pre-trained models:

🟢 ESM-2 is a pre-trained bidirectional encoder (BERT-like) for amino acid sequences. BioNeMo2 includes checkpoints with 650M and 3B parameters;

🟢 Geneformer is a tabular count model that generates a dense representation of a cell's scRNA by learning co-expression patterns within individual cells.
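
To make "BERT-like" concrete for protein sequences: such an encoder is trained by hiding some residues and asking the model to recover them from context on both sides. The sketch below shows only that masking step in plain Python. It is not BioNeMo2 or ESM-2 code; the names mask_sequence, restore, MASK_TOKEN, the 15% masking rate, and the example sequence are assumptions made for illustration, and the real recipe has more detail than this (BERT, for instance, sometimes substitutes a random token instead of the mask).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK_TOKEN = "<mask>"                 # hypothetical mask symbol for this sketch


def mask_sequence(sequence, mask_prob=0.15, seed=0):
    """Hide a random subset of residues and return (masked_tokens, targets).

    targets maps each masked position to the residue the encoder must recover.
    """
    if not set(sequence) <= set(AMINO_ACIDS):
        raise ValueError("sequence contains non-standard residues")
    rng = random.Random(seed)
    tokens = list(sequence)
    n_masked = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    targets = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, targets


def restore(tokens, targets):
    """Fill the masked positions back in (what a perfect model would predict)."""
    return "".join(targets.get(i, tok) for i, tok in enumerate(tokens))


example = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up 22-residue sequence
masked, targets = mask_sequence(example)
print(" ".join(masked))
print(restore(masked, targets) == example)
```

During training, the loss is computed only at the masked positions, which is why the function returns the targets separately from the masked tokens.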

▶️ Datasets:

🟠 CELLxGENE is a collection of publicly available single-cell datasets curated by CZI (the Chan Zuckerberg Initiative), totaling 24 million cells;

🟠 UniProt is a database of clustered sets of protein sequences from UniProtKB, built from translated genomic data.

📌 Licensing: Apache 2.0 License.

🟡 Project page
🟡 Documentation
🖥 GitHub

#AI #ML #Framework #NVIDIA


👍6

6.48K views, 18:26

ML Research Hub

DeepSeek-V3 Technical Report

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in #DeepSeek V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
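
The auxiliary-loss-free load balancing mentioned above can be pictured roughly like this: each expert carries a bias that is added to its routing score only when choosing the top-k experts, and after each batch the bias is nudged down for overloaded experts and up for underloaded ones. The following is a minimal pure-Python sketch of that idea, not DeepSeek's implementation; the function names, the toy affinity scores, the step size gamma, and the use of mean load as the threshold are all assumptions for illustration.

```python
def route_top_k(scores, bias, k):
    """Select k experts for one token.

    Experts are ranked by score + bias, but the returned gate weights are
    normalized from the unbiased scores of the selected experts only.
    """
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def count_load(batch_scores, bias, k):
    """Count how many tokens in the batch were routed to each expert."""
    load = [0] * len(bias)
    for scores in batch_scores:
        for expert in route_top_k(scores, bias, k):
            load[expert] += 1
    return load


def update_bias(bias, load, gamma):
    """Lower the bias of overloaded experts, raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - gamma)
        elif n < mean_load:
            new_bias.append(b + gamma)
        else:
            new_bias.append(b)
    return new_bias


# Toy batch: 3 tokens x 3 experts, with expert 2 consistently scored low.
batch = [[0.9, 0.8, 0.1], [0.7, 0.6, 0.2], [0.8, 0.9, 0.3]]
bias = [0.0, 0.0, 0.0]
load = count_load(batch, bias, k=2)
bias = update_bias(bias, load, gamma=0.1)
print(load, bias)
```

Because the bias affects only which experts are selected and not the gate weights, balancing does not add a competing term to the training loss, which is the point of calling the strategy auxiliary-loss-free.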

Paper: https://arxiv.org/pdf/2412.19437v1.pdf

Code: https://github.com/deepseek-ai/deepseek-v3

#aiagents #ai #llm #ml #machinelearning #python

https://t.iss.one/DataScienceT


👍2❤1

2.54K views, edited 19:45

ML Research Hub

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong performance, outperforming GPT-4V-1106, Gemini Pro and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong #OCR capability and 1.8M pixel high-resolution #image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: The model sizes for achieving usable (e.g., GPT-4V) level performance are rapidly decreasing, along with the fast growth of end-side computation capacity. This jointly shows that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.

Paper: https://arxiv.org/pdf/2408.01800v1.pdf

Code:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v

Datasets: Video-MME

#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialInteligence #SoftwareEngineering #GenAI #deeplearning #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics

https://t.iss.one/DataScienceT


👍3

2.06K views, edited 06:05

ML Research Hub

🐫 Tülu 3 (what a name) 405B - another release!

An open-source model (and no, it's not a Chinese one) that outperforms DeepSeek-V3 on multiple benchmarks.

Scaled up to 405B parameters, it performs on par with GPT-4o and outperforms previous models in the same class.

▪ Blog: https://allenai.org/blog/tulu-3-405B
▪ You can test it here: https://playground.allenai.org/?model=tulu3-405b
▪ Technical report: https://allenai.org/blog/tulu-3-technical
▪ Hugging Face: https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5

#llm #ml #ai #opensource

https://t.iss.one/DataScienceT


👍4

2.29K views, edited 15:19

ML Research Hub

🔥🔥🔥 The SmolVLM developers have released open-source code for training SmolVLM from scratch on 256 H100 GPUs!

Inspired by DeepSeek R1, they have open-sourced the complete training code along with the model weights!

You can now train any of the SmolVLMs or create your own VLMs!

Launching training for SmolVLM 256M is very simple:
./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh

▪ Code: https://github.com/huggingface/smollm/tree/main/vision
▪ SmolVLM: https://github.com/huggingface/smollm/tree/main

#SmolVLM #llm #opensource #ml #ai

👍3

1.85K views, edited 06:47

ML Research Hub

The Hundred-Page Language Models Book

Read it:
https://github.com/aburkov/theLMbook

#LLM #NLP #ML #AI #PYTHON #PYTORCH

https://t.iss.one/DataScienceM

👍4

2.81K views, 12:13

ML Research Hub

🚀 Release day: Qwen launched Qwen3-Omni, the first native end-to-end omni-modal AI

It processes text, images, audio, and video in a single model.

On benchmarks, all modalities appear to perform at a comparable level.

⚡️ Features
- First place in 22 out of 36 audio and multimodal benchmarks
- Support for 119 text languages
- Latency as low as 211 ms
- Audio processing for inputs up to 30 minutes long
- Flexible customization via system prompts
- Built-in tool calling

🌟 Open-source releases
The company released three versions:
- Qwen3-Omni-30B-A3B-Instruct
- Qwen3-Omni-30B-A3B-Thinking
- Qwen3-Omni-30B-A3B-Captioner

👉 You can try it here:

💬 Chat: https://chat.qwen.ai/?models=qwen3-omni-flash

👨‍💻 GitHub: https://github.com/QwenLM/Qwen3-Omni

🤗 Hugging Face: https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe

🤖 ModelScope: https://modelscope.cn/collections/Qwen3-Omni-867aef131e7d4f

🎬 Demo: https://huggingface.co/spaces/Qwen/Qwen3-Omni-Demo

#qwen #opensource #llm #ml
