Vico is a training-free framework that analyzes how each token of the input prompt influences the generated video and adjusts the model so that no single concept dominates, giving every prompt word fair weight.
To do this, Vico builds a spatio-temporal attention graph, which it uses to evaluate and balance how every input concept is represented in the video.
git clone https://github.com/Adamdad/vico.git
pip install diffusers==0.26.3
git lfs install
git clone https://huggingface.co/adamdad/videocrafterv2_diffusers
export PYTHONPATH="$PWD"
python videocrafterv2_vico.py \
--prompts XXX \
--unet_path $PATH_TO_VIDEOCRAFTERV2 \
--attribution_mode "latent_attention_flow_st_soft"#T2V #Framework #ML
https://t.iss.one/DataScienceT
When training generative models, the training dataset largely determines the quality of the resulting models.
A good source is MiraData from Tencent: a ready-made dataset with 16 thousand hours of total video, designed for training text-to-video generation models. It consists of long videos (72.1 seconds on average) with high motion intensity and detailed structured annotations (318 words per video on average).
To assess the dataset's quality, the authors also built the MiraBench benchmark suite of 17 metrics covering temporal consistency, in-frame motion, video quality, and other properties. By these measures, MiraData outperforms other well-known openly available datasets, which mostly consist of short clips of uneven quality with brief captions.
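As a rough illustration of how such metadata can be filtered for training, here is a minimal pandas sketch; the file name and column names (duration_sec, caption) are assumptions, not the dataset's actual schema:

import pandas as pd

# Hypothetical metadata file and columns; MiraData's real schema may differ.
meta = pd.read_csv("miradata_metadata.csv")

caption_words = meta["caption"].str.split().str.len()
print("avg duration (s):", meta["duration_sec"].mean())  # quoted above as ~72.1 s
print("avg caption words:", caption_words.mean())        # quoted above as ~318 words

# Keep only long, densely annotated clips for training.
subset = meta[(meta["duration_sec"] > 60) & (caption_words > 200)]
print("clips kept:", len(subset))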
#Text2Video #Dataset #ML
https://t.iss.one/DataScienceT
MambaVision is NVIDIA's adaptation of the Mamba architecture, built on selective state space models (SSMs), to image processing.
MambaVision uses compute more efficiently than traditional transformer-based architectures (ViT and Swin), and SSMs open up new ways of extracting and processing visual features. The proposed architecture also scales well, staying efficient as model size grows.
MambaVision is applicable to a variety of computer vision tasks, including image classification and semantic segmentation.
The project is in its early stages and its effectiveness on real-world computer vision tasks has yet to be fully assessed.
So far, it has only been applied to image classification. Available checkpoints:
MambaVision-T (32M)
MambaVision-T2 (35M)
MambaVision-S (50M)
MambaVision-B (98M)
MambaVision-L (228M)
MambaVision-L2 (241M)
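For reference, a minimal sketch of loading one of the checkpoints above; it assumes the weights are published on the Hugging Face Hub under the nvidia organization with custom model code, so check the model cards for the exact repo ids and preprocessing:

import torch
from transformers import AutoModel

# Assumed Hub repo id for MambaVision-T; verify the actual names on the model cards.
model = AutoModel.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy 224x224 RGB batch; real inputs need the card's preprocessing
with torch.no_grad():
    out = model(x)  # output structure (logits / pooled features) is defined by the repo's custom code
print(type(out))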
⚠️ Licensing:
For non-commercial projects: CC-BY-NC-SA-4.0
For commercial use: request via form
#MambaVision #ML
https://t.iss.one/DataScienceT
DG-Mesh reconstructs a high-quality dynamic 3D mesh with consistent vertex correspondence across frames from monocular video. The pipeline represents dynamic scenes with 3D Gaussians and constructs polygons with differentiable algorithms.
DG-Mesh allows you to track the movement of vertices, simplifying the texturing of dynamic objects.
The method is memory efficient and fully differentiable, allowing optimization of the target object's 3D mesh directly.
The GitHub repository contains code for local training on the following datasets:
- D-NeRF
- DG-Mesh
- NeuralActor
- Custom dataset, shot on an iPhone 14 Pro, processed in Record3D and RealityCheck, and masked with DEVA.
conda create -n dg-mesh python=3.9
conda activate dg-mesh
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# Install tiny-cuda-nn and nvdiffrast
pip install git+https://github.com/NVlabs/tiny-cuda-nn#subdirectory=bindings/torch
pip install git+https://github.com/NVlabs/nvdiffrast/
# Install pytorch3d
export FORCE_CUDA=1
conda install -c fvcore -c iopath -c conda-forge fvcore iopath -y
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# Clone this repository
git clone https://github.com/Isabella98Liu/DG-Mesh.git
cd DG-Mesh
# Install submodules
pip install dgmesh/submodules/diff-gaussian-rasterization
pip install dgmesh/submodules/simple-knn
# Install other dependencies
pip install -r requirements.txt
#Video2Mesh #3D #ML #NeRF
https://t.iss.one/DataScienceT
Forwarded from Python | Machine Learning | Coding | R
Data Science Cheat Sheets
Quick help to make a data scientist's life easier
About Dataset
A collection of cheat sheets for various data-science related languages and topics
https://t.iss.one/codeprogrammer
#deeplearning #AI #ML #python
Forwarded from Python | Machine Learning | Coding | R
@CodeProgrammer Data Science Cheat Sheets.zip
596.3 MB
Data Science Cheat Sheets
Quick help to make a data scientist's life easier
https://t.iss.one/codeprogrammer
#deeplearning #AI #ML #python
Recurrent Neural Network (RNN) by hand ✍️ in Excel
Tags: #python #ML #RNN
https://t.iss.one/codeprogrammer
Download Excel file:
https://t.iss.one/+Tdshx2j5cZ00N2Ji
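For comparison, the forward pass that the spreadsheet walks through cell by cell fits in a few lines of NumPy; this is a minimal sketch with made-up sizes and random weights, not the exact numbers from the Excel file:

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5

# Randomly initialized weights, standing in for the hand-filled spreadsheet cells.
W_xh = rng.normal(size=(hidden_size, input_size))    # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden -> hidden (recurrence)
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                    # initial hidden state
xs = rng.normal(size=(seq_len, input_size))  # a short input sequence

for t, x in enumerate(xs):
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # the core RNN recurrence
    print(f"step {t}: h = {np.round(h, 3)}")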
Transformer by Hand ✍️ in 5 Minutes with Anna Rahn
Tags: #python #ML #Transformer
https://t.iss.one/codeprogrammer
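The core computation typically drawn out by hand here, single-head scaled dot-product attention, also fits in a few lines of NumPy; the sizes and weights below are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8

X = rng.normal(size=(seq_len, d_model))      # token embeddings
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v          # project to queries, keys, values
scores = Q @ K.T / np.sqrt(d_model)          # scaled dot products
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
out = weights @ V                            # attention output, shape (seq_len, d_model)
print(out.shape)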
NVIDIA BioNeMo2 Framework is a set of tools, libraries, and models for computational drug discovery and design.
It accelerates the most time-consuming and expensive steps in building and adapting biomolecular AI models by providing optimized models and tools that are easily integrated into GPU-based computing resources.
The framework enables the creation, training and tuning of models, and its capabilities span a variety of workloads and therapeutic mechanisms: molecule generation, protein structure prediction, protein-ligand prediction and representation learning.
In addition to pipeline code, scripts and utilities, BioNeMo2 Framework contains:
#AI #ML #Framework #NVIDIA
DeepSeek-V3 Technical Report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in #DeepSeek V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
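The auxiliary-loss-free load balancing mentioned in the abstract can be pictured as a bias-adjusted top-k router: each expert carries a bias that is nudged up when the expert is underloaded and down when it is overloaded, steering the balance without an extra loss term. A minimal NumPy sketch of that general idea (an illustration, not the paper's exact formulation):

import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k = 8, 2
gamma = 0.01                      # bias update speed (assumed hyperparameter)
bias = np.zeros(num_experts)      # per-expert routing bias, adjusted outside backprop

def route(scores):
    # Pick top-k experts per token using bias-adjusted scores (illustration only).
    return np.argsort(scores + bias, axis=-1)[:, -top_k:]

for step in range(100):
    scores = rng.normal(size=(256, num_experts))   # stand-in for gating scores of 256 tokens
    load = np.bincount(route(scores).ravel(), minlength=num_experts)
    # Nudge underloaded experts up and overloaded experts down instead of adding an aux loss.
    bias += gamma * np.sign(load.mean() - load)

print(np.bincount(route(rng.normal(size=(256, num_experts))).ravel(), minlength=num_experts))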
Paper: https://arxiv.org/pdf/2412.19437v1.pdf
Code: https://github.com/deepseek-ai/deepseek-v3
#aiagents #ai #llm #ml #machinelearning #python
https://t.iss.one/DataScienceT
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of #AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient #MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong performance, outperforming GPT-4V-1106, Gemini Pro and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong #OCR capability and 1.8M pixel high-resolution #image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: The model sizes for achieving usable (e.g., GPT-4V) level performance are rapidly decreasing, along with the fast growth of end-side computation capacity. This jointly shows that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.
Paper: https://arxiv.org/pdf/2408.01800v1.pdf
Codes:
https://github.com/OpenBMB/MiniCPM-o
https://github.com/openbmb/minicpm-v
Datasets: Video-MME
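A minimal inference sketch in the chat-style interface exposed by the repository; treat the repo id, dtype, and the exact model.chat signature as assumptions and follow the README for current usage:

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"   # assumed repo id; see the README for current checkpoints
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.float16).eval().cuda()  # needs a GPU
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
msgs = [{"role": "user", "content": "Describe this image in one sentence."}]

# chat() is the custom interface shipped via trust_remote_code; its signature may change between versions.
answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer)
print(answer)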
#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialInteligence #SoftwareEngineering #GenAI #deeplearning #ChatGPT #OpenAI #python #AI #keras #SQL #Statistics
https://t.iss.one/DataScienceT
Tülu 3 (what a name) 405B: another release!
An open-source model (and no, it's not a Chinese model) that outperforms DeepSeek-V3 on multiple benchmarks!
Scaled to 405B parameters, with performance on par with GPT-4o and ahead of previous models in the same class.
▪️ Blog: https://allenai.org/blog/tulu-3-405B
▪️ You can test it here: https://playground.allenai.org/?model=tulu3-405b
▪️ Technical report: https://allenai.org/blog/tulu-3-technical
▪️ Hugging Face: https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
#llm #ml #ai #opensource
https://t.iss.one/DataScienceT
🔥 The SmolVLM developers have released open-source code for training SmolVLM from scratch on 256 H100 GPUs!
Inspired by DeepSeek R1, they have open-sourced the complete code for training the model and weights!
You can now train any of the SmolVLMs or create your own VLMs!
Starting training for SmolVLM 256M is very simple:
./vision/experiments/pretraining/vloom/tr_341_smolvlm_025b_1st_stage/01_launch.sh
▪️ Code: https://github.com/huggingface/smollm/tree/main/vision
▪️ SmolVLM: https://github.com/huggingface/smollm/tree/main
#SmolVLM #llm #opensource #ml #ai
The Hundred-Page Language Models Book
Read it:
https://github.com/aburkov/theLMbook
#LLM #NLP #ML #AI #PYTHON #PYTORCH
https://t.iss.one/DataScienceM
Qwen3-Omni processes text, images, audio, and video in a single model.
On benchmarks, all modalities appear to perform equally well.
- First place in 22 out of 36 audio and multimodal benchmarks
- Support for 119 text languages,
- Minimal latency: 211 ms
- Audio processing up to 30 minutes long
- Allows flexible customization via system prompts
- Built-in tool calling
The company released three versions:
- Qwen3-Omni-30B-A3B-Instruct
- Qwen3-Omni-30B-A3B-Thinking
- Qwen3-Omni-30B-A3B-Captioner
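As an illustration of the system-prompt customization, a mixed-modality request in the chat-message format used across Qwen model cards might look like the snippet below; the field names and downstream processing for Qwen3-Omni are assumptions, so follow the official model card for real usage:

# A chat-style request with a custom system prompt and mixed audio + text input.
# This only builds the message structure; feeding it to the model goes through the
# processor / chat template described in the Qwen3-Omni model card.
messages = [
    {
        "role": "system",
        "content": "You are a concise assistant. Always answer in both English and Spanish.",
    },
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": "meeting_recording.wav"},  # hypothetical local file
            {"type": "text", "text": "Summarize the key decisions from this recording."},
        ],
    },
]
print(messages)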
#qwen #opensource #llm #ml