Forwarded from DeepMind AI Expert (Farzad 🦅)
This excellent and thorough course on #deep_learning and generative models is by the great Sebastian Raschka, who has also written several other good, practical books. Those interested in this field, check it out.
▪️ Intro to Deep Learning and Generative Models by Sebastian Raschka.
#resources #generative_models #generative #AI #Python #programming
🔸 More content 👇👇
✅ @AI_DeepMind
🔸 @AI_Person
Strong recommend for this book and the JAX/TPU docs, even if you are using Torch / GPUs. Clean notation and mental model for some challenging ideas.
https://github.com/jax-ml/scaling-book/
https://github.com/jax-ml/scaling-book/discussions/25
https://docs.jax.dev/en/latest/notebooks/shard_map.html
Post: https://x.com/srush_nlp/status/1925942348082516432
I feel like half of my social media feed is composed of AI grifters saying software developers are not going to make it. Combine that sentiment with some economic headwinds and it's easy to feel like we're all screwed. I think that's bullshit. The best days of our industry lie ahead.
https://dustinewers.com/ignore-the-grifters/
Why should you read this book?
Most deep learning projects start out by training a model to convergence on data specific to your task, then using that model to drive predictions on future data. The more difficult the task, the larger the model needed to perform it, and the longer it takes to train. Many deep learning models used in production systems today have training times measured in days. This has multiple adverse effects:
- High training costs.
- Slow iteration cycles.
- Hardware resource constraints.
- Poor prediction latency once deployed.
These issues are well documented in production at companies like Google, Facebook, and Reddit. Luckily, they have driven research and development into tools and techniques that can reduce these costs and remove these barriers.
This book attempts to serve as a simple introduction to the world of “architecture-independent” model optimization from the perspective of a PyTorch practitioner. We cover techniques like model quantization, model pruning, data-distributed training, mixed-precision training, and just-in-time compilation. We include code samples to help you get started and benchmarks showing the power of these techniques on example models of interest.
https://residentmario.github.io/pytorch-training-performance-guide/intro.html
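As a taste of the quantization technique the book covers, here is a minimal, framework-free sketch of affine (scale/zero-point) int8 quantization, the same scheme PyTorch's quantization tooling uses under the hood; the function names here are illustrative, not from the book:

```python
def quantize(xs, num_bits=8):
    """Affine (asymmetric) quantization of floats into the signed int8 range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against all-equal inputs
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.9, -0.1, 0.0, 0.4, 1.2]
q, s, zp = quantize(weights)
restored = dequantize(q, s, zp)
# Round-trip error is bounded by half the quantization step per element.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(weights, restored))
```

Real int8 inference keeps the weights in their quantized form and does integer matmuls; the dequantize step here just shows how much precision the round trip loses.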
A book that looks comprehensive for learning Diffusion quickly.
I haven't had a chance to read it yet, but it seems like a good resource as a fairly academic, book-style reference.
https://arxiv.org/pdf/2406.08929
The Scholar Inbox site gives you a personal digest: it sends you papers relevant to your field daily (like Scholar Alerts, but with extra features such as a map, etc.).
https://arxiv.org/pdf/2504.08385v1
Flow matching in 4 mins
https://x.com/jbhuang0604/status/1950883022942978254?t=BsQv2hm_9VQGHNF0gQsK7A&s=35
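The core idea really does fit in a few lines. A hedged 1-D sketch (no neural network; the analytic per-pair velocity stands in for what the trained model would predict) of the linear interpolation path, its regression target, and Euler integration of the resulting ODE:

```python
import random

def sample_pair():
    x0 = random.gauss(0.0, 1.0)      # noise sample
    x1 = random.choice([-2.0, 2.0])  # toy bimodal "data" distribution
    return x0, x1

def path(x0, x1, t):
    # Linear (rectified-flow) interpolation between noise and data.
    return (1 - t) * x0 + t * x1

def target_velocity(x0, x1):
    # The model's regression target: d/dt of the path, constant here.
    return x1 - x0

# "Sampling": integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps.
# With the exact per-pair velocity, Euler lands on x1 up to float error.
random.seed(0)
x0, x1 = sample_pair()
x, steps = x0, 100
for _ in range(steps):
    x += target_velocity(x0, x1) / steps
assert abs(x - x1) < 1e-6
```

In practice a network v_theta(x, t) is fit to these velocity targets by mean-squared error, and sampling integrates that learned field instead.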
From GPT-2 to gpt-oss: Analyzing the Architectural Advances
By: Sebastian Raschka
https://magazine.sebastianraschka.com/p/from-gpt-2-to-gpt-oss-analyzing-the
Sebastianraschka
From GPT-2 to gpt-oss: Analyzing the Architectural Advances
And How They Stack Up Against Qwen3
Forwarded from Tensorflow(@CVision)
Finally, the Persian language has been heard too! 😳
Many people know the Whisper model; it's one of the strongest models for converting speech to text.
But one problem it had was that when it came to Persian, its accuracy dropped and it couldn't transcribe many words correctly.
Now, though, a new version called Whisper-large-fa-v1 has been released that can transcribe Persian speech to text.
What sets this version apart is that it was retrained on a new dataset called Persian-Voice-v1, a dataset that covers various Persian accents and Persian-specific expressions.
The result?
Persian speech recognition and transcription has become much more accurate.
That means for applications like:
✅ automatic subtitling
✅ building voice assistants
✅ Persian NLP tools
And most importantly, it has all been released open-source, so any researcher or team can easily use it, modify it, and build new projects.
Model link: https://huggingface.co/vhdm/whisper-large-fa-v1
Dataset link: https://huggingface.co/datasets/vhdm/persian-voice-v1
Source: https://www.linkedin.com/feed/update/urn:li:activity:7364194597717073925/
Diffusion models demystified, once and for all!
https://www.youtube.com/watch?v=Fk2I6pa6UeA&list=WL&index=19
YouTube
More Than Image Generators: A Science of Problem-Solving using Probability | Diffusion Models
This is my entry to #SoME4, 3Blue1Brown's Summer of Math Exposition Competition!
Diffusion models are typically portrayed as models that learn to denoise a corrupted image. This way, they can generate new images by gradually removing noise from a sample…
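To make the "gradually removing noise" picture concrete, here is a minimal plain-Python sketch of the DDPM forward (noising) process; the linear beta schedule values are the common defaults, used here for illustration:

```python
import math
import random

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule; alpha_bar[t] is the running product of (1 - beta).
    alpha_bar, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def noise(x0, t, alpha_bar, rng):
    # Closed-form q(x_t | x_0): shrink the clean signal, add Gaussian noise.
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1.0 - alpha_bar[t]) * eps

rng = random.Random(0)
alpha_bar = make_alpha_bars()
# Early steps barely perturb the signal; by the final step it is ~pure noise.
assert alpha_bar[0] > 0.999 and alpha_bar[-1] < 1e-4
x_t = noise(1.0, 999, alpha_bar, rng)
```

The denoiser is then trained to predict eps from x_t and t, and generation runs this process in reverse, step by step.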
ML & AI resources
https://www.youtube.com/watch?v=R0uMcXsfo2o
YouTube
But how do AI images and videos actually work? | Guest video by Welch Labs
Diffusion models, CLIP, and the math of turning text into images
Welch Labs Book: https://www.welchlabs.com/resources/imaginary-numbers-book
Sections
0:00 - Intro
3:37 - CLIP
6:25 - Shared Embedding Space
8:16 - Diffusion Models & DDPM
11:44 - Learning Vector…
Forwarded from DeepMind AI Expert (Farzad 🦅)
Andrej Karpathy had said:
Can you take my 2h13m tokenizer video and translate [into] a book chapter.
We've done it! It includes prose, code & key images. It's a great way to learn this key piece of how LLMs work.
https://www.fast.ai/posts/2025-10-16-karpathy-tokenizers
https://solve.it
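The heart of the tokenizer that video (and chapter) builds is the BPE merge loop: repeatedly find the most frequent adjacent pair of tokens and fuse it into a new token. A hedged, minimal sketch, operating on characters rather than the UTF-8 bytes Karpathy's version uses:

```python
from collections import Counter

def most_common_pair(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge(tokens, pair, new_token):
    # Replace every non-overlapping occurrence of `pair` with `new_token`.
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    tokens, merges = list(text), []
    for _ in range(num_merges):
        pair = most_common_pair(tokens)
        if pair is None:
            break
        new_token = pair[0] + pair[1]
        merges.append((pair, new_token))
        tokens = merge(tokens, pair, new_token)
    return tokens, merges

tokens, merges = train_bpe("low lower lowest", 3)
# After merging ('l','o') then ('lo','w'), "low" becomes a single token.
assert "low" in tokens
```

Encoding new text then replays the recorded merges in order; decoding just concatenates the token strings back together.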
fast.ai
Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs – fast.ai
A text and code version of Karpathy’s famous tokenizer video.