Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers.
Constrained Diffusion Implicit Models!

We use diffusion models to solve noisy inverse problems like inpainting, sparse recovery, and colorization. 10-50x faster than previous methods!

Paper: arxiv.org/pdf/2411.00359

Demo: https://t.co/m6o9GLnnZF
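
The mechanics aren't in the announcement, but the general family of techniques can be sketched: a DDIM-style sampling loop that alternates a denoising step with a data-consistency projection onto the observations. The snippet below is a toy illustration under that assumption, not the paper's algorithm; the denoiser is a stand-in and the noise schedule is made up.

```python
import torch

def toy_constrained_ddim(denoiser, y, mask, steps=50):
    """Toy DDIM-style loop with a hard data-consistency projection.

    denoiser(x, i) is assumed to predict the clean image x0 from the noisy x;
    y holds observed pixels, mask marks where they are known (inpainting).
    """
    x = torch.randn_like(y)                     # start from pure noise
    abar = torch.linspace(0.999, 1e-3, steps)   # made-up cumulative schedule
    x0_hat = x
    for i in range(steps - 1):
        a_t, a_next = abar[i], abar[i + 1]
        x0_hat = denoiser(x, i)                 # predicted clean image
        # projection: keep known pixels, keep generated content elsewhere
        x0_hat = mask * y + (1 - mask) * x0_hat
        eps = (x - a_t.sqrt() * x0_hat) / (1 - a_t).sqrt()       # implied noise
        x = a_next.sqrt() * x0_hat + (1 - a_next).sqrt() * eps   # DDIM update
    return x0_hat

# demo with a stand-in "denoiser"; a real run would use a pretrained model
y = torch.zeros(1, 3, 8, 8)
mask = torch.zeros_like(y); mask[..., :4] = 1.0  # left half observed
print(toy_constrained_ddim(lambda x, i: 0.5 * x, y, mask).shape)
```

The projection step is what turns plain generation into constrained sampling; see the paper for the exact formulation and how observation noise is handled.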

📌 Practical exercises and additional materials for the book "Build a Large Language Model (From Scratch)"

A GitHub repository with practical exercises and code notebooks for developing, pretraining, and fine-tuning a GPT-style LLM, based on one of the best books on building an LLM from scratch.

▶️ About the book:
The book teaches you how large language models work from the inside, building your own LLM step by step, with each stage explained in clear language, with diagrams and examples.

The method described in the book demonstrates the approach used to create large foundation models such as those underlying ChatGPT.

In the repository, each chapter of the book has several (3-4) applied examples as Jupyter notebooks or executable Python scripts. The code is aimed at a wide audience, is designed to run on ordinary laptops, and does not require specialized hardware.

▶️ The main value of the repository is the additional practical material that helps you study in depth the subtleties and nuances of configuring and training an LLM:

Setup

🟢 Tips on Setting Up Python
🟢 Installing Python Packages and Libraries
🟢 Docker Environment Setup Guide

Chapter 2: Working with Text Data

🟠 Comparison of different implementations of Byte Pair Encoding (BPE); a minimal BPE sketch follows this list
🟠 Understanding the difference between embedding and linear layers
🟠 Dataloader intuition with prime numbers
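
As a taste of the BPE material, here is a minimal sketch of the core training loop on a toy corpus: count adjacent symbol pairs, merge the most frequent one, repeat. Real implementations (e.g., tiktoken or Hugging Face tokenizers) add byte-level handling and far more engineering.

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Minimal BPE: repeatedly merge the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(w) for w in words)    # word -> frequency, as symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):    # count adjacent symbol pairs
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)        # most frequent pair wins
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():        # apply the merge everywhere
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1]); i += 2
                else:
                    out.append(word[i]); i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

print(train_bpe(["low", "lower", "newest", "widest"] * 5, num_merges=5))
```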

Chapter 3: Coding Attention Mechanisms

🟢 Comparison of efficient implementations of multi-head attention (a compact reference implementation follows this list)
🟢 PyTorch buffers
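
For orientation, here is a compact PyTorch multi-head self-attention module in the standard causal-LM formulation; it is a generic reference sketch, not one of the specific optimized variants the repository benchmarks.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Standard causal multi-head self-attention (scaled dot-product)."""

    def __init__(self, d_model, num_heads, context_len, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.head_dim = num_heads, d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused q/k/v projection
        self.proj = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        # causal mask as a non-trainable buffer (the "PyTorch buffers" topic)
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1)
        self.register_buffer("mask", mask.bool())

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, T, C) -> (B, num_heads, T, head_dim)
        q, k, v = (t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        att = (q @ k.transpose(-2, -1)) / self.head_dim**0.5
        att = att.masked_fill(self.mask[:T, :T], float("-inf"))
        att = self.dropout(torch.softmax(att, dim=-1))
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

x = torch.randn(2, 16, 64)
print(MultiHeadAttention(64, num_heads=4, context_len=128)(x).shape)
```

The register_buffer call is exactly the pattern the buffers notebook covers: the mask moves with the module across devices but is not a trainable parameter.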

Chapter 4: Implementing the GPT Model from Scratch

🟠 FLOPs analysis (a back-of-the-envelope example follows)
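
As a flavor of what such an analysis involves, the usual back-of-the-envelope rule is about 6 FLOPs per parameter per training token (roughly 2 forward, 4 backward). All numbers below are illustrative assumptions, not figures from the notebook.

```python
# Back-of-the-envelope training compute: ~6 FLOPs per parameter per token
# (roughly 2 for the forward pass and 4 for the backward pass).
n_params = 124e6           # GPT-2 small parameter count
tokens = 300e9             # assumed number of training tokens
flops = 6 * n_params * tokens
print(f"total: ~{flops:.2e} FLOPs")              # ~2.23e+20

a100 = 312e12 * 0.4        # A100 bf16 peak * assumed 40% utilization
print(f"~{flops / a100 / 86400:.1f} A100-days")  # ~20.7
```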

Chapter 5: Pretraining on Unlabeled Data

🟢 Alternative loading of Hugging Face weights using Transformers
🟢 Pretraining GPT on the Project Gutenberg dataset
🟢 Adding more features to the training loop
🟢 Hyperparameter optimization for pretraining
🟢 Building a user interface for interacting with the LLM
🟢 Converting GPT to Llama
🟢 Llama 3.2 from scratch
🟢 Memory-efficient model loading (a small sketch follows this list)
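
On the memory-efficient loading point, here is a small sketch of one common trick, assuming PyTorch 2.1+: memory-map the checkpoint and assign the loaded tensors into a meta-device model so the weights are never held in RAM twice. The architecture is a stand-in, not the book's GPT class.

```python
import torch
import torch.nn as nn

def build_model():
    # stand-in architecture; imagine the book's GPT model class here
    return nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))

torch.save(build_model().state_dict(), "model.pt")   # checkpoint for the demo

# 1) instantiate on the meta device: no memory is allocated for the weights
with torch.device("meta"):
    model = build_model()

# 2) memory-map the checkpoint so tensors are paged in lazily from disk
state = torch.load("model.pt", map_location="cpu", mmap=True)

# 3) assign=True adopts the loaded tensors directly instead of copying them
#    into preallocated parameters, avoiding a second full copy in RAM
model.load_state_dict(state, assign=True)
print(next(model.parameters()).device)  # cpu
```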

Chapter 6: Fine-tuning for Classification

🟠 More experiments on fine-tuning different layers and using larger models
🟠 Fine-tuning various models on the 50K-row IMDB movie review dataset
🟠 Building a user interface for interacting with a GPT-based spam classifier

Chapter 7: Fine-tuning to Follow Instructions

🟢 Dataset utilities for finding near-duplicates and creating passive-voice entries
🟢 Evaluating responses to instructions using the OpenAI and Ollama APIs
🟢 Creating a dataset for instruction fine-tuning (a minimal entry-format sketch follows this list)
🟢 Improving the dataset for instruction fine-tuning
🟢 Creating a preference dataset with Llama 3.1 70B and Ollama
🟢 Direct Preference Optimization (DPO) for LLM alignment
🟢 Building a user interface for interacting with an instruction-fine-tuned GPT model
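
For context on the instruction-dataset notebooks, here is a minimal sketch of the Alpaca-style record format commonly used for instruction fine-tuning; the exact prompt template below is an assumption for illustration, see the repository for the one actually used.

```python
import json

# one Alpaca-style instruction record; the texts are made-up examples
entry = {
    "instruction": "Rewrite the sentence in passive voice.",
    "input": "The researchers trained the model on unlabeled data.",
    "output": "The model was trained on unlabeled data by the researchers.",
}

def format_prompt(e):
    """Assemble the text the model sees during instruction fine-tuning."""
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{e['instruction']}"
    )
    if e["input"]:                       # the input field is optional
        prompt += f"\n\n### Input:\n{e['input']}"
    return prompt + "\n\n### Response:\n"

print(format_prompt(entry) + entry["output"])
print(json.dumps(entry, indent=2))
```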

🖥 GitHub

Most classical ML algorithms cannot be trained incrementally, batch by batch.

This is concerning because enterprises typically deal with tabular data, and classical ML algorithms such as tree-based methods are frequently used for modeling.

For instance, to train a random forest with sklearn, the entire dataset must be present in memory, which limits its use to small and intermediate datasets.

There are two ways to extend random forests to large datasets.

1) Use big-data frameworks like Spark MLlib to train them.

2) Use random patches, an idea I learned from the PhD thesis of Dr. Gilles Louppe, "Understanding Random Forests."

> Here's what he proposed.

Note: This approach only works in an ensemble setting. So, you would have to train multiple models.

The idea is to sample random data patches (both rows and columns) and train a decision tree model on the patch.

Repeat this step multiple times to obtain the entire random forest model (a minimal sklearn sketch follows).
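
Here is a minimal sketch of the idea with scikit-learn. The data below is synthetic and held in memory for brevity; in the real large-data setting you would stream each patch from disk (e.g., a NumPy memmap) so that only one patch is materialized at a time.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))                  # stand-in for a big table
y = (X[:, :3].sum(axis=1) > 0).astype(int)

def fit_random_patches(X, y, n_trees=25, row_frac=0.1, col_frac=0.5):
    """Train each tree on a random patch: a subset of rows AND columns."""
    trees = []
    for _ in range(n_trees):
        rows = rng.choice(len(X), size=int(row_frac * len(X)), replace=False)
        cols = rng.choice(X.shape[1], size=int(col_frac * X.shape[1]),
                          replace=False)
        tree = DecisionTreeClassifier(max_depth=8)
        tree.fit(X[np.ix_(rows, cols)], y[rows])   # only the patch is needed
        trees.append((tree, cols))                 # remember the columns used
    return trees

def predict(trees, X):
    votes = np.mean([t.predict(X[:, cols]) for t, cols in trees], axis=0)
    return (votes >= 0.5).astype(int)              # majority vote

trees = fit_random_patches(X, y)
print("train accuracy:", (predict(trees, X) == y).mean())
```

Each tree sees only its own row and column subset, so patches can be loaded one at a time and no tree ever needs the full table.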

> Here's why it works.

The core objective of Bagging is to build trees that are as different as possible.

In this case, the dataset overlap between any two trees is expected to be much smaller than in a typical random forest, which makes the trees even more diverse and serves the bagging objective.

His thesis presented benchmarks on 13 datasets:
- Random patches performed better than the random forest on 11 datasets.
- On the other two datasets, the difference was quite small (~0.05).

And this is how we can train a random forest model on large datasets that do not fit into memory.

OpenCoder doesn't get enough love

They open-sourced the entire pipeline to create QwenCoder-level code models.

This includes:
- Large datasets
- High-quality models
- Eval framework

Tons of great lessons and observations in the paper.

πŸ“ Paper: arxiv.org/abs/2411.04905

🧹🪣 MOP+MiHo+NCC 🖼️👀: Image Matching Filtering and Refinement by Planes and Beyond

🖥 GitHub: https://github.com/fb82/miho

📕 Paper: https://arxiv.org/abs/2411.09484v1

🌟 Dataset: https://paperswithcode.com/dataset/scannet

Explore "Pretraining LLMs," a short course developed with upstageai.

The course covers pretraining from scratch, continuing pretraining on custom data, and how using smaller open-source models can reduce costs.

Take the course for free:
https://hubs.la/Q02YFKyx0

Hey guys,

As you all know, the purpose of this community is to share notes and grow together. Hence, today I am sharing with you an app called DevBytes. It keeps you updated about dev and tech news.

This brilliant app provides curated, bite-sized updates on the latest tech news/dev content. Whether it’s new frameworks, AI breakthroughs, or cloud services, DevBytes brings the essentials straight to you.

If you're tired of information overload and want a smarter way to stay informed, give DevBytes a try.

Download here: https://play.google.com/store/apps/details?id=com.candelalabs.devbytes&hl=en-IN
It's time to read less and know more!
πŸ‘4❀2
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

🖥 GitHub: https://github.com/gair-nlp/o1-journey

📕 Paper: https://arxiv.org/abs/2411.16489v1

🌟 Dataset: https://paperswithcode.com/dataset/lima

⭐️ Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

RAG-Diffusion now supports FLUX.1 Redux!

🔥 Ready to take control? Customize your region-based images with our training-free solution and achieve powerful, precise results!

🔗 Code: https://github.com/NJU-PCALab/RAG-Diffusion
