What is torch.nn really?
When I started working with PyTorch, my biggest question was: "What is torch.nn?".
This article explains it quite well.
📌 Read
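In one line: torch.nn packages layers, activations, and loss functions as composable building blocks (nn.Module objects). A minimal sketch, just for orientation (not taken from the article):
import torch
from torch import nn
# A tiny two-layer network assembled from torch.nn building blocks
model = nn.Sequential(
    nn.Linear(4, 8),   # fully connected layer: 4 inputs -> 8 hidden units
    nn.ReLU(),         # non-linearity
    nn.Linear(8, 2),   # 8 hidden units -> 2 outputs
)
x = torch.randn(1, 4)  # one dummy sample with 4 features
print(model(x).shape)  # torch.Size([1, 2])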
#pytorch #AIEngineering #MachineLearning #DeepLearning #LLMs #RAG #MLOps #Python #GitHubProjects #AIForBeginners #ArtificialIntelligence #NeuralNetworks #OpenSourceAI #DataScienceCareers
📚 JaidedAI/EasyOCR — an open-source Python library for Optical Character Recognition (OCR) that's easy to use and supports over 80 languages out of the box.
### 🔍 Key Features:
🔸 Extracts text from images and scanned documents — including handwritten notes and unusual fonts
🔸 Supports a wide range of languages like English, Russian, Chinese, Arabic, and more
🔸 Built on PyTorch — uses modern deep learning models (not the old-school Tesseract)
🔸 Simple to integrate into your Python projects
### ✅ Example Usage:
import easyocr
reader = easyocr.Reader(['en', 'ru'])  # Choose supported languages
result = reader.readtext('image.png')  # List of (bounding box, text, confidence) tuples
### 📌 Ideal For:
✅ Text extraction from photos, scans, and documents
✅ Embedding OCR capabilities in apps (e.g. automated data entry)
🔗 GitHub: https://github.com/JaidedAI/EasyOCR
👉 Follow us for more: @DataScienceN
#Python #OCR #MachineLearning #ComputerVision #EasyOCR
Are you preparing for AI interviews or want to test your knowledge in Vision Transformers (ViT)?
Basic Concepts (Q1–Q15)
Architecture & Components (Q16–Q30)
Attention & Transformers (Q31–Q45)
Training & Optimization (Q46–Q55)
Advanced & Real-World Applications (Q56–Q65)
Answer Key & Explanations
#VisionTransformer #ViT #DeepLearning #ComputerVision #Transformers #AI #MachineLearning #MCQ #InterviewPrep
✉️ Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
🚀 Comprehensive Guide: How to Prepare for an Image Processing Job Interview – 500 Most Common Interview Questions
Let's start: https://hackmd.io/@husseinsheikho/IP
#ImageProcessing #ComputerVision #OpenCV #Python #InterviewPrep #DigitalImageProcessing #MachineLearning #AI #SignalProcessing #ComputerGraphics
✉️ Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
🚀 Comprehensive Guide: How to Prepare for a Graph Neural Networks (GNN) Job Interview – 350 Most Common Interview Questions
Read: https://hackmd.io/@husseinsheikho/GNN-interview
#GNN #GraphNeuralNetworks #MachineLearning #DeepLearning #AI #DataScience #PyTorchGeometric #DGL #NodeClassification #LinkPrediction #GraphML
✉️ Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
𝗣𝗿𝗲𝗽𝗮𝗿𝗲 𝗳𝗼𝗿 𝗝𝗼𝗯 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄𝘀.
In DS or AI/ML interviews, you need to be able to explain models, debug them live, and design AI/ML systems from scratch. If you can’t demonstrate this during an interview, expect to hear, “We’ll get back to you.”
The person in the attached photo is Chip Huyen; hopefully you already know her. She is one of the finest authors in the field of AI/ML.
She wrote a book covering the most common ML interview questions.
Target audience: ML engineers, platform engineers, research scientists, or anyone who wants to do ML but doesn't yet know the differences among those titles. Check the comment section for links and repos.
📌 link:
https://huyenchip.com/ml-interviews-book/
#JobInterview #MachineLearning #AI #DataScience #MLEngineer #AIInterview #TechCareers #DeepLearning #AICommunity #MLSystems #CareerGrowth #AIJobs #ChipHuyen #InterviewPrep #DataScienceCommunity
https://t.iss.one/CodeProgrammer
🤖🧠 The Little Book of Deep Learning – A Complete Summary and Chapter-Wise Overview
🗓️ 08 Oct 2025
📚 AI News & Trends
In the ever-evolving world of Artificial Intelligence, deep learning continues to be the driving force behind breakthroughs in computer vision, speech recognition and natural language processing. For those seeking a clear, structured and accessible guide to understanding how deep learning really works, “The Little Book of Deep Learning” by François Fleuret is a gem. This ...
#DeepLearning #ArtificialIntelligence #MachineLearning #NeuralNetworks #AIGuides #FrancoisFleuret
🤖🧠 Build a Large Language Model From Scratch: A Step-by-Step Guide to Understanding and Creating LLMs
🗓️ 08 Oct 2025
📚 AI News & Trends
In recent years, Large Language Models (LLMs) have revolutionized the world of Artificial Intelligence (AI). From ChatGPT and Claude to Llama and Mistral, these models power the conversational systems, copilots, and generative tools that dominate today’s AI landscape. However, for most developers and learners, the inner workings of these systems remain a mystery until now. ...
#LargeLanguageModels #LLM #ArtificialIntelligence #DeepLearning #MachineLearning #AIGuides
🤖🧠 Mastering Large Language Models: Top #1 Complete Guide to Maxime Labonne’s LLM Course
🗓️ 22 Oct 2025
📚 AI News & Trends
In the rapidly evolving landscape of artificial intelligence, large language models (LLMs) have become the foundation of modern AI innovation powering tools like ChatGPT, Claude, Gemini and countless enterprise AI applications. However, building, fine-tuning and deploying these models require deep technical understanding and hands-on expertise. To bridge this knowledge gap, Maxime Labonne, a leading AI ...
#LLM #ArtificialIntelligence #MachineLearning #DeepLearning #AIEngineering #LargeLanguageModels
🤖🧠 The Ultimate #1 Collection of AI Books In Awesome-AI-Books Repository
🗓️ 22 Oct 2025
📚 AI News & Trends
Artificial Intelligence (AI) has emerged as one of the most transformative technologies of the 21st century. From powering self-driving cars to enabling advanced conversational AI like ChatGPT, AI is redefining how humans interact with machines. However, mastering AI requires a strong foundation in theory, mathematics, programming and hands-on experimentation. For enthusiasts, students and professionals seeking ...
#ArtificialIntelligence #AIBooks #MachineLearning #DeepLearning #AIResources #TechBooks
🤖🧠 Master Machine Learning: Explore the Ultimate “Machine-Learning-Tutorials” Repository
🗓️ 23 Oct 2025
📚 AI News & Trends
In today’s data-driven world, Machine Learning (ML) has become the cornerstone of modern technology from intelligent chatbots to predictive analytics and recommendation systems. However, mastering ML isn’t just about coding, it requires a structured understanding of algorithms, statistics, optimization techniques and real-world problem-solving. That’s where Ujjwal Karn’s Machine-Learning-Tutorials GitHub repository stands out. This open-source, topic-wise ...
#MachineLearning #MLTutorials #ArtificialIntelligence #DataScience #OpenSource #AIEducation
Forwarded from Python Data Science Jobs & Interviews
In Python, NumPy is the cornerstone of scientific computing, offering high-performance multidimensional arrays and tools for working with them—critical for data science interviews and real-world applications! 📊
import numpy as np
# Array Creation - The foundation of NumPy
arr = np.array([1, 2, 3])
zeros = np.zeros((2, 3)) # 2x3 matrix of zeros
ones = np.ones((2, 2), dtype=int) # Integer matrix
arange = np.arange(0, 10, 2) # [0 2 4 6 8]
linspace = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
print(linspace)
# Array Attributes - Master your data's structure
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape) # Output: (2, 3)
print(matrix.ndim) # Output: 2
print(matrix.dtype) # Output: int64
print(matrix.size) # Output: 6
# Indexing & Slicing - Precision data access
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(data[1, 2]) # Output: 6 (row 1, col 2)
print(data[0:2, 1:3]) # Output: [[2 3], [5 6]]
print(data[:, -1]) # Output: [3 6 9] (last column)
# Reshaping Arrays - Transform dimensions effortlessly
flat = np.arange(6)
reshaped = flat.reshape(2, 3)
raveled = reshaped.ravel()
print(reshaped)
# Output: [[0 1 2], [3 4 5]]
print(raveled) # Output: [0 1 2 3 4 5]
# Stacking Arrays - Combine datasets vertically/horizontally
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.vstack((a, b))) # Vertical stack
# Output: [[1 2 3], [4 5 6]]
print(np.hstack((a, b))) # Horizontal stack
# Output: [1 2 3 4 5 6]
# Mathematical Operations - Vectorized calculations
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y) # Output: [5 7 9]
print(x * 2) # Output: [2 4 6]
print(np.dot(x, y)) # Output: 32 (1*4 + 2*5 + 3*6)
# Broadcasting Magic - Operate on mismatched shapes
matrix = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
print(matrix + scalar)
# Output: [[11 12 13], [14 15 16]]
# Aggregation Functions - Statistical power in one line
values = np.array([1, 5, 3, 9, 7])
print(np.sum(values)) # Output: 25
print(np.mean(values)) # Output: 5.0
print(np.max(values)) # Output: 9
print(np.std(values)) # Output: 2.8284271247461903
# Boolean Masking - Filter data like a pro
temperatures = np.array([18, 25, 12, 30, 22])
hot_days = temperatures > 24
print(temperatures[hot_days]) # Output: [25 30]
# Random Number Generation - Simulate real-world data
print(np.random.rand(2, 2)) # Uniform distribution
print(np.random.randn(3)) # Normal distribution
print(np.random.randint(0, 10, (2, 3))) # Random integers
# Linear Algebra Essentials - Solve equations like a physicist
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = np.linalg.solve(A, b)
print(x) # Output: [2. 3.] (Solution to 3x+y=9 and x+2y=8)
# Matrix inverse and determinant
print(np.linalg.inv(A)) # Output: [[ 0.4 -0.2], [-0.2 0.6]]
print(np.linalg.det(A)) # Output: 5.0
# File Operations - Save/load your computational work
data = np.array([[1, 2], [3, 4]])
np.save('array.npy', data)
loaded = np.load('array.npy')
print(np.array_equal(data, loaded)) # Output: True
# Interview Power Move: Vectorization vs Loops
# 10x faster than native Python loops!
def square_sum(n):
arr = np.arange(n)
return np.sum(arr ** 2)
print(square_sum(5)) # Output: 30 (0²+1²+2²+3²+4²)
# Pro Tip: Memory-efficient data processing
# Process a 1GB array without loading the entire dataset into memory
# (assumes 'large_data.bin' already exists on disk)
large_array = np.memmap('large_data.bin', dtype='float32', mode='r', shape=(1000000, 100))
print(large_array[0:5, 0:3]) # Process small slice
By: @DataScienceQ 🚀
#Python #NumPy #DataScience #CodingInterview #MachineLearning #ScientificComputing #DataAnalysis #Programming #TechJobs #DeveloperTips
🤖🧠 AI Projects : A Comprehensive Showcase of Machine Learning, Deep Learning and Generative AI
🗓️ 27 Oct 2025
📚 AI News & Trends
Artificial Intelligence (AI) is transforming industries across the globe, driving innovation through automation, data-driven insights and intelligent decision-making. Whether it’s predicting house prices, detecting diseases or building conversational chatbots, AI is at the core of modern digital solutions. The AI Project Gallery by Hema Kalyan Murapaka is an exceptional GitHub repository that curates a wide ...
#AI #MachineLearning #DeepLearning #GenerativeAI #ArtificialIntelligence #GitHub
In Python, image processing unlocks powerful capabilities for computer vision, data augmentation, and automation—master these techniques to excel in ML engineering interviews and real-world applications! 🖼
# PIL/Pillow Basics - The essential image library
from PIL import Image
# Open and display image
img = Image.open("input.jpg")
img.show()
# Convert formats
img.save("output.png")
img.convert("L").save("grayscale.jpg") # RGB to grayscale
# Basic transformations
img.rotate(90).save("rotated.jpg")
img.resize((300, 300)).save("resized.jpg")
img.transpose(Image.FLIP_LEFT_RIGHT).save("mirrored.jpg")
More details: https://hackmd.io/@husseinsheikho/imageprocessing
#Python #ImageProcessing #ComputerVision #Pillow #OpenCV #MachineLearning #CodingInterview #DataScience #Programming #TechJobs #DeveloperTips #AI #DeepLearning #CloudComputing #Docker #BackendDevelopment #SoftwareEngineering #CareerGrowth #TechTips #Python3
🤖🧠 MLOps Basics: A Complete Guide to Building, Deploying and Monitoring Machine Learning Models
🗓️ 30 Oct 2025
📚 AI News & Trends
Machine Learning models are powerful but building them is only half the story. The true challenge lies in deploying, scaling and maintaining these models in production environments – a process that requires collaboration between data scientists, developers and operations teams. This is where MLOps (Machine Learning Operations) comes in. MLOps combines the principles of DevOps ...
#MLOps #MachineLearning #DevOps #ModelDeployment #DataScience #ProductionAI
🤖🧠 MiniMax-M2: The Open-Source Revolution Powering Coding and Agentic Intelligence
🗓️ 30 Oct 2025
📚 AI News & Trends
Artificial intelligence is evolving faster than ever, but not every innovation needs to be enormous to make an impact. MiniMax-M2, the latest release from MiniMax-AI, demonstrates that efficiency and power can coexist within a streamlined framework. MiniMax-M2 is an open-source Mixture of Experts (MoE) model designed for coding tasks, multi-agent collaboration and automation workflows. With ...
#MiniMaxM2 #OpenSource #MachineLearning #CodingAI #AgenticIntelligence #MixtureOfExperts
💡 Keras: Building Neural Networks Simply
Keras is a high-level deep learning API, now part of TensorFlow, designed for fast and easy experimentation. This guide covers the fundamental workflow: defining, compiling, training, and using a neural network model.
from tensorflow import keras
from tensorflow.keras import layers
# Define a Sequential model
model = keras.Sequential([
# Input layer with 64 neurons, expecting flat input data
layers.Dense(64, activation="relu", input_shape=(784,)),
# A hidden layer with 32 neurons
layers.Dense(32, activation="relu"),
# Output layer with 10 neurons for 10-class classification
layers.Dense(10, activation="softmax")
])
model.summary()
• Model Definition: keras.Sequential creates a simple, layer-by-layer model.
• layers.Dense is a standard fully-connected layer. The first layer must specify the input_shape.
• activation functions like "relu" introduce non-linearity, while "softmax" is used on the output layer for multi-class classification to produce probabilities.
# (Continuing from the previous step)
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
print("Model compiled successfully.")
• Compilation: .compile() configures the model for training.
• optimizer is the algorithm used to update the model's weights (e.g., 'adam' is a popular choice).
• loss is the function the model tries to minimize during training. sparse_categorical_crossentropy is common for integer-based classification labels.
• metrics are used to monitor the training and testing steps. Here, we track accuracy.
import numpy as np
# Create dummy training data
x_train = np.random.random((1000, 784))
y_train = np.random.randint(10, size=(1000,))
# Train the model
history = model.fit(
x_train,
y_train,
epochs=5,
batch_size=32,
verbose=0 # Hides the progress bar for a cleaner output
)
print(f"Training complete. Final accuracy: {history.history['accuracy'][-1]:.4f}")
# Output (will vary):
# Training complete. Final accuracy: 0.4570
• Training: The .fit() method trains the model on your data.
• x_train and y_train are your input features and target labels.
• epochs defines how many times the model will see the entire dataset.
• batch_size is the number of samples processed before the model is updated.
# Create a single dummy sample to test
x_test = np.random.random((1, 784))
# Get the model's prediction
predictions = model.predict(x_test)
predicted_class = np.argmax(predictions[0])
print(f"Predicted class: {predicted_class}")
print(f"Confidence scores: {predictions[0].round(2)}")
# Output (will vary):
# Predicted class: 3
# Confidence scores: [0.09 0.1 0.1 0.12 0.1 0.09 0.11 0.1 0.09 0.1 ]
• Prediction: .predict() is used to make predictions on new, unseen data.
• For a classification model with a softmax output, this returns an array of probabilities for each class.
• np.argmax() is used to find the index (the class) with the highest probability score.
#Keras #TensorFlow #DeepLearning #MachineLearning #Python
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
#NLP #Lesson #SentimentAnalysis #MachineLearning
Building an NLP Model from Scratch: Sentiment Analysis
This lesson will guide you through creating a complete Natural Language Processing (NLP) project. We will build a sentiment analysis classifier that can determine if a piece of text is positive or negative.
---
Step 1: Setup and Data Preparation
First, we need to import the necessary libraries and prepare our dataset. For simplicity, we'll use a small, hard-coded list of sentences. In a real-world project, you would load this data from a file (e.g., a CSV).
#Python #DataPreparation
# Imports and Data
import re
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import nltk
from nltk.corpus import stopwords
# You may need to download stopwords for the first time
# nltk.download('stopwords')
# Sample Data (In a real project, load this from a file)
texts = [
"I love this movie, it's fantastic!",
"This was a terrible film.",
"The acting was superb and the plot was great.",
"I would not recommend this to anyone.",
"It was an okay movie, not the best but enjoyable.",
"Absolutely brilliant, a must-see!",
"A complete waste of time and money.",
"The story was compelling and engaging."
]
# Labels: 1 for Positive, 0 for Negative
labels = [1, 0, 1, 0, 1, 1, 0, 1]
---
Step 2: Text Preprocessing
Computers don't understand words, so we must clean and process our text data first. This involves making text lowercase, removing punctuation, and filtering out common "stop words" (like 'the', 'a', 'is') that don't add much meaning.
#TextPreprocessing #DataCleaning
# Text Preprocessing Function
stop_words = set(stopwords.words('english'))
def preprocess_text(text):
# Make text lowercase
text = text.lower()
# Remove punctuation
text = re.sub(r'[^\w\s]', '', text)
# Tokenize and remove stopwords
tokens = text.split()
filtered_tokens = [word for word in tokens if word not in stop_words]
return " ".join(filtered_tokens)
# Apply preprocessing to our dataset
processed_texts = [preprocess_text(text) for text in texts]
print("--- Original vs. Processed ---")
for i in range(3):
print(f"Original: {texts[i]}")
print(f"Processed: {processed_texts[i]}\n")
---
Step 3: Splitting the Data
We must split our data into a training set (to teach the model) and a testing set (to evaluate its performance on unseen data).
#MachineLearning #TrainTestSplit
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
processed_texts,
labels,
test_size=0.25, # Use 25% of data for testing
random_state=42 # for reproducibility
)
print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
---
Step 4: Feature Extraction (Vectorization)
We need to convert our cleaned text into a numerical format. We'll use TF-IDF (Term Frequency-Inverse Document Frequency). This technique converts text into vectors of numbers, giving more weight to words that are important to a document but not common across all documents.
#FeatureEngineering #TFIDF #Vectorization
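The lesson's own code for this step is not included above, so here is a minimal sketch using the TfidfVectorizer imported in Step 1, continuing from the X_train and X_test variables created in Step 3:
# Convert the cleaned text into TF-IDF feature vectors
vectorizer = TfidfVectorizer()
# Fit the vocabulary on the training texts only, then transform both sets
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
print(f"Vocabulary size: {len(vectorizer.get_feature_names_out())}")
print(f"Training matrix shape: {X_train_tfidf.shape}")
Fitting the vectorizer on the training set only keeps information from the test set from leaking into the features.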