Topic: Handling Datasets of All Types – Part 2 of 5: Data Cleaning and Preprocessing

---

1. Importance of Data Cleaning

• Real-world data is often noisy, incomplete, or inconsistent.

• Cleaning improves data quality and model performance.

---

2. Handling Missing Data

Detect missing values using isnull() or isna() in pandas.

• Strategies to handle missing data:

* Remove rows or columns with missing values:

df.dropna(inplace=True)


* Impute missing values with mean, median, or mode:

df['column'].fillna(df['column'].mean(), inplace=True)


---

3. Handling Outliers

• Outliers can skew analysis and model results.

• Detect outliers using:

* Boxplots
* Z-score method
* IQR (Interquartile Range)

• Handle by removal or transformation.
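
A minimal IQR-based filtering sketch, assuming a pandas DataFrame df with a numeric column named 'column' (the column name is illustrative):

Q1 = df['column'].quantile(0.25)
Q3 = df['column'].quantile(0.75)
IQR = Q3 - Q1

# Keep only rows within 1.5 * IQR of the quartiles
df = df[(df['column'] >= Q1 - 1.5 * IQR) & (df['column'] <= Q3 + 1.5 * IQR)]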

---

4. Data Normalization and Scaling

• Many ML models require features to be on a similar scale.

• Common techniques:

* Min-Max Scaling (scales values between 0 and 1)

* Standardization (mean = 0, std = 1)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[['feature1', 'feature2']])
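
Min-Max Scaling follows the same pattern; a minimal sketch using scikit-learn's MinMaxScaler (the column names are illustrative):

from sklearn.preprocessing import MinMaxScaler

minmax = MinMaxScaler()  # scales each feature to [0, 1]
df_minmax = minmax.fit_transform(df[['feature1', 'feature2']])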


---

5. Encoding Categorical Variables

• Convert categorical data into numerical:

* Label Encoding: Assigns an integer to each category.

* One-Hot Encoding: Creates binary columns for each category.

pd.get_dummies(df['category_column'])
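
For label encoding, a minimal sketch using scikit-learn's LabelEncoder (the column names are illustrative):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['category_encoded'] = le.fit_transform(df['category_column'])  # integers 0..n_classes-1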


---

6. Summary

• Data cleaning is essential for reliable modeling.

• Handling missing values, outliers, scaling, and encoding are key preprocessing steps.

---

Exercise

• Load a dataset, identify missing values, and apply mean imputation.

• Detect outliers using IQR and remove them.

• Normalize numeric features using standardization.

---

#DataCleaning #DataPreprocessing #MachineLearning #Python #DataScience

https://t.iss.one/DataScienceM
Topic: Handling Datasets of All Types – Part 4 of 5: Text Data Processing and Natural Language Processing (NLP)

---

1. Understanding Text Data

• Text data is unstructured and requires preprocessing to convert into numeric form for ML models.

• Common tasks: classification, sentiment analysis, language modeling.

---

2. Text Preprocessing Steps

Tokenization: Splitting text into words or subwords.

Lowercasing: Convert all text to lowercase for uniformity.

Removing Punctuation and Stopwords: Clean unnecessary words.

Stemming and Lemmatization: Reduce words to their root form.
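
A minimal preprocessing sketch using NLTK, assuming the punkt, stopwords, and wordnet resources have already been downloaded with nltk.download():

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

text = "The cats are sitting on the mats."
tokens = word_tokenize(text.lower())                    # tokenize + lowercase
stop_words = set(stopwords.words('english'))
tokens = [t for t in tokens if t.isalpha() and t not in stop_words]  # drop punctuation and stopwords

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(t) for t in tokens]      # e.g. ['cat', 'sitting', 'mat']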

---

3. Encoding Text Data

Bag-of-Words (BoW): Represents text as word count vectors.

TF-IDF (Term Frequency-Inverse Document Frequency): Weighs words based on importance.

Word Embeddings: Dense vector representations capturing semantic meaning (e.g., Word2Vec, GloVe).

---

4. Loading and Processing Text Data in Python

from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["I love data science.", "Data science is fun."]
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(texts)


---

5. Handling Large Text Datasets

• Use libraries like NLTK, spaCy, and Transformers.

• For deep learning, tokenize using models like BERT or GPT.
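
A minimal tokenization sketch with the Hugging Face transformers library, assuming the package is installed and the bert-base-uncased checkpoint can be downloaded:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(["I love data science.", "Data science is fun."], padding=True, truncation=True)
print(encoded["input_ids"])  # lists of token ids, padded to equal length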

---

6. Summary

• Text data needs extensive preprocessing and encoding.

• Choosing the right representation is crucial for model success.

---

Exercise

• Clean a set of sentences by tokenizing and removing stopwords.

• Convert cleaned text into TF-IDF vectors.

---

#NLP #TextProcessing #DataScience #MachineLearning #Python

https://t.iss.one/DataScienceM
Topic: Handling Datasets of All Types – Part 5 of 5: Working with Time Series and Tabular Data

---

1. Understanding Time Series Data

• Time series data is a sequence of data points collected over time intervals.

• Examples: stock prices, weather data, sensor readings.

---

2. Loading and Exploring Time Series Data

import pandas as pd

df = pd.read_csv('time_series.csv', parse_dates=['date'], index_col='date')
print(df.head())


---

3. Key Time Series Concepts

Trend: Long-term increase or decrease in data.

Seasonality: Repeating patterns at regular intervals.

Noise: Random variations.
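
These components can be separated with a decomposition; a minimal sketch using statsmodels, assuming df has a 'value' column and a monthly DatetimeIndex (period=12 is illustrative):

import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(df['value'], model='additive', period=12)
result.plot()  # panels for observed, trend, seasonal, and residual (noise)
plt.show()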

---

4. Preprocessing Time Series

• Handle missing data using forward/backward fill.

df.fillna(method='ffill', inplace=True)


• Resample data to different frequencies (daily, monthly).

df_resampled = df.resample('M').mean()


---

5. Working with Tabular Data

• Tabular data consists of rows (samples) and columns (features).

• Often requires handling missing values, encoding categorical variables, and scaling features (covered in previous parts).

---

6. Summary

• Time series data requires special preprocessing due to temporal order.

• Tabular data is the most common format, needing cleaning and feature engineering.

---

Exercise

• Load a time series dataset, fill missing values, and resample it monthly.

• For tabular data, encode categorical variables and scale numerical features.

---

#TimeSeries #TabularData #DataScience #MachineLearning #Python

https://t.iss.one/DataScienceM
Topic: 25 Important Questions on Handling Datasets of All Types in Python

---

1. What are the common types of datasets?
Structured, unstructured, and semi-structured.

---

2. How do you load a CSV file in Python?
Using pandas.read_csv() function.

---

3. How to check for missing values in a dataset?
Using df.isnull().sum() in pandas.

---

4. What methods can you use to handle missing data?
Remove rows/columns, mean/median/mode imputation, interpolation.

---

5. How to detect outliers in data?
Using boxplots, z-score, or interquartile range (IQR) methods.

---

6. What is data normalization?
Scaling data to a specific range, often [0, 1].

---

7. What is data standardization?
Rescaling data to have zero mean and unit variance.

---

8. How to encode categorical variables?
Label encoding or one-hot encoding.

---

9. What libraries help with image data processing in Python?
OpenCV, Pillow, scikit-image.

---

10. How do you load and preprocess images for ML models?
Resize, normalize pixel values, data augmentation.
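
A minimal sketch with Pillow and NumPy (the file name and target size are illustrative):

from PIL import Image
import numpy as np

img = Image.open("example.jpg").convert("RGB").resize((224, 224))  # fixed input size
arr = np.asarray(img, dtype=np.float32) / 255.0                    # normalize pixels to [0, 1]
print(arr.shape)  # (224, 224, 3)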

---

11. How can audio data be loaded in Python?
Using libraries like librosa or scipy.io.wavfile.

---

12. What are MFCCs in audio processing?
Mel-frequency cepstral coefficients – features extracted from audio signals.
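
A minimal MFCC extraction sketch with librosa (the file name is illustrative):

import librosa

y, sr = librosa.load("audio.wav")                    # waveform and sample rate
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # array of shape (13, n_frames)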

---

13. How do you preprocess text data?
Tokenization, removing stopwords, stemming, lemmatization.

---

14. What is TF-IDF?
A technique to weigh words based on frequency and importance.

---

15. How do you handle variable-length sequences in text or time series?
Padding sequences or using packed sequences.

---

16. How to handle time series missing data?
Forward fill, backward fill, interpolation.

---

17. What is data augmentation?
Creating new data samples by transforming existing data.

---

18. How to split datasets into training and testing sets?
Using train_test_split from scikit-learn.
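
A minimal sketch, assuming X and y hold the features and labels:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)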

---

19. What is batch processing in ML?
Processing data in small batches during training for efficiency.

---

20. How to save and load datasets efficiently?
Using formats like HDF5, pickle, or TFRecord.

---

21. What is feature scaling and why is it important?
Adjusting features to a common scale to improve model training.

---

22. How to detect and remove duplicate data?
Using df.duplicated() and df.drop_duplicates().

---

23. What is one-hot encoding and when to use it?
Converting categorical variables to binary vectors, used for nominal categories.

---

24. How to handle imbalanced datasets?
Techniques like oversampling, undersampling, or synthetic data generation (SMOTE).
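
A minimal SMOTE sketch, assuming the imbalanced-learn package is installed and X, y hold the features and labels:

from imblearn.over_sampling import SMOTE

X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)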

---

25. How to visualize datasets in Python?
Using matplotlib, seaborn, or plotly for charts and graphs.

---

#DataScience #DataHandling #Python #MachineLearning #DataPreprocessing

https://t.iss.one/DataScience4M
Topic: Python PySpark Data Sheet – Part 1 of 3: Introduction, Setup, and Core Concepts

---

### 1. What is PySpark?

PySpark is the Python API for Apache Spark, a powerful distributed computing engine for big data processing.

PySpark allows you to leverage the full power of Apache Spark using Python, making it easier to:

• Handle massive datasets
• Perform distributed computing
• Run parallel data transformations

---

### 2. PySpark Ecosystem Components

Spark SQL – Structured data queries with DataFrame and SQL APIs
Spark Core – Fundamental engine for task scheduling and memory management
Spark Streaming – Real-time data processing
MLlib – Machine learning at scale
GraphX – Graph computation

---

### 3. Why PySpark over Pandas?

| Feature | Pandas | PySpark |
| -------------- | --------------------- | ----------------------- |
| Scale | Single machine | Distributed (Cluster) |
| Speed | Slower for large data | Optimized execution |
| Language | Python | Python on JVM via Py4J |
| Learning Curve | Easier | Medium (Big Data focus) |

---

### 4. PySpark Setup in Local Machine

#### Install PySpark via pip:

pip install pyspark


#### Start PySpark Shell:

pyspark


#### Sample Code to Initialize SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()


---

### 5. RDD vs DataFrame

| Feature | RDD | DataFrame |
| ------------ | ----------------------- | ------------------------------ |
| Type | Low-level API (objects) | High-level API (structured) |
| Optimization | Manual | Catalyst Optimizer (automatic) |
| Usage | Complex transformations | SQL-like operations |
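
A minimal sketch of the two APIs side by side, assuming a SparkSession named spark already exists:

# RDD: low-level, works on plain Python objects
rdd = spark.sparkContext.parallelize([("Alice", 25), ("Bob", 30)])
print(rdd.map(lambda row: row[1] + 1).collect())

# DataFrame: high-level, structured, optimized by the Catalyst optimizer
df = spark.createDataFrame([("Alice", 25), ("Bob", 30)], ["Name", "Age"])
df.select("Name").show()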

---

### 6. Creating DataFrames

#### From Python List:

data = [("Alice", 25), ("Bob", 30)]
df = spark.createDataFrame(data, ["Name", "Age"])
df.show()


#### From CSV File:

df = spark.read.csv("file.csv", header=True, inferSchema=True)
df.show()


---

### 7. Inspecting DataFrames

df.printSchema()        # Schema info
df.columns              # List column names
df.describe().show()    # Summary stats
df.head(5)              # First 5 rows


---

### 8. Basic Transformations

df.select("Name").show()
df.filter(df["Age"] > 25).show()
df.withColumn("AgePlus10", df["Age"] + 10).show()
df.drop("Age").show()


---

### 9. Working with SQL

df.createOrReplaceTempView("people")
spark.sql("SELECT * FROM people WHERE Age > 25").show()


---

### 10. Writing Data

df.write.csv("output.csv", header=True)
df.write.parquet("output_parquet/")


---

### 11. Summary of Concepts Covered

• Spark architecture & PySpark setup
• Core components of PySpark
• Differences between RDD and DataFrames
• How to create, inspect, and manipulate DataFrames
• SQL support in Spark
• Reading/writing to/from storage

---

### Exercise

1. Load a sample CSV file and display the schema
2. Add a new column with a calculated value
3. Filter the rows based on a condition
4. Save the result as a new CSV or Parquet file

---

#Python #PySpark #BigData #ApacheSpark #DataEngineering #ETL

https://t.iss.one/DataScienceM
Topic: Python Matplotlib – From Easy to Top: Part 1 of 6: Introduction and Basic Plotting

---

### 1. What is Matplotlib?

Matplotlib is the most widely used Python library for data visualization.

• It provides an object-oriented API for embedding plots into applications and supports a wide variety of graphs: line charts, bar charts, scatter plots, histograms, etc.

---

### 2. Installing and Importing Matplotlib

Install Matplotlib if you haven't:

pip install matplotlib


Import the main module and pyplot interface:

import matplotlib.pyplot as plt
import numpy as np


---

### 3. Plotting a Basic Line Chart

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()


---

### 4. Customizing Line Style, Color, and Markers

plt.plot(x, y, color='green', linestyle='--', marker='o', label='Data')
plt.title("Styled Line Plot")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.legend()
plt.show()


---

### 5. Adding Multiple Lines to a Plot

x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.plot(x, y1, label="sin(x)", color='blue')
plt.plot(x, y2, label="cos(x)", color='red')
plt.title("Multiple Line Plot")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.legend()
plt.grid(True)
plt.show()


---

### 6. Scatter Plot

Used to show relationships between two variables.

x = np.random.rand(100)
y = np.random.rand(100)

plt.scatter(x, y, color='purple', alpha=0.6)
plt.title("Scatter Plot")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.show()


---

### 7. Bar Chart

categories = ['A', 'B', 'C', 'D']
values = [4, 7, 2, 5]

plt.bar(categories, values, color='skyblue')
plt.title("Bar Chart Example")
plt.xlabel("Category")
plt.ylabel("Value")
plt.show()


---

### 8. Histogram

data = np.random.randn(1000)

plt.hist(data, bins=30, color='orange', edgecolor='black')
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()


---

### 9. Saving the Plot to a File

plt.plot([1, 2, 3], [4, 5, 6])
plt.savefig("plot.png")


---

### 10. Summary

matplotlib.pyplot is the key module for creating all kinds of plots.
• You can customize styles, add labels, titles, and legends.
• Understanding basic plots is the foundation for creating advanced visualizations.

---

Exercise

• Plot y = x^2 and y = x^3 on the same figure.
• Create a scatter plot of 100 random points.
• Create and save a histogram from a normal distribution sample of 500 points.

---

#Python #Matplotlib #DataVisualization #Plots #Charts

https://t.iss.one/DataScienceM
Topic: Python Matplotlib – From Easy to Top: Part 2 of 6: Subplots, Figures, and Layout Management

---

### 1. Introduction to Figures and Axes

• In Matplotlib, a Figure is the entire image or window on which everything is drawn.
• An Axes is a part of the figure where data is plotted — it contains titles, labels, ticks, lines, etc.

Basic hierarchy:

* Figure ➝ contains one or more Axes
* Axes ➝ the area where the data is actually plotted
* Axis ➝ x-axis and y-axis inside an Axes

import matplotlib.pyplot as plt
import numpy as np


---

### 2. Creating Multiple Subplots using `plt.subplot()`

x = np.linspace(0, 2*np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)

plt.subplot(2, 1, 1)
plt.plot(x, y1, label="sin(x)")
plt.title("First Subplot")

plt.subplot(2, 1, 2)
plt.plot(x, y2, label="cos(x)", color='green')
plt.title("Second Subplot")

plt.tight_layout()
plt.show()


Explanation:

* subplot(2, 1, 1) means 2 rows, 1 column, this is the first plot.
* tight_layout() prevents overlap between plots.

---

### 3. Creating Subplots with `plt.subplots()` (Recommended)

fig, axs = plt.subplots(2, 2, figsize=(8, 6))

x = np.linspace(0, 10, 100)

axs[0, 0].plot(x, np.sin(x))
axs[0, 0].set_title("sin(x)")

axs[0, 1].plot(x, np.cos(x))
axs[0, 1].set_title("cos(x)")

axs[1, 0].plot(x, np.tan(x))
axs[1, 0].set_title("tan(x)")
axs[1, 0].set_ylim(-10, 10)

axs[1, 1].plot(x, np.exp(-x))
axs[1, 1].set_title("exp(-x)")

plt.tight_layout()
plt.show()


---

### 4. Sharing Axes Between Subplots

fig, axs = plt.subplots(1, 2, sharey=True)

x = np.linspace(0, 10, 100)

axs[0].plot(x, np.sin(x))
axs[0].set_title("sin(x)")

axs[1].plot(x, np.cos(x), color='red')
axs[1].set_title("cos(x)")

plt.show()


---

### 5. Adjusting Spacing with `subplots_adjust()`

fig, axs = plt.subplots(2, 2)

fig.subplots_adjust(hspace=0.4, wspace=0.3)


---

### 6. Nested Plots Using `inset_axes`

You can add a small plot inside another:

from mpl_toolkits.axes_grid1.inset_locator import inset_axes

fig, ax = plt.subplots()
x = np.linspace(0, 10, 100)
y = np.sin(x)

ax.plot(x, y)
ax.set_title("Main Plot")

inset_ax = inset_axes(ax, width="30%", height="30%", loc=1)
inset_ax.plot(x, np.cos(x), color='orange')
inset_ax.set_title("Inset", fontsize=8)

plt.show()


---

### 7. Advanced Layout: Gridspec

import matplotlib.gridspec as gridspec

fig = plt.figure(figsize=(8, 6))
gs = gridspec.GridSpec(3, 3)

ax1 = fig.add_subplot(gs[0, :])
ax2 = fig.add_subplot(gs[1, :-1])
ax3 = fig.add_subplot(gs[1:, -1])
ax4 = fig.add_subplot(gs[2, 0])
ax5 = fig.add_subplot(gs[2, 1])

ax1.set_title("Top")
ax2.set_title("Left")
ax3.set_title("Right")
ax4.set_title("Bottom Left")
ax5.set_title("Bottom Center")

plt.tight_layout()
plt.show()


---

### 8. Summary

• Use subplot() for quick layouts and subplots() for flexibility.
• Share axes to align multiple plots.
• Use inset_axes and gridspec for custom and complex layouts.
• Always use tight_layout() or subplots_adjust() to clean up spacing.

---

### Exercise

• Create a 2x2 grid of subplots showing different trigonometric functions.
• Add an inset plot inside a sine wave chart.
• Use Gridspec to create an asymmetric layout with at least 5 different plots.

---

#Python #Matplotlib #Subplots #DataVisualization #Gridspec #LayoutManagement

https://t.iss.one/DataScienceM
Topic: Python Matplotlib – From Easy to Top: Part 3 of 6: Plot Customization and Styling

---

### 1. Why Customize Plots?

• Customization improves readability and presentation.
• You can control everything from fonts and colors to axis ticks and legend placement.

---

### 2. Customizing Titles, Labels, and Ticks

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title("Sine Wave", fontsize=16, color='navy')
plt.xlabel("Time (s)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.linspace(-1, 1, 5))
plt.grid(True)
plt.show()


---

### 3. Changing Line Styles and Markers

plt.plot(x, y, color='red', linestyle='--', linewidth=2, marker='o', markersize=5, label='sin(x)')
plt.title("Styled Sine Curve")
plt.legend()
plt.grid(True)
plt.show()


Common styles:

• Line styles: '-', '--', ':', '-.'
• Markers: 'o', '^', 's', '*', 'D', etc.
• Colors: 'r', 'g', 'b', 'c', 'm', 'y', 'k', etc.

---

### 4. Adding Legends

plt.plot(x, np.sin(x), label="Sine")
plt.plot(x, np.cos(x), label="Cosine")
plt.legend(loc='upper right', fontsize=10)
plt.title("Legend Example")
plt.show()


---

### 5. Using Annotations

Annotations help highlight specific points:

plt.plot(x, y)
plt.annotate('Peak', xy=(np.pi/2, 1), xytext=(2, 1.2),
             arrowprops=dict(facecolor='black', shrink=0.05))
plt.title("Annotated Peak")
plt.show()


---

### 6. Customizing Axes Appearance

fig, ax = plt.subplots()
ax.plot(x, y)

# Remove top and right border
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Customize axis colors and widths
ax.spines['left'].set_color('blue')
ax.spines['left'].set_linewidth(2)

plt.title("Customized Axes")
plt.show()


---

### 7. Setting Plot Limits

plt.plot(x, y)
plt.xlim(0, 10)
plt.ylim(-1.5, 1.5)
plt.title("Limit Axes")
plt.show()


---

### 8. Using Style Sheets

Matplotlib has built-in style sheets for quick beautification.

plt.style.use('ggplot')

plt.plot(x, np.sin(x))
plt.title("ggplot Style")
plt.show()


Popular styles: seaborn, fivethirtyeight, bmh, dark_background, etc.

---

### 9. Creating Grids and Minor Ticks

plt.plot(x, y)
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.minorticks_on()
plt.title("Grid with Minor Ticks")
plt.show()


---

### 10. Summary

• Customize everything: lines, axes, colors, labels, and grid.
• Use legends and annotations for clarity.
• Apply styles and themes for professional looks.
• Small changes improve the quality of your plots significantly.

---

### Exercise

• Plot sin(x) with red dashed lines and circle markers.
• Add a title, custom x/y labels, and set axis ranges manually.
• Apply the 'seaborn-darkgrid' style and highlight the peak with an annotation.

---

#Python #Matplotlib #Customization #DataVisualization #PlotStyling

https://t.iss.one/DataScienceM
Topic: Python PySpark Data Sheet – Part 2 of 3: DataFrame Transformations, Joins, and Group Operations

---

### 1. Column Operations

PySpark supports various column-wise operations using expressions.

#### Select Specific Columns:

df.select("Name", "Age").show()


#### Create/Modify Column:

from pyspark.sql.functions import col

df.withColumn("AgePlus5", col("Age") + 5).show()


#### Rename a Column:

df.withColumnRenamed("Age", "UserAge").show()


#### Drop Column:

df.drop("Age").show()


---

### 2. Filtering and Conditional Logic

#### Filter Rows:

df.filter(col("Age") > 25).show()


#### Multiple Conditions:

df.filter((col("Age") > 25) & (col("Name") != "Alice")).show()


#### Using `when` for Conditional Columns:

from pyspark.sql.functions import when

df.withColumn("Category", when(col("Age") < 30, "Young").otherwise("Adult")).show()


---

### 3. Aggregations and Grouping

#### GroupBy + Aggregations:

df.groupBy("Department").count().show()
df.groupBy("Department").agg({"Salary": "avg"}).show()


#### Using Aggregate Functions:

from pyspark.sql.functions import avg, max, min, count

df.groupBy("Department").agg(
avg("Salary").alias("AvgSalary"),
max("Salary").alias("MaxSalary")
).show()


---

### 4. Sorting and Ordering

#### Sort by One or More Columns:

df.orderBy("Age").show()
df.orderBy(col("Salary").desc()).show()


---

### 5. Dropping Duplicates & Handling Missing Data

#### Drop Duplicates:

df.dropDuplicates(["Name", "Age"]).show()


#### Drop Rows with Nulls:

df.dropna().show()


#### Fill Null Values:

df.fillna({"Salary": 0}).show()


---

### 6. Joins in PySpark

PySpark supports various join types like SQL.

#### Types of Joins:

inner
left
right
outer
left_semi
left_anti

#### Example – Inner Join:

df1.join(df2, on="id", how="inner").show()


#### Left Join Example:

df1.join(df2, on="id", how="left").show()


---

### 7. Working with Dates and Timestamps

from pyspark.sql.functions import current_date, current_timestamp

df.withColumn("today", current_date()).show()
df.withColumn("now", current_timestamp()).show()


#### Date Formatting:

from pyspark.sql.functions import date_format

df.withColumn("formatted", date_format(col("Date"), "yyyy-MM-dd")).show()


---

### 8. Window Functions (Advanced Aggregations)

Used for operations like ranking, cumulative sum, and moving average.

from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

window_spec = Window.partitionBy("Department").orderBy("Salary")
df.withColumn("rank", row_number().over(window_spec)).show()


---

### 9. Caching and Persistence

Use caching for performance when reusing data:

df.cache()
df.show()


Or use:

df.persist()


---

### 10. Summary of Concepts Covered

• Column transformations and renaming
• Filtering and conditional logic
• Grouping, aggregating, and sorting
• Handling nulls and duplicates
• All types of joins
• Working with dates and window functions
• Caching for performance

---

### Exercise

1. Load two CSV datasets and perform different types of joins
2. Add a new column with a custom label based on a condition
3. Aggregate salary data by department and show top-paid employees per department using window functions
4. Practice caching and observe performance

---

#Python #PySpark #DataEngineering #BigData #ETL #ApacheSpark

https://t.iss.one/DataScienceM
Topic: Python Matplotlib – From Easy to Top: Part 4 of 6: Advanced Charts – Histograms, Pie, Box, Area, and Error Bars

---

### 1. Histogram: Visualizing Data Distribution

Histograms show frequency distribution of numerical data.

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(1000)

plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title("Normal Distribution Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()


Customizations:

bins=30 – controls granularity
density=True – normalize the histogram
alpha=0.7 – transparency

---

### 2. Pie Chart: Showing Proportions

labels = ['Python', 'JavaScript', 'C++', 'Java']
sizes = [45, 30, 15, 10]
colors = ['gold', 'lightgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0) # explode the 1st slice

plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%',
        startangle=140, explode=explode, shadow=True)
plt.title("Programming Language Popularity")
plt.axis('equal') # Equal aspect ratio ensures pie is circular
plt.show()


---

### 3. Box Plot: Summarizing Distribution Stats

Box plots show min, Q1, median, Q3, max, and outliers.

data = [np.random.normal(0, std, 100) for std in range(1, 4)]

plt.boxplot(data, patch_artist=True, labels=['std=1', 'std=2', 'std=3'])
plt.title("Box Plot Example")
plt.grid(True)
plt.show()


Tip: Use vert=False to make a horizontal boxplot.

---

### 4. Area Chart: Cumulative Trends

x = np.arange(1, 6)
y1 = np.array([1, 3, 4, 5, 7])
y2 = np.array([1, 2, 4, 6, 8])

plt.fill_between(x, y1, color="skyblue", alpha=0.5, label="Y1")
plt.fill_between(x, y2, color="orange", alpha=0.5, label="Y2")
plt.title("Area Chart")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.legend()
plt.show()


---

### 5. Error Bar Plot: Showing Uncertainty

x = np.arange(0.1, 4, 0.5)
y = np.exp(-x)
error = 0.1 + 0.2 * x

plt.errorbar(x, y, yerr=error, fmt='-o', color='teal', ecolor='red', capsize=5)
plt.title("Error Bar Plot")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()


---

### 6. Horizontal Bar Chart

langs = ['Python', 'Java', 'C++', 'JavaScript']
popularity = [50, 40, 30, 45]

plt.barh(langs, popularity, color='plum')
plt.title("Programming Language Popularity")
plt.xlabel("Popularity")
plt.show()


---

### 7. Stacked Bar Chart

labels = ['2019', '2020', '2021']
men = [20, 35, 30]
women = [25, 32, 34]

x = np.arange(len(labels))
width = 0.5

plt.bar(x, men, width, label='Men')
plt.bar(x, women, width, bottom=men, label='Women')

plt.ylabel('Scores')
plt.title('Scores by Year and Gender')
plt.xticks(x, labels)
plt.legend()
plt.show()


---

### 8. Summary

• Histograms show frequency distribution
• Pie charts are good for proportions
• Box plots summarize spread and outliers
• Area charts visualize trends over time
• Error bars indicate uncertainty in measurements
• Stacked and horizontal bars enhance categorical data clarity

---

### Exercise

• Create a pie chart showing budget allocation of 5 departments.
• Plot 3 histograms on the same figure with different distributions.
• Build a stacked bar chart for monthly expenses across 3 categories.
• Add error bars to a decaying function and annotate the max point.

---

#Python #Matplotlib #DataVisualization #AdvancedCharts #Histograms #PieCharts #BoxPlots

https://t.iss.one/DataScienceM
Topic: Python PySpark Data Sheet – Part 3 of 3: Advanced Operations, MLlib, and Deployment

---

### 1. Working with UDFs (User Defined Functions)

UDFs allow custom Python functions to be used in PySpark transformations.

#### Define and Use a UDF:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def label_age(age):
    return "Senior" if age > 50 else "Adult"

label_udf = udf(label_age, StringType())

df.withColumn("AgeGroup", label_udf(df["Age"])).show()


> ⚠️ Note: UDFs are less optimized than built-in functions. Use built-ins when possible.

---

### 2. Working with JSON and Parquet Files

#### Read JSON File:

df_json = spark.read.json("data.json")
df_json.show()


#### Read & Write Parquet File:

df_parquet = spark.read.parquet("data.parquet")
df_parquet.write.parquet("output_folder/")


---

### 3. Using PySpark MLlib (Machine Learning Library)

MLlib is Spark's scalable ML library with tools for classification, regression, clustering, and more.

---

#### Steps in a Typical ML Pipeline:

• Load and prepare data
• Feature engineering
• Model training
• Evaluation
• Prediction

---

### 4. Example: Logistic Regression in PySpark

#### Step 1: Prepare Data

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Sample DataFrame
data = spark.createDataFrame([
    (1.0, 2.0, 3.0, 1.0),
    (2.0, 3.0, 4.0, 0.0),
    (1.5, 2.5, 3.5, 1.0)
], ["f1", "f2", "f3", "label"])

# Combine features into a single vector
vec = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
data = vec.transform(data)


#### Step 2: Train Model

lr = LogisticRegression(featuresCol="features", labelCol="label")
model = lr.fit(data)


#### Step 3: Make Predictions

predictions = model.transform(data)
predictions.select("features", "label", "prediction").show()


---

### 5. Model Evaluation

from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator()
print("Accuracy:", evaluator.evaluate(predictions))


---

### 6. Save and Load Models

# Save
model.save("models/logistic_model")

# Load
from pyspark.ml.classification import LogisticRegressionModel
loaded_model = LogisticRegressionModel.load("models/logistic_model")


---

### 7. PySpark with Pandas API on Spark

For small-medium data (pandas-compatible), use pyspark.pandas:

import pyspark.pandas as ps

pdf = ps.read_csv("data.csv")
pdf.head()


> Works like Pandas, but with Spark backend.

---

### 8. Scheduling & Cluster Deployment

PySpark can run:

• Locally
• On YARN (Hadoop)
• On Mesos
• On Kubernetes
• In Databricks, AWS EMR, Google Cloud Dataproc

Use spark-submit for production scripts:

spark-submit my_script.py


---

### 9. Tuning and Optimization Tips

• Cache reused DataFrames
• Use built-in functions instead of UDFs
• Repartition if data is skewed
• Avoid using collect() on large datasets
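
For the repartitioning tip above, a minimal sketch (the partition count and column name are illustrative):

df = df.repartition(8, "Department")   # redistribute rows across 8 partitions by key
print(df.rdd.getNumPartitions())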

---

### 10. Summary of Part 3

• Custom logic with UDFs
• Working with JSON, Parquet, and other formats
• Machine Learning with MLlib (Logistic Regression)
• Model evaluation and saving
• Integration with Pandas
• Deployment and optimization techniques

---

### Exercise

1. Load a dataset and train a logistic regression model
2. Add feature engineering using VectorAssembler
3. Save and reload the model
4. Use UDFs to label predictions as “Yes/No”
5. Deploy your pipeline using spark-submit

---

#Python #PySpark #MLlib #BigData #MachineLearning #ETL #ApacheSpark

https://t.iss.one/DataScienceM
Topic: Python Matplotlib – From Easy to Top: Part 5 of 6: Images, Heatmaps, and Colorbars

---

### 1. Introduction

Matplotlib can handle images, heatmaps, and color mapping effectively, making it a great tool for visualizing:

• Image data (grayscale or color)
• Matrix-like data with heatmaps
• Any data that needs a gradient of colors

---

### 2. Displaying Images with `imshow()`

import matplotlib.pyplot as plt
import numpy as np

# Create a random grayscale image
img = np.random.rand(10, 10)

plt.imshow(img, cmap='gray')
plt.title("Grayscale Image")
plt.colorbar()
plt.show()


Key parameters:

cmap – color map (gray, hot, viridis, coolwarm, etc.)
interpolation – for smoothing pixelation (nearest, bilinear, bicubic)

---

### 3. Displaying Color Images

import matplotlib.image as mpimg

img = mpimg.imread('example.png') # image must be in your directory
plt.imshow(img)
plt.title("Color Image")
plt.axis('off') # Hide axes
plt.show()


Note: Image should be PNG or JPG. For real projects, use PIL or OpenCV for more control.

---

### 4. Creating a Heatmap from a 2D Matrix

matrix = np.random.rand(6, 6)

plt.imshow(matrix, cmap='viridis', interpolation='nearest')
plt.title("Heatmap Example")
plt.colorbar(label="Intensity")
plt.xticks(range(6), ['A', 'B', 'C', 'D', 'E', 'F'])
plt.yticks(range(6), ['P', 'Q', 'R', 'S', 'T', 'U'])
plt.show()


---

### 5. Customizing Color Maps

You can reverse or customize color maps:

plt.imshow(matrix, cmap='coolwarm_r')  # Reversed coolwarm


You can also create custom color ranges using vmin and vmax:

plt.imshow(matrix, cmap='hot', vmin=0.2, vmax=0.8)


---

### 6. Using `matshow()` for Matrix-Like Data

matshow() is optimized for visualizing 2D arrays:

plt.matshow(matrix)
plt.title("Matrix View with matshow()")
plt.colorbar()
plt.show()


---

### 7. Annotating Heatmaps

fig, ax = plt.subplots()
cax = ax.imshow(matrix, cmap='plasma')

# Add text annotations
for i in range(matrix.shape[0]):
    for j in range(matrix.shape[1]):
        ax.text(j, i, f'{matrix[i, j]:.2f}', ha='center', va='center', color='white')

plt.title("Annotated Heatmap")
plt.colorbar(cax)
plt.show()


---

### 8. Displaying Multiple Images in Subplots

fig, axs = plt.subplots(1, 2, figsize=(10, 4))

axs[0].imshow(matrix, cmap='Blues')
axs[0].set_title("Blues")

axs[1].imshow(matrix, cmap='Greens')
axs[1].set_title("Greens")

plt.tight_layout()
plt.show()


---

### 9. Saving Heatmaps and Figures

plt.imshow(matrix, cmap='magma')
plt.title("Save This Heatmap")
plt.colorbar()
plt.savefig("heatmap.png", dpi=300)
plt.close()


---

### 10. Summary

• imshow() and matshow() visualize 2D data or images
• Heatmaps are great for matrix or correlation data
• Use colorbars and annotations to add context
• Customize colormaps with cmap, vmin, vmax
• Save your visualizations easily using savefig()

---

### Exercise

• Load a grayscale image using NumPy and display it.
• Create a 10×10 heatmap with annotations.
• Display 3 subplots of the same matrix using 3 different colormaps.
• Save one of the heatmaps with high resolution.

---

#Python #Matplotlib #Heatmaps #DataVisualization #Images #ColorMapping

https://t.iss.one/DataScienceM
Topic: Python Matplotlib – From Easy to Top: Part 6 of 6: 3D Plotting, Animation, and Interactive Visuals

---

### 1. Introduction

Matplotlib supports advanced visualizations including:

3D plots using mpl_toolkits.mplot3d
Animations with FuncAnimation
Interactive plots using widgets and event handling

---

### 2. Creating 3D Plots

You need to import the 3D toolkit:

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np


---

### 3. 3D Line Plot

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

z = np.linspace(0, 15, 100)
x = np.sin(z)
y = np.cos(z)

ax.plot3D(x, y, z, 'purple')
ax.set_title("3D Line Plot")
plt.show()


---

### 4. 3D Surface Plot

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

X = np.linspace(-5, 5, 50)
Y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(X, Y)
Z = np.sin(np.sqrt(X**2 + Y**2))

surf = ax.plot_surface(X, Y, Z, cmap='viridis')
fig.colorbar(surf)

ax.set_title("3D Surface Plot")
plt.show()


---

### 5. 3D Scatter Plot

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = np.random.rand(100)
y = np.random.rand(100)
z = np.random.rand(100)

ax.scatter(x, y, z, c=z, cmap='plasma')
ax.set_title("3D Scatter Plot")
plt.show()


---

### 6. Creating Animations

Use FuncAnimation for animated plots.

import matplotlib.animation as animation

fig, ax = plt.subplots()
x = np.linspace(0, 2*np.pi, 128)
line, = ax.plot(x, np.sin(x))

def update(frame):
    line.set_ydata(np.sin(x + frame / 10))
    return line,

ani = animation.FuncAnimation(fig, update, frames=100, interval=50)
plt.title("Sine Wave Animation")
plt.show()


---

### 7. Save Animation as a File

ani.save("sine_wave.gif", writer='pillow')


Make sure to install pillow using:

pip install pillow


---

### 8. Adding Interactivity with Widgets

import matplotlib.widgets as widgets

fig, ax = plt.subplots()
plt.subplots_adjust(left=0.1, bottom=0.25)

x = np.linspace(0, 2*np.pi, 100)
freq = 1
line, = ax.plot(x, np.sin(freq * x))

ax_slider = plt.axes([0.25, 0.1, 0.65, 0.03])
slider = widgets.Slider(ax_slider, 'Frequency', 0.1, 5.0, valinit=freq)

def update(val):
    line.set_ydata(np.sin(slider.val * x))
    fig.canvas.draw_idle()

slider.on_changed(update)
plt.title("Interactive Sine Wave")
plt.show()


---

### 9. Mouse Interaction with Events

def onclick(event):
    print(f'You clicked at x={event.xdata:.2f}, y={event.ydata:.2f}')

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [4, 5, 6])
fig.canvas.mpl_connect('button_press_event', onclick)
plt.title("Click to Print Coordinates")
plt.show()


---

### 10. Summary

• 3D plots are ideal for visualizing spatial data and surfaces
• Animations help convey dynamic changes in data
• Widgets and events add interactivity for data exploration
• Mastering these tools enables the creation of interactive dashboards and visual storytelling

---

### Exercise

• Plot a 3D surface of z = cos(sqrt(x² + y²)).
• Create a slider to change frequency of a sine wave in real-time.
• Animate a circle that rotates along time.
• Build a 3D scatter plot of 3 correlated variables.

---

#Python #Matplotlib #3DPlots #Animations #InteractiveVisuals #DataVisualization

https://t.iss.one/DataScienceM
Topic: Python Matplotlib – Important 20 Interview Questions with Answers

---

### 1. What is Matplotlib in Python?

Answer:
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is highly customizable and works well with NumPy and pandas.

---

### 2. What is the difference between `plt.plot()` and `plt.scatter()`?

Answer:
plt.plot() is used for line plots.
plt.scatter() is used for creating scatter (dot) plots.

---

### 3. How do you add a title and axis labels to a plot?

Answer:

plt.title("My Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")


---

### 4. How can you create multiple subplots in one figure?

Answer:
Use plt.subplots() to create a grid layout of subplots.

fig, axs = plt.subplots(2, 2)


---

### 5. How do you save a plot to a file?

Answer:

plt.savefig("myplot.png", dpi=300)


---

### 6. What is the role of `plt.show()`?

Answer:
It displays the figure window containing the plot. Required for interactive sessions or scripts.

---

### 7. What is a histogram in Matplotlib?

Answer:
A histogram is used to visualize the frequency distribution of numeric data using plt.hist().

---

### 8. What does `plt.figure(figsize=(8,6))` do?

Answer:
It creates a new figure with a specified width and height (in inches).

---

### 9. How do you add a legend to your plot?

Answer:

plt.legend()


You must specify label='something' in your plot function.

---

### 10. What are some common `cmap` (color map) options?

Answer:
'viridis', 'plasma', 'hot', 'coolwarm', 'gray', 'jet', etc.

---

### 11. How do you create a bar chart?

Answer:

plt.bar(categories, values)


---

### 12. How can you rotate x-axis tick labels?

Answer:

plt.xticks(rotation=45)


---

### 13. How do you add a grid to the plot?

Answer:

plt.grid(True)


---

### 14. What is the difference between `imshow()` and `matshow()`?

Answer:
imshow() is general-purpose for image data.
matshow() is optimized for 2D matrices and auto-configures the axes.

---

### 15. How do you change the style of a plot globally?

Answer:

plt.style.use('ggplot')


---

### 16. How can you add annotations to specific data points?

Answer:

plt.annotate('label', xy=(x, y), xytext=(x+1, y+1), arrowprops=dict(arrowstyle='->'))


---

### 17. How do you create a pie chart in Matplotlib?

Answer:

plt.pie(data, labels=labels, autopct='%1.1f%%')


---

### 18. How do you plot a heatmap in Matplotlib?

Answer:

plt.imshow(matrix, cmap='hot')
plt.colorbar()


---

### 19. Can Matplotlib create 3D plots?

Answer:
Yes. Use:

from mpl_toolkits.mplot3d import Axes3D


Then:

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')


---

### 20. How do you add error bars to your data?

Answer:

plt.errorbar(x, y, yerr=errors, fmt='o')


---

### Exercise

Choose 5 of the above functions and implement a mini-dashboard with line, bar, and pie plots in one figure layout.

---

#Python #Matplotlib #InterviewQuestions #DataVisualization #TechInterview

https://t.iss.one/DataScienceM
# 📚 PyTorch Tutorial for Beginners - Part 1/6: Fundamentals & Tensors
#PyTorch #DeepLearning #MachineLearning #NeuralNetworks #Tensors

Welcome to Part 1 of our comprehensive PyTorch series! This beginner-friendly lesson covers core concepts, tensor operations, and your first neural network.

---

## 🔹 What is PyTorch?
PyTorch is an open-source deep learning framework developed by Facebook's AI Research Lab (FAIR). Key features:

✔️ Dynamic computation graphs (define-by-run)
✔️ GPU acceleration with CUDA
✔️ Pythonic syntax for intuitive coding
✔️ Automatic differentiation (autograd)
✔️ Rich ecosystem (TorchVision, TorchText, etc.)

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")


---

## 🔹 Tensors: The Building Blocks
Tensors are PyTorch's multi-dimensional arrays (like NumPy but with GPU support).

### 1. Creating Tensors
# From Python list
a = torch.tensor([1, 2, 3]) # 1D tensor (vector)

# 2D tensor (matrix)
b = torch.tensor([[1., 2.], [3., 4.]])

# Special tensors
zeros = torch.zeros(2, 3) # 2x3 matrix of zeros
ones = torch.ones_like(zeros) # Same shape as zeros, filled with 1s
rand = torch.rand(3, 3) # 3x3 matrix with uniform random values (0-1)


### 2. Tensor Attributes
x = torch.rand(2, 3)
print(f"Shape: {x.shape}") # torch.Size([2, 3])
print(f"Data type: {x.dtype}") # torch.float32
print(f"Device: {x.device}") # cpu/cuda:0


### 3. Moving Tensors to GPU
if torch.cuda.is_available():
    x = x.to('cuda')              # Move to GPU
    print(f"Now on: {x.device}")  # cuda:0


---

## 🔹 Tensor Operations
### 1. Basic Math
x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])

# Element-wise operations
add = x + y # or torch.add(x, y)
sub = x - y
mul = x * y
div = x / y

# Matrix multiplication
mat1 = torch.rand(2, 3)
mat2 = torch.rand(3, 2)
matmul = torch.mm(mat1, mat2) # or mat1 @ mat2


### 2. Reshaping Tensors
x = torch.arange(6)           # [0, 1, 2, 3, 4, 5]
x_reshaped = x.view(2, 3)     # [[0, 1, 2], [3, 4, 5]]
x_flattened = x.flatten()     # Back to 1D


### 3. Indexing & Slicing
x = torch.tensor([[1, 2], [3, 4], [5, 6]])
print(x[0, 1]) # 2 (first row, second column)
print(x[:, 0]) # [1, 3, 5] (all rows, first column)


---

## 🔹 Autograd: Automatic Differentiation
PyTorch automatically computes gradients for tensors with requires_grad=True.

### 1. Basic Example
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x + 1
y.backward() # Compute gradients
print(x.grad) # dy/dx = 2x + 3 → 7.0


### 2. Neural Network Context
# Simple linear regression
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Forward pass
inputs = torch.tensor([[1.0], [2.0], [3.0]])
targets = torch.tensor([[2.0], [4.0], [6.0]])
predictions = inputs * w + b

# Loss and backward pass
loss = torch.mean((predictions - targets)**2)
loss.backward() # Computes dloss/dw, dloss/db

print(f"Gradient of w: {w.grad}")
print(f"Gradient of b: {b.grad}")


---

## 🔹 Your First Neural Network
Let's build a single-layer perceptron for binary classification.

### 1. Define the Model
import torch.nn as nn

class Perceptron(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)  # 1 output neuron

    def forward(self, x):
        return torch.sigmoid(self.linear(x))  # Sigmoid for probability

model = Perceptron(input_dim=2)
print(model)


### 2. Synthetic Dataset
# XOR-like dataset
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)
### 3. Training Loop
criterion = nn.BCELoss()          # Binary Cross Entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(1000):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass
    optimizer.zero_grad()  # Clear old gradients
    loss.backward()        # Compute gradients
    optimizer.step()       # Update weights

    if (epoch+1) % 100 == 0:
        print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

# Test
with torch.no_grad():
    predictions = model(X).round()
    print(f"Final predictions: {predictions.squeeze()}")


---

## 🔹 Best Practices for Beginners
1. Always clear gradients with optimizer.zero_grad() before backward()
2. Use `with torch.no_grad():` for inference (disables gradient tracking)
3. Normalize input data (e.g., scale to [0, 1] or standardize)
4. Start simple before using complex architectures
5. Leverage GPU for larger models/datasets
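
Following best practice 3 above, a minimal standardization sketch, assuming X is a float tensor of shape (n_samples, n_features):

mean = X.mean(dim=0)
std = X.std(dim=0)
X_normalized = (X - mean) / (std + 1e-8)  # small epsilon avoids division by zero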

---

### 📌 What's Next?
In Part 2, we'll cover:
➡️ Deep Neural Networks (DNNs)
➡️ Activation Functions
➡️ Batch Normalization
➡️ Handling Real Datasets

#PyTorch #DeepLearning #MachineLearning 🚀

Practice Exercise:
1. Create a tensor of shape (3, 4) with random values (0-1)
2. Compute the mean of each column
3. Build a perceptron for OR gate (modify the XOR example)
4. Plot the loss curve during training

# Solution for exercise 1-2
x = torch.rand(3, 4)
col_means = x.mean(dim=0) # dim=0 → average along rows