Machine Learning
39.2K subscribers
3.83K photos
32 videos
41 files
1.3K links
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.

Admin: @HusseinSheikho || @Hussein_Sheikho
πŸ“š Data Cleaning and Exploration with Machine Learning (2022)

1⃣ Join the channel to download:
https://t.iss.one/+MhmkscCzIYQ2MmM8

2⃣ Download Book: https://t.iss.one/c/1854405158/119

πŸ’¬ Tags: #DataCleaning #ML

USEFUL CHANNELS FOR YOU
❀7πŸ‘3
πŸ“š Python Data Cleaning Cookbook (2023)

1⃣ Join the channel to download:
https://t.iss.one/+MhmkscCzIYQ2MmM8

2⃣ Download Book: https://t.iss.one/c/1854405158/866

πŸ’¬ Tags: #DataCleaning

πŸ‘‰ BEST DATA SCIENCE CHANNELS ON TELEGRAM πŸ‘ˆ
πŸ‘12❀1
Pandas Data Cleaning (Guide)

πŸ”‘ Tags: #Pandas #DataCleaning #ML

https://t.iss.one/DataScienceM βœ…
Pandas.pdf
14.9 MB
Topic: Handling Datasets of All Types – Part 2 of 5: Data Cleaning and Preprocessing

---

1. Importance of Data Cleaning

β€’ Real-world data is often noisy, incomplete, or inconsistent.

β€’ Cleaning improves data quality and model performance.

---

2. Handling Missing Data

β€’ Detect missing values using isnull() or isna() in pandas.

β€’ Strategies to handle missing data:

* Remove rows or columns with missing values:

df.dropna(inplace=True)


* Impute missing values with mean, median, or mode:

df['column'] = df['column'].fillna(df['column'].mean())
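Before choosing a strategy, the detection step described above (isnull()/isna()) can be run as a quick per-column check. A minimal sketch; the column names here are hypothetical:

```python
import pandas as pd
import numpy as np

# Hypothetical DataFrame with one gap in each column
df = pd.DataFrame({'age': [25, np.nan, 30], 'city': ['NY', 'LA', None]})

# isnull() flags each missing cell; sum() counts them per column
missing_counts = df.isnull().sum()
print(missing_counts)
```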


---

3. Handling Outliers

β€’ Outliers can skew analysis and model results.

β€’ Detect outliers using:

* Boxplots
* Z-score method
* IQR (Interquartile Range)

β€’ Handle by removal or transformation.
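The IQR method listed above can be sketched as follows, using the removal strategy on a small synthetic column:

```python
import pandas as pd

# Synthetic numeric column with one obvious outlier (100)
df = pd.DataFrame({'value': [10, 12, 11, 13, 12, 100]})

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1 = df['value'].quantile(0.25)
q3 = df['value'].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only in-range rows (removal strategy)
df_clean = df[(df['value'] >= lower) & (df['value'] <= upper)]
print(df_clean)
```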

---

4. Data Normalization and Scaling

β€’ Many ML models require features to be on a similar scale.

β€’ Common techniques:

* Min-Max Scaling (scales values between 0 and 1)

* Standardization (mean = 0, std = 1)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[['feature1', 'feature2']])
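Min-Max scaling, listed above, has a parallel scikit-learn helper. A minimal sketch with a hypothetical single feature:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature on an arbitrary scale
df = pd.DataFrame({'feature1': [10.0, 20.0, 30.0]})

# Min-Max scaling maps the smallest value to 0 and the largest to 1
scaler = MinMaxScaler()
df[['feature1']] = scaler.fit_transform(df[['feature1']])
print(df)
```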


---

5. Encoding Categorical Variables

β€’ Convert categorical data into numerical:

* Label Encoding: Assigns an integer to each category.

* One-Hot Encoding: Creates binary columns for each category.

pd.get_dummies(df['category_column'])
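Label encoding, mentioned above, can be sketched with scikit-learn's LabelEncoder; the column name and values here are hypothetical (categories are numbered in alphabetical order):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column
df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green']})

# LabelEncoder assigns an integer to each category
# (sorted order: blue=0, green=1, red=2)
encoder = LabelEncoder()
df['color_encoded'] = encoder.fit_transform(df['color'])
print(df)
```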


---

6. Summary

β€’ Data cleaning is essential for reliable modeling.

β€’ Handling missing values, outliers, scaling, and encoding are key preprocessing steps.

---

Exercise

β€’ Load a dataset, identify missing values, and apply mean imputation.

β€’ Detect outliers using IQR and remove them.

β€’ Normalize numeric features using standardization.
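One possible solution sketch for the three exercise steps, using a small synthetic column in place of a loaded dataset:

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a loaded dataset (one missing value, one outlier)
df = pd.DataFrame({'age': [22.0, 26.0, np.nan, 35.0, 38.0, 120.0]})

# 1) Mean imputation for missing values
df['age'] = df['age'].fillna(df['age'].mean())

# 2) Remove outliers with the IQR rule
q1, q3 = df['age'].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[(df['age'] >= q1 - 1.5 * iqr) & (df['age'] <= q3 + 1.5 * iqr)].copy()

# 3) Standardize the remaining values (mean 0, std 1)
df[['age']] = StandardScaler().fit_transform(df[['age']])
print(df)
```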

---

#DataCleaning #DataPreprocessing #MachineLearning #Python #DataScience

https://t.iss.one/DataScienceM
❀6πŸ‘1
             Age
count   5.000000
mean   30.000000
std     6.363961
min    22.000000
25%    26.000000
50%    29.000000
75%    35.000000
max    38.000000


---

10. df.columns
Returns the column labels of the DataFrame.

import pandas as pd
df = pd.DataFrame({'Name': [], 'Age': [], 'City': []})
print(df.columns)

Index(['Name', 'Age', 'City'], dtype='object')


---

11. df.dtypes
Returns the data type of each column.

import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [25], 'Salary': [75000.50]})
print(df.dtypes)

Name       object
Age         int64
Salary    float64
dtype: object


---

12. Selecting a Column
Select a single column, which returns a Pandas Series.

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
ages = df['Age']
print(ages)

0    25
1    30
Name: Age, dtype: int64

#DataSelection #Indexing #Statistics

---

13. df.loc[]
Access a group of rows and columns by label(s) or a boolean array.

import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.loc['Bob'])

Age     30
City    LA
Name: Bob, dtype: object


---

14. df.iloc[]
Access a group of rows and columns by integer position(s).

import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.iloc[1]) # Get the second row (index 1)

Age     30
City    LA
Name: Bob, dtype: object


---

15. df.isnull()
Returns a DataFrame of the same shape with boolean values indicating if a value is missing (NaN).

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [3, 4]})
print(df.isnull())

       A      B
0  False  False
1   True  False


---

16. df.dropna()
Removes missing values.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})
cleaned_df = df.dropna()
print(cleaned_df)

     A  B
0  1.0  4
2  3.0  6

#DataCleaning #MissingData

---

17. df.fillna()
Fills missing (NaN) values with a specified value or method.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Score': [90, 85, np.nan, 92]})
filled_df = df.fillna(0)
print(filled_df)

   Score
0   90.0
1   85.0
2    0.0
3   92.0


---

18. df.drop_duplicates()
Removes duplicate rows from the DataFrame.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Age': [25, 30, 25]}
df = pd.DataFrame(data)
unique_df = df.drop_duplicates()
print(unique_df)

    Name  Age
0  Alice   25
1    Bob   30


---

19. df.rename()
Alters axes labels (e.g., column names).

import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2]})
renamed_df = df.rename(columns={'A': 'Column_A', 'B': 'Column_B'})
print(renamed_df)

   Column_A  Column_B
0         1         2


---

20. series.value_counts()
Returns a Series containing counts of unique values.
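A minimal sketch, matching the pattern of the earlier entries; the Series values are hypothetical:

```python
import pandas as pd

# Hypothetical Series of repeated categories
s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# value_counts() returns counts of unique values, most frequent first
counts = s.value_counts()
print(counts)
```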
πŸ“Œ I Cleaned a Messy CSV File Using Pandas. Here’s the Exact Process I Follow Every Time.

πŸ—‚ Category: DATA SCIENCE

πŸ•’ Date: 2025-11-26 | ⏱️ Read time: 17 min read

Stop guessing when cleaning messy CSV files. This article details a repeatable 5-step workflow using Python's Pandas library to systematically diagnose and fix data quality issues. Learn a structured, practical process to transform your data preparation, moving from haphazard fixes to a reliable methodology for any data professional.

#Python #Pandas #DataCleaning #DataScience