💡 Pandas Cheatsheet
A quick guide to essential Pandas operations for data manipulation, focusing on creating, selecting, filtering, and grouping data in a DataFrame.
1. Creating a DataFrame
The primary data structure in Pandas is the DataFrame. It's often created from a dictionary.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 32, 28],
        'City': ['New York', 'Paris', 'New York']}
df = pd.DataFrame(data)
print(df)
# Name Age City
# 0 Alice 25 New York
# 1 Bob 32 Paris
# 2 Charlie 28 New York
• A dictionary is defined where keys become column names and values become the data in those columns.
• pd.DataFrame() converts it into a tabular structure.
2. Selecting Data with .loc and .iloc
Use .loc for label-based selection and .iloc for integer-position based selection.
# Select the first row by its integer position (0)
print(df.iloc[0])
# Select the row with index label 1 and only the 'Name' column
print(df.loc[1, 'Name'])
# Output for df.iloc[0]:
# Name Alice
# Age 25
# City New York
# Name: 0, dtype: object
#
# Output for df.loc[1, 'Name']:
# Bob
• .iloc[0] gets all data from the row at index position 0.
• .loc[1, 'Name'] gets the data at the intersection of index label 1 and column label 'Name'.
3. Filtering Data
Select subsets of data based on conditions.
# Select rows where Age is greater than 27
filtered_df = df[df['Age'] > 27]
print(filtered_df)
# Name Age City
# 1 Bob 32 Paris
# 2 Charlie 28 New York
• The expression df['Age'] > 27 creates a boolean Series (True/False).
• Using this Series as an index, df[...] returns only the rows where the value was True.
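Conditions can also be combined with & (and) and | (or); a quick sketch using the same df (note the required parentheses around each condition):
# Rows where Age > 24 AND City is 'New York'
subset = df[(df['Age'] > 24) & (df['City'] == 'New York')]
print(subset)
#       Name  Age      City
# 0    Alice   25  New York
# 2  Charlie   28  New York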
4. Grouping and Aggregating
The "group by" operation involves splitting data into groups, applying a function, and combining the results.
# Group by 'City' and calculate the mean age for each city
city_ages = df.groupby('City')['Age'].mean()
print(city_ages)
# City
# New York 26.5
# Paris 32.0
# Name: Age, dtype: float64
• .groupby('City') splits the DataFrame into groups based on unique city values.
• ['Age'].mean() then calculates the mean of the 'Age' column for each of these groups.
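Several statistics can be computed in one pass with .agg(); a minimal sketch using the same df:
# Mean age and group size per city
print(df.groupby('City')['Age'].agg(['mean', 'count']))
#           mean  count
# City
# New York  26.5      2
# Paris     32.0      1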
#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
💡 SciPy: Scientific Computing in Python
SciPy is a fundamental library for scientific and technical computing in Python. Built on NumPy, it provides a wide range of user-friendly and efficient numerical routines for tasks like optimization, integration, linear algebra, and statistics.
import numpy as np
from scipy.optimize import minimize
# Define a function to minimize: f(x) = (x - 3)^2
def f(x):
    return (x - 3)**2
# Find the minimum of the function with an initial guess
res = minimize(f, x0=0)
print(f"Minimum found at x = {res.x[0]:.4f}")
# Output:
# Minimum found at x = 3.0000
• Optimization: scipy.optimize.minimize is used to find the minimum value of a function.
• We provide the function (f) and an initial guess (x0=0).
• The result object (res) contains the solution in the .x attribute.
from scipy.integrate import quad
# Define the function to integrate: f(x) = sin(x)
def integrand(x):
    return np.sin(x)
# Integrate sin(x) from 0 to pi
result, error = quad(integrand, 0, np.pi)
print(f"Integral result: {result:.4f}")
print(f"Estimated error: {error:.2e}")
# Output:
# Integral result: 2.0000
# Estimated error: 2.22e-14
• Numerical Integration: scipy.integrate.quad calculates the definite integral of a function over a given interval.
• It returns a tuple containing the integral result and an estimate of the absolute error.
from scipy.linalg import solve
# Solve the linear system Ax = b
# 3x + 2y = 12
# x - y = 1
A = np.array([[3, 2], [1, -1]])
b = np.array([12, 1])
solution = solve(A, b)
print(f"Solution (x, y): {solution}")
# Output:
# Solution (x, y): [2.8 1.8]
• Linear Algebra: scipy.linalg provides more advanced linear algebra routines than NumPy.
• solve(A, b) efficiently finds the solution vector x for a system of linear equations defined by a matrix A and a vector b.
from scipy import stats
# Create two independent samples
sample1 = np.random.normal(loc=5, scale=2, size=100)
sample2 = np.random.normal(loc=5.5, scale=2, size=100)
# Perform an independent t-test
t_stat, p_value = stats.ttest_ind(sample1, sample2)
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
# Output (will vary):
# T-statistic: -1.7432
# P-value: 0.0829
• Statistics: scipy.stats is a powerful module for statistical analysis.
• ttest_ind calculates the T-test for the means of two independent samples.
• The p-value helps determine if the difference between sample means is statistically significant (a low p-value, e.g., < 0.05, suggests it is).
#SciPy #Python #DataScience #ScientificComputing #Statistics
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
#CNN #DeepLearning #Python #Tutorial
Lesson: Building a Convolutional Neural Network (CNN) for Image Classification
This lesson will guide you through building a CNN from scratch using TensorFlow and Keras to classify images from the CIFAR-10 dataset.
---
Part 1: Setup and Data Loading
First, we import the necessary libraries and load the CIFAR-10 dataset. This dataset contains 60,000 32x32 color images in 10 classes.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np
# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
# Check the shape of the data
print("Training data shape:", x_train.shape)
print("Test data shape:", x_test.shape)
#TensorFlow #Keras #DataLoading
---
Part 2: Data Exploration and Preprocessing
We need to prepare the data before feeding it to the network. This involves:
• Normalization: Scaling pixel values from the 0-255 range to the 0-1 range.
• One-Hot Encoding: Converting class vectors (integers) to a binary matrix.
Let's also visualize some images to understand our data.
# Define class names for CIFAR-10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
# Visualize a few images
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    plt.xlabel(class_names[y_train[i][0]])
plt.show()
# Normalize pixel values to be between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
# One-hot encode the labels
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
#DataPreprocessing #Normalization #Visualization
---
Part 3: Building the CNN Model
Now, we'll construct our CNN model. A common architecture consists of a stack of Conv2D and MaxPooling2D layers, followed by Dense layers for classification.
• Conv2D: Extracts features (like edges, corners) from the input image.
• MaxPooling2D: Reduces the spatial dimensions (downsampling), which helps make the feature detection more robust.
• Flatten: Converts the 2D feature maps into a 1D vector.
• Dense: A standard fully-connected neural network layer.
model = models.Sequential()
# Convolutional Base
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flatten and Dense Layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax')) # 10 output classes
# Print the model summary
model.summary()
#ModelBuilding #CNN #KerasLayers
---
Part 4: Compiling the Model
Before training, we need to configure the learning process. This is done via the compile() method, which requires:
• Optimizer: An algorithm to update the model's weights (e.g., 'adam').
• Loss Function: A function to measure how inaccurate the model is during training (e.g., 'categorical_crossentropy' for multi-class classification).
• Metrics: Used to monitor the training and testing steps (e.g., 'accuracy').
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
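The post stops at compilation, but the natural next step is training and evaluation; a minimal sketch (epochs and batch_size are illustrative values, not from the original lesson):
# Train on the preprocessed data, validating on the test set
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=64,
                    validation_data=(x_test, y_test))
# Evaluate final accuracy on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")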
#ModelCompilation #Optimizer #LossFunction
---
#YOLOv8 #ComputerVision #ObjectDetection #IndustrialAI #Python
Applying YOLOv8 for Industrial Automation: Counting Plastic Bottles
This lesson will guide you through a complete computer vision project using YOLOv8. The goal is to detect and count plastic bottles in an image from an industrial setting, such as a conveyor belt or a storage area.
---
Step 1: Setup and Installation
First, we need to install the necessary libraries. The ultralytics library provides the YOLOv8 model, and opencv-python is essential for image processing tasks.
#Setup #Installation
# Open your terminal or command prompt and run this command:
pip install ultralytics opencv-python
---
Step 2: Loading the Model and the Target Image
We will load a pre-trained YOLOv8 model. These models are trained on the large COCO dataset, so they already know how to identify common objects like 'bottle'. Then, we'll load our industrial image. Ensure you have an image named factory_bottles.jpg in your project folder.
#ModelLoading #DataHandling
import cv2
from ultralytics import YOLO
# Load a pre-trained YOLOv8 model (yolov8n.pt is the smallest and fastest)
model = YOLO('yolov8n.pt')
# Load the image from the industrial setting
image_path = 'factory_bottles.jpg' # Make sure this image is in your directory
img = cv2.imread(image_path)
# A quick check to ensure the image was loaded correctly
if img is None:
    print(f"Error: Could not load image at {image_path}")
else:
    print("YOLOv8 model and image loaded successfully.")
---
Step 3: Performing Detection on the Image
With the model and image loaded, we can now run the detection. The ultralytics library makes this process incredibly simple. The model will analyze the image and identify all the objects it recognizes.
#Inference #ObjectDetection
# Run the model on the image to get detection results
results = model(img)
print("Detection complete. Processing results...")
---
Step 4: Filtering and Counting the Bottles
The model detects many types of objects. Our task is to go through the results, filter for only the 'bottle' class, and count how many there are. We'll also store the locations (bounding boxes) of each detected bottle for visualization.
#DataProcessing #Filtering
# Initialize a counter for the bottles
bottle_count = 0
bottle_boxes = []
# The results object is a list, so we loop through it
for result in results:
    # Each result has a 'boxes' attribute with the detections
    boxes = result.boxes
    for box in boxes:
        # Get the class ID of the detected object
        class_id = int(box.cls)
        # Check if the class name is 'bottle'
        if model.names[class_id] == 'bottle':
            bottle_count += 1
            # Store the bounding box coordinates (x1, y1, x2, y2)
            bottle_boxes.append(box.xyxy[0])
print(f"Total plastic bottles detected: {bottle_count}")
---
Step 5: Visualizing the Results
A number is good, but seeing what the model detected is better. We will draw the bounding boxes and the final count directly onto the image to create a clear visual output.
#Visualization #OpenCV
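The code for Step 5 is missing from the post; here is a minimal sketch of what it could look like, using OpenCV and the bottle_boxes list from Step 4 (the output filename is illustrative):
# Draw a green rectangle around each detected bottle
for box in bottle_boxes:
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Write the final count in the top-left corner
cv2.putText(img, f"Bottles: {bottle_count}", (10, 40),
            cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
# Save and display the annotated image
cv2.imwrite('factory_bottles_annotated.jpg', img)
cv2.imshow('Detected Bottles', img)
cv2.waitKey(0)
cv2.destroyAllWindows()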
#Pandas #DataAnalysis #Python #DataScience #Tutorial
Top 30 Pandas Functions & Methods
This lesson covers 30 essential Pandas functions for data manipulation and analysis, each with a standalone example and its output.
---
1. pd.DataFrame()
Creates a new DataFrame (a 2D labeled data structure) from various inputs like dictionaries or lists.
import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)
print(df)
col1 col2
0 1 3
1 2 4
---
2. pd.Series()
Creates a new Series (a 1D labeled array).
import pandas as pd
s = pd.Series([10, 20, 30, 40], name='MyNumbers')
print(s)
0 10
1 20
2 30
3 40
Name: MyNumbers, dtype: int64
---
3. pd.read_csv()
Reads data from a CSV file into a DataFrame. (Assuming a file data.csv exists.)
# Create a dummy csv file first
with open('data.csv', 'w') as f:
    f.write('Name,Age\nAlice,25\nBob,30')
df = pd.read_csv('data.csv')
print(df)
Name Age
0 Alice 25
1 Bob 30
---
4. df.to_csv()
Writes a DataFrame to a CSV file.
import pandas as pd
df = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
# index=False prevents writing the DataFrame index to the file
df.to_csv('output.csv', index=False)
# You can check that 'output.csv' has been created.
print("File 'output.csv' created.")
File 'output.csv' created.
#PandasIO #DataFrame #Series
---
5. df.head()
Returns the first n rows of the DataFrame (default is 5).
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.head(3))
Name Value
0 A 1
1 B 2
2 C 3
---
6. df.tail()
Returns the last n rows of the DataFrame (default is 5).
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.tail(2))
Name Value
4 E 5
5 F 6
---
7. df.info()
Provides a concise summary of the DataFrame, including data types and non-null values.
import pandas as pd
import numpy as np
data = {'col1': [1, 2, 3], 'col2': [4.0, 5.0, np.nan], 'col3': ['A', 'B', 'C']}
df = pd.DataFrame(data)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null int64
1 col2 2 non-null float64
2 col3 3 non-null object
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
---
8. df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)
(2, 3)
#DataInspection #PandasBasics
---
9. df.describe()
Generates descriptive statistics for numerical columns (count, mean, std, min, max, etc.).
import pandas as pd
df = pd.DataFrame({'Age': [22, 38, 26, 35, 29]})
print(df.describe())
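The post is cut off before the output; the statistics are deterministic for this data, so print(df.describe()) would show:
             Age
count   5.000000
mean   30.000000
std     6.519202
min    22.000000
25%    26.000000
50%    29.000000
75%    35.000000
max    38.000000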
Top 100 Data Analyst Interview Questions & Answers
#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience
Part 1: SQL Questions (Q1-30)
#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
• DELETE is a DML command that removes rows from a table based on a WHERE clause. It is slower as it logs each row deletion and can be rolled back.
• TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster, cannot be rolled back, and resets table identity.
• DROP is a DDL command that removes the entire table, including its structure, data, and indexes.
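A quick illustration of all three (a sketch; the employees table is the running example):
DELETE FROM employees WHERE department = 'Sales';  -- removes matching rows only
TRUNCATE TABLE employees;                          -- removes all rows, keeps the structure
DROP TABLE employees;                              -- removes the table entirely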
#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.
SELECT DISTINCT department
FROM employees;
#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT.
SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
#4. What is the difference between WHERE and HAVING?
A:
• WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
• HAVING is used to filter groups after aggregations (GROUP BY) have been performed.
-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;
#5. What are the different types of SQL joins?
A:
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
• FULL (OUTER) JOIN: Returns all records when there is a match in either the left or right table.
• SELF JOIN: A regular join, but the table is joined with itself (see the sketch below).
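For instance, a LEFT JOIN that keeps every employee even without a department match (a sketch; table and column names are illustrative):
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department_id = d.id;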
#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery.
-- Method 1: Using OFFSET
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.
SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;
#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.
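A minimal DDL sketch of the relationship (table and column names are illustrative):
CREATE TABLE departments (
    id INT PRIMARY KEY,
    department_name VARCHAR(100)
);
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(id)
);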
#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.
-- Rank employees by salary within each department
SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.
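The post gives no example for this one; a small sketch of the syntax (names are illustrative):
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.department = d.department
WHERE e.salary > d.avg_salary;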
💡 Applying Image Filters with Pillow
Pillow's ImageFilter module provides a set of pre-defined filters you can apply to your images with a single line of code. This example demonstrates how to apply a Gaussian blur effect, which is useful for softening images or creating depth-of-field effects.
from PIL import Image, ImageFilter
try:
    # Open an existing image
    with Image.open("your_image.jpg") as img:
        # Apply the Gaussian Blur filter
        # The radius parameter controls the blur intensity
        blurred_img = img.filter(ImageFilter.GaussianBlur(radius=5))
        # Display the blurred image
        blurred_img.show()
        # Save the new image
        blurred_img.save("blurred_image.png")
except FileNotFoundError:
    print("Error: 'your_image.jpg' not found. Please provide an image.")
Code explanation: The script opens an image file, applies a GaussianBlur filter from the ImageFilter module using the .filter() method, and then displays and saves the resulting blurred image. The blur intensity is controlled by the radius argument.
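ImageFilter ships several other ready-made filters that follow the same pattern; a quick sketch (these lines would go inside the with block above):
# Other one-line filters from the same module
sharpened = img.filter(ImageFilter.SHARPEN)
contoured = img.filter(ImageFilter.CONTOUR)
edges = img.filter(ImageFilter.EDGE_ENHANCE)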
#Python #Pillow #ImageProcessing #ImageFilter #PIL
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
• Get raw audio data as a NumPy array.
from pydub import AudioSegment
import numpy as np
# Assumes an AudioSegment loaded earlier in the (truncated) post, e.g.:
# audio = AudioSegment.from_file("sound.wav")
samples = np.array(audio.get_array_of_samples())
• Create a Pydub segment from a NumPy array.
new_audio = AudioSegment(
    samples.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=audio.sample_width,
    channels=audio.channels
)
• Read a WAV file directly into a NumPy array.
from scipy.io.wavfile import read
rate, data = read("sound.wav")
• Write a NumPy array to a WAV file.
from scipy.io.wavfile import write
write("new_sound.wav", rate, data)
• Generate a sine wave.
import numpy as np
sample_rate = 44100
frequency = 440  # A4 note
duration = 5
t = np.linspace(0., duration, int(sample_rate * duration))
amplitude = np.iinfo(np.int16).max * 0.5
data = amplitude * np.sin(2. * np.pi * frequency * t)
# This array can now be written to a file (cast first, e.g. data.astype(np.int16))
VIII. Audio Analysis with Librosa
• Load audio with Librosa.
import librosa
y, sr = librosa.load("sound.mp3")
• Estimate tempo (Beats Per Minute).
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
• Get beat event times in seconds.
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
• Decompose into harmonic and percussive components.
y_harmonic, y_percussive = librosa.effects.hpss(y)
• Compute a spectrogram.
import numpy as np
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
• Compute Mel-Frequency Cepstral Coefficients (MFCCs).
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
• Compute Chroma features (related to musical pitch).
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
• Detect onset events (the start of notes).
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
• Pitch shifting.
y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)  # Shift up 4 semitones
• Time stretching (change speed without changing pitch).
y_fast = librosa.effects.time_stretch(y, rate=2.0)  # Double speed
IX. More Utilities
• Detect leading silence.
from pydub.silence import detect_leading_silence
trim_ms = detect_leading_silence(audio)
trimmed_audio = audio[trim_ms:]
• Get the root mean square (RMS) energy.
rms = audio.rms
• Get the maximum possible amplitude for the audio format.
max_possible_amp = audio.max_possible_amplitude
• Find the loudest section of an audio file.
from pydub.effects import normalize
loudest_part = normalize(audio.strip_silence(silence_len=1000, silence_thresh=-32))
• Change the frame rate (resample).
resampled = audio.set_frame_rate(16000)
• Create a simple band-pass filter.
from pydub.scipy_effects import band_pass_filter
filtered = band_pass_filter(audio, 400, 2000)  # Pass between 400 Hz and 2000 Hz
• Convert file format in one line.
AudioSegment.from_file("music.ogg").export("music.mp3", format="mp3")
• Get the raw bytes of the audio data.
raw_data = audio.raw_data
• Get the maximum amplitude.
max_amp = audio.max
• Match the volume of two segments.
matched_audio2 = audio2.apply_gain(audio1.dBFS - audio2.dBFS)
#Python #AudioProcessing #Pydub #Librosa #SignalProcessing
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
# (Continuation of a truncated post: sine_wave, fs, and a window array are assumed
#  defined earlier, e.g. window = signal.windows.hann(51) to match the 51-sample segment.)
segment = sine_wave[0:51]
windowed_segment = segment * window
VI. Convolution & Correlation
• Perform linear convolution.
sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')
• Compute cross-correlation.
# Useful for finding delays between signals
correlation = signal.correlate(sig1, sig2, mode='full')
• Compute auto-correlation.
# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')
VII. Time-Frequency Analysis
• Compute and plot a spectrogram.
f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()
• Perform Continuous Wavelet Transform (CWT).
widths = np.arange(1, 31)
cwt_matrix = signal.cwt(chirp_signal, signal.ricker, widths)
• Perform Hilbert transform to get the analytic signal.
analytic_signal = signal.hilbert(sine_wave)
• Calculate instantaneous frequency.
instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = (np.diff(instant_phase) / (2.0 * np.pi) * fs)
VIII. Feature Extraction
• Find peaks in a signal.
peaks, _ = signal.find_peaks(sine_wave, height=0.5)
• Find peaks with prominence criteria.
peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)
• Differentiate a signal (e.g., to find velocity from position).
derivative = np.diff(sine_wave)
• Integrate a signal.
from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)
• Detrend a signal to remove a linear trend.
trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)
IX. System Analysis
• Define a system via a transfer function (numerator, denominator).
# Example: 2nd order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])
• Compute the step response of a system.
t_step, y_step = signal.step(system)
• Compute the impulse response of a system.
t_impulse, y_impulse = signal.impulse(system)
• Compute the Bode plot of a system's frequency response.
w, mag, phase = signal.bode(system)
X. Signal Generation from Data
• Generate a signal from a function.
t = np.linspace(0, 1, 500)
custom_signal = np.sinc(2 * np.pi * 4 * t)
• Convert a list of values to a signal array.
my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)
• Read signal data from a WAV file.
from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')
• Create a pulse train signal.
pulse_train = np.zeros(fs)
pulse_train[::100] = 1  # Impulse every 100 samples
#Python #SignalProcessing #SciPy #NumPy #DSP
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots
• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))
• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')
• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')
• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])
• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')
• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()
• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)
• Get the current Axes instance.
ax = plt.gca()
• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()
VI. Specialized Plots
• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)
• Create a filled contour plot.
plt.contourf(X, Y, Z)
• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)
• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)
#Python #Matplotlib #DataVisualization #DataScience #Plotting
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
• Group data by a column.
df.groupby('col1')
• Group by a column and get the sum.
df.groupby('col1').sum()
• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])
• Get the size of each group.
df.groupby('col1').size()
• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()
• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
VI. Merging, Joining & Concatenating
• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')
• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2])  # Stacks rows
• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')
VII. Input & Output
• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)
• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')
• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')
• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)
VIII. Time Series & Special Operations
• Use the string accessor (.str) for Series operations.
s.str.lower()
s.str.contains('pattern')
• Use the datetime accessor (.dt) for Series operations.
s.dt.year
s.dt.day_name()
• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()
• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')
#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
NumPy for Absolute Beginners: A Project-Based Approach to Data Analysis
Category: DATA SCIENCE
Date: 2025-11-04 | ⏱️ Read time: 14 min read
Master NumPy for data analysis with this project-based guide for absolute beginners. Learn to build a high-performance sensor data pipeline from scratch and unlock the true speed of Python for data-intensive applications.
#NumPy #Python #DataAnalysis #DataScience
Train a Humanoid Robot with AI and Python
Category: ROBOTICS
Date: 2025-11-04 | ⏱️ Read time: 9 min read
Explore how to train a humanoid robot using Python and AI. This guide covers the application of 3D simulations and Reinforcement Learning, leveraging powerful tools like the MuJoCo physics engine and the Gym toolkit to create and manage sophisticated learning environments for robotics.
#AI #Robotics #Python #ReinforcementLearning #MachineLearning