Data Science Machine Learning Data Analysis

This channel is for Programmers, Coders, Software Engineers.

1- Data Science
2- Machine Learning
3- Data Visualization
4- Artificial Intelligence
5- Data Analysis
6- Statistics
7- Deep Learning
💡 Pandas Cheatsheet

A quick guide to essential Pandas operations for data manipulation, focusing on creating, selecting, filtering, and grouping data in a DataFrame.

1. Creating a DataFrame
The primary data structure in Pandas is the DataFrame. It's often created from a dictionary.
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 32, 28],
        'City': ['New York', 'Paris', 'New York']}
df = pd.DataFrame(data)

print(df)
#       Name  Age      City
# 0    Alice   25  New York
# 1      Bob   32     Paris
# 2  Charlie   28  New York

• A dictionary is defined where keys become column names and values become the data in those columns. pd.DataFrame() converts it into a tabular structure.

2. Selecting Data with .loc and .iloc
Use .loc for label-based selection and .iloc for integer-position based selection.
# Select the first row by its integer position (0)
print(df.iloc[0])

# Select the row with index label 1 and only the 'Name' column
print(df.loc[1, 'Name'])

# Output for df.iloc[0]:
# Name       Alice
# Age           25
# City    New York
# Name: 0, dtype: object
#
# Output for df.loc[1, 'Name']:
# Bob

• .iloc[0] gets all data from the row at index position 0.
• .loc[1, 'Name'] gets the data at the intersection of index label 1 and column label 'Name'.

3. Filtering Data
Select subsets of data based on conditions.
# Select rows where Age is greater than 27
filtered_df = df[df['Age'] > 27]
print(filtered_df)
#       Name  Age      City
# 1      Bob   32     Paris
# 2  Charlie   28  New York

• The expression df['Age'] > 27 creates a boolean Series (True/False).
• Using this Series as an index in df[...] returns only the rows where the value is True.

4. Grouping and Aggregating
The "group by" operation involves splitting data into groups, applying a function, and combining the results.
# Group by 'City' and calculate the mean age for each city
city_ages = df.groupby('City')['Age'].mean()
print(city_ages)
# City
# New York    26.5
# Paris       32.0
# Name: Age, dtype: float64

• .groupby('City') splits the DataFrame into groups based on unique city values.
• ['Age'].mean() then calculates the mean of the 'Age' column for each of these groups.

#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
โค1๐Ÿ‘1
💡 SciPy: Scientific Computing in Python

SciPy is a fundamental library for scientific and technical computing in Python. Built on NumPy, it provides a wide range of user-friendly and efficient numerical routines for tasks like optimization, integration, linear algebra, and statistics.

import numpy as np
from scipy.optimize import minimize

# Define a function to minimize: f(x) = (x - 3)^2
def f(x):
    return (x - 3)**2

# Find the minimum of the function with an initial guess
res = minimize(f, x0=0)

print(f"Minimum found at x = {res.x[0]:.4f}")
# Output:
# Minimum found at x = 3.0000

• Optimization: scipy.optimize.minimize is used to find the minimum value of a function.
• We provide the function (f) and an initial guess (x0=0).
• The result object (res) contains the solution in the .x attribute.

from scipy.integrate import quad

# Define the function to integrate: f(x) = sin(x)
def integrand(x):
    return np.sin(x)

# Integrate sin(x) from 0 to pi
result, error = quad(integrand, 0, np.pi)

print(f"Integral result: {result:.4f}")
print(f"Estimated error: {error:.2e}")
# Output:
# Integral result: 2.0000
# Estimated error: 2.22e-14

• Numerical Integration: scipy.integrate.quad calculates the definite integral of a function over a given interval.
• It returns a tuple containing the integral result and an estimate of the absolute error.

from scipy.linalg import solve

# Solve the linear system Ax = b
# 3x + 2y = 12
# x - y = 1

A = np.array([[3, 2], [1, -1]])
b = np.array([12, 1])

solution = solve(A, b)
print(f"Solution (x, y): {solution}")
# Output:
# Solution (x, y): [2.8 1.8]

• Linear Algebra: scipy.linalg provides more advanced linear algebra routines than NumPy.
• solve(A, b) efficiently finds the solution vector x for a system of linear equations defined by a matrix A and a vector b.

from scipy import stats

# Create two independent samples
sample1 = np.random.normal(loc=5, scale=2, size=100)
sample2 = np.random.normal(loc=5.5, scale=2, size=100)

# Perform an independent t-test
t_stat, p_value = stats.ttest_ind(sample1, sample2)

print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
# Output (will vary):
# T-statistic: -1.7432
# P-value: 0.0829

• Statistics: scipy.stats is a powerful module for statistical analysis.
• ttest_ind calculates the T-test for the means of two independent samples.
• The p-value helps determine if the difference between sample means is statistically significant (a low p-value, e.g., < 0.05, suggests it is).

#SciPy #Python #DataScience #ScientificComputing #Statistics

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
โค3
#CNN #DeepLearning #Python #Tutorial

Lesson: Building a Convolutional Neural Network (CNN) for Image Classification

This lesson will guide you through building a CNN from scratch using TensorFlow and Keras to classify images from the CIFAR-10 dataset.

---

Part 1: Setup and Data Loading

First, we import the necessary libraries and load the CIFAR-10 dataset. This dataset contains 60,000 32x32 color images in 10 classes.

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np

# Load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

# Check the shape of the data
print("Training data shape:", x_train.shape)
print("Test data shape:", x_test.shape)

#TensorFlow #Keras #DataLoading

---

Part 2: Data Exploration and Preprocessing

We need to prepare the data before feeding it to the network. This involves:
• Normalization: Scaling pixel values from the 0-255 range to the 0-1 range.
• One-Hot Encoding: Converting class vectors (integers) to a binary matrix.

Let's also visualize some images to understand our data.

# Define class names for CIFAR-10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Visualize a few images
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    plt.xlabel(class_names[y_train[i][0]])
plt.show()

# Normalize pixel values to be between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the labels
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

#DataPreprocessing #Normalization #Visualization

---

Part 3: Building the CNN Model

Now, we'll construct our CNN model. A common architecture consists of a stack of Conv2D and MaxPooling2D layers, followed by Dense layers for classification.

• Conv2D: Extracts features (like edges, corners) from the input image.
• MaxPooling2D: Reduces the spatial dimensions (downsampling), which helps make the feature detection more robust.
• Flatten: Converts the 2D feature maps into a 1D vector.
• Dense: A standard fully-connected neural network layer.

model = models.Sequential()

# Convolutional Base
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Flatten and Dense Layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax')) # 10 output classes

# Print the model summary
model.summary()

#ModelBuilding #CNN #KerasLayers

---

Part 4: Compiling the Model

Before training, we need to configure the learning process. This is done via the compile() method, which requires:
• Optimizer: An algorithm to update the model's weights (e.g., 'adam').
• Loss Function: A function to measure how inaccurate the model is during training (e.g., 'categorical_crossentropy' for multi-class classification).
• Metrics: Used to monitor the training and testing steps (e.g., 'accuracy').

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

#ModelCompilation #Optimizer #LossFunction

---
#YOLOv8 #ComputerVision #ObjectDetection #IndustrialAI #Python

Applying YOLOv8 for Industrial Automation: Counting Plastic Bottles

This lesson will guide you through a complete computer vision project using YOLOv8. The goal is to detect and count plastic bottles in an image from an industrial setting, such as a conveyor belt or a storage area.

---

Step 1: Setup and Installation

First, we need to install the necessary libraries. The ultralytics library provides the YOLOv8 model, and opencv-python is essential for image processing tasks.

#Setup #Installation

# Open your terminal or command prompt and run this command:
pip install ultralytics opencv-python


---

Step 2: Loading the Model and the Target Image

We will load a pre-trained YOLOv8 model. These models are trained on the large COCO dataset, so they already know how to identify common objects like 'bottle'. Then we'll load our industrial image. Ensure you have an image named factory_bottles.jpg in your project folder.

#ModelLoading #DataHandling

import cv2
from ultralytics import YOLO

# Load a pre-trained YOLOv8 model (yolov8n.pt is the smallest and fastest)
model = YOLO('yolov8n.pt')

# Load the image from the industrial setting
image_path = 'factory_bottles.jpg' # Make sure this image is in your directory
img = cv2.imread(image_path)

# A quick check to ensure the image was loaded correctly
if img is None:
    print(f"Error: Could not load image at {image_path}")
else:
    print("YOLOv8 model and image loaded successfully.")


---

Step 3: Performing Detection on the Image

With the model and image loaded, we can now run the detection. The ultralytics library makes this process incredibly simple. The model will analyze the image and identify all the objects it recognizes.

#Inference #ObjectDetection

# Run the model on the image to get detection results
results = model(img)

print("Detection complete. Processing results...")


---

Step 4: Filtering and Counting the Bottles

The model detects many types of objects. Our task is to go through the results, filter for only the 'bottle' class, and count how many there are. We'll also store the locations (bounding boxes) of each detected bottle for visualization.

#DataProcessing #Filtering

# Initialize a counter for the bottles
bottle_count = 0
bottle_boxes = []

# The results object is a list, so we loop through it
for result in results:
    # Each result has a 'boxes' attribute with the detections
    boxes = result.boxes
    for box in boxes:
        # Get the class ID of the detected object
        class_id = int(box.cls)
        # Check if the class name is 'bottle'
        if model.names[class_id] == 'bottle':
            bottle_count += 1
            # Store the bounding box coordinates (x1, y1, x2, y2)
            bottle_boxes.append(box.xyxy[0])

print(f"Total plastic bottles detected: {bottle_count}")


---

Step 5: Visualizing the Results

A number is good, but seeing what the model detected is better. We will draw the bounding boxes and the final count directly onto the image to create a clear visual output.

#Visualization #OpenCV
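The post's code for this step is missing; below is a minimal sketch of how the drawing could look with OpenCV, assuming img, bottle_boxes, and bottle_count from the previous steps (the output filename is illustrative).

# Minimal sketch: draw each detected box and the total count on the image
annotated = img.copy()

for box in bottle_boxes:
    # box holds the (x1, y1, x2, y2) coordinates stored in Step 4
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(annotated, (x1, y1), (x2, y2), (0, 255, 0), 2)

# Write the final count in the top-left corner
cv2.putText(annotated, f"Bottles: {bottle_count}", (10, 30),
            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)

# Save the annotated image for inspection
cv2.imwrite('factory_bottles_annotated.jpg', annotated)
print("Annotated image saved.")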
#Pandas #DataAnalysis #Python #DataScience #Tutorial

Top 30 Pandas Functions & Methods

This lesson covers 30 essential Pandas functions for data manipulation and analysis, each with a standalone example and its output.

---

1. pd.DataFrame()
Creates a new DataFrame (a 2D labeled data structure) from various inputs like dictionaries or lists.

import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)
print(df)

   col1  col2
0     1     3
1     2     4


---

2. pd.Series()
Creates a new Series (a 1D labeled array).

import pandas as pd
s = pd.Series([10, 20, 30, 40], name='MyNumbers')
print(s)

0    10
1    20
2    30
3    40
Name: MyNumbers, dtype: int64


---

3. pd.read_csv()
Reads data from a CSV file into a DataFrame. (Assuming a file data.csv exists).

# Create a dummy csv file first
with open('data.csv', 'w') as f:
    f.write('Name,Age\nAlice,25\nBob,30')

df = pd.read_csv('data.csv')
print(df)

    Name  Age
0  Alice   25
1    Bob   30


---

4. df.to_csv()
Writes a DataFrame to a CSV file.

import pandas as pd
df = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
# index=False prevents writing the DataFrame index to the file
df.to_csv('output.csv', index=False)
# You can check that 'output.csv' has been created.
print("File 'output.csv' created.")

File 'output.csv' created.

#PandasIO #DataFrame #Series

---

5. df.head()
Returns the first n rows of the DataFrame (default is 5).

import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.head(3))

  Name  Value
0    A      1
1    B      2
2    C      3


---

6. df.tail()
Returns the last n rows of the DataFrame (default is 5).

import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.tail(2))

  Name  Value
4    E      5
5    F      6


---

7. df.info()
Provides a concise summary of the DataFrame, including data types and non-null values.

import pandas as pd
import numpy as np
data = {'col1': [1, 2, 3], 'col2': [4.0, 5.0, np.nan], 'col3': ['A', 'B', 'C']}
df = pd.DataFrame(data)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   col1    3 non-null      int64
 1   col2    2 non-null      float64
 2   col3    3 non-null      object
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes


---

8. df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)

(2, 3)

#DataInspection #PandasBasics

---

9. df.describe()
Generates descriptive statistics for numerical columns (count, mean, std, min, max, etc.).

import pandas as pd
df = pd.DataFrame({'Age': [22, 38, 26, 35, 29]})
print(df.describe())
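            Age
count   5.000000
mean   30.000000
std     6.519202
min    22.000000
25%    26.000000
50%    29.000000
75%    35.000000
max    38.000000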
โค2
Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
• DELETE is a DML command that removes rows from a table based on a WHERE clause. It is slower, as it logs each row deletion, and it can be rolled back.
• TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster, cannot be rolled back, and resets the table's identity counter.
• DROP is a DDL command that removes the entire table, including its structure, data, and indexes. (A quick illustration follows.)
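A quick illustration on the same hypothetical employees table (these statements are not from the original answer):

-- DELETE: remove only matching rows (can be rolled back)
DELETE FROM employees WHERE department = 'Sales';

-- TRUNCATE: remove all rows, keep the table structure
TRUNCATE TABLE employees;

-- DROP: remove the table itself
DROP TABLE employees;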

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;


#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT.

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;


#4. What is the difference between WHERE and HAVING?
A:
• WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
• HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;


#5. What are the different types of SQL joins?
A:
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
• FULL (OUTER) JOIN: Returns all records when there is a match in either the left or right table.
• SELF JOIN: A regular join, but the table is joined with itself. (A LEFT JOIN example is sketched below.)
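A sketch of one of these, assuming a separate departments table (the schema is not given in the original):

-- Keep every employee; attach department info where it exists
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d
    ON e.department_id = d.id;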

#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery.

-- Method 1: Using OFFSET
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);


#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;


#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table. (A minimal sketch follows.)
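A minimal sketch of both constraints, using an assumed two-table schema:

CREATE TABLE departments (
    id INT PRIMARY KEY,           -- uniquely identifies each department
    name VARCHAR(100)
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(id)  -- links to departments
);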

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.

-- Rank employees by salary within each department
SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;


#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.
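The original answer gives no example; a minimal sketch against the same hypothetical employees table:

-- Name an intermediate result, then query it like a table
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT department, avg_salary
FROM dept_avg
WHERE avg_salary > 50000;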
💡 Applying Image Filters with Pillow

Pillow's ImageFilter module provides a set of pre-defined filters you can apply to your images with a single line of code. This example demonstrates how to apply a Gaussian blur effect, which is useful for softening images or creating depth-of-field effects.

from PIL import Image, ImageFilter

try:
    # Open an existing image
    with Image.open("your_image.jpg") as img:
        # Apply the Gaussian Blur filter
        # The radius parameter controls the blur intensity
        blurred_img = img.filter(ImageFilter.GaussianBlur(radius=5))

        # Display the blurred image
        blurred_img.show()

        # Save the new image
        blurred_img.save("blurred_image.png")

except FileNotFoundError:
    print("Error: 'your_image.jpg' not found. Please provide an image.")


Code explanation: The script opens an image file, applies a GaussianBlur filter from the ImageFilter module using the .filter() method, and then displays and saves the resulting blurred image. The blur intensity is controlled by the radius argument.

#Python #Pillow #ImageProcessing #ImageFilter #PIL

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
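The next post begins mid-way through an audio-processing cheat sheet; the opening sections that load the audio are missing. A minimal assumed setup so the snippets below run (the filename is illustrative):

from pydub import AudioSegment

# Assumed setup: load a file as a Pydub AudioSegment
audio = AudioSegment.from_file("sound.wav")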
• Get raw audio data as a NumPy array.
import numpy as np
samples = np.array(audio.get_array_of_samples())

• Create a Pydub segment from a NumPy array.
new_audio = AudioSegment(
    samples.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=audio.sample_width,
    channels=audio.channels
)

• Read a WAV file directly into a NumPy array.
from scipy.io.wavfile import read
rate, data = read("sound.wav")

• Write a NumPy array to a WAV file.
from scipy.io.wavfile import write
write("new_sound.wav", rate, data)

• Generate a sine wave.
import numpy as np
sample_rate = 44100
frequency = 440  # A4 note
duration = 5
t = np.linspace(0., duration, int(sample_rate * duration))
amplitude = np.iinfo(np.int16).max * 0.5
data = amplitude * np.sin(2. * np.pi * frequency * t)
# This array can now be written to a file


VIII. Audio Analysis with Librosa

• Load audio with Librosa.
import librosa
y, sr = librosa.load("sound.mp3")

• Estimate tempo (Beats Per Minute).
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

• Get beat event times in seconds.
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

• Decompose into harmonic and percussive components.
y_harmonic, y_percussive = librosa.effects.hpss(y)

• Compute a spectrogram.
import numpy as np
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

• Compute Mel-Frequency Cepstral Coefficients (MFCCs).
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

• Compute Chroma features (related to musical pitch).
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

• Detect onset events (the start of notes).
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

• Pitch shifting.
y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)  # Shift up 4 semitones

• Time stretching (change speed without changing pitch).
y_fast = librosa.effects.time_stretch(y, rate=2.0)  # Double speed


IX. More Utilities

• Detect leading silence.
from pydub.silence import detect_leading_silence
trim_ms = detect_leading_silence(audio)
trimmed_audio = audio[trim_ms:]

• Get the root mean square (RMS) energy.
rms = audio.rms

• Get the maximum possible amplitude for the audio format.
max_possible_amp = audio.max_possible_amplitude

• Find the loudest section of an audio file.
from pydub.effects import normalize
loudest_part = normalize(audio.strip_silence(silence_len=1000, silence_thresh=-32))

• Change the frame rate (resample).
resampled = audio.set_frame_rate(16000)

• Create a simple band-pass filter.
from pydub.scipy_effects import band_pass_filter
filtered = band_pass_filter(audio, 400, 2000)  # Pass between 400 Hz and 2000 Hz

• Convert file format in one line.
AudioSegment.from_file("music.ogg").export("music.mp3", format="mp3")

• Get the raw bytes of the audio data.
raw_data = audio.raw_data

• Get the maximum amplitude.
max_amp = audio.max

• Match the volume of two segments.
matched_audio2 = audio2.apply_gain(audio1.dBFS - audio2.dBFS)


#Python #AudioProcessing #Pydub #Librosa #SignalProcessing

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
โค2
segment = sine_wave[0:51]
windowed_segment = segment * window


VI. Convolution & Correlation

• Perform linear convolution.
sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')

• Compute cross-correlation.
# Useful for finding delays between signals
correlation = signal.correlate(sig1, sig2, mode='full')

• Compute auto-correlation.
# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')


VII. Time-Frequency Analysis

• Compute and plot a spectrogram.
f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()

• Perform Continuous Wavelet Transform (CWT).
widths = np.arange(1, 31)
cwt_matrix = signal.cwt(chirp_signal, signal.ricker, widths)

• Perform Hilbert transform to get the analytic signal.
analytic_signal = signal.hilbert(sine_wave)

• Calculate instantaneous frequency.
instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = np.diff(instant_phase) / (2.0 * np.pi) * fs


VIII. Feature Extraction

• Find peaks in a signal.
peaks, _ = signal.find_peaks(sine_wave, height=0.5)

• Find peaks with prominence criteria.
peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)

• Differentiate a signal (e.g., to find velocity from position).
derivative = np.diff(sine_wave)

• Integrate a signal.
from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)

• Detrend a signal to remove a linear trend.
trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)


IX. System Analysis

• Define a system via a transfer function (numerator, denominator).
# Example: 2nd-order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])

• Compute the step response of a system.
t_step, y_step = signal.step(system)

• Compute the impulse response of a system.
t_impulse, y_impulse = signal.impulse(system)

• Compute the Bode plot of a system's frequency response.
w, mag, phase = signal.bode(system)


X. Signal Generation from Data

• Generate a signal from a function.
t = np.linspace(0, 1, 500)
custom_signal = np.sinc(2 * np.pi * 4 * t)

• Convert a list of values to a signal array.
my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)

• Read signal data from a WAV file.
from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')

• Create a pulse train signal.
pulse_train = np.zeros(fs)
pulse_train[::100] = 1  # Impulse every 100 samples


#Python #SignalProcessing #SciPy #NumPy #DSP

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
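The Matplotlib cheat-sheet post that follows also begins mid-way. A minimal assumed setup so its snippets run:

import matplotlib.pyplot as plt
import numpy as np

# Assumed setup for the snippets below
x = np.linspace(0, 2 * np.pi, 100)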
fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots

• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))

• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')

• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')

• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])

• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')

• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()

• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)

• Get the current Axes instance.
ax = plt.gca()

• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()


VI. Specialized Plots

• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)

• Create a filled contour plot.
plt.contourf(X, Y, Z)

• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)

• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)


#Python #Matplotlib #DataVisualization #DataScience #Plotting

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
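One more post that begins mid-way, this time a Pandas cheat sheet. A minimal assumed setup (later snippets, such as the pivot table and the I/O examples, are schematic and assume their own column names and files):

import pandas as pd

# Assumed setup: a small DataFrame for the grouping examples
df = pd.DataFrame({'col1': ['a', 'b', 'a'], 'col2': [10, 20, 30]})
# ...and a string Series for the .str accessor examples
s = pd.Series(['Alpha', 'Beta'])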
• Group data by a column.
df.groupby('col1')

• Group by a column and get the sum.
df.groupby('col1').sum()

• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.
df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()

• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])


VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2])  # Stacks rows

• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')


VII. Input & Output

• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')

• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)


VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.
s.str.lower()
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.
s.dt.year
s.dt.day_name()

• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')


#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
โค6๐Ÿ‘1๐Ÿ”ฅ1
📌 NumPy for Absolute Beginners: A Project-Based Approach to Data Analysis

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-04 | ⏱️ Read time: 14 min read

Master NumPy for data analysis with this project-based guide for absolute beginners. Learn to build a high-performance sensor data pipeline from scratch and unlock the true speed of Python for data-intensive applications.

#NumPy #Python #DataAnalysis #DataScience
📌 Train a Humanoid Robot with AI and Python

🗂 Category: ROBOTICS

🕒 Date: 2025-11-04 | ⏱️ Read time: 9 min read

Explore how to train a humanoid robot using Python and AI. This guide covers the application of 3D simulations and Reinforcement Learning, leveraging powerful tools like the MuJoCo physics engine and the Gym toolkit to create and manage sophisticated learning environments for robotics.

#AI #Robotics #Python #ReinforcementLearning #MachineLearning
โค1