Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.

Admin: @HusseinSheikho || @Hussein_Sheikho
• Get raw audio data as a NumPy array.
import numpy as np
samples = np.array(audio.get_array_of_samples())

• Create a Pydub segment from a NumPy array.
new_audio = AudioSegment(
    samples.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=audio.sample_width,
    channels=audio.channels
)

• Read a WAV file directly into a NumPy array.
from scipy.io.wavfile import read
rate, data = read("sound.wav")

• Write a NumPy array to a WAV file.
from scipy.io.wavfile import write
write("new_sound.wav", rate, data)

• Generate a sine wave.
import numpy as np
sample_rate = 44100
frequency = 440 # A4 note
duration = 5
t = np.linspace(0., duration, int(sample_rate * duration), endpoint=False)
amplitude = np.iinfo(np.int16).max * 0.5
data = (amplitude * np.sin(2. * np.pi * frequency * t)).astype(np.int16)
# This int16 array can now be written to a WAV file, as sketched below
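A minimal round-trip sketch, assuming the sample_rate and data variables generated above (the file name "tone.wav" is just a placeholder):
from scipy.io.wavfile import write
from pydub import AudioSegment
write("tone.wav", sample_rate, data)       # write the int16 tone to disk
tone = AudioSegment.from_wav("tone.wav")   # reload it with Pydub for further editing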


VIII. Audio Analysis with Librosa

• Load audio with Librosa.
import librosa
y, sr = librosa.load("sound.mp3")

• Estimate tempo (Beats Per Minute).
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

• Get beat event times in seconds.
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

• Decompose into harmonic and percussive components.
y_harmonic, y_percussive = librosa.effects.hpss(y)

• Compute a spectrogram.
import numpy as np
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

• Compute Mel-Frequency Cepstral Coefficients (MFCCs).
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

• Compute Chroma features (related to musical pitch).
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

• Detect onset events (the start of notes).
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

• Pitch shifting.
y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=4) # Shift up 4 semitones

• Time stretching (change speed without changing pitch).
y_fast = librosa.effects.time_stretch(y, rate=2.0) # Double speed
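A short sketch tying several of these calls into one feature extractor; extract_features is a hypothetical helper and "sound.mp3" a placeholder path:
import numpy as np
import librosa

def extract_features(path):
    # Load audio and summarize a few common features
    y, sr = librosa.load(path)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    return {
        "tempo": float(np.atleast_1d(tempo)[0]),  # newer librosa returns an array
        "mfcc_mean": mfccs.mean(axis=1),          # average each coefficient over time
        "chroma_mean": chroma.mean(axis=1),
    }

features = extract_features("sound.mp3")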


IX. More Utilities

• Detect leading silence.
from pydub.silence import detect_leading_silence
trim_ms = detect_leading_silence(audio)
trimmed_audio = audio[trim_ms:]

• Get the root mean square (RMS) energy.
rms = audio.rms

• Get the maximum possible amplitude for the audio's sample width.
max_possible_amplitude = audio.max_possible_amplitude

• Strip silence and normalize to bring up the loudest content.
from pydub.effects import normalize
loudest_part = normalize(audio.strip_silence(silence_len=1000, silence_thresh=-32))

• Change the frame rate (resample).
resampled = audio.set_frame_rate(16000)

• Create a simple band-pass filter.
from pydub.scipy_effects import band_pass_filter
filtered = band_pass_filter(audio, 400, 2000) # Pass between 400Hz and 2000Hz

• Convert file format in one line.
AudioSegment.from_file("music.ogg").export("music.mp3", format="mp3")

• Get the raw bytes of the audio data.
raw_data = audio.raw_data

• Get the maximum amplitude.
max_amp = audio.max

• Match the volume of two segments.
matched_audio2 = audio2.apply_gain(audio1.dBFS - audio2.dBFS)
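As an end-to-end sketch, the snippet below trims leading silence, resamples, and level-matches a voice track to a music bed before mixing; the file names are hypothetical:
from pydub import AudioSegment
from pydub.silence import detect_leading_silence

voice = AudioSegment.from_file("voice.wav")         # hypothetical inputs
music = AudioSegment.from_file("music.mp3")
voice = voice[detect_leading_silence(voice):]       # drop leading silence
voice = voice.set_frame_rate(44100)                 # match sample rates before mixing
music = music.apply_gain(voice.dBFS - music.dBFS)   # match average loudness
mix = music.overlay(voice)                          # mix the two segments
mix.export("mix.mp3", format="mp3")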


#Python #AudioProcessing #Pydub #Librosa #SignalProcessing

━━━━━━━━━━━━━━━
By: @DataScienceM
📌 The Pearson Correlation Coefficient, Explained Simply

🗂 Category: STATISTICS

🕒 Date: 2025-11-01 | ⏱️ Read time: 7 min read

A simple explanation of the Pearson correlation coefficient with examples
📌 Graph RAG vs SQL RAG

🗂 Category: LARGE LANGUAGE MODELS

🕒 Date: 2025-11-01 | ⏱️ Read time: 7 min read

Evaluating RAGs on graph and SQL databases
📌 Understanding the Two Faces of Shiny for Python: Core and Express

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 7 min read

Exploring the Differences and Use Cases of Shiny Core and Shiny Express for Python
📌 Do You Need a Degree to Be a Data Scientist?

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 8 min read

No, but it certainly helps.
🤖🧠 HunyuanWorld-Mirror: Tencent’s Breakthrough in Universal 3D Reconstruction

🗓️ 03 Nov 2025
📚 AI News & Trends

The race toward achieving universal 3D understanding has reached a significant milestone with Tencent’s HunyuanWorld-Mirror, a cutting-edge open-source model designed to revolutionize 3D reconstruction. In an era dominated by visual intelligence and immersive digital experiences, this new model stands out by offering a feed-forward, geometry-aware framework that can predict multiple 3D outputs in a single ...

#HunyuanWorld #Tencent #3DReconstruction #UniversalAI #GeometryAware #OpenSourceAI
📌 Data Scientists Work in the Cloud. Here’s How to Practice This as a Student (Part 2: Python)

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 9 min read

Because data scientists don’t write production code in the Udemy code editor
💡 Top 50 Operations for Signal Processing in Python

Note: Most examples use numpy, scipy.signal, and matplotlib.pyplot. Assume they are imported as:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

I. Signal Generation

• Create a time vector.
fs = 1000  # Sampling frequency
t = np.linspace(0, 1, fs, endpoint=False)

• Generate a sine wave.
freq = 50 # Hz
sine_wave = np.sin(2 * np.pi * freq * t)

• Generate a square wave.
square_wave = signal.square(2 * np.pi * freq * t)

• Generate a sawtooth wave.
sawtooth_wave = signal.sawtooth(2 * np.pi * freq * t)

• Generate Gaussian white noise.
noise = np.random.normal(0, 1, len(t))

• Generate a frequency-swept cosine (chirp).
chirp_signal = signal.chirp(t, f0=1, f1=100, t1=1, method='linear')

• Generate an impulse signal (unit impulse).
impulse = signal.unit_impulse(100, 'mid') # at index 50 of 100

• Generate a Gaussian pulse.
gaus_pulse = signal.gausspulse(t, fc=5, bw=0.5)
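These generators can be combined into a reusable test signal; a small sketch (two tones plus noise, names chosen here for illustration) that the filtering and FFT sections below can reuse:
test_signal = (np.sin(2 * np.pi * 50 * t)
               + 0.5 * np.sin(2 * np.pi * 120 * t)
               + 0.8 * np.random.normal(0, 1, len(t)))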


II. Signal Visualization & Properties

• Plot a signal.
plt.plot(t, sine_wave)
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()

• Calculate the mean value.
mean_val = np.mean(sine_wave)

• Calculate the Root Mean Square (RMS).
rms_val = np.sqrt(np.mean(sine_wave**2))

• Calculate the standard deviation.
std_dev = np.std(sine_wave)

• Find the maximum value and its index.
max_val = np.max(sine_wave)
max_idx = np.argmax(sine_wave)


III. Frequency Domain Analysis (FFT)

• Compute the Fast Fourier Transform (FFT).
from scipy.fft import fft, fftfreq
yf = fft(sine_wave)

• Get the frequency bins for the FFT.
N = len(sine_wave)
xf = fftfreq(N, 1 / fs)[:N//2]

• Plot the magnitude spectrum.
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()

• Compute the Inverse FFT (IFFT).
from scipy.fft import ifft
original_signal = ifft(yf)

• Compute the Power Spectral Density (PSD) using Welch's method.
f, Pxx_den = signal.welch(sine_wave, fs, nperseg=1024)
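A small worked example: with the 50 Hz sine and noise from section I, the dominant frequency can be read off the spectrum with argmax.
from scipy.fft import fft, fftfreq
noisy = sine_wave + noise
N = len(noisy)
spectrum = np.abs(fft(noisy))[:N//2]
freqs = fftfreq(N, 1 / fs)[:N//2]
dominant_freq = freqs[np.argmax(spectrum)]  # expected to be close to 50 Hz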


IV. Digital Filtering

• Design a Butterworth low-pass filter.
b, a = signal.butter(4, 100, 'low', analog=False, fs=fs)

• Apply a filter to a signal (zero-phase filtering).
noisy_signal = sine_wave + noise
filtered_signal = signal.filtfilt(b, a, noisy_signal)

• Design a Chebyshev Type I high-pass filter.
b, a = signal.cheby1(4, 5, 100, 'high', fs=fs) # 5dB ripple

• Design a Bessel band-pass filter.
b, a = signal.bessel(4, [50, 150], 'band', fs=fs)

• Design an FIR filter using a window method.
numtaps = 101
fir_coeffs = signal.firwin(numtaps, cutoff=100, fs=fs)

• Plot the frequency response of a filter.
w, h = signal.freqz(b, a, fs=fs)
plt.plot(w, 20 * np.log10(abs(h)))

• Apply a median filter (good for salt-and-pepper noise).
median_filtered = signal.medfilt(noisy_signal, kernel_size=3)

• Apply a Wiener filter for noise reduction.
wiener_filtered = signal.wiener(noisy_signal)
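For higher-order IIR filters, second-order sections are usually more numerically stable than (b, a) coefficients; a minimal sketch of the same idea applied to the noisy signal above:
sos = signal.butter(8, [40, 60], btype='band', fs=fs, output='sos')  # 40-60 Hz band-pass
clean = signal.sosfiltfilt(sos, noisy_signal)                        # zero-phase filtering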


V. Resampling & Windowing

• Resample a signal to a new length.
resampled = signal.resample(sine_wave, num=500) # Resample to 500 points

• Decimate a signal (downsample by a factor).
decimated = signal.decimate(sine_wave, q=4) # Downsample by 4

• Create a Hamming window.
window = signal.windows.hamming(51)

• Apply a window to a signal segment.
segment = sine_wave[0:51]
windowed_segment = segment * window


VI. Convolution & Correlation

• Perform linear convolution.
sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')

• Compute cross-correlation.
# Useful for finding delays between signals
correlation = signal.correlate(sig1, sig2, mode='full')

• Compute auto-correlation.
# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')
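A common use of cross-correlation is estimating the delay between two signals; a sketch using signal.correlation_lags (available in recent SciPy), with a synthetic 25-sample delay:
ref = np.random.normal(0, 1, 500)
delayed = np.concatenate([np.zeros(25), ref])[:500]       # ref delayed by 25 samples
corr = signal.correlate(delayed, ref, mode='full')
lags = signal.correlation_lags(len(delayed), len(ref), mode='full')
estimated_delay = lags[np.argmax(corr)]                   # ~25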


VII. Time-Frequency Analysis

• Compute and plot a spectrogram.
f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()

• Perform Continuous Wavelet Transform (CWT).
widths = np.arange(1, 31)
cwt_matrix = signal.cwt(chirp_signal, signal.ricker, widths)
# Note: signal.cwt and signal.ricker are deprecated/removed in newer SciPy; PyWavelets is the usual alternative

• Perform Hilbert transform to get the analytic signal.
analytic_signal = signal.hilbert(sine_wave)

• Calculate instantaneous frequency.
instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = (np.diff(instant_phase) / (2.0*np.pi) * fs)
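The analytic signal also gives the amplitude envelope directly; a short follow-up sketch (assumes matplotlib.pyplot is imported as plt, per the note above):
amplitude_envelope = np.abs(analytic_signal)  # instantaneous amplitude
plt.plot(t, sine_wave, label='signal')
plt.plot(t, amplitude_envelope, label='envelope')
plt.legend()
plt.show()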


VIII. Feature Extraction

• Find peaks in a signal.
peaks, _ = signal.find_peaks(sine_wave, height=0.5)

• Find peaks with prominence criteria.
peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)

• Differentiate a signal (e.g., to find velocity from position).
derivative = np.diff(sine_wave)

• Integrate a signal.
from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)

• Detrend a signal to remove a linear trend.
trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)


IX. System Analysis

• Define a system via a transfer function (numerator, denominator).
# Example: 2nd order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])

• Compute the step response of a system.
t_step, y_step = signal.step(system)

• Compute the impulse response of a system.
t_impulse, y_impulse = signal.impulse(system)

• Compute the Bode plot of a system's frequency response.
w, mag, phase = signal.bode(system)
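To visualize these responses, a minimal sketch plotting the step response and Bode magnitude of the example system:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(t_step, y_step)
ax1.set_title('Step response')
ax2.semilogx(w, mag)              # magnitude in dB vs. frequency in rad/s
ax2.set_title('Bode magnitude')
plt.show()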


X. Signal Generation from Data

• Generate a signal from a function.
t = np.linspace(0, 1, 500)
custom_signal = np.sinc(2 * np.pi * 4 * t)

• Convert a list of values to a signal array.
my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)

• Read signal data from a WAV file.
from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')

• Create a pulse train signal.
pulse_train = np.zeros(fs)
pulse_train[::100] = 1 # Impulse every 100 samples


#Python #SignalProcessing #SciPy #NumPy #DSP

━━━━━━━━━━━━━━━
By: @DataScienceM
💡 Top 50 Matplotlib Commands in Python

Note: Examples assume the following imports:
import matplotlib.pyplot as plt
import numpy as np

I. Figure & Basic Plots

• Create a figure.
fig = plt.figure(figsize=(8, 6))

• Create a basic line plot.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))

• Show/display the plot.
plt.show()

• Save a figure to a file.
plt.savefig("my_plot.png", dpi=300)

• Create a scatter plot.
plt.scatter(x, np.cos(x))

• Create a bar chart.
categories = ['A', 'B', 'C']
values = [3, 7, 2]
plt.bar(categories, values)

• Create a horizontal bar chart.
plt.barh(categories, values)

• Create a histogram.
data = np.random.randn(1000)
plt.hist(data, bins=30)

• Create a pie chart.
plt.pie(values, labels=categories, autopct='%1.1f%%')

• Create a box plot.
plt.boxplot([data, data*2])

• Display a 2D array or image.
matrix = np.random.rand(10, 10)
plt.imshow(matrix, cmap='viridis')

• Clear the current figure.
plt.clf()


II. Labels, Titles & Legends

• Add a title to the plot.
plt.title("Sine Wave")

• Add a label to the x-axis.
plt.xlabel("Time (s)")

• Add a label to the y-axis.
plt.ylabel("Amplitude")

• Add a legend.
plt.plot(x, np.sin(x), label='Sine')
plt.plot(x, np.cos(x), label='Cosine')
plt.legend()

• Add a grid.
plt.grid(True)

• Add text to the plot at specific coordinates.
plt.text(2, 0.5, 'An important point')

• Add an annotation with an arrow.
plt.annotate('Peak', xy=(np.pi/2, 1), xytext=(3, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))


III. Axes & Ticks

• Set the x-axis limits.
plt.xlim(0, 5)

• Set the y-axis limits.
plt.ylim(-1.5, 1.5)

• Set the x-axis ticks and labels.
plt.xticks([0, np.pi, 2*np.pi], ['0', r'$\pi$', r'$2\pi$'])

• Set the y-axis ticks and labels.
plt.yticks([-1, 0, 1])

• Set a logarithmic scale on an axis.
plt.yscale('log')

• Set the aspect ratio of the plot.
plt.axis('equal') # Other options: 'tight', 'off'


IV. Plot Customization

• Set the color of a plot.
plt.plot(x, np.sin(x), color='red')

• Set the line style.
plt.plot(x, np.sin(x), linestyle='--')

• Set the line width.
plt.plot(x, np.sin(x), linewidth=3)

• Set the marker style for points.
plt.plot(x, np.sin(x), marker='o')

• Set the transparency (alpha).
plt.hist(data, alpha=0.5)

• Use a predefined style.
plt.style.use('ggplot')

• Fill the area between two curves.
plt.fill_between(x, np.sin(x), np.cos(x), alpha=0.2)

• Create an error bar plot.
y_err = 0.2 * np.ones_like(x)
plt.errorbar(x, np.sin(x), yerr=y_err)

• Add a horizontal line.
plt.axhline(y=0, color='k', linestyle='-')

• Add a vertical line.
plt.axvline(x=np.pi, color='k', linestyle='-')

• Add a colorbar for plots like imshow or scatter.
plt.colorbar(label='Magnitude')


V. Subplots (Object-Oriented Approach)

• Create a figure and a grid of subplots (preferred method).
fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots

• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))

• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')

• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')

• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])

• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')

• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()

• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)

• Get the current Axes instance.
ax = plt.gca()

• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()
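Putting the object-oriented pieces together, a minimal sketch of a complete two-panel figure:
x = np.linspace(0, 2 * np.pi, 200)
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(6, 6))
ax1.plot(x, np.sin(x), label='sin')
ax1.set_ylabel('Amplitude')
ax1.legend()
ax2.plot(x, np.cos(x), color='tab:orange', label='cos')
ax2.set_xlabel('x [rad]')
ax2.legend()
fig.suptitle('Object-oriented subplots')
plt.tight_layout()
plt.show()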


VI. Specialized Plots

• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)

• Create a filled contour plot.
plt.contourf(X, Y, Z)

• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)

• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)


#Python #Matplotlib #DataVisualization #DataScience #Plotting

━━━━━━━━━━━━━━━
By: @DataScienceM
📌 SQL Explained: Normal Forms

🗂 Category: DATA ENGINEERING

🕒 Date: 2024-05-29 | ⏱️ Read time: 9 min read

Applying 1st, 2nd and 3rd normal forms to a database
📌 Simple Ways to Speed Up Your PyTorch Model Training

🗂 Category: MACHINE LEARNING

🕒 Date: 2024-05-28 | ⏱️ Read time: 12 min read

If all machine learning engineers want one thing, it’s faster model training - maybe after good test…
📌 Fine-Tune Smaller Transformer Models: Text Classification

🗂 Category: MACHINE LEARNING

🕒 Date: 2024-05-28 | ⏱️ Read time: 22 min read

Using Microsoft’s Phi-3 to generate synthetic data
📌 How I Assess the Memory Consumption of My Python Code

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2024-05-28 | ⏱️ Read time: 6 min read

Different approaches to measure the memory consumption of a variable or a function
📌 Scaling Monosemanticity: Anthropic’s One Step Towards Interpretable & Manipulable LLMs

🗂 Category:

🕒 Date: 2024-05-28 | ⏱️ Read time: 13 min read

From prompt engineering to activation engineering for more controllable and safer LLMs
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI

🗓️ 04 Nov 2025
📚 AI News & Trends

In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...

#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
📌 Introduction to Domain Adaptation- Motivation, Options, Tradeoffs

🗂 Category:

🕒 Date: 2024-05-28 | ⏱️ Read time: 15 min read

Stepping out of the “comfort zone” – part 1/3 of a deep-dive into domain adaptation…
💡 Top 50 Pandas Operations in Python

(Note: Examples assume the following imports: import pandas as pd and import numpy as np)

I. Series & DataFrame Creation

• Create a pandas Series from a list.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

• Create a DataFrame from a dictionary of lists.
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

• Create a DataFrame from a list of dictionaries.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

• Read data from a CSV file.
df = pd.read_csv('my_file.csv')

• Create a date range.
dates = pd.date_range('20230101', periods=6)


II. Data Inspection & Selection

• View the first 5 rows.
df.head()

• View the last 5 rows.
df.tail()

• Get a concise summary of the DataFrame.
df.info()

• Get descriptive statistics for numerical columns.
df.describe()

• Get the dimensions of the DataFrame (rows, columns).
df.shape

• Get the column labels.
df.columns

• Get the index (row labels).
df.index

• Select a single column.
df['col1'] # or df.col1

• Select multiple columns.
df[['col1', 'col2']]

• Select rows by label/index name using .loc.
df.loc[0:2, ['col1']] # Select rows 0,1,2 and column 'col1'

• Select rows by integer position using .iloc.
df.iloc[0:3, 0:1] # Select first 3 rows and first column

• Perform boolean/conditional selection.
df[df['col1'] > 2]

• Filter rows using .isin().
df[df['col1'].isin([1, 3])]


III. Data Cleaning

• Check for missing/null values.
df.isnull().sum() # Returns a Series with counts of nulls per column

• Drop rows with any missing values.
df.dropna()

• Fill missing values with a specific value.
df.fillna(value=0)

• Check for duplicated rows.
df.duplicated()

• Drop duplicated rows.
df.drop_duplicates(inplace=True)


IV. Data Manipulation & Operations

• Drop specified labels (columns or rows).
df.drop('col1', axis=1) # Drop a column

• Rename columns.
df.rename(columns={'col1': 'new_col1_name'})

• Set a column as the index.
df.set_index('col1')

• Reset the index.
df.reset_index(drop=True)

• Apply a function along an axis (e.g., per column).
df.apply(np.cumsum)

• Apply a function element-wise to a Series.
df['col1'].map(lambda x: x*100)

• Sort by values in a column.
df.sort_values(by='col1', ascending=False)

• Sort by index.
df.sort_index(axis=1, ascending=False)

• Change the data type of a column.
df['col1'].astype('float')

• Create a new column based on a calculation.
df['new_col'] = df['col1'] * 2


V. Grouping & Aggregation
• Group data by a column.
df.groupby('col1')

• Group by a column and get the sum.
df.groupby('col1').sum()

• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.
df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()

• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
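A small worked example using named aggregation (supported in modern pandas) on a toy DataFrame; the column names are illustrative:
sales = pd.DataFrame({'store': ['A', 'A', 'B', 'B', 'B'],
                      'units': [3, 5, 2, 8, 1]})
summary = sales.groupby('store').agg(
    total_units=('units', 'sum'),
    avg_units=('units', 'mean'),
    n_rows=('units', 'count'),
)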


VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2]) # Stacks rows

• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')
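A short merge sketch with toy frames to show how the join types differ; the column names are only for illustration:
left_df = pd.DataFrame({'key_column': [1, 2, 3], 'name': ['a', 'b', 'c']})
right_df = pd.DataFrame({'key_column': [2, 3, 4], 'score': [10, 20, 30]})
inner = pd.merge(left_df, right_df, on='key_column')                   # keys 2 and 3 only
left_join = pd.merge(left_df, right_df, on='key_column', how='left')   # keeps all left keys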


VII. Input & Output

• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')

• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)


VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.
s.str.lower()
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.
s.dt.year
s.dt.day_name()

• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')


#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM