Data Science Machine Learning Data Analysis
38.8K subscribers
3.66K photos
31 videos
39 files
1.28K links
ads: @HusseinSheikho

This channel is for Programmers, Coders, Software Engineers.

1- Data Science
2- Machine Learning
3- Data Visualization
4- Artificial Intelligence
5- Data Analysis
6- Statistics
7- Deep Learning
Download Telegram
fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots

• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))

• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')

• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')

• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])

• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')

• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()

• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)

• Get the current Axes instance.
ax = plt.gca()

• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()


VI. Specialized Plots

• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)

• Create a filled contour plot.
plt.contourf(X, Y, Z)

• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)

• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)


#Python #Matplotlib #DataVisualization #DataScience #Plotting

━━━━━━━━━━━━━━━
By: @DataScienceM
• Group data by a column.
df.groupby('col1')

• Group by a column and get the sum.
df.groupby('col1').sum()

• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.
df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()

• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])


VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2]) # Stacks rows

• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')


VII. Input & Output

• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')

• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)


VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.
s.str.lower()
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.
s.dt.year
s.dt.day_name()

• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')


#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM
6👍1🔥1
📌 NumPy for Absolute Beginners: A Project-Based Approach to Data Analysis

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-04 | ⏱️ Read time: 14 min read

Master NumPy for data analysis with this project-based guide for absolute beginners. Learn to build a high-performance sensor data pipeline from scratch and unlock the true speed of Python for data-intensive applications.

#NumPy #Python #DataAnalysis #DataScience
📌 Why Nonparametric Models Deserve a Second Look

🗂 Category: MACHINE LEARNING

🕒 Date: 2025-11-05 | ⏱️ Read time: 7 min read

Nonparametric models offer a powerful, unified framework for regression, classification, and synthetic data generation. By leveraging nonparametric conditional distributions, these methods provide significant flexibility because they don't require pre-defining a specific functional form for the data. This adaptability makes them highly effective for capturing complex patterns and relationships that might be missed by traditional models. It's time for data professionals to reconsider the unique advantages of these assumption-free techniques for modern machine learning challenges.

#NonparametricModels #MachineLearning #DataScience #Statistics
📌 Evaluating Synthetic Data — The Million Dollar Question

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-07 | ⏱️ Read time: 13 min read

How can you trust your synthetic data? Answering this "million dollar question" is crucial for any AI/ML project. This article details a straightforward method for evaluating synthetic data quality: the Maximum Similarity Test. Learn how this simple test can help you measure how well your generated data mirrors real-world information, building confidence in your models and ensuring the reliability of your results.

#SyntheticData #DataScience #MachineLearning #DataQuality
📌 Power Analysis in Marketing: A Hands-On Introduction

🗂 Category: STATISTICS

🕒 Date: 2025-11-08 | ⏱️ Read time: 18 min read

Dive into the fundamentals of power analysis for marketing. This hands-on introduction demystifies statistical power, explaining what it is and demonstrating how to compute it. Understand why power is crucial for reliable A/B testing and campaign analysis, and learn to strengthen your experimental design. This is the first part of a practical series for data-driven professionals.

#PowerAnalysis #MarketingAnalytics #DataScience #Statistics
📌 LLM-Powered Time-Series Analysis

🗂 Category: LARGE LANGUAGE MODELS

🕒 Date: 2025-11-09 | ⏱️ Read time: 9 min read

Explore the next frontier of time-series analysis by leveraging the power of Large Language Models. This article, the second in a series, delves into practical prompting strategies for advanced model development. Learn how to effectively guide LLMs to build more sophisticated and accurate forecasting and analysis solutions, moving beyond basic applications to unlock new capabilities in this critical data science domain.

#LLMs #TimeSeriesAnalysis #PromptEngineering #DataScience #AI
2
Python tip:
Use np.polyval() to evaluate a polynomial at specific values.

import numpy as np
poly_coeffs = np.array([3, 0, 1]) # Represents 3x^2 + 0x + 1
x_values = np.array([0, 1, 2])
y_values = np.polyval(poly_coeffs, x_values)
print(y_values) # Output: [ 1 4 13] (3*0^2+1, 3*1^2+1, 3*2^2+1)


Python tip:
Use np.polyfit() to find the coefficients of a polynomial that best fits a set of data points.

import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([0, 0.8, 0.9, 0.1])
coefficients = np.polyfit(x, y, 2) # Fit a 2nd degree polynomial
print(coefficients)


Python tip:
Use np.clip() to limit values in an array to a specified range, as an instance method.

import numpy as np
arr = np.array([1, 10, 3, 15, 6])
clipped_arr = arr.clip(min=3, max=10)
print(clipped_arr)


Python tip:
Use np.squeeze() to remove single-dimensional entries from the shape of an array.

import numpy as np
arr = np.zeros((1, 3, 1, 4))
squeezed_arr = np.squeeze(arr) # Removes axes of length 1
print(squeezed_arr.shape) # Output: (3, 4)


Python tip:
Create a new array with an inserted axis using np.expand_dims().

import numpy as np
arr = np.array([1, 2, 3]) # Shape (3,)
expanded_arr = np.expand_dims(arr, axis=0) # Add a new axis at position 0
print(expanded_arr.shape) # Output: (1, 3)


Python tip:
Use np.ptp() (peak-to-peak) to find the range (max - min) of an array.

import numpy as np
arr = np.array([1, 5, 2, 8, 3])
peak_to_peak = np.ptp(arr)
print(peak_to_peak) # Output: 7 (8 - 1)


Python tip:
Use np.prod() to calculate the product of array elements.

import numpy as np
arr = np.array([1, 2, 3, 4])
product = np.prod(arr)
print(product) # Output: 24 (1 * 2 * 3 * 4)


Python tip:
Use np.allclose() to compare two arrays for equality within a tolerance.

import numpy as np
a = np.array([1.0, 2.0])
b = np.array([1.00000000001, 2.0])
print(np.allclose(a, b)) # Output: True


Python tip:
Use np.array_split() to split an array into N approximately equal sub-arrays.

import numpy as np
arr = np.arange(7)
split_arr = np.array_split(arr, 3) # Split into 3 parts
print(split_arr)


#NumPyTips #PythonNumericalComputing #ArrayManipulation #DataScience #MachineLearning #PythonTips #NumPyForBeginners #Vectorization #LinearAlgebra #StatisticalAnalysis

━━━━━━━━━━━━━━━
By: @DataScienceM
📌 Does More Data Always Yield Better Performance?

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-10 | ⏱️ Read time: 9 min read

Exploring and challenging the conventional wisdom of “more data → better performance” by experimenting with…

#DataScience #AI #Python
1
📌 The Three Ages of Data Science: When to Use Traditional Machine Learning, Deep Learning, or an LLM (Explained with One Example)

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-11 | ⏱️ Read time: 10 min read

This article charts the evolution of the data scientist's role through three distinct eras: traditional machine learning, deep learning, and the current age of large language models (LLMs). Using a single, practical use case, it illustrates how the approach to problem-solving has shifted with each technological generation. The piece serves as a guide for practitioners, clarifying when to leverage classic algorithms, complex neural networks, or the latest foundation models, helping them select the most appropriate tool for the task at hand.

#DataScience #MachineLearning #DeepLearning #LLM
📌 How to Build Agents with GPT-5

🗂 Category: AGENTIC AI

🕒 Date: 2025-11-11 | ⏱️ Read time: 8 min read

Learn how to use GPT-5 as a powerful AI Agent on your data.

#DataScience #AI #Python