Ultimate Guide to Graph Neural Networks (GNNs), Part 4: GNN Training Dynamics, Optimization Challenges, and Scalability Solutions
Duration: ~45 minutes reading time | Comprehensive guide to training GNNs effectively at scale
Part 4-A: https://hackmd.io/@husseinsheikho/GNN4-A
Part 4-B: https://hackmd.io/@husseinsheikho/GNN4-B
#GraphNeuralNetworks #GNN #MachineLearning #DeepLearning #AI #NeuralNetworks #DataScience #GraphTheory #ArtificialIntelligence #PyTorchGeometric #GNNOptimization #ScalableGNNs #TrainingDynamics #AIforBeginners #AdvancedAI
Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
Ultimate Guide to Graph Neural Networks (GNNs), Part 5: GNN Applications Across Domains (Real-World Impact in 30 Minutes)
Duration: ~30 minutes reading time | Practical guide to GNN applications with concrete ROI metrics
Link: https://hackmd.io/@husseinsheikho/GNN-5
#GraphNeuralNetworks #GNN #MachineLearning #DeepLearning #AI #NeuralNetworks #DataScience #GraphTheory #ArtificialIntelligence #RealWorldApplications #HealthcareAI #FinTech #DrugDiscovery #RecommendationSystems #ClimateAI
Ultimate Guide to Graph Neural Networks (GNNs), Part 6: Advanced Frontiers, Ethics, and Future Directions
Duration: ~50 minutes reading time | Cutting-edge insights on where GNNs are headed
Let's read: https://hackmd.io/@husseinsheikho/GNN-6
#GraphNeuralNetworks #GNN #MachineLearning #DeepLearning #AI #NeuralNetworks #DataScience #GraphTheory #ArtificialIntelligence #FutureOfGNNs #EmergingResearch #EthicalAI #GNNBestPractices #AdvancedAI #50MinuteRead
Ultimate Guide to Graph Neural Networks (GNNs), Part 7: Advanced Implementation, Multimodal Integration, and Scientific Applications
Duration: ~60 minutes reading time | Deep dive into cutting-edge GNN implementations and applications
Read: https://hackmd.io/@husseinsheikho/GNN7
#GraphNeuralNetworks #GNN #MachineLearning #DeepLearning #AI #NeuralNetworks #DataScience #GraphTheory #ArtificialIntelligence #AdvancedGNNs #MultimodalLearning #ScientificAI #GNNImplementation #60MinuteRead
PyTorch Masterclass, Part 1: Foundations of Deep Learning with PyTorch
Duration: ~120 minutes
Link: https://hackmd.io/@husseinsheikho/pytorch-1
https://t.iss.one/DataScienceM
#PyTorch #DeepLearning #MachineLearning #AI #NeuralNetworks #DataScience #Python #Tensors #Autograd #Backpropagation #GradientDescent #AIForBeginners #PyTorchTutorial #MachineLearningEngineer
Best Practice for R :: Cheat Sheet
More: https://github.com/wurli/r-best-practice
#rstats #stats #datascience
Object Tracking with YOLOv8 and Python
Table of Contents:
• Object Tracking with YOLOv8 and Python
• YOLOv8: Reliable Object Detection and Tracking
• Understanding YOLOv8 Architecture
• Mosaic Data Augmentation
• Anchor-Free Detection
• C2f (Coarse-to-Fine) Module
• Decoupled Head
• Loss
• Object Detection and Tracking with YOLOv8
• Object Detection
• Object T...
#AdvancedComputerVision #DataScience #DeepLearning #MachineLearning #ObjectDetection #ObjectTracking #ProgrammingTutorials #Tutorial #VideoObjectTracking #YOLO
Master Machine Learning: Explore the Ultimate "Machine-Learning-Tutorials" Repository
23 Oct 2025
AI News & Trends
In today's data-driven world, Machine Learning (ML) has become the cornerstone of modern technology, from intelligent chatbots to predictive analytics and recommendation systems. However, mastering ML isn't just about coding; it requires a structured understanding of algorithms, statistics, optimization techniques and real-world problem-solving. That's where Ujjwal Karn's Machine-Learning-Tutorials GitHub repository stands out. This open-source, topic-wise ...
#MachineLearning #MLTutorials #ArtificialIntelligence #DataScience #OpenSource #AIEducation
# Real-World Case Study: E-commerce Product Pipeline
import io
from pathlib import PurePosixPath

import boto3
from PIL import Image


def remove_background(img):
    # Hypothetical placeholder: returns the image unchanged.
    # In practice: use rembg or AWS Rekognition
    return img


def process_product_image(s3_bucket, s3_key):
    s3 = boto3.client('s3')
    stem = PurePosixPath(s3_key).stem  # "products/summer_dress.jpg" -> "summer_dress"
    # 1. Download from S3
    response = s3.get_object(Bucket=s3_bucket, Key=s3_key)
    img = Image.open(io.BytesIO(response['Body'].read()))
    # 2. Standardize dimensions (forcing 1200x1200 stretches non-square images)
    img = img.convert("RGB")
    img = img.resize((1200, 1200), Image.LANCZOS)
    # 3. Remove background (simplified); convert back to RGB because JPEG has no alpha channel
    img = remove_background(img).convert("RGB")
    # 4. Generate variants
    variants = {
        "web": img.resize((800, 800), Image.LANCZOS),
        "mobile": img.resize((400, 400), Image.LANCZOS),
        "thumbnail": img.resize((100, 100), Image.LANCZOS),
    }
    # 5. Upload to CDN
    for name, variant in variants.items():
        buffer = io.BytesIO()
        variant.save(buffer, "JPEG", quality=95)
        buffer.seek(0)  # rewind; otherwise upload_fileobj reads from the end and uploads an empty file
        s3.upload_fileobj(
            buffer,
            "cdn-bucket",
            f"products/{stem}_{name}.jpg",
            ExtraArgs={'ContentType': 'image/jpeg', 'CacheControl': 'max-age=31536000'}
        )
    # 6. Generate WebP version
    webp_buffer = io.BytesIO()
    img.save(webp_buffer, "WEBP", quality=85)
    webp_buffer.seek(0)
    s3.upload_fileobj(webp_buffer, "cdn-bucket", f"products/{stem}.webp",
                      ExtraArgs={'ContentType': 'image/webp'})


if __name__ == "__main__":
    process_product_image("user-uploads", "products/summer_dress.jpg")
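Every step of the pipeline talks to S3, which makes the image handling hard to try locally. The sketch below isolates the variant-generation step and works on in-memory bytes. It assumes Pillow is installed, and the helper `make_variants` and the `VARIANT_SIZES` mapping are hypothetical names, not part of the original pipeline. It highlights two details that are easy to miss. First, images with transparency must be converted to RGB before they can be saved as JPEG. Second, a BytesIO buffer must be rewound with seek(0) before anything reads it, including boto3's upload_fileobj.

```python
import io

from PIL import Image

# Hypothetical variant sizes mirroring the web/mobile/thumbnail variants above
VARIANT_SIZES = {"web": (800, 800), "mobile": (400, 400), "thumbnail": (100, 100)}


def make_variants(image_bytes, quality=95):
    """Return a dict mapping variant name -> JPEG-encoded bytes (hypothetical helper)."""
    img = Image.open(io.BytesIO(image_bytes))
    # JPEG has no alpha channel, so RGBA/palette images must be converted first
    img = img.convert("RGB")
    variants = {}
    for name, size in VARIANT_SIZES.items():
        buffer = io.BytesIO()
        img.resize(size, Image.LANCZOS).save(buffer, "JPEG", quality=quality)
        # Rewind before reading; reading from the end of the buffer yields b""
        buffer.seek(0)
        variants[name] = buffer.read()
    return variants


if __name__ == "__main__":
    # A synthetic semi-transparent PNG stands in for an S3 download
    src = io.BytesIO()
    Image.new("RGBA", (1200, 900), (255, 0, 0, 128)).save(src, "PNG")
    for name, data in make_variants(src.getvalue()).items():
        print(name, Image.open(io.BytesIO(data)).size)
```

A fixed resize stretches non-square photos. Swapping it for ImageOps.fit (crop to the exact size) or Image.thumbnail (shrink within a bounding box) would preserve the aspect ratio. The trade-off is cropped edges in the first case and non-square output in the second.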
By: @DataScienceM
#Python #ImageProcessing #ComputerVision #Pillow #OpenCV #MachineLearning #CodingInterview #DataScience #Programming #TechJobs #DeveloperTips #AI #DeepLearning #CloudComputing #Docker #BackendDevelopment #SoftwareEngineering #CareerGrowth #TechTips #Python3
PandasAI: Transforming Data Analysis with Conversational Artificial Intelligence
28 Oct 2025
AI News & Trends
In a world dominated by data, the ability to analyze and interpret information efficiently has become a core competitive advantage. From business intelligence dashboards to large-scale machine learning models, data-driven decision-making fuels innovation across industries. Yet, for most people, data analysis remains a technical challenge requiring coding expertise, statistical knowledge and familiarity with libraries like ...
#PandasAI #ConversationalAI #DataAnalysis #ArtificialIntelligence #DataScience #MachineLearning
Microsoft Data Formulator: Revolutionizing AI-Powered Data Visualization
28 Oct 2025
AI News & Trends
In today's data-driven world, visualization is everything. Whether you're a business analyst, data scientist or researcher, the ability to convert raw data into meaningful visuals can define the success of your decisions. That's where Microsoft's Data Formulator steps in: a cutting-edge, open-source platform designed to empower analysts to create rich, AI-assisted visualizations effortlessly. Developed by ...
#Microsoft #DataVisualization #AI #DataScience #OpenSource #Analytics
Python: Simple K-Means Clustering Project
K-Means is a popular unsupervised machine learning algorithm used to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean (centroid). This simple project demonstrates K-Means on the classic Iris dataset, using scikit-learn to group flowers with similar measurements.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np
# 1. Load the Iris dataset
iris = load_iris()
X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # True labels (0, 1, 2 for different species) - not used by KMeans
# 2. (Optional but recommended) Scale the features
# K-Means is sensitive to the scale of features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# 3. Define and train the K-Means model
# We know there are 3 species in Iris, so we set n_clusters=3
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10) # n_init is important for robust results
kmeans.fit(X_scaled)
# 4. Get the cluster assignments for each data point
labels = kmeans.labels_
# 5. Get the coordinates of the cluster centroids
centroids = kmeans.cluster_centers_
# 6. Visualize the clusters (using first two features for simplicity)
plt.figure(figsize=(8, 6))
# Plot each cluster
colors = ['red', 'green', 'blue']
for i in range(3):
    plt.scatter(X_scaled[labels == i, 0], X_scaled[labels == i, 1],
                s=50, c=colors[i], label=f'Cluster {i+1}', alpha=0.7)
# Plot the centroids
plt.scatter(centroids[:, 0], centroids[:, 1],
s=200, marker='X', c='black', label='Centroids', edgecolor='white')
plt.title('K-Means Clustering on Iris Dataset (Scaled Features)')
plt.xlabel('Scaled Sepal Length')
plt.ylabel('Scaled Sepal Width')
plt.legend()
plt.grid(True)
plt.show()
# You can also compare with true labels (for evaluation, not part of clustering process itself)
# print("True labels:", y)
# print("K-Means labels:", labels)
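KMeans.fit hides the iterative procedure behind the "nearest mean" definition. The following minimal NumPy sketch implements Lloyd's algorithm, the classic loop that alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. scikit-learn's KMeans builds on the same idea but adds k-means++ seeding and several restarts (n_init), which this sketch omits. The function names and the four-point toy dataset are illustrative assumptions.

```python
import numpy as np


def _assign(X, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    # Squared Euclidean distances, shape (n_points, k)
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initialisation (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return _assign(X, centroids), centroids


X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)     # points 0-1 share one label, points 2-3 the other
print(centroids)
```

Because a single random start can land in a poor local optimum on harder data, scikit-learn's n_init=10 in the script above reruns the procedure and keeps the best result.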
Code explanation: This script loads the Iris dataset, scales its features using StandardScaler, and then applies KMeans to group the data into 3 clusters. It visualizes the resulting clusters and their centroids in a scatter plot of the first two scaled features. Because the clustering uses all four features, some clusters may appear to overlap in this two-feature view.
#Python #MachineLearning #KMeans #Clustering #DataScience
━━━━━━━━━━━━━━━
By: @DataScienceM
MLOps Basics: A Complete Guide to Building, Deploying and Monitoring Machine Learning Models
30 Oct 2025
AI News & Trends
Machine Learning models are powerful, but building them is only half the story. The true challenge lies in deploying, scaling and maintaining these models in production environments, a process that requires collaboration between data scientists, developers and operations teams. This is where MLOps (Machine Learning Operations) comes in. MLOps combines the principles of DevOps ...
#MLOps #MachineLearning #DevOps #ModelDeployment #DataScience #ProductionAI
Pandas Cheatsheet
A quick guide to essential Pandas operations for data manipulation, focusing on creating, selecting, filtering, and grouping data in a DataFrame.
1. Creating a DataFrame
The primary data structure in Pandas is the DataFrame. It's often created from a dictionary.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 32, 28],
'City': ['New York', 'Paris', 'New York']}
df = pd.DataFrame(data)
print(df)
#       Name  Age      City
# 0    Alice   25  New York
# 1      Bob   32     Paris
# 2  Charlie   28  New York
• A dictionary is defined where keys become column names and values become the data in those columns; pd.DataFrame() converts it into a tabular structure.
2. Selecting Data with .loc and .iloc
Use .loc for label-based selection and .iloc for integer-position-based selection.
# Select the first row by its integer position (0)
print(df.iloc[0])
# Select the row with index label 1 and only the 'Name' column
print(df.loc[1, 'Name'])
# Output for df.iloc[0]:
# Name Alice
# Age 25
# City New York
# Name: 0, dtype: object
#
# Output for df.loc[1, 'Name']:
# Bob
• .iloc[0] gets all data from the row at index position 0.
• .loc[1, 'Name'] gets the data at the intersection of index label 1 and column label 'Name'.
3. Filtering Data
Select subsets of data based on conditions.
# Select rows where Age is greater than 27
filtered_df = df[df['Age'] > 27]
print(filtered_df)
# Name Age City
# 1 Bob 32 Paris
# 2 Charlie 28 New York
• The expression df['Age'] > 27 creates a boolean Series (True/False).
• Using this Series as an index, df[...] returns only the rows where the value was True.
4. Grouping and Aggregating
The "group by" operation involves splitting data into groups, applying a function, and combining the results.
# Group by 'City' and calculate the mean age for each city
city_ages = df.groupby('City')['Age'].mean()
print(city_ages)
# City
# New York 26.5
# Paris 32.0
# Name: Age, dtype: float64
• .groupby('City') splits the DataFrame into groups based on unique city values.
• ['Age'].mean() then calculates the mean of the 'Age' column for each of these groups.
#Python #Pandas #DataAnalysis #DataScience #Programming
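Writing the split-apply-combine steps out by hand shows what groupby automates. This sketch is for illustration only. It rebuilds the same example df so it runs on its own, and on real data groupby is both shorter and much faster.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 32, 28],
                   'City': ['New York', 'Paris', 'New York']})

# Split: one sub-DataFrame per unique city
groups = {city: df[df['City'] == city] for city in df['City'].unique()}
# Apply: compute the mean age inside each group
means = {city: sub['Age'].mean() for city, sub in groups.items()}
# Combine: gather the per-group results into a Series indexed by city
manual = pd.Series(means, name='Age').sort_index()
manual.index.name = 'City'

print(manual.tolist())  # [26.5, 32.0]
print(manual.tolist() == df.groupby('City')['Age'].mean().tolist())  # True
```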
━━━━━━━━━━━━━━━
By: @DataScienceM
SciPy: Scientific Computing in Python
SciPy is a fundamental library for scientific and technical computing in Python. Built on NumPy, it provides a wide range of user-friendly and efficient numerical routines for tasks like optimization, integration, linear algebra, and statistics.
import numpy as np
from scipy.optimize import minimize
# Define a function to minimize: f(x) = (x - 3)^2
def f(x):
    return (x - 3)**2
# Find the minimum of the function with an initial guess
res = minimize(f, x0=0)
print(f"Minimum found at x = {res.x[0]:.4f}")
# Output:
# Minimum found at x = 3.0000
• Optimization: scipy.optimize.minimize searches for a (local) minimum of a function, starting from an initial guess.
• We provide the function (f) and an initial guess (x0=0).
• The result object (res) stores the location of the minimum in its .x attribute (an array, hence res.x[0]).
from scipy.integrate import quad
# Define the function to integrate: f(x) = sin(x)
def integrand(x):
    return np.sin(x)
# Integrate sin(x) from 0 to pi
result, error = quad(integrand, 0, np.pi)
print(f"Integral result: {result:.4f}")
print(f"Estimated error: {error:.2e}")
# Output:
# Integral result: 2.0000
# Estimated error: 2.22e-14
• Numerical Integration: scipy.integrate.quad calculates the definite integral of a function over a given interval.
• It returns a tuple containing the integral result and an estimate of the absolute error.
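quad also accepts infinite limits through np.inf. As a sanity check, the Gaussian integral of exp(-x²) over the whole real line has the known closed form √π, so the numerical result can be compared with it directly. The integrand is an illustrative choice.

```python
import numpy as np
from scipy.integrate import quad

# Integrate exp(-x^2) over the whole real line; the exact value is sqrt(pi)
result, error = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(f"quad:  {result:.6f} (estimated error {error:.1e})")
print(f"exact: {np.sqrt(np.pi):.6f}")
```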
from scipy.linalg import solve
# Solve the linear system Ax = b
# 3x + 2y = 12
# x - y = 1
A = np.array([[3, 2], [1, -1]])
b = np.array([12, 1])
solution = solve(A, b)
print(f"Solution (x, y): {solution}")
# Output:
# Solution (x, y): [2.8 1.8]
• Linear Algebra: scipy.linalg contains all the functions in numpy.linalg plus additional, more advanced routines.
• solve(A, b) efficiently finds the solution vector x of the linear system Ax = b defined by a matrix A and a vector b.
from scipy import stats
# Create two independent samples
sample1 = np.random.normal(loc=5, scale=2, size=100)
sample2 = np.random.normal(loc=5.5, scale=2, size=100)
# Perform an independent t-test
t_stat, p_value = stats.ttest_ind(sample1, sample2)
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
# Output (will vary):
# T-statistic: -1.7432
# P-value: 0.0829
• Statistics: scipy.stats provides probability distributions, summary statistics, and hypothesis tests.
• ttest_ind calculates the t-test for the means of two independent samples.
• The p-value is the probability of observing a difference at least as extreme as this one if the two population means were actually equal. A p-value below a chosen significance level (commonly 0.05) is taken as evidence that the means differ.
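Reproducing the statistic by hand makes the p-value less of a black box. By default (equal_var=True), ttest_ind runs Student's t-test with a pooled variance: t = (mean1 − mean2) / sqrt(s_p² · (1/n1 + 1/n2)), with n1 + n2 − 2 degrees of freedom. The seeded generator is an assumption added so the numbers are repeatable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the numbers are repeatable
a = rng.normal(loc=5, scale=2, size=100)
b = rng.normal(loc=5.5, scale=2, size=100)

n1, n2 = len(a), len(b)
dof = n1 + n2 - 2
# Pooled variance: weighted average of the two unbiased (ddof=1) sample variances
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
# Two-sided p-value: chance of a |t| at least this large if the means were equal
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

t_scipy, p_scipy = stats.ttest_ind(a, b)  # default equal_var=True: pooled-variance t-test
print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
```

Passing equal_var=False switches ttest_ind to Welch's t-test, which drops the pooled-variance assumption and would no longer match this manual formula.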
#SciPy #Python #DataScience #ScientificComputing #Statistics
━━━━━━━━━━━━━━━
By: @DataScienceM
#Pandas #DataAnalysis #Python #DataScience #Tutorial
Top 30 Pandas Functions & Methods
This lesson covers 30 essential Pandas functions for data manipulation and analysis, each with a standalone example and its output.
---
1. pd.DataFrame()
Creates a new DataFrame (a 2D labeled data structure) from various inputs like dictionaries or lists.
import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)
print(df)
col1 col2
0 1 3
1 2 4
---
2. pd.Series()
Creates a new Series (a 1D labeled array).
import pandas as pd
s = pd.Series([10, 20, 30, 40], name='MyNumbers')
print(s)
0 10
1 20
2 30
3 40
Name: MyNumbers, dtype: int64
---
3. pd.read_csv()
Reads data from a CSV file into a DataFrame (the example first creates a small data.csv so it runs on its own).
import pandas as pd
# Create a dummy csv file first
with open('data.csv', 'w') as f:
    f.write('Name,Age\nAlice,25\nBob,30')
df = pd.read_csv('data.csv')
print(df)
Name Age
0 Alice 25
1 Bob 30
---
4. df.to_csv()
Writes a DataFrame to a CSV file.
import pandas as pd
df = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
# index=False prevents writing the DataFrame index to the file
df.to_csv('output.csv', index=False)
# You can check that 'output.csv' has been created.
print("File 'output.csv' created.")
File 'output.csv' created.
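Reading the files back shows why index=False is usually wanted. By default, to_csv writes the index as an extra, unnamed first column, which read_csv then loads as a column called 'Unnamed: 0'. The two file names are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})

df.to_csv('with_index.csv')                  # index written as an unnamed first column
df.to_csv('without_index.csv', index=False)  # index omitted

print(pd.read_csv('with_index.csv').columns.tolist())
# ['Unnamed: 0', 'Name', 'Age']
print(pd.read_csv('without_index.csv').columns.tolist())
# ['Name', 'Age']
```

If a file was already written with its index, pd.read_csv(..., index_col=0) turns that first column back into the index instead.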
#PandasIO #DataFrame #Series
---
5. df.head()
Returns the first n rows of the DataFrame (default is 5).
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.head(3))
Name Value
0 A 1
1 B 2
2 C 3
---
6. df.tail()
Returns the last n rows of the DataFrame (default is 5).
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.tail(2))
Name Value
4 E 5
5 F 6
---
7. df.info()
Provides a concise summary of the DataFrame, including data types and non-null values.
import pandas as pd
import numpy as np
data = {'col1': [1, 2, 3], 'col2': [4.0, 5.0, np.nan], 'col3': ['A', 'B', 'C']}
df = pd.DataFrame(data)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null int64
1 col2 2 non-null float64
2 col3 3 non-null object
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
---
8. df.shape
An attribute (not a method, so no parentheses) holding a tuple with the dimensionality (rows, columns) of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)
(2, 3)
#DataInspection #PandasBasics
---
9. df.describe()
Generates descriptive statistics for numerical columns (count, mean, std, min, max, etc.).
import pandas as pd
df = pd.DataFrame({'Age': [22, 38, 26, 35, 29]})
print(df.describe())
Top 100 Data Analyst Interview Questions & Answers
#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience
Part 1: SQL Questions (Q1-30)
#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
• DELETE is a DML command that removes rows from a table, optionally filtered by a WHERE clause. It is slower because it logs each row deletion, and it can be rolled back.
• TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster and usually resets identity/auto-increment counters (PostgreSQL only does so with RESTART IDENTITY). Whether it can be rolled back depends on the database: MySQL and Oracle commit it implicitly, while PostgreSQL and SQL Server allow rolling it back inside a transaction.
• DROP is a DDL command that removes the entire table, including its structure, data, and indexes.
#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.
SELECT DISTINCT department
FROM employees;
#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT (LIMIT works in MySQL, PostgreSQL, and SQLite; SQL Server uses SELECT TOP 5 instead).
SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
#4. What is the difference between WHERE and HAVING?
A:
• WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
• HAVING is used to filter groups after aggregations (GROUP BY) have been performed.
-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;
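Both filters can appear in the same query, which makes the order of evaluation visible. The following minimal sketch uses Python's built-in sqlite3 module and a made-up employees table; the rows and the 5000 salary threshold are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "Sales", 4000), (2, "Sales", 6000), (3, "Sales", 7000),
    (4, "IT", 8000), (5, "IT", 3000),
])

query = """
SELECT department, COUNT(employee_id) AS n
FROM employees
WHERE salary >= 5000            -- row filter, applied before grouping
GROUP BY department
HAVING COUNT(employee_id) > 1   -- group filter, applied after aggregation
"""
print(conn.execute(query).fetchall())  # [('Sales', 2)]
```

IT has two employees in total, but only one passes the WHERE filter. Its group therefore has a count of 1 and is removed by HAVING.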
#5. What are the different types of SQL joins?
A:
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table (NULL where there is no match).
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table (NULL where there is no match).
• FULL (OUTER) JOIN: Returns all records from both tables, matching rows where possible and filling in NULL where there is no match.
• SELF JOIN: A regular join, but the table is joined with itself.
#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery. In Method 1, DISTINCT ensures that a salary shared by several top earners is only counted once.
-- Method 1: Using OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
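Ties are the classic trap in this question. If two employees share the top salary, a plain ORDER BY ... LIMIT 1 OFFSET 1 returns the top salary again. Adding DISTINCT, or using the subquery, avoids this. The sketch below uses Python's built-in sqlite3 module with made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
# Made-up data in which two employees tie for the highest salary
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ann", 9000), ("Ben", 9000), ("Cid", 7000), ("Dee", 5000)])


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


plain = scalar("SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1")
distinct = scalar("SELECT DISTINCT salary FROM employees "
                  "ORDER BY salary DESC LIMIT 1 OFFSET 1")
subquery = scalar("SELECT MAX(salary) FROM employees "
                  "WHERE salary < (SELECT MAX(salary) FROM employees)")
print(plain, distinct, subquery)  # 9000 7000 7000
```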
#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.
SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;
#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key (or another unique key) in another table.
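Both constraints can be seen in action with Python's built-in sqlite3 module; the departments/employees schema below is a hypothetical example. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. Databases such as PostgreSQL enforce them without that step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(dept_id)  -- foreign key
    )""")
conn.execute("INSERT INTO departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees VALUES (10, 'Ann', 1)")  # valid: department 1 exists

rejected = []
for sql in ("INSERT INTO departments VALUES (1, 'Ops')",       # duplicate primary key
            "INSERT INTO employees VALUES (11, 'Ben', 99)"):   # department 99 does not exist
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

print(rejected)
```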
#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.
-- Rank employees by salary within each department
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
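The sketch below runs a GROUP BY aggregate and the RANK() window query side by side in SQLite, to show the difference between collapsing and non-collapsing calculations. Window functions need SQLite 3.25 or newer, which recent Python builds include. The sample rows are made up. The tie in the IT department also shows that RANK() gives tied rows the same rank and then skips the next one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Ann", "Sales", 7000), ("Ben", "Sales", 9000),
    ("Cid", "IT", 8000), ("Dee", "IT", 8000), ("Eve", "IT", 6000),
])

# Aggregate: one output row per department
grouped = conn.execute(
    "SELECT department, MAX(salary) FROM employees GROUP BY department").fetchall()
# Window function: every input row is kept, with its rank attached
ranked = conn.execute("""
    SELECT name, department, salary,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
    FROM employees
    ORDER BY department, dept_rank, name""").fetchall()

print(len(grouped), len(ranked))  # 2 5
for row in ranked:
    print(row)
# ('Cid', 'IT', 8000, 1)
# ('Dee', 'IT', 8000, 1)
# ('Eve', 'IT', 6000, 3)
# ('Ben', 'Sales', 9000, 1)
# ('Ann', 'Sales', 7000, 2)
```

Replacing RANK() with DENSE_RANK() would give Eve rank 2 instead of 3, because DENSE_RANK does not leave gaps after ties.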
#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set, introduced with the WITH keyword, that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It improves readability and helps break complex queries down into named steps.
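The following sketch uses Python's sqlite3 module and a made-up employees table. It names the per-department average salary in a CTE, then joins that result back to the base table to find employees who earn more than their department's average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Ann", "Sales", 7000), ("Ben", "Sales", 9000),
    ("Cid", "IT", 8000), ("Dee", "IT", 6000),
])

# The CTE (dept_avg) names an intermediate result that the main query reuses
query = """
WITH dept_avg AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT e.name, e.department, e.salary
FROM employees AS e
JOIN dept_avg AS d ON e.department = d.department
WHERE e.salary > d.avg_salary
ORDER BY e.name
"""
print(conn.execute(query).fetchall())
# [('Ben', 'Sales', 9000), ('Cid', 'IT', 8000)]
```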