Python Data Science Jobs & Interviews
Your go-to hub for Python and Data Science—featuring questions, answers, quizzes, and interview tips to sharpen your skills and boost your career in the data-driven world.

Admin: @Hussein_Sheikho
NUMPY FOR DS.pdf
4.5 MB
Let's start at the top...

NumPy contains a broad array of functionality for fast numerical and mathematical operations in Python.

The core data structure in #NumPy is the ndarray (n-dimensional array).

Behind the scenes, much of NumPy's functionality is written in C.

NumPy functionality is used in other popular #Python packages including #Pandas, #Matplotlib, & #scikitlearn!
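
A minimal sketch of the points above, assuming NumPy is installed. The array values are made up for illustration and are not taken from the PDF.

```python
import numpy as np

# Illustrative values only (an assumption, not from the source post)
prices = np.array([1200, 25, 80, 300, 100])
quantities = np.array([10, 50, 30, 20, 40])

# Element-wise multiplication runs in compiled code, with no Python-level loop
totals = prices * quantities

print(type(prices).__name__)      # name of the core array type
print(prices.ndim, prices.shape)  # number of dimensions and shape
print(totals)

# A 2-D ndarray: six numbers reshaped into 2 rows x 3 columns
grid = np.arange(6).reshape(2, 3)
column_sums = grid.sum(axis=0)    # axis=0 sums down each column
print(column_sums)
```

The same expression works unchanged for one-dimensional and multi-dimensional arrays, which is a large part of why other packages build on the ndarray.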

✉️ Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk
Question 4 (Intermediate):
When working with Pandas in Python, what does the inplace=True parameter do in DataFrame operations?

A) Creates a copy of the DataFrame before applying changes
B) Modifies the original DataFrame directly
C) Saves the results to a CSV file automatically
D) Enables parallel processing for faster execution

#Python #Pandas #DataAnalysis #DataManipulation
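
A small sketch for checking the behaviour yourself, assuming pandas is installed. The DataFrame below is hypothetical; sort_values() is used only because it is one of the methods that accepts inplace.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"Score": [92, 78, 85]})

# Default (inplace=False): a new, sorted DataFrame is returned; df is unchanged
sorted_copy = df.sort_values("Score")
print(df["Score"].tolist())           # still in the original order
print(sorted_copy["Score"].tolist())  # sorted order

# inplace=True: df itself is modified and the call returns None
result = df.sort_values("Score", inplace=True)
print(result)
print(df["Score"].tolist())
```

Because the in-place call returns None, writing df = df.sort_values("Score", inplace=True) would replace df with None, which is a common slip.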
🚀 Comprehensive Guide: How to Prepare for a Data Analyst Python Interview – 350 Most Common Interview Questions

Are you ready? https://hackmd.io/@husseinsheikho/pandas-interview

#DataAnalysis #PythonInterview #DataAnalyst #Pandas #NumPy #Matplotlib #Seaborn #SQL #DataCleaning #Visualization #MachineLearning #Statistics #InterviewPrep


📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
1. What is the primary data structure in pandas?
2. How do you create a DataFrame from a dictionary?
3. Which method is used to read a CSV file in pandas?
4. What does the head() function do in pandas?
5. How can you check the data types of columns in a DataFrame?
6. Which function drops rows with missing values in pandas?
7. What is the purpose of the merge() function in pandas?
8. How do you filter rows based on a condition in pandas?
9. What does the groupby() method do?
10. How can you sort a DataFrame by a specific column?
11. Which method is used to rename columns in pandas?
12. What is the difference between loc and iloc in pandas?
13. How do you handle duplicate rows in pandas?
14. What function converts a column to datetime format?
15. How do you apply a custom function to a DataFrame?
16. What is the use of the apply() method in pandas?
17. How can you concatenate two DataFrames?
18. What does the pivot_table() function do?
19. How do you calculate summary statistics in pandas?
20. Which method is used to export a DataFrame to a CSV file?
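
A compact sketch touching several of the questions above, assuming pandas is installed. The orders and regions tables are hypothetical, and this is one possible set of answers rather than the only one.

```python
import pandas as pd

# Hypothetical sample data (Q2: DataFrame from a dictionary)
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer": ["ann", "bob", "bob", "cy"],
    "date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-02-01"],
    "amount": [50.0, 20.0, 20.0, None],
})

print(orders.head(2))  # Q4: first rows
print(orders.dtypes)   # Q5: data type of each column

clean = orders.drop_duplicates()                          # Q13: remove duplicate rows
clean = clean.dropna(subset=["amount"])                   # Q6: drop rows with a missing amount
clean = clean.assign(date=pd.to_datetime(clean["date"]))  # Q14: convert to datetime

# Q12: loc selects by label, iloc by integer position
first_by_label = clean.loc[0, "customer"]
first_by_position = clean.iloc[0, 1]

# Q8 and Q10: filter on a condition, then sort by a column
large = clean[clean["amount"] > 30].sort_values("amount", ascending=False)

# Q9: group rows and aggregate
per_customer = clean.groupby("customer")["amount"].sum()

# Q7: merge with a second (hypothetical) table on a shared key
regions = pd.DataFrame({"customer": ["ann", "bob"], "region": ["north", "south"]})
merged = clean.merge(regions, on="customer", how="left")

print(clean.describe())  # Q19: summary statistics
print(merged)
```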

#️⃣ #pandas #dataanalysis #python #dataframe #coding #programming #datascience

By: t.iss.one/DataScienceQ 🚀
#pandas #python #programming #question #dataframe #intermediate

Write a Python program using pandas to perform the following tasks:

1. Create a DataFrame from a dictionary with columns: 'Product', 'Category', 'Price', and 'Quantity' containing:
- Product: ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones']
- Category: ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Accessories']
- Price: [1200, 25, 80, 300, 100]
- Quantity: [10, 50, 30, 20, 40]

2. Add a new column 'Total_Value' that is the product of 'Price' and 'Quantity'.

3. Calculate the total value for each category and print it.

4. Find the product with the highest total value and print its details.

5. Filter the DataFrame to show only products in the 'Electronics' category with a price greater than 200.

import pandas as pd

# 1. Create the DataFrame
data = {
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones'],
    'Category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Accessories'],
    'Price': [1200, 25, 80, 300, 100],
    'Quantity': [10, 50, 30, 20, 40]
}
df = pd.DataFrame(data)

# 2. Add Total_Value column
df['Total_Value'] = df['Price'] * df['Quantity']

# 3. Calculate total value by category
total_by_category = df.groupby('Category')['Total_Value'].sum()

# 4. Find product with highest total value
highest_value_product = df.loc[df['Total_Value'].idxmax()]

# 5. Filter electronics with price > 200
electronics_high_price = df[(df['Category'] == 'Electronics') & (df['Price'] > 200)]

# Print results
print("Original DataFrame:")
print(df)
print("\nTotal Value by Category:")
print(total_by_category)
print("\nProduct with Highest Total Value:")
print(highest_value_product)
print("\nElectronics Products with Price > 200:")
print(electronics_high_price)

Output:
Original DataFrame:
      Product     Category  Price  Quantity  Total_Value
0      Laptop  Electronics   1200        10        12000
1       Mouse  Accessories     25        50         1250
2    Keyboard  Accessories     80        30         2400
3     Monitor  Electronics    300        20         6000
4  Headphones  Accessories    100        40         4000

Total Value by Category:
Category
Accessories     7650
Electronics    18000
Name: Total_Value, dtype: int64

Product with Highest Total Value:
Product             Laptop
Category       Electronics
Price                 1200
Quantity                10
Total_Value          12000
Name: 0, dtype: object

Electronics Products with Price > 200:
   Product     Category  Price  Quantity  Total_Value
0   Laptop  Electronics   1200        10        12000
3  Monitor  Electronics    300        20         6000

By: @DataScienceQ 🚀
How can you use Seaborn to create a heatmap that visualizes the correlation matrix of a dataset, and what are the key steps involved in preprocessing the data and customizing the plot for better readability? Provide a detailed code example with explanations at an intermediate level, including handling missing values, selecting relevant columns, and adjusting the color palette and annotations.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: Load a sample dataset (e.g., tips from seaborn's built-in datasets)
df = sns.load_dataset('tips')

# Step 2: Select only numeric columns for correlation analysis
numeric_df = df.select_dtypes(include=[np.number])

# Step 3: Handle missing values (if any)
numeric_df = numeric_df.dropna()

# Step 4: Compute the correlation matrix
correlation_matrix = numeric_df.corr()

# Step 5: Create a heatmap using Seaborn
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, linewidths=.5, fmt='.2f')
plt.title('Correlation Heatmap of Numeric Features in Tips Dataset')
plt.tight_layout()
plt.show()


Explanation:
- Step 1: We load a built-in dataset from Seaborn to work with.
- Step 2: Only numeric columns are selected because correlation is computed between numerical variables.
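- Step 3: Rows with missing values are removed so that every coefficient is computed from the same set of rows. (corr() does not raise an error on NaN; by default it skips missing values pair by pair.)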
- Step 4: The corr() method computes pairwise correlations between columns.
- Step 5: sns.heatmap() creates a visual representation where colors represent correlation strength, annot=True adds the actual correlation coefficients, cmap='coolwarm' uses a diverging color scheme, and fmt='.2f' formats numbers to two decimal places.
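
On Step 3, a small sketch comparing corr() with and without dropna(), assuming pandas and NumPy are installed. The three-column DataFrame is made up for illustration and is not the tips dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column 'b'
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, np.nan, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Count missing values per column before deciding how to handle them
print(df.isna().sum())

# corr() skips NaN pair by pair: 'a' vs 'c' still uses all five rows
pairwise = df.corr()

# dropna() first removes the whole row with the NaN, so every pair uses four rows
listwise = df.dropna().corr()

print(pairwise.round(2))
print(listwise.round(2))
```

Either matrix can be passed to sns.heatmap(). The two differ for the 'a' and 'c' pair because that pair is computed from five rows in one case and four in the other.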

#Seaborn #DataVisualization #Heatmap #Python #Pandas #CorrelationMatrix #IntermediateProgramming

By: @DataScienceQ 🚀
Pandas Python Tip: Custom Column Operations with apply()! 🚀

The apply() method applies a function along an axis of a DataFrame (rows or columns) or to each element of a Series, which makes it useful for custom transformations.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Score': [85, 92, 78]}
df = pd.DataFrame(data)

# Example: create a new column 'Grade' based on 'Score'

def assign_grade(score):
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    else:
        return 'C'

# Series.apply() calls assign_grade once per value in the 'Score' column
df['Grade'] = df['Score'].apply(assign_grade)
print(df)

# You can also use lambda functions for simpler operations

df['Score_Double'] = df['Score'].apply(lambda x: x * 2)
print(df)

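Key Takeaway: apply() on a Series, as used on df['Score'] above, is well suited to element-wise custom logic that is awkward to express with vectorized operations. It calls the Python function once per element, so for simple arithmetic such as doubling a column, a vectorized expression like df['Score'] * 2 is usually faster.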
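
For comparison, a vectorized sketch of the same two columns, assuming NumPy is installed alongside pandas. Using np.select here is a substitution for the apply() approach, not part of the original tip.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Score": [85, 92, 78]})

# Conditions are checked in order; the first one that matches wins
conditions = [df["Score"] >= 90, df["Score"] >= 80]
grades = ["A", "B"]
df["Grade"] = np.select(conditions, grades, default="C")

# Arithmetic on a column is already vectorized, so apply() is not needed
df["Score_Double"] = df["Score"] * 2
print(df)
```

On three rows the difference is negligible. The vectorized form tends to pay off on large DataFrames, while apply() stays the more readable choice when the logic does not reduce to a few conditions.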

#Pandas #Python #DataScience #DataManipulation #PythonTips
---
By: @DataScienceQ