Python Data Science Jobs & Interviews
Your go-to hub for Python and Data Science—featuring questions, answers, quizzes, and interview tips to sharpen your skills and boost your career in the data-driven world.

Admin: @Hussein_Sheikho
Question 4 (Intermediate):
When working with Pandas in Python, what does the inplace=True parameter do in DataFrame operations?

A) Creates a copy of the DataFrame before applying changes
B) Modifies the original DataFrame directly
C) Saves the results to a CSV file automatically
D) Enables parallel processing for faster execution

#Python #Pandas #DataAnalysis #DataManipulation
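
A quick illustration of the parameter the question asks about, using a small made-up DataFrame:

import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# Without inplace: a new, sorted DataFrame is returned; df is untouched
sorted_copy = df.sort_values("a")

# With inplace=True: df itself is modified and the method returns None
df.sort_values("a", inplace=True)
print(df)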
🚀 Comprehensive Guide: How to Prepare for a Data Analyst Python Interview – 350 Most Common Interview Questions

Are you ready? https://hackmd.io/@husseinsheikho/pandas-interview

#DataAnalysis #PythonInterview #DataAnalyst #Pandas #NumPy #Matplotlib #Seaborn #SQL #DataCleaning #Visualization #MachineLearning #Statistics #InterviewPrep


✉️ Our Telegram channels: https://t.iss.one/addlist/0f6vfFbEMdAwODBk

📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A
1. What is the primary data structure in pandas?
2. How do you create a DataFrame from a dictionary?
3. Which method is used to read a CSV file in pandas?
4. What does the head() function do in pandas?
5. How can you check the data types of columns in a DataFrame?
6. Which function drops rows with missing values in pandas?
7. What is the purpose of the merge() function in pandas?
8. How do you filter rows based on a condition in pandas?
9. What does the groupby() method do?
10. How can you sort a DataFrame by a specific column?
11. Which method is used to rename columns in pandas?
12. What is the difference between loc and iloc in pandas?
13. How do you handle duplicate rows in pandas?
14. What function converts a column to datetime format?
15. How do you apply a custom function to a DataFrame?
16. What is the use of the apply() method in pandas?
17. How can you concatenate two DataFrames?
18. What does the pivot_table() function do?
19. How do you calculate summary statistics in pandas?
20. Which method is used to export a DataFrame to a CSV file?
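
A minimal sketch touching a few of the questions above (filtering, groupby, sorting, loc vs iloc); the DataFrame here is invented for illustration:

import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen"],
    "sales": [100, 150, 90],
})

print(df[df["sales"] > 95])                      # filter rows by a condition
print(df.groupby("city")["sales"].sum())         # aggregate per group
print(df.sort_values("sales", ascending=False))  # sort by a column
print(df.loc[0, "city"], df.iloc[0, 0])          # label-based vs position-based access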

#️⃣ #pandas #dataanalysis #python #dataframe #coding #programming #datascience

By: t.iss.one/DataScienceQ 🚀
1. What is the output of the following code?
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = a.T
b[0, 0] = 99
print(a)

2. Which of the following functions is used to create an array with values spaced at regular intervals?
A) np.linspace()
B) np.arange()
C) np.logspace()
D) All of the above

3. Write a function that takes a 1D NumPy array and returns a new array where each element is squared, but only if it’s greater than 5.
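
One possible sketch for question 3 (the prompt is ambiguous about values that are not greater than 5; this version keeps them unchanged):

import numpy as np

def square_if_greater_than_5(arr):
    # Square only the elements greater than 5; leave the rest as-is
    return np.where(arr > 5, arr ** 2, arr)

print(square_if_greater_than_5(np.array([2, 6, 4, 9])))  # [ 2 36  4 81]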

4. What will be printed by this code?
import numpy as np
x = np.array([1, 2, 3])
y = x.copy()
y[0] = 5
print(x[0])

5. Explain the difference between np.meshgrid() and np.mgrid in generating coordinate matrices.

6. How would you efficiently compute the outer product of two vectors using NumPy?

7. What is the result of np.sum(np.eye(3), axis=1)?

8. Write a program to generate a 5x5 matrix filled with random integers from 1 to 100, then find the maximum value in each row.
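
A short sketch for question 8, using NumPy's Generator API:

import numpy as np

rng = np.random.default_rng()
matrix = rng.integers(1, 101, size=(5, 5))  # random integers from 1 to 100 inclusive
row_max = matrix.max(axis=1)                # maximum value in each row
print(matrix)
print(row_max)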

9. What happens when you use np.resize() on an array with shape (3,) to resize it to (5,)?

10. Which method can be used to flatten a multi-dimensional array into a 1D array without copying data?

11. What is the output of this code?
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
result = arr[[0, 1], [1, 2]]
print(result)

12. Describe how np.take() works and provide an example using a 2D array.

13. Write a function that calculates the Euclidean distance between all pairs of points in a 2D array of coordinates.
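
One broadcasting-based sketch for question 13, assuming points is an (n, 2) array of coordinates:

import numpy as np

def pairwise_distances(points):
    # points: shape (n, 2) -> result: (n, n) matrix of Euclidean distances
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(pairwise_distances(pts))  # off-diagonal entries: 5.0 and 10.0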

14. What is the purpose of np.frombuffer() and when might it be useful?

15. How do you perform matrix multiplication using np.matmul() and @ operator? Are they always equivalent?

16. Write a program to filter out all elements in a 2D array that are outside the range [10, 90].
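
A minimal sketch for question 16 (note that boolean masking a 2D array returns a flattened 1D result):

import numpy as np

arr = np.array([[5, 20, 95], [50, 8, 60]])
print(arr[(arr >= 10) & (arr <= 90)])  # keeps only values inside [10, 90]: [20 50 60]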

17. What does np.nan_to_num() do and why is it important in numerical computations?

18. How can you efficiently transpose a large 3D array of shape (100, 100, 100) using np.transpose() or swapaxes()?

19. Explain the concept of "views" vs "copies" in NumPy and give an example where a view leads to unexpected behavior.

20. Write a function that computes the covariance matrix of a dataset represented as a 2D NumPy array.
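
A sketch for question 20, assuming rows are observations and columns are variables; np.cov with rowvar=False is used as a cross-check:

import numpy as np

def covariance_matrix(data):
    # data: shape (n_samples, n_features)
    centered = data - data.mean(axis=0)
    return centered.T @ centered / (data.shape[0] - 1)

X = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
print(covariance_matrix(X))
print(np.cov(X, rowvar=False))  # should match the manual computation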

#NumPy #AdvancedPython #DataScience #InterviewPrep #PythonLibrary #ScientificComputing #MachineLearning #CodingChallenge #HighLevelNumPy #PythonDeveloper #TechnicalInterview #DataAnalysis

By: @DataScienceQ 🚀
In Python, NumPy is the cornerstone of scientific computing, offering high-performance multidimensional arrays and tools for working with them—critical for data science interviews and real-world applications! 📊

import numpy as np

# Array Creation - The foundation of NumPy
arr = np.array([1, 2, 3])
zeros = np.zeros((2, 3)) # 2x3 matrix of zeros
ones = np.ones((2, 2), dtype=int) # Integer matrix
arange = np.arange(0, 10, 2) # [0 2 4 6 8]
linspace = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
print(linspace)


# Array Attributes - Master your data's structure
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape) # Output: (2, 3)
print(matrix.ndim) # Output: 2
print(matrix.dtype) # Output: int64
print(matrix.size) # Output: 6


# Indexing & Slicing - Precision data access
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(data[1, 2]) # Output: 6 (row 1, col 2)
print(data[0:2, 1:3]) # Output: [[2 3], [5 6]]
print(data[:, -1]) # Output: [3 6 9] (last column)


# Reshaping Arrays - Transform dimensions effortlessly
flat = np.arange(6)
reshaped = flat.reshape(2, 3)
raveled = reshaped.ravel()
print(reshaped)
# Output: [[0 1 2], [3 4 5]]
print(raveled) # Output: [0 1 2 3 4 5]


# Stacking Arrays - Combine datasets vertically/horizontally
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.vstack((a, b))) # Vertical stack
# Output: [[1 2 3], [4 5 6]]
print(np.hstack((a, b))) # Horizontal stack
# Output: [1 2 3 4 5 6]


# Mathematical Operations - Vectorized calculations
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
print(x + y) # Output: [5 7 9]
print(x * 2) # Output: [2 4 6]
print(np.dot(x, y)) # Output: 32 (1*4 + 2*5 + 3*6)


# Broadcasting Magic - Operate on mismatched shapes
matrix = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
print(matrix + scalar)
# Output: [[11 12 13], [14 15 16]]


# Aggregation Functions - Statistical power in one line
values = np.array([1, 5, 3, 9, 7])
print(np.sum(values)) # Output: 25
print(np.mean(values)) # Output: 5.0
print(np.max(values)) # Output: 9
print(np.std(values)) # Output: 2.8284271247461903


# Boolean Masking - Filter data like a pro
temperatures = np.array([18, 25, 12, 30, 22])
hot_days = temperatures > 24
print(temperatures[hot_days]) # Output: [25 30]


# Random Number Generation - Simulate real-world data
print(np.random.rand(2, 2)) # Uniform distribution
print(np.random.randn(3)) # Normal distribution
print(np.random.randint(0, 10, (2, 3))) # Random integers


# Linear Algebra Essentials - Solve equations like a physicist
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = np.linalg.solve(A, b)
print(x) # Output: [2. 3.] (Solution to 3x+y=9 and x+2y=8)

# Matrix inverse and determinant
print(np.linalg.inv(A)) # Output: [[ 0.4 -0.2], [-0.2 0.6]]
print(np.linalg.det(A)) # Output: 5.0


# File Operations - Save/load your computational work
data = np.array([[1, 2], [3, 4]])
np.save('array.npy', data)
loaded = np.load('array.npy')
print(np.array_equal(data, loaded)) # Output: True


# Interview Power Move: Vectorization vs Loops
# Vectorized NumPy code is typically much faster than an equivalent Python loop
def square_sum(n):
    arr = np.arange(n)
    return np.sum(arr ** 2)

print(square_sum(5)) # Output: 30 (0²+1²+2²+3²+4²)


# Pro Tip: Memory-efficient data processing
# np.memmap reads slices from disk on demand instead of loading the whole array
# (assumes 'large_data.bin' already exists; this shape is roughly 400 MB of float32 data)
large_array = np.memmap('large_data.bin', dtype='float32', mode='r', shape=(1000000, 100))
print(large_array[0:5, 0:3]) # Process only a small slice


By: @DataScienceQ 🚀

#Python #NumPy #DataScience #CodingInterview #MachineLearning #ScientificComputing #DataAnalysis #Programming #TechJobs #DeveloperTips
🧠 Quiz: What is one of the most critical first steps when starting a new data analysis project?

A) Select the most complex predictive model.
B) Immediately remove all outliers from the dataset.
C) Perform Exploratory Data Analysis (EDA) to understand the data's main characteristics.
D) Normalize all numerical features.

Correct answer: C

Explanation: EDA is crucial because it helps you summarize the data's main features, identify patterns, spot anomalies, and check assumptions before you proceed with more formal modeling. Steps like modeling or removing outliers should be informed by the initial understanding gained from EDA.
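
A minimal pandas-style EDA sketch of the kind described above (the file name is only a placeholder):

import pandas as pd

df = pd.read_csv("dataset.csv")  # placeholder path

print(df.shape)         # number of rows and columns
df.info()               # column dtypes and non-null counts
print(df.describe())    # summary statistics for numeric columns
print(df.isna().sum())  # missing values per column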

#DataAnalysis #DataScience #Statistics

━━━━━━━━━━━━━━━
By: @DataScienceQ
Top 100 SQL Interview Questions & Answers

#SQL #InterviewQuestions #DataAnalysis #Database #SQLQueries

Part 1: Basic Queries & DML/DDL (Q1-20)

#1. Select all columns and rows from a table named products.
A: Use SELECT * to retrieve all columns.

SELECT *
FROM products;

Output: All data from the 'products' table.


#2. Select only the product_name and price columns from the products table.
A: List the desired column names.

SELECT product_name, price
FROM products;

Output: A table with two columns: 'product_name' and 'price'.


#3. Find all products with a price greater than 50.
A: Use the WHERE clause with a comparison operator.

SELECT product_name, price
FROM products
WHERE price > 50;

Output: Products whose price is more than 50.


#4. Find all products that are red or have a price less than 20.
A: Use WHERE with OR to combine conditions.

SELECT product_name, color, price
FROM products
WHERE color = 'Red' OR price < 20;

Output: Products that are red OR cheaper than 20.


#5. Find all products that are red AND have a price greater than 100.
A: Use WHERE with AND to combine conditions.

SELECT product_name, color, price
FROM products
WHERE color = 'Red' AND price > 100;

Output: Products that are red AND more expensive than 100.


#6. Select all unique category values from the products table.
A: Use the DISTINCT keyword.

SELECT DISTINCT category
FROM products;

Output: A list of unique categories (e.g., 'Electronics', 'Books').


#7. Count the total number of products in the products table.
A: Use the COUNT(*) aggregate function.

SELECT COUNT(*) AS total_products
FROM products;

Output: A single number representing the total count.


#8. Find the average price of all products.
A: Use the AVG() aggregate function.

SELECT AVG(price) AS average_price
FROM products;

Output: A single number representing the average price.


#9. Find the highest and lowest price among all products.
A: Use the MAX() and MIN() aggregate functions.

SELECT MAX(price) AS highest_price, MIN(price) AS lowest_price
FROM products;

Output: Two numbers: the maximum and minimum price.


#10. Sort products by price in ascending order.
A: Use ORDER BY with ASC (or omit ASC as it's the default).

SELECT product_name, price
FROM products
ORDER BY price ASC;

Output: Products listed from cheapest to most expensive.


#11. Sort products by price in descending order.
A: Use ORDER BY with DESC.

SELECT product_name, price
FROM products
ORDER BY price DESC;

Output: Products listed from most expensive to cheapest.


#12. Sort products by category (ascending), then by price (descending).
A: Specify multiple columns in ORDER BY.

SELECT product_name, category, price
FROM products
ORDER BY category ASC, price DESC;

Output: Products grouped by category, then sorted by price within each category.