Data Science Projects
Perfect channel for Data Scientists

Learn Python, AI, R, Machine Learning, Data Science and many more

Admin: @love_data
Python Cheat Sheet.pdf
677.7 KB
This cheat sheet covers the basic Python needed for data analysis, excluding pandas, NumPy, and other libraries.
🚀 Excel vs SQL vs Python (Pandas):

1️⃣ Filtering Data
↳ Excel: =FILTER(A2:D100, B2:B100>50) (Excel 365 only)
↳ SQL: SELECT * FROM table WHERE column > 50;
↳ Python: df_filtered = df[df['column'] > 50]

2️⃣ Sorting Data
↳ Excel: Data → Sort (or =SORT(A2:A100, 1, 1) in Excel 365; the last argument is 1 for ascending, -1 for descending)
↳ SQL: SELECT * FROM table ORDER BY column ASC;
↳ Python: df_sorted = df.sort_values(by="column")

3️⃣ Counting Rows
↳ Excel: =COUNTA(A:A) (counts non-empty cells, header included)
↳ SQL: SELECT COUNT(*) FROM table;
↳ Python: row_count = len(df)

4️⃣ Removing Duplicates
↳ Excel: Data → Remove Duplicates
↳ SQL: SELECT DISTINCT * FROM table;
↳ Python: df_unique = df.drop_duplicates()

5️⃣ Joining Tables
↳ Excel: Power Query → Merge Queries (or VLOOKUP/XLOOKUP)
↳ SQL: SELECT * FROM table1 JOIN table2 ON table1.id = table2.id;
↳ Python: df_merged = pd.merge(df1, df2, on="id")

6️⃣ Ranking Data
↳ Excel: =RANK.EQ(A2, $A$2:$A$100)
↳ SQL: SELECT column, RANK() OVER (ORDER BY column DESC) AS rank FROM table;
↳ Python: df["rank"] = df["column"].rank(method="min", ascending=False)

7️⃣ Moving Average Calculation
↳ Excel: =AVERAGE(B2:B4) (fill down for a 3-row rolling window)
↳ SQL: SELECT date, AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM table;
↳ Python: df["moving_avg"] = df["value"].rolling(window=3).mean() (the first two rows are NaN; pass min_periods=1 to match the SQL)

8️⃣ Running Total
↳ Excel: =SUM($B$2:B2) (drag down)
↳ SQL: SELECT date, SUM(value) OVER (ORDER BY date) AS running_total FROM table;
↳ Python: df["running_total"] = df["value"].cumsum()
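
The pandas one-liners above can be tried end to end on a small frame. This is only a minimal sketch: the `sales` and `regions` frames and their values are made up for illustration and do not come from the post.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
sales = pd.DataFrame({
    "id": [1, 2, 3, 3, 4],
    "value": [40, 60, 80, 80, 20],
})
regions = pd.DataFrame({"id": [1, 2, 3], "region": ["N", "S", "E"]})

filtered = sales[sales["value"] > 50]                     # 1. filter rows
sorted_df = sales.sort_values(by="value")                 # 2. sort ascending
row_count = len(sales)                                    # 3. count rows
unique = sales.drop_duplicates().reset_index(drop=True)   # 4. drop exact duplicate rows
merged = pd.merge(unique, regions, on="id")               # 5. inner join: id 4 has no region

# Steps 6-8 work on a copy so the frames above stay untouched.
stats = unique.copy()
stats["rank"] = stats["value"].rank(method="min", ascending=False)
# rolling(window=3) alone leaves the first two rows as NaN; min_periods=1
# mirrors the SQL frame "ROWS BETWEEN 2 PRECEDING AND CURRENT ROW".
stats["moving_avg"] = stats["value"].rolling(window=3, min_periods=1).mean()
stats["running_total"] = stats["value"].cumsum()

print(stats)
```

The moving average and running total depend on row order, so in real data sort by the date column first.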
▎Essential Data Science Concepts Everyone Should Know:

1. Data Types and Structures:

• Categorical: Nominal (unordered, e.g., colors) and Ordinal (ordered, e.g., education levels)

• Numerical: Discrete (countable, e.g., number of children) and Continuous (measurable, e.g., height)

• Data Structures: Arrays, Lists, Dictionaries, DataFrames (for organizing and manipulating data)

2. Descriptive Statistics:

• Measures of Central Tendency: Mean, Median, Mode (describing the typical value)

• Measures of Dispersion: Variance, Standard Deviation, Range (describing the spread of data)

• Visualizations: Histograms, Boxplots, Scatterplots (for understanding data distribution)

3. Probability and Statistics:

• Probability Distributions: Normal, Binomial, Poisson (modeling data patterns)

• Hypothesis Testing: Formulating and testing claims about data (e.g., A/B testing)

• Confidence Intervals: Estimating the range of plausible values for a population parameter

4. Machine Learning:

• Supervised Learning: Regression (predicting continuous values) and Classification (predicting categories)

• Unsupervised Learning: Clustering (grouping similar data points) and Dimensionality Reduction (simplifying data)

• Model Evaluation: Accuracy, Precision, Recall, F1-score (assessing model performance)

5. Data Cleaning and Preprocessing:

• Missing Value Handling: Imputation, Deletion (dealing with incomplete data)

• Outlier Detection and Removal: Identifying and addressing extreme values

• Feature Engineering: Creating new features from existing ones (e.g., combining variables)

6. Data Visualization:

• Types of Charts: Bar charts, Line charts, Pie charts, Heatmaps (for communicating insights visually)

• Principles of Effective Visualization: Clarity, Accuracy, Aesthetics (for conveying information effectively)

7. Ethical Considerations in Data Science:

• Data Privacy and Security: Protecting sensitive information

• Bias and Fairness: Ensuring algorithms are unbiased and fair

8. Programming Languages and Tools:

• Python: Popular for data science with libraries like NumPy, Pandas, Scikit-learn

• R: Statistical programming language with strong visualization capabilities

• SQL: For querying and manipulating data in databases

9. Big Data and Cloud Computing:

• Hadoop and Spark: Frameworks for processing massive datasets

• Cloud Platforms: AWS, Azure, Google Cloud (for storing and analyzing data)

10. Domain Expertise:

• Understanding the Data: Knowing the context and meaning of data is crucial for effective analysis

• Problem Framing: Defining the right questions and objectives for data-driven decision making

Bonus:

• Data Storytelling: Communicating insights and findings in a clear and engaging manner
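
Two of the concepts listed above, descriptive statistics (section 2) and model evaluation metrics (section 4), come down to a few lines of arithmetic. The sketch below uses only the Python standard library; the sample values, the labels, and the helper name `classification_metrics` are invented for illustration.

```python
import statistics

# Hypothetical sample, invented for illustration.
values = [2, 4, 4, 5, 7, 9]

mean = statistics.mean(values)          # central tendency
median = statistics.median(values)
mode = statistics.mode(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)      # sample standard deviation
value_range = max(values) - min(values)


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)

print(mean, median, mode, value_range)
print(accuracy, precision, recall, f1)
```

In practice a library such as Scikit-learn provides these metrics; writing them out once makes the definitions concrete.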

Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624

ENJOY LEARNING 👍👍
🤖 How Artificial Intelligence Works...
When you’re in an interview, it’s super important to know how to talk about your projects in a way that impresses the interviewer. Here are some key points to help you do just that:

➤ Project Overview:
- Start with a quick summary of the project you worked on. What was it all about? What were the main goals? Keep it short and sweet: something you can explain in about 30 seconds.

➤ Problem Statement:
- What problem were you trying to solve with this project? Explain why this problem was important and needed addressing.

➤ Proposed Solution:
- Describe the solution you came up with. How does it work, and why is it a good fix for the problem?

➤ Your Role:
- Talk about what you specifically did. What were your main tasks? Did you face any challenges, and how did you overcome them? Make sure it’s clear whether you were leading the project, a key player, or supporting the team.

➤ Technologies and Tools:
- Mention the tech and tools you used. This shows your technical know-how and your ability to choose the right tools for the job.

➤ Impact and Achievements:
- Share the results of your project. Did it make things better? How? Mention any improvements, efficiencies, or positive feedback you got.

➤ Team Collaboration:
- Talk about how you collaborated. What was your role in the team? How did you communicate and contribute to the team’s success?

➤ Learning and Development:
- Reflect on what you learned from the project. What new skills did you gain, and what would you do differently next time?

➤ Tips for Your Interview Preparation:
- Be ready with a 30-second elevator pitch about your projects, and also have a five-minute detailed overview ready.
- If there’s a pause after you describe the project, don’t hesitate to ask if they’d like more details or if there’s a specific part they’re interested in.

By preparing your project details thoroughly and understanding what the interviewer is looking for, you can talk about your experience in a way that really showcases your skills and increases your chances of getting the job.

Coding Projects: https://whatsapp.com/channel/0029VazkxJ62UPB7OQhBE502
Data Science Cheatsheet 💪
VS Code Shortcuts
📘 SQL Challenges for Data Analytics (With Explanations) 🧠

(Beginner ➡️ Advanced)

1️⃣ Select Specific Columns

SELECT name, email FROM users;

This fetches only the name and email columns from the users table.

✔️ Used when you don’t want all columns from a table.


2️⃣ Filter Records with WHERE

SELECT * FROM users WHERE age > 30;

The WHERE clause filters rows where age is greater than 30.

✔️ Used for applying conditions on data.


3️⃣ ORDER BY Clause

SELECT * FROM users ORDER BY registered_at DESC;

Sorts all users based on registered_at in descending order.

✔️ Helpful to get latest data first.


4️⃣ Aggregate Functions (COUNT, AVG)

SELECT COUNT(*) AS total_users, AVG(age) AS avg_age FROM users;

- COUNT(*) counts total rows (users).
- AVG(age) calculates the average age.

✔️ Used for quick stats from tables.


5️⃣ GROUP BY Usage

SELECT city, COUNT(*) AS user_count FROM users GROUP BY city;

Groups data by city and counts users in each group.

✔️ Use when you want grouped summaries.
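
The first five queries can be tried against a throwaway in-memory SQLite database from Python. This is a rough sketch, not a reference setup: the table layout and the three rows are assumptions made up for the example, and an ORDER BY is added in places only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "age INTEGER, city TEXT, registered_at TEXT)"
)
# Hypothetical rows, invented for illustration.
conn.executemany(
    "INSERT INTO users (name, email, age, city, registered_at) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Asha", "asha@example.com", 34, "Pune", "2024-01-10"),
        ("Ben", "ben@example.com", 27, "Pune", "2024-03-05"),
        ("Chen", "chen@example.com", 41, "Delhi", "2023-11-20"),
    ],
)

# 1. Specific columns
contacts = conn.execute("SELECT name, email FROM users").fetchall()
# 2. WHERE filter
over_30 = conn.execute(
    "SELECT name FROM users WHERE age > 30 ORDER BY name"
).fetchall()
# 3. ORDER BY, newest registration first (ISO dates sort correctly as text)
newest_first = conn.execute(
    "SELECT name FROM users ORDER BY registered_at DESC"
).fetchall()
# 4. Aggregates
total_users, avg_age = conn.execute(
    "SELECT COUNT(*) AS total_users, AVG(age) AS avg_age FROM users"
).fetchone()
# 5. GROUP BY
per_city = conn.execute(
    "SELECT city, COUNT(*) AS user_count FROM users GROUP BY city ORDER BY city"
).fetchall()

print(over_30, newest_first, total_users, avg_age, per_city)
```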


6️⃣ JOIN Tables

SELECT users.name, orders.amount
FROM users
JOIN orders ON users.id = orders.user_id;

Fetches user names along with order amounts by joining users and orders on matching IDs.

✔️ Essential when combining data from multiple tables.
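
One thing worth seeing in practice: a plain JOIN is an inner join, so users without any orders disappear from the result. The sketch below compares it with a LEFT JOIN on an in-memory SQLite database; the rows are hypothetical and only there for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, invented for illustration.
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users (id, name) VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders (user_id, amount) VALUES (1, 250.0), (1, 99.5), (2, 40.0);
""")

# Inner join: only users that have at least one matching order.
inner_rows = conn.execute(
    "SELECT users.name, orders.amount FROM users "
    "JOIN orders ON users.id = orders.user_id "
    "ORDER BY users.name, orders.amount"
).fetchall()

# Left join: every user is kept; amount is NULL (None) when there is no order.
left_rows = conn.execute(
    "SELECT users.name, orders.amount FROM users "
    "LEFT JOIN orders ON users.id = orders.user_id "
    "ORDER BY users.name, orders.amount"
).fetchall()

print(inner_rows)
print(left_rows)
```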


7️⃣ Use of HAVING

SELECT city, COUNT(*) AS total
FROM users
GROUP BY city
HAVING COUNT(*) > 5;

HAVING works like WHERE, but it filters on aggregate values after the rows are grouped. This query keeps only the cities with more than 5 users.

✔️ Use HAVING after GROUP BY.
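
The difference between WHERE and HAVING is easiest to see when both appear in one query: WHERE drops rows before grouping, HAVING drops whole groups afterwards. A small sketch on an in-memory SQLite database, with made-up rows and an illustrative age threshold:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, city TEXT)")
# Hypothetical rows: six users in Pune (ages 20-25), two in Delhi (ages 30-31).
rows = [(f"user{i}", 20 + i, "Pune") for i in range(6)]
rows += [(f"guest{i}", 30 + i, "Delhi") for i in range(2)]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

# HAVING only: keep groups with more than 5 users.
big_cities = conn.execute(
    "SELECT city, COUNT(*) AS total FROM users "
    "GROUP BY city HAVING COUNT(*) > 5"
).fetchall()

# WHERE + HAVING: first drop users younger than 23, then keep
# cities that still have at least 2 users left.
filtered_cities = conn.execute(
    "SELECT city, COUNT(*) AS total FROM users "
    "WHERE age >= 23 "
    "GROUP BY city HAVING COUNT(*) >= 2 "
    "ORDER BY city"
).fetchall()

print(big_cities)
print(filtered_cities)
```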


8️⃣ Subqueries

SELECT * FROM users
WHERE salary > (SELECT AVG(salary) FROM users);

Finds users whose salary is above the average. The subquery calculates the average salary first.

✔️ Nested queries for dynamic filtering.


9️⃣ CASE Statement

SELECT name,
CASE
WHEN age < 18 THEN 'Teen'
WHEN age <= 40 THEN 'Adult'
ELSE 'Senior'
END AS age_group
FROM users;

Adds a new column that classifies users into categories based on age. The WHEN branches are checked in order, so the first match wins.

✔️ Powerful for conditional logic.
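
The subquery and CASE queries above can be checked on a few rows, including the boundary ages 18 and 40. This is a sketch on an in-memory SQLite database; the names, ages, and salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, salary REAL)")
# Hypothetical rows, chosen to hit the age boundaries 18 and 40.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("Asha", 16, 0.0),
        ("Ben", 18, 30000.0),
        ("Chen", 40, 90000.0),
        ("Dana", 41, 60000.0),
    ],
)

# Subquery: the inner SELECT yields the average salary (45000 here).
above_avg = conn.execute(
    "SELECT name FROM users "
    "WHERE salary > (SELECT AVG(salary) FROM users) "
    "ORDER BY name"
).fetchall()

# CASE: branches are evaluated top to bottom; the first true one is used.
age_groups = conn.execute(
    "SELECT name, "
    "CASE WHEN age < 18 THEN 'Teen' "
    "     WHEN age <= 40 THEN 'Adult' "
    "     ELSE 'Senior' END AS age_group "
    "FROM users ORDER BY name"
).fetchall()

print(above_avg)
print(age_groups)
```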

🔟 Window Functions (Advanced)

SELECT name, city, score,
RANK() OVER (PARTITION BY city ORDER BY score DESC) AS rank
FROM users;

Ranks users by score within each city.
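
To see how RANK() treats ties inside a partition, here is a minimal sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (window functions were added there), the rows are invented, and the alias is changed to `city_rank` because `rank` is a reserved word in some engines such as MySQL 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT, score INTEGER)")
# Hypothetical rows; Asha and Ben tie on score within Pune.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("Asha", "Pune", 90),
        ("Ben", "Pune", 90),
        ("Chen", "Pune", 70),
        ("Dana", "Delhi", 80),
        ("Eli", "Delhi", 60),
    ],
)

ranked = conn.execute(
    "SELECT name, city, score, "
    "RANK() OVER (PARTITION BY city ORDER BY score DESC) AS city_rank "
    "FROM users "
    "ORDER BY city, city_rank, name"
).fetchall()

for row in ranked:
    print(row)
```

Tied rows share a rank and the next rank is skipped (1, 1, 3); DENSE_RANK() would give 1, 1, 2 instead.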

SQL Learning Series: https://whatsapp.com/channel/0029VanC5rODzgT6TiTGoa1v/1075