Many of you have been asking for a Data Science session, so we have come up with one for you!
It will help you speed up your job hunt.
Register here:
https://go.acciojob.com/RYFvdU
Only a limited number of free slots are available, so register now.
Python Cheat Sheet.pdf
677.7 KB
This cheat sheet covers the basic Python required for data analysis, excluding pandas, NumPy, and other libraries.
Excel vs SQL vs Python (Pandas):
1️⃣ Filtering Data
↳ Excel: =FILTER(A2:D100, B2:B100>50) (Excel 365 users)
↳ SQL: SELECT * FROM table WHERE column > 50;
↳ Python: df_filtered = df[df["column"] > 50]
2️⃣ Sorting Data
↳ Excel: Data → Sort (or =SORT(A2:A100, 1, 1) for ascending order)
↳ SQL: SELECT * FROM table ORDER BY column ASC;
↳ Python: df_sorted = df.sort_values(by="column")
3️⃣ Counting Rows
↳ Excel: =COUNTA(A:A) (counts non-empty cells, so subtract 1 if column A has a header)
↳ SQL: SELECT COUNT(*) FROM table;
↳ Python: row_count = len(df)
4️⃣ Removing Duplicates
↳ Excel: Data → Remove Duplicates
↳ SQL: SELECT DISTINCT * FROM table;
↳ Python: df_unique = df.drop_duplicates()
5️⃣ Joining Tables
↳ Excel: Power Query → Merge Queries (or VLOOKUP/XLOOKUP)
↳ SQL: SELECT * FROM table1 JOIN table2 ON table1.id = table2.id;
↳ Python: df_merged = pd.merge(df1, df2, on="id")
6️⃣ Ranking Data
↳ Excel: =RANK.EQ(A2, $A$2:$A$100)
↳ SQL: SELECT column, RANK() OVER (ORDER BY column DESC) AS rank FROM table;
↳ Python: df["rank"] = df["column"].rank(method="min", ascending=False)
7️⃣ Moving Average Calculation
↳ Excel: =AVERAGE(B2:B4) (enter next to row 4 and drag down for a 3-row window)
↳ SQL: SELECT date, AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM table;
↳ Python: df["moving_avg"] = df["value"].rolling(window=3).mean() (the first two rows are NaN; pass min_periods=1 to match the SQL result)
8️⃣ Running Total
↳ Excel: =SUM($B$2:B2) (drag down)
↳ SQL: SELECT date, SUM(value) OVER (ORDER BY date) AS running_total FROM table;
↳ Python: df["running_total"] = df["value"].cumsum() (sort df by date first)
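If you want to try the SQL side of the ranking, moving average, and running total examples without installing a database, here is a rough sketch using Python's built-in sqlite3 module. The table name `sales`, the four sample rows, and the alias `rnk` are made up for illustration, and it assumes your Python ships SQLite 3.25 or newer (the first version with window functions).

```python
import sqlite3

# Hypothetical sample data: (date, value) rows for an illustrative "sales" table.
rows = [
    ("2024-01-01", 10),
    ("2024-01-02", 30),
    ("2024-01-03", 20),
    ("2024-01-04", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, value INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Ranking, 3-row moving average, and running total in a single query.
query = """
SELECT
    date,
    value,
    RANK() OVER (ORDER BY value DESC) AS rnk,
    AVG(value) OVER (
        ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg,
    SUM(value) OVER (ORDER BY date) AS running_total
FROM sales
ORDER BY date
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

Note that the two rows with value 30 share rank 1 and the next rank is 3, which is the same tie handling as RANK.EQ in Excel and rank(method="min") in pandas.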
Essential Data Science Concepts Everyone Should Know:
1. Data Types and Structures:
• Categorical: Nominal (unordered, e.g., colors) and Ordinal (ordered, e.g., education levels)
• Numerical: Discrete (countable, e.g., number of children) and Continuous (measurable, e.g., height)
• Data Structures: Arrays, Lists, Dictionaries, DataFrames (for organizing and manipulating data)
2. Descriptive Statistics:
• Measures of Central Tendency: Mean, Median, Mode (describing the typical value)
• Measures of Dispersion: Variance, Standard Deviation, Range (describing the spread of data)
• Visualizations: Histograms, Boxplots, Scatterplots (for understanding data distribution)
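To make the descriptive statistics above concrete, here is a small sketch using only Python's built-in statistics module. The list of scores is invented for illustration; variance and stdev here are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical sample: exam scores for nine students.
scores = [55, 60, 60, 65, 70, 75, 80, 85, 98]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Dispersion
variance = statistics.variance(scores)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(scores)      # square root of the sample variance
value_range = max(scores) - min(scores)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.1f}, std_dev={std_dev:.2f}, range={value_range}")
```

The mean (72) sits above the median (70) because the single high score of 98 pulls it up, which is why the median is often preferred for skewed data.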
3. Probability and Statistics:
• Probability Distributions: Normal, Binomial, Poisson (modeling data patterns)
• Hypothesis Testing: Formulating and testing claims about data (e.g., A/B testing)
• Confidence Intervals: Estimating the range of plausible values for a population parameter
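As a rough illustration of a confidence interval, the sketch below estimates a 95% interval for a mean using the normal approximation. The 50 page-load times are generated by a made-up formula, not real data. The approximation is only reasonable for fairly large samples; for small ones a t-based interval would be wider.

```python
import math
import statistics

# Hypothetical sample: 50 page-load times in seconds (12.0, 12.1, ..., 12.9, repeated).
sample = [12.0 + 0.1 * (i % 10) for i in range(50)]

n = len(sample)
mean = statistics.mean(sample)
std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# z-value that leaves 2.5% in each tail of the standard normal (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * std_err

lower, upper = mean - margin, mean + margin
print(f"mean = {mean:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```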
4. Machine Learning:
• Supervised Learning: Regression (predicting continuous values) and Classification (predicting categories)
• Unsupervised Learning: Clustering (grouping similar data points) and Dimensionality Reduction (simplifying data)
• Model Evaluation: Accuracy, Precision, Recall, F1-score (assessing classification model performance)
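Here is a minimal sketch of how the four evaluation metrics are computed for a binary classifier, in plain Python. The labels and predictions are invented for illustration; in practice you would likely use a library such as scikit-learn instead.

```python
# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)    # share of all predictions that are correct
precision = tp / (tp + fp)           # share of predicted positives that are real
recall = tp / (tp + fn)              # share of real positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, "
      f"recall={recall:.2f}, f1={f1:.2f}")
```

Here accuracy (0.70) looks fine while recall (0.60) shows that two of the five real positives were missed, which is why accuracy alone can be misleading.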
5. Data Cleaning and Preprocessing:
• Missing Value Handling: Imputation, Deletion (dealing with incomplete data)
• Outlier Detection and Removal: Identifying and addressing extreme values
• Feature Engineering: Creating new features from existing ones (e.g., combining variables)
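The cleaning steps above could look something like this in plain Python: median imputation for missing values, followed by the common 1.5 × IQR rule for flagging outliers. The readings are invented, and both the choice of median imputation and the 1.5 multiplier are conventions assumed here, not the only options.

```python
import statistics

# Hypothetical sensor readings; None marks a missing value.
readings = [12.0, 11.5, None, 12.3, 95.0, 11.8, None, 12.1, 11.9]

# Imputation: replace missing values with the median of the observed ones.
observed = [x for x in readings if x is not None]
median = statistics.median(observed)
imputed = [median if x is None else x for x in readings]

# Outlier detection: flag values more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(imputed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in imputed if x < lower or x > upper]
cleaned = [x for x in imputed if lower <= x <= upper]

print("imputed:", imputed)
print("outliers:", outliers)
```

Whether an outlier should be removed, capped, or kept depends on the domain: 95.0 could be a sensor glitch or a real event worth investigating.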
6. Data Visualization:
• Types of Charts: Bar charts, Line charts, Pie charts, Heatmaps (for communicating insights visually)
• Principles of Effective Visualization: Clarity, Accuracy, Aesthetics (for conveying information effectively)
7. Ethical Considerations in Data Science:
• Data Privacy and Security: Protecting sensitive information
• Bias and Fairness: Ensuring algorithms are unbiased and fair
8. Programming Languages and Tools:
• Python: Popular for data science with libraries like NumPy, pandas, scikit-learn
• R: Statistical programming language with strong visualization capabilities
• SQL: For querying and manipulating data in databases
9. Big Data and Cloud Computing:
• Hadoop and Spark: Frameworks for processing massive datasets
• Cloud Platforms: AWS, Azure, Google Cloud (for storing and analyzing data)
10. Domain Expertise:
• Understanding the Data: Knowing the context and meaning of data is crucial for effective analysis
• Problem Framing: Defining the right questions and objectives for data-driven decision making
Bonus:
• Data Storytelling: Communicating insights and findings in a clear and engaging manner
Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624
ENJOY LEARNING
When you're in an interview, it's super important to know how to talk about your projects in a way that impresses the interviewer. Here are some key points to help you do just that:
➤ Project Overview:
- Start with a quick summary of the project you worked on. What was it all about? What were the main goals? Keep it short and sweet: something you can explain in about 30 seconds.
➤ Problem Statement:
- What problem were you trying to solve with this project? Explain why this problem was important and needed addressing.
➤ Proposed Solution:
- Describe the solution you came up with. How does it work, and why is it a good fix for the problem?
➤ Your Role:
- Talk about what you specifically did. What were your main tasks? Did you face any challenges, and how did you overcome them? Make sure it's clear whether you were leading the project, a key player, or supporting the team.
➤ Technologies and Tools:
- Mention the tech and tools you used. This shows your technical know-how and your ability to choose the right tools for the job.
➤ Impact and Achievements:
- Share the results of your project. Did it make things better? How? Mention any improvements, efficiencies, or positive feedback you got.
➤ Team Collaboration:
- Talk about how you collaborated. What was your role in the team? How did you communicate and contribute to the team's success?
➤ Learning and Development:
- Reflect on what you learned from the project. What new skills did you gain, and what would you do differently next time?
➤ Tips for Your Interview Preparation:
- Be ready with a 30-second elevator pitch about your projects, and also have a five-minute detailed overview ready.
- If there's a pause after you describe the project, don't hesitate to ask if they'd like more details or if there's a specific part they're interested in.
By preparing your project details thoroughly and understanding what the interviewer is looking for, you can talk about your experience in a way that really showcases your skills and increases your chances of getting the job.
Coding Projects: https://whatsapp.com/channel/0029VazkxJ62UPB7OQhBE502