Data Science & Machine Learning
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data
Ever wondered what the difference is between a Data Analyst and a Data Scientist? Both roles are in high demand, but they tackle data in different ways.
SQL Cheatsheet

This SQL cheatsheet is designed to be your quick reference guide for SQL programming. Whether you’re a beginner learning how to query databases or an experienced developer looking for a handy resource, this cheatsheet covers essential SQL topics.

1. Database Basics
- CREATE DATABASE db_name;
- USE db_name;

2. Tables
- Create Table: CREATE TABLE table_name (col1 datatype, col2 datatype);
- Drop Table: DROP TABLE table_name;
- Alter Table: ALTER TABLE table_name ADD column_name datatype;

3. Insert Data
- INSERT INTO table_name (col1, col2) VALUES (val1, val2);

4. Select Queries
- Basic Select: SELECT * FROM table_name;
- Select Specific Columns: SELECT col1, col2 FROM table_name;
- Select with Condition: SELECT * FROM table_name WHERE condition;

5. Update Data
- UPDATE table_name SET col1 = value1 WHERE condition;

6. Delete Data
- DELETE FROM table_name WHERE condition;

7. Joins
- Inner Join: SELECT * FROM table1 INNER JOIN table2 ON table1.col = table2.col;
- Left Join: SELECT * FROM table1 LEFT JOIN table2 ON table1.col = table2.col;
- Right Join: SELECT * FROM table1 RIGHT JOIN table2 ON table1.col = table2.col;

8. Aggregations
- Count: SELECT COUNT(*) FROM table_name;
- Sum: SELECT SUM(col) FROM table_name;
- Group By: SELECT col, COUNT(*) FROM table_name GROUP BY col;

9. Sorting & Limiting
- Order By: SELECT * FROM table_name ORDER BY col ASC|DESC;
- Limit Results: SELECT * FROM table_name LIMIT n;

10. Indexes
- Create Index: CREATE INDEX idx_name ON table_name (col);
- Drop Index: DROP INDEX idx_name; (MySQL: DROP INDEX idx_name ON table_name;)

11. Subqueries
- SELECT * FROM table_name WHERE col IN (SELECT col FROM other_table);

12. Views
- Create View: CREATE VIEW view_name AS SELECT * FROM table_name;
- Drop View: DROP VIEW view_name;
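
To see several of these statements working together, here is a minimal sketch that runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The employees table, its columns, and the sample rows are made up for illustration. SQLite has no CREATE DATABASE or USE, so section 1 is skipped here.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 2. Tables and 3. Insert Data (hypothetical table, columns, and rows)
cur.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
cur.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [("Ana", "data", 100.0), ("Ben", "data", 80.0), ("Cy", "ops", 60.0), ("Dee", "ops", 70.0)],
)

# 5. Update Data and 6. Delete Data: without WHERE, every row would be affected
cur.execute("UPDATE employees SET salary = 90.0 WHERE name = 'Ben'")
cur.execute("DELETE FROM employees WHERE name = 'Cy'")

# 8. Aggregations with GROUP BY, plus 9. Sorting & Limiting
cur.execute(
    "SELECT dept, COUNT(*), SUM(salary) FROM employees "
    "GROUP BY dept ORDER BY dept ASC LIMIT 10"
)
dept_summary = cur.fetchall()

# 10. Indexes and 12. Views
cur.execute("CREATE INDEX idx_dept ON employees (dept)")
cur.execute("CREATE VIEW data_team AS SELECT name FROM employees WHERE dept = 'data'")
cur.execute("SELECT name FROM data_team ORDER BY name")
view_rows = cur.fetchall()

print(dept_summary)
print(view_rows)
```

Other databases (MySQL, PostgreSQL, SQL Server) differ in details such as data types and LIMIT syntax, so treat this as a starting point rather than portable code.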
🚀 Complete Roadmap to Become a Data Scientist in 5 Months

📅 Week 1-2: Fundamentals
✅ Day 1-3: Introduction to Data Science, its applications, and roles.
✅ Day 4-7: Brush up on Python programming 🐍.
✅ Day 8-10: Learn basic statistics 📊 and probability 🎲.

🔍 Week 3-4: Data Manipulation & Visualization
Day 11-15: Master Pandas for data manipulation.
📈 Day 16-20: Learn Matplotlib & Seaborn for data visualization.

🤖 Week 5-6: Machine Learning Foundations
🔬 Day 21-25: Introduction to scikit-learn.
📊 Day 26-30: Learn Linear & Logistic Regression.

🏗 Week 7-8: Advanced Machine Learning
🌳 Day 31-35: Explore Decision Trees & Random Forests.
📌 Day 36-40: Learn Clustering (K-Means, DBSCAN) & Dimensionality Reduction.

🧠 Week 9-10: Deep Learning
🤖 Day 41-45: Basics of Neural Networks with TensorFlow/Keras.
📸 Day 46-50: Learn CNNs & RNNs for image & text data.

🏛 Week 11-12: Data Engineering
🗄 Day 51-55: Learn SQL & Databases.
🧹 Day 56-60: Data Preprocessing & Cleaning.

📊 Week 13-14: Model Evaluation & Optimization
Day 61-65: Learn Cross-validation & Hyperparameter Tuning.
📉 Day 66-70: Understand Evaluation Metrics (Accuracy, Precision, Recall, F1-score).

🏗 Week 15-16: Big Data & Tools
🐘 Day 71-75: Introduction to Big Data Technologies (Hadoop, Spark).
☁️ Day 76-80: Learn Cloud Computing (AWS, GCP, Azure).

🚀 Week 17-18: Deployment & Production
🛠 Day 81-85: Deploy models using Flask or FastAPI.
📦 Day 86-90: Learn Docker & Cloud Deployment (AWS, Heroku).

🎯 Week 19-20: Specialization
Day 91-95: Choose NLP or Computer Vision, based on your interest.

🏆 Week 21-22: Projects & Portfolio
📂 Day 96-100: Work on Personal Data Science Projects.

💬 Week 23-24: Soft Skills & Networking
🎤 Day 101-105: Improve Communication & Presentation Skills.
🌐 Day 106-110: Attend Online Meetups & Forums.

🎯 Week 25-26: Interview Preparation
💻 Day 111-115: Practice Coding Interviews (LeetCode, HackerRank).
📂 Day 116-120: Review your projects & prepare for discussions.

👨‍💻 Week 27-28: Apply for Jobs
📩 Day 121-125: Start applying for Entry-Level Data Scientist positions.

🎤 Week 29-30: Interviews
Day 126-130: Attend Interviews & Practice Whiteboard Problems.

🔄 Week 31-32: Continuous Learning
📰 Day 131-135: Stay updated with the Latest Data Science Trends.

🏆 Week 33-34: Accepting Offers
Day 136-140: Evaluate job offers & Negotiate Your Salary.

🏢 Week 35-36: Settling In
🎯 Day 141-150: Start your New Data Science Job, adapt & keep learning!

🎉 Enjoy Learning & Build Your Dream Career in Data Science! 🚀🔥
SQL Joins: A Practical Cheatsheet for Professionals

If you’re working with relational data, whether you’re a business analyst, backend dev, or aspiring data scientist, mastering SQL joins isn’t optional. It’s fundamental.

Here’s a concise guide to the most important join types, with real-world use cases:


INNER JOIN

Returns records with matching keys from both tables.
Use case: Show only customers who’ve placed at least one order.


LEFT JOIN (OUTER)

Returns all rows from the left table, plus the matching rows from the right table. Where there is no match, the right-side columns are NULL.
Use case: List all customers, including those with zero orders.
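
The two use cases above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production query: the customers and orders tables, their columns, and the rows are all made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers (id, name) VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders (id, customer_id, amount) VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN: only customers with at least one matching order.
# DISTINCT collapses customers who ordered more than once.
inner_rows = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()

# LEFT JOIN: every customer; COUNT(o.id) ignores NULLs, so customers
# without orders get a count of 0.
left_rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS order_count
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

print(inner_rows)
print(left_rows)
```

Cy has no orders, so Cy is missing from the INNER JOIN result but shows up in the LEFT JOIN result with a count of 0.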


RIGHT JOIN (OUTER)

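Returns all rows from the right table, plus the matching rows from the left table. Rarely used, because the same result can be written as a LEFT JOIN with the tables swapped.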
Use case: Show all orders, even if the customer was deleted.


FULL OUTER JOIN

Returns all records from both tables, with NULLs on whichever side has no match.
Use case: Capture everything, matched and unmatched.


CROSS JOIN

Returns the Cartesian product: every row of one table paired with every row of the other.
Use case: Generate every possible product/supplier combo.
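
A quick sketch of that use case, again with Python's sqlite3 module. The products and suppliers tables and their rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (name TEXT);
CREATE TABLE suppliers (name TEXT);
INSERT INTO products VALUES ('bolt'), ('nut'), ('washer');
INSERT INTO suppliers VALUES ('Acme'), ('Globex');
""")

# CROSS JOIN has no ON clause: 3 products x 2 suppliers = 6 rows.
combos = conn.execute("""
    SELECT p.name, s.name
    FROM products AS p
    CROSS JOIN suppliers AS s
    ORDER BY p.name, s.name
""").fetchall()

print(len(combos))
print(combos)
```

The row count is the product of the two table sizes, which is why cross joins on large tables get expensive fast.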


SELF JOIN

Joins a table to itself.
Use case: Show employees and their reporting managers.
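
Here is one way that could look, assuming a hypothetical employees table where manager_id points back at another row's id. A LEFT JOIN is used so that the person at the top, who has no manager, still appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
    (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1), (4, 'Gus', 2);
""")

# The same table appears twice under different aliases:
# e is the employee row, m is the manager row it points to.
pairs = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees AS e
    LEFT JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

for employee, manager in pairs:
    print(employee, "reports to", manager)
```

With an INNER JOIN instead, Dana would drop out of the result because her manager_id is NULL.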


Best Practices

Use aliases (A, B) for clean code
Prefer explicit JOIN ... ON over comma-separated tables filtered in WHERE, for clarity
Test joins with LIMIT first to avoid pulling huge result sets
Random Module in Python 👆
Data Cleaning Tips ✅
The Data Science skill no one talks about...

Every aspiring data scientist I talk to thinks their job starts when someone else gives them:
    1. a dataset, and
    2. a clearly defined metric to optimize for, e.g. accuracy

But it doesn’t.

It starts with a business problem you need to understand, frame, and solve. This is the key data science skill that separates senior from junior professionals.

Let’s go through an example.

Example

Imagine you are a data scientist at Uber. And your product lead tells you:

    👩‍💼: “We want to decrease user churn by 5% this quarter”


We say that a user churns when she decides to stop using Uber.

But why?

There are different reasons why a user would stop using Uber. For example:

   1. “Lyft is offering better prices for that geo” (pricing problem)
   2. “Car waiting times are too long” (supply problem)
   3. “The Android version of the app is very slow” (client-app performance problem)

You build this list ↑ by asking the rest of the team the right questions. You need to understand the user’s experience using the app, from HER point of view.

Typically there is no single reason behind churn, but a combination of a few of these. The question is: which one should you focus on?

This is when you pull out your great data science skills and EXPLORE THE DATA 🔎.

You explore the data to understand how plausible each of the above explanations is. The output from this analysis is a single hypothesis you should consider further. Depending on the hypothesis, you will solve the data science problem differently.
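
As a rough idea of what that exploration can look like, here is a small sketch that compares churn rates across segments. The records are toy data invented for this example (not real Uber numbers), and the 8-minute cutoff for a "long wait" is an arbitrary assumption.

```python
from collections import defaultdict

# Made-up toy records: (platform, average wait in minutes, churned?)
users = [
    ("android", 3, True), ("android", 4, True), ("android", 9, True), ("android", 2, False),
    ("ios", 3, False), ("ios", 8, True), ("ios", 4, False), ("ios", 2, False),
]


def churn_rate_by(records, key):
    """Return {segment: share of churned users}, where key(record) names the segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        segment = key(record)
        totals[segment] += 1
        churned[segment] += record[2]  # True counts as 1, False as 0
    return {segment: churned[segment] / totals[segment] for segment in totals}


# Hypothesis 3 (slow Android app): does churn differ by platform?
by_platform = churn_rate_by(users, key=lambda r: r[0])

# Hypothesis 2 (supply problem): does churn differ by waiting time?
by_wait = churn_rate_by(users, key=lambda r: "long wait" if r[1] >= 8 else "short wait")

print(by_platform)
print(by_wait)
```

A large gap between segments makes the matching explanation more plausible, but it is only a starting signal. With real data you would also check sample sizes and confounders before picking a hypothesis.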

For example…

Scenario 1: “Lyft Is Offering Better Prices” (Pricing Problem)

One solution would be to detect/predict the segment of users who are likely to churn (possibly using an ML model) and send them personalized discounts via push notifications. To test that your solution works, you will need to run an A/B test, so you will split a percentage of Uber users into 2 groups:

    The A group. No user in this group will receive any discount.

    The B group. Users in this group that the model thinks are likely to churn will receive a price discount on their next trip.

You could add more groups (e.g. C, D, E…) to test different pricing points.
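
One common way to do the split is to hash each user ID, so the same user always lands in the same group. The sketch below assumes that approach; the function names, the experiment label, the churn_score input, and the 0.5 threshold are all hypothetical and not taken from any real system.

```python
import hashlib


def assign_group(user_id, experiment="churn_discount_v1", groups=("A", "B")):
    """Deterministically map a user to one experiment group by hashing the ID."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def gets_discount(user_id, churn_score, threshold=0.5):
    """Only B-group users that the model flags as likely to churn get the discount."""
    return assign_group(user_id) == "B" and churn_score >= threshold


# Assign 1,000 hypothetical users and check the split is roughly even.
assignments = {f"user_{i}": assign_group(f"user_{i}") for i in range(1000)}
share_b = sum(group == "B" for group in assignments.values()) / len(assignments)
print(f"share of users in group B: {share_b:.2f}")

# Testing more price points only needs a longer groups tuple.
print(assign_group("user_42", groups=("A", "B", "C", "D")))
```

After the test runs, you would compare churn between the A and B groups to see whether the discount made a difference.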

In a nutshell

    1. Translating business problems into data science problems is the key data science skill that separates a senior from a junior data scientist.
    2. Ask the right questions, list possible solutions, and explore the data to narrow down the list to one.
    3. Solve this one data science problem.