Common Data Cleaning Techniques for Data Analysts
Remove Duplicates:
Purpose: Eliminate repeated rows to maintain unique data.
Example: SELECT DISTINCT column_name FROM table;
Handle Missing Values:
Purpose: Fill, remove, or impute missing data.
Example:
Remove: df.dropna() (in Python/Pandas)
Fill: df.fillna(0)
Standardize Data:
Purpose: Convert data to a consistent format (e.g., dates, numbers).
Example: Convert text to lowercase: df['column'] = df['column'].str.lower()
Remove Outliers:
Purpose: Identify and remove extreme values.
Example: df = df[df['column'] < threshold]
Correct Data Types:
Purpose: Ensure columns have the correct data type (e.g., dates as datetime, numeric values as integers).
Example: df['date'] = pd.to_datetime(df['date'])
Normalize Data:
Purpose: Scale numerical data to a standard range (0 to 1).
Example: from sklearn.preprocessing import MinMaxScaler; df['scaled'] = MinMaxScaler().fit_transform(df[['column']])
Data Transformation:
Purpose: Transform or aggregate data for better analysis (e.g., log transformations, aggregating columns).
Example: Apply log transformation: df['log_column'] = np.log(df['column'] + 1)
Handle Categorical Data:
Purpose: Convert categorical data into numerical data using encoding techniques.
Example: df['encoded_column'] = pd.get_dummies(df['category_column'])
Impute Missing Values:
Purpose: Fill missing values with a meaningful value (e.g., mean, median, or a specific value).
Example: df['column'] = df['column'].fillna(df['column'].mean())
I have curated best 80+ top-notch Data Analytics Resources ππ
https://whatsapp.com/channel/0029VaGgzAk72WTmQFERKh02
Like this post for more content like this πβ₯οΈ
Share with credits: https://t.iss.one/sqlspecialist
Hope it helps :)
Remove Duplicates:
Purpose: Eliminate repeated rows to maintain unique data.
Example: SELECT DISTINCT column_name FROM table;
Handle Missing Values:
Purpose: Fill, remove, or impute missing data.
Example:
Remove: df.dropna() (in Python/Pandas)
Fill: df.fillna(0)
Standardize Data:
Purpose: Convert data to a consistent format (e.g., dates, numbers).
Example: Convert text to lowercase: df['column'] = df['column'].str.lower()
Remove Outliers:
Purpose: Identify and remove extreme values.
Example: df = df[df['column'] < threshold]
Correct Data Types:
Purpose: Ensure columns have the correct data type (e.g., dates as datetime, numeric values as integers).
Example: df['date'] = pd.to_datetime(df['date'])
Normalize Data:
Purpose: Scale numerical data to a standard range (0 to 1).
Example: from sklearn.preprocessing import MinMaxScaler; df['scaled'] = MinMaxScaler().fit_transform(df[['column']])
Data Transformation:
Purpose: Transform or aggregate data for better analysis (e.g., log transformations, aggregating columns).
Example: Apply log transformation: df['log_column'] = np.log(df['column'] + 1)
Handle Categorical Data:
Purpose: Convert categorical data into numerical data using encoding techniques.
Example: df['encoded_column'] = pd.get_dummies(df['category_column'])
Impute Missing Values:
Purpose: Fill missing values with a meaningful value (e.g., mean, median, or a specific value).
Example: df['column'] = df['column'].fillna(df['column'].mean())
I have curated best 80+ top-notch Data Analytics Resources ππ
https://whatsapp.com/channel/0029VaGgzAk72WTmQFERKh02
Like this post for more content like this πβ₯οΈ
Share with credits: https://t.iss.one/sqlspecialist
Hope it helps :)
π4
1. What is a UNIQUE constraint?
The UNIQUE Constraint prevents identical values in a column from appearing in two records. The UNIQUE constraint guarantees that every value in a column is unique.
2. What is a Self-Join?
A self-join is a type of join that can be used to connect two tables. As a result, it is a unary relationship. Each row of the table is attached to itself and all other rows of the same table in a self-join. As a result, a self-join is mostly used to combine and compare rows from the same database table.
3. What is the case when in SQL Server?
The CASE statement is used to construct logic in which one columnβs value is determined by the values of other columns. The condition to be tested is specified by the WHEN statement. If the WHEN condition returns TRUE, the THEN sentence explains what to do.
When none of the WHEN conditions return true, the ELSE statement is executed. The END keyword brings the CASE statement to a close.
4. What is the main difference between βBETWEENβ and βINβ condition operators?
BETWEEN operator is used to display rows based on a range of values in a row whereas the IN condition operator is used to check for values contained in a specific set of values.
The UNIQUE Constraint prevents identical values in a column from appearing in two records. The UNIQUE constraint guarantees that every value in a column is unique.
2. What is a Self-Join?
A self-join is a type of join that can be used to connect two tables. As a result, it is a unary relationship. Each row of the table is attached to itself and all other rows of the same table in a self-join. As a result, a self-join is mostly used to combine and compare rows from the same database table.
3. What is the case when in SQL Server?
The CASE statement is used to construct logic in which one columnβs value is determined by the values of other columns. The condition to be tested is specified by the WHEN statement. If the WHEN condition returns TRUE, the THEN sentence explains what to do.
When none of the WHEN conditions return true, the ELSE statement is executed. The END keyword brings the CASE statement to a close.
4. What is the main difference between βBETWEENβ and βINβ condition operators?
BETWEEN operator is used to display rows based on a range of values in a row whereas the IN condition operator is used to check for values contained in a specific set of values.
π2β€1β1
Python Detailed Roadmap π
π 1. Basics
βΌ Data Types & Variables
βΌ Operators & Expressions
βΌ Control Flow (if, loops)
π 2. Functions & Modules
βΌ Defining Functions
βΌ Lambda Functions
βΌ Importing & Creating Modules
π 3. File Handling
βΌ Reading & Writing Files
βΌ Working with CSV & JSON
π 4. Object-Oriented Programming (OOP)
βΌ Classes & Objects
βΌ Inheritance & Polymorphism
βΌ Encapsulation
π 5. Exception Handling
βΌ Try-Except Blocks
βΌ Custom Exceptions
π 6. Advanced Python Concepts
βΌ List & Dictionary Comprehensions
βΌ Generators & Iterators
βΌ Decorators
π 7. Essential Libraries
βΌ NumPy (Arrays & Computations)
βΌ Pandas (Data Analysis)
βΌ Matplotlib & Seaborn (Visualization)
π 8. Web Development & APIs
βΌ Web Scraping (BeautifulSoup, Scrapy)
βΌ API Integration (Requests)
βΌ Flask & Django (Backend Development)
π 9. Automation & Scripting
βΌ Automating Tasks with Python
βΌ Working with Selenium & PyAutoGUI
π 10. Data Science & Machine Learning
βΌ Data Cleaning & Preprocessing
βΌ Scikit-Learn (ML Algorithms)
βΌ TensorFlow & PyTorch (Deep Learning)
π 11. Projects
βΌ Build Real-World Applications
βΌ Showcase on GitHub
π 12. β Apply for Jobs
βΌ Strengthen Resume & Portfolio
βΌ Prepare for Technical Interviews
Like for more β€οΈπͺ
π 1. Basics
βΌ Data Types & Variables
βΌ Operators & Expressions
βΌ Control Flow (if, loops)
π 2. Functions & Modules
βΌ Defining Functions
βΌ Lambda Functions
βΌ Importing & Creating Modules
π 3. File Handling
βΌ Reading & Writing Files
βΌ Working with CSV & JSON
π 4. Object-Oriented Programming (OOP)
βΌ Classes & Objects
βΌ Inheritance & Polymorphism
βΌ Encapsulation
π 5. Exception Handling
βΌ Try-Except Blocks
βΌ Custom Exceptions
π 6. Advanced Python Concepts
βΌ List & Dictionary Comprehensions
βΌ Generators & Iterators
βΌ Decorators
π 7. Essential Libraries
βΌ NumPy (Arrays & Computations)
βΌ Pandas (Data Analysis)
βΌ Matplotlib & Seaborn (Visualization)
π 8. Web Development & APIs
βΌ Web Scraping (BeautifulSoup, Scrapy)
βΌ API Integration (Requests)
βΌ Flask & Django (Backend Development)
π 9. Automation & Scripting
βΌ Automating Tasks with Python
βΌ Working with Selenium & PyAutoGUI
π 10. Data Science & Machine Learning
βΌ Data Cleaning & Preprocessing
βΌ Scikit-Learn (ML Algorithms)
βΌ TensorFlow & PyTorch (Deep Learning)
π 11. Projects
βΌ Build Real-World Applications
βΌ Showcase on GitHub
π 12. β Apply for Jobs
βΌ Strengthen Resume & Portfolio
βΌ Prepare for Technical Interviews
Like for more β€οΈπͺ
π6β€3π1
Final Preparation Guide for Data Analytics Interviews: (IMP)
β‘Key SQL Concepts:
- Master SELECT statements, focusing on WHERE, ORDER BY, GROUP BY, and HAVING clauses.
- Understand the basics of JOINS: INNER, LEFT, RIGHT, FULL.
- Get comfortable with aggregate functions like COUNT, SUM, AVG, MAX, and MIN.
- Study subqueries and Common Table Expressions.
- Explore advanced topics like CASE statements, complex JOIN strategies, and Window functions (OVER, PARTITION BY, ROW_NUMBER, RANK).
β‘Python for Data Analysis:
- Review the basics of Python syntax, control structures, and data structures (lists, dictionaries).
- Dive into data manipulation using Pandas and NumPy, covering DataFrames, Series, and group by operations.
- Learn basic plotting techniques with Matplotlib and Seaborn for data visualization.
β‘ Excel Skills:
- Practice cell operations and essential formulas like SUMIFS, COUNTIFS, and AVERAGEIFS.
- Familiarize yourself with PivotTables, PivotCharts, data validation, and What-if analysis.
- Explore advanced formulas and work with the Data Model & Power Pivot.
β‘ Power BI Proficiency:
- Focus on data modeling, including importing data and managing relationships.
- Learn data transformation techniques with Power Query and use DAX for calculated columns and measures.
- Create interactive reports and dashboards, and work on visualizations.
β‘ Basic Statistics:
- Understand fundamental concepts like Mean, Median, Mode, Standard Deviation, and Variance.
- Study probability distributions, Hypothesis Testing, and P-values.
- Learn about Confidence Intervals, Correlation, and Simple Linear Regression.
I have curated best 80+ top-notch Data Analytics Resources ππ
https://whatsapp.com/channel/0029VaGgzAk72WTmQFERKh02
Hope this helps you π
β‘Key SQL Concepts:
- Master SELECT statements, focusing on WHERE, ORDER BY, GROUP BY, and HAVING clauses.
- Understand the basics of JOINS: INNER, LEFT, RIGHT, FULL.
- Get comfortable with aggregate functions like COUNT, SUM, AVG, MAX, and MIN.
- Study subqueries and Common Table Expressions.
- Explore advanced topics like CASE statements, complex JOIN strategies, and Window functions (OVER, PARTITION BY, ROW_NUMBER, RANK).
β‘Python for Data Analysis:
- Review the basics of Python syntax, control structures, and data structures (lists, dictionaries).
- Dive into data manipulation using Pandas and NumPy, covering DataFrames, Series, and group by operations.
- Learn basic plotting techniques with Matplotlib and Seaborn for data visualization.
β‘ Excel Skills:
- Practice cell operations and essential formulas like SUMIFS, COUNTIFS, and AVERAGEIFS.
- Familiarize yourself with PivotTables, PivotCharts, data validation, and What-if analysis.
- Explore advanced formulas and work with the Data Model & Power Pivot.
β‘ Power BI Proficiency:
- Focus on data modeling, including importing data and managing relationships.
- Learn data transformation techniques with Power Query and use DAX for calculated columns and measures.
- Create interactive reports and dashboards, and work on visualizations.
β‘ Basic Statistics:
- Understand fundamental concepts like Mean, Median, Mode, Standard Deviation, and Variance.
- Study probability distributions, Hypothesis Testing, and P-values.
- Learn about Confidence Intervals, Correlation, and Simple Linear Regression.
I have curated best 80+ top-notch Data Analytics Resources ππ
https://whatsapp.com/channel/0029VaGgzAk72WTmQFERKh02
Hope this helps you π
β€2π2
Becoming a Data Analyst in 2025 is more difficult than it was a couple of years ago. The competition has grown but so has the demand for Data Analysts!
There are 5 areas you need to excel at to land a career in data. (so punny...)
1. Skills
2. Experience
3. Networking
4. Job Search
5. Education
Let's dive into the first and most important area, skills.
Skills
Every data analytics job will require a different set of skills for their job description. To cover the majority of entry-level positions, you should focus on the core 3 (or 4 if you have time).
- Excel
- SQL
- Tableau or Power BI
- Python or R(optional)
No need to learn any more than this to get started. Start learning other skills AFTER you land your first job and see what data analytics path you really enjoy.
You might fall into a path that doesn't require Python at all and if you took 3 months to learn it, you wasted 3 months. Your goal should be to get your foot in the door.
Experience
So how do you show that you have experience if you have never worked as a Data Analyst professionally?
It's actually easier than you think!
There are a few ways you can gain experience. volunteer, freelance, or any analytics work at your current job.
First ask your friends, family, or even Reddit if anyone needs help with their data.
Second, you can join Upwork or Fiverr to land some freelance gigs to gain great experience and some extra money.
Thirdly, even if your title isn't "Data Analyst", you might analyze data anyway. Use this as experience!
Networking
I love this section the most. It has been proven by everyone I have mentored that this is one of the most important areas to learn.
Start talking to other Data Analysts, start connecting with the RIGHT people, start posting on LinkedIn, start following people in the field, and start commenting on posts.
All of this, over time, will continue to get "eyes" on your profile. This will lead to more calls, interviews, and like the people I teach, job offers.
Consistency is important here.
Job Search
I believe this is not a skill and is more like a "numbers game". And the ones who excel here, are the ones who are consistent.
I'm not saying you need to apply all day every day but you should spend SOME time applying every day.
This is important because you don't know when exactly a company will be posting their job posting. You also want to be one of the first people to apply so that means you need to check the job boards in multiple small chunks rather than spend all of your time applying in a single chunk of time.
The best way to do this is to open up all of the filters and select the most recent and posted within the last 3 days.
Education
If you have a degree or are currently on your way to getting one, this section doesn't really apply to you since you have a leg up on a lot more job opportunities.
So how else does someone show they are educated enough to become a Data Analyst?
You need to prove it by taking relevant courses in relation to the industry you want to enter. After the course, the actual certificate does not hold much weight unless it's an accredited certificate like a Tableau Professional Certificate.
To counter this, you need to use your project descriptions to explain how you used data to solve a business problem and explain it professionally.
There are so many other areas you could work on but focussing on these to start will definitely get you going in the right direction.
Take time to put these actions to work. Pivot when something isn't working and adapt.
It will take time but these actions will reduce the time it takes you to become a Data Analyst in 2025
Hope this helps you π
There are 5 areas you need to excel at to land a career in data. (so punny...)
1. Skills
2. Experience
3. Networking
4. Job Search
5. Education
Let's dive into the first and most important area, skills.
Skills
Every data analytics job will require a different set of skills for their job description. To cover the majority of entry-level positions, you should focus on the core 3 (or 4 if you have time).
- Excel
- SQL
- Tableau or Power BI
- Python or R(optional)
No need to learn any more than this to get started. Start learning other skills AFTER you land your first job and see what data analytics path you really enjoy.
You might fall into a path that doesn't require Python at all and if you took 3 months to learn it, you wasted 3 months. Your goal should be to get your foot in the door.
Experience
So how do you show that you have experience if you have never worked as a Data Analyst professionally?
It's actually easier than you think!
There are a few ways you can gain experience. volunteer, freelance, or any analytics work at your current job.
First ask your friends, family, or even Reddit if anyone needs help with their data.
Second, you can join Upwork or Fiverr to land some freelance gigs to gain great experience and some extra money.
Thirdly, even if your title isn't "Data Analyst", you might analyze data anyway. Use this as experience!
Networking
I love this section the most. It has been proven by everyone I have mentored that this is one of the most important areas to learn.
Start talking to other Data Analysts, start connecting with the RIGHT people, start posting on LinkedIn, start following people in the field, and start commenting on posts.
All of this, over time, will continue to get "eyes" on your profile. This will lead to more calls, interviews, and like the people I teach, job offers.
Consistency is important here.
Job Search
I believe this is not a skill and is more like a "numbers game". And the ones who excel here, are the ones who are consistent.
I'm not saying you need to apply all day every day but you should spend SOME time applying every day.
This is important because you don't know when exactly a company will be posting their job posting. You also want to be one of the first people to apply so that means you need to check the job boards in multiple small chunks rather than spend all of your time applying in a single chunk of time.
The best way to do this is to open up all of the filters and select the most recent and posted within the last 3 days.
Education
If you have a degree or are currently on your way to getting one, this section doesn't really apply to you since you have a leg up on a lot more job opportunities.
So how else does someone show they are educated enough to become a Data Analyst?
You need to prove it by taking relevant courses in relation to the industry you want to enter. After the course, the actual certificate does not hold much weight unless it's an accredited certificate like a Tableau Professional Certificate.
To counter this, you need to use your project descriptions to explain how you used data to solve a business problem and explain it professionally.
There are so many other areas you could work on but focussing on these to start will definitely get you going in the right direction.
Take time to put these actions to work. Pivot when something isn't working and adapt.
It will take time but these actions will reduce the time it takes you to become a Data Analyst in 2025
Hope this helps you π
π7β€1
1. What is Data Integrity?
Data Integrity is the assurance of accuracy and consistency of data over its entire life-cycle and is a critical aspect of the design, implementation, and usage of any system which stores, processes, or retrieves data. It also defines integrity constraints to enforce business rules on the data when it is entered into an application or a database.
2. What is the Difference Between Joining and Blending in Tableau?
Combining the data from two or more different sources is data blending, such as Oracle, Excel, and SQL Server. In data blending, each data source contains its own set of dimensions and measures. Combining the data between two or more tables or sheets within the same data source is data joining. All the combined tables or sheets contain a common set of dimensions and measures.
3. What is slicing in Python?
As the name suggests, βslicingβ is taking parts of.
Syntax for slicing is [start : stop : step]
start is the starting index from where to slice a list or tuple
stop is the ending index or where to stop.
step is the number of steps to jump.
Default value for start is 0, stop is number of items, step is 1.
Slicing can be done on strings, arrays, lists, and tuples.
4. What is the difference between NOW() and CURRENT_DATE() in SQL?
NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.
The simple difference between NOW() and CURRENT_DATE() is that NOW() will fetch the current date and time both in format βYYYY-MM_DD HH:MM:SSβ while CURRENT_DATE() will fetch the date of the current day βYYYY-MM_DDβ.
Data Integrity is the assurance of accuracy and consistency of data over its entire life-cycle and is a critical aspect of the design, implementation, and usage of any system which stores, processes, or retrieves data. It also defines integrity constraints to enforce business rules on the data when it is entered into an application or a database.
2. What is the Difference Between Joining and Blending in Tableau?
Combining the data from two or more different sources is data blending, such as Oracle, Excel, and SQL Server. In data blending, each data source contains its own set of dimensions and measures. Combining the data between two or more tables or sheets within the same data source is data joining. All the combined tables or sheets contain a common set of dimensions and measures.
3. What is slicing in Python?
As the name suggests, βslicingβ is taking parts of.
Syntax for slicing is [start : stop : step]
start is the starting index from where to slice a list or tuple
stop is the ending index or where to stop.
step is the number of steps to jump.
Default value for start is 0, stop is number of items, step is 1.
Slicing can be done on strings, arrays, lists, and tuples.
4. What is the difference between NOW() and CURRENT_DATE() in SQL?
NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.
The simple difference between NOW() and CURRENT_DATE() is that NOW() will fetch the current date and time both in format βYYYY-MM_DD HH:MM:SSβ while CURRENT_DATE() will fetch the date of the current day βYYYY-MM_DDβ.
π4β€1π₯1
UNPOPULAR OPINION: Excel is still relevant for data analysis.
I am often asked by junior data analysts, βWhat is the purpose of learning Excel if they already know Python?β.
The truth is, Excel/Google Sheets are still widely used across most organizations. And if you are working with other people, sooner or later you will be asked to do some quick analysis in Excel.
Yes, even if your organization has Tableau/PowerBI, someone will still download report as CSV and do his own analysis.
If you are just starting your data analytics journey, I always recommend Excel as the first tool to learn.
It will help you to understand how tabular data works.
LOOKUPS are like JOINS in SQL;
VSTACK is UNION in SQL;
and FILTER, SORT, GROUPBY are similar to Python functions.
By learning Excel, you are setting a foundation for other tools.
Excel might not be the trendiest and coolest tool in data analytics, but it is versatile, accessible, and universal.
I am often asked by junior data analysts, βWhat is the purpose of learning Excel if they already know Python?β.
The truth is, Excel/Google Sheets are still widely used across most organizations. And if you are working with other people, sooner or later you will be asked to do some quick analysis in Excel.
Yes, even if your organization has Tableau/PowerBI, someone will still download report as CSV and do his own analysis.
If you are just starting your data analytics journey, I always recommend Excel as the first tool to learn.
It will help you to understand how tabular data works.
LOOKUPS are like JOINS in SQL;
VSTACK is UNION in SQL;
and FILTER, SORT, GROUPBY are similar to Python functions.
By learning Excel, you are setting a foundation for other tools.
Excel might not be the trendiest and coolest tool in data analytics, but it is versatile, accessible, and universal.
π1
Guys, Big Announcement!
Iβm launching a Complete SQL Learning Series β designed for everyone β whether you're a beginner, intermediate, or someone preparing for data interviews.
This is a complete step-by-step journey β from scratch to advanced β filled with practical examples, relatable scenarios, and short quizzes after each topic to solidify your learning.
Hereβs the 5-Week Plan:
Week 1: SQL Fundamentals (No Prior Knowledge Needed)
- What is SQL? Real-world Use Cases
- Databases vs Tables
- SELECT Queries β The Heart of SQL
- Filtering Data with WHERE
- Sorting with ORDER BY
- Using DISTINCT and LIMIT
- Basic Arithmetic and Column Aliases
Week 2: Aggregations & Grouping
- COUNT, SUM, AVG, MIN, MAX β When and How
- GROUP BY β The Right Way
- HAVING vs WHERE
- Dealing with NULLs in Aggregations
- CASE Statements for Conditional Logic
*Week 3: Mastering JOINS & Relationships*
- Understanding Table Relationships (1-to-1, 1-to-Many)
- INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN
- Practical Examples with Two or More Tables
- SELF JOIN & CROSS JOIN β What, When & Why
- Common Join Mistakes & Fixes
Week 4: Advanced SQL Concepts
- Subqueries: Writing Queries Inside Queries
- CTEs (WITH Clause): Cleaner & More Readable SQL
- Window Functions: RANK, DENSE_RANK, ROW_NUMBER
- Using PARTITION BY and ORDER BY
- EXISTS vs IN: Performance and Use Cases
Week 5: Real-World Scenarios & Interview-Ready SQL
- Using SQL to Solve Real Business Problems
- SQL for Sales, Marketing, HR & Product Analytics
- Writing Clean, Efficient & Complex Queries
- Most Common SQL Interview Questions like:
βFind the second highest salaryβ
βDetect duplicates in a tableβ
βCalculate running totalsβ
βIdentify top N products per categoryβ
- Practice Challenges Based on Real Interviews
React with β€οΈ if you're ready for this series
Join our WhatsApp channel to access it: https://whatsapp.com/channel/0029VanC5rODzgT6TiTGoa1v/1075
Iβm launching a Complete SQL Learning Series β designed for everyone β whether you're a beginner, intermediate, or someone preparing for data interviews.
This is a complete step-by-step journey β from scratch to advanced β filled with practical examples, relatable scenarios, and short quizzes after each topic to solidify your learning.
Hereβs the 5-Week Plan:
Week 1: SQL Fundamentals (No Prior Knowledge Needed)
- What is SQL? Real-world Use Cases
- Databases vs Tables
- SELECT Queries β The Heart of SQL
- Filtering Data with WHERE
- Sorting with ORDER BY
- Using DISTINCT and LIMIT
- Basic Arithmetic and Column Aliases
Week 2: Aggregations & Grouping
- COUNT, SUM, AVG, MIN, MAX β When and How
- GROUP BY β The Right Way
- HAVING vs WHERE
- Dealing with NULLs in Aggregations
- CASE Statements for Conditional Logic
*Week 3: Mastering JOINS & Relationships*
- Understanding Table Relationships (1-to-1, 1-to-Many)
- INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN
- Practical Examples with Two or More Tables
- SELF JOIN & CROSS JOIN β What, When & Why
- Common Join Mistakes & Fixes
Week 4: Advanced SQL Concepts
- Subqueries: Writing Queries Inside Queries
- CTEs (WITH Clause): Cleaner & More Readable SQL
- Window Functions: RANK, DENSE_RANK, ROW_NUMBER
- Using PARTITION BY and ORDER BY
- EXISTS vs IN: Performance and Use Cases
Week 5: Real-World Scenarios & Interview-Ready SQL
- Using SQL to Solve Real Business Problems
- SQL for Sales, Marketing, HR & Product Analytics
- Writing Clean, Efficient & Complex Queries
- Most Common SQL Interview Questions like:
βFind the second highest salaryβ
βDetect duplicates in a tableβ
βCalculate running totalsβ
βIdentify top N products per categoryβ
- Practice Challenges Based on Real Interviews
React with β€οΈ if you're ready for this series
Join our WhatsApp channel to access it: https://whatsapp.com/channel/0029VanC5rODzgT6TiTGoa1v/1075
β€4π2π1π1
Complete SQL guide for Data Analytics
1. Introduction to SQL
What is SQL?
β’ SQL (Structured Query Language) is a domain-specific language used for managing and manipulating relational databases. It allows you to interact with data by querying, inserting, updating, and deleting records in a database.
β’ SQL is essential for Data Analytics because it enables analysts to retrieve and manipulate data for analysis, reporting, and decision-making.
Applications in Data Analytics
β’ Data Retrieval: SQL is used to pull data from databases for analysis.
β’ Data Transformation: SQL helps clean, aggregate, and transform data into a usable format for analysis.
β’ Reporting: SQL can be used to create reports by summarizing data or applying business rules.
β’ Data Modeling: SQL helps in preparing datasets for further analysis or machine learning.
2. SQL Basics
Data Types
SQL supports various data types that define the kind of data a column can hold:
β’ Numeric Data Types:
β’ INT: Integer numbers, e.g., 123.
β’ DECIMAL(p,s): Exact numbers with a specified precision and scale, e.g., DECIMAL(10,2) for numbers like 12345.67.
β’ FLOAT: Approximate numbers, e.g., 123.456.
β’ String Data Types:
β’ CHAR(n): Fixed-length strings, e.g., CHAR(10) will always use 10 characters.
β’ VARCHAR(n): Variable-length strings, e.g., VARCHAR(50) can store up to 50 characters.
β’ TEXT: Long text data, e.g., descriptions or long notes.
β’ Date/Time Data Types:
β’ DATE: Stores date values, e.g., 2024-12-01.
β’ DATETIME: Stores both date and time, e.g., 2024-12-01 12:00:00.
Creating and Modifying Tables
You can create, alter, and drop tables using SQL commands:
Data Insertion, Updating, and Deletion
SQL allows you to manipulate data using INSERT, UPDATE, and DELETE commands:
3. Data Retrieval
SELECT Statement
The SELECT statement is used to retrieve data from a database:
Filtering Data with WHERE
The WHERE clause filters data based on specific conditions:
Sorting Data with ORDER BY
The ORDER BY clause sorts the result set by one or more columns:
Aliasing
You can use aliases to rename columns or tables for clarity:
4. Aggregate Functions
Aggregate functions perform calculations on a set of values and return a single result.
Common Aggregate Functions
GROUP BY and HAVING
β’ GROUP BY is used to group rows sharing the same value in a column.
β’ HAVING filters groups based on aggregate conditions.
5. Joins
Joins are used to combine rows from two or more tables based on related columns.
Types of Joins
1. Introduction to SQL
What is SQL?
β’ SQL (Structured Query Language) is a domain-specific language used for managing and manipulating relational databases. It allows you to interact with data by querying, inserting, updating, and deleting records in a database.
β’ SQL is essential for Data Analytics because it enables analysts to retrieve and manipulate data for analysis, reporting, and decision-making.
Applications in Data Analytics
β’ Data Retrieval: SQL is used to pull data from databases for analysis.
β’ Data Transformation: SQL helps clean, aggregate, and transform data into a usable format for analysis.
β’ Reporting: SQL can be used to create reports by summarizing data or applying business rules.
β’ Data Modeling: SQL helps in preparing datasets for further analysis or machine learning.
2. SQL Basics
Data Types
SQL supports various data types that define the kind of data a column can hold:
β’ Numeric Data Types:
β’ INT: Integer numbers, e.g., 123.
β’ DECIMAL(p,s): Exact numbers with a specified precision and scale, e.g., DECIMAL(10,2) for numbers like 12345.67.
β’ FLOAT: Approximate numbers, e.g., 123.456.
β’ String Data Types:
β’ CHAR(n): Fixed-length strings, e.g., CHAR(10) will always use 10 characters.
β’ VARCHAR(n): Variable-length strings, e.g., VARCHAR(50) can store up to 50 characters.
β’ TEXT: Long text data, e.g., descriptions or long notes.
β’ Date/Time Data Types:
β’ DATE: Stores date values, e.g., 2024-12-01.
β’ DATETIME: Stores both date and time, e.g., 2024-12-01 12:00:00.
Creating and Modifying Tables
You can create, alter, and drop tables using SQL commands:
-- Create a table with columns for ID, name, salary, and hire date
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE
);
-- Alter an existing table to add a new column for department
ALTER TABLE employees ADD department VARCHAR(50);
-- Drop a table (delete it from the database)
DROP TABLE employees;Data Insertion, Updating, and Deletion
SQL allows you to manipulate data using INSERT, UPDATE, and DELETE commands:
-- Insert a new employee record
INSERT INTO employees (id, name, salary, hire_date, department)
VALUES (1, 'Alice', 75000.00, '2022-01-15', 'HR');
-- Update the salary of employee with id 1
UPDATE employees
SET salary = 80000
WHERE id = 1;
-- Delete the employee record with id 1
DELETE FROM employees WHERE id = 1;3. Data Retrieval
SELECT Statement
The SELECT statement is used to retrieve data from a database:
SELECT * FROM employees; -- Retrieve all columns
SELECT name, salary FROM employees; -- Retrieve specific columnsFiltering Data with WHERE
The WHERE clause filters data based on specific conditions:
SELECT * FROM employees
WHERE salary > 60000 AND department = 'HR'; -- Filter records based on salary and departmentSorting Data with ORDER BY
The ORDER BY clause sorts the result set by one or more columns:
SELECT * FROM employees
ORDER BY salary DESC; -- Sort by salary in descending orderAliasing
You can use aliases to rename columns or tables for clarity:
SELECT name AS employee_name, salary AS monthly_salary FROM employees;4. Aggregate Functions
Aggregate functions perform calculations on a set of values and return a single result.
Common Aggregate Functions
SELECT COUNT(*) AS total_employees, AVG(salary) AS average_salary
FROM employees; -- Count total employees and calculate the average salaryGROUP BY and HAVING
β’ GROUP BY is used to group rows sharing the same value in a column.
β’ HAVING filters groups based on aggregate conditions.
-- Find average salary by department
SELECT department, AVG(salary) AS average_salary
FROM employees
GROUP BY department;
-- Filter groups with more than 5 employees
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;5. Joins
Joins are used to combine rows from two or more tables based on related columns.
Types of Joins
β€1π₯1
β’ INNER JOIN: Returns rows that have matching values in both tables.
β’ LEFT JOIN: Returns all rows from the left table and matched rows from the right table. If no match, returns NULL.
β’ RIGHT JOIN: Returns all rows from the right table and matched rows from the left table. If no match, returns NULL.
β’ FULL OUTER JOIN: Returns all rows when there is a match in one of the tables.
6. Subqueries and Nested Queries
Subqueries are queries embedded inside other queries. They can be used in the SELECT, FROM, and WHERE clauses.
Correlated Subqueries
A correlated subquery references columns from the outer query.
Using Subqueries in SELECT
You can also use subqueries in the SELECT statement:
7. Advanced SQL
Window Functions
Window functions perform calculations across a set of table rows related to the current row. They do not collapse rows like GROUP BY.
Common Table Expressions (CTEs)
A CTE is a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement.
8. Data Transformation and Cleaning
CASE Statements
The CASE statement allows you to perform conditional logic within SQL queries.
String Functions
SQL offers several functions to manipulate strings:
Date and Time Functions
SQL allows you to work with date and time values:
9. Database Management
Indexing
Indexes improve query performance by allowing faster retrieval of rows.
Views
A view is a virtual table based on the result of a query. It simplifies complex queries by allowing you to reuse the logic.
Transactions
A transaction ensures that a series of SQL operations are completed successfully. If any part fails, the entire transaction can be rolled back to maintain data integrity.
Best SQL Interview Resources
SELECT e.name, e.salary, d.department_name
FROM employees e
INNER JOIN departments d ON e.department = d.department_id;β’ LEFT JOIN: Returns all rows from the left table and matched rows from the right table. If no match, returns NULL.
SELECT e.name, e.salary, d.department_name
FROM employees e
LEFT JOIN departments d ON e.department = d.department_id;β’ RIGHT JOIN: Returns all rows from the right table and matched rows from the left table. If no match, returns NULL.
SELECT e.name, e.salary, d.department_name
FROM employees e
RIGHT JOIN departments d ON e.department = d.department_id;β’ FULL OUTER JOIN: Returns all rows when there is a match in one of the tables.
SELECT e.name, e.salary, d.department_name
FROM employees e
FULL OUTER JOIN departments d ON e.department = d.department_id;6. Subqueries and Nested Queries
Subqueries are queries embedded inside other queries. They can be used in the SELECT, FROM, and WHERE clauses.
Correlated Subqueries
A correlated subquery references columns from the outer query.
-- Find employees with salaries above the average salary of their department
SELECT name, salary
FROM employees e1
WHERE salary > (SELECT AVG(salary)
FROM employees e2
WHERE e1.department = e2.department);Using Subqueries in SELECT
You can also use subqueries in the SELECT statement:
SELECT name,
(SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;7. Advanced SQL
Window Functions
Window functions perform calculations across a set of table rows related to the current row. They do not collapse rows like GROUP BY.
-- Rank employees by salary within each department
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;Common Table Expressions (CTEs)
A CTE is a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement.
-- Calculate department-wise average salary using a CTE
WITH avg_salary_cte AS (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
)
SELECT e.name, e.salary, a.avg_salary
FROM employees e
JOIN avg_salary_cte a ON e.department = a.department;8. Data Transformation and Cleaning
CASE Statements
The CASE statement allows you to perform conditional logic within SQL queries.
-- Categorize employees based on salary
SELECT name,
CASE
WHEN salary < 50000 THEN 'Low'
WHEN salary BETWEEN 50000 AND 100000 THEN 'Medium'
ELSE 'High'
END AS salary_category
FROM employees;String Functions
SQL offers several functions to manipulate strings:
-- Concatenate first and last names
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
-- Trim extra spaces from a string
SELECT TRIM(name) FROM employees;Date and Time Functions
SQL allows you to work with date and time values:
-- Calculate tenure in days
SELECT name, DATEDIFF(CURDATE(), hire_date) AS days_tenure
FROM employees;9. Database Management
Indexing
Indexes improve query performance by allowing faster retrieval of rows.
-- Create an index on the department column for faster lookups
CREATE INDEX idx_department ON employees(department);Views
A view is a virtual table based on the result of a query. It simplifies complex queries by allowing you to reuse the logic.
-- Create a view for high-salary employees
CREATE VIEW high_salary_employees AS
SELECT name, salary
FROM employees
WHERE salary > 100000;
-- Query the view
SELECT * FROM high_salary_employees;Transactions
A transaction ensures that a series of SQL operations are completed successfully. If any part fails, the entire transaction can be rolled back to maintain data integrity.
-- -- Transaction example
START TRANSACTION;
UPDATE employees SET salary = salary + 5000 WHERE department = 'HR';
DELETE FROM employees WHERE id = 10;
COMMIT; -- Commit the transaction if allBest SQL Interview Resources
π5
π§ Technologies for Data Analysts!
π Data Manipulation & Analysis
βͺοΈ Excel β Spreadsheet Data Analysis & Visualization
βͺοΈ SQL β Structured Query Language for Data Extraction
βͺοΈ Pandas (Python) β Data Analysis with DataFrames
βͺοΈ NumPy (Python) β Numerical Computing for Large Datasets
βͺοΈ Google Sheets β Online Collaboration for Data Analysis
π Data Visualization
βͺοΈ Power BI β Business Intelligence & Dashboarding
βͺοΈ Tableau β Interactive Data Visualization
βͺοΈ Matplotlib (Python) β Plotting Graphs & Charts
βͺοΈ Seaborn (Python) β Statistical Data Visualization
βͺοΈ Google Data Studio β Free, Web-Based Visualization Tool
π ETL (Extract, Transform, Load)
βͺοΈ SQL Server Integration Services (SSIS) β Data Integration & ETL
βͺοΈ Apache NiFi β Automating Data Flows
βͺοΈ Talend β Data Integration for Cloud & On-premises
π§Ή Data Cleaning & Preparation
βͺοΈ OpenRefine β Clean & Transform Messy Data
βͺοΈ Pandas Profiling (Python) β Data Profiling & Preprocessing
βͺοΈ DataWrangler β Data Transformation Tool
π¦ Data Storage & Databases
βͺοΈ SQL β Relational Databases (MySQL, PostgreSQL, MS SQL)
βͺοΈ NoSQL (MongoDB) β Flexible, Schema-less Data Storage
βͺοΈ Google BigQuery β Scalable Cloud Data Warehousing
βͺοΈ Redshift β Amazonβs Cloud Data Warehouse
βοΈ Data Automation
βͺοΈ Alteryx β Data Blending & Advanced Analytics
βͺοΈ Knime β Data Analytics & Reporting Automation
βͺοΈ Zapier β Connect & Automate Data Workflows
π Advanced Analytics & Statistical Tools
βͺοΈ R β Statistical Computing & Analysis
βͺοΈ Python (SciPy, Statsmodels) β Statistical Modeling & Hypothesis Testing
βͺοΈ SPSS β Statistical Software for Data Analysis
βͺοΈ SAS β Advanced Analytics & Predictive Modeling
π Collaboration & Reporting
βͺοΈ Power BI Service β Online Sharing & Collaboration for Dashboards
βͺοΈ Tableau Online β Cloud-Based Visualization & Sharing
βͺοΈ Google Analytics β Web Traffic Data Insights
βͺοΈ Trello / JIRA β Project & Task Management for Data Projects
Data-Driven Decisions with the Right Tools!
React β€οΈ for more
π Data Manipulation & Analysis
βͺοΈ Excel β Spreadsheet Data Analysis & Visualization
βͺοΈ SQL β Structured Query Language for Data Extraction
βͺοΈ Pandas (Python) β Data Analysis with DataFrames
βͺοΈ NumPy (Python) β Numerical Computing for Large Datasets
βͺοΈ Google Sheets β Online Collaboration for Data Analysis
π Data Visualization
βͺοΈ Power BI β Business Intelligence & Dashboarding
βͺοΈ Tableau β Interactive Data Visualization
βͺοΈ Matplotlib (Python) β Plotting Graphs & Charts
βͺοΈ Seaborn (Python) β Statistical Data Visualization
βͺοΈ Google Data Studio β Free, Web-Based Visualization Tool
π ETL (Extract, Transform, Load)
βͺοΈ SQL Server Integration Services (SSIS) β Data Integration & ETL
βͺοΈ Apache NiFi β Automating Data Flows
βͺοΈ Talend β Data Integration for Cloud & On-premises
π§Ή Data Cleaning & Preparation
βͺοΈ OpenRefine β Clean & Transform Messy Data
βͺοΈ Pandas Profiling (Python) β Data Profiling & Preprocessing
βͺοΈ DataWrangler β Data Transformation Tool
π¦ Data Storage & Databases
βͺοΈ SQL β Relational Databases (MySQL, PostgreSQL, MS SQL)
βͺοΈ NoSQL (MongoDB) β Flexible, Schema-less Data Storage
βͺοΈ Google BigQuery β Scalable Cloud Data Warehousing
βͺοΈ Redshift β Amazonβs Cloud Data Warehouse
βοΈ Data Automation
βͺοΈ Alteryx β Data Blending & Advanced Analytics
βͺοΈ Knime β Data Analytics & Reporting Automation
βͺοΈ Zapier β Connect & Automate Data Workflows
π Advanced Analytics & Statistical Tools
βͺοΈ R β Statistical Computing & Analysis
βͺοΈ Python (SciPy, Statsmodels) β Statistical Modeling & Hypothesis Testing
βͺοΈ SPSS β Statistical Software for Data Analysis
βͺοΈ SAS β Advanced Analytics & Predictive Modeling
π Collaboration & Reporting
βͺοΈ Power BI Service β Online Sharing & Collaboration for Dashboards
βͺοΈ Tableau Online β Cloud-Based Visualization & Sharing
βͺοΈ Google Analytics β Web Traffic Data Insights
βͺοΈ Trello / JIRA β Project & Task Management for Data Projects
Data-Driven Decisions with the Right Tools!
React β€οΈ for more
β€5π1