Above attached are 150 SQL queries for practice ❤️
Latest job opportunities for data analysts, business analysts, and data scientists
👇👇
https://www.linkedin.com/posts/sql-analysts_data-engineering-data-analytics-and-data-activity-7136926271753666560-DtGH?utm_source=share&utm_medium=member_android
Which of the following is not a Python library?
Anonymous Quiz
- Pandas: 6%
- NumPy: 4%
- Sciborn: 75% (correct)
- Seaborn: 16%
Q1: How do you ensure data consistency and integrity in a data warehousing environment?
Ans: I implement data validation checks, use constraints like primary and foreign keys, and ensure that ETL processes have error-handling mechanisms. Regular audits and data reconciliation processes are also set up to ensure data accuracy and consistency.
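A hedged sketch of what such validation checks can look like inside an ETL step; the table and column names (order_id, amount) are invented for illustration:

```python
import pandas as pd

def validate_batch(df: pd.DataFrame) -> list[str]:
    """Run basic consistency checks on an incoming batch and
    return a list of human-readable problems found."""
    problems = []
    # Primary-key-style check: order_id must be present and unique
    if df["order_id"].isna().any():
        problems.append("NULL order_id values found")
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values found")
    # Domain check: amounts should never be negative
    if (df["amount"] < 0).any():
        problems.append("negative amounts found")
    return problems

# Reconciliation: totals in the warehouse should match the source system
def reconcile(source_total: float, warehouse_total: float, tol: float = 0.01) -> bool:
    return abs(source_total - warehouse_total) <= tol
```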
Q2: Describe a situation where you had to design a star schema for a data warehousing project.
Ans: For a retail sales data warehousing project, I designed a star schema with a central fact table containing sales transactions. Surrounding this were dimension tables like Products, Stores, Time, and Customers. This structure allowed for efficient querying and reporting of sales metrics across various dimensions.
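A minimal star-schema sketch in SQLite mirroring the retail example; the table and column names are illustrative, not the actual project schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

-- Central fact table: one row per sales transaction, keyed to each dimension
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    amount     REAL
);
""")
```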
Q3: How would you use data analytics to assess credit risk for loan applicants?
Ans: I'd analyze the applicant's financial history, including credit score, income, employment stability, and existing debts. Using predictive modeling, I'd assess the probability of default based on historical data of similar applicants. This would help in making informed lending decisions.
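A toy sketch of such a predictive model, here a scikit-learn logistic regression on synthetic features; a real credit-risk model would need proper data, feature engineering, and validation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Features per applicant: credit_score, income, existing_debt (all synthetic)
X = rng.normal(size=(500, 3))
# Synthetic label: default more likely with low score and high debt
y = ((-X[:, 0] + X[:, 2] + rng.normal(size=500)) > 1).astype(int)

model = LogisticRegression().fit(X, y)
applicant = [[-0.5, 0.2, 1.1]]               # one new applicant's features
print(model.predict_proba(applicant)[0][1])  # estimated probability of default
```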
Q4: Describe a situation where you had to ensure data security for sensitive financial data.
Ans: While working on a project involving customer transaction data, I ensured that all data was encrypted both at rest and in transit. I also implemented role-based access controls, ensuring that only authorized personnel could access specific data sets. Regular audits and penetration tests were conducted to identify and rectify potential vulnerabilities.
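For the encryption-at-rest part, a minimal sketch using the cryptography package's Fernet recipe; key handling is simplified here and would normally live in a secrets manager:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice: stored in a secrets manager, not in code
f = Fernet(key)

token = f.encrypt(b"card=4242..., amount=19.99")  # ciphertext safe to write to disk
print(f.decrypt(token))                           # recoverable only with the key
```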
Data Analytics Interview Topics in a structured way:
🔵 Python:
- Data Structures: lists, tuples, dictionaries, sets
- Pandas: data manipulation (DataFrame operations, merging, reshaping)
- NumPy: numeric computing, arrays
- Visualization: Matplotlib, Seaborn for creating charts
🔵 SQL:
- Basic: SELECT, WHERE, JOIN, GROUP BY, ORDER BY
- Advanced: subqueries, nested queries, window functions
- DBMS: creating tables, altering schemas, indexing
- Joins: inner join, outer join, left/right join
- Data Manipulation: UPDATE, DELETE, INSERT statements
- Aggregate Functions: SUM, AVG, COUNT, MAX, MIN
🔵 Excel:
- Formulas & Functions: VLOOKUP, HLOOKUP, IF, SUMIF, COUNTIF
- Data Cleaning: removing duplicates, handling errors, text-to-columns
- PivotTables
- Charts and Graphs
- What-If Analysis: Scenario Manager, Goal Seek, Solver
🔵 Power BI:
- Data Modeling: creating relationships between datasets
- Transformation: cleaning & shaping data using Power Query Editor
- Visualization: creating interactive reports and dashboards
- DAX (Data Analysis Expressions): formulas for calculated columns and measures
- Publishing and sharing reports, scheduling data refresh
🔵 Statistics Fundamentals:
- Mean, median, mode
- Variance, standard deviation
- Probability distributions
- Hypothesis testing, p-values, confidence intervals
🔵 Data Manipulation and Cleaning:
- Data preprocessing techniques (handling missing values, outliers)
- Data normalization and standardization
- Data transformation
- Handling categorical data
🔵 Data Visualization:
- Chart types (bar, line, scatter, histogram, boxplot)
- Data visualization libraries (Matplotlib, Seaborn, ggplot)
- Effective data storytelling through visualization
Also showcase these skills through a data portfolio if possible.
Like for more content like this 👍
Data Analytics Interview Questions
Q1: Describe a situation where you had to clean a messy dataset. What steps did you take?
Ans: I encountered a dataset with missing values, duplicates, and inconsistent formats. I used Python's Pandas library to identify and handle missing values, standardized data formats using regular expressions, and removed duplicates. I also validated the cleaned data against known benchmarks to ensure accuracy.
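A compact illustration of those steps with Pandas; the columns and values are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({
    "name":  [" Alice ", "BOB", "Alice", None],
    "phone": ["123-456", "(123) 456", "123-456", "789 012"],
})

df["name"] = df["name"].str.strip().str.title()               # standardize text format
df["phone"] = df["phone"].str.replace(r"\D", "", regex=True)  # digits only, via regex
df = df.dropna(subset=["name"])                               # handle missing values
df = df.drop_duplicates()                                     # remove duplicates
print(df)
```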
Q2: How do you handle outliers in a dataset?
Ans: I start by visualizing the data using box plots or scatter plots to identify potential outliers. Then, depending on the nature of the data and the problem context, I might cap the outliers, transform the data, or even remove them if they're due to errors.
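One common capping approach, sketched with the 1.5×IQR rule (the numbers are made up):

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 300])   # 300 is a likely outlier
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(s[(s < lower) | (s > upper)])        # inspect the flagged points first
s_capped = s.clip(lower, upper)            # cap rather than drop
```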
Q3: How would you use data to suggest optimal pricing strategies to Airbnb hosts?
Ans: I'd analyze factors like location, property type, amenities, local events, and historical booking rates. Using regression analysis, I'd model the relationship between these factors and pricing to suggest an optimal price range. Additionally, analyzing competitor pricing in the area can provide insights into market rates.
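A toy version of that regression with scikit-learn; the listing features and coefficients below are invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Features per listing: bedrooms, distance_to_center_km, review_score (synthetic)
X = rng.uniform([1, 0, 3], [5, 20, 5], size=(300, 3))
price = 40 * X[:, 0] - 3 * X[:, 1] + 25 * X[:, 2] + rng.normal(0, 10, 300)

model = LinearRegression().fit(X, price)
print(model.predict([[2, 5.0, 4.5]]))   # suggested price for one listing
```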
Q4: Describe a situation where you used data to improve the user experience on the Airbnb platform.
Ans: While analyzing user feedback and platform interaction data, I noticed that users often had difficulty navigating the booking process. Based on this, I suggested streamlining the booking steps and providing clearer instructions. A/B testing confirmed that these changes led to a higher conversion rate and improved user feedback.
Q1: How would you handle real-time data streaming for analyzing user listening patterns?
Ans: I'd use platforms like Apache Kafka for real-time data ingestion. Using Python, I'd process this stream to identify real-time patterns and store aggregated data for further analysis.
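A sketch of the ingestion side using the kafka-python package; the topic name and server address are placeholders for whatever a real cluster would use:

```python
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "listening-events",                  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

play_counts = {}
for message in consumer:                 # blocks, consuming the stream as it arrives
    event = message.value
    track = event.get("track_id")
    play_counts[track] = play_counts.get(track, 0) + 1
```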
Q2: Describe a situation where you had to use time series analysis to forecast a trend.
Ans: I analyzed monthly active users to forecast future growth. Using Python's statsmodels, I applied ARIMA modeling to the time series data and provided a forecast for the next six months.
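A minimal statsmodels ARIMA sketch on synthetic monthly data; the (1, 1, 1) order is illustrative, not tuned:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2022-01-01", periods=36, freq="MS")
users = pd.Series(1000 + np.arange(36) * 50 + np.random.normal(0, 30, 36), index=idx)

fit = ARIMA(users, order=(1, 1, 1)).fit()
print(fit.forecast(steps=6))   # six-month forecast
```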
Q3: How would you segment and analyze user behavior based on their music preferences?
Ans: I'd cluster users based on their listening history using unsupervised machine learning techniques like K-means clustering. This would help in creating personalized playlists or recommendations.
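A K-means sketch with scikit-learn on made-up listening features; k=3 is arbitrary here and would normally be chosen with something like the elbow method:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 4)   # 200 users x 4 genre-hours features (synthetic)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])   # cluster assignment per user
```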
Q4: How do you handle missing or incomplete data in user listening logs?
Ans: I'd use imputation methods based on the nature of the missing data. For instance, if a user's listening time is missing, I might impute it based on their average listening time or use collaborative filtering methods to estimate it based on similar users.
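Imputing a missing listening time with the user's own average, sketched with invented column names:

```python
import pandas as pd

logs = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "minutes": [30.0, None, 45.0, 50.0],
})
# Fill each user's missing values with that user's mean listening time
logs["minutes"] = logs.groupby("user_id")["minutes"].transform(
    lambda s: s.fillna(s.mean())
)
print(logs)
```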
1. What is a Self-Join?
A self-join is a join in which a table is joined with itself, making it a unary relationship. Each row of the table is matched against other rows of the same table. A self-join is therefore mostly used to combine and compare rows within a single database table.
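A classic illustration, run here through Python's sqlite3: an employees table joined to itself to pair each employee with their manager (schema invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cho', 1);
""")
rows = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees m ON e.manager_id = m.id   -- the table joined to itself
""").fetchall()
print(rows)   # [('Ben', 'Ana'), ('Cho', 'Ana')]
```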
2. What is OLTP?
OLTP, or online transaction processing, allows large numbers of users to execute high volumes of database transactions in real time, usually over the internet. A database transaction occurs whenever data in a database is inserted, updated, deleted, or queried.
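The unit of work an OLTP system processes is the transaction, which either fully commits or fully rolls back; a minimal SQLite illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])

try:
    with conn:   # this block commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
except sqlite3.Error:
    print("transfer rolled back")
```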
3. What is the difference between joining and blending in Tableau?
Joining is used when you combine data from the same source, for example, worksheets in an Excel file or tables in an Oracle database, while blending requires two completely defined data sources in your report.
4. How do you prevent someone from copying cells from your worksheet in Excel?
If you want to protect your worksheet from being copied, go to Review > Protect Sheet and set a password; once protected, the worksheet cannot be copied without it.
5. What are the different integrity rules present in the DBMS?
The different integrity rules present in DBMS are as follows:
Entity Integrity: This rule states that the value of the primary key can never be NULL, so every row must have a value in the primary key column.
Referential Integrity: This rule states that the value of the foreign key must either be NULL or match a primary key value in the referenced relation.
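Both rules can be enforced by the database itself; a SQLite sketch (SQLite needs foreign keys switched on explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,                       -- entity integrity
    dept_id INTEGER REFERENCES departments(dept_id)    -- referential integrity
);
INSERT INTO departments VALUES (1, 'Sales');
""")
conn.execute("INSERT INTO employees VALUES (1, 1)")      # OK: dept 1 exists
try:
    conn.execute("INSERT INTO employees VALUES (2, 99)") # no such department
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```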
Q1: How would you analyze data to understand user connection patterns on a professional network?
Ans: I'd use graph databases like Neo4j for social network analysis. By analyzing connection patterns, I can identify influencers or isolated communities.
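The answer names Neo4j; as a lighter stand-in showing the same idea, a networkx sketch that surfaces highly connected users and isolated communities (the edges are made up):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("a", "d"), ("e", "f")])

centrality = nx.degree_centrality(G)                  # influencer signal
influencers = sorted(centrality, key=centrality.get, reverse=True)[:3]
communities = list(nx.connected_components(G))        # isolated groups
print(influencers, communities)
```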
Q2: Describe a challenging data visualization you created to represent user engagement metrics.
Ans: I visualized multi-dimensional data showing user engagement across features, regions, and time using tools like D3.js, creating an interactive dashboard with drill-down capabilities.
Q3: How would you identify and target passive job seekers on LinkedIn?
Ans: I'd analyze user behavior patterns, like increased profile updates, frequent visits to job postings, or engagement with career-related content, to identify potential passive job seekers.
Q4: How do you measure the effectiveness of a new feature launched on LinkedIn?
Ans: I'd set up A/B tests, comparing user engagement metrics between those who have access to the new feature and a control group. I'd then analyze metrics like time spent, feature usage frequency, and overall platform engagement to measure effectiveness.
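A sketch of the significance check behind such an A/B comparison, using statsmodels' two-proportion z-test; the counts are invented:

```python
from statsmodels.stats.proportion import proportions_ztest

engaged = [420, 380]     # users who engaged: treatment vs. control
exposed = [1000, 1000]   # users in each group

z_stat, p_value = proportions_ztest(engaged, exposed)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")   # p < 0.05 -> likely a real effect
```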
33 companies that are CURRENTLY HIRING for 100% REMOTE JOBS
👇👇
https://www.linkedin.com/posts/sql-analysts_jobboard-remotehiring-remoteworking-activity-7141483435960832000-2k4s?utm_source=share&utm_medium=member_android
Like this LinkedIn post and bookmark it for your future reference
Hey everyone!
May I request you all to FOLLOW our Data Analytics page
Here's the exclusive link
👉 https://www.linkedin.com/company/sql-analysts/
This is an official LinkedIn page for free courses & updates!
Including our giveaways, sessions & much more!
Most Awaited Giveaway for Free Python Courses with Projects
👇👇
https://www.linkedin.com/posts/sql-analysts_python-viral-pythonprogramming-activity-7141826568485605376-tf7V?utm_source=share&utm_medium=member_android
Complete Syllabus for Data Analytics interview:
SQL:
1. Basic
- SELECT statements with WHERE, ORDER BY, GROUP BY, HAVING
- Basic JOINS (INNER, LEFT, RIGHT, FULL)
- Creating and using simple databases and tables
2. Intermediate
- Aggregate functions (COUNT, SUM, AVG, MAX, MIN)
- Subqueries and nested queries
- Common Table Expressions (WITH clause)
- CASE statements for conditional logic in queries
3. Advanced
- Advanced JOIN techniques (self-join, non-equi join)
- Window functions (OVER, PARTITION BY, ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG); see the sketch after this SQL section
- Query optimization with indexing
- Data manipulation (INSERT, UPDATE, DELETE)
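The window-function sketch promised above, runnable with Python's sqlite3 (requires SQLite 3.25+); the schema is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO emp VALUES ('Ana','IT',90), ('Ben','IT',80), ('Cho','HR',70);
""")
# Rank within each department without collapsing rows the way GROUP BY would
for row in conn.execute("""
    SELECT name, dept, salary,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
    FROM emp
"""):
    print(row)
```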
Python:
1. Basic
- Syntax, variables, data types (integers, floats, strings, booleans)
- Control structures (if-else, for and while loops)
- Basic data structures (lists, dictionaries, sets, tuples)
- Functions, lambda functions, error handling (try-except)
- Modules and packages
2. Pandas & NumPy
- Creating and manipulating DataFrames and Series
- Indexing, selecting, and filtering data
- Handling missing data (fillna, dropna)
- Data aggregation with groupby, summarizing data
- Merging, joining, and concatenating datasets
3. Basic Visualization
- Basic plotting with Matplotlib (line plots, bar plots, histograms)
- Visualization with Seaborn (scatter plots, box plots, pair plots)
- Customizing plots (sizes, labels, legends, color palettes)
- Introduction to interactive visualizations (e.g., Plotly)
Excel:
1. Basic
- Cell operations, basic formulas (SUMIFS, COUNTIFS, AVERAGEIFS, IF, AND, OR, NOT, nested functions, etc.)
- Introduction to charts and basic data visualization
- Data sorting and filtering
- Conditional formatting
2. Intermediate
- Advanced formulas (V/XLOOKUP, INDEX-MATCH, nested IF)
- PivotTables and PivotCharts for summarizing data
- Data validation tools
- What-if analysis tools (Data Tables, Goal Seek)
3. Advanced
- Array formulas and advanced functions
- Data Model & Power Pivot
- Advanced Filter
- Slicers and Timelines in Pivot Tables
- Dynamic charts and interactive dashboards
Power BI:
1. Data Modeling
- Importing data from various sources
- Creating and managing relationships between different datasets
- Data modeling basics (star schema, snowflake schema)
2. Data Transformation
- Using Power Query for data cleaning and transformation
- Advanced data shaping techniques
- Calculated columns and measures using DAX
3. Data Visualization and Reporting
- Creating interactive reports and dashboards
- Visualizations (bar, line, pie charts, maps)
- Publishing and sharing reports, scheduling data refreshes
Statistics Fundamentals:
Mean, Median, Mode, Standard Deviation, Variance, Probability Distributions, Hypothesis Testing, P-values, Confidence Intervals, Correlation, Simple Linear Regression, Normal Distribution, Binomial Distribution, Poisson Distribution.
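A small scipy illustration of the hypothesis-testing and confidence-interval items, on synthetic data:

```python
import numpy as np
from scipy import stats

a = np.random.normal(10.0, 2.0, 100)   # e.g. control metric
b = np.random.normal(10.5, 2.0, 100)   # e.g. treatment metric

t_stat, p_value = stats.ttest_ind(a, b)          # two-sample hypothesis test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

mean, sem = a.mean(), stats.sem(a)
ci = stats.t.interval(0.95, df=len(a) - 1, loc=mean, scale=sem)
print("95% CI for mean of a:", ci)
```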
1. What do Tableau's sets and groups mean?
Data is grouped using sets and groups according to predefined criteria. The primary distinction between the two is that although a set can have only two options (either in or out), a group can divide the dataset into several groups. A user should decide which groups or sets to apply based on the conditions.
2. What do you mean by a Bag of Words (BOW)?
A Bag of Words is a text representation that describes the frequency with which words appear in a document; it is commonly used to turn word occurrences into features for training a classifier.
It involves two steps:
- Building a vocabulary of known terms.
- Measuring the presence of those known terms in each document.
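A Bag-of-Words sketch with scikit-learn's CountVectorizer, covering both steps:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat sat on the mat"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)      # step 1: learn the vocabulary

print(vectorizer.get_feature_names_out())    # known terms
print(counts.toarray())                      # step 2: word frequencies per document
```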
3. What are Nested Triggers?
Triggers can contain DML statements (INSERT, UPDATE, DELETE). When a trigger's DML in turn fires other triggers as a result of data modification, those triggers are called nested triggers.
4. What are the true positive rate and the false positive rate?
True positive rate (Recall): the percentage of actual positives that the model correctly captures.
TPR = TP / (TP + FN)
False positive rate: the percentage of actual negatives that the model incorrectly predicts as positive.
FPR = FP / (FP + TN)
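Computing both rates from a confusion matrix with scikit-learn; the labels below are made up:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # recall
fpr = fp / (fp + tn)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```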