Data Analyst Interview Resources
51.3K subscribers
254 photos
1 video
51 files
317 links
Join our telegram channel to learn how data analysis can reveal fascinating patterns, trends, and stories hidden within the numbers! πŸ“Š

For ads & suggestions: @love_data
Download Telegram
1. Give an overview of the fact and dimensions of the table?

Facts are numeric measures of data. They are stored in fact tables. Fact tables store that type of data that will be analyzed by dimension tables. Fact tables have foreign keys associating with dimension tables.
Dimensions are descriptive attributes of data. Those will be stored in the dimensions table. For example, customer’s information like name, number, and email will be stored in the dimension table.


2. Explain the limitation of context filters in Tableau?

Whenever we set a context filter, Tableau generates a temp table that needs to refresh each and every time, whenever the view is triggered. So, if the context filter is changed in the database, it needs to recompute the temp table, so the performance will be decreased.


3. What is the difference between published data and embedded data sources?

The published data source contains connection information that is independent of workbooks and can be used by multiple workbooks.
The embedded data source contains connection information but it is associated with the workbooks.


4. Explain the disaggregation and aggregation of data in Tableau?

Aggregation β†’ The process of summarizing the data and viewing a single numeric value is called aggregation. Example – sum/avg of salary for each employee
Disaggregation β†’The process of viewing each transaction for analyzing all the measures both dependently and independently. Example – individual salary transactions for each employee.
πŸ‘17❀3
Good way to Prepare for a Data Analyst Interview?

A mock interview is a practice interview that closely mimics a real one. In a mock interview, an experienced data analyst tests a less experienced person's knowledge and skills.


πŸ‘‰ Practice with Google for free

Google offers Interview Warmup. You can practice answering  questions and get quick feedback on your answers. This helps you get better and feel more confident for real interviews.

Like for more πŸ‘
πŸ‘30❀5πŸ₯°3
Python Most Important Interview Questions

Question 1: Calculate the average stock price for Company X over the last 6 months.

Question 2: Identify the month with the highest total sales for Company Y using their monthly sales data.

Question 3: Find the maximum and minimum stock price for Company Z on any given day in the last year.

Question 4: Create a column in the DataFrame showing the percentage change in stock price from the previous day for Company X.

Question 5: Determine the number of days when the stock price of Company Y was above its 30-day moving average. Question

6: Compare the average stock price of Companies X and Z in the first quarter of the year.

#Data#
----------------------------------------------
import pandas as pd
data = {   'Date': pd.date_range(start='2023-01-01', periods=180, freq='D'),   'CompanyX_StockPrice': pd.np.random.randint(50, 150, 180),   'CompanyY_Sales': pd.np.random.randint(20000, 50000, 180),   'CompanyZ_StockPrice': pd.np.random.randint(70, 200, 180) }

df = pd.DataFrame(data)
πŸ‘16πŸŽ‰1πŸ‘Œ1
Excel Interview Questions

Topics to Cover:
β€’ Data manipulation
β€’ Formulas and functions
β€’ Pivot tables
β€’ Data visualization

Questions and Answers:

1 Q: How do you use VLOOKUP in Excel?

A: VLOOKUP (Vertical Lookup) searches for a value in the first column of a range and returns a value in the same row from a specified column.

Syntax:
=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]).

2 Q: What is a Pivot Table and how is it useful?

A: A Pivot Table is a data summarization tool that is used in Excel. It allows you to automatically sort, count, and total data stored in one table and display the results in a second table showing the summarized data.

3 Q: How can you remove duplicates from a dataset in Excel?

A: You can remove duplicates by selecting the data range, going to the Data tab, and clicking on "Remove Duplicates". Excel will prompt you to select columns where duplicates should be checked.

4 Q: What is the use of the IF function in Excel?

A: The IF function checks a condition and returns one value if true and another value if false.

Syntax: =IF(logical_test, value_if_true, value_if_false).

5 Q: Explain how to create a chart in Excel.

A: To create a chart, select the data range, go to the Insert tab, choose the desired chart type (e.g., bar, line, pie), and customize the chart as needed using the Chart Tools.

Join for more: https://t.iss.one/excel_analyst
πŸ‘21❀6πŸ€”2
1. What are Query and Query language?

A query is nothing but a request sent to a database to retrieve data or information. The required data can be retrieved from a table or many tables in the database.

Query languages use various types of queries to retrieve data from databases. SQL, Datalog, and AQL are a few examples of query languages; however, SQL is known to be the widely used query language.



2. What are Superkey and candidate key?

A super key may be a single or a combination of keys that help to identify a record in a table. Know that Super keys can have one or more attributes, even though all the attributes are not necessary to identify the records.

A candidate key is the subset of Superkey, which can have one or more than one attributes to identify records in a table. Unlike Superkey, all the attributes of the candidate key must be helpful to identify the records.


3. What do you mean by buffer pool and mention its benefits?

A buffer pool in SQL is also known as a buffer cache. All the resources can store their cached data pages in a buffer pool. The size of the buffer pool can be defined during the configuration of an instance of SQL Server.
The following are the benefits of a buffer pool:

Increase in I/O performance
Reduction in I/O latency
Increase in transaction throughput
Increase in reading performance


4. What is the difference between Zero and NULL values in SQL?

When a field in a column doesn’t have any value, it is said to be having a NULL value. Simply put, NULL is the blank field in a table. It can cancel be considered as an unassigned, unknown, or unavailable value. On the contrary, zero is a number, and it is an available, assigned, and known value.
πŸ‘13❀3πŸ‘Œ1
These 10 tips will make you feel like an expert and increase your productivity 100X:

1. Excel Keyboard Shortcuts:

These save a lot of time.

For example, you can press "Ctrl+C" to copy, "Ctrl+V" to paste, and "Ctrl+Z" to undo.

There are many more, so check out this cheatsheet:

Excel for Data Analysis
πŸ‘6
Want to become a data analyst?

Stage 1 – Excel
Stage 2 – SQL + Project
Stage 3 – Python (Pandas, NumPy) + Project
Stage 4 – Data Visualization (Matplotlib, Seaborn) + Project
Stage 5 – Statistics + Project
Stage 6 – Machine Learning (Scikit-learn) + Project
Stage 7 – Big Data Tools (Hadoop, Spark) + Project

πŸ† – DataAnalytics
πŸ‘36πŸ‘Œ6❀4πŸ€”3πŸ‘1
Here are some Statistics Interview Questions for Data analyst Interview

Que 1. What Is the Difference Between Inferential Statistics and Descriptive Statistics?
Ans 1. The difference between inferential statistics and descriptive statistics is that inferential statistics are used to draw conclusions about a population based on the data you’ve collected. In contrast, descriptive statistics are used to summarize your data.

Que 2. What Is the Difference Between Quantitative Data and Qualitative Data?
Ans 2. Quantitative data is numerical data that can be measured, counted, and expressed as a percentage. Qualitative data is non-numerical information that describes subjective experiences or opinions about an event or topic.

Que 3. How Do You Calculate Range and Interquartile Range?
Ans 3. Range and interquartile range are two ways to calculate the spread of data. The range is the difference between the highest and lowest value in a set of data. The interquartile range is the difference between the 75th percentile and 25th percentile of a set of data.

Que 4. Explain Pareto Principle
Ans 4. The Pareto Principle, also known as the 80-20 rule, is a principle that states that 20% of causes are responsible for 80% of effects.

Que 5. What Are Left-Skewed Distribution and Right-Skewed Distribution?
Ans 5. Left-skewed distributions have a longer tail to the left (lower values), while right-skewed distributions have a longer tail to the right (higher values).

Que 6. What Is an Outlier, and How Can You Find One?
Ans 6. An outlier is an observation point that is distant from other data points. It’s important to note that the term β€œoutlier” doesn’t refer to the numerical value of a data point but rather the distance between it and all other values.

Que 7. What Are Skewness and Kurtosis?
Ans 7. Skewness is an excellent way to measure the symmetry of distribution and the likelihood of a given value falling in the tails. With symmetrical distribution, the mean and median coincide. If the data distribution isn’t symmetrical, it’s skewed.
There are two types of skewness:
Positive is when the right tail is longer. Most values are clustered around the left tail, and the median is smaller than the mean.
Negative is when the left tail is longer. Most values are clustered around the right tail, and the median is greater than the mean

Kurtosis, on the other hand, reveals how heavy or light-tailed data is compared to the normal distribution. There are three types of kurtoses:
Mesokurtic distributions approximate a normal distribution.
Leptokurtic distributions have a pointy shape and heavy tails, indicating a high probability of extreme events occurring.
Platykurtic distributions have a flat shape and light tails. They reveal a low probability of the occurrence of extreme events.
πŸ‘19πŸ‘1
Best practices for writing SQL queries:

Join for more: https://t.iss.one/learndataanalysis

1- Write SQL keywords in capital letters.

2- Use table aliases with columns when you are joining multiple tables.

3- Never use select *, always mention list of columns in select clause.

4- Add useful comments wherever you write complex logic. Avoid too many comments.

5- Use joins instead of subqueries when possible for better performance.

6- Create CTEs instead of multiple sub queries , it will make your query easy to read.

7- Join tables using JOIN keywords instead of writing join condition in where clause for better readability.

8- Never use order by in sub queries , It will unnecessary increase runtime.

9- If you know there are no duplicates in 2 tables, use UNION ALL instead of UNION for better performance.

SQL Basics: https://t.iss.one/sqlanalyst/105
πŸ‘16❀1πŸ‘1
If you are a data analyst and thinking of getting started with freelancing on upwork then here's something you should know.

You should be ready to invest money if you want to get started with freelancing on upwork.

So there's something called connects on Upwork. For simplicity you can consider connects as the currency of upwork which one will spend while submitting a proposal for the freelancing tasks listed on the platform.

Previously upwork used to give some free connects to every new account but these days they don't. So you have to buy the connects at the rate of 100 connects per $15 + Taxes (without upgrading to upwork plus) which will be 1.3k + taxes in INR.

Let's say you submit proposal for those jobs asking for 20 connects, the max you will be able to submit is 5 jobs and you will get the job or not again depend on many factors.
You may end up having no jobs even after spending 100 connects and then again you have to repeat the cycle.

Everything looks shiny from outside but reality can be different.
Every platform requires investment either in the form of time, dedication, money or combination of all.
πŸ‘14πŸ‘2
If you want to earn 6-figures working as a data analyst, learn these 6 important skills:

Excel - advanced Excel functions for data manipulation and interpretation.

Data Cleaning is about mastering data preprocessing and cleaning techniques.

Python/R - data analysis, preparation and manipulation

Statistical Analysis - understanding fundamental statistics for data

Data Visualization - clear and effective visual representations of data

SQL - querying and managing databases efficiently
πŸ‘17❀4
I'm sure you had an idea, but something got in the way and you didn't develop it. The channel "Usual thing" is about this, the author tries to implement different business ideas, but every day he encounters problems and discusses them with you.
https://t.iss.one/usual_thing
πŸ‘2πŸ‘Œ1
1. What are the various types of refresh options provided in Power BI?

Package refresh - This synchronizes your Power BI Desktop or Excel file between the Power BI service and OneDrive, or SharePoint Online.
Model or data refresh - This refreshes the dataset within the Power BI service with data from the original data source.
Tile refresh - This updates the cache for tile visuals every 15 minutes on the dashboard once data changes.
Visual container refresh - This refreshes the visible container and updates the cached report visuals within a report once the data changes.

2. Explain some date manipulation functions in SQL.

Getdate: As its name suggests, the getdate function gives us today’s date. Dateadd: The dateadd function is used for adding a time or date interval to a date.Datediff: The datediff function is used for calculating the difference between two dates based on a given interval. Datename: The datename function can be used for extracting the parts of a date. Year, month, day: The year, month, and day functions allow for decomposing a date.


3. What is CTE in SQL?

A CTE (Common Table Expression) is a one-time result set that only exists for the duration of the query. It allows us to refer to data within a single SELECT, INSERT, UPDATE, DELETE, CREATE VIEW, or MERGE statement's execution scope. It is temporary because its result cannot be stored anywhere and will be lost as soon as a query's execution is completed.
πŸ‘11❀1
TOP CONCEPTS FOR INTERVIEW PREPARATION!!

πŸš€TOP 10 SQL Concepts for Job Interview

1. Aggregate Functions (SUM/AVG)
2. Group By and Order By
3. JOINs (Inner/Left/Right)
4. Union and Union All
5. Date and Time processing
6. String processing
7. Window Functions (Partition by)
8. Subquery
9. View and Index
10. Common Table Expression (CTE)


πŸš€TOP 10 Statistics Concepts for Job Interview

1. Sampling
2. Experiments (A/B tests)
3. Descriptive Statistics
4. p-value
5. Probability Distributions
6. t-test
7. ANOVA
8. Correlation
9. Linear Regression
10. Logistics Regression


πŸš€TOP 10 Python Concepts for Job Interview

1. Reading data from file/table
2. Writing data to file/table
3. Data Types
4. Function
5. Data Preprocessing (numpy/pandas)
6. Data Visualisation (Matplotlib/seaborn/bokeh)
7. Machine Learning (sklearn)
8. Deep Learning (Tensorflow/Keras/PyTorch)
9. Distributed Processing (PySpark)
10. Functional and Object Oriented Programming

Like ❀️ the post if it was helpful to you!!!
πŸ‘16❀8πŸ‘1
Free Resources for Numpy and Pandas:

Codebasics Numpy playlist: 
https://www.youtube.com/playlist?list=PLeo1K3hjS3uset9zIVzJWqplaWBiacTEU

Codebasics pandas playlist (first 9): 
https://www.youtube.com/playlist?list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy

Freecodecamp matplotlib playlist: 
https://youtu.be/3Xc3CA655Y4

Seaborn tutorials: 
https://youtu.be/GcXcSZ0gQps

Pandas for beginners
https://t.iss.one/datasciencefun/660

Numpy for beginners
https://t.iss.one/datasciencefree/156
πŸ‘11❀3πŸ₯°1