Data Science & Machine Learning
72.8K subscribers
772 photos
2 videos
68 files
679 links
Join this channel to learn data science, artificial intelligence and machine learning with funny quizzes, interesting projects and amazing resources for free

For collaborations: @love_data
Interview QnAs For ML Engineer

1. What are the various steps involved in a data analytics project?

The steps involved in a data analytics project are:

Data collection
Data cleansing
Data pre-processing
EDA
Creation of train, validation, and test sets
Model creation
Hyperparameter tuning
Model deployment
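
As a rough illustration of the "train, validation, and test sets" step above, here is a minimal sketch that uses only the Python standard library. The function name `train_val_test_split` and the 60/20/20 proportions are assumptions made for this example, not something the original post prescribes; in practice a library helper is usually used instead.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and cut them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical data: 100 row ids standing in for a cleaned dataset.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The model is fitted on the train set, hyperparameters are compared on the validation set, and the test set is kept aside for the final check.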


2. Explain Star Schema.

Star schema is a data warehousing model in which a central fact table is connected to surrounding dimension tables through keys, so the diagram looks like a star.
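
A small sketch may make the layout concrete. It builds a toy star schema in an in-memory SQLite database through Python's `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`) and the rows are hypothetical, invented only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per product or store.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

-- Fact table: the measurements, with a key into every dimension.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 1, 900.0), (2, 2, 1, 500.0), (3, 2, 2, 450.0);
""")

# A typical query joins the fact table to its dimensions and aggregates.
rows = conn.execute("""
    SELECT s.city, p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY s.city, p.name
    ORDER BY s.city, p.name
""").fetchall()

for row in rows:
    print(row)
```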


3. What is root cause analysis?

Root cause analysis is the process of tracing an event back to the factors that led to it. It's commonly done when software malfunctions. In data science, root cause analysis helps businesses understand the causes behind certain outcomes.


4. Define Confounding Variables.

A confounding variable is an external factor that influences both the independent and the dependent variable in a study. In simple words, it distorts the apparent relationship between the two. A variable must satisfy the conditions below to be a confounding variable:

It should be correlated with the independent variable.
It should be causally related to the dependent variable.
For example, if you are studying whether a lack of exercise has an effect on weight gain, then lack of exercise is the independent variable and weight gain is the dependent variable. A confounding variable is any other factor that affects weight gain and is also related to exercise habits. The amount of food consumed, weather conditions, etc. can be confounding variables.
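
To see the effect in numbers, here is a small simulation sketch using only the standard library. All data is synthetic, and the setup is an assumption made for illustration: `food` drives both `inactivity` and `weight_gain`, while inactivity has no direct effect on weight gain by construction. The helper names `pearson` and `residuals` are also made up for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """Remove the linear effect of zs from ys with a least-squares fit."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    return [y - my - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(42)
food = [rng.gauss(0, 1) for _ in range(5000)]        # the confounder
inactivity = [f + rng.gauss(0, 0.5) for f in food]   # depends on food only
weight_gain = [f + rng.gauss(0, 0.5) for f in food]  # depends on food only

raw = pearson(inactivity, weight_gain)
adjusted = pearson(residuals(inactivity, food), residuals(weight_gain, food))
print(f"raw correlation: {raw:.2f}, after adjusting for food: {adjusted:.2f}")
```

The raw correlation looks strong even though one variable does not cause the other here. Once the confounder is accounted for, the correlation drops to roughly zero.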
CS109a: Introduction to Data Science
👇👇
lectures
Managing Machine Learning Projects
Simon Thompson, 2022
👇👇
https://t.iss.one/Programming_experts/121
Which of the following tools can't be used for data visualization?
Anonymous Quiz
Tableau: 6%
Power BI: 10%
Matplotlib: 9%
JavaScript: 75%
To become a Machine Learning Engineer:

• Python
• numpy, pandas, matplotlib, Scikit-Learn
• TensorFlow or PyTorch
• Jupyter, Colab
• Analysis > Code
• 99%: Foundational algorithms
• 1%: Other algorithms
• Solve problems ← This is key
• Teaching = 2 × Learning
• Have fun!
Useful pandas 🐼 methods you should definitely know

✅ head()
✅ info()
✅ fillna()
✅ melt()
✅ pivot()
✅ query()
✅ merge()
✅ assign()
✅ groupby()
✅ describe()
✅ sample()
✅ replace()
✅ rename()
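
A short sketch of how several of these methods chain together, assuming pandas is installed. The `sales` and `regions` tables are toy data invented for this example.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
sales = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "year": [2021, 2022, 2021, 2022],
    "units": [10.0, None, 7.0, 9.0],
})
regions = pd.DataFrame({"city": ["Pune", "Delhi"], "region": ["West", "North"]})

clean = (
    sales
    .fillna({"units": 0.0})                      # replace missing units with 0
    .assign(revenue=lambda d: d["units"] * 5.0)  # add a derived column
    .rename(columns={"units": "units_sold"})     # rename a column
    .merge(regions, on="city", how="left")       # join in the region lookup
)

print(clean.head())                               # peek at the first rows
big = clean.query("units_sold > 8")               # filter rows with an expression
totals = clean.groupby("city")["revenue"].sum()   # aggregate per city

# pivot() reshapes long to wide; melt() goes back from wide to long.
wide = clean.pivot(index="city", columns="year", values="units_sold")
long = wide.reset_index().melt(id_vars="city", var_name="year", value_name="units_sold")
```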
Data Analyst Interview Questions
[Python, SQL, PowerBI]

1. Is indentation required in Python?
Ans:
Indentation is required in Python. It defines a block of code: all code within loops, classes, functions, etc. sits in an indented block. By convention, each level uses four spaces. If your code is not indented correctly, Python raises an IndentationError and the code does not run.
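
A quick sketch of both cases. The function `classify` and the string `bad_source` are made up for this example; `compile()` is used only so the badly indented snippet can be checked without crashing the script.

```python
def classify(n):
    # The function body is one indented block (four spaces per level).
    if n % 2 == 0:
        # This line belongs to the `if` block, one level deeper.
        return "even"
    return "odd"


# The same kind of code with the body left unindented is rejected
# before it ever runs.
bad_source = "def broken():\nreturn 1\n"
try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__

print(classify(4), classify(7), error_name)
```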

2. What are Entities and Relationships?
Ans:
Entity:
An entity is a real-world object that can be uniquely identified. For example, in a college database, students, professors, workers, departments, and projects can be referred to as entities.

Relationships: Relations or links between entities that have something to do with each other. For example, the employee table in a company's database can be associated with the salary table in the same database.

3. What are Aggregate and Scalar functions?
Ans:
An aggregate function performs operations on a collection of values to return a single scalar value. Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. A scalar function returns a single value based on the input value.
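
The difference is easy to see side by side. This is a minimal sketch using Python's built-in `sqlite3` module; the `employee` table and its rows are hypothetical. `UPPER` and `LENGTH` are scalar functions (one result per input row), while `AVG` is an aggregate (one result per group).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, salary REAL);
INSERT INTO employee VALUES
    ('asha', 'data', 50.0), ('ravi', 'data', 70.0), ('meera', 'ops', 40.0);
""")

# Scalar functions: evaluated once per row, so three rows in, three rows out.
scalar_rows = conn.execute(
    "SELECT UPPER(name), LENGTH(name) FROM employee ORDER BY name"
).fetchall()

# Aggregate function: collapses each department to a single value.
# HAVING filters on the aggregated result.
aggregate_rows = conn.execute("""
    SELECT dept, AVG(salary)
    FROM employee
    GROUP BY dept
    HAVING AVG(salary) > 45
    ORDER BY dept
""").fetchall()

print(scalar_rows)
print(aggregate_rows)
```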

4. What are Custom Visuals in Power BI?
Ans:
Custom visuals behave like any other Power BI visualization. The difference is that developers build them with the Power BI custom visuals SDK, using JavaScript (typically TypeScript) together with libraries such as jQuery.

ENJOY LEARNING 👍👍
Harvard CS109A #DataScience course materials: a huge collection, free and open!

1. Lecture notes
2. R code, #Python notebooks
3. Lab material
4. Advanced sections
and more ...

https://harvard-iacs.github.io/2019-CS109A/pages/materials.html
Which of the following methods isn't available in pandas?
Anonymous Quiz
head(): 6%
replace(): 4%
groupby(): 9%
rename(): 4%
datasciencefun(): 77%
😁13🀩5πŸ‘3πŸ‘2πŸ”₯2
American Express is hiring
Position: Data Science Analyst
👉 Apply: https://aexp.eightfold.ai/careers/job/13347327
👍 All the best.
Amazon is hiring Data Scientist Intern!
Qualifications: Bachelor's/ Master's Degree
Salary: 5.4 LPA (Expected)
Batch: 2019/2020/2021/2022/2023
Experience: Freshers
Location: Bangalore, India

📌 Apply Link: https://www.amazon.jobs/en/jobs/2213292/data-scientist-intern
Every ML project should keep the following documentation:

• Change log
• Tech debt log
• Potential risks
• Experiment logs
• Future work ideas
• List of assumptions
• ETL pipeline description
Advanced Data Analytics Using Python.pdf
2.2 MB
Advanced Data Analytics Using Python
With Machine Learning, Deep Learning and NLP Examples
#book #Ml
Do you want a roadmap for becoming a data scientist in this channel?
Anonymous Poll
Yes: 96%
No: 4%
Important Topics to become a data scientist
[Advanced Level]
👇👇

1. Mathematics

Linear Algebra
Analytic Geometry
Matrix
Vector Calculus
Optimization
Regression
Dimensionality Reduction
Density Estimation
Classification

2. Probability

Introduction to Probability
1D Random Variable
The function of One Random Variable
Joint Probability Distribution
Discrete Distribution
Normal Distribution

3. Statistics

Introduction to Statistics
Data Description
Random Samples
Sampling Distribution
Parameter Estimation
Hypothesis Testing
Regression

4. Programming

Python:

Python Basics
List
Set
Tuples
Dictionary
Function
NumPy
Pandas
Matplotlib/Seaborn

R Programming:

R Basics
Vector
List
Data Frame
Matrix
Array
Function
dplyr
ggplot2
tidyr
Shiny

DataBase:
SQL
MongoDB

Data Structures

Web scraping

Linux

Git

5. Machine Learning

How Model Works
Basic Data Exploration
First ML Model
Model Validation
Underfitting & Overfitting
Random Forest
Handling Missing Values
Handling Categorical Variables
Pipelines
Cross-Validation(R)
XGBoost(Python|R)
Data Leakage

6. Deep Learning

Artificial Neural Network
Convolutional Neural Network
Recurrent Neural Network
TensorFlow
Keras
PyTorch
A Single Neuron
Deep Neural Network
Stochastic Gradient Descent
Overfitting and Underfitting
Dropout and Batch Normalization
Binary Classification

7. Feature Engineering

Baseline Model
Categorical Encodings
Feature Generation
Feature Selection

8. Natural Language Processing

Text Classification
Word Vectors

9. Data Visualization Tools

BI (Business Intelligence):
Tableau
Power BI
QlikView
Qlik Sense

10. Deployment

Microsoft Azure
Heroku
Google Cloud Platform
Flask
Django

Join @datasciencefun to learn important data science and machine learning concepts

ENJOY LEARNING 👍👍
Some of the essential libraries of Python that are used in Data Science

NumPy

SciPy

Pandas

Matplotlib

Keras

TensorFlow

Scikit-learn