Dataset Name: 1.88 Million US Wildfires
Basic Description: 24 years of geo-referenced wildfire records
FULL DATASET DESCRIPTION:
==================================
This data publication contains a spatial database of wildfires that occurred in the United States from 1992 to 2015. It is the third update of a publication originally generated to support the national Fire Program Analysis (FPA) system. The wildfire records were acquired from the reporting systems of federal, state, and local fire organizations. The following core data elements were required for records to be included in this data publication: discovery date, final fire size, and a point location at least as precise as Public Land Survey System (PLSS) section (1-square mile grid). The data were transformed to conform, when possible, to the data standards of the National Wildfire Coordinating Group (NWCG). Basic error-checking was performed and redundant records were identified and removed, to the degree possible. The resulting product, referred to as the Fire Program Analysis fire-occurrence database (FPA FOD), includes 1.88 million geo-referenced wildfire records, representing a total of 140 million acres burned during the 24-year period.
This dataset is an SQLite database that contains the following information:
DATASET DOWNLOAD INFORMATION
==================================
Direct dataset download link:
https://www.kaggle.com/api/v1/datasets/download/rtatman/188-million-us-wildfires
Dataset size: 176 MB (zip)
Additional information:
==================================
Views: 411,000
Downloads: 38,600
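Once the zip is extracted, the database can be queried directly with Python's built-in sqlite3 module and pandas. Below is a minimal sketch; the file name, the Fires table, and the column names follow the published FPA FOD schema but are assumptions that may differ in your copy:

```python
import sqlite3
import pandas as pd

# Open the extracted database file (name is illustrative; check your download).
conn = sqlite3.connect("FPA_FOD_20170508.sqlite")

# Pull a few core fields; "Fires" is the main table in the FPA FOD schema.
fires = pd.read_sql_query(
    """
    SELECT FIRE_YEAR, STAT_CAUSE_DESCR, FIRE_SIZE, STATE, LATITUDE, LONGITUDE
    FROM Fires
    """,
    conn,
)
conn.close()

# Acres burned per year, as reported in the records (FIRE_SIZE is in acres).
print(fires.groupby("FIRE_YEAR")["FIRE_SIZE"].sum().head())
```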
RELATED NOTEBOOKS:
==================================
1. Exercise: Creating, Reading and Writing | Upvotes: 453,001
URL: https://www.kaggle.com/code/residentmario/exercise-creating-reading-and-writing
2. Exercise: Indexing, Selecting & Assigning | Upvotes: 319,639
URL: https://www.kaggle.com/code/residentmario/exercise-indexing-selecting-assigning
3. Exercise: Summary Functions and Maps | Upvotes: 269,410
URL: https://www.kaggle.com/code/residentmario/exercise-summary-functions-and-maps
4. Next Day Wildfire Spread | Upvotes: 40
URL: https://www.kaggle.com/datasets/fantineh/next-day-wildfire-spread
5. Fire statistics dataset | Upvotes: 8
URL: https://www.kaggle.com/datasets/sujaykapadnis/fire-statistics-dataset
Python Roadmap for Beginners
├── Introduction to Python
├── Installing Python & Setting Up VS Code / Jupyter
├── Python Syntax & Indentation Basics
├── Variables, Data Types (int, float, str, bool)
├── Operators (Arithmetic, Comparison, Logical)
├── Conditional Statements (if, elif, else)
├── Loops (for, while, break, continue)
├── Functions (def, return, args, kwargs)
├── Built-in Data Structures (List, Tuple, Set, Dictionary)
├── List Comprehension & Dictionary Comprehension
├── File Handling (read, write, with open)
├── Error Handling (try, except, finally)
├── Modules & Packages (import, pip install)
├── Working with Libraries (NumPy, Pandas, Matplotlib)
├── Data Cleaning with Pandas
├── Exploratory Data Analysis (EDA)
├── Intro to OOP in Python (Class, Objects, Inheritance)
└── Real-World Python Projects & Challenges
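To make a few of these steps concrete, here is a small self-contained sketch combining three roadmap topics — comprehensions, file handling, and error handling (the file name is made up):

```python
# Dictionary comprehension: map each number to its square.
squares = {n: n**2 for n in range(1, 6)}

try:
    # File handling with a context manager ("with open").
    with open("squares.txt", "w") as f:
        for n, sq in squares.items():
            f.write(f"{n} squared is {sq}\n")
except OSError as err:
    # Error handling: report I/O failures instead of crashing.
    print(f"Could not write file: {err}")
finally:
    print("Done.")
```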
SQL Roadmap: https://t.iss.one/sqlspecialist/1340
Power BI Roadmap: https://t.iss.one/sqlspecialist/1397
Python Resources: https://t.iss.one/pythonproz
Hope it helps :)
Data Science is a very vast field.
I saw a LinkedIn profile today with the skills below:
Technical Skills:
Data Manipulation: NumPy, Pandas, BeautifulSoup, PySpark
Data Visualization & EDA: Matplotlib, Seaborn, Plotly, Tableau, Power BI
Machine Learning: Scikit-Learn, Time Series Analysis
MLOps: Gensim, GitHub Actions, GitLab CI/CD, MLflow, W&B, Comet
Deep Learning: PyTorch, TensorFlow, Keras
Natural Language Processing: NLTK, NER, spaCy, word2vec, K-means, KNN, DBSCAN
Computer Vision: OpenCV, YOLOv5, U-Net, CNN, ResNet
Version Control: Git, GitHub, GitLab
Databases: SQL, NoSQL, Databricks
Web Frameworks: Streamlit, Flask, FastAPI
Generative AI: Hugging Face, LLMs, LangChain, GPT-3.5, GPT-4
Project Management & Collaboration: JIRA, Confluence
Deployment: AWS, GCP, Docker, Google Vertex AI, DataRobot, BigML, Microsoft Azure
How many of them do you have?
Dataset Name: Hand Gesture Recognition Database
Basic Description: Acquired by Leap Motion
FULL DATASET DESCRIPTION:
==================================
This hand gesture recognition database consists of a set of near-infrared images acquired by the Leap Motion sensor.
The database comprises 10 different hand gestures (shown above), each performed by 10 different subjects (5 men and 5 women).
The database is organized into folders as follows:
DATASET DOWNLOAD INFORMATION
==================================
Dataset size: 2 GB (zip)
Direct dataset download link:
https://www.kaggle.com/api/v1/datasets/download/gti-upm/leapgestrecog
Additional information:
==================================
Total files: 20,000
Views: 255,000
Downloads: 35,200
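After extracting the archive, a quick way to inventory the 20,000 images is to walk the folder tree with pathlib. A minimal sketch, assuming a subject/gesture/image layout and PNG files (both are assumptions based on the description above; adjust to the actual structure):

```python
from pathlib import Path

# Count images per gesture; the subject/gesture/*.png layout is an assumption.
root = Path("leapGestRecog")
images_by_gesture = {}
for img_path in root.glob("*/*/*.png"):
    gesture = img_path.parent.name  # gesture folder name
    images_by_gesture.setdefault(gesture, []).append(img_path)

for gesture, paths in sorted(images_by_gesture.items()):
    print(f"{gesture}: {len(paths)} images")
```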
RELATED NOTEBOOKS:
==================================
1. Hand Gesture Recognition Database with CNN | Upvotes: 1,022
URL: https://www.kaggle.com/code/benenharrington/hand-gesture-recognition-database-with-cnn
2. [keras] Hand Gesture Recognition CNN | Upvotes: 492
URL: https://www.kaggle.com/code/kageyama/keras-hand-gesture-recognition-cnn
3. 100% in Hand Gesture Recognition | Upvotes: 245
URL: https://www.kaggle.com/code/mohamedgobara/100-in-hand-gesture-recognition
4. Multi-Modal Dataset for Hand Gesture Recognition | Upvotes: 49
URL: https://www.kaggle.com/datasets/gti-upm/multimodhandgestrec
5. Hand Gesture Recognition Dataset | Upvotes: 8
URL: https://www.kaggle.com/datasets/tapakah68/hand-gesture-recognition-dataset
==============================
Data Science Summarized: The Core Pillars of Success!
1. Statistics:
The backbone of data analysis and decision-making.
Used for hypothesis testing, distributions, and drawing actionable insights.
2. Mathematics:
Critical for building models and understanding algorithms.
Focus on:
- Linear Algebra
- Calculus
- Probability & Statistics
3. Python:
The most widely used language in data science.
Essential libraries include:
- Pandas
- NumPy
- Scikit-Learn
- TensorFlow
4. Machine Learning:
Use algorithms to uncover patterns and make predictions.
Key types:
- Regression
- Classification
- Clustering
5. Domain Knowledge:
Context matters.
Understand your industry to build relevant, useful, and accurate models.
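To ground pillar 1, here is a small illustrative hypothesis-test sketch in Python (the data and effect size are synthetic, made up for the example):

```python
import numpy as np
from scipy import stats

# Two-sample t-test on synthetic data: did a change move the metric?
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=200)
variant = rng.normal(loc=10.5, scale=2.0, size=200)

t_stat, p_value = stats.ttest_ind(control, variant)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the difference is unlikely under the null hypothesis.
```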
Tools Every AI Engineer Should Know
1. Data Science Tools
Python: Preferred language with libraries like NumPy, Pandas, Scikit-learn.
R: Ideal for statistical analysis and data visualization.
Jupyter Notebook: Interactive coding environment for Python and R.
MATLAB: Used for mathematical modeling and algorithm development.
RapidMiner: Drag-and-drop platform for machine learning workflows.
KNIME: Open-source analytics platform for data integration and analysis.
2. Machine Learning Tools
Scikit-learn: Comprehensive library for traditional ML algorithms.
XGBoost & LightGBM: Specialized tools for gradient boosting.
TensorFlow: Open-source framework for ML and DL.
PyTorch: Popular DL framework with a dynamic computation graph.
H2O.ai: Scalable platform for ML and AutoML.
Auto-sklearn: AutoML for automating the ML pipeline.
3. Deep Learning Tools
Keras: User-friendly high-level API for building neural networks.
PyTorch: Excellent for research and production in DL.
TensorFlow: Versatile for both research and deployment.
ONNX: Open format for model interoperability.
OpenCV: For image processing and computer vision.
Hugging Face: Focused on natural language processing.
4. Data Engineering Tools
Apache Hadoop: Framework for distributed storage and processing.
Apache Spark: Fast cluster-computing framework.
Kafka: Distributed streaming platform.
Airflow: Workflow automation tool.
Fivetran: ETL tool for data integration.
dbt: Data transformation tool using SQL.
5. Data Visualization Tools
Tableau: Drag-and-drop BI tool for interactive dashboards.
Power BI: Microsoft's BI platform for data analysis and visualization.
Matplotlib & Seaborn: Python libraries for static and interactive plots.
Plotly: Interactive plotting library with Dash for web apps.
D3.js: JavaScript library for creating dynamic web visualizations.
6. Cloud Platforms
AWS: Services like SageMaker for ML model building.
Google Cloud Platform (GCP): Tools like BigQuery and AutoML.
Microsoft Azure: Azure ML Studio for ML workflows.
IBM Watson: AI platform for custom model development.
7. Version Control and Collaboration Tools
Git: Version control system.
GitHub/GitLab: Platforms for code sharing and collaboration.
Bitbucket: Version control for teams.
8. Other Essential Tools
Docker: For containerizing applications.
Kubernetes: Orchestration of containerized applications.
MLflow: Experiment tracking and deployment.
Weights & Biases (W&B): Experiment tracking and collaboration.
Pandas Profiling: Automated data profiling.
BigQuery/Athena: Serverless data warehousing tools.
Mastering these tools will ensure you are well-equipped to handle various challenges across the AI lifecycle.
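As one concrete taste of the list above, here is a minimal MLflow experiment-tracking sketch (the run name, parameter, and metric values are placeholders; MLflow creates a local ./mlruns store by default):

```python
import mlflow

# Log one parameter and one metric for a single tracked run.
with mlflow.start_run(run_name="demo"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)

print("Run logged; inspect it with `mlflow ui`.")
```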
#artificialintelligence
Essential Python Libraries for Data Science
- NumPy: Fundamental for numerical operations, handling arrays, and mathematical functions.
- SciPy: Complements NumPy with additional functionalities for scientific computing, including optimization and signal processing.
- Pandas: Essential for data manipulation and analysis, offering powerful data structures like DataFrames.
- Matplotlib: A versatile plotting library for creating static, interactive, and animated visualizations.
- Keras: A high-level neural networks API, facilitating rapid prototyping and experimentation in deep learning.
- TensorFlow: An open-source machine learning framework widely used for building and training deep learning models.
- Scikit-learn: Provides simple and efficient tools for data mining, machine learning, and statistical modeling.
- Seaborn: Built on Matplotlib, Seaborn enhances data visualization with a high-level interface for drawing attractive and informative statistical graphics.
- Statsmodels: Focuses on estimating and testing statistical models, providing tools for exploring data, estimating models, and statistical testing.
- NLTK (Natural Language Toolkit): A library for working with human language data, supporting tasks like classification, tokenization, stemming, tagging, parsing, and more.
These libraries collectively empower data scientists to handle various tasks, from data preprocessing to advanced machine learning implementations.
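As a tiny illustration of three of these libraries working together (the data and output file name are made up):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy for the raw numbers, Pandas for tabular handling, Matplotlib to plot.
x = np.linspace(0, 10, 100)
df = pd.DataFrame({"x": x, "y": np.sin(x)})

df.plot(x="x", y="y", title="sin(x)")
plt.savefig("sine.png")  # write the figure to disk
```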
ENJOY LEARNING!
Dataset Name: Disease Risk from Daily Habits
This dataset contains detailed lifestyle and biometric information from 100,000 individuals. The goal is to predict the likelihood of having a disease based on habits, health metrics, demographics, and psychological indicators.
Direct dataset download link:
https://www.kaggle.com/api/v1/datasets/download/mahdimashayekhi/disease-risk-from-daily-habits
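The download URL above requires Kaggle authentication; one common route is the official kaggle Python client. A minimal sketch, assuming credentials are already configured in ~/.kaggle/kaggle.json:

```python
# pip install kaggle
from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate with the credentials file, then download and unzip the dataset.
api = KaggleApi()
api.authenticate()
api.dataset_download_files(
    "mahdimashayekhi/disease-risk-from-daily-habits",
    path="data",
    unzip=True,
)
```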
RELATED NOTEBOOKS:
1. Heart Attack Risk Prediction Dataset | Upvotes: 273
URL: https://www.kaggle.com/datasets/iamsouravbanerjee/heart-attack-prediction-dataset
2. Diabetes_prediction_dataset | Upvotes: 88
URL: https://www.kaggle.com/datasets/marshalpatel3558/diabetes-prediction-dataset
3. Health & Lifestyle Dataset | Upvotes: 37
URL: https://www.kaggle.com/datasets/mahdimashayekhi/health-and-lifestyle-dataset
4. Predicting Disease Risk from Daily Habits | Upvotes: 11
URL: https://www.kaggle.com/code/mahdimashayekhi/predicting-disease-risk-from-daily-habits
Data Analyst Interview Questions with Answers
1. What is the difference between the RANK() and DENSE_RANK() functions?
The RANK() function assigns each row a rank within its ordered partition. If multiple rows share the same rank, the next rank skips ahead by the number of duplicates: with three records at rank 4, for example, the next rank shown is 7. The DENSE_RANK() function assigns a distinct rank to each row within a partition based on the provided column value, with no gaps: with three records at rank 4, the next rank shown is 5.
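Although RANK() and DENSE_RANK() are SQL window functions, the same two behaviors can be reproduced in Python with pandas' rank() method, which makes the gap easy to see:

```python
import pandas as pd

scores = pd.Series([100, 90, 90, 90, 80])
print(scores.rank(method="min", ascending=False))    # like RANK(): 1, 2, 2, 2, 5
print(scores.rank(method="dense", ascending=False))  # like DENSE_RANK(): 1, 2, 2, 2, 3
```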
2. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of the given dataset?
One-hot encoding represents a categorical variable as binary vectors, creating a new 0/1 column for each level of the variable, so it increases the dimensionality of the dataset. Label encoding converts the levels of a variable into integer codes (0, 1, 2, ...) in a single column, so it does not affect the dimensionality.
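A short pandas sketch showing the dimensionality difference on a made-up column:

```python
import pandas as pd

colors = pd.Series(["red", "green", "blue", "green"])

# One-hot encoding: one 0/1 column per level (dimensionality grows).
print(pd.get_dummies(colors, prefix="color"))

# Label encoding: integer codes in a single column (dimensionality unchanged).
print(colors.astype("category").cat.codes)
```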
3. What is the shortcut to add a filter to a table in Excel?
The filter mechanism is used when you want to display only specific data from the entire dataset; filtering does not change the underlying data. The shortcut to add a filter to a table is Ctrl+Shift+L.
4. What is DAX in Power BI?
DAX stands for Data Analysis Expressions. It's a collection of functions, operators, and constants used in formulas to calculate and return values. In other words, it helps you create new info from data you already have.
5. Define shelves and sets in Tableau.
Shelves: Every worksheet in Tableau has shelves such as Columns, Rows, Marks, Filters, Pages, and more. By placing fields on shelves, we build our own visualization structure, and we can control the marks by including or excluding data.
Sets: Sets are used to compute a condition on which the data will be grouped; records that meet the condition are grouped together. The fields responsible for this grouping are known as sets. For example: students having grades of more than 70%.
React ❤️ for more
The only roadmap you need to become an ML Engineer
Phase 1: Foundations (1-2 Months)
- Math & Stats Basics – Linear Algebra, Probability, Statistics
- Python Programming – NumPy, Pandas, Matplotlib, Scikit-Learn
- Data Handling – Cleaning, Feature Engineering, Exploratory Data Analysis
Phase 2: Core Machine Learning (2-3 Months)
- Supervised & Unsupervised Learning – Regression, Classification, Clustering
- Model Evaluation – Cross-validation, Metrics (Accuracy, Precision, Recall, AUC-ROC)
- Hyperparameter Tuning – Grid Search, Random Search, Bayesian Optimization
- Basic ML Projects – Predict house prices, customer segmentation
Phase 3: Deep Learning & Advanced ML (2-3 Months)
- Neural Networks – TensorFlow & PyTorch Basics
- CNNs & Image Processing – Object Detection, Image Classification
- NLP & Transformers – Sentiment Analysis, BERT, LLMs (GPT, Gemini)
- Reinforcement Learning Basics – Q-learning, Policy Gradient
Phase 4: ML System Design & MLOps (2-3 Months)
- ML in Production – Model Deployment (Flask, FastAPI, Docker)
- MLOps – CI/CD, Model Monitoring, Model Versioning (MLflow, Kubeflow)
- Cloud & Big Data – AWS/GCP/Azure, Spark, Kafka
- End-to-End ML Projects – Fraud detection, Recommendation systems
Phase 5: Specialization & Job Readiness (Ongoing)
- Specialize – Computer Vision, NLP, Generative AI, Edge AI
- Interview Prep – LeetCode for ML, System Design, ML Case Studies
- Portfolio Building – GitHub, Kaggle Competitions, Writing Blogs
- Networking – Contribute to open-source, Attend ML meetups, LinkedIn presence
Follow this advanced roadmap to build a successful career in ML!
The data field is vast, offering endless opportunities, so start preparing now.
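To make Phase 2's model-evaluation step concrete, here is a minimal cross-validation sketch with scikit-learn (synthetic data, purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data, only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold cross-validated accuracy for a simple baseline model.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```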
What are the main assumptions of linear regression?
There are several assumptions of linear regression. If any of them is violated, model predictions and interpretation may be worthless or misleading.
1) Linear relationship between features and target variable.
2) Additivity means that the effect of a change in one feature on the target variable does not depend on the values of the other features. For example, suppose a model for predicting a company's revenue has two features: the number of items a sold and the number of items b sold. When the company sells more of item a, revenue increases, independently of how many items b are sold. But if customers who buy a stop buying b, the additivity assumption is violated.
3) Features are not correlated (no collinearity) since it can be difficult to separate out the individual effects of collinear features on the target variable.
4) Errors are independently and identically normally distributed (y_i = B0 + B1*x_1i + ... + error_i):
i) No correlation between errors (e.g., between consecutive errors in the case of time series data).
ii) Constant variance of errors - homoscedasticity. For example, in the case of time series, seasonal patterns can increase errors in seasons with higher activity.
iii) Errors are normally distributed; otherwise some features will have more influence on the target variable than others. If the error distribution is significantly non-normal, confidence intervals may be too wide or too narrow.
Hi Guys,
Here are some Telegram channels that may help you in your data analytics journey:
SQL: https://t.iss.one/sqlanalyst
Power BI & Tableau: https://t.iss.one/PowerBI_analyst
Excel: https://t.iss.one/excel_analyst
Python: https://t.iss.one/dsabooks
Jobs: https://t.iss.one/datasciencej
Data Science: https://t.iss.one/datasciencefree
Artificial intelligence: https://t.iss.one/aiindi
Data Analysts: https://t.iss.one/sqlspecialist
Hope it helps :)
Data Science Cheatsheet