Many data scientists don't know how to push ML models to production. Here's the recipe:
Key Ingredients
🔹 Train / Test Dataset - Ensure the test set is representative of online data
🔹 Feature Engineering Pipeline - Generate features in real time
🔹 Model Object - Trained scikit-learn or TensorFlow model
🔹 Project Code Repo - Save the model project code to GitHub
🔹 API Framework - Use FastAPI or Flask to build a model API
🔹 Docker - Containerize the ML model API
🔹 Remote Server - Choose a cloud service, e.g. AWS SageMaker
🔹 Unit Tests - Test inputs & outputs of functions and APIs
🔹 Model Monitoring - Evidently AI, a simple, open-source tool for ML monitoring
Procedure
Step 1 - Data Preparation & Feature Engineering
Don't push a model just because it scores 90% accuracy on the training set. Judge it on the test set, and only if the test set is representative of the online data. Use a scikit-learn Pipeline to chain preprocessing steps such as null handling.
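To make the chaining idea concrete, here is a minimal plain-Python sketch of the fit-once, transform-everywhere pattern that a preprocessing pipeline gives you. It is a stand-in for the concept, not scikit-learn's Pipeline API: the class names `MedianImputer`, `RangeScaler`, and `Chain`, and the toy data, are all made up for illustration.

```python
from statistics import median


class MedianImputer:
    """Replace None with the per-column median learned from training data."""

    def fit(self, rows):
        n_cols = len(rows[0])
        self.medians_ = [
            median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
        ]
        return self

    def transform(self, rows):
        return [
            [self.medians_[j] if v is None else v for j, v in enumerate(r)]
            for r in rows
        ]


class RangeScaler:
    """Scale each column using the min and max seen during training."""

    def fit(self, rows):
        cols = list(zip(*rows))
        self.mins_ = [min(c) for c in cols]
        # Fall back to 1.0 for constant columns to avoid dividing by zero.
        self.spans_ = [(max(c) - min(c)) or 1.0 for c in cols]
        return self

    def transform(self, rows):
        return [
            [(v - lo) / span for v, lo, span in zip(r, self.mins_, self.spans_)]
            for r in rows
        ]


class Chain:
    """Run steps in order: fit once on training data, reuse for every request."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows


# Hypothetical training rows with missing values.
train = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 40.0]]
preprocess = Chain([MedianImputer(), RangeScaler()]).fit(train)

# An online request with a missing value goes through the same fitted steps.
online = preprocess.transform([[None, 30.0]])
```

The point of the pattern is that the medians and ranges come from the training data only, so the online request is preprocessed exactly the way the model saw its training rows.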
Step 2 - Model Development
Train your model with a framework like scikit-learn or TensorFlow. Push the model code, including preprocessing, training, and validation scripts, to GitHub for reproducibility.
Step 3 - API Development & Containerization
Your model needs a "/predict" endpoint that receives a JSON object in the request and returns a JSON object containing the model score in the response. You can use a framework like FastAPI or Flask. Containerize this API so that it's agnostic to the server environment.
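Here is a rough sketch of the JSON-in, JSON-out contract for "/predict". It deliberately uses only Python's standard library `http.server` instead of FastAPI or Flask so it runs without extra installs; the request key `features`, the placeholder `score` function, and the names `handle_predict`, `PredictHandler`, and `serve` are assumptions for illustration, not part of any framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Placeholder for model.predict; a real service would load a trained model."""
    return sum(features) / len(features)


def handle_predict(body: bytes):
    """Turn a JSON request body into (status_code, JSON response body)."""
    try:
        payload = json.loads(body)
        raw = payload["features"]
        if not isinstance(raw, list) or not raw:
            raise ValueError("'features' must be a non-empty list")
        features = [float(x) for x in raw]
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing key, or non-numeric values all map to 400.
        return 400, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps({"score": score(features)}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Blocking call; a container entrypoint could invoke this."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Keeping the request handling in a plain function like `handle_predict` means the same logic can be unit tested without starting a server, and moved into a FastAPI or Flask route later.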
Step 4 - Testing & Deployment
Write tests that validate the inputs & outputs of your API functions so errors are caught before release. Then deploy the code to a remote service like AWS SageMaker.
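As a small illustration of input and output tests, the sketch below uses Python's built-in `unittest`. The `validate_payload` and `predict` functions are hypothetical stand-ins for your own API code, and the rule that a score must fall in [0, 1] is an assumption chosen for the example.

```python
import unittest


def validate_payload(payload):
    """Return the feature list as floats, or raise ValueError."""
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError("payload must be an object with a 'features' key")
    features = payload["features"]
    if not isinstance(features, list) or not features:
        raise ValueError("'features' must be a non-empty list")
    for x in features:
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise ValueError("'features' must contain only numbers")
    return [float(x) for x in features]


def predict(payload):
    """Stand-in scorer: clamps the feature mean into [0, 1]."""
    features = validate_payload(payload)
    mean = sum(features) / len(features)
    return {"score": min(1.0, max(0.0, mean))}


class PredictTests(unittest.TestCase):
    def test_valid_payload_returns_score_in_range(self):
        result = predict({"features": [0.2, 0.4]})
        self.assertEqual(set(result), {"score"})
        self.assertGreaterEqual(result["score"], 0.0)
        self.assertLessEqual(result["score"], 1.0)

    def test_missing_key_is_rejected(self):
        with self.assertRaises(ValueError):
            predict({"inputs": [1, 2]})

    def test_empty_features_are_rejected(self):
        with self.assertRaises(ValueError):
            predict({"features": []})

    def test_non_numeric_features_are_rejected(self):
        with self.assertRaises(ValueError):
            predict({"features": [1, "two"]})
```

Saved as a test module, these can be run with `python -m unittest` before every deployment.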
Step 5 - Monitoring
Set up a monitoring tool like Evidently AI, or use the one built into AWS SageMaker. I use such tools to track performance metrics and data drift on online data.
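To give a feel for what a drift check computes, here is a minimal sketch of the Population Stability Index (PSI), one common drift statistic, in plain Python. It is not Evidently's or SageMaker's API; the function name `psi`, the equal-width binning, and the `eps` floor for empty bins are assumptions made for this example.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; 1.0 guards a constant feature.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so online values outside the reference range land in an edge bin.
            idx = min(bins - 1, max(0, int((v - lo) / width)))
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values: training-time reference vs. shifted online data.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]

same_score = psi(reference, reference)
drift_score = psi(reference, shifted)
```

A PSI near 0 means the online distribution matches the reference. A frequently quoted rule of thumb treats values above roughly 0.2 as worth investigating, but the right threshold depends on the feature, so treat that number as an assumption rather than a standard.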
AI Agents Course
by Hugging Face 🤗
This free course will take you on a journey, from beginner to expert, in understanding, using and building AI agents.
https://huggingface.co/learn/agents-course/unit0/introduction
How do you handle null, 0, and blank values in your data during the cleaning process?
Interview questions are sometimes based on this topic. Many data aspirants, and even some professionals, make the mistake of simply deleting missing values or filling them in without proper analysis. This can damage the integrity of the analysis. It's essential to find out the reason behind missing values in the data, whether from the project head, the client, or your own investigation.
Answer:
Handling null, 0, and blank values is crucial for ensuring the accuracy and reliability of data analysis. Here's how to approach it:
1. Identifying and Understanding the Context:
- Null Values: These represent missing or undefined data. Identify them using functions like 'ISBLANK' in Excel or null filters in Power Query.
- 0 Values: These can be legitimate data points but may also indicate missing data in some contexts, so check what a 0 means in each column before treating it as a real measurement.
- Blank Values: These can be spaces or empty strings. Identify them using 'LEN', 'TRIM', or filters.
2. Handling These Values Using Proper Techniques:
- Null Values: Decide whether to impute, remove, or leave them based on the dataset's context and the analysis requirements. Common imputation methods include using the mean, the median, or a placeholder.
- 0 Values: If 0s are valid data, leave them as is. If they indicate missing data, treat them like null values.
- Blank Values: Convert blanks to nulls or handle them as needed, using 'IF' statements or Power Query transformations.
3. Using Excel and Power Query:
- Excel: Use formulas like 'IFERROR', 'IF', and 'VLOOKUP' to handle these values.
- Power Query: Use transformations to filter, replace, or fill null and blank values. Steps like 'Fill Down', 'Replace Values', and custom columns help automate the process.
By carefully considering the context and using appropriate methods, the data cleaning process maintains the integrity and quality of the data.
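The same identify-then-handle logic can be sketched in Python for anyone working outside Excel. This is only an illustration under stated assumptions: the function names `classify` and `clean_column`, the `zero_means_missing` switch, and the choice of median imputation are examples, and the right treatment still depends on why the values are missing.

```python
from statistics import median


def classify(value):
    """Label a raw cell as 'null', 'blank', 'zero', or 'value'."""
    if value is None:
        return "null"
    if isinstance(value, str) and value.strip() == "":
        return "blank"
    if isinstance(value, (int, float)) and not isinstance(value, bool) and value == 0:
        return "zero"
    return "value"


def clean_column(cells, zero_means_missing=False):
    """Convert blanks (and optionally zeros) to None, then impute the median."""
    missing = {"null", "blank"}
    if zero_means_missing:
        missing.add("zero")
    normalised = [None if classify(c) in missing else c for c in cells]
    observed = [c for c in normalised if c is not None]
    fill = median(observed)  # raises StatisticsError if nothing was observed
    return [fill if c is None else c for c in normalised]


# Hypothetical column mixing real values, a null, blanks, and a zero.
raw = [12, None, "  ", 0, 30, ""]

labels = [classify(c) for c in raw]
zeros_kept = clean_column(raw)
zeros_as_missing = clean_column(raw, zero_means_missing=True)
```

Whether 0 counts as missing changes the imputed value here (12 versus 21.0), which is exactly why the context question has to be answered before any cleaning step runs.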
Hope it helps :)
Will LLMs always hallucinate?
As large language models (LLMs) become more powerful and pervasive, it's crucial that we understand their limitations.
A new paper argues that hallucinations - where the model generates false or nonsensical information - are not just occasional mistakes, but an inherent property of these systems.
While the idea of hallucinations as features isn't new, the researchers' explanation is.
They draw on computational theory and Gödel's incompleteness theorems to show that hallucinations are baked into the very structure of LLMs.
In essence, they argue that the process of training and using these models involves undecidable problems - meaning there will always be some inputs that cause the model to go off the rails.
This would have big implications. It suggests that no amount of architectural tweaks, data cleaning, or fact-checking can fully eliminate hallucinations.
So what does this mean in practice? For one, it highlights the importance of using LLMs carefully, with an understanding of their limitations.
It also suggests that research into making models more robust and understanding their failure modes is crucial.
No matter how impressive the results, LLMs are not oracles - they're tools with inherent flaws and biases.
LLM & Generative AI Resources: https://whatsapp.com/channel/0029VaoePz73bbV94yTh6V2E
Preparing for a machine learning interview as a data analyst is a great step.
Here are some common machine learning interview questions:
1. Explain the steps involved in a machine learning project lifecycle.
2. What is the difference between supervised and unsupervised learning? Give examples of each.
3. What evaluation metrics would you use to assess the performance of a regression model?
4. What is overfitting and how can you prevent it?
5. Describe the bias-variance tradeoff.
6. What is cross-validation, and why is it important in machine learning?
7. What are some feature selection techniques you are familiar with?
8. What are the assumptions of linear regression?
9. How does regularization help in linear models?
10. Explain the difference between classification and regression.
11. What are some common algorithms used for dimensionality reduction?
12. Describe how a decision tree works.
13. What are ensemble methods, and why are they useful?
14. How do you handle missing or corrupted data in a dataset?
15. What are the different kernels used in Support Vector Machines (SVM)?
These questions cover a range of fundamental concepts and techniques in machine learning that are important for a data scientist role.
Good luck with your interview preparation!
Best Data Science & Machine Learning Resources: https://topmate.io/coding/914624
Like if you need similar content
Free Session to learn Data Analytics, Data Science & AI
https://tracking.acciojob.com/g/PUfdDxgHR
Register fast, only for first few users
Official Python Docs
https://docs.python.org/3/
Tools:
https://docs.python-guide.org/en/latest/dev/virtualenvs/
https://www.pythonforbeginners.com/basics/python-pip-usage
Practice:
https://www.practicepython.org/
https://www.hackerrank.com
https://wiki.python.org/moin/PythonDecorators
Python GUI FAQ
https://docs.python.org/3/faq/gui.html