HandsOnLLM/Hands-On-Large-Language-Models
Official code repo for the O'Reilly Book - "Hands-On Large Language Models"
Language:Jupyter Notebook
Total stars: 194
Stars trend:
16 Sep 2024
5pm █ +6
6pm █ +6
7pm █ +7
8pm █ +2
9pm █ +3
10pm █ +4
11pm █ +3
17 Sep 2024
12am █ +1
1am █ +3
2am █ +5
3am ███ +18
4am ███ +17
#jupyternotebook
#artificialintelligence, #book, #largelanguagemodels, #llm, #llms, #oreilly, #oreillybooks
Tools Every AI Engineer Should Know
1. Data Science Tools
Python: Preferred language with libraries like NumPy, Pandas, Scikit-learn.
R: Ideal for statistical analysis and data visualization.
Jupyter Notebook: Interactive coding environment for Python and R.
MATLAB: Used for mathematical modeling and algorithm development.
RapidMiner: Drag-and-drop platform for machine learning workflows.
KNIME: Open-source analytics platform for data integration and analysis.
2. Machine Learning Tools
Scikit-learn: Comprehensive library for traditional ML algorithms.
XGBoost & LightGBM: Specialized tools for gradient boosting.
TensorFlow: Open-source framework for ML and DL.
PyTorch: Popular DL framework with a dynamic computation graph.
H2O.ai: Scalable platform for ML and AutoML.
Auto-sklearn: AutoML toolkit built on Scikit-learn that automates model selection and hyperparameter tuning.
3. Deep Learning Tools
Keras: User-friendly high-level API for building neural networks.
PyTorch: Excellent for research and production in DL.
TensorFlow: Versatile for both research and deployment.
ONNX: Open format for model interoperability.
OpenCV: For image processing and computer vision.
Hugging Face: Hub of pre-trained models and libraries such as Transformers, best known for natural language processing.
4. Data Engineering Tools
Apache Hadoop: Framework for distributed storage and processing.
Apache Spark: Fast cluster-computing framework.
Kafka: Distributed streaming platform.
Airflow: Workflow automation tool.
Fivetran: ETL tool for data integration.
dbt: Data transformation tool using SQL.
5. Data Visualization Tools
Tableau: Drag-and-drop BI tool for interactive dashboards.
Power BI: Microsoft's BI platform for data analysis and visualization.
Matplotlib & Seaborn: Python libraries for static plots and statistical graphics.
Plotly: Interactive plotting library with Dash for web apps.
D3.js: JavaScript library for creating dynamic web visualizations.
6. Cloud Platforms
AWS: Services like SageMaker for ML model building.
Google Cloud Platform (GCP): Tools like BigQuery and Vertex AI (which includes AutoML).
Microsoft Azure: Azure Machine Learning studio for ML workflows.
IBM Watson: AI platform for custom model development.
7. Version Control and Collaboration Tools
Git: Version control system.
GitHub/GitLab: Platforms for code sharing and collaboration.
Bitbucket: Atlassian's Git repository hosting for teams.
8. Other Essential Tools
Docker: For containerizing applications.
Kubernetes: Orchestration of containerized applications.
MLflow: Experiment tracking and deployment.
Weights & Biases (W&B): Experiment tracking and collaboration.
Pandas Profiling (now ydata-profiling): Automated data profiling.
BigQuery/Athena: Serverless SQL engines for data warehousing and analytics.
Knowing these tools will prepare you for a wide range of challenges across the AI lifecycle.
#artificialintelligence
10 Python Libraries Every AI Engineer Should Know
1. Hugging Face Transformers
A powerful library for using and fine-tuning pre-trained transformer models for NLP. Learn more: Hugging Face NLP Course
2. Ollama
A tool for running and managing open-source LLMs locally with ease. Learn more (video): Ollama Course
3. OpenAI Python SDK
The official Python library for integrating OpenAI models into applications. Learn more: OpenAI's developer quickstart guide
4. Anthropic SDK
The official Python client library for working with Anthropic's Claude models. Learn more: Anthropic Python SDK
5. LangChain
A framework for building LLM applications with modular and extensible components. Learn more: DeepLearning.AI
6. LlamaIndex
A framework for connecting custom data sources to LLMs, widely used for retrieval-augmented generation (RAG). Learn more: Building Agentic RAG with LlamaIndex
7. SQLAlchemy
A Python SQL toolkit and ORM for efficient and maintainable database interactions. Learn more: SQLAlchemy Unified Tutorial
8. ChromaDB
An open-source vector database optimized for AI-powered search and retrieval. Learn more: Getting Started - Chroma Docs
9. Weaviate
A cloud-native vector search engine for efficient semantic search at scale. Learn more: 101T Work with: Text data
10. Weights & Biases
A platform for tracking, visualizing, and optimizing ML experiments. Learn more: Effective MLOps: Model Development
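Vector databases such as ChromaDB and Weaviate store embeddings and answer a query by returning the stored vectors most similar to the query vector. The core idea can be sketched in plain Python with brute-force cosine similarity. This is not either library's API: the three-dimensional "embeddings" and document titles below are hypothetical, and production vector databases typically use approximate nearest-neighbour indexes (such as HNSW) instead of scoring every document.

```python
import math

# Hypothetical 3-dimensional "embeddings". Real embedding models produce hundreds
# or thousands of dimensions; these toy vectors are hand-picked for illustration.
documents = {
    "How to fine-tune a transformer": [0.9, 0.1, 0.0],
    "Running LLMs locally": [0.7, 0.6, 0.1],
    "Chocolate cake recipe": [0.0, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, k=2):
    """Brute-force nearest neighbours: score every document, keep the top k titles."""
    scored = [(cosine_similarity(query_vector, vec), title) for title, vec in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical embedding of a query such as "adapting pre-trained language models".
print(search([0.8, 0.2, 0.05]))
```

The two machine-learning titles rank above the recipe because their vectors point in a similar direction to the query vector, which is exactly what "semantic search" means at this level.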
#artificialintelligence
How to become a data scientist in 2025
If you want to become a data science professional, follow this path! I've prepared a complete roadmap with some of the best free resources for learning the essential skills in this field.
Step 1: Strengthen your math and statistics!
The foundation of data science is mathematics: linear algebra, calculus, statistics, and probability. Topics you should master:
• Linear algebra: matrices, vectors, eigenvalues.
Course: MIT 18.06 Linear Algebra
• Calculus: derivatives, integrals, optimization.
Course: MIT Single Variable Calculus
• Statistics and probability: Bayes' theorem, hypothesis testing.
Course: Harvard Statistics 110
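Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), is easy to check numerically. A minimal Python sketch, assuming made-up test characteristics (1% prevalence, 99% sensitivity, 5% false-positive rate) rather than real data:

```python
def bayes_posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' theorem.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Law of total probability: P(positive) summed over "has condition" and "does not".
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
    return sensitivity * prior / p_positive


# Hypothetical numbers, chosen only for illustration.
posterior = bayes_posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(condition | positive) = {posterior:.3f}")
```

With these assumed numbers the posterior is 1/6 (about 0.167): because the condition is rare, most positive results are false positives even though the test is quite sensitive.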
-----
Step 2: Learn to code.
Learn Python and become proficient at coding. The most important topics to master are:
• Python: the Pandas, NumPy, and Matplotlib libraries.
Course: FreeCodeCamp Python Course
• SQL: joins, window functions, query optimization.
Course: Stanford SQL Course
• Data structures and algorithms: arrays, linked lists, trees.
Course: MIT Introduction to Algorithms
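To see joins and window functions together, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The tables and data are hypothetical, and window functions assume SQLite 3.25 or newer, which recent Python builds typically bundle.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(id),
        salary INTEGER NOT NULL
    );
    INSERT INTO departments (id, name) VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees (name, dept_id, salary) VALUES
        ('Ana', 1, 120), ('Ben', 1, 100), ('Cy', 2, 90), ('Dee', 2, 95);
""")

# The JOIN attaches each employee's department name; the window function
# RANK() OVER (PARTITION BY ...) ranks salaries within each department
# without collapsing rows the way GROUP BY would.
query = """
    SELECT d.name AS department,
           e.name AS employee,
           e.salary,
           RANK() OVER (PARTITION BY d.id ORDER BY e.salary DESC) AS salary_rank
    FROM employees AS e
    JOIN departments AS d ON e.dept_id = d.id
    ORDER BY department, salary_rank
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Unlike `GROUP BY`, the window function keeps one output row per employee while still computing a per-department ranking.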
-----
Step 3: Clean and visualize data
Learn how to process and clean data, and then tell an engaging story with it!
• Data cleaning: handling missing values and detecting outliers.
Course: Data Cleaning
• Data visualization: Matplotlib, Seaborn, Tableau.
Course: Data Visualization Tutorial
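Both cleaning tasks can be sketched with the standard library's `statistics` module (Python 3.8+ for `quantiles`); in practice pandas methods such as `fillna` and `quantile` are the usual tools. The readings are hypothetical, and median imputation plus Tukey's 1.5 × IQR fences are one common choice among several, not the only correct approach.

```python
import statistics

# Hypothetical sensor readings: None marks a missing value, 95 looks suspicious.
readings = [10, 12, 11, None, 13, 12, 95, 11]

# 1) Missing values: impute with the median of the observed values
#    (the median is robust to the outlier, unlike the mean).
observed = [x for x in readings if x is not None]
fill_value = statistics.median(observed)
cleaned = [fill_value if x is None else x for x in readings]

# 2) Outliers: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Tukey's fences).
q1, _, q3 = statistics.quantiles(cleaned, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in cleaned if x < low or x > high]

print("filled with:", fill_value)
print("cleaned:", cleaned)
print("outliers:", outliers)
```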
-----
Step 4: Learn machine learning
It's time to enter the exciting world of machine learning! You should know these topics:
• Supervised learning: regression, classification.
• Unsupervised learning: clustering, PCA, anomaly detection.
• Deep learning: neural networks, CNNs, RNNs.
Course: Stanford CS229: Machine Learning
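As a concrete taste of supervised learning, the sketch below fits a one-feature linear regression with the closed-form ordinary least squares solution, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². The training data is hypothetical (generated from y = 2x + 1); libraries such as scikit-learn's `LinearRegression` handle the multi-feature case.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # slope = cov(x, y) / var(x); the 1/n factors cancel, so raw sums suffice.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled training data generated from y = 2x + 1.
slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept  # predict for an unseen input x = 5
print(slope, intercept, prediction)
```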
-----
Step 5: Work with big data and cloud technologies
To work on real-world problems, you need to know how to handle big data and cloud computing.
• Big data tools: Hadoop, Spark, Dask.
• Cloud platforms: AWS, GCP, Azure.
Course: Data Engineering
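Hadoop MapReduce and Spark both build on a map/reduce pattern: each machine processes its own partition of the data, and the partial results are then merged. A single-machine sketch of the classic word count in plain Python (this is not Spark's API, and the partitions are hypothetical):

```python
from collections import Counter
from functools import reduce

# Hypothetical "partitions": on a cluster these would be blocks of a large
# file spread across machines; here they are just lists of lines.
partitions = [
    ["spark makes big data simple", "big data needs big tools"],
    ["dask scales python", "spark and dask both scale python"],
]


def map_partition(lines):
    """Map step: count words locally within one partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(a, b):
    """Reduce step: add two partial counts (associative and commutative,
    so partial results can be merged in any order)."""
    return a + b


partial_counts = [map_partition(p) for p in partitions]  # would run in parallel on a cluster
word_counts = reduce(merge_counts, partial_counts, Counter())
print(word_counts.most_common(3))
```

In PySpark the same computation is commonly expressed with RDD methods such as `flatMap`, `map`, and `reduceByKey`, and the framework handles distributing partitions across a cluster.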
-----
Step 6: Do real projects!
Enough theory; it's time to start coding! Work on real projects and build a strong portfolio.
• Kaggle competitions: solve real-world challenges.
• End-to-end projects: data collection, modeling, deployment.
• GitHub: publish your projects on GitHub.
Platform: Kaggle
Platform: ods.ai
-----
Step 7: Learn MLOps and deploy models
Machine learning isn't just about building a model! You also need to learn how to deploy and monitor it.
• MLOps: model versioning, monitoring, retraining.
• Model deployment: Flask, FastAPI, Docker.
Course: Stanford MLOps Course
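Serving a model usually means wrapping a prediction function in an HTTP endpoint. Flask applications are WSGI applications (FastAPI uses the async ASGI interface instead), so the idea can be sketched with nothing but Python's standard library. The `predict` function is a hypothetical stand-in for a trained model, and the hypothetical `call` helper invokes the app directly so it can be exercised without starting a server.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def predict(x: float) -> float:
    """Hypothetical stand-in for a trained model (here simply y = 2x + 1)."""
    return 2.0 * x + 1.0


def app(environ, start_response):
    """WSGI app exposing POST /predict, expecting a JSON body such as {"x": 3}."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        result = {"prediction": predict(float(payload["x"]))}
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing "x", or a non-numeric value.
        start_response("400 Bad Request", headers)
        return [b'{"error": "bad request"}']
    start_response("200 OK", headers)
    return [json.dumps(result).encode("utf-8")]


def call(app, method, path, payload=None):
    """Invoke the WSGI app directly (no server); return (status, parsed JSON body)."""
    body = b"" if payload is None else json.dumps(payload).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


print(call(app, "POST", "/predict", {"x": 3}))  # ('200 OK', {'prediction': 7.0})
```

To try it over HTTP locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` starts a development server. For production you would typically use Flask or FastAPI behind a proper server (for example Gunicorn or Uvicorn) and package the service in a Docker image.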
-----
Step 8: Stay up to date and network
Data science changes quickly, so keep learning continuously and stay in regular contact with experienced practitioners and experts in the field.
• Read scientific papers: arXiv, Google Scholar.
• Connect with the data community:
Site: Papers with Code
Site: AI Research at Google
#ArtificialIntelligence #AI #MachineLearning #LargeLanguageModels #LLMs #DeepLearning #NLP #NaturalLanguageProcessing #AIResearch #TechBooks #AIApplications #DataScience #FutureOfAI #AIEducation #LearnAI #TechInnovation #AIethics #GPT #BERT #T5 #AIBook #data