COMMON TERMINOLOGIES IN PYTHON - PART 1
Have you ever gotten into a discussion with a programmer and found some of the terminology strange or hard to follow?
In this series, we will look at common terminology in Python.
Knowing these terms helps you explain your code properly and understand others instantly when the terms come up. Below are a few:
IDLE (Integrated Development and Learning Environment) - an environment that makes it easy to write Python code. IDLE can be used to execute single statements and to create, modify, and execute Python scripts.
Python Shell - the interactive environment that lets you type in Python code and execute it immediately
System Python - the version of Python that comes with your operating system
Prompt - usually represented by the symbol ">>>"; it simply means that Python is waiting for you to give it an instruction
REPL (Read-Eval-Print Loop) - the sequence of events in your interactive window, in the form of a loop (Python reads the code you enter > the code is evaluated > the result is printed > Python loops back and waits for more input)
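For example, one trip through the loop looks like this:
>>> 2 + 3        # read: Python reads the expression you type
5                # eval + print: the result is shown, then Python loops back to the prompt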
Argument - a value that is passed to a function when it is called, e.g. in print("Hello World"), "Hello World" is the argument being passed.
Function - code that takes some input (its arguments), processes that input, and produces an output called a return value. E.g. in print("Hello World"), print is the function.
Return Value - the value that a function hands back to the calling script or function when it completes its task. E.g.
>>> len("Hello World")
11
Here 11 is the return value of len. (Careful with the print example above: print("Hello World") displays Hello World on screen, but its actual return value is None.)
Note: A return value can be any Python object, e.g. an integer, a string, a list, or None.
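To see arguments and a return value together, here is a minimal sketch (the function name add is just for illustration):
>>> def add(a, b):
...     return a + b       # a + b is the return value
...
>>> add(2, 3)              # 2 and 3 are the arguments
5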
Script - Python code stored in a file so that all of it can be executed with a single command
Script file - the text file itself, usually with a .py extension, that contains the script
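For instance, save the two lines below in a file (hello.py is a hypothetical name for this example) and you can run the whole script with one command:
# hello.py
name = "World"
print("Hello, " + name)
Then, from a terminal, run: python hello.py and the output is Hello, World.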
Infosys 100% FREE Certification Courses
Infosys Springboard is offering a wide range of 100% free courses with certificates to help you upskill and boost your resume, at no cost.
Whether you're a student, graduate, or working professional, this platform has something valuable for everyone.
Link:
https://pdlink.in/4jsHZXf
Enroll For FREE & Get Certified
Data Science Learning Plan
Step 1: Mathematics for Data Science (Statistics, Probability, Linear Algebra)
Step 2: Python for Data Science (Basics and Libraries)
Step 3: Data Manipulation and Analysis (Pandas, NumPy)
Step 4: Data Visualization (Matplotlib, Seaborn, Plotly)
Step 5: Databases and SQL for Data Retrieval
Step 6: Introduction to Machine Learning (Supervised and Unsupervised Learning)
Step 7: Data Cleaning and Preprocessing
Step 8: Feature Engineering and Selection
Step 9: Model Evaluation and Tuning
Step 10: Deep Learning (Neural Networks, TensorFlow, Keras)
Step 11: Working with Big Data (Hadoop, Spark)
Step 12: Building Data Science Projects and Portfolio
Data Science Resources
https://whatsapp.com/channel/0029Va4QUHa6rsQjhITHK82y
Like for more!
5 FREE Tech Certification Courses From Microsoft, AWS, IBM, Cisco, and Stanford
- Python
- Artificial Intelligence
- Cybersecurity
- Cloud Computing
- Machine Learning
Link:
https://pdlink.in/3E2wYNr
Enroll For FREE & Get Certified
Three different learning styles in machine learning algorithms:
1. Supervised Learning
Input data is called training data and has a known label or result, such as spam/not-spam or a stock price at a point in time.
A model is prepared through a training process in which it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data.
Example problems are classification and regression.
Example algorithms include: Logistic Regression and the Back Propagation Neural Network.
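As a minimal supervised-learning sketch with scikit-learn (a synthetic dataset stands in for real labeled data):
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic labeled data: X holds the features, y the known labels
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)         # training: predictions are corrected against the known labels
print(model.score(X_test, y_test))  # accuracy on data the model has not seen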
2. Unsupervised Learning
Input data is not labeled and does not have a known result.
A model is prepared by deducing structures present in the input data. This may be to extract general rules. It may be through a mathematical process to systematically reduce redundancy, or it may be to organize data by similarity.
Example problems are clustering, dimensionality reduction and association rule learning.
Example algorithms include: the Apriori algorithm and K-Means.
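A minimal K-Means sketch with scikit-learn (random points stand in for real unlabeled data):
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)          # unlabeled data: no known result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])          # cluster assignments deduced from structure alone
print(kmeans.cluster_centers_)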
3. Semi-Supervised Learning
Input data is a mixture of labeled and unlabeled examples.
There is a desired prediction problem but the model must learn the structures to organize the data as well as make predictions.
Example problems are classification and regression.
Example algorithms are extensions to other flexible methods that make assumptions about how to model the unlabeled data.
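One concrete example is scikit-learn's SelfTrainingClassifier, which wraps a supervised base model and learns from labeled and unlabeled examples together; a minimal sketch (unlabeled points are marked with -1):
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1                          # pretend only the first 50 labels are known

model = SelfTrainingClassifier(SVC(probability=True))
model.fit(X, y_partial)                      # pseudo-labels the unlabeled points as it trains
print(model.predict(X[:5]))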
Data Science Roadmap 2025
Step 1: Python Basics
Step 2: Data Analysis (Pandas, NumPy)
Step 3: Data Visualization (Matplotlib, Seaborn)
Step 4: Machine Learning (Scikit-learn)
Step 5: Deep Learning (TensorFlow/PyTorch)
Step 6: SQL & Big Data (Spark)
Step 7: Deploy Models (Flask, FastAPI)
Step 8: Showcase Projects
Step 9: Land a Job!
Pro Tip: Compete on Kaggle
#datascience
Complete Machine Learning Roadmap
1. Introduction to Machine Learning
- Definition
- Purpose
- Types of Machine Learning (Supervised, Unsupervised, Reinforcement)
2. Mathematics for Machine Learning
- Linear Algebra
- Calculus
- Statistics and Probability
3. Programming Languages for ML
- Python and Libraries (NumPy, Pandas, Matplotlib)
- R
4. Data Preprocessing
- Handling Missing Data
- Feature Scaling
- Data Transformation
5. Exploratory Data Analysis (EDA)
- Data Visualization
- Descriptive Statistics
6. Supervised Learning
- Regression
- Classification
- Model Evaluation
7. Unsupervised Learning
- Clustering (K-Means, Hierarchical)
- Dimensionality Reduction (PCA)
8. Model Selection and Evaluation
- Cross-Validation
- Hyperparameter Tuning
- Evaluation Metrics (Precision, Recall, F1 Score)
9. Ensemble Learning
- Random Forest
- Gradient Boosting
10. Neural Networks and Deep Learning
- Introduction to Neural Networks
- Building and Training Neural Networks
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
11. Natural Language Processing (NLP)
- Text Preprocessing
- Sentiment Analysis
- Named Entity Recognition (NER)
12. Reinforcement Learning
- Basics
- Markov Decision Processes
- Q-Learning
13. Machine Learning Frameworks
- TensorFlow
- PyTorch
- Scikit-Learn
14. Deployment of ML Models
- Flask for Web Deployment
- Docker and Kubernetes
15. Ethical and Responsible AI
- Bias and Fairness
- Ethical Considerations
16. Machine Learning in Production
- Model Monitoring
- Continuous Integration/Continuous Deployment (CI/CD)
17. Real-world Projects and Case Studies
18. Machine Learning Resources
- Online Courses
- Books
- Blogs and Journals
Learning Resources for Machine Learning:
- [Python for Machine Learning](https://t.iss.one/udacityfreecourse/167)
- [Fast.ai: Practical Deep Learning for Coders](https://course.fast.ai/)
- [Intro to Machine Learning](https://learn.microsoft.com/en-us/training/paths/intro-to-ml-with-python/)
Books:
- Machine Learning Interviews
- Machine Learning for Absolute Beginners
Join @free4unow_backup for more free resources.
ENJOY LEARNING!
Here is a list of 50 data science interview questions that can help you prepare for a data science job interview. These questions cover a wide range of topics and levels of difficulty, so be sure to review them thoroughly and practice your answers.
Mathematics and Statistics:
1. What is the Central Limit Theorem, and why is it important in statistics?
2. Explain the difference between population and sample.
3. What is probability and how is it calculated?
4. What are the measures of central tendency, and when would you use each one?
5. Define variance and standard deviation.
6. What is the significance of hypothesis testing in data science?
7. Explain the p-value and its significance in hypothesis testing.
8. What is a normal distribution, and why is it important in statistics?
9. Describe the differences between a Z-score and a T-score.
10. What is correlation, and how is it measured?
11. What is the difference between covariance and correlation?
12. What is the law of large numbers?
Machine Learning:
13. What is machine learning, and how is it different from traditional programming?
14. Explain the bias-variance trade-off.
15. What are the different types of machine learning algorithms?
16. What is overfitting, and how can you prevent it?
17. Describe the k-fold cross-validation technique.
18. What is regularization, and why is it important in machine learning?
19. Explain the concept of feature engineering.
20. What is gradient descent, and how does it work in machine learning?
21. What is a decision tree, and how does it work?
22. What are ensemble methods in machine learning, and provide examples.
23. Explain the difference between supervised and unsupervised learning.
24. What is deep learning, and how does it differ from traditional neural networks?
25. What is a convolutional neural network (CNN), and where is it commonly used?
26. What is a recurrent neural network (RNN), and where is it commonly used?
27. What is the vanishing gradient problem in deep learning?
28. Describe the concept of transfer learning in deep learning.
Data Preprocessing:
29. What is data preprocessing, and why is it important in data science?
30. Explain missing data imputation techniques.
31. What is one-hot encoding, and when is it used?
32. How do you handle categorical data in machine learning?
33. Describe the process of data normalization and standardization.
34. What is feature scaling, and why is it necessary?
35. What is outlier detection, and how can you identify outliers in a dataset?
Data Exploration:
36. What is exploratory data analysis (EDA), and why is it important?
37. Explain the concept of data distribution.
38. What are box plots, and how are they used in EDA?
39. What is a histogram, and what insights can you gain from it?
40. Describe the concept of data skewness.
41. What are scatter plots, and how are they useful in data analysis?
42. What is a correlation matrix, and how is it used in EDA?
43. How do you handle imbalanced datasets in machine learning?
Model Evaluation:
44. What are the common metrics used for evaluating classification models?
45. Explain precision, recall, and F1-score.
46. What is ROC curve analysis, and what does it measure?
47. How do you choose the appropriate evaluation metric for a regression problem?
48. Describe the concept of confusion matrix.
49. What is cross-entropy loss, and how is it used in classification problems?
50. Explain the concept of AUC-ROC.
If you're into deep learning, then you know that students usually pick one of two paths:
- Computer vision
- Natural language processing (NLP)
If you're into NLP, here are 5 fundamental concepts you should know:
Before we start: what is NLP?
Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and humans through language.
It enables machines to understand, interpret, and respond to human language in a way that is both meaningful and useful.
Data scientists need NLP to analyze, process, and generate insights from large volumes of textual data, aiding in tasks ranging from sentiment analysis to automated summarization.
Tokenization
Tokenization involves breaking down text into smaller units, such as words or phrases. This is the first step in preprocessing textual data for further analysis or NLP applications.
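A minimal sketch with NLTK (assuming the punkt tokenizer data is available; newer NLTK versions may need the punkt_tab package instead):
import nltk
nltk.download("punkt", quiet=True)   # one-time download of tokenizer data
from nltk.tokenize import word_tokenize

print(word_tokenize("NLP breaks text into tokens."))
# ['NLP', 'breaks', 'text', 'into', 'tokens', '.']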
Part-of-Speech Tagging
This process involves identifying the part of speech for each word in a sentence (e.g., noun, verb, adjective). It is crucial for various NLP tasks that require understanding the grammatical structure of text.
Stemming and Lemmatization
These techniques reduce words to their base or root form. Stemming cuts off prefixes and suffixes, while lemmatization considers the morphological analysis of the words, leading to more accurate results.
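For example, with NLTK (assuming the wordnet data is downloaded):
import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import PorterStemmer, WordNetLemmatizer

print(PorterStemmer().stem("studies"))                    # 'studi' - crude suffix stripping
print(WordNetLemmatizer().lemmatize("studies", pos="v"))  # 'study' - dictionary-aware base form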
Named Entity Recognition (NER)
NER identifies and classifies named entities in text into predefined categories such as the names of persons, organizations, locations, etc. It's essential for tasks like data extraction from documents and content classification.
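A minimal sketch with spaCy (assuming the small English model en_core_web_sm is installed):
import spacy

nlp = spacy.load("en_core_web_sm")   # install first: python -m spacy download en_core_web_sm
doc = nlp("Apple is opening an office in Paris.")
for ent in doc.ents:
    print(ent.text, ent.label_)      # e.g. Apple ORG, Paris GPE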
Sentiment Analysis
This technique determines the emotional tone behind a body of text. It's widely used in business and social media monitoring to gauge public opinion and customer sentiment.
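A minimal sketch using NLTK's VADER analyzer (assuming the vader_lexicon data is downloaded):
import nltk
nltk.download("vader_lexicon", quiet=True)
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("I love this product!"))   # compound > 0 suggests a positive tone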
If you want to build agents that don't break in production...
You must start with the most important pattern:
Agentic RAG.
This week in the Second Brain AI Assistant course, we released the Agentic RAG module
... and today, I'm breaking down how it's architected from the ground up.
What is the Agentic RAG module?
The Agentic RAG module takes a user query via a Gradio UI.
The output is a reasoned answer, generated through:
- Semantic search from a vector DB
- Multi-step reasoning via an agent
- Optional summarization through a model/API
Online vs. offline pipelines
Here's the thing...
Most GenAI pipelines are offline.
They're pre-scheduled, long-running jobs.
(Using tools such as ZenML)
But this module is online.
It runs as a standalone Python app and powers real-time user interactions.
We intentionally decoupled it from our offline feature/training pipelines to preserve a clean separation between ingestion and inference.
Agentic Layer: Tooling Breakdown
The agent is built using SmolAgents (by Hugging Face) and is powered by 3 tools:
"What can I do?" Tool
โ Helps users explore agent capabilities
Retriever Tool
โ Queries MongoDB 's vector and text indexes (populated offline)
Summarization Tool
โ Hits a REST API for refining long-form web content
Each tool was picked to reflect real-world agent scenarios:
โ Python logic
โ DB queries
โ External API calls
The agent uses these tools iteratively to minimize cost and latency.
All reasoning happens in real time with full traceability via the Gradio UI.
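As a rough sketch of how such a tool-equipped agent is wired up with smolagents (hedged: this is not the course code; the retriever body is a stand-in, and class names like HfApiModel follow smolagents' docs at the time of writing, so check your installed version):
from smolagents import CodeAgent, HfApiModel, tool

@tool
def retrieve_context(query: str) -> str:
    """Return documents relevant to the query from the vector index.

    Args:
        query: The user question to search for.
    """
    # stand-in for the real MongoDB vector/text search (assumption, not the course code)
    return "retrieved context about: " + query

agent = CodeAgent(tools=[retrieve_context], model=HfApiModel())
print(agent.run("What can this assistant do?"))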
What happens under the hood?
User submits a query
The agent decides: "Do I need context?"
If yes → it queries the vector DB (retriever tool)
Retrieved chunks optionally go through summarization
The agent reasons → and repeats if more context is needed to answer the question fully
Once confident → the final response is returned
We can swap the summarization model between our custom small language model (hosted as a real-time API on Hugging Face) and OpenAI (as a fallback), for full customization.
It's modular, testable, and future-proof.
Could we have used a simple workflow?
Yes.
But the agentic approach unlocks scalability and extensibility.
This is critical if you want to:
- Add new tools
- Support multi-turn reasoning
- Layer in observability or eval logic later
But this is just the beginning.
We'll be expanding this system with observability:
- Evaluation
- Prompt monitoring
Who is a Data Scientist?
A data scientist is responsible for collecting, analyzing, and interpreting large amounts of data. The results are used to make important business decisions, which can affect growth and help the business face competition in the market.
A data scientist analyzes data to extract actionable insight from it. More specifically, a data scientist:
Determines correct datasets and variables.
Identifies the most challenging data-analytics problems.
Collects large sets of data, structured and unstructured, from different sources.
Cleans and validates data ensuring accuracy, completeness, and uniformity.
Builds and applies models and algorithms to mine stores of big data.
Analyzes data to recognize patterns and trends.
Interprets data to find solutions.
Communicates findings to stakeholders using tools like visualization.
Join our WhatsApp channel to learn more: https://whatsapp.com/channel/0029Va8v3eo1NCrQfGMseL2D
Data Science Project Ideas for Freshers
Exploratory Data Analysis (EDA) on a Dataset: Choose a dataset of interest and perform thorough EDA to extract insights, visualize trends, and identify patterns.
Predictive Modeling: Build a simple predictive model, such as linear regression, to predict a target variable based on input features. Use libraries like scikit-learn to implement the model.
Classification Problem: Work on a classification task using algorithms like decision trees, random forests, or support vector machines. It could involve classifying emails as spam or not spam, or predicting customer churn.
Time Series Analysis: Analyze time-dependent data, like stock prices or temperature readings, to forecast future values using techniques like ARIMA or LSTM.
Image Classification: Use convolutional neural networks (CNNs) to build an image classification model, perhaps classifying different types of objects or animals.
Natural Language Processing (NLP): Create a sentiment analysis model that classifies text as positive, negative, or neutral, or build a text generator using recurrent neural networks (RNNs).
Clustering Analysis: Apply clustering algorithms like k-means to group similar data points together, such as segmenting customers based on purchasing behaviour.
Recommendation System: Develop a recommendation engine using collaborative filtering techniques to suggest products or content to users.
Anomaly Detection: Build a model to detect anomalies in data, which could be useful for fraud detection or identifying defects in manufacturing processes.
A/B Testing: Design and analyze an A/B test to compare the effectiveness of two different versions of a web page or app feature.
Remember to document your process, explain your methodology, and showcase your projects on platforms like GitHub or a personal portfolio website.
Free datasets to build the projects
https://t.iss.one/datasciencefun/1126
ENJOY LEARNING!
Excel vs SQL vs Python (pandas):
1. Filtering Data
↳ Excel: =FILTER(A2:D100, B2:B100>50) (Excel 365 users)
↳ SQL: SELECT * FROM table WHERE column > 50;
↳ Python: df_filtered = df[df['column'] > 50]
2. Sorting Data
↳ Excel: Data → Sort (or =SORT(A2:A100, 1, TRUE))
↳ SQL: SELECT * FROM table ORDER BY column ASC;
↳ Python: df_sorted = df.sort_values(by="column")
3. Counting Rows
↳ Excel: =COUNTA(A:A)
↳ SQL: SELECT COUNT(*) FROM table;
↳ Python: row_count = len(df)
4. Removing Duplicates
↳ Excel: Data → Remove Duplicates
↳ SQL: SELECT DISTINCT * FROM table;
↳ Python: df_unique = df.drop_duplicates()
5. Joining Tables
↳ Excel: Power Query → Merge Queries (or VLOOKUP/XLOOKUP)
↳ SQL: SELECT * FROM table1 JOIN table2 ON table1.id = table2.id;
↳ Python: df_merged = pd.merge(df1, df2, on="id")
6. Ranking Data
↳ Excel: =RANK.EQ(A2, $A$2:$A$100)
↳ SQL: SELECT column, RANK() OVER (ORDER BY column DESC) AS rank FROM table;
↳ Python: df["rank"] = df["column"].rank(method="min", ascending=False)
7. Moving Average Calculation
↳ Excel: =AVERAGE(B2:B4) (manually for rolling window)
↳ SQL: SELECT date, AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM table;
↳ Python: df["moving_avg"] = df["value"].rolling(window=3).mean()
8. Running Total
↳ Excel: =SUM($B$2:B2) (drag down)
↳ SQL: SELECT date, SUM(value) OVER (ORDER BY date) AS running_total FROM table;
↳ Python: df["running_total"] = df["value"].cumsum()
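To make the pandas one-liners above concrete, here is a runnable sketch (a tiny made-up frame stands in for your data):
import pandas as pd

df = pd.DataFrame({"column": [80, 20, 60, 60], "value": [1.0, 2.0, 3.0, 4.0]})

print(df[df["column"] > 50])                  # filtering
print(df.sort_values(by="column"))            # sorting
df["rank"] = df["column"].rank(method="min", ascending=False)
df["moving_avg"] = df["value"].rolling(window=3).mean()
df["running_total"] = df["value"].cumsum()
print(df)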
Here are some project ideas for a data science and machine learning project focused on generating AI:
1. Natural Language Generation (NLG) Model: Build a model that generates human-like text based on input data. This could be used for creating product descriptions, news articles, or personalized recommendations.
2. Code Generation Model: Develop a model that generates code snippets based on a given task or problem statement. This could help automate software development tasks or assist programmers in writing code more efficiently.
3. Image Captioning Model: Create a model that generates captions for images, describing the content of the image in natural language. This could be useful for visually impaired individuals or for enhancing image search capabilities.
4. Music Generation Model: Build a model that generates music compositions based on input data, such as existing songs or musical patterns. This could be used for creating background music for videos or games.
5. Video Synthesis Model: Develop a model that generates realistic video sequences based on input data, such as a series of images or a textual description. This could be used for generating synthetic training data for computer vision models.
6. Chatbot Generation Model: Create a model that generates conversational agents or chatbots based on input data, such as dialogue datasets or user interactions. This could be used for customer service automation or virtual assistants.
7. Art Generation Model: Build a model that generates artistic images or paintings based on input data, such as art styles, color palettes, or themes. This could be used for creating unique digital artwork or personalized designs.
8. Story Generation Model: Develop a model that generates fictional stories or narratives based on input data, such as plot outlines, character descriptions, or genre preferences. This could be used for creative writing prompts or interactive storytelling applications.
9. Recipe Generation Model: Create a model that generates new recipes based on input data, such as ingredient lists, dietary restrictions, or cuisine preferences. This could be used for meal planning or culinary inspiration.
10. Financial Report Generation Model: Build a model that generates financial reports or summaries based on input data, such as company financial statements, market trends, or investment portfolios. This could be used for automated financial analysis or decision-making support.
Which of these projects sounds interesting to you?
Want to build your first AI agent?
Join a live hands-on session by GeeksforGeeks & Salesforce for working professionals
- Build with Agent Builder
- Assign real actions
- Get a free certificate of participation
Registration link:
https://gfgcdn.com/tu/V4t/
Build an LLM app with Mixture of AI Agents using small Open Source LLMs that can beat GPT-4o in just 40 lines of Python Code (step-by-step instructions):