Pandas is a powerful and versatile library in Python, especially for data science tasks.
Here are some key Pandas methods that are widely used:
Data Loading and Creation
* read_csv(): Reads data from a CSV file into a DataFrame.
* read_excel(): Reads data from an Excel file into a DataFrame.
* DataFrame(): Creates a new DataFrame from a dictionary, list, or NumPy array.
Data Exploration and Selection
* head(): Returns the first few rows of a DataFrame.
* tail(): Returns the last few rows of a DataFrame.
* shape: Returns the dimensions of a DataFrame as a (rows, columns) tuple. It is an attribute, so it is written without parentheses.
* info(): Provides summary information about the DataFrame, including data types and missing values.
* describe(): Generates summary statistics for numerical columns.
* loc[]: Selects rows and columns by label.
* iloc[]: Selects rows and columns by integer position.
* filter(): Selects columns (or rows, with axis=0) by label, using the items, like, or regex arguments.
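The exploration and selection calls listed above can be tried on a small in-memory table. This is a minimal sketch; the column names, index labels, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this illustration
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "age": [31, 25, 47, 38],
        "city": ["Oslo", "Rome", "Oslo", "Lima"],
    },
    index=["a", "b", "c", "d"],
)

print(df.head(2))   # first two rows
print(df.shape)     # attribute, no parentheses: (4, 3)

# loc selects by label; a label slice includes both endpoints
by_label = df.loc["a":"b", ["name", "age"]]

# iloc selects by integer position; the end of a slice is excluded
by_position = df.iloc[0:2, 0:2]

# filter keeps only the columns whose labels are listed in items
only_age = df.filter(items=["age"])
```

Here by_label and by_position pick out the same two rows and two columns, once by label and once by position.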
Data Cleaning and Transformation
* dropna(): Removes rows or columns with missing values.
* fillna(): Fills missing values with a specified value or strategy.
* drop_duplicates(): Removes duplicate rows.
* apply(): Applies a function along an axis of a DataFrame (row by row or column by column), or to each value of a Series.
* groupby(): Groups data based on one or more columns and performs aggregate functions.
* pivot_table(): Creates a pivot table for data summarization.
* merge(): Merges DataFrames based on a common column.
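A short sketch of how several of these cleaning and transformation calls chain together. The two tables, their column names, and the choice to fill missing values with 0 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records with one missing value and one repeated row
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "units": [10.0, np.nan, 7.0, 7.0, 3.0],
        "rep_id": [1, 1, 2, 2, 3],
    }
)
# Hypothetical lookup table to merge in
reps = pd.DataFrame({"rep_id": [1, 2, 3], "rep": ["Ann", "Bob", "Cid"]})

dropped = sales.dropna(subset=["units"])         # remove rows where 'units' is missing
filled = sales.fillna({"units": 0.0})            # or replace the missing value with 0
deduped = filled.drop_duplicates()               # drop the repeated south / 7.0 / 2 row
totals = deduped.groupby("region")["units"].sum()        # one total per region
merged = deduped.merge(reps, on="rep_id", how="left")    # add the rep name by key
```

Whether to drop or fill missing values depends on the data; both are shown only to contrast the two calls.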
Data Visualization
* plot(): Creates various types of plots (line, bar, scatter, etc.).
* hist(): Creates a histogram.
* boxplot(): Creates a box plot.
These are just a few examples of the many powerful methods that Pandas offers. By mastering these methods, you can efficiently load, clean, transform, analyze, and visualize data for your data science projects.
Example:
import pandas as pd
# Load data from a CSV file
df = pd.read_csv('data.csv')
# Select the first 5 rows
print(df.head())
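# Group data by a column and calculate the mean of the numeric columns
# (numeric_only=True avoids an error when text columns are present)
grouped_df = df.groupby('column_name').mean(numeric_only=True)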
# Create a bar plot
grouped_df.plot(kind='bar')
10 Ways to Speed Up Your Python Code
1. List Comprehensions
numbers = [x**2 for x in range(100000) if x % 2 == 0]
instead of
numbers = []
for x in range(100000):
    if x % 2 == 0:
        numbers.append(x**2)
2. Use the Built-In Functions
Many of Python's built-in functions are written in C, which makes them much faster than a pure-Python solution.
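As a small illustration, here is a hand-written summing loop next to the built-in sum(). The helper name total_with_loop is made up for this sketch.

```python
def total_with_loop(values):
    """Sum values with an explicit Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

# Both approaches give the same answer; sum() runs its loop in C
loop_result = total_with_loop(data)
builtin_result = sum(data)
print(loop_result == builtin_result)
```

To compare the two on your own machine, wrap each call in timeit.timeit(...) from the standard library.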
3. Function Calls Are Expensive
Function calls are expensive in Python. While it is often good practice to separate code into functions, there are times where you should be cautious about calling functions from inside of a loop. It is better to iterate inside a function than to iterate and call a function each iteration.
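A minimal sketch of the two shapes described here. The function names are hypothetical, and both versions return the same result.

```python
def square(x):
    return x * x


def squares_calling_each_time(values):
    # One extra Python function call per element
    out = []
    for v in values:
        out.append(square(v))
    return out


def squares_inline(values):
    # The loop and the arithmetic live inside one function: no per-element call
    out = []
    for v in values:
        out.append(v * v)
    return out


sample = list(range(10))
print(squares_calling_each_time(sample) == squares_inline(sample))
```

The gap only matters in hot loops; for most code, the readability of a separate function is worth more than the saved call.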
4. Lazy Module Importing
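If you want to use the time.sleep() function, writing from time import sleep instead of import time does not avoid loading the module: Python still imports all of time and only binds the one name. What the from form saves is an attribute lookup on each call. To actually skip import overhead, defer the import of a heavy module until the code path that needs it runs, for example by importing it inside the function that uses it.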
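One way to make an import lazy is to place it inside the function that needs it, so the module is loaded only if that function is called. A minimal sketch, using the standard-library json module purely as a stand-in for a heavier dependency; the function name to_json is made up.

```python
import sys


def to_json(record):
    # The import statement runs when the function is called, not at program start.
    # After the first load the module is cached in sys.modules, so repeat calls are cheap.
    import json

    return json.dumps(record, sort_keys=True)


print(to_json({"b": 1, "a": 2}))   # {"a": 2, "b": 1}
print("json" in sys.modules)       # True once to_json has run
```

The trade-off is that an import error surfaces at call time rather than at startup.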
5. Take Advantage of NumPy
NumPy is a highly optimized library whose core is written in C. It is almost always faster to offload heavy numerical work on arrays to NumPy than to loop over the values in the Python interpreter.
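A small sketch of what offloading looks like: the same calculation written once as a vectorized NumPy expression and once as a Python loop. The values are arbitrary.

```python
import math

import numpy as np

values = np.arange(1, 6, dtype=float)   # 1.0, 2.0, 3.0, 4.0, 5.0

# Vectorized: one expression, the element loop runs in compiled code
vectorized = np.sqrt(values) * 2.0

# Equivalent loop executed by the Python interpreter
looped = [math.sqrt(v) * 2.0 for v in values.tolist()]

print(np.allclose(vectorized, looped))
```

On five elements the difference is negligible; the benefit shows up on large arrays.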
6. Try Multiprocessing
Multiprocessing can bring large performance increases to a Python script, but it can be difficult to implement properly compared to other methods mentioned in this post.
7. Be Careful with Bulky Libraries
One of the advantages Python has over other programming languages is its rich selection of third-party libraries. What we do not always consider is the size of the library we add as a dependency: a large library can increase import time and memory use, which slows startup even if you only use a small part of it.
8. Avoid Global Variables
Python is slightly faster at retrieving local variables than global ones. It is simply best to avoid global variables when possible.
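A common way to apply this inside a hot loop is to copy the global into a local name once. The sketch below uses made-up names (FACTOR, scale_using_global, scale_using_local) and shows only the pattern, not a measured speedup.

```python
FACTOR = 3  # module-level (global) name


def scale_using_global(values):
    out = []
    for v in values:
        out.append(v * FACTOR)   # FACTOR is looked up as a global on every iteration
    return out


def scale_using_local(values):
    factor = FACTOR              # read the global once and keep it in a local
    out = []
    for v in values:
        out.append(v * factor)   # local lookups are cheaper
    return out


print(scale_using_global([1, 2, 3]) == scale_using_local([1, 2, 3]))
```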
9. Try Multiple Solutions
Being able to solve a problem in multiple ways is nice. But, there is often a solution that is faster than the rest and sometimes it comes down to just using a different method or data structure.
10. Think About Your Data Structures
Searching a dictionary or set takes roughly constant time on average, while searching a list takes time proportional to its length. Sets do not maintain order. Dictionaries preserve insertion order (guaranteed since Python 3.7) but cannot be indexed by position, so if you need positional access or sorted order, a list is still the right tool.
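The lookup difference can be seen with a membership test. A minimal sketch; the data and the helper names are invented for the example.

```python
names_list = [f"user{i}" for i in range(100_000)]
names_set = set(names_list)


def in_list(name):
    # Scans the list item by item until it finds a match
    return name in names_list


def in_set(name):
    # Hashes the name and checks a single bucket
    return name in names_set


# Same answers, very different amounts of work for a late or missing item
print(in_list("user99999"), in_set("user99999"))
print(in_list("nobody"), in_set("nobody"))
```

Building the set has a one-time cost, so this pays off when the same collection is searched many times.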
Best Programming Resources: https://topmate.io/coding/898340
All the best
PROGRAMMING LANGUAGES YOU SHOULD LEARN TO BECOME
[ Web Developer ]
PHP, C#, JS, Java, Python, Ruby
[ Game Developer ]
Java, C++, Python, JS, Ruby, C, C#
[ Data Analyst ]
R, Matlab, Java, Python
[ Desktop Developer ]
Java, C#, C++, Python
[ Embedded Systems Programmer ]
C, Python, C++
[ Mobile App Developer ]
Kotlin, Dart, Objective-C, Java, Python, JS, Swift, C#
Join this community for FAANG Jobs : https://t.iss.one/faangjob
Learning Python for data science can be a rewarding experience. Here are some steps you can follow to get started:
1. Learn the Basics of Python: Start with the basics of the Python programming language, such as syntax, data types, functions, loops, and conditional statements. Many free online resources are available for learning Python.
2. Understand Data Structures and Libraries: Familiarize yourself with data structures like lists, dictionaries, tuples, and sets. Also, learn about popular Python libraries used in data science such as NumPy, Pandas, Matplotlib, and Scikit-learn.
3. Practice with Projects: Start working on small data science projects to apply your knowledge. You can find datasets online to practice your skills and build your portfolio.
4. Take Online Courses: Enroll in online courses specifically tailored for learning Python for data science. Websites like Coursera, Udemy, and DataCamp offer courses on Python programming for data science.
5. Join Data Science Communities: Join online communities and forums like Stack Overflow, Reddit, or Kaggle to connect with other data science enthusiasts and get help with any questions you may have.
6. Read Books: There are many great books available on Python for data science that can help you deepen your understanding of the subject. Some popular books include "Python for Data Analysis" by Wes McKinney and "Data Science from Scratch" by Joel Grus.
7. Practice Regularly: Practice is key to mastering any skill. Make sure to practice regularly and work on real-world data science problems to improve your skills.
Free Resources on WhatsApp: https://whatsapp.com/channel/0029VauCKUI6WaKrgTHrRD0i
If you want to learn Python for data analysis, focus on these essentials
Don't aim for this:
NumPy - 100%
Pandas - 0%
Matplotlib - 0%
Seaborn - 0%
OS - 0%
Aim for this:
NumPy - 25%
Pandas - 25%
Matplotlib - 25%
Seaborn - 25%
OS - 25%
You don't need to master everything at once.
Focus on the essentials to build a strong foundation.
#python
Essential Python Libraries to build your career in Data Science
1. NumPy:
- Efficient numerical operations and array manipulation.
2. Pandas:
- Data manipulation and analysis with powerful data structures (DataFrame, Series).
3. Matplotlib:
- 2D plotting library for creating visualizations.
4. Seaborn:
- Statistical data visualization built on top of Matplotlib.
5. Scikit-learn:
- Machine learning toolkit for classification, regression, clustering, etc.
6. TensorFlow:
- Open-source machine learning framework for building and deploying ML models.
7. PyTorch:
- Deep learning library, particularly popular for neural network research.
8. SciPy:
- Library for scientific and technical computing.
9. Statsmodels:
- Statistical modeling and econometrics in Python.
10. NLTK (Natural Language Toolkit):
- Tools for working with human language data (text).
11. Gensim:
- Topic modeling and document similarity analysis.
12. Keras:
- High-level neural networks API; since Keras 3 it runs on top of TensorFlow, JAX, or PyTorch.
13. Plotly:
- Interactive graphing library for making interactive plots.
14. Beautiful Soup:
- Web scraping library for pulling data out of HTML and XML files.
15. OpenCV:
- Library for computer vision tasks.
As a beginner, you can start with Pandas and NumPy for data manipulation and analysis. For data visualization, Matplotlib and Seaborn are great starting points. As you progress, you can explore machine learning with Scikit-learn, TensorFlow, and PyTorch.
Free Notes & Books to learn Data Science: https://t.iss.one/datasciencefree
Python Project Ideas: https://t.iss.one/dsabooks/85
Best Resources to learn Python & Data Science
Python Tutorial
Data Science Course by Kaggle
Machine Learning Course by Google
Best Data Science & Machine Learning Resources
Interview Process for Data Science Role at Amazon
Python Interview Resources
Join @free4unow_backup for more free courses
Like for more
ENJOY LEARNING
Python Game Development Roadmap
Stage 1 - Learn Python basics (syntax, OOP).
Stage 2 - Study game physics and logic fundamentals.
Stage 3 - Use Pygame to prototype 2D games.
Stage 4 - Add input systems (controllers, keyboard, mouse).
Stage 5 - Add sound effects with Pygame's mixer module (pygame.mixer).
Stage 6 - Explore OpenGL or Panda3D for 3D games.
Stage 7 - Add visual effects (shaders, lighting).
Stage 8 - Package and distribute games with tools like cx_Freeze or PyInstaller.
Result: Python Game Developer
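The physics fundamentals from Stage 2 can be practiced without any game library. Below is a minimal sketch of one common approach, semi-implicit Euler integration, applied to a bouncing ball. The Ball class, the units, and the 20% energy loss on each bounce are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

GRAVITY = 9.81  # downward acceleration; metres and seconds are assumed units


@dataclass
class Ball:
    y: float          # height above the ground
    vy: float = 0.0   # vertical velocity, positive is up


def step(ball: Ball, dt: float) -> None:
    """Advance the ball by one time step (semi-implicit Euler)."""
    ball.vy -= GRAVITY * dt        # update velocity first
    ball.y += ball.vy * dt         # then position, using the new velocity
    if ball.y < 0.0:               # very simple ground collision
        ball.y = 0.0
        ball.vy = -ball.vy * 0.8   # bounce back up, keeping 80% of the speed


ball = Ball(y=10.0)
for _ in range(60):                # one simulated second at 60 steps per second
    step(ball, 1 / 60)
print(round(ball.y, 2), round(ball.vy, 2))
```

In a Pygame prototype, a function like step would typically be called once per frame, with dt taken from the frame clock.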
Learn Data Science for FREE (No Strings Attached)
No fancy courses, no conditions, just pure learning.
Here's how to become a Data Scientist for FREE:
1. Python Programming for Data Science - Harvard's CS50P
The best intro to Python for absolute beginners:
- Covers loops, data structures, and practical exercises.
- Designed to help you build foundational coding skills.
Link: https://cs50.harvard.edu/python/
https://t.iss.one/datasciencefun
2. Statistics & Probability - Khan Academy
Want to master probability, distributions, and hypothesis testing? This is where to start:
- Clear, beginner-friendly videos.
- Exercises to test your skills.
Link: https://www.khanacademy.org/math/statistics-probability
https://whatsapp.com/channel/0029Vat3Dc4KAwEcfFbNnZ3O
3. Linear Algebra for Data Science - 3Blue1Brown
- Learn about matrices, vectors, and transformations.
- Essential for machine learning models.
Link: https://www.youtube.com/playlist?list=PLZHQObOWTQDMsr9KzVk3AjplI5PYPxkUr
4. SQL Basics - Mode Analytics
SQL is the backbone of data manipulation. This tutorial covers:
- Writing queries, joins, and filtering data.
- Real-world datasets to practice.
Link: https://mode.com/sql-tutorial
https://whatsapp.com/channel/0029VanC5rODzgT6TiTGoa1v
5. Data Visualization - freeCodeCamp
Learn to create stunning visualizations using Python libraries:
- Covers Matplotlib, Seaborn, and Plotly.
- Step-by-step projects included.
Link: https://www.youtube.com/watch?v=JLzTJhC2DZg
https://whatsapp.com/channel/0029VaxaFzoEQIaujB31SO34
6. Machine Learning Basics - Google's Machine Learning Crash Course
An in-depth introduction to machine learning for beginners:
- Learn supervised and unsupervised learning.
- Hands-on coding with TensorFlow.
Link: https://developers.google.com/machine-learning/crash-course
7. Deep Learning - Fast.ai's Free Course
Fast.ai makes deep learning easy and accessible:
- Build neural networks with PyTorch.
- Learn by coding real projects.
Link: https://course.fast.ai/
8. Data Science Projects - Kaggle
- Compete in challenges to practice your skills.
- Great way to build your portfolio.
Link: https://www.kaggle.com/
Some useful PYTHON libraries for data science
NumPy stands for Numerical Python. Its most powerful feature is the n-dimensional array. The library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities, and tools for integration with lower-level languages like Fortran, C, and C++.
SciPy stands for Scientific Python and is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules, such as the discrete Fourier transform, linear algebra, optimization, and sparse matrices.
Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat maps. In a Jupyter (formerly IPython) notebook, the %matplotlib inline magic displays plots inline; the older ipython notebook --pylab=inline option did the same and, without the inline option, made the IPython environment behave much like Matlab. You can also use LaTeX commands to add math to your plot.
Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to the Python ecosystem and has been instrumental in boosting Python's usage in the data science community.
scikit-learn for machine learning. Built on NumPy, SciPy, and Matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.
Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator.
Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.
Bokeh for creating interactive plots, dashboards, and data applications in modern web browsers. It empowers the user to generate elegant and concise graphics in the style of D3.js. Moreover, it has the capability of high-performance interactivity over very large or streaming datasets.
Blaze for extending the capability of NumPy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.
Scrapy for web crawling. It is a very useful framework for getting specific patterns of data. It can start at a website's home URL and then dig through the pages within the site to gather information.
SymPy for symbolic computation. It has wide-ranging capabilities from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the capability of formatting the result of the computations as LaTeX code.
Requests for accessing the web. It works similarly to the standard library's urllib.request (urllib2 in Python 2) but is much easier to code with. You will find subtle differences between the two, but for beginners Requests might be more convenient.
Additional libraries you might need:
os for operating system and file operations
networkx and igraph for graph-based data manipulations
re (regular expressions) for finding patterns in text data
BeautifulSoup for scraping the web. Unlike Scrapy, it is a parser rather than a crawler: it extracts information from a single page at a time and leaves fetching pages and following links to you.
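As one concrete case from the list above, here is a minimal sketch of finding patterns in text with the standard-library re module. The sample text and the deliberately simple email pattern are made up for illustration; real email validation is more involved.

```python
import re

# Hypothetical text to search
text = "Contact: ann@example.com, bob@example.org; phone 555-0100."

# Simplified pattern: name characters, an @ sign, then a dotted domain
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

emails = EMAIL.findall(text)
print(emails)   # ['ann@example.com', 'bob@example.org']
```

Compiling the pattern once with re.compile is a convenience when the same pattern is reused across many strings.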