💡 Pandas Cheatsheet
A quick guide to essential Pandas operations for data manipulation, focusing on creating, selecting, filtering, and grouping data in a DataFrame.
1. Creating a DataFrame
The primary data structure in Pandas is the DataFrame. It's often created from a dictionary.
• A dictionary is defined where keys become column names and values become the data in those columns.
2. Selecting Data with
Use
•
•
3. Filtering Data
Select subsets of data based on conditions.
• The expression
• Using this Series as an index
4. Grouping and Aggregating
The "group by" operation involves splitting data into groups, applying a function, and combining the results.
•
•
#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
A quick guide to essential Pandas operations for data manipulation, focusing on creating, selecting, filtering, and grouping data in a DataFrame.
1. Creating a DataFrame
The primary data structure in Pandas is the DataFrame. It's often created from a dictionary.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 32, 28],
'City': ['New York', 'Paris', 'New York']}
df = pd.DataFrame(data)
print(df)
# Name Age City
# 0 Alice 25 New York
# 1 Bob 32 Paris
# 2 Charlie 28 New York
• A dictionary is defined where keys become column names and values become the data in those columns.
pd.DataFrame() converts it into a tabular structure.2. Selecting Data with
.loc and .ilocUse
.loc for label-based selection and .iloc for integer-position based selection.# Select the first row by its integer position (0)
print(df.iloc[0])
# Select the row with index label 1 and only the 'Name' column
print(df.loc[1, 'Name'])
# Output for df.iloc[0]:
# Name Alice
# Age 25
# City New York
# Name: 0, dtype: object
#
# Output for df.loc[1, 'Name']:
# Bob
•
.iloc[0] gets all data from the row at index position 0.•
.loc[1, 'Name'] gets the data at the intersection of index label 1 and column label 'Name'.3. Filtering Data
Select subsets of data based on conditions.
# Select rows where Age is greater than 27
filtered_df = df[df['Age'] > 27]
print(filtered_df)
# Name Age City
# 1 Bob 32 Paris
# 2 Charlie 28 New York
• The expression
df['Age'] > 27 creates a boolean Series (True/False).• Using this Series as an index
df[...] returns only the rows where the value was True.4. Grouping and Aggregating
The "group by" operation involves splitting data into groups, applying a function, and combining the results.
# Group by 'City' and calculate the mean age for each city
city_ages = df.groupby('City')['Age'].mean()
print(city_ages)
# City
# New York 26.5
# Paris 32.0
# Name: Age, dtype: float64
•
.groupby('City') splits the DataFrame into groups based on unique city values.•
['Age'].mean() then calculates the mean of the 'Age' column for each of these groups.#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
❤1👍1
• Group data by a column.
• Group by a column and get the sum.
• Apply multiple aggregation functions at once.
• Get the size of each group.
• Get the frequency counts of unique values in a Series.
• Create a pivot table.
VI. Merging, Joining & Concatenating
• Merge two DataFrames (like a SQL join).
• Concatenate (stack) DataFrames along an axis.
• Join DataFrames on their indexes.
VII. Input & Output
• Write a DataFrame to a CSV file.
• Write a DataFrame to an Excel file.
• Read data from an Excel file.
• Read from a SQL database.
VIII. Time Series & Special Operations
• Use the string accessor (
• Use the datetime accessor (
• Create a rolling window calculation.
• Create a basic plot from a Series or DataFrame.
#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
df.groupby('col1')• Group by a column and get the sum.
df.groupby('col1').sum()• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])• Get the size of each group.
df.groupby('col1').size()• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()
• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
VI. Merging, Joining & Concatenating
• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')
• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2]) # Stacks rows
• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')
VII. Input & Output
• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)VIII. Time Series & Special Operations
• Use the string accessor (
.str) for Series operations.s.str.lower()
s.str.contains('pattern')
• Use the datetime accessor (
.dt) for Series operations.s.dt.year
s.dt.day_name()
• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()
• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')
#Python #Pandas #DataAnalysis #DataScience #Programming
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
❤6👍1🔥1