Data Science Machine Learning Data Analysis

💡 Pandas Cheatsheet

A quick guide to essential Pandas operations for data manipulation, focusing on creating, selecting, filtering, and grouping data in a DataFrame.

1. Creating a DataFrame
The primary data structure in Pandas is the DataFrame. It's often created from a dictionary.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 32, 28],
        'City': ['New York', 'Paris', 'New York']}
df = pd.DataFrame(data)

print(df)
#       Name  Age       City
# 0    Alice   25   New York
# 1      Bob   32      Paris
# 2  Charlie   28   New York

• A dictionary is defined where keys become column names and values become the data in those columns. pd.DataFrame() converts it into a tabular structure.

2. Selecting Data with .loc and .iloc
Use .loc for label-based selection and .iloc for integer-position based selection.

# Select the first row by its integer position (0)
print(df.iloc[0])

# Select the row with index label 1 and only the 'Name' column
print(df.loc[1, 'Name'])

# Output for df.iloc[0]:
# Name       Alice
# Age           25
# City    New York
# Name: 0, dtype: object
#
# Output for df.loc[1, 'Name']:
# Bob

• .iloc[0] gets all data from the row at index position 0.
• .loc[1, 'Name'] gets the data at the intersection of index label 1 and column label 'Name'.

3. Filtering Data
Select subsets of data based on conditions.

# Select rows where Age is greater than 27
filtered_df = df[df['Age'] > 27]
print(filtered_df)
#       Name  Age       City
# 1      Bob   32      Paris
# 2  Charlie   28   New York

• The expression df['Age'] > 27 creates a boolean Series (True/False).
• Using this Series as an index df[...] returns only the rows where the value was True.

4. Grouping and Aggregating
The "group by" operation involves splitting data into groups, applying a function, and combining the results.

# Group by 'City' and calculate the mean age for each city
city_ages = df.groupby('City')['Age'].mean()
print(city_ages)
# City
# New York    26.5
# Paris       32.0
# Name: Age, dtype: float64

• .groupby('City') splits the DataFrame into groups based on unique city values.
• ['Age'].mean() then calculates the mean of the 'Age' column for each of these groups.

#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM ✨

❤1👍1

568 views05:00

Data Science Machine Learning Data Analysis

• Group data by a column.

df.groupby('col1')

• Group by a column and get the sum.

df.groupby('col1').sum()

• Apply multiple aggregation functions at once.

df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.

df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.

df['col1'].value_counts()

• Create a pivot table.

pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])

VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).

pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.

pd.concat([df1, df2]) # Stacks rows

• Join DataFrames on their indexes.

left_df.join(right_df, how='outer')

VII. Input & Output

• Write a DataFrame to a CSV file.

df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.

df.to_excel('output.xlsx', sheet_name='Sheet1')

• Read data from an Excel file.

pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.

pd.read_sql_query('SELECT * FROM my_table', connection_object)

VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.

s.str.lower()
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.

s.dt.year
s.dt.day_name()

• Create a rolling window calculation.

df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.

df['col1'].plot(kind='hist')

#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM ✨

❤6👍1🔥1

690 views03:07

About

Blog

Apps

Platform