Data Science Machine Learning Data Analysis
38.9K subscribers
3.69K photos
31 videos
39 files
1.28K links
ads: @HusseinSheikho

This channel is for Programmers, Coders, Software Engineers.

1- Data Science
2- Machine Learning
3- Data Visualization
4- Artificial Intelligence
5- Data Analysis
6- Statistics
7- Deep Learning
Download Telegram
💡 Pandas Cheatsheet

A quick guide to essential Pandas operations for data manipulation, focusing on creating, selecting, filtering, and grouping data in a DataFrame.

1. Creating a DataFrame
The primary data structure in Pandas is the DataFrame. It's often created from a dictionary.
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 32, 28],
'City': ['New York', 'Paris', 'New York']}
df = pd.DataFrame(data)

print(df)
# Name Age City
# 0 Alice 25 New York
# 1 Bob 32 Paris
# 2 Charlie 28 New York

• A dictionary is defined where keys become column names and values become the data in those columns. pd.DataFrame() converts it into a tabular structure.

2. Selecting Data with .loc and .iloc
Use .loc for label-based selection and .iloc for integer-position based selection.
# Select the first row by its integer position (0)
print(df.iloc[0])

# Select the row with index label 1 and only the 'Name' column
print(df.loc[1, 'Name'])

# Output for df.iloc[0]:
# Name Alice
# Age 25
# City New York
# Name: 0, dtype: object
#
# Output for df.loc[1, 'Name']:
# Bob

.iloc[0] gets all data from the row at index position 0.
.loc[1, 'Name'] gets the data at the intersection of index label 1 and column label 'Name'.

3. Filtering Data
Select subsets of data based on conditions.
# Select rows where Age is greater than 27
filtered_df = df[df['Age'] > 27]
print(filtered_df)
# Name Age City
# 1 Bob 32 Paris
# 2 Charlie 28 New York

• The expression df['Age'] > 27 creates a boolean Series (True/False).
• Using this Series as an index df[...] returns only the rows where the value was True.

4. Grouping and Aggregating
The "group by" operation involves splitting data into groups, applying a function, and combining the results.
# Group by 'City' and calculate the mean age for each city
city_ages = df.groupby('City')['Age'].mean()
print(city_ages)
# City
# New York 26.5
# Paris 32.0
# Name: Age, dtype: float64

.groupby('City') splits the DataFrame into groups based on unique city values.
['Age'].mean() then calculates the mean of the 'Age' column for each of these groups.

#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM
1👍1
• Group data by a column.
df.groupby('col1')

• Group by a column and get the sum.
df.groupby('col1').sum()

• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.
df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()

• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])


VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2]) # Stacks rows

• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')


VII. Input & Output

• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')

• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)


VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.
s.str.lower()
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.
s.dt.year
s.dt.day_name()

• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')


#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM
6👍1🔥1