Learn Python this summer Day 16: Data Analysis with Pandas

Welcome back! Yesterday, we learned about working with APIs. Today, we’ll dive into data analysis with pandas, a powerful library for data manipulation and analysis. By the end of today’s lesson, you’ll know how to load, inspect, clean, and summarize data with pandas. Let’s get started!

What is Pandas?

Pandas is an open-source data analysis and manipulation library for Python. It provides two main data structures, the one-dimensional Series and the two-dimensional DataFrame, along with functions for loading, cleaning, filtering, and summarizing structured data.

Installing Pandas

If you haven’t installed pandas yet, you can do so using pip:

pip install pandas

Importing Pandas

To start using pandas, import it in your Python script:

import pandas as pd

Creating DataFrames

DataFrames are the primary data structure in pandas. They are similar to Excel spreadsheets or SQL tables.

Example:

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)

print(df)
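
Once a DataFrame exists, it helps to check its size and column types before doing anything else. The sketch below reuses the sample data from above, this time built from a list of row dictionaries, which is another common way to create the same table. The variable name rows is only for illustration.

```python
import pandas as pd

# Same sample data as above, built from a list of row dictionaries
rows = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35, "City": "Chicago"},
]
df = pd.DataFrame(rows)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names
print(df.dtypes)         # data type of each column
```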

Reading Data from Files

You can read data from various file formats, such as CSV, Excel, and JSON, into a DataFrame using functions such as read_csv(), read_excel(), and read_json().

Example:

import pandas as pd

# Reading data from a CSV file
df = pd.read_csv("data.csv")

print(df.head())  # Print the first 5 rows of the DataFrame
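
If you don’t have a data file handy, you can try the reader functions on text held in memory. This is a minimal sketch, assuming made-up CSV and JSON content; io.StringIO wraps a string so pandas can treat it like a file. Reading Excel files works similarly with read_excel(), though it typically needs an extra package such as openpyxl installed.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Name,Age\nAlice,25\nBob,30\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical JSON content: a list of records, one per row
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```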

Basic DataFrame Operations

You can perform various operations on DataFrames, such as selecting columns, filtering rows, and summarizing data.

Example:

import pandas as pd

# Creating a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)

# Selecting a column
print(df["Name"])

# Filtering rows: keep only the people older than 30
print(df[df["Age"] > 30])

# Summarizing data: count, mean, min, max, and more for numeric columns
print(df.describe())
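
A few more everyday operations follow the same pattern. The sketch below, using the same sample data, selects several columns at once, combines two filter conditions, and sorts the rows. The variable names older_not_chicago and by_age are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Selecting several columns at once (note the double brackets)
print(df[["Name", "City"]])

# Combining conditions: wrap each one in parentheses and join with & (and) or | (or)
older_not_chicago = df[(df["Age"] >= 30) & (df["City"] != "Chicago")]
print(older_not_chicago)

# Sorting rows by a column, largest value first
by_age = df.sort_values("Age", ascending=False)
print(by_age)
```

Python’s plain and / or keywords do not work on whole columns, which is why the filter uses & and | instead.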

Data Cleaning

Pandas provides functions to handle missing data and clean your dataset.

Example:

import pandas as pd

# Creating a DataFrame with missing values
data = {
    "Name": ["Alice", "Bob", None],
    "Age": [25, None, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}
df = pd.DataFrame(data)

# Handling missing values: replace missing names with "Unknown" and missing ages with 0
df = df.fillna({"Name": "Unknown", "Age": 0})

print(df)
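
Filling a missing age with 0 is simple, but it can distort later calculations such as averages. Here is a sketch of some alternatives on the same data: counting the missing values first, dropping incomplete rows, or filling with the mean of the known ages. Which option fits best depends on your dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [25, None, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Count the missing values in each column
missing = df.isna().sum()
print(missing)

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()
print(dropped)

# Option 2: fill missing ages with the mean of the known ages
filled = df.fillna({"Age": df["Age"].mean()})
print(filled)
```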

Grouping and Aggregating Data

You can group data and perform aggregate functions using groupby().

Example:

import pandas as pd

# Creating a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Age": [25, 30, 35, 28, 32],
    "City": ["New York", "Los Angeles", "Chicago", "New York", "Los Angeles"]
}
df = pd.DataFrame(data)

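# Grouping by name and averaging the numeric "Age" column
# (calling mean() on all columns fails in recent pandas versions because "City" is text)
grouped = df.groupby("Name")["Age"].mean()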

print(grouped)
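
groupby() can also compute several statistics in one step with agg(). A minimal sketch on the same names and ages; the variable name summary is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Age": [25, 30, 35, 28, 32],
})

# One row per name, one column per statistic
summary = df.groupby("Name")["Age"].agg(["mean", "min", "max", "count"])
print(summary)
```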

Practice Time!

Let’s put what we’ve learned into practice. Write a Python program that reads data from a CSV file, performs basic data analysis, and prints the results.

Example: Analyzing a CSV file with information about students. The code assumes that students.csv has Name, Class, and Grade columns.

import pandas as pd

# Reading data from a CSV file
df = pd.read_csv("students.csv")

# Print the first 5 rows of the DataFrame
print(df.head())

# Selecting a column
print(df["Name"])

# Filtering rows
print(df[df["Grade"] > 85])

# Summarizing data
print(df.describe())

# Grouping by class and averaging the numeric columns
# (numeric_only=True skips text columns such as "Name")
grouped = df.groupby("Class").mean(numeric_only=True)
print(grouped)
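
If you don’t have a students.csv file yet, you can generate one to practice with. This is a sketch with made-up data: the student names, classes, and grades are hypothetical, and the column names are assumed to match the ones used in the example above.

```python
import pandas as pd

# Hypothetical sample data; the names, classes, and grades are made up
students = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Class": ["A", "A", "B", "B"],
    "Grade": [92, 78, 88, 81],
})

# Write the sample to students.csv without the row index
students.to_csv("students.csv", index=False)

# Read it back and compute the average grade per class
df = pd.read_csv("students.csv")
class_means = df.groupby("Class")["Grade"].mean()
print(class_means)
```

After running this once, the practice program can read the generated file from the same folder.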

Conclusion

Great job today! You’ve learned how to use pandas for data analysis, which is a powerful tool for handling and analyzing data efficiently. Tomorrow, we’ll dive into data visualization with matplotlib and learn how to create visualizations to better understand our data. Keep practicing and having fun coding!
