What are the two powerful accessors in Pandas for handling strings and datetime operations?

Pandas provides the .str accessor for vectorized string operations and the .dt accessor for vectorized datetime operations.

What is the purpose of vectorized operations in Pandas?

Vectorized operations in Pandas apply one operation to many values at once, allowing for efficient processing of entire columns without breaking on missing values.

How can you convert a string column to title case using Pandas?

You can convert a string column to title case using the .str.title method, which can be chained with other methods like .str.strip to clean and format the text.

What is the function of the errors='coerce' parameter in Pandas datetime operations?

The errors='coerce' parameter in Pandas datetime operations is used to handle invalid date values by converting them to NaT (Not a Time), ensuring the operation does not fail.

How does Pandas handle missing values in string operations?

Pandas handles missing values safely in string operations by preserving them when applying methods through the .str accessor, avoiding errors that occur with traditional Python string methods.

Pandas Vectorized String & DateTime Operations

Pandas Vectorized String Operations and DateTime: Clean Text and Analyze Time

Real datasets are rarely clean.

You may receive columns like:

names with extra spaces
emails with mixed case
phone numbers with symbols
plan codes packed into one string
notes with inconsistent keywords
dates stored as plain text
event timestamps with invalid values

Pandas gives you two powerful accessors for this kind of work:

.str for vectorized string operations
.dt for vectorized datetime operations

This guide uses original sample datasets created for this lesson. It does not use copied passenger, finance, ecommerce, or public dataset examples.

Files used in this lesson:

pandas_string_profiles.csv
pandas_datetime_events.csv

Place both CSV files in the same folder as this Markdown file before running the examples.

What You Will Learn

By the end, you should be able to use:

vectorized operations
Series.str.lower(), upper(), title(), and capitalize()
Series.str.strip(), lstrip(), and rstrip()
Series.str.len()
Series.str.split()
Series.str.get()
Series.str.extract()
Series.str.replace()
Series.str.contains()
Series.str.startswith() and endswith()
Series.str.isdigit(), isalpha(), and isalnum()
string slicing with .str[]
pd.Timestamp
pd.DatetimeIndex
pd.date_range()
pd.to_datetime()
errors="coerce"
.dt.year, .dt.month, .dt.month_name(), .dt.day_name(), .dt.hour
date filtering
pd.to_timedelta()
resampling time series data

1. Setup

python

import pandas as pd
import numpy as np

Load the datasets:

python

profiles = pd.read_csv("pandas_string_profiles.csv")
events = pd.read_csv("pandas_datetime_events.csv")

Inspect them:

python

profiles.head()

python

events.head()

2. What Vectorized Operations Mean

In NumPy, vectorized operations apply one operation to many values at once.

python

arr = np.array([10, 20, 30, 40])
arr * 2

Pandas brings the same idea to columns.

Without vectorized string methods, this kind of code can break:

python

names = ["  diya rao  ", "KABIR SEN", None, "meera iyer"]

[name.strip().title() for name in names]

The None value causes a problem.

Pandas handles missing values more safely:

python

s = pd.Series(names)
s.str.strip().str.title()

The .str accessor applies string methods across the Series while preserving missing values.

3. Clean Text With Case Methods

Start with raw profile names:

python

profiles["raw_name"].head()

Convert to lowercase:

python

profiles["raw_name"].str.lower()

Convert to uppercase:

python

profiles["raw_name"].str.upper()

Convert to title case:

python

profiles["raw_name"].str.title()

Clean the name with strip plus title:

python

profiles["clean_name"] = profiles["raw_name"].str.strip().str.title()
profiles[["raw_name", "clean_name"]].head()

String methods can be chained when each step returns another Series.

4. Strip Extra Spaces

Extra spaces are common in CSV files.

python

profiles["raw_name"].str.strip()

Remove only left-side spaces:

python

profiles["raw_name"].str.lstrip()

Remove only right-side spaces:

python

profiles["raw_name"].str.rstrip()

Clean city names:

python

profiles["city_clean"] = profiles["city"].str.strip().str.title()
profiles[["city", "city_clean"]].head()

This is a simple but high-value cleaning step.

5. String Length

Find name lengths:

python

profiles["clean_name"].str.len()

Add a length column:

python

profiles["name_length"] = profiles["clean_name"].str.len()

Find the longest name:

python

profiles.loc[
    profiles["name_length"] == profiles["name_length"].max(),
    ["profile_id", "clean_name", "name_length"]
]

Use .str.len() for:

validating IDs
detecting suspiciously short names
finding long text fields
checking code formats

6. Split Strings Into Parts

Clean the plan code:

python

profiles["plan_code"].head()

Each plan code follows this pattern:

text

TRACK-TIER-YEAR

Split it:

python

plan_parts = profiles["plan_code"].str.split("-", expand=True)
plan_parts.head()

Name the columns:

python

profiles[["track_code", "tier_code", "plan_year"]] = plan_parts
profiles[["plan_code", "track_code", "tier_code", "plan_year"]].head()

Convert year to number:

python

profiles["plan_year"] = profiles["plan_year"].astype(int)

Get only the first piece without expanding:

python

profiles["plan_code"].str.split("-").str.get(0)

.str.get(0) gets the first item from each split list.

7. Extract With Regex

Splitting works when the delimiter is simple.

Regex extraction is better when you need named pieces.

Extract an email domain:

python

profiles["email_clean"] = profiles["email"].str.strip().str.lower()

profiles["email_domain"] = profiles["email_clean"].str.extract(
    r"@(?P<domain>[^@\s]+)$"
)

profiles[["email", "email_clean", "email_domain"]].head()

Extract name title, first name, and last name:

python

name_parts = profiles["clean_name"].str.extract(
    r"^(?:(?P<title>Mr|Ms|Dr|Prof)\s+)?(?P<first_name>[A-Za-z]+)\s+(?P<last_name>[A-Za-z]+)$"
)

name_parts.head()

Join the extracted columns back:

python

profiles = pd.concat([profiles, name_parts], axis=1)
profiles[["clean_name", "title", "first_name", "last_name"]].head()

Use .str.extract() when you want a clean DataFrame of pattern groups.

8. Replace Text

Standardize plan tiers:

python

profiles["tier_code"].value_counts()

python

profiles["tier_label"] = profiles["tier_code"].str.replace("PRO", "Premium", regex=False)
profiles["tier_label"] = profiles["tier_label"].str.replace("FREE", "Free", regex=False)
profiles["tier_label"] = profiles["tier_label"].str.replace("TRIAL", "Trial", regex=False)

Use regex=False when you want a literal replacement.

Clean phone numbers by keeping only digits:

python

profiles["phone_digits"] = profiles["phone"].str.replace(r"\D", "", regex=True)
profiles[["phone", "phone_digits"]].head()

Now check digit counts:

python

profiles["phone_digits"].str.len().value_counts(dropna=False)

9. Filter With Contains, Startswith, And Endswith

Find profiles related to Pandas:

python

profiles[profiles["plan_code"].str.contains("PANDAS", case=False, na=False)]

Find Gmail users:

python

profiles[profiles["email_clean"].str.endswith("@gmail.com", na=False)]

Find learners whose first name starts with A:

python

profiles[profiles["first_name"].str.startswith("A", na=False)]

Find notes that mention refund:

python

profiles[profiles["support_note"].str.contains("refund", case=False, na=False)]

Always think about missing values.

Use na=False when missing text should behave like "not matched".

10. Character Checks

Check whether a value contains only digits:

python

profiles["phone_digits"].str.isdigit()

Check whether first names are alphabetic:

python

profiles["first_name"].str.isalpha()

Check whether plan codes are alphanumeric after removing hyphens:

python

profiles["plan_code"].str.replace("-", "", regex=False).str.isalnum()

These checks are useful for validation.

11. String Slicing

Get the first three characters of the plan code:

python

profiles["plan_code"].str[:3]

Get the last four characters:

python

profiles["plan_code"].str[-4:]

Reverse cleaned names:

python

profiles["clean_name"].str[::-1].head()

String slicing with .str[] behaves like Python slicing, but across the whole Series.

12. Create Display Labels

Use .str.cat() to combine text columns.

python

profiles["display_label"] = (
    profiles["first_name"]
    .str.cat(profiles["last_name"], sep=" ")
    .str.cat(profiles["tier_label"], sep=" - ")
)

profiles[["profile_id", "display_label"]].head()

This is useful when building labels for reports, dropdowns, and dashboards.

13. Timestamp Objects

A Timestamp represents one point in time.

python

pd.Timestamp("2026-02-14")

Create a timestamp with time:

python

pd.Timestamp("2026-02-14 17:42:00")

Read timestamp attributes:

python

ts = pd.Timestamp("2026-02-14 17:42:00")

ts.year, ts.month, ts.day, ts.hour, ts.minute

Pandas timestamps work well with Series and DataFrames.

14. DatetimeIndex

A DatetimeIndex is a collection of timestamps.

python

date_index = pd.DatetimeIndex([
    "2026-01-01",
    "2026-02-01",
    "2026-03-01"
])

date_index

Use it as a Series index:

python

monthly_visits = pd.Series([1200, 1800, 2400], index=date_index)
monthly_visits

Now the index understands time.

15. Generate Dates With `date_range()`

Daily dates:

python

pd.date_range(start="2026-01-01", end="2026-01-07", freq="D")

Every 2 days:

python

pd.date_range(start="2026-01-01", end="2026-01-15", freq="2D")

Business days:

python

pd.date_range(start="2026-01-01", end="2026-01-15", freq="B")

Every 6 hours:

python

pd.date_range(start="2026-01-01", periods=8, freq="6h")

Month starts:

python

pd.date_range(start="2026-01-01", periods=6, freq="MS")

Use date_range() when you need a planned calendar, missing-date scaffold, or regular time index.

16. Convert Text To Datetime

The event dataset stores time as text.

python

events.info()

Convert it:

python

events["event_at"] = pd.to_datetime(events["event_time"], errors="coerce")

Check invalid timestamps:

python

events[events["event_at"].isna()]

errors="coerce" turns invalid dates into NaT.

NaT means "not a time", similar to missing datetime.

17. Use The `.dt` Accessor

Extract year:

python

events["event_at"].dt.year

Extract month number:

python

events["event_at"].dt.month

Extract month name:

python

events["event_at"].dt.month_name()

Extract day name:

python

events["event_at"].dt.day_name()

Extract hour:

python

events["event_at"].dt.hour

Add useful columns:

python

events["event_date"] = events["event_at"].dt.date
events["event_month"] = events["event_at"].dt.month_name()
events["event_day_name"] = events["event_at"].dt.day_name()
events["event_hour"] = events["event_at"].dt.hour

Now the event log is ready for time-based analysis.

18. Date Flags

Check month start and month end:

python

events["is_month_start"] = events["event_at"].dt.is_month_start
events["is_month_end"] = events["event_at"].dt.is_month_end

Check quarter start and quarter end:

python

events["is_quarter_start"] = events["event_at"].dt.is_quarter_start
events["is_quarter_end"] = events["event_at"].dt.is_quarter_end

View month-end events:

python

events[events["is_month_end"]]

These flags help in reporting and finance-style analysis.

19. Filter By Date

Filter events after February 1:

python

events[events["event_at"] >= pd.Timestamp("2026-02-01")]

Filter events between two dates:

python

events[
    events["event_at"].between(
        pd.Timestamp("2026-02-01"),
        pd.Timestamp("2026-02-28 23:59:59")
    )
]

Filter weekend events:

python

events[events["event_at"].dt.dayofweek >= 5]

In Pandas:

Monday is 0
Sunday is 6

20. Timedelta Basics

Convert duration seconds into timedeltas:

python

events["session_duration"] = pd.to_timedelta(events["duration_seconds"], unit="s")
events[["duration_seconds", "session_duration"]].head()

Convert duration to minutes:

python

events["duration_minutes"] = events["session_duration"].dt.total_seconds() / 60

Round it:

python

events["duration_minutes"] = events["duration_minutes"].round(2)
events[["event_id", "duration_minutes"]].head()

Timedeltas are useful for session length, response time, age of record, and retention analysis.

21. Recency Analysis

Choose a fixed analysis date:

python

analysis_date = pd.Timestamp("2026-03-15")

Find each profile's latest valid event:

python

latest_event = events.groupby("profile_id")["event_at"].max().reset_index()
latest_event = latest_event.rename(columns={"event_at": "last_event_at"})

Merge with profiles:

python

profile_recency = profiles.merge(latest_event, on="profile_id", how="left")
profile_recency["days_since_last_event"] = (
    analysis_date - profile_recency["last_event_at"]
).dt.days

profile_recency[["profile_id", "clean_name", "last_event_at", "days_since_last_event"]]

This is a common product analytics pattern.

22. Group By Date Parts

Events by day name:

python

events.groupby("event_day_name")["event_id"].count()

Events by hour:

python

events.groupby("event_hour")["event_id"].count().sort_index()

Revenue by month:

python

events.groupby("event_month")["amount"].sum()

Average duration by event type:

python

events.groupby("event_type")["duration_minutes"].mean().round(2)

The .dt accessor often prepares columns for normal groupby().

23. Resample Time Series Data

To use resample(), set a datetime column as the index.

python

events_by_time = events.dropna(subset=["event_at"]).set_index("event_at").sort_index()

Daily event count:

python

events_by_time["event_id"].resample("D").count()

Weekly amount:

python

events_by_time["amount"].resample("W").sum()

Monthly event type count:

python

events_by_time.groupby("event_type")["event_id"].resample("MS").count()

Use resample() when your rows are time-indexed and you want regular time buckets.

24. Mini Project 1: Clean Profile Table

Build a clean profile table with:

profile ID
clean name
first name
last name
clean email
email domain
clean city
track code
tier label
phone digits

Solution:

python

clean_profiles = profiles.copy()

clean_profiles["clean_name"] = clean_profiles["raw_name"].str.strip().str.title()
clean_profiles["email_clean"] = clean_profiles["email"].str.strip().str.lower()
clean_profiles["city_clean"] = clean_profiles["city"].str.strip().str.title()
clean_profiles["phone_digits"] = clean_profiles["phone"].str.replace(r"\D", "", regex=True)

clean_profiles[["track_code", "tier_code", "plan_year"]] = (
    clean_profiles["plan_code"].str.split("-", expand=True)
)

clean_profiles["tier_label"] = clean_profiles["tier_code"].replace({
    "PRO": "Premium",
    "FREE": "Free",
    "TRIAL": "Trial"
})

name_parts = clean_profiles["clean_name"].str.extract(
    r"^(?:(?P<title>Mr|Ms|Dr|Prof)\s+)?(?P<first_name>[A-Za-z]+)\s+(?P<last_name>[A-Za-z]+)$"
)

clean_profiles = pd.concat([clean_profiles, name_parts], axis=1)
clean_profiles["email_domain"] = clean_profiles["email_clean"].str.extract(r"@([^@\s]+)$")

clean_profiles[[
    "profile_id",
    "clean_name",
    "first_name",
    "last_name",
    "email_clean",
    "email_domain",
    "city_clean",
    "track_code",
    "tier_label",
    "phone_digits"
]]

25. Mini Project 2: Event Activity Report

Build a report with:

valid event timestamp
event month
event day name
event hour
duration minutes
revenue amount

Solution:

python

activity_report = events.copy()

activity_report["event_at"] = pd.to_datetime(activity_report["event_time"], errors="coerce")
activity_report = activity_report.dropna(subset=["event_at"])

activity_report["event_month"] = activity_report["event_at"].dt.month_name()
activity_report["event_day_name"] = activity_report["event_at"].dt.day_name()
activity_report["event_hour"] = activity_report["event_at"].dt.hour
activity_report["duration_minutes"] = (
    pd.to_timedelta(activity_report["duration_seconds"], unit="s").dt.total_seconds() / 60
).round(2)

activity_report[[
    "event_id",
    "profile_id",
    "event_type",
    "event_month",
    "event_day_name",
    "event_hour",
    "duration_minutes",
    "amount"
]].head()

26. Mini Project 3: Weekly Revenue And Engagement

Build a weekly table with:

event count
total amount
average session duration

Solution:

python

weekly_report = (
    activity_report
    .set_index("event_at")
    .resample("W")
    .agg(
        event_count=("event_id", "count"),
        total_amount=("amount", "sum"),
        average_duration=("duration_minutes", "mean")
    )
)

weekly_report["average_duration"] = weekly_report["average_duration"].round(2)
weekly_report

27. Common Mistakes

Mistake 1: Using Python string methods directly on a Series

This is wrong:

python

profiles["raw_name"].strip()

Explanation

Removes leading and trailing whitespace characters from the 'raw_name' column values
Returns a new string value without any surrounding spaces, tabs, or newline characters
Commonly used for data cleaning to standardize text formatting before processing
Preserves internal spaces within names while eliminating unwanted boundary whitespace
Typically applied as part of data preprocessing pipelines to ensure consistent data quality

Use .str:

python

profiles["raw_name"].str.strip()

Explanation

The code removes leading and trailing whitespace characters from the "raw_name" column in a pandas DataFrame
Uses the str.strip() method which eliminates empty spaces, tabs, and newline characters from both ends of strings
This operation is commonly used for data cleaning to standardize text formatting before analysis
The method returns a new Series with cleaned string values while preserving the original DataFrame structure
Useful for eliminating data inconsistencies caused by accidental spacing during data entry or import processes

Mistake 2: Forgetting missing values in filters

This may fail when values are missing:

python

profiles[profiles["support_note"].str.contains("refund")]

Explanation

This code filters a DataFrame called 'profiles' to return only rows where the 'support_note' column contains the substring "refund"
The str.contains() method performs a case-sensitive search for the specified pattern within each string value
The result is a new DataFrame containing only the matching rows that have "refund" anywhere in their support_note field
This filtering approach is commonly used for data analysis tasks like identifying customer service cases related to refunds
The operation preserves all columns from the original DataFrame while narrowing down to relevant records based on text content

Use:

python

profiles[profiles["support_note"].str.contains("refund", case=False, na=False)]

Explanation

Uses pandas boolean indexing to filter rows where the "support_note" column contains the word "refund" regardless of case sensitivity
The case=False parameter ensures the search is case-insensitive, matching "Refund", "REFUND", or "refund"
The na=False parameter handles missing values by treating them as False rather than raising an error
This operation returns a filtered DataFrame containing only profiles with refund-related support notes
Commonly used for customer service analysis to identify cases requiring refund processing

Mistake 3: Forgetting `regex=False` for literal replacement

When replacing exact text, use:

python

profiles["tier_code"].str.replace("PRO", "Premium", regex=False)

Explanation

This code performs a string replacement on the 'tier_code' column in the profiles DataFrame
It substitutes all occurrences of "PRO" with "Premium" without using regular expressions
The regex=False parameter ensures literal string matching rather than pattern matching
This transformation standardizes tier naming conventions by converting abbreviated codes to full labels
The operation returns a new Series with the modified string values while preserving the original DataFrame structure

Mistake 4: Using `.dt` before converting to datetime

This fails if the column is still text:

python

events["event_time"].dt.year

Explanation

Accesses the year attribute of datetime data using pandas' dt accessor for efficient time-based data extraction
Enables temporal analysis by isolating specific year values from full datetime timestamps
Returns a Series containing only the year integers, useful for grouping, filtering, or aggregating time-series data
Commonly used in data preprocessing workflows for time-based feature engineering and chronological analysis

Convert first:

python

events["event_at"] = pd.to_datetime(events["event_time"], errors="coerce")
events["event_at"].dt.year

Explanation

The first line converts the 'event_time' column values to datetime objects using pd.to_datetime() with error handling that coerces invalid parsing attempts into NaT (Not a Time) values
The second line accesses the year component from the converted datetime objects using the .dt accessor and .year attribute
This approach handles potential data quality issues by gracefully managing malformed date strings
The operation returns a pandas Series containing only the extracted year integers from valid datetime entries
This pattern is commonly used for time-based data analysis and grouping operations

Mistake 5: Ignoring invalid dates

Use:

python

pd.to_datetime(events["event_time"], errors="coerce")

Explanation

Converts the event_time column from string or other formats to proper datetime objects using pandas to_datetime function
The errors="coerce" parameter handles invalid date values by converting them to NaT (Not a Time) instead of raising an exception
This approach ensures data cleaning and consistency when working with temporal data in pandas DataFrames
Useful for time-series analysis and chronological sorting of event data
Prevents data type conversion errors that would occur with malformed timestamp entries

Then inspect NaT rows.

28. Practice Questions

Try these before looking at the solutions.

Practice Task

Q1. Create a clean lowercase email column.

python

profiles["email_clean"] = profiles["email"].str.strip().str.lower()
profiles[["email", "email_clean"]].head()

Explanation

The code creates a new column "email_clean" by applying string methods to the existing "email" column
First .str.strip() removes leading and trailing whitespace from each email address
Then .str.lower() converts all characters to lowercase for consistent formatting
The final .head() displays the original and cleaned email columns side-by-side for verification
This approach ensures standardized email data format for better data processing and matching

Practice Task

Q2. Extract the domain from each email.

python

profiles["email_domain"] = profiles["email_clean"].str.extract(r"@([^@\s]+)$")
profiles["email_domain"].value_counts(dropna=False)

Explanation

The first line uses pandas string methods to extract the domain portion of email addresses by capturing text after the @ symbol until the end of the string
The regex pattern r"@([^@\s]+)$" captures one or more characters that are not @ or whitespace, starting after @ and ending at the string end
The second line displays frequency counts of all unique email domains including null values to show distribution across different email providers
This approach efficiently categorizes users by their email service providers for analytics and segmentation purposes
The extracted domains can be used for targeted marketing campaigns or identifying user demographics based on email provider trends

Practice Task

Q3. Find all profiles whose support note mentions invoice.

python

profiles[profiles["support_note"].str.contains("invoice", case=False, na=False)]

Explanation

Uses pandas DataFrame boolean indexing to filter rows where the "support_note" column contains the substring "invoice"
The str.contains() method performs case-insensitive matching with case=False parameter
Handles missing values gracefully with na=False parameter, treating NaN values as False rather than raising errors
Returns a new DataFrame containing only rows that match the filtering criteria
Commonly used for data cleaning and analysis tasks involving text-based filtering

Practice Task

Q4. Split plan code into track, tier, and year.

python

profiles[["track_code", "tier_code", "plan_year"]] = profiles["plan_code"].str.split("-", expand=True)
profiles[["plan_code", "track_code", "tier_code", "plan_year"]].head()

Explanation

The code uses pandas str.split() method to decompose the plan_code column into four separate columns using the hyphen delimiter
The expand=True parameter ensures the split results are returned as separate columns rather than a single Series of lists
The assignment creates three new columns (track_code, tier_code, plan_year) by splitting the original plan_code values at each hyphen character
The final line displays the first few rows of all four columns to verify the successful transformation and data extraction

Practice Task

Q5. Convert event time into datetime and find invalid rows.

python

events["event_at"] = pd.to_datetime(events["event_time"], errors="coerce")
events[events["event_at"].isna()]

Explanation

Converts the event_time column to datetime format using pd.to_datetime() with error handling that coerces invalid parsing attempts to NaT (Not a Time) values
The errors="coerce" parameter ensures that any malformed date strings are converted to null values instead of raising exceptions
Filters and displays only the rows where the conversion resulted in null values, helping identify problematic timestamp entries
This approach is useful for data cleaning and validation when working with potentially inconsistent datetime data from external sources

Practice Task

Q6. Count events by day name.

python

events["day_name"] = events["event_at"].dt.day_name()
events.groupby("day_name")["event_id"].count()

Explanation

Extracts the day name from a datetime column using the dt accessor and assigns it as a new column called "day_name"
Groups the dataframe by the newly created "day_name" column to aggregate events by weekday
Counts the number of unique event IDs for each day to determine event frequency distribution
This pattern enables time-based analysis to identify peak activity days in event data

Practice Task

Q7. Find events that happened after 6 PM.

python

events[events["event_at"].dt.hour >= 18]

Explanation

Filters a DataFrame named events to include only rows where the event occurred after 6 PM (hour >= 18)
Uses the dt accessor to extract hour component from the datetime column "event_at"
The comparison operator >= selects events from 6 PM onwards including the exact 6 PM timestamp
This operation returns a new DataFrame containing only the filtered evening/night events
Commonly used for time-based data analysis to isolate specific time periods of interest

Practice Task

Q8. Find total amount by month.

python

events["month"] = events["event_at"].dt.month_name()
events.groupby("month")["amount"].sum()

Explanation

Extracts the full month name from a datetime column named "event_at" and stores it in a new "month" column
Groups all events by the newly created month column to aggregate data by time periods
Calculates the sum of monetary amounts for each month to show total spending or revenue per month
This pattern enables easy monthly trend analysis and reporting of financial data

Practice Task

Q9. Convert duration seconds into minutes.

python

events["duration_minutes"] = (
    pd.to_timedelta(events["duration_seconds"], unit="s").dt.total_seconds() / 60
).round(2)

events[["event_id", "duration_seconds", "duration_minutes"]].head()

Explanation

Converts the duration_seconds column to timedelta format using pd.to_timedelta with seconds unit
Extracts total seconds from the timedelta object and divides by 60 to convert to minutes
Rounds the resulting minutes value to 2 decimal places for precision
Displays the first few rows showing event_id, original duration in seconds, and converted duration in minutes

Practice Task

Q10. Create daily event counts with resample.

python

events.dropna(subset=["event_at"]).set_index("event_at")["event_id"].resample("D").count()

Explanation

The code removes rows with missing values in the "event_at" column using dropna() to ensure valid timestamps
It sets the "event_at" column as the DataFrame index to enable time-based operations and grouping
The event_id column is selected and resampled on a daily frequency ("D") to aggregate events by date
The count() method aggregates the number of events per day by counting non-null values in each time period
This creates a time series showing daily event counts for analysis and visualization purposes

Need	Use
Clean text values in a Series	`.str`
Remove spaces	`.str.strip()`
Standardize case	`.str.lower()`, `.str.title()`
Split codes into columns	`.str.split(..., expand=True)`
Extract regex groups	`.str.extract()`
Filter text by pattern	`.str.contains()`
Convert text to datetime	`pd.to_datetime()`
Generate regular dates	`pd.date_range()`
Read parts of datetime	`.dt.year`, `.dt.month_name()`, `.dt.day_name()`
Convert seconds to duration	`pd.to_timedelta()`
Group time-indexed data	`.resample()`

If you remember only one thing, remember this:

Use .str before text analysis and .dt before time analysis. Clean text and parsed time are the foundation of reliable Pandas workflows.

Official References

Pandas working with text data user guide: https://pandas.pydata.org/docs/user_guide/text.html
Pandas time series user guide: https://pandas.pydata.org/docs/user_guide/timeseries.html
Pandas Series.str API reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.html
Pandas to_datetime API reference: https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
Pandas date_range API reference: https://pandas.pydata.org/docs/reference/api/pandas.date_range.html

Pandas Vectorized String & DateTime Operations Guide

AI Insights

Q1. Create a clean lowercase email column.

Q2. Extract the domain from each email.

Q3. Find all profiles whose support note mentions invoice.

Q4. Split plan code into track, tier, and year.

Q5. Convert event time into datetime and find invalid rows.

Q6. Count events by day name.

Q7. Find events that happened after 6 PM.

Q8. Find total amount by month.

Q9. Convert duration seconds into minutes.

Q10. Create daily event counts with resample.