Matplotlib Guide: Line, Scatter, Bar, and More Charts

Jun 9, 2026
45 min read

Quick Answer

Learn Matplotlib from scratch with original examples: line plots, scatter plots, bar charts, grouped and stacked bars, histograms, pie charts, styles, subplots, annotations, and saving figures.

Quick Summary

Learn to create clear visualizations with Matplotlib. Explore line, scatter, bar, histogram, and pie charts using small synthetic datasets.

Matplotlib Plotting: Line, Scatter, Bar, Histogram, Pie, Styles, and Saving Figures

Matplotlib is one of the most important Python libraries for data visualization.

You use it when you want to turn numbers into pictures:

  • how a metric changes over time
  • whether two numeric columns move together
  • which category contributes the most
  • how values are distributed
  • whether an outlier is hiding inside the data
  • how multiple products, countries, learners, or experiments compare

This guide teaches Matplotlib with original examples and small synthetic CSV files.

You will not use copied sports, course, or public datasets. The examples here use fresh data about air-quality monitoring and product sales so you can practice the same plotting ideas safely.

Files Used In This Guide

Place these files in the same folder as your notebook or script:

  • matplotlib_air_quality_trends.csv
  • matplotlib_product_sales.csv

You can also place them inside a data/ folder. If you do that, update the paths:

python
import pandas as pd

air = pd.read_csv("data/matplotlib_air_quality_trends.csv")
sales = pd.read_csv("data/matplotlib_product_sales.csv")

What You Will Learn

By the end, you should be able to:

  • explain when to use a line plot, scatter plot, bar chart, histogram, and pie chart
  • identify numerical and categorical data
  • create simple plots with plt.plot, plt.scatter, plt.bar, plt.hist, and plt.pie
  • add labels, titles, legends, grid lines, and axis limits
  • change colors, markers, line styles, line widths, and marker sizes
  • compare multiple series in one chart
  • create vertical, horizontal, grouped, and stacked bar charts
  • use histograms for frequency and probability-style distributions
  • resize figures with figsize
  • use built-in Matplotlib styles
  • create subplots for side-by-side comparison
  • save charts with savefig
  • solve practical plotting exercises from raw CSV data

1. Types Of Data

Before choosing a chart, understand the columns.

Most beginner plotting problems use two broad types of data.

Numerical data

Numerical data is made of numbers where mathematical operations make sense.

Examples:

  • sales revenue
  • profit
  • temperature
  • PM2.5 pollution level
  • study minutes
  • exam score
  • product units sold
  • customer age

Categorical data

Categorical data describes groups or labels.

Examples:

  • country
  • product name
  • quarter
  • plan type
  • city
  • department
  • course category
  • payment method

Chart choice depends on the relationship you want to inspect.

Question | Data Pattern | Good Chart
How does profit change month by month? | numerical over time | line plot
Do PM2.5 and PM10 rise together? | numerical vs numerical | scatter plot
Which product sold the most? | categorical vs numerical | bar chart
How are PM10 values distributed? | one numerical column | histogram
What share does each category contribute? | category contribution | pie chart, used carefully
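The numerical/categorical split can be checked quickly from column dtypes. The following is a minimal sketch, assuming a small hand-made table (air_sample and its values are invented for illustration; only the column names come from this guide):

```python
import pandas as pd

# Hypothetical mini-table; values are invented, column names mirror this guide.
air_sample = pd.DataFrame(
    {
        "country": ["India", "Brazil", "Germany"],
        "year": [2020, 2020, 2020],
        "pm25": [58.1, 14.2, 10.3],
        "pm10": [110.4, 28.9, 17.5],
    }
)

# Columns with a numeric dtype are candidates for line, scatter, and histogram plots.
numeric_cols = air_sample.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical here (bar labels, legends, pie slices).
categorical_cols = air_sample.select_dtypes(exclude="number").columns.tolist()

print("numerical:", numeric_cols)
print("categorical:", categorical_cols)
```

A dtype check is only a first guess. A column such as year is stored as a number but often behaves like an ordered label on the x-axis.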

2. Import Matplotlib

Most examples use this standard setup:

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pyplot is commonly imported as plt.

You will usually follow this pattern:

python
plt.figure(figsize=(8, 4))
plt.plot([1, 2, 3], [10, 20, 15])
plt.title("Simple Line Plot")
plt.xlabel("Step")
plt.ylabel("Value")
plt.show()

In notebooks, plt.show() is not always required, but using it is a good habit because it makes your intent clear.

3. The Basic Mental Model

Matplotlib has two common styles:

  • the beginner-friendly plt style
  • the object-oriented fig, ax style

The plt style is fast for learning:

python
plt.plot(x, y)
plt.title("My Chart")
plt.show()

The object-oriented style is better for serious projects:

python
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y)
ax.set_title("My Chart")
plt.show()

In this guide, we start with plt because it is simple. Then we use subplots when multiple charts need to sit together.
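To make the difference concrete, here is a small sketch that draws the same data in both styles. It assumes nothing beyond Matplotlib itself; the variable names implicit_ax, fig, and ax are illustrative.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [10, 20, 15]

# pyplot style: plt.* functions act on the "current" figure and axes.
plt.figure(figsize=(8, 4))
plt.plot(x, y)
plt.title("pyplot style")
implicit_ax = plt.gca()  # the Axes object pyplot created behind the scenes

# Object-oriented style: you hold the Figure and Axes explicitly.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y)
ax.set_title("object-oriented style")

print(implicit_ax.get_title(), "|", ax.get_title())
# In a script, add plt.show() to display both figures.
```

Both styles end up drawing on an Axes object. The plt style simply keeps track of the current one for you, which is why plt.title becomes ax.set_title in the explicit form.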

4. Load The Practice Data

python
import pandas as pd

air = pd.read_csv("matplotlib_air_quality_trends.csv")
sales = pd.read_csv("matplotlib_product_sales.csv")

print(air.head())
print(sales.head())

Check the shape:

python
print(air.shape)
print(sales.shape)

What to expect:

  • air has country-year pollution metrics
  • sales has monthly sales and profit metrics for five products
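If the CSV files are not at hand, a stand-in table with the same column names can be enough to follow the air-quality examples. This is only a sketch: the column names (country, year, pm25, pm10, monitoring_sites) are the ones used in this guide, but every value below is randomly generated and does not reproduce the real file, which may also contain other columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed so the stand-in data is repeatable

countries = ["India", "Brazil", "Germany"]
years = list(range(2015, 2023))

rows = []
for country in countries:
    for year in years:
        pm25 = rng.uniform(10, 60)
        rows.append(
            {
                "country": country,
                "year": year,
                "pm25": round(pm25, 1),
                # Invented relationship: PM10 roughly follows PM2.5 plus noise.
                "pm10": round(pm25 * 1.8 + rng.normal(0, 3), 1),
                "monitoring_sites": int(rng.integers(5, 40)),
            }
        )

air = pd.DataFrame(rows)
print(air.shape)
print(air.head())
```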

5. 2D Line Plot

A line plot is useful when the x-axis has an ordered sequence.

Common use cases:

  • time series
  • monthly profit
  • yearly pollution level
  • daily website visits
  • model accuracy over training epochs
  • cumulative learning progress

Line plots work well for:

  • numerical vs numerical
  • ordered categorical labels vs numerical

5.1 Simple Line Plot

python
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 9, 12]

plt.plot(x, y)
plt.show()

Explanation

  • Two lists x and y are defined containing numerical values that represent coordinates for plotting
  • The plt.plot() function creates a line graph connecting the points formed by corresponding elements from both lists
  • The plt.show() command displays the generated plot: in a separate window when run as a script, or inline in a notebook
  • This approach visualizes relationships between paired data points in a simple and effective manner
  • The resulting chart shows how values in list y change relative to values in list x across the specified range

5.2 Line Plot From CSV Data

Plot India PM2.5 trend over years:

python
india = air[air["country"] == "India"]

plt.figure(figsize=(8, 4))
plt.plot(india["year"], india["pm25"])
plt.title("India PM2.5 Trend")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.show()

Explanation

  • Filters the air quality dataset to isolate records where the country is India using boolean indexing
  • Sets up a matplotlib figure with specific dimensions (8x4 inches) for optimal visualization
  • Plots PM2.5 levels against years for India, creating a time series trend visualization
  • Adds appropriate title and axis labels to make the chart informative and readable
  • Displays the final plot showing the historical PM2.5 pollution pattern in India

5.3 Plot Multiple Lines

Compare India and Brazil:

python
india = air[air["country"] == "India"]
brazil = air[air["country"] == "Brazil"]

plt.figure(figsize=(8, 4))
plt.plot(india["year"], india["pm25"], label="India")
plt.plot(brazil["year"], brazil["pm25"], label="Brazil")
plt.title("PM2.5 Trend: India vs Brazil")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.show()

Explanation

  • Filters air quality data to isolate records for India and Brazil countries using boolean indexing
  • Creates a line plot showing PM2.5 concentration levels over time for both countries on the same graph
  • Sets appropriate chart labels including title, x-axis (year), and y-axis (PM2.5 levels) with legend for distinction
  • Configures figure size to ensure proper display and renders the final visualization
  • Uses pandas DataFrame filtering and matplotlib plotting functions to create comparative environmental trend analysis

label names each line.

legend() displays those names.
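When there are more than two countries, filtering each one by hand gets repetitive. One possible approach is to loop over a groupby and let label name each line. The sketch below assumes a small invented table (air_demo) shaped like the air-quality data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the air-quality table (invented values).
air_demo = pd.DataFrame(
    {
        "country": ["India"] * 3 + ["Brazil"] * 3 + ["Germany"] * 3,
        "year": [2020, 2021, 2022] * 3,
        "pm25": [58, 55, 53, 15, 14, 14, 11, 10, 10],
    }
)

fig, ax = plt.subplots(figsize=(8, 4))

# groupby yields (country_name, sub_table) pairs, so each country gets one line.
for country, group in air_demo.groupby("country"):
    ax.plot(group["year"], group["pm25"], marker="o", label=country)

ax.set_title("PM2.5 Trend By Country")
ax.set_xlabel("Year")
ax.set_ylabel("PM2.5")
ax.legend()
# In a script, add plt.show() to display the figure.
```

groupby sorts the group keys by default, so the legend entries appear in alphabetical order.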

5.4 Colors, Line Styles, And Line Width

python
plt.figure(figsize=(8, 4))
plt.plot(
    india["year"],
    india["pm25"],
    color="#f59e0b",
    linestyle="--",
    linewidth=2.5,
    label="India",
)
plt.plot(
    brazil["year"],
    brazil["pm25"],
    color="#2563eb",
    linestyle="-.",
    linewidth=2.5,
    label="Brazil",
)
plt.title("Styled PM2.5 Lines")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.show()

Explanation

  • Creates a figure with specified dimensions (8x4 inches) for displaying the plot
  • Plots two distinct line series using different styling options: India data with orange dashed line and Brazil data with blue dash-dot line
  • Adds comprehensive chart elements including title, axis labels, and legend to clearly identify both datasets
  • Uses matplotlib's plotting functions to visualize temporal trends in PM2.5 pollution levels across the two countries
  • Displays the final combined visualization with proper formatting and styling for clear data comparison

Useful linestyle values:

python
"-"    # solid
"--"   # dashed
"-."   # dash-dot
":"    # dotted

Explanation

  • These four string constants represent different line style patterns used in matplotlib and similar plotting libraries
  • The single dash "-" creates a solid line, the double dash "--" produces a dashed line pattern
  • The dash-dot pattern "-." combines dashes and dots, while the colon ":" generates a dotted line
  • These styles are commonly used when customizing plot appearance to distinguish between multiple data series
  • Pass any of them as the linestyle argument of plt.plot, or of other functions that draw lines, such as plt.grid

5.5 Markers

Markers show each actual data point.

python
plt.figure(figsize=(8, 4))
plt.plot(
    india["year"],
    india["pm25"],
    marker="o",
    markersize=7,
    linewidth=2,
    label="India",
)
plt.title("PM2.5 With Markers")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.show()

Explanation

  • Creates a matplotlib figure with specified dimensions (8 inches wide by 4 inches tall) for optimal visualization
  • Plots PM2.5 data from the india DataFrame using year as x-axis and pm25 values as y-axis with circular markers and line styling
  • Adds title "PM2.5 With Markers" and labels for both axes to provide context and clarity
  • Includes a legend to identify the India data series and displays the final plot with plt.show()
  • Uses marker styling (o shape, 7 size) and line styling (2 width) to make data points clearly visible on the trend line

Useful marker values:

python
"o"   # circle
"s"   # square
"D"   # diamond
"^"   # triangle up
"+"   # plus
"x"   # x marker

Explanation

  • These are matplotlib marker style codes used to represent different shapes in scatter plots and line charts
  • Each character represents a specific geometric symbol: "o" for circles, "s" for squares, "D" for diamonds, "^" for upward triangles
  • The "+" and "x" markers create plus signs and cross marks respectively for data point differentiation
  • These markers are commonly used in data visualization to distinguish between multiple datasets or categories in the same plot
  • They can be combined with color and size parameters to create comprehensive visual representations of data relationships

5.6 Axis Limits

Axis limits help when one outlier stretches the plot too much.

python
months = [1, 2, 3, 4, 5, 6, 7]
price = [48000, 54000, 57000, 49000, 47000, 45000, 4500000]

plt.plot(months, price, marker="o")
plt.title("Price With One Extreme Outlier")
plt.show()

Explanation

  • The code creates two lists representing months (1-7) and corresponding price values, including one significantly higher value (4,500,000)
  • It generates a line plot using matplotlib with circular markers at each data point to clearly show the relationship between months and prices
  • The plot includes a title indicating the presence of an extreme outlier to alert viewers to the unusual data point
  • Because the y-axis must stretch to 4,500,000, the first six points are squeezed into a nearly flat line and the trend among them is hard to read
  • This kind of chart makes an anomaly easy to spot, but it gives a poor view of the remaining values

The final value dominates the chart. Limit the y-axis:

python
plt.plot(months, price, marker="o")
plt.ylim(40000, 65000)
plt.title("Price Trend With Y-Axis Limited")
plt.show()

Explanation

  • Plots monthly price data using circular markers to visualize trends over time
  • Sets the y-axis limits between 40,000 and 65,000 to focus on the typical price range; the month 7 value lies far above this range, so the line leaves the top of the chart
  • Adds a descriptive title "Price Trend With Y-Axis Limited" to contextualize the visualization
  • Displays the resulting plot to show the relationship between months and price values

You can also limit both axes:

python
plt.plot(months, price, marker="o")
plt.xlim(1, 6)
plt.ylim(40000, 65000)
plt.title("Focused Price View")
plt.show()

Explanation

  • Creates a line plot showing price trends across months with circular markers at data points
  • Sets the x-axis range from 1 to 6 and the y-axis range from 40,000 to 65,000, which leaves the month 7 outlier out of view
  • Applies a custom title "Focused Price View" to provide context for the visualization
  • Displays the plot with all configured formatting and scaling applied

Use limits carefully. They are useful for focus, but they can also hide important values.
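One way to stay honest about what the limits hide is to count the points that fall outside them and mention that on the chart. This is a sketch of that idea using the same months and price values; the names y_low, y_high, and hidden are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.array([1, 2, 3, 4, 5, 6, 7])
price = np.array([48000, 54000, 57000, 49000, 47000, 45000, 4500000])

y_low, y_high = 40000, 65000

# Boolean mask of points that will not be visible inside the chosen y-range.
hidden = (price < y_low) | (price > y_high)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, price, marker="o")
ax.set_ylim(y_low, y_high)

# State the omission in the title instead of hiding the outlier silently.
ax.set_title(f"Price Trend ({hidden.sum()} point(s) outside the y-range)")

print(months[hidden], price[hidden])
# In a script, add plt.show() to display the figure.
```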

5.7 Grid Lines

python
plt.figure(figsize=(8, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Total Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.3)
plt.show()

Explanation

  • Creates a figure with specified dimensions (8 inches wide by 4 inches tall) for optimal visualization
  • Plots monthly profit data as a line graph with circular markers at each data point to highlight individual values
  • Adds descriptive labels including title, x-axis (month), and y-axis (profit) to make the chart self-explanatory
  • Enables a subtle grid overlay with transparency to improve readability of values along both axes
  • Displays the finalized plot with all formatting applied to show the monthly profit trend clearly

alpha controls transparency. Lower values make grid lines softer.

6. Scatter Plot

A scatter plot is used for numerical vs numerical analysis.

Common use cases:

  • correlation
  • clusters
  • outliers
  • relationship between two measurements
  • impact of one metric on another

6.1 Simple Scatter Plot

python
x = [5, 7, 8, 9, 11, 13]
y = [45, 52, 50, 61, 67, 72]

plt.scatter(x, y)
plt.title("Simple Scatter Plot")
plt.xlabel("Input")
plt.ylabel("Output")
plt.show()

Explanation

  • Two lists x and y are defined containing paired numerical values representing input and output data points
  • The matplotlib scatter plot function plots each pair of coordinates (x[i], y[i]) as individual dots on a 2D graph
  • Axis labels and a title are added to make the visualization informative and properly labeled
  • The plt.show() command renders the completed scatter plot graphic for viewing and analysis

6.2 PM2.5 vs PM10

python
plt.figure(figsize=(7, 5))
plt.scatter(air["pm25"], air["pm10"])
plt.title("PM2.5 vs PM10")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.grid(True, alpha=0.25)
plt.show()

Explanation

  • Creates a scatter plot using matplotlib to visualize the relationship between two air quality metrics: PM2.5 (particles up to 2.5 micrometers across) and PM10 (particles up to 10 micrometers across)
  • Sets the figure size to 7x5 inches for optimal display of the scatter plot visualization
  • Adds axis labels and title to clearly identify the variables being compared and the plot's purpose
  • Enables a subtle grid overlay with 25% transparency to improve readability of data points
  • Displays the final scatter plot showing the distribution pattern between the two pollutant measurements

If the points rise from left to right, the two values likely move together.
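The visual impression can be backed by a number. A minimal sketch, assuming a few invented readings (the readings table is hypothetical), computes the Pearson correlation with pandas:

```python
import pandas as pd

# Hypothetical readings (invented values) where PM10 tends to rise with PM2.5.
readings = pd.DataFrame(
    {
        "pm25": [12, 18, 25, 31, 40, 52, 60],
        "pm10": [25, 33, 48, 55, 76, 95, 112],
    }
)

# Pearson correlation: near +1 means a strong upward linear relationship,
# near -1 a strong downward one, and near 0 no linear relationship.
r = readings["pm25"].corr(readings["pm10"])
print(f"Pearson r = {r:.3f}")
```

Correlation only measures a linear pattern, so it is worth looking at the scatter plot as well rather than relying on the number alone.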

6.3 Scatter Plot With Color And Marker

python
india = air[air["country"] == "India"]
germany = air[air["country"] == "Germany"]

plt.figure(figsize=(7, 5))
plt.scatter(india["pm25"], india["pm10"], color="orange", marker="o", label="India")
plt.scatter(germany["pm25"], germany["pm10"], color="green", marker="^", label="Germany")
plt.title("PM2.5 vs PM10 By Country")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.legend()
plt.grid(True, alpha=0.25)
plt.show()

Explanation

  • Filters air quality data to create separate datasets for India and Germany based on country column values
  • Plots two scatter plots on the same figure using different colors and markers to distinguish between the two countries
  • Sets up chart formatting including title, axis labels, legend, and grid for better visualization of the relationship between PM2.5 and PM10 concentrations
  • Displays the resulting scatter plot showing pollution level comparisons between the two nations
  • Uses matplotlib to render the visualization with appropriate styling and labeling for clear data interpretation

6.4 Bubble Scatter Plot

You can use marker size to show a third variable.

python
plt.figure(figsize=(8, 5))
plt.scatter(
    air["pm25"],
    air["pm10"],
    s=air["monitoring_sites"] * 8,
    alpha=0.65,
)
plt.title("PM2.5 vs PM10, Sized By Monitoring Sites")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.grid(True, alpha=0.25)
plt.show()

Explanation

  • Creates a scatter plot showing the relationship between PM2.5 and PM10 air pollution measurements
  • Uses bubble size to represent the number of monitoring sites; the site count is multiplied by 8 to give each marker's area, so the bubbles are large enough to see
  • Adds descriptive labels including title, x-axis (PM2.5), and y-axis (PM10) with a grid overlay for better data interpretation
  • Sets transparency (alpha=0.65) to handle potential overlapping data points and improve visual clarity
  • Displays the final plot with proper figure sizing and grid formatting for enhanced readability

s controls marker size. It is the marker area in points squared, so doubling s doubles the area, not the diameter.

alpha is especially useful when points overlap.
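A fixed multiplier such as 8 works only when the third variable happens to have a convenient range. As an alternative sketch, the values can be rescaled into a chosen marker-area range. The arrays and the min_area and max_area settings below are invented for illustration, and the rescaling assumes the variable is not constant.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values (invented) standing in for pm25, pm10, and monitoring_sites.
pm25 = np.array([12, 25, 40, 58])
pm10 = np.array([25, 48, 76, 110])
sites = np.array([5, 12, 20, 35])

# Rescale the third variable linearly into a chosen marker-area range
# (s is measured in points squared).
min_area, max_area = 30, 400
scaled = (sites - sites.min()) / (sites.max() - sites.min())
sizes = min_area + scaled * (max_area - min_area)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(pm25, pm10, s=sizes, alpha=0.65)
ax.set_xlabel("PM2.5")
ax.set_ylabel("PM10")
ax.set_title("Bubble Sizes Rescaled To A Fixed Range")

print(sizes)
# In a script, add plt.show() to display the figure.
```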

6.5 Scatter-Like Plot With plt.plot

plt.plot can also draw points only:

python
plt.plot(air["pm25"], air["pm10"], "o")
plt.title("Scatter-Like Plot Using plt.plot")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.show()

Explanation

  • The code generates a scatter-like plot by plotting PM2.5 values on the x-axis against PM10 values on the y-axis
  • Passing the format string "o" tells plt.plot to draw a circle marker at each data point and no connecting line
  • The plot includes proper axis labels for both PM2.5 and PM10 concentrations along with a descriptive title
  • The matplotlib.pyplot.show() function displays the resulting visualization to the user
  • This type of visualization helps identify potential correlations or patterns between particulate matter concentration levels

For normal scatter plots, prefer plt.scatter.

7. Bar Chart

A bar chart compares categories.

Use it for:

  • product revenue
  • country counts
  • sales by quarter
  • learners by plan
  • tickets by priority
  • average score by course

7.1 Simple Bar Chart

python
products = ["Speaker", "Band", "Charger", "Stand", "Buds"]
units = [318, 315, 402, 268, 301]

plt.bar(products, units)
plt.title("December Units Sold")
plt.xlabel("Product")
plt.ylabel("Units")
plt.show()

Explanation

  • Creates two lists containing product names and their corresponding unit sales figures for December
  • Uses matplotlib's bar function to generate a vertical bar chart comparing sales across different products
  • Adds descriptive labels including title, x-axis label for products, and y-axis label for units sold
  • Displays the completed bar chart visualization showing which products had the highest and lowest sales volumes
  • The chart helps identify top-selling products by visually representing the numerical differences in unit sales

7.2 Bar Chart From Sales CSV

python
december = sales[sales["month_name"] == "Dec"].iloc[0]
products = ["smart_speaker", "fitness_band", "wireless_charger", "tablet_stand", "noise_canceling_buds"]
values = december[products]

plt.figure(figsize=(10, 4))
plt.bar(products, values)
plt.title("December Units Sold By Product")
plt.xlabel("Product")
plt.ylabel("Units")
plt.xticks(rotation=25, ha="right")
plt.show()

Explanation

  • Filters sales data to isolate the December month records and extracts the first row of data
  • Selects specific product columns and their corresponding sales values for the December record
  • Generates a vertical bar chart displaying units sold for each product
  • Formats the chart with appropriate labels, title, and rotated x-axis labels for better readability
  • Displays the completed bar chart visualization showing December sales performance across different products

rotation helps when category names are long.

7.3 Horizontal Bar Chart

python
plt.figure(figsize=(8, 4))
plt.barh(products, values)
plt.title("December Units Sold By Product")
plt.xlabel("Units")
plt.ylabel("Product")
plt.show()

Explanation

  • Creates a horizontal bar chart using matplotlib with specified figure dimensions of 8 by 4 inches
  • Plots product names on the y-axis and corresponding sales values on the x-axis using horizontal bars
  • Adds a descriptive title "December Units Sold By Product" and labels both axes appropriately
  • Displays the completed chart with units on the x-axis and product names on the y-axis
  • Uses plt.show() to render and display the final visualization to the user

Horizontal bars are easier to read when labels are long.

7.4 Bar Width

python
plt.bar(products, values, width=0.5)
plt.title("Bar Width Example")
plt.xticks(rotation=25, ha="right")
plt.show()

Explanation

  • Creates a vertical bar chart using matplotlib's bar function with products on x-axis and values on y-axis
  • Sets the bar width to 0.5, narrower than the default of 0.8, which leaves more space between bars
  • Rotates x-axis labels by 25 degrees and aligns them to the right for better readability when labels are long
  • Displays the chart with a title "Bar Width Example" to indicate the purpose of the visualization
  • Shows the final plot; this example sets only a title and tick label rotation, not axis labels

Very wide bars can look crowded.

Very narrow bars can make values harder to compare.

7.5 Grouped Bar Chart

Grouped bars compare categories across multiple groups.

Example: compare Q1, Q2, Q3, and Q4 product totals.

python
product_cols = ["smart_speaker", "fitness_band", "wireless_charger", "tablet_stand", "noise_canceling_buds"]
quarter_product = sales.groupby("quarter")[product_cols].sum()

x = np.arange(len(product_cols))
width = 0.2

plt.figure(figsize=(11, 5))
for offset, quarter in enumerate(quarter_product.index):
    plt.bar(
        x + (offset - 1.5) * width,
        quarter_product.loc[quarter],
        width=width,
        label=quarter,
    )

plt.xticks(x, product_cols, rotation=25, ha="right")
plt.title("Quarter-Wise Product Sales")
plt.xlabel("Product")
plt.ylabel("Units")
plt.legend()
plt.tight_layout()
plt.show()

Explanation

  • Groups sales data by quarter and calculates total units sold for each product category using sum aggregation
  • Sets up horizontal positioning for bars with numpy array and defines bar width for proper spacing between groups
  • Creates side-by-side bar charts for each quarter using matplotlib's bar function with calculated offsets
  • Configures chart appearance with rotated x-axis labels, title, axis labels, and legend for clear visualization
  • Uses tight layout to ensure proper spacing and displays the final grouped bar chart visualization

Important idea:

  • np.arange creates numeric positions
  • each quarter is shifted slightly left or right
  • xticks puts product names back on the x-axis
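The shift (offset - 1.5) * width is specific to four groups. A more general sketch computes centered offsets for any number of groups. The helper name group_offsets and the group_width default of 0.8 are assumptions made for this illustration, not part of Matplotlib.

```python
import numpy as np


def group_offsets(n_groups, group_width=0.8):
    """Return (offsets, bar_width) that center n_groups bars on each tick."""
    bar_width = group_width / n_groups
    # Indices 0..n-1 are shifted so their mean is zero, then scaled by the bar width.
    offsets = (np.arange(n_groups) - (n_groups - 1) / 2) * bar_width
    return offsets, bar_width


offsets, bar_width = group_offsets(4)
print(offsets, bar_width)
```

With four groups this gives the same shifts as the loop in this section: a bar width of 0.2 and offsets of -1.5, -0.5, 0.5, and 1.5 bar widths. Each bar for group i would then be drawn at x + offsets[i].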

7.6 Stacked Bar Chart

Stacked bars show how parts combine into a total.

python
quarter_product = sales.groupby("quarter")[product_cols].sum()

bottom = np.zeros(len(quarter_product))

plt.figure(figsize=(8, 5))
for product in product_cols:
    plt.bar(
        quarter_product.index,
        quarter_product[product],
        bottom=bottom,
        label=product,
    )
    bottom += quarter_product[product].values

plt.title("Stacked Product Sales By Quarter")
plt.xlabel("Quarter")
plt.ylabel("Units")
plt.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()

Explanation

  • Groups sales data by quarter and calculates total units sold for each product category using sum aggregation
  • Initializes a zero array to track cumulative bottom positions for stacking bar segments
  • Iterates through each product column to create stacked bars where each product's contribution is added on top of previous products
  • Sets chart title, axis labels, and legend positioning while maintaining proper layout spacing
  • Displays the final stacked bar chart showing how different products contribute to total quarterly sales

Use stacked bars when the total and contribution both matter.

Avoid stacked bars when there are too many categories.
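The running bottom array is the heart of a stacked chart: once the loop finishes, it holds the height of every finished stack. The sketch below checks that against the row totals, using a small invented table (quarter_demo) with two of the product columns from this guide.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical quarter-by-product totals (invented values).
quarter_demo = pd.DataFrame(
    {"smart_speaker": [300, 320, 310, 400], "fitness_band": [250, 260, 240, 330]},
    index=["Q1", "Q2", "Q3", "Q4"],
)

fig, ax = plt.subplots(figsize=(8, 5))
bottom = np.zeros(len(quarter_demo))
for product in quarter_demo.columns:
    # Each product starts where the previous products ended.
    ax.bar(quarter_demo.index, quarter_demo[product], bottom=bottom, label=product)
    bottom += quarter_demo[product].to_numpy()

# After the loop, bottom equals the total of each row.
totals = quarter_demo.sum(axis=1).to_numpy()
print(bottom, totals)

ax.set_title("Stacked Product Sales By Quarter")
ax.legend()
# In a script, add plt.show() to display the figure.
```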

8. Histogram

A histogram shows the distribution of a numerical column.

Use it for:

  • frequency count
  • shape of values
  • spread
  • skew
  • outliers
  • comparing before and after cleaning

8.1 Simple Histogram

python
plt.hist(air["pm10"])
plt.title("PM10 Distribution")
plt.xlabel("PM10")
plt.ylabel("Frequency")
plt.show()

Explanation

  • The code generates a histogram showing how PM10 (particulate matter) levels are distributed across the dataset
  • It uses matplotlib's hist function to plot the frequency distribution of pm10 values from the air dataframe
  • The chart is customized with a title "PM10 Distribution" and labeled axes for better readability
  • The visualization helps identify patterns such as normal distribution, skewness, or outliers in air quality measurements
  • The plt.show() command renders the final histogram plot for viewing and analysis

8.2 Change Bins

python
plt.hist(air["pm10"], bins=8)
plt.title("PM10 Distribution With 8 Bins")
plt.xlabel("PM10")
plt.ylabel("Frequency")
plt.show()

Explanation

  • The code utilizes the matplotlib.pyplot library to create a histogram of PM10 data from the air DataFrame.
  • It specifies 8 bins to categorize the PM10 values, allowing for a clearer understanding of the data distribution.
  • The histogram is titled "PM10 Distribution With 8 Bins" to provide context for the visualization.
  • The x-axis is labeled "PM10" to indicate the variable being measured, while the y-axis is labeled "Frequency" to show how often each range of PM10 values occurs.
  • Finally, plt.show() is called to display the generated histogram to the user.

More bins reveal more detail.

Fewer bins give a smoother summary.
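The hist function also returns what it drew: the counts, the bin edges, and the bar objects. Here is a small sketch comparing two bin settings side by side, assuming some invented PM10-like readings (pm10_demo is hypothetical):

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical PM10-like readings (invented values).
pm10_demo = np.array([18, 22, 25, 31, 33, 36, 41, 44, 47, 52, 58, 63, 71, 85, 110])

fig, (ax_few, ax_many) = plt.subplots(1, 2, figsize=(10, 4))

# ax.hist returns the counts, the bin edges, and the drawn bars.
counts_few, edges_few, _ = ax_few.hist(pm10_demo, bins=4, edgecolor="black")
counts_many, edges_many, _ = ax_many.hist(pm10_demo, bins=12, edgecolor="black")

ax_few.set_title("4 bins")
ax_many.set_title("12 bins")

print(counts_few, edges_few)
print(counts_many, edges_many)
# In a script, add plt.show() to display the figure.
```

There is always one more edge than there are bins, and the counts add up to the number of values whichever bin setting is used.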

8.3 Custom Bin Edges

python
bins = [0, 20, 40, 60, 80, 100, 120, 140]

plt.hist(air["pm10"], bins=bins, edgecolor="black")
plt.title("PM10 Distribution With Custom Bins")
plt.xlabel("PM10 Range")
plt.ylabel("Frequency")
plt.show()

Explanation

  • Defines a list of bin edges to categorize PM10 values into specific ranges.
  • Utilizes Matplotlib's hist function to create a histogram of the "pm10" data from the air DataFrame.
  • Sets the histogram's edge color to black for better visibility of the bars.
  • Adds a title and labels for the x-axis and y-axis to enhance the plot's readability.
  • Displays the histogram using plt.show() to visualize the distribution of PM10 levels.

8.4 Probability-Style Histogram

Use density=True when you want the histogram to represent a probability density instead of raw counts.

python
plt.hist(air["pm10"], bins=8, density=True, edgecolor="black")
plt.title("PM10 Density Histogram")
plt.xlabel("PM10")
plt.ylabel("Density")
plt.show()

Explanation

  • The code utilizes Matplotlib's hist function to create a histogram of the "pm10" values from the air DataFrame.
  • The bins=8 parameter specifies that the data should be divided into 8 equal-width intervals for the histogram.
  • Setting density=True normalizes the histogram, allowing the area under the histogram to sum to 1, representing a probability density.
  • The edgecolor="black" argument adds a black outline to each bin for better visual distinction.
  • The title, xlabel, and ylabel functions are used to label the histogram, enhancing its readability before displaying it with plt.show().
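The claim that the area equals 1 can be verified numerically. This sketch uses np.histogram, which applies the same binning that plt.hist uses by default, on invented readings (pm10_demo is hypothetical):

```python
import numpy as np

# Hypothetical PM10-like readings (invented values).
pm10_demo = np.array([18, 22, 25, 31, 33, 36, 41, 44, 47, 52, 58, 63, 71, 85, 110])

# density=True returns bar heights such that sum(height * bin_width) is 1.
density, edges = np.histogram(pm10_demo, bins=8, density=True)
widths = np.diff(edges)
area = np.sum(density * widths)

print(density)
print("total area:", area)
```

The bar heights are densities, not probabilities. The probability of landing in a bin is the height multiplied by the bin width, so individual heights can exceed 1 when bins are narrow.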

8.5 Log Scale

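Log scale helps when values have a long tail. With log=True the count axis becomes logarithmic, so bins holding only one or two values stay visible next to much taller bins.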

python
values = np.array([12, 15, 17, 18, 22, 24, 28, 35, 42, 55, 70, 120, 300, 900])

plt.hist(values, bins=8, log=True, edgecolor="black")
plt.title("Histogram With Log Y-Axis")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

Explanation

  • The code initializes a NumPy array containing a set of numerical values.
  • It uses Matplotlib to create a histogram with 8 bins, displaying the frequency of values on a logarithmic scale for better visibility of data distribution.
  • The histogram is styled with black edges for each bin to enhance clarity.
  • Titles and axis labels are added to provide context for the data being represented.
  • Finally, the histogram is displayed using plt.show(), rendering the visual output.
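To see what log=True changes, the same values can be drawn twice. This is a sketch using the values array from this section; the axes names ax_linear and ax_log are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

values = np.array([12, 15, 17, 18, 22, 24, 28, 35, 42, 55, 70, 120, 300, 900])

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Left: plain counts. One tall bin dominates and the tail bins are barely visible.
counts, edges, _ = ax_linear.hist(values, bins=8, edgecolor="black")
ax_linear.set_title("Linear y-axis")

# Right: same bins, but the count axis is logarithmic.
ax_log.hist(values, bins=8, log=True, edgecolor="black")
ax_log.set_title("log=True")

print(counts)
print(ax_linear.get_yscale(), ax_log.get_yscale())
# In a script, add plt.show() to display the figure.
```

Note that log=True changes only the count axis. The bins themselves are still equally wide, so most of the small values continue to share the first bin.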

9. Pie Chart

A pie chart shows contribution to a whole.

Use it only when:

  • there are few categories
  • values add up to a meaningful total
  • you want a quick share-of-total view

For exact comparisons, a bar chart is usually better.

9.1 Simple Pie Chart

python
quarter_sales = sales.groupby("quarter")["total_units"].sum()

plt.pie(quarter_sales, labels=quarter_sales.index)
plt.title("Unit Sales Share By Quarter")
plt.show()

Explanation

  • The code aggregates total unit sales by quarter using the groupby method on the sales DataFrame.
  • It calculates the sum of total_units for each quarter, resulting in a Series called quarter_sales.
  • A pie chart is created using plt.pie, with the sales data represented as slices and the quarter labels displayed.
  • The chart is titled "Unit Sales Share By Quarter" to provide context for the visualization.
  • Finally, plt.show() is called to render the pie chart for display.

9.2 Percentages

python
plt.pie(
    quarter_sales,
    labels=quarter_sales.index,
    autopct="%0.1f%%",
)
plt.title("Unit Sales Share By Quarter")
plt.show()

Explanation

  • Creates a pie chart using matplotlib's pyplot interface to display the proportional share of unit sales across different quarters
  • Uses autopct parameter to automatically format and display percentage values with one decimal place on each pie slice
  • Sets custom labels from the quarter_sales index values to identify each quarter segment in the chart
  • Applies a descriptive title "Unit Sales Share By Quarter" to provide context for the visualization
  • Renders the final chart using plt.show() to display the graphical representation of sales distribution
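The percentages that autopct prints are each value divided by the total. A minimal sketch of the same arithmetic, assuming invented quarterly totals (quarter_units is hypothetical and stands in for quarter_sales):

```python
import pandas as pd

# Hypothetical quarterly unit totals (invented values).
quarter_units = pd.Series([2900, 3100, 3000, 3900], index=["Q1", "Q2", "Q3", "Q4"])

# Share of the whole, as a percentage; this is what each pie slice represents.
share_pct = quarter_units / quarter_units.sum() * 100

print(share_pct.round(1))
print("sum of shares:", share_pct.sum())
```

Printing the shares as a table is a useful companion to the pie chart when readers need exact values.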

9.3 Colors, Explode, And Shadow

python
plt.pie(
    quarter_sales,
    labels=quarter_sales.index,
    autopct="%0.1f%%",
    colors=["#60a5fa", "#34d399", "#f59e0b", "#f472b6"],
    explode=[0, 0, 0, 0.08],
    shadow=True,
)
plt.title("Quarter Sales Share")
plt.show()

Explanation

  • Creates a pie chart using matplotlib's pyplot module to display quarterly sales data with custom styling options
  • Uses autopct parameter to show percentages with one decimal place and labels to display quarter names from the index
  • Applies a color palette of four distinct colors and creates a slight separation effect on the fourth slice using the explode parameter
  • Adds a shadow effect for visual depth and sets a descriptive title for the chart before displaying it
  • The chart effectively communicates the proportional share of each quarter's sales within the total dataset

Use explode sparingly. It draws attention to one slice.
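Hard-coding explode=[0, 0, 0, 0.08] assumes the slice of interest is always the fourth one. One possible alternative is to build the list from the data, for example to pull out the largest slice. The quarter_units values below are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical quarterly unit totals (invented values).
quarter_units = pd.Series([2900, 3100, 3000, 3900], index=["Q1", "Q2", "Q3", "Q4"])

# Offset only the largest slice; every other slice keeps an offset of 0.
largest = quarter_units.idxmax()
explode = [0.08 if quarter == largest else 0 for quarter in quarter_units.index]

fig, ax = plt.subplots(figsize=(5, 5))
ax.pie(quarter_units, labels=quarter_units.index, autopct="%0.1f%%", explode=explode)
ax.set_title("Quarter Sales Share")

print(largest, explode)
# In a script, add plt.show() to display the figure.
```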

10. Styles

Matplotlib has built-in styles.

Check available styles:

python
print(plt.style.available)

Explanation

  • This code snippet prints a list of all available matplotlib style options that can be applied to plots
  • plt.style.available is a list of style names that can be passed to plt.style.use() to change the appearance of matplotlib figures
  • Typical entries include 'ggplot', 'dark_background', 'bmh', and 'classic'; the seaborn-derived styles are listed as 'seaborn-v0_8' and related names from Matplotlib 3.6 onward (earlier versions used 'seaborn')
  • This is useful for quickly exploring different visual themes without manually adjusting colors, fonts, and spacing
  • The output helps developers choose appropriate styling for their data visualizations based on presentation needs
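
Style names differ between Matplotlib versions, so it can help to check the list before applying a style. This is a minimal sketch; the preference order is only an illustrative assumption.

```python
import matplotlib.pyplot as plt

# Preferred styles, most wanted first (an arbitrary example order).
preferred = ["seaborn-v0_8", "ggplot"]

# Pick the first preferred style this Matplotlib install ships,
# falling back to "default", which plt.style.use() accepts.
chosen = next((name for name in preferred if name in plt.style.available), "default")
plt.style.use(chosen)
print("Using style:", chosen)

plt.style.use("default")  # reset so later charts are unaffected
```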

Use a style:

python
plt.style.use("seaborn-v0_8")

plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Profit With Seaborn Style")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.show()

Explanation

  • Applies the "seaborn-v0_8" style sheet, Matplotlib's bundled copy of the seaborn 0.8 look; this name needs Matplotlib 3.6 or newer, and seaborn itself does not have to be installed
  • Plots monthly profit data using circle markers to visualize trends over time
  • Sets appropriate axis labels and title to clearly communicate the visualization's purpose
  • Displays the final styled plot with improved visual formatting compared to default matplotlib styles

Reset to default:

python
plt.style.use("default")

Explanation

  • Restores Matplotlib's built-in default style
  • plt.style.use() changes global settings, so a style stays active for every later chart until you switch again
  • Resetting gives you a predictable baseline before you draw the next figure
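
If you only want a style for one chart, plt.style.context applies it inside a with-block and restores the previous settings afterwards, so no manual reset is needed. A small sketch with made-up values:

```python
import matplotlib.pyplot as plt

before = plt.rcParams["axes.facecolor"]

# The style applies only inside the with-block.
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot([1, 2, 3, 4], [10, 14, 9, 17], marker="o")  # made-up values
    ax.set_title("Styled only inside the context")
    plt.close(fig)  # replace with plt.show() to view the chart

after = plt.rcParams["axes.facecolor"]
print(before, inside, after)
```

The first and last printed values match, which shows the style did not leak out of the block.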

Good beginner styles:

python
"default"
"ggplot"
"seaborn-v0_8"
"fivethirtyeight"
"bmh"

Explanation

  • These strings are names of style sheets bundled with Matplotlib; seaborn does not need to be installed to use them
  • Each style sets colors, fonts, grid, and background defaults in one step
  • "default" is standard Matplotlib styling, "ggplot" imitates R's ggplot2, and "seaborn-v0_8" reproduces the look of seaborn 0.8 (requires Matplotlib 3.6 or newer)
  • "fivethirtyeight" mimics the charts of the FiveThirtyEight data journalism site, while "bmh" comes from the book Bayesian Methods for Hackers
  • Apply any of them with plt.style.use(); seaborn's sns.set_style() takes a different set of names, such as "whitegrid"

11. Figure Size, DPI, And Layout

Use figsize to control chart size.

python
plt.figure(figsize=(10, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.show()

Explanation

  • Creates a figure with a width of 10 inches and height of 4 inches for optimal visualization
  • Plots monthly profit data as a line graph with circular markers at each data point to highlight individual values
  • Sets the chart title to "Monthly Profit" and labels the x-axis as "Month" and y-axis as "Profit" for clarity
  • Displays the completed plot with all formatting applied to show the profit trend over time

Use dpi for sharper output:

python
plt.figure(figsize=(10, 4), dpi=120)
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Sharper Monthly Profit Chart")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.show()

Explanation

  • Creates a new figure with specified dimensions (10 inches wide by 4 inches tall) and resolution (120 DPI) for high-quality plotting
  • Plots the relationship between month numbers and total profit values with circular markers at each data point to emphasize individual measurements
  • Adds a descriptive title "Sharper Monthly Profit Chart" and labels both axes appropriately with "Month" and "Profit" for clear data interpretation
  • Displays the finalized plot with all formatting elements applied to visualize profit patterns over time
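
figsize and dpi together decide the pixel size of the figure: pixels = inches × dpi. A quick check, assuming a standard (non-HiDPI) setup:

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 4), dpi=120)

# Pixel size = size in inches * dots per inch.
width_px, height_px = fig.get_size_inches() * fig.dpi
print(round(width_px), round(height_px))

plt.close(fig)
```

Raising dpi adds pixels without changing the layout, while raising figsize gives the chart more room for labels and data.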

Use tight_layout when labels are getting cut:

python
plt.figure(figsize=(10, 4))
plt.bar(products, values)
plt.xticks(rotation=25, ha="right")
plt.title("Product Sales")
plt.tight_layout()
plt.show()

Explanation

  • Creates a figure with specified dimensions (10 inches wide by 4 inches tall) for optimal display
  • Generates a vertical bar chart using product names as x-axis categories and corresponding values as bar heights
  • Rotates x-axis labels by 25 degrees and aligns them to the right to prevent overlapping text issues
  • Sets the chart title to "Product Sales" for clear identification of the data visualization
  • Applies tight layout to automatically adjust spacing and prevent label cutoff before displaying the final plot

12. Subplots

Subplots help you compare charts in one figure.

python
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(sales["month_number"], sales["total_profit"], marker="o")
axes[0].set_title("Monthly Profit")
axes[0].set_xlabel("Month")
axes[0].set_ylabel("Profit")
axes[0].grid(True, alpha=0.25)

axes[1].hist(air["pm10"], bins=8, edgecolor="black")
axes[1].set_title("PM10 Distribution")
axes[1].set_xlabel("PM10")
axes[1].set_ylabel("Frequency")

plt.tight_layout()
plt.show()

Explanation

  • Creates a figure with two subplots arranged horizontally using matplotlib's subplot functionality
  • First subplot displays a line chart showing monthly profit trends with circular markers and grid lines for better readability
  • Second subplot shows a histogram of PM10 pollution levels with specified bin count and black edges for clear visualization
  • Applies proper labeling and titles to both charts for clear data interpretation
  • Uses tight_layout to automatically adjust spacing between subplots and displays the final visualization

axes[0] controls the first chart.

axes[1] controls the second chart.
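
With more than one row, axes becomes a 2D array and you index it as axes[row, col]. The sketch below uses made-up values to show a 2x2 grid and axes.flat, which visits the grid row by row.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up values purely for illustration.
x = np.arange(1, 7)
y = np.array([3, 5, 4, 8, 7, 9])

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
print(axes.shape)  # 2 rows, 2 columns

axes[0, 0].plot(x, y, marker="o")
axes[0, 1].scatter(x, y)
axes[1, 0].bar(x, y)
axes[1, 1].hist(y, bins=3, edgecolor="black")

# axes.flat walks the grid row by row, handy for shared formatting.
titles = ["Line", "Scatter", "Bar", "Histogram"]
for ax, title in zip(axes.flat, titles):
    ax.set_title(title)

fig.tight_layout()
plt.close(fig)  # replace with plt.show() to view the figure
```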

13. Annotations

Annotations explain an important point on the chart.

python
best_month = sales.loc[sales["total_profit"].idxmax()]

plt.figure(figsize=(10, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.annotate(
    "Best month",
    xy=(best_month["month_number"], best_month["total_profit"]),
    xytext=(best_month["month_number"] - 2, best_month["total_profit"] - 12000),
    arrowprops={"arrowstyle": "->"},
)
plt.title("Monthly Profit With Annotation")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.25)
plt.show()

Explanation

  • Identifies the month with maximum total profit by finding the index of the maximum value in the total_profit column and selecting that row from the sales DataFrame
  • Creates a line plot showing the monthly profit trend with circular markers at each data point to visualize the profit progression throughout the year
  • Adds an annotation arrow pointing to the best performing month, with text label indicating "Best month" positioned slightly offset from the data point for clarity
  • Applies formatting including title, axis labels, grid lines with transparency, and specified figure size to enhance chart readability and presentation quality
  • Displays the final visualization showing the monthly profit pattern with the peak performance clearly highlighted

Use annotations for insight, not decoration.
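
The label offset in the example above is written in data units, so it only looks right for that profit scale. One alternative is textcoords="offset points", which places the text a fixed distance from the point. This sketch assumes a small made-up sales table; the offset (-60, -40) is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly profit; the guide itself uses the sales CSV.
sales = pd.DataFrame({
    "month_number": [1, 2, 3, 4, 5, 6],
    "total_profit": [21000, 24500, 19800, 31200, 28700, 26400],
})
best_month = sales.loc[sales["total_profit"].idxmax()]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(sales["month_number"], sales["total_profit"], marker="o")
note = ax.annotate(
    "Best month",
    xy=(best_month["month_number"], best_month["total_profit"]),
    xytext=(-60, -40),           # offset from the point, in points (1/72 inch)
    textcoords="offset points",  # offset no longer depends on data units
    arrowprops={"arrowstyle": "->"},
)
ax.set_title("Monthly Profit With Annotation")
plt.close(fig)  # replace with plt.show() to view the chart
```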

14. Save Figure

Use savefig to export a chart.

python
plt.figure(figsize=(10, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.25)
plt.tight_layout()
plt.savefig("monthly_profit.png", dpi=150, bbox_inches="tight")
plt.show()

Explanation

  • Creates a figure with specified dimensions (10 inches wide by 4 inches tall) for optimal display
  • Plots monthly profit data as a line graph with circular markers at each data point to highlight individual values
  • Adds title, axis labels, and grid lines with transparency to improve readability and visual appeal
  • Saves the generated plot as a PNG file with high resolution (150 DPI) and tight bounding box to minimize whitespace
  • Displays the final plot in the current environment for immediate viewing

Useful formats:

python
"chart.png"
"chart.jpg"
"chart.svg"
"chart.pdf"

Explanation

  • These are example file names, not runnable code; savefig chooses the output format from the file extension
  • PNG and JPG are raster formats with a fixed pixel grid; SVG and PDF are vector formats that stay sharp at any zoom level
  • The dpi argument matters mainly for raster output

For blog posts and dashboards, PNG is usually easiest.

For reports and print, PDF or SVG can be useful.
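
To export one chart in several formats, you can loop over the extensions and let savefig infer the format from each file name. A minimal sketch with made-up values; it writes to a temporary folder, which is an assumption made to keep the demo tidy.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([1, 2, 3, 4], [12, 18, 15, 22], marker="o")  # made-up values
ax.set_title("Export Demo")

# A temporary folder keeps the demo from cluttering your project.
out_dir = Path(tempfile.mkdtemp())

saved = []
for ext in ["png", "svg", "pdf"]:
    path = out_dir / f"chart.{ext}"
    # savefig infers the format from the file extension.
    fig.savefig(path, dpi=150, bbox_inches="tight")
    saved.append(path)

plt.close(fig)
print([p.name for p in saved])
```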

15. Common Beginner Mistakes

Mistake 1: Forgetting Labels

Bad chart:

python
plt.plot(sales["month_number"], sales["total_profit"])
plt.show()

Explanation

  • Draws a line of total_profit against month_number from the sales DataFrame
  • plt.show() renders the chart
  • The line itself is correct, but with no title or axis labels a viewer cannot tell what the axes measure or in which units

Better chart:

python
plt.plot(sales["month_number"], sales["total_profit"])
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.show()

Explanation

  • Plots the relationship between month numbers and total profit values using matplotlib's plot function
  • Sets the chart title to "Monthly Profit" and labels the x-axis as "Month" and y-axis as "Profit"
  • Displays the resulting line graph showing profit performance across different months
  • The visualization helps identify profit trends, peaks, and valleys throughout the year
  • This type of chart is commonly used for time series analysis and business performance monitoring

Mistake 2: Using Pie Charts For Too Many Categories

If you have more than five or six categories, use a bar chart.

Mistake 3: Not Rotating Long Labels

python
plt.bar(products, values)
plt.xticks(rotation=25, ha="right")
plt.tight_layout()
plt.show()

Explanation

  • Utilizes the plt.bar function from the Matplotlib library to generate a bar chart using products as the categories and values as their corresponding heights.
  • The plt.xticks function adjusts the orientation of the x-axis labels by rotating them 25 degrees and aligning them to the right for better readability.
  • plt.tight_layout() is called to automatically adjust subplot parameters for a neat fit within the figure area.
  • Finally, plt.show() displays the generated bar chart to the user.

Mistake 4: Comparing Raw Counts When Groups Have Different Sizes

Sometimes percentages are more useful than counts.

Ask:

  • Are categories equally sized?
  • Should I plot totals, averages, or rates?
  • Is the chart answering the right question?
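
Here is a small sketch of the difference. The two cities and their numbers are invented for illustration: City A has more reports in total, but City B has the higher rate.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical survey: two cities with very different numbers of respondents.
survey = pd.DataFrame({
    "city": ["City A", "City B"],
    "respondents": [2000, 400],
    "reported_poor_air": [300, 120],
})

# Rate = count / group size, expressed as a percentage.
survey["poor_air_rate_pct"] = survey["reported_poor_air"] / survey["respondents"] * 100

fig, axes = plt.subplots(1, 2, figsize=(9, 3.5))
axes[0].bar(survey["city"], survey["reported_poor_air"])
axes[0].set_title("Raw Count")
axes[0].set_ylabel("Respondents reporting poor air")
axes[1].bar(survey["city"], survey["poor_air_rate_pct"])
axes[1].set_title("Rate")
axes[1].set_ylabel("Percent of respondents")
fig.tight_layout()
plt.close(fig)  # replace with plt.show() to view the charts

print(survey[["city", "reported_poor_air", "poor_air_rate_pct"]])
```

The two panels rank the cities in opposite order, which is why the choice between counts and rates should follow from the question being asked.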

Mistake 5: Hiding Outliers Without Saying So

Axis limits can help, but always mention when you use them.
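
One way to mention it is to count the clipped points and print that count on the chart. This sketch uses made-up readings; the cutoff of 60 is an illustrative choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up readings with one extreme value.
readings = np.array([42, 45, 39, 48, 44, 310, 41, 46])
y_max = 60  # chosen upper limit for the visible range

hidden = int((readings > y_max).sum())

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(range(1, len(readings) + 1), readings, marker="o")
ax.set_ylim(30, y_max)
ax.set_title("Daily Readings")
# Tell the viewer that something was cut off.
ax.text(
    0.99, 0.02,
    f"{hidden} value(s) above {y_max} not shown",
    transform=ax.transAxes, ha="right", va="bottom", fontsize=9,
)
plt.close(fig)  # replace with plt.show() to view the chart
print(hidden)
```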

16. Practice Problems

Use the two CSV files from this guide.

Problem 1: Line Plot For Two Countries

Draw a line plot where:

  • x-axis is year
  • y-axis is pm25
  • two lines compare India and Brazil
  • chart includes title, labels, legend, and grid

python
india = air[air["country"] == "India"]
brazil = air[air["country"] == "Brazil"]

plt.figure(figsize=(8, 4))
plt.plot(india["year"], india["pm25"], marker="o", label="India")
plt.plot(brazil["year"], brazil["pm25"], marker="o", label="Brazil")
plt.title("PM2.5 Trend: India vs Brazil")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Explanation

  • Filters the dataset air to create two separate DataFrames for India and Brazil based on the "country" column.
  • Initializes a plot with a specified figure size of 8x4 inches to visualize the data.
  • Plots the PM2.5 levels against the years for both countries, using markers for clarity and labeling each line accordingly.
  • Sets the title and labels for the x-axis and y-axis to provide context for the graph.
  • Displays a legend to differentiate between the two countries and adds a grid for better readability before showing the plot.

Problem 2: Probability Histogram

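Draw a density histogram of pm10. With density=True the bar areas sum to 1, so the y-axis shows probability density rather than raw counts.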

python
plt.figure(figsize=(7, 4))
plt.hist(air["pm10"], bins=8, density=True, edgecolor="black")
plt.title("PM10 Density Histogram")
plt.xlabel("PM10")
plt.ylabel("Density")
plt.grid(True, alpha=0.25)
plt.show()

Explanation

  • Initializes a figure with a specified size of 7 inches by 4 inches for better visualization.
  • Creates a histogram of the "pm10" data from the "air" dataset, using 8 bins to represent the distribution.
  • Sets density=True so the bars are normalized: their areas sum to 1, and the heights are densities, not counts or probabilities.
  • Adds a title, x-axis label, and y-axis label to provide context for the data being represented.
  • Enables a grid with a low alpha value for improved readability of the histogram without overwhelming the visual.
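
The difference between density and probability is easy to check numerically. The sketch below uses synthetic values (the seed and distribution are arbitrary assumptions, not the guide's CSV). It confirms that density bars have a total area of 1, then computes per-bin probabilities by weighting each value by 1/N.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic PM10-like values; seed and distribution are arbitrary.
rng = np.random.default_rng(0)
pm10 = rng.normal(loc=60, scale=12, size=200)

fig, ax = plt.subplots(figsize=(7, 4))
heights, edges, _ = ax.hist(pm10, bins=8, density=True, edgecolor="black")

# With density=True the bar AREAS sum to 1, not the bar heights.
area = float(np.sum(heights * np.diff(edges)))

# For per-bin probabilities (heights that sum to 1), weight each value by 1/N.
weights = np.full(len(pm10), 1 / len(pm10))
probs, _ = np.histogram(pm10, bins=8, weights=weights)

plt.close(fig)  # replace with plt.show() to view the chart
print(round(area, 6), round(float(probs.sum()), 6))
```

Passing the same weights to plt.hist draws the probability version directly.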

Problem 3: Scatter Plot For Two Countries

Draw a scatter plot where:

  • x-axis is pm25
  • y-axis is pm10
  • compare Germany and South Africa
  • use different colors and markers

python
germany = air[air["country"] == "Germany"]
south_africa = air[air["country"] == "South Africa"]

plt.figure(figsize=(7, 5))
plt.scatter(germany["pm25"], germany["pm10"], label="Germany", marker="o", color="green")
plt.scatter(south_africa["pm25"], south_africa["pm10"], label="South Africa", marker="^", color="purple")
plt.title("PM2.5 vs PM10")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.legend()
plt.grid(True, alpha=0.25)
plt.show()

Explanation

  • Filters air quality data to isolate records from Germany and South Africa using boolean indexing
  • Plots two distinct scatter plots on the same axes with different markers and colors to visualize PM2.5 vs PM10 relationships
  • Adds proper labeling including title, axis labels, legend, and grid for enhanced readability
  • Uses figure sizing to create an appropriately proportioned visualization for clear data presentation
  • Displays the resulting scatter plot showing pollution level correlations for both countries

Problem 4: Pie Chart Of Top Countries By Monitoring Sites

python
latest = air[air["year"] == air["year"].max()]
top_sites = latest.nlargest(5, "monitoring_sites").set_index("country")["monitoring_sites"]

plt.figure(figsize=(6, 6))
plt.pie(top_sites, labels=top_sites.index, autopct="%0.1f%%")
plt.title("Top Countries By Monitoring Sites")
plt.show()

Explanation

  • Filters the air quality dataset to select only the most recent year's data using the maximum year value
  • Identifies the top 5 countries with the highest number of monitoring sites and prepares them for visualization
  • Generates a pie chart showing the percentage distribution of monitoring sites across these top countries
  • Sets appropriate chart formatting including figure size, labels, percentage display, and title
  • Displays the final pie chart visualization with country names as labels and their respective percentages

Problem 5: Bar Chart Of Top Countries By Monitoring Sites

python
plt.figure(figsize=(8, 4))
plt.bar(top_sites.index, top_sites.values)
plt.title("Top Countries By Monitoring Sites")
plt.xlabel("Country")
plt.ylabel("Monitoring Sites")
plt.xticks(rotation=20, ha="right")
plt.tight_layout()
plt.show()

Explanation

  • Creates a matplotlib figure with specified dimensions (8 inches wide by 4 inches tall) for optimal display
  • Generates a vertical bar chart using the index values (country names) as x-axis positions and their corresponding values (monitoring site counts) as bar heights
  • Adds descriptive labels including title, x-axis label ("Country"), and y-axis label ("Monitoring Sites") to provide context
  • Rotates x-axis tick labels by 20 degrees and aligns them to the right to prevent overlapping text when country names are long
  • Applies tight layout to automatically adjust spacing and prevent label cutoff before displaying the final visualization

Problem 6: Month-On-Month Profit Line Plot

python
plt.figure(figsize=(9, 4))
plt.plot(
    sales["month_number"],
    sales["total_profit"],
    marker="o",
    linestyle=":",
    color="blue",
    label="Total Profit",
)
plt.title("Month-On-Month Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Explanation

  • Creates a line plot showing profit trends across months using the sales dataset
  • Configures plot appearance with circular markers, dotted line style, and blue color scheme
  • Adds descriptive labels for title, x-axis (Month), and y-axis (Profit) with legend support
  • Implements grid overlay with transparency for improved readability of data points
  • Displays the chart at 9x4 inches; plt.legend() without arguments picks the legend location automatically

Problem 7: Product Share Pie Chart For December

python
december = sales[sales["month_name"] == "Dec"].iloc[0]
product_cols = ["smart_speaker", "fitness_band", "wireless_charger", "tablet_stand", "noise_canceling_buds"]
dec_values = december[product_cols]

plt.figure(figsize=(6, 6))
plt.pie(
    dec_values,
    labels=product_cols,
    autopct="%0.1f%%",
    explode=[0, 0, 0.08, 0, 0],
    shadow=True,
)
plt.title("December Product Unit Share")
plt.show()

Explanation

  • Filters the sales dataframe to isolate records from December and extracts the first row of data
  • Selects specific product columns representing different item categories for analysis
  • Generates a pie chart visualization showing the proportional distribution of units sold across product categories
  • Applies visual formatting including percentage labels, slight separation for one category, and shadow effects
  • Displays the chart with a descriptive title indicating the time period and data focus

Problem 8: Multi-Line Plot Of Product Sales

python
plt.figure(figsize=(10, 5))
for product in product_cols:
    plt.plot(sales["month_number"], sales[product], marker="o", label=product)

plt.title("Monthly Sales For All Products")
plt.xlabel("Month")
plt.ylabel("Units Sold")
plt.legend()
plt.grid(True, alpha=0.25)
plt.show()

Explanation

  • The code generates a line chart comparing sales performance across different products over time using matplotlib
  • It iterates through product columns and plots each product's monthly sales data with circular markers for better visibility
  • The chart includes proper labeling with title, x-axis ("Month"), and y-axis ("Units Sold") for clear data interpretation
  • A legend displays product names and a subtle grid helps with reading values from the chart
  • The figure is sized appropriately at 10x5 inches for optimal viewing of the sales comparison visualization

Problem 9: Quarter-Wise Grouped Bar Chart

python
quarter_product = sales.groupby("quarter")[product_cols].sum()
x = np.arange(len(product_cols))
width = 0.2

plt.figure(figsize=(11, 5))
for offset, quarter in enumerate(quarter_product.index):
    plt.bar(x + (offset - 1.5) * width, quarter_product.loc[quarter], width=width, label=quarter)

plt.xticks(x, product_cols, rotation=25, ha="right")
plt.title("Quarter-Wise Sales By Product")
plt.xlabel("Product")
plt.ylabel("Units")
plt.legend()
plt.tight_layout()
plt.show()

Explanation

  • Groups sales data by quarter and calculates total units sold for each product category using pandas groupby and sum operations
  • Sets up bar chart positioning using numpy to create evenly spaced bars with calculated offsets for each quarter's data
  • Plots multiple bar series on the same chart with different colors representing each quarter's sales performance
  • Configures axis labels, title, legend, and formatting including rotated x-axis labels for better readability
  • Uses matplotlib's tight_layout to ensure proper spacing and display of all chart elements
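
The 1.5 in the offset formula only centers the bars when there are exactly four groups. A more general version, sketched here with a hypothetical helper named group_offsets, centers any number of groups on each tick.

```python
import numpy as np

def group_offsets(n_groups, width):
    """Return x-offsets that center n_groups bars of the given width on each tick."""
    return (np.arange(n_groups) - (n_groups - 1) / 2) * width

print(group_offsets(4, 0.2))   # four quarters
print(group_offsets(3, 0.25))  # three groups
```

Inside the plotting loop you would then write plt.bar(x + offsets[i], ...) with offsets = group_offsets(len(quarter_product.index), width).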

Problem 10: Quarter-Wise Stacked Bar Chart

python
quarter_product = sales.groupby("quarter")[product_cols].sum()
bottom = np.zeros(len(quarter_product))

plt.figure(figsize=(8, 5))
for product in product_cols:
    plt.bar(quarter_product.index, quarter_product[product], bottom=bottom, label=product)
    bottom += quarter_product[product].values

plt.title("Quarter-Wise Stacked Sales")
plt.xlabel("Quarter")
plt.ylabel("Units")
plt.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()

Explanation

  • Groups sales data by quarter and calculates total units sold for each product category using groupby and sum operations
  • Initializes a zero array to track cumulative heights for stacking bar segments in the visualization
  • Iterates through each product column to create stacked bars where each product's contribution is added on top of previous products using the bottom parameter
  • Adds proper chart formatting including title, axis labels, legend positioning, and layout optimization for better readability
  • Displays the final stacked bar chart showing how different products contribute to total quarterly sales

17. Mini Project: Build A Simple Visualization Report

Create a script named matplotlib_report.py.

Your report should produce four saved charts:

  • profit_line.png
  • pm25_pm10_scatter.png
  • quarter_product_grouped_bar.png
  • pm10_histogram.png

Starter structure:

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

air = pd.read_csv("matplotlib_air_quality_trends.csv")
sales = pd.read_csv("matplotlib_product_sales.csv")

plt.figure(figsize=(9, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("profit_line.png", dpi=150, bbox_inches="tight")
plt.close()

plt.figure(figsize=(7, 5))
plt.scatter(air["pm25"], air["pm10"], alpha=0.7)
plt.title("PM2.5 vs PM10")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.grid(True, alpha=0.25)
plt.tight_layout()
plt.savefig("pm25_pm10_scatter.png", dpi=150, bbox_inches="tight")
plt.close()

Explanation

  • Loads two CSV datasets containing product sales and air quality measurements for visualization analysis
  • Generates a line plot showing monthly profit trends with markers and grid lines, then saves it as a PNG file
  • Creates a scatter plot comparing PM2.5 and PM10 air pollutant levels with transparency and grid formatting
  • Uses consistent figure sizing and tight layout optimization for professional-looking chart outputs
  • Saves both visualizations with high DPI resolution and tight bounding boxes for clear image quality

plt.close() closes the current figure after saving it. This is useful in scripts that create many charts.
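
Since every chart in the report ends with the same save-and-close steps, you could wrap them in a small function. save_and_close is a hypothetical helper name, and the temporary output folder is an assumption for the demo.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

def save_and_close(fig, path, dpi=150):
    """Hypothetical helper: save a figure to disk, then free its memory."""
    fig.tight_layout()
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)

out_dir = Path(tempfile.mkdtemp())  # temporary folder for the demo

for i in range(3):
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot([1, 2, 3], [i, i + 1, i + 3], marker="o")  # made-up values
    ax.set_title(f"Chart {i + 1}")
    save_and_close(fig, out_dir / f"chart_{i + 1}.png")

# Lists the figures that are still open after the loop.
print(plt.get_fignums())
```

If no other figures were open beforehand, the printed list is empty, which confirms that each figure was released after saving.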

18. Chart Selection Cheat Sheet

Chart | Best For | Avoid When
Line plot | trend over ordered x-axis | x-axis has unordered categories
Scatter plot | relationship between two numeric columns | one column is categorical
Bar chart | category comparison | too many categories
Histogram | distribution of one numeric column | values are categories
Pie chart | simple part-to-whole share | many categories or precise comparison
Grouped bar | comparing categories across groups | too many groups
Stacked bar | total plus contribution | exact segment comparison is important

19. Interview-Style Questions

What is the difference between plot and scatter?

plot is mainly used for lines, though it can draw markers. scatter is designed for point clouds and supports point size and color mapping more naturally.
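
A short sketch of that difference, using made-up values: plot draws the whole series in one style, while scatter accepts a size (s) and a color value (c) for each point.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up air-quality style data.
pm25 = np.array([18, 25, 33, 41, 52, 60])
pm10 = np.array([35, 48, 61, 70, 95, 110])
sites = np.array([12, 30, 18, 45, 25, 60])  # hypothetical monitoring-site counts

fig, ax = plt.subplots(figsize=(7, 5))

# plot(): one style for the whole series.
ax.plot(pm25, pm10, linestyle="--", color="gray", linewidth=1)

# scatter(): size (s) and color (c) can vary per point.
points = ax.scatter(pm25, pm10, s=sites * 5, c=pm10, cmap="viridis")
fig.colorbar(points, ax=ax, label="PM10")

ax.set_xlabel("PM2.5")
ax.set_ylabel("PM10")
plt.close(fig)  # replace with plt.show() to view the chart
```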

Why do we use legend?

Use legend when a chart has multiple lines, groups, or categories and the viewer needs to identify them.

Why should every chart have labels?

Without labels, the viewer may not know what the x-axis, y-axis, or units represent.

What does figsize do?

figsize controls the width and height of the figure in inches.

What is the purpose of bins in a histogram?

Bins divide continuous numeric values into intervals. The histogram counts how many values fall into each interval.
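
You can see the counting directly because hist returns the counts and the bin edges. A small sketch with made-up readings and explicit edges:

```python
import matplotlib.pyplot as plt

values = [12, 15, 18, 22, 25, 31, 35, 38, 41, 47]  # made-up readings
edges = [10, 20, 30, 40, 50]                        # four bins of width 10

fig, ax = plt.subplots(figsize=(6, 3))
counts, bin_edges, _ = ax.hist(values, bins=edges, edgecolor="black")
plt.close(fig)  # replace with plt.show() to view the chart

print(counts)  # three values in 10-20, two in 20-30, three in 30-40, two in 40-50
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.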

When should you avoid pie charts?

Avoid pie charts when there are many categories, very similar values, or when precise comparison matters.

Why use tight_layout?

It reduces label and title clipping by adjusting spacing around subplots and axes.

What is the difference between savefig and show?

savefig writes the chart to a file. show displays the chart on screen or in a notebook.

20. Final Practice Checklist

Before moving to advanced visualization, make sure you can:

  • load CSV data with Pandas
  • choose the right chart for the question
  • draw line, scatter, bar, histogram, and pie charts
  • style lines with colors, markers, and line styles
  • add title, x-label, y-label, legend, and grid
  • control axis limits
  • rotate category labels
  • make grouped and stacked bars
  • use figsize, dpi, and tight_layout
  • save charts as PNG files
  • explain why a chart is useful for the question being asked

Matplotlib becomes much easier when you stop memorizing isolated commands and start thinking in questions:

  • What am I comparing?
  • What changes over time?
  • What is the distribution?
  • What is the relationship?
  • What part contributes to the whole?

Choose the chart that answers the question with the least confusion.



Frequently Asked Questions

What types of plots can you create with Matplotlib according to the guide?
You can create line plots, scatter plots, bar charts, histograms, and pie charts using Matplotlib.

What are examples of numerical data mentioned in the guide?
Examples of numerical data include sales revenue, profit, temperature, PM2.5 pollution level, study minutes, exam score, product units sold, and customer age.

What are some examples of categorical data provided in the guide?
Examples of categorical data include country, product name, quarter, plan type, city, department, course category, and payment method.

What is the purpose of using histograms according to the guide?
Histograms are used for frequency and probability-style distributions.

What files are used in the guide for practice, and where should they be placed?
The files used are matplotlib_air_quality_trends.csv and matplotlib_product_sales.csv. Place them in the same folder as your notebook or script, or inside a data/ folder with updated paths.
