# Mastering Matplotlib: A Comprehensive Guide to Data Visualization

- URL: https://madhudadi.in/blog/posts/matplotlib-guide-line-scatter-bar-and-more-charts
- Published: 2026-06-09
- Tags: python, Data Visualization, Matplotlib
- Read time: 45 min
- Difficulty: beginner

> Learn Matplotlib from scratch with original examples: line plots, scatter plots, bar charts, grouped and stacked bars, histograms, pie charts, styles, subplots, annotations, and saving figures.

# Matplotlib Plotting: Line, Scatter, Bar, Histogram, Pie, Styles, and Saving Figures

Matplotlib is one of the most important Python libraries for data visualization. You use it when you want to turn numbers into pictures:

- how a metric changes over time
- whether two numeric columns move together
- which category contributes the most
- how values are distributed
- whether an outlier is hiding inside the data
- how multiple products, countries, learners, or experiments compare

This guide teaches Matplotlib with original examples and small synthetic CSV files. It does not reuse copied sports, course, or public datasets. The examples use fresh data about air-quality monitoring and product sales so you can practice the same plotting ideas safely.

## Files Used In This Guide

Place these files in the same folder as your notebook or script:

- `matplotlib_air_quality_trends.csv`
- `matplotlib_product_sales.csv`

You can also place them inside a `data/` folder.
If you do that, update the paths:

```python
air = pd.read_csv("data/matplotlib_air_quality_trends.csv")
sales = pd.read_csv("data/matplotlib_product_sales.csv")
```

## What You Will Learn

By the end, you should be able to:

- explain when to use a line plot, scatter plot, bar chart, histogram, and pie chart
- identify numerical and categorical data
- create simple plots with `plt.plot`, `plt.scatter`, `plt.bar`, `plt.hist`, and `plt.pie`
- add labels, titles, legends, grid lines, and axis limits
- change colors, markers, line styles, line widths, and marker sizes
- compare multiple series in one chart
- create vertical, horizontal, grouped, and stacked bar charts
- use histograms for frequency and probability-style distributions
- resize figures with `figsize`
- use built-in Matplotlib styles
- create subplots for side-by-side comparison
- save charts with `savefig`
- solve practical plotting exercises from raw CSV data

## 1. Types Of Data

Before choosing a chart, understand the columns. Most beginner plotting problems use two broad types of data.

**Numerical data**

Numerical data is made of numbers where mathematical operations make sense.

Examples:

- sales revenue
- profit
- temperature
- PM2.5 pollution level
- study minutes
- exam score
- product units sold
- customer age

**Categorical data**

Categorical data describes groups or labels.

Examples:

- country
- product name
- quarter
- plan type
- city
- department
- course category
- payment method

Chart choice depends on the relationship you want to inspect.

| Question | Data Pattern | Good Chart |
|---|---:|---|
| How does profit change month by month? | numerical over time | line plot |
| Do PM2.5 and PM10 rise together? | numerical vs numerical | scatter plot |
| Which product sold the most? | categorical vs numerical | bar chart |
| How are PM10 values distributed? | one numerical column | histogram |
| What share does each category contribute? | category contribution | pie chart, used carefully |

## 2. Import Matplotlib

Most examples use this standard setup:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

`pyplot` is commonly imported as `plt`.

You will usually follow this pattern:

```python
plt.figure(figsize=(8, 4))
plt.plot([1, 2, 3], [10, 20, 15])
plt.title("Simple Line Plot")
plt.xlabel("Step")
plt.ylabel("Value")
plt.show()
```

In notebooks, `plt.show()` is not always required, but using it is a good habit because it makes your intent clear.

## 3. The Basic Mental Model

Matplotlib has two common styles:

- the beginner-friendly `plt` style
- the object-oriented `fig, ax` style

The `plt` style is fast for learning:

```python
plt.plot(x, y)
plt.title("My Chart")
plt.show()
```

The object-oriented style is better for serious projects:

```python
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y)
ax.set_title("My Chart")
plt.show()
```

In this guide, we start with `plt` because it is simple. Then we use `subplots` when multiple charts need to sit together.

## 4. Load The Practice Data

```python
import pandas as pd

air = pd.read_csv("matplotlib_air_quality_trends.csv")
sales = pd.read_csv("matplotlib_product_sales.csv")

print(air.head())
print(sales.head())
```

Check the shape:

```python
print(air.shape)
print(sales.shape)
```

What to expect:

- `air` has country-year pollution metrics
- `sales` has monthly sales and profit metrics for five products

## 5. 2D Line Plot

A line plot is useful when the x-axis has an ordered sequence.
Common use cases:

- time series
- monthly profit
- yearly pollution level
- daily website visits
- model accuracy over training epochs
- cumulative learning progress

Line plots work well for:

- numerical vs numerical
- ordered categorical labels vs numerical

### 5.1 Simple Line Plot

```python
x = [1, 2, 3, 4, 5]
y = [3, 7, 5, 9, 12]

plt.plot(x, y)
plt.show()
```

**Explanation**

- `x` and `y` are lists of equal length; each pair `(x[i], y[i])` is one point
- `plt.plot()` connects the points in order with straight line segments
- `plt.show()` displays the figure
- The chart shows how the values in `y` change as `x` increases

### 5.2 Line Plot From CSV Data

Plot India PM2.5 trend over years:

```python
india = air[air["country"] == "India"]

plt.figure(figsize=(8, 4))
plt.plot(india["year"], india["pm25"])
plt.title("India PM2.5 Trend")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.show()
```

**Explanation**

- Filters the air-quality data to the rows where the country is India, using boolean indexing
- Creates a figure that is 8 inches wide and 4 inches tall
- Plots PM2.5 against year, which gives a time-series trend
- Adds a title and axis labels so the chart is readable on its own
- Displays the plot

### 5.3 Plot Multiple Lines

Compare India and Brazil:

```python
india = air[air["country"] == "India"]
brazil = air[air["country"] == "Brazil"]

plt.figure(figsize=(8, 4))
plt.plot(india["year"], india["pm25"], label="India")
plt.plot(brazil["year"], brazil["pm25"], label="Brazil")
plt.title("PM2.5 Trend: India vs Brazil")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.show()
```

**Explanation**

- Filters the air-quality data into one DataFrame for India and one for Brazil
- Calls `plt.plot` twice, so both PM2.5 lines are drawn on the same axes
- Adds a title, axis labels, and a legend so the two lines can be told apart
- Displays the combined chart

`label` names each line. `legend()` displays those names.

### 5.4 Colors, Line Styles, And Line Width

```python
plt.figure(figsize=(8, 4))
plt.plot(
    india["year"],
    india["pm25"],
    color="#f59e0b",
    linestyle="--",
    linewidth=2.5,
    label="India",
)
plt.plot(
    brazil["year"],
    brazil["pm25"],
    color="#2563eb",
    linestyle="-.",
    linewidth=2.5,
    label="Brazil",
)
plt.title("Styled PM2.5 Lines")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.show()
```

**Explanation**

- Creates an 8x4 inch figure
- Draws the India series as an orange dashed line and the Brazil series as a blue dash-dot line
- `color` accepts hex codes as well as named colors; `linewidth` is measured in points
- Adds a title, axis labels, and a legend to identify both series
- Displays the styled chart

Useful `linestyle` values:

```python
"-"   # solid
"--"  # dashed
"-."  # dash-dot
":"   # dotted
```

**Explanation**

- These four strings are the short codes Matplotlib accepts for `linestyle`
- `"-"` draws a solid line and `"--"` draws a dashed line
- `"-."` alternates dashes and dots, while `":"` draws a dotted line
- Different line styles help distinguish several series in one chart, especially when the chart is printed without color

### 5.5 Markers

Markers show each actual data point.

```python
plt.figure(figsize=(8, 4))
plt.plot(
    india["year"],
    india["pm25"],
    marker="o",
    markersize=7,
    linewidth=2,
    label="India",
)
plt.title("PM2.5 With Markers")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.show()
```

**Explanation**

- Creates a figure 8 inches wide and 4 inches tall
- Plots PM2.5 against year for the `india` DataFrame
- `marker="o"` draws a circle at every data point, `markersize=7` sets its size, and `linewidth=2` sets the thickness of the connecting line
- Adds a title, axis labels, and a legend, then displays the plot

Useful marker values:

```python
"o"  # circle
"s"  # square
"D"  # diamond
"^"  # triangle up
"+"  # plus
"x"  # x marker
```

**Explanation**

- These are Matplotlib marker codes; they work in both `plt.plot` and `plt.scatter`
- `"o"` is a circle, `"s"` a square, `"D"` a diamond, and `"^"` an upward triangle
- `"+"` and `"x"` draw a plus sign and a cross
- Different markers help distinguish several series or categories in the same plot
- Markers can be combined with color and size settings

### 5.6 Axis Limits

Axis limits help when one outlier stretches the plot too much.
```python
months = [1, 2, 3, 4, 5, 6, 7]
price = [48000, 54000, 57000, 49000, 47000, 45000, 4500000]

plt.plot(months, price, marker="o")
plt.title("Price With One Extreme Outlier")
plt.show()
```

**Explanation**

- Two lists hold the months (1 to 7) and the matching prices; the last price (4,500,000) is far larger than the rest
- The line plot marks each data point with a circle
- Because the y-axis has to stretch to 4,500,000, the first six points are squeezed into a nearly flat line at the bottom and their trend is hard to read
- The title flags the outlier so readers know why the chart looks this way

The final value dominates the chart.

Limit the y-axis:

```python
plt.plot(months, price, marker="o")
plt.ylim(40000, 65000)
plt.title("Price Trend With Y-Axis Limited")
plt.show()
```

**Explanation**

- Plots the same monthly prices with circular markers
- `plt.ylim(40000, 65000)` restricts the y-axis to the range where most values sit, so the month-to-month changes become visible
- The outlier is still in the data; it is simply drawn outside the visible area
- The title tells the reader that the axis has been limited

You can also limit both axes:

```python
plt.plot(months, price, marker="o")
plt.xlim(1, 6)
plt.ylim(40000, 65000)
plt.title("Focused Price View")
plt.show()
```

**Explanation**

- Plots the price series with circular markers
- `plt.xlim(1, 6)` shows only months 1 to 6 and `plt.ylim(40000, 65000)` shows only prices from 40,000 to 65,000
- Adds the title "Focused Price View" and displays the plot

Use limits carefully.
They are useful for focus, but they can also hide important values.

### 5.7 Grid Lines

```python
plt.figure(figsize=(8, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Total Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.3)
plt.show()
```

**Explanation**

- Creates a figure 8 inches wide and 4 inches tall
- Plots monthly profit as a line with a circular marker at each data point
- Adds a title and axis labels so the chart is self-explanatory
- `plt.grid(True, alpha=0.3)` turns on a faint grid that makes values easier to read off both axes
- Displays the plot

`alpha` controls transparency. Lower values make grid lines softer.

## 6. Scatter Plot

A scatter plot is used for numerical vs numerical analysis.
Common use cases:

- correlation
- clusters
- outliers
- relationship between two measurements
- impact of one metric on another

### 6.1 Simple Scatter Plot

```python
x = [5, 7, 8, 9, 11, 13]
y = [45, 52, 50, 61, 67, 72]

plt.scatter(x, y)
plt.title("Simple Scatter Plot")
plt.xlabel("Input")
plt.ylabel("Output")
plt.show()
```

**Explanation**

- `x` and `y` are lists of paired input and output values
- `plt.scatter` draws each pair `(x[i], y[i])` as a separate dot, with no connecting line
- A title and axis labels make the chart informative
- `plt.show()` displays the scatter plot

### 6.2 PM2.5 vs PM10

```python
plt.figure(figsize=(7, 5))
plt.scatter(air["pm25"], air["pm10"])
plt.title("PM2.5 vs PM10")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.grid(True, alpha=0.25)
plt.show()
```

**Explanation**

- Draws a scatter plot of two air-quality metrics: PM2.5 (fine particulate matter) on the x-axis and PM10 (coarse particulate matter) on the y-axis
- Sets the figure size to 7x5 inches
- Adds a title and axis labels that name the two variables
- Enables a faint grid (`alpha=0.25`) to make point positions easier to read
- Displays the plot so you can inspect how the two measurements relate

If the points rise from left to right, the two values likely move together.
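To put a number on "move together", one option is the Pearson correlation coefficient. The sketch below is a minimal illustration using `np.corrcoef`; the two arrays are hypothetical readings invented for this example, not values from the practice CSV.

```python
import numpy as np

# Hypothetical paired readings, invented for illustration only.
pm25 = np.array([12.0, 18.0, 25.0, 31.0, 40.0, 52.0])
pm10 = np.array([30.0, 41.0, 55.0, 68.0, 85.0, 110.0])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two arrays.
r = np.corrcoef(pm25, pm10)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the points rise together almost along a straight line, a value near -1 means one falls as the other rises, and a value near 0 means there is no clear linear pattern. With the real data you could pass `air["pm25"]` and `air["pm10"]` instead, assuming neither column has missing values.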
### 6.3 Scatter Plot With Color And Marker

```python
india = air[air["country"] == "India"]
germany = air[air["country"] == "Germany"]

plt.figure(figsize=(7, 5))
plt.scatter(india["pm25"], india["pm10"], color="orange", marker="o", label="India")
plt.scatter(germany["pm25"], germany["pm10"], color="green", marker="^", label="Germany")
plt.title("PM2.5 vs PM10 By Country")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.legend()
plt.grid(True, alpha=0.25)
plt.show()
```

**Explanation**

- Filters the air-quality data into separate DataFrames for India and Germany
- Draws two scatter series on the same axes, using a different color and marker for each country
- Adds a title, axis labels, a legend, and a faint grid
- Displays the plot so the PM2.5 and PM10 levels of the two countries can be compared

### 6.4 Bubble Scatter Plot

You can use marker size to show a third variable.
```python
plt.figure(figsize=(8, 5))
plt.scatter(
    air["pm25"],
    air["pm10"],
    s=air["monitoring_sites"] * 8,
    alpha=0.65,
)
plt.title("PM2.5 vs PM10, Sized By Monitoring Sites")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.grid(True, alpha=0.25)
plt.show()
```

**Explanation**

- Draws a scatter plot of PM2.5 against PM10
- Uses bubble size to show the number of monitoring sites; the count is multiplied by 8 so the bubbles are large enough to see
- `alpha=0.65` makes the bubbles partly transparent, so overlapping points stay visible
- Adds a title, axis labels, and a faint grid, then displays the plot

`s` controls marker size. `alpha` is especially useful when points overlap.

### 6.5 Scatter-Like Plot With `plt.plot`

`plt.plot` can also draw points only:

```python
plt.plot(air["pm25"], air["pm10"], "o")
plt.title("Scatter-Like Plot Using plt.plot")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.show()
```

**Explanation**

- Plots PM2.5 on the x-axis against PM10 on the y-axis
- The format string `"o"` asks for circle markers only, so no connecting line is drawn and the result looks like a scatter plot
- Adds a title and axis labels, then displays the plot
- Unlike `plt.scatter`, this form uses one size and one color for all points

For normal scatter plots, prefer `plt.scatter`.

## 7. Bar Chart

A bar chart compares categories.
Use it for:

- product revenue
- country counts
- sales by quarter
- learners by plan
- tickets by priority
- average score by course

### 7.1 Simple Bar Chart

```python
products = ["Speaker", "Band", "Charger", "Stand", "Buds"]
units = [318, 315, 402, 268, 301]

plt.bar(products, units)
plt.title("December Units Sold")
plt.xlabel("Product")
plt.ylabel("Units")
plt.show()
```

**Explanation**

- Creates two lists: product names and their December unit sales
- `plt.bar` draws a vertical bar chart with one bar per product
- Adds a title, an x-axis label for products, and a y-axis label for units
- The bar heights make it easy to see which products sold the most and the least

### 7.2 Bar Chart From Sales CSV

```python
december = sales[sales["month_name"] == "Dec"].iloc[0]
products = ["smart_speaker", "fitness_band", "wireless_charger", "tablet_stand", "noise_canceling_buds"]
values = december[products]

plt.figure(figsize=(10, 4))
plt.bar(products, values)
plt.title("December Units Sold By Product")
plt.xlabel("Product")
plt.ylabel("Units")
plt.xticks(rotation=25, ha="right")
plt.show()
```

**Explanation**

- Filters the sales data to the December rows and takes the first one with `.iloc[0]`
- Selects the five product columns from that row to get the units sold per product
- Draws a vertical bar chart of units sold for each product
- `plt.xticks(rotation=25, ha="right")` tilts the x-axis labels and right-aligns them so long names do not overlap
- Displays the chart

`rotation` helps when category names are long.
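Two small refinements often make a bar chart like this easier to read: sorting the bars by value and printing each value above its bar. The sketch below reuses the product names and unit counts from the simple bar chart in section 7.1. It assumes Matplotlib 3.4 or newer, because that is when `Axes.bar_label` was added, and it uses the `fig, ax` style introduced in section 3.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Units from the simple bar chart example in section 7.1.
units = pd.Series(
    [318, 315, 402, 268, 301],
    index=["Speaker", "Band", "Charger", "Stand", "Buds"],
)

# Sort so the tallest bar comes first.
sorted_units = units.sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(sorted_units.index, sorted_units.values)

# bar_label writes each bar's height just above the bar.
labels = ax.bar_label(bars, padding=3)

ax.set_title("December Units Sold, Sorted")
ax.set_xlabel("Product")
ax.set_ylabel("Units")
fig.tight_layout()
plt.show()
```

Sorting is a presentation choice: it helps when the question is "which is biggest", but keep the original order when the categories have a natural sequence such as months or quarters.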
### 7.3 Horizontal Bar Chart

```python
plt.figure(figsize=(8, 4))
plt.barh(products, values)
plt.title("December Units Sold By Product")
plt.xlabel("Units")
plt.ylabel("Product")
plt.show()
```

**Explanation**

- Creates an 8x4 inch figure
- `plt.barh` draws horizontal bars, with product names on the y-axis and units on the x-axis
- The axis labels are swapped to match: "Units" on the x-axis and "Product" on the y-axis
- Adds the title "December Units Sold By Product" and displays the chart

Horizontal bars are easier to read when labels are long.

### 7.4 Bar Width

```python
plt.bar(products, values, width=0.5)
plt.title("Bar Width Example")
plt.xticks(rotation=25, ha="right")
plt.show()
```

**Explanation**

- Draws a vertical bar chart with products on the x-axis and values on the y-axis
- `width=0.5` makes the bars narrower than the default width of 0.8
- Rotates the x-axis labels by 25 degrees and right-aligns them so long names stay readable
- Adds the title "Bar Width Example" and displays the chart

Very wide bars can look crowded. Very narrow bars can make values harder to compare.

### 7.5 Grouped Bar Chart

Grouped bars compare categories across multiple groups.

Example: compare Q1, Q2, Q3, and Q4 product totals.
```python
product_cols = ["smart_speaker", "fitness_band", "wireless_charger", "tablet_stand", "noise_canceling_buds"]
quarter_product = sales.groupby("quarter")[product_cols].sum()

x = np.arange(len(product_cols))
width = 0.2

plt.figure(figsize=(11, 5))
for offset, quarter in enumerate(quarter_product.index):
    plt.bar(
        x + (offset - 1.5) * width,
        quarter_product.loc[quarter],
        width=width,
        label=quarter,
    )

plt.xticks(x, product_cols, rotation=25, ha="right")
plt.title("Quarter-Wise Product Sales")
plt.xlabel("Product")
plt.ylabel("Units")
plt.legend()
plt.tight_layout()
plt.show()
```

**Explanation**

- Groups the sales data by quarter and sums the units sold for each product
- `np.arange` gives each product a numeric x position, and `width` sets how wide each bar is
- The loop draws one set of bars per quarter, shifted sideways so the four quarters sit next to each other
- Adds rotated x-axis labels, a title, axis labels, and a legend
- `plt.tight_layout()` adjusts the spacing before the chart is displayed

Important idea:

- `np.arange` creates numeric positions
- each quarter is shifted slightly left or right
- the `1.5` in `(offset - 1.5)` centers four bars around each position, so this code assumes exactly four quarters
- `xticks` puts product names back on the x-axis

### 7.6 Stacked Bar Chart

Stacked bars show how parts combine into a total.
```python
quarter_product = sales.groupby("quarter")[product_cols].sum()
bottom = np.zeros(len(quarter_product))

plt.figure(figsize=(8, 5))
for product in product_cols:
    plt.bar(
        quarter_product.index,
        quarter_product[product],
        bottom=bottom,
        label=product,
    )
    bottom += quarter_product[product].values

plt.title("Stacked Product Sales By Quarter")
plt.xlabel("Quarter")
plt.ylabel("Units")
plt.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()
```

**Explanation**

- Groups the sales data by quarter and sums the units sold for each product
- `bottom` starts as an array of zeros and tracks the current height of each stack
- The loop draws one product at a time; each segment starts at `bottom`, and `bottom` is then raised by that product's values
- `bbox_to_anchor=(1.02, 1)` with `loc="upper left"` places the legend just outside the right edge of the plot
- The finished chart shows how each product contributes to the quarterly total

Use stacked bars when the total and contribution both matter.

Avoid stacked bars when there are too many categories.

## 8. Histogram

A histogram shows the distribution of a numerical column.
Use it for:

- frequency count
- shape of values
- spread
- skew
- outliers
- comparing before and after cleaning

### 8.1 Simple Histogram

```python
plt.hist(air["pm10"])
plt.title("PM10 Distribution")
plt.xlabel("PM10")
plt.ylabel("Frequency")
plt.show()
```

**Explanation**

- Draws a histogram of the `pm10` column from the `air` DataFrame
- `plt.hist` splits the value range into bins (10 by default) and counts how many values fall into each bin
- Adds the title "PM10 Distribution" and labels both axes
- The shape of the bars helps you spot a central peak, skew, or outliers
- `plt.show()` displays the histogram

### 8.2 Change Bins

```python
plt.hist(air["pm10"], bins=8)
plt.title("PM10 Distribution With 8 Bins")
plt.xlabel("PM10")
plt.ylabel("Frequency")
plt.show()
```

**Explanation**

- Draws a histogram of the PM10 data from the `air` DataFrame
- `bins=8` splits the value range into 8 equal-width bins
- The title "PM10 Distribution With 8 Bins" records the bin choice
- The x-axis shows the PM10 value and the y-axis shows how many values fall in each range
- `plt.show()` displays the histogram

More bins reveal more detail. Fewer bins give a smoother summary.

### 8.3 Custom Bin Edges

```python
bins = [0, 20, 40, 60, 80, 100, 120, 140]

plt.hist(air["pm10"], bins=bins, edgecolor="black")
plt.title("PM10 Distribution With Custom Bins")
plt.xlabel("PM10 Range")
plt.ylabel("Frequency")
plt.show()
```

**Explanation**

- Defines a list of bin edges to categorize PM10 values into specific ranges.
- Passes the edges to `plt.hist` to draw a histogram of the `pm10` column from the `air` DataFrame.
- Sets `edgecolor="black"` so each bar has a visible outline.
- Adds a title and axis labels to make the plot readable.
- Displays the histogram with `plt.show()`.

### 8.4 Probability-Style Histogram

Use `density=True` when you want the histogram to represent a probability density instead of raw counts.

```python
plt.hist(air["pm10"], bins=8, density=True, edgecolor="black")
plt.title("PM10 Density Histogram")
plt.xlabel("PM10")
plt.ylabel("Density")
plt.show()
```

**Explanation**

- Draws a histogram of the `pm10` values from the `air` DataFrame.
- `bins=8` divides the data into 8 equal-width intervals.
- `density=True` normalizes the histogram so that the total area of the bars equals 1; the bar heights are densities, not probabilities.
- `edgecolor="black"` outlines each bar.
- The title and axis labels are set before the plot is displayed with `plt.show()`.

### 8.5 Log Scale

Log scale helps when values have a long tail.

```python
values = np.array([12, 15, 17, 18, 22, 24, 28, 35, 42, 55, 70, 120, 300, 900])

plt.hist(values, bins=8, log=True, edgecolor="black")
plt.title("Histogram With Log Y-Axis")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```

**Explanation**

- Creates a NumPy array in which most values are small and a few are very large.
- Draws a histogram with 8 bins; `log=True` puts the frequency axis on a logarithmic scale, so bins with only one or two values stay visible next to tall bins.
- The histogram is styled with black edges for each bin to enhance clarity.
- Titles and axis labels are added to provide context for the data being represented.
- Finally, the histogram is displayed using `plt.show()`.

## 9. Pie Chart

A pie chart shows contribution to a whole.

Use it only when:

- there are few categories
- values add up to a meaningful total
- you want a quick share-of-total view

For exact comparisons, a bar chart is usually better.

### 9.1 Simple Pie Chart

```python
quarter_sales = sales.groupby("quarter")["total_units"].sum()

plt.pie(quarter_sales, labels=quarter_sales.index)
plt.title("Unit Sales Share By Quarter")
plt.show()
```

**Explanation**

- The code aggregates total unit sales by quarter using the `groupby` method on the `sales` DataFrame.
- It sums `total_units` for each quarter, which gives a Series called `quarter_sales`.
- `plt.pie` draws one slice per quarter and labels each slice with the quarter name.
- The chart is titled "Unit Sales Share By Quarter" to provide context.
- Finally, `plt.show()` is called to render the pie chart for display.
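Each slice of a pie chart is simply one value divided by the total. Computing those shares as numbers is a quick way to check a chart, or to decide whether a bar chart would communicate the differences more clearly. The sketch below uses hypothetical quarterly totals invented for illustration; they are not taken from the sales CSV.

```python
import pandas as pd

# Hypothetical quarterly unit totals, invented for illustration only.
quarter_units = pd.Series({"Q1": 900, "Q2": 1100, "Q3": 1000, "Q4": 1000})

# Each slice of a pie chart is value / total; here expressed as percentages.
share_pct = quarter_units / quarter_units.sum() * 100
print(share_pct.round(1))
```

With the real data, the same calculation could be applied to the `quarter_sales` Series from the example above. When the shares are this close together, the slices of a pie chart look almost identical, which is one reason a bar chart is often the safer choice.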
### 9.2 Percentages

```python
plt.pie(
    quarter_sales,
    labels=quarter_sales.index,
    autopct="%0.1f%%",
)
plt.title("Unit Sales Share By Quarter")
plt.show()
```

**Explanation**

- Draws a pie chart of the share of unit sales in each quarter
- `autopct="%0.1f%%"` prints each slice's percentage with one decimal place
- `labels=quarter_sales.index` labels each slice with its quarter
- Adds the title "Unit Sales Share By Quarter" and displays the chart with `plt.show()`

### 9.3 Colors, Explode, And Shadow

```python
plt.pie(
    quarter_sales,
    labels=quarter_sales.index,
    autopct="%0.1f%%",
    colors=["#60a5fa", "#34d399", "#f59e0b", "#f472b6"],
    explode=[0, 0, 0, 0.08],
    shadow=True,
)
plt.title("Quarter Sales Share")
plt.show()
```

**Explanation**

- Draws a pie chart of quarterly sales with custom styling
- `autopct` shows percentages with one decimal place and `labels` shows the quarter names
- `colors` assigns one color per slice, and `explode=[0, 0, 0, 0.08]` pulls the fourth slice slightly away from the center
- `shadow=True` adds a drop shadow under the pie
- The `colors` and `explode` lists each have four entries, one per quarter

Use `explode` sparingly. It draws attention to one slice.

## 10. Styles

Matplotlib has built-in styles.
Check available styles: ```python print(plt.style.available) ``` **Explanation** - This code snippet prints a list of all available matplotlib style options that can be applied to plots - The plt.style.available attribute contains a tuple of style names that can be used with plt.style.use() to change the appearance of matplotlib figures - Common styles include 'default', 'seaborn', 'ggplot', 'dark_background', and 'bmh' among others - This is useful for quickly exploring different visual themes without manually adjusting colors, fonts, and spacing - The output helps developers choose appropriate styling for their data visualizations based on presentation needs Use a style: ```python plt.style.use("seaborn-v0_8") plt.plot(sales["month_number"], sales["total_profit"], marker="o") plt.title("Profit With Seaborn Style") plt.xlabel("Month") plt.ylabel("Profit") plt.show() ``` **Explanation** - Applies the seaborn v0.8 styling theme to enhance plot appearance and consistency - Plots monthly profit data using circle markers to visualize trends over time - Sets appropriate axis labels and title to clearly communicate the visualization's purpose - Displays the final styled plot with improved visual formatting compared to default matplotlib styles Reset to default: ```python plt.style.use("default") ``` **Explanation** - Configures matplotlib to use the built-in default styling rather than any custom or alternative themes - Ensures consistent appearance of plots with standard colors, fonts, and layout settings - Provides a clean baseline for data visualization without additional styling overrides - Resets any previously applied style modifications to maintain predictable chart rendering - Establishes a professional look for matplotlib figures with proper spacing and visual hierarchy Good beginner styles: ```python "default" "ggplot" "seaborn-v0_8" "fivethirtyeight" "bmh" ``` **Explanation** - These strings represent predefined style sheets available in matplotlib and seaborn 
that instantly change the appearance of plots
- Each style sets colors, fonts, and layout defaults so charts look consistent
- "default" is standard Matplotlib styling, "ggplot" imitates R's ggplot2, and "seaborn-v0_8" is the seaborn 0.8 look bundled with Matplotlib (it is not seaborn's current default theme)
- "fivethirtyeight" mimics the charts of FiveThirtyEight's data journalism, while "bmh" follows the style of the book Bayesian Methods for Hackers
- Apply any of them with `plt.style.use()`. seaborn's `sns.set_style()` is a different function with its own set of style names

## 11. Figure Size, DPI, And Layout

Use `figsize` to control chart size.

```python
plt.figure(figsize=(10, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.show()
```

**Explanation**

- Creates a figure 10 inches wide and 4 inches tall
- Plots monthly profit as a line with a circular marker at each data point
- Sets the chart title to "Monthly Profit" and labels the x-axis "Month" and the y-axis "Profit"
- Displays the completed plot

Use `dpi` for sharper output:

```python
plt.figure(figsize=(10, 4), dpi=120)
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Sharper Monthly Profit Chart")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.show()
```

**Explanation**

- Creates a figure 10 inches wide by 4 inches tall at 120 DPI, which is above Matplotlib's default of 100 and so renders more pixels per inch
- Plots month number against total profit with a circular marker at each data point
- Adds the title "Sharper Monthly Profit Chart" and labels both axes appropriately with "Month" and "Profit" for clear
data interpretation
- Displays the finalized plot with all formatting applied

Use `tight_layout` when labels are getting cut:

```python
plt.figure(figsize=(10, 4))
plt.bar(products, values)
plt.xticks(rotation=25, ha="right")
plt.title("Product Sales")
plt.tight_layout()
plt.show()
```

**Explanation**

- Creates a figure 10 inches wide by 4 inches tall
- Draws a vertical bar chart with product names as x-axis categories and the corresponding values as bar heights
- Rotates x-axis labels by 25 degrees and right-aligns them to prevent overlapping text
- Sets the chart title to "Product Sales"
- Applies tight layout to adjust spacing automatically and prevent label cutoff before displaying the plot

## 12. Subplots

Subplots help you compare charts in one figure.

```python
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(sales["month_number"], sales["total_profit"], marker="o")
axes[0].set_title("Monthly Profit")
axes[0].set_xlabel("Month")
axes[0].set_ylabel("Profit")
axes[0].grid(True, alpha=0.25)

axes[1].hist(air["pm10"], bins=8, edgecolor="black")
axes[1].set_title("PM10 Distribution")
axes[1].set_xlabel("PM10")
axes[1].set_ylabel("Frequency")

plt.tight_layout()
plt.show()
```

**Explanation**

- Creates one figure with two subplots arranged side by side
- The first subplot is a line chart of monthly profit with circular markers and grid lines
- The second subplot is a histogram of PM10 levels with 8 bins and black bar edges
- Gives each chart its own title and axis labels
- Uses `tight_layout` to adjust spacing between the subplots, then displays the figure

`axes[0]` controls the first chart.
`axes[1]` controls the second chart.

## 13. Annotations

Annotations explain an important point on the chart.

```python
best_month = sales.loc[sales["total_profit"].idxmax()]

plt.figure(figsize=(10, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.annotate(
    "Best month",
    xy=(best_month["month_number"], best_month["total_profit"]),
    xytext=(best_month["month_number"] - 2, best_month["total_profit"] - 12000),
    arrowprops={"arrowstyle": "->"},
)
plt.title("Monthly Profit With Annotation")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.25)
plt.show()
```

**Explanation**

- Finds the month with the highest total profit: `idxmax()` returns the index of the maximum value in `total_profit`, and `.loc` selects that row from the sales DataFrame
- Draws a line plot of monthly profit with circular markers at each data point
- Adds an arrow pointing at the best month (`xy`), with the "Best month" text placed at an offset position (`xytext`) so it does not cover the data point
- Adds a title, axis labels, and light grid lines for readability
- Displays the chart with the peak month clearly highlighted

Use annotations for insight, not decoration.

## 14. Save Figure

Use `savefig` to export a chart.
```python
plt.figure(figsize=(10, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.25)
plt.tight_layout()
plt.savefig("monthly_profit.png", dpi=150, bbox_inches="tight")
plt.show()
```

**Explanation**

- Creates a figure 10 inches wide by 4 inches tall
- Plots monthly profit as a line with circular markers at each data point
- Adds a title, axis labels, and light grid lines
- Saves the plot as a PNG at 150 DPI; `bbox_inches="tight"` trims extra whitespace around the chart
- Calls `savefig` before `show`; in a script, saving after the plot window has been closed can produce a blank image
- Displays the plot for immediate viewing

Useful formats:

```python
"chart.png"
"chart.jpg"
"chart.svg"
"chart.pdf"
```

**Explanation**

- These are example output filenames, not code to run; pass one of them to `plt.savefig()`
- `savefig` picks the output format from the file extension
- PNG and JPG are raster images made of pixels
- SVG and PDF are vector graphics that stay sharp at any zoom level

For blog posts and dashboards, PNG is usually easiest. For reports and print, PDF or SVG can be useful.

## 15. Common Beginner Mistakes

### Mistake 1: Forgetting Labels

Bad chart:

```python
plt.plot(sales["month_number"], sales["total_profit"])
plt.show()
```

**Explanation**

- Draws a line graph of month number against total profit from the sales DataFrame
- `plt.show()` renders and displays the plot
- The line itself is correct, but the chart has no title and no axis labels
- A viewer cannot tell what the x-axis, the y-axis, or the units represent

Better chart:

```python
plt.plot(sales["month_number"], sales["total_profit"])
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.show()
```

**Explanation**

- Plots month number against total profit using `plt.plot`
- Sets the chart title to "Monthly Profit" and labels the x-axis "Month" and the y-axis "Profit"
- Displays a line graph that a viewer can read without extra context
- Makes profit trends, peaks, and dips across the year easy to identify

### Mistake 2: Using Pie Charts For Too Many Categories

If you have more than five or six categories, use a bar chart.

### Mistake 3: Not Rotating Long Labels

```python
plt.bar(products, values)
plt.xticks(rotation=25, ha="right")
plt.tight_layout()
plt.show()
```

**Explanation**

- Utilizes the `plt.bar` function from the Matplotlib library to generate a bar chart using `products` as the categories and `values` as their corresponding heights.
- The `plt.xticks` function adjusts the orientation of the x-axis labels by rotating them 25 degrees and aligning them to the right for better readability.
- `plt.tight_layout()` is called to automatically adjust subplot parameters for a neat fit within the figure area.
- Finally, `plt.show()` displays the generated bar chart to the user.

### Mistake 4: Comparing Raw Counts When Groups Have Different Sizes

Sometimes percentages are more useful than counts.

Ask:

- Are categories equally sized?
- Should I plot totals, averages, or rates?
- Is the chart answering the right question?

### Mistake 5: Hiding Outliers Without Saying So

Axis limits can help, but always mention when you use them.

## 16. Practice Problems

Use the two CSV files from this guide.

### Problem 1: Line Plot For Two Countries

Draw a line plot where:

- x-axis is `year`
- y-axis is `pm25`
- two lines compare India and Brazil
- chart includes title, labels, legend, and grid

```python
india = air[air["country"] == "India"]
brazil = air[air["country"] == "Brazil"]

plt.figure(figsize=(8, 4))
plt.plot(india["year"], india["pm25"], marker="o", label="India")
plt.plot(brazil["year"], brazil["pm25"], marker="o", label="Brazil")
plt.title("PM2.5 Trend: India vs Brazil")
plt.xlabel("Year")
plt.ylabel("PM2.5")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```

**Explanation**

- Filters the dataset `air` to create two separate DataFrames for India and Brazil based on the "country" column.
- Initializes a plot with a figure size of 8x4 inches.
- Plots the PM2.5 levels against the years for both countries, using markers for clarity and labeling each line accordingly.
- Sets the title and labels for the x-axis and y-axis to provide context for the graph.
- Displays a legend to differentiate between the two countries and adds a grid for better readability before showing the plot.

### Problem 2: Probability Histogram

Draw a density histogram of `pm10`. With `density=True`, bar heights are scaled so that the total area of the bars equals 1.
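The word "probability" in this problem title can mislead: with `density=True` the bar heights are densities, and it is the bar areas (height times bin width) that add up to 1. The sketch below illustrates this with `np.histogram`, which `plt.hist` uses for its binning. The `values` array is made up for illustration and does not come from the guide's CSV files.

```python
import numpy as np

# Made-up readings for illustration only (not taken from the air-quality CSV)
values = np.array([12, 15, 18, 22, 22, 25, 31, 40])

# density=True rescales the counts so that the bar areas sum to 1
density, edges = np.histogram(values, bins=4, density=True)
widths = np.diff(edges)  # width of each bin

# Total area under the bars
area = float(np.sum(density * widths))

# Height times width turns each density back into the share of values in that bin
probabilities = density * widths

print(area)
print(probabilities)
```

If you want the bar heights themselves to read as shares, one option is to plot `probabilities` with `plt.bar` instead of relying on `density=True`.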
```python
plt.figure(figsize=(7, 4))
plt.hist(air["pm10"], bins=8, density=True, edgecolor="black")
plt.title("PM10 Density Histogram")
plt.xlabel("PM10")
plt.ylabel("Density")
plt.grid(True, alpha=0.25)
plt.show()
```

**Explanation**

- Initializes a figure with a size of 7 inches by 4 inches.
- Creates a histogram of the "pm10" data from the "air" dataset, using 8 bins to represent the distribution.
- Sets the histogram to display density instead of frequency, allowing for a normalized view of the data.
- Adds a title, x-axis label, and y-axis label to provide context for the data being represented.
- Enables a grid with a low alpha value so the grid lines do not overwhelm the bars.

### Problem 3: Scatter Plot For Two Countries

Draw a scatter plot where:

- x-axis is `pm25`
- y-axis is `pm10`
- compare Germany and South Africa
- use different colors and markers

```python
germany = air[air["country"] == "Germany"]
south_africa = air[air["country"] == "South Africa"]

plt.figure(figsize=(7, 5))
plt.scatter(germany["pm25"], germany["pm10"], label="Germany", marker="o", color="green")
plt.scatter(south_africa["pm25"], south_africa["pm10"], label="South Africa", marker="^", color="purple")
plt.title("PM2.5 vs PM10")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.legend()
plt.grid(True, alpha=0.25)
plt.show()
```

**Explanation**

- Filters air quality data to isolate records from Germany and South Africa using boolean indexing
- Draws two scatter series on the same axes with different markers and colors to compare PM2.5 against PM10
- Adds a title, axis labels, a legend, and a grid for readability
- Uses a 7x5 inch figure so the points have room to spread out
- Displays the scatter plot showing how the two pollution measures relate in each country

### Problem 4: Pie Chart Of Top Countries By Monitoring Sites

```python
latest = air[air["year"] == air["year"].max()]
top_sites = latest.nlargest(5, "monitoring_sites").set_index("country")["monitoring_sites"]

plt.figure(figsize=(6, 6))
plt.pie(top_sites, labels=top_sites.index, autopct="%0.1f%%")
plt.title("Top Countries By Monitoring Sites")
plt.show()
```

**Explanation**

- Filters the air quality dataset down to the most recent year using the maximum year value
- Selects the 5 countries with the most monitoring sites and indexes the result by country name
- Draws a pie chart of each country's share of the monitoring sites among these five
- Sets the figure size, labels, percentage format, and title
- Displays the pie chart with country names as labels and their percentages

### Problem 5: Bar Chart Of Top Countries By Monitoring Sites

```python
# Reuses top_sites from Problem 4
plt.figure(figsize=(8, 4))
plt.bar(top_sites.index, top_sites.values)
plt.title("Top Countries By Monitoring Sites")
plt.xlabel("Country")
plt.ylabel("Monitoring Sites")
plt.xticks(rotation=20, ha="right")
plt.tight_layout()
plt.show()
```

**Explanation**

- Creates a figure 8 inches wide by 4 inches tall
- Draws a vertical bar chart with the index values (country names) on the x-axis and the monitoring site counts as bar heights
- Adds a title, an x-axis label ("Country"), and a y-axis label ("Monitoring Sites")
- Rotates x-axis tick labels by 20 degrees and right-aligns them so long country names do not overlap
- Applies tight layout to adjust spacing and prevent label cutoff before displaying the chart

### Problem 6: Month-On-Month Profit Line Plot

```python
plt.figure(figsize=(9, 4))
plt.plot(
    sales["month_number"],
    sales["total_profit"],
    marker="o",
    linestyle=":",
    color="blue",
    label="Total Profit",
)
plt.title("Month-On-Month Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```

**Explanation**

- Creates a line plot of profit across months using the sales dataset
- Styles the line with circular markers, a dotted line style, and blue color
- Adds a title, an x-axis label (Month), a y-axis label (Profit), and a legend
- Draws a semi-transparent grid to make values easier to read
- Displays the chart in a 9x4 inch figure

### Problem 7: Product Share Pie Chart For December

```python
december = sales[sales["month_name"] == "Dec"].iloc[0]
product_cols = ["smart_speaker", "fitness_band", "wireless_charger", "tablet_stand", "noise_canceling_buds"]
dec_values = december[product_cols]

plt.figure(figsize=(6, 6))
plt.pie(
    dec_values,
    labels=product_cols,
    autopct="%0.1f%%",
    explode=[0, 0, 0.08, 0, 0],
    shadow=True,
)
plt.title("December Product Unit Share")
plt.show()
```

**Explanation**

- Filters the sales DataFrame to the December row and extracts it with `.iloc[0]`
- Selects the product columns to include in the chart
- Draws a pie chart showing each product's share of units sold in December
- Adds percentage labels, pulls the third slice (`wireless_charger`) slightly out with `explode`, and applies a shadow
- Displays the chart with a title naming the month and the measure

### Problem 8: Multi-Line Plot Of Product Sales

```python
# Reuses product_cols from Problem 7
plt.figure(figsize=(10, 5))

for product in product_cols:
    plt.plot(sales["month_number"], sales[product], marker="o", label=product)

plt.title("Monthly Sales For All Products")
plt.xlabel("Month")
plt.ylabel("Units Sold")
plt.legend()
plt.grid(True, alpha=0.25)
plt.show()
```

**Explanation**

- The code generates a line chart comparing sales performance across
different products over time using matplotlib
- It iterates through the product columns and plots each product's monthly sales with circular markers
- The chart has a title, an x-axis label ("Month"), and a y-axis label ("Units Sold")
- A legend shows the product names and a subtle grid helps with reading values
- The figure is 10x5 inches so five lines fit without crowding

### Problem 9: Quarter-Wise Grouped Bar Chart

```python
# Reuses product_cols from Problem 7
quarter_product = sales.groupby("quarter")[product_cols].sum()
x = np.arange(len(product_cols))
width = 0.2

plt.figure(figsize=(11, 5))

for offset, quarter in enumerate(quarter_product.index):
    # (offset - 1.5) centers a group of four quarter bars on each product tick
    plt.bar(x + (offset - 1.5) * width, quarter_product.loc[quarter], width=width, label=quarter)

plt.xticks(x, product_cols, rotation=25, ha="right")
plt.title("Quarter-Wise Sales By Product")
plt.xlabel("Product")
plt.ylabel("Units")
plt.legend()
plt.tight_layout()
plt.show()
```

**Explanation**

- Groups sales data by quarter and sums the units sold for each product using pandas `groupby` and `sum`
- Uses `np.arange` to create one x position per product, then shifts each quarter's bars by a multiple of the bar width
- Plots one bar series per quarter on the same chart, each in a different color
- Sets the axis labels, title, and legend, and rotates the x-axis labels for readability
- Uses `tight_layout` so all chart elements fit inside the figure

### Problem 10: Quarter-Wise Stacked Bar Chart

```python
quarter_product = sales.groupby("quarter")[product_cols].sum()
bottom = np.zeros(len(quarter_product))

plt.figure(figsize=(8, 5))

for product in product_cols:
    plt.bar(quarter_product.index, quarter_product[product], bottom=bottom, label=product)
    bottom += quarter_product[product].values

plt.title("Quarter-Wise Stacked Sales")
plt.xlabel("Quarter")
plt.ylabel("Units")
plt.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()
```

**Explanation**

- Groups sales data by quarter and sums the units sold for each product using `groupby` and `sum`
- Initializes an array of zeros to track the running height of each stack
- Iterates through each product column and draws its bars on top of the previous products using the `bottom` parameter
- Adds a title and axis labels, and places the legend outside the plot area so it does not cover the bars
- Displays the stacked bar chart showing how each product contributes to total quarterly sales

## 17. Mini Project: Build A Simple Visualization Report

Create a script named `matplotlib_report.py`.

Your report should produce four saved charts:

- `profit_line.png`
- `pm25_pm10_scatter.png`
- `quarter_product_grouped_bar.png`
- `pm10_histogram.png`

Starter structure:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

air = pd.read_csv("matplotlib_air_quality_trends.csv")
sales = pd.read_csv("matplotlib_product_sales.csv")

plt.figure(figsize=(9, 4))
plt.plot(sales["month_number"], sales["total_profit"], marker="o")
plt.title("Monthly Profit")
plt.xlabel("Month")
plt.ylabel("Profit")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("profit_line.png", dpi=150, bbox_inches="tight")
plt.close()

plt.figure(figsize=(7, 5))
plt.scatter(air["pm25"], air["pm10"], alpha=0.7)
plt.title("PM2.5 vs PM10")
plt.xlabel("PM2.5")
plt.ylabel("PM10")
plt.grid(True, alpha=0.25)
plt.tight_layout()
plt.savefig("pm25_pm10_scatter.png", dpi=150, bbox_inches="tight")
plt.close()
```

**Explanation**

- Loads the two CSV datasets containing product sales and air quality measurements
- Generates a line plot showing monthly profit trends with markers and grid lines, then saves it as
a PNG file
- Creates a scatter plot comparing PM2.5 and PM10 levels with semi-transparent points and a light grid
- Uses explicit figure sizes and `tight_layout` for both charts
- Saves both charts at 150 DPI with tight bounding boxes
- Covers the first two of the four charts; the grouped bar chart and the histogram are left for you to add

`plt.close()` closes the current figure after saving it. This is useful in scripts that create many charts.

## 18. Chart Selection Cheat Sheet

| Chart | Best For | Avoid When |
|---|---|---|
| Line plot | trend over ordered x-axis | x-axis has unordered categories |
| Scatter plot | relationship between two numeric columns | one column is categorical |
| Bar chart | category comparison | too many categories |
| Histogram | distribution of one numeric column | values are categories |
| Pie chart | simple part-to-whole share | many categories or precise comparison |
| Grouped bar | comparing categories across groups | too many groups |
| Stacked bar | total plus contribution | exact segment comparison is important |

## 19. Interview-Style Questions

### What is the difference between `plot` and `scatter`?

`plot` is mainly used for lines, though it can draw markers. `scatter` is designed for point clouds and supports point size and color mapping more naturally.

### Why do we use `legend`?

Use `legend` when a chart has multiple lines, groups, or categories and the viewer needs to identify them.

### Why should every chart have labels?

Without labels, the viewer may not know what the x-axis, y-axis, or units represent.

### What does `figsize` do?

`figsize` controls the width and height of the figure in inches.

### What is the purpose of `bins` in a histogram?

Bins divide continuous numeric values into intervals. The histogram counts how many values fall into each interval.

### When should you avoid pie charts?

Avoid pie charts when there are many categories, very similar values, or when precise comparison matters.

### Why use `tight_layout`?
It reduces label and title clipping by adjusting spacing around subplots and axes.

### What is the difference between `savefig` and `show`?

`savefig` writes the chart to a file. `show` displays the chart on screen or in a notebook.

## 20. Final Practice Checklist

Before moving to advanced visualization, make sure you can:

- load CSV data with Pandas
- choose the right chart for the question
- draw line, scatter, bar, histogram, and pie charts
- style lines with colors, markers, and line styles
- add title, x-label, y-label, legend, and grid
- control axis limits
- rotate category labels
- make grouped and stacked bars
- use `figsize`, `dpi`, and `tight_layout`
- save charts as PNG files
- explain why a chart is useful for the question being asked

Matplotlib becomes much easier when you stop memorizing isolated commands and start thinking in questions:

- What am I comparing?
- What changes over time?
- What is the distribution?
- What is the relationship?
- What part contributes to the whole?

Choose the chart that answers the question with the least confusion.
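As a quick self-check on the `figsize`, `dpi`, and `savefig` items in the checklist above, the sketch below saves a small chart to memory and reads the pixel size back from the PNG header. It assumes no `bbox_inches="tight"` cropping, in which case the saved width and height should equal `figsize` multiplied by `dpi`. The helper name `saved_png_size` and the plotted numbers are made up for illustration; the `Agg` backend is chosen only so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt


def saved_png_size(figsize, dpi):
    """Save a small chart to memory and return the PNG (width, height) in pixels.

    Hypothetical helper for illustration; it is not part of Matplotlib.
    """
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot([1, 2, 3], [2, 4, 3], marker="o")  # made-up values
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)  # no bbox_inches="tight" here
    plt.close(fig)  # free the figure, as in the mini project
    data = buffer.getvalue()
    # A PNG file stores width and height as big-endian integers at bytes 16-24
    width = int.from_bytes(data[16:20], "big")
    height = int.from_bytes(data[20:24], "big")
    return width, height


print(saved_png_size((10, 4), 150))  # 10 in x 150 dpi by 4 in x 150 dpi
```

Adding `bbox_inches="tight"`, as the guide's `savefig` examples do, crops the image to the drawn content, so the saved size will usually differ from this simple product.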