Interactive India District Dashboard with Plotly & Streamlit

Jun 14, 2026
35 min read


Build an Interactive India District Dashboard with Plotly and Streamlit

Plotly is useful when a static chart is not enough.

In this project, you will build an interactive dashboard for district-level data laid out over a map of India. The dashboard lets a user:

  • choose a state or view all districts
  • choose one metric for marker size
  • choose another metric for marker color
  • inspect each district by hovering over the map
  • run the result as a Streamlit web app

This is an original teaching project. The sample CSV included with this guide uses approximate locations and synthetic metrics for practice. It is not copied from a course notebook or proprietary dataset.

Files Used In This Guide

Use this CSV file:

  • plotly_india_district_sample.csv

Place it in the same folder as your notebook or script.

If you keep it in a data/ folder, load it like this:

python
import pandas as pd

df = pd.read_csv("data/plotly_india_district_sample.csv")

What You Will Build

By the end, you will have:

  • a cleaned district metrics table
  • a reusable plot_district_map() function
  • an all-India interactive bubble map
  • a state-filtered map
  • a Streamlit dashboard with sidebar controls

The final app will use Plotly Express and Streamlit:

bash
pip install pandas plotly streamlit

1. Load The Dataset

Start with Pandas and Plotly Express:

python
import pandas as pd
import plotly.express as px

Load the CSV:

python
df = pd.read_csv("plotly_india_district_sample.csv")

print(df.head())
print(df.shape)

Expected columns include:

  • State
  • District
  • Latitude
  • Longitude
  • Population
  • Households
  • Households_with_Internet
  • Households_with_Computer
  • Housholds_with_Electric_Lighting
  • Workers
  • sex_ratio
  • literacy_rate
  • internet_household_pct
  • urban_household_pct

The column Housholds_with_Electric_Lighting keeps a misspelling ("Housholds") of the kind that turns up in raw public data extracts. In a real project, you may rename it. In this tutorial, we keep it so you learn to work with imperfect source schemas.
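If you do rename it, one minimal approach is a single mapping applied right after loading. The corrected spelling and the fix_column_names helper are our own choices, not part of the sample file; the rest of this guide keeps the original name, so update the later column lists too if you apply it.

```python
import pandas as pd

# Map the misspelled source column to a corrected name (our own choice of spelling).
COLUMN_FIXES = {
    "Housholds_with_Electric_Lighting": "Households_with_Electric_Lighting",
}

def fix_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a renamed copy of df with known misspellings corrected.

    DataFrame.rename ignores keys that are not present (errors="ignore" is the
    default), so running this on an already-cleaned file is harmless.
    """
    return df.rename(columns=COLUMN_FIXES)

# Tiny illustrative frame, not the real CSV
raw = pd.DataFrame({"District": ["A"], "Housholds_with_Electric_Lighting": [120]})
clean = fix_column_names(raw)
print(list(clean.columns))  # ['District', 'Households_with_Electric_Lighting']
```

Because rename returns a new frame, the raw data stays untouched, which makes it easy to compare before and after.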

2. Validate The Data

Before plotting, check whether the map has valid coordinates and numeric metrics.

python
required_columns = [
    "State",
    "District",
    "Latitude",
    "Longitude",
    "Population",
    "Households",
    "Households_with_Internet",
    "Households_with_Computer",
    "Housholds_with_Electric_Lighting",
    "Workers",
    "sex_ratio",
    "literacy_rate",
    "internet_household_pct",
    "urban_household_pct",
]

missing_columns = [col for col in required_columns if col not in df.columns]
print("Missing columns:", missing_columns)

print(df[["Latitude", "Longitude"]].isna().sum())  # missing coordinates per column
print("Duplicate state-district pairs:", df.duplicated(subset=["State", "District"]).sum())

For this sample dataset, you should see:

  • no missing required columns
  • no missing coordinates
  • no duplicate state-district pairs
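The checks above only print counts. In an app you may prefer one function that coerces the coordinates to numbers and returns a list of readable problems. This is a sketch: validate_districts is a hypothetical name, and the bounding box around India (roughly latitude 6 to 38, longitude 68 to 98) is an approximate assumption meant to catch swapped or garbage coordinates, not an official boundary.

```python
from __future__ import annotations

import pandas as pd

# Approximate bounding box around India (an assumption for sanity checks only).
LAT_RANGE = (6.0, 38.0)
LON_RANGE = (68.0, 98.0)

def validate_districts(df: pd.DataFrame, required_columns: list[str]) -> list[str]:
    """Return human-readable problems found in df (an empty list means OK)."""
    problems = []

    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    for col, (low, high) in {"Latitude": LAT_RANGE, "Longitude": LON_RANGE}.items():
        if col not in df.columns:
            continue
        # Non-numeric strings become NaN instead of raising
        values = pd.to_numeric(df[col], errors="coerce")
        n_missing = int(values.isna().sum())
        if n_missing:
            problems.append(f"{col}: {n_missing} missing or non-numeric values")
        # Count only real numbers that fall outside the box
        n_outside = int((~values.between(low, high) & values.notna()).sum())
        if n_outside:
            problems.append(f"{col}: {n_outside} values outside {low} to {high}")

    if {"State", "District"} <= set(df.columns):
        n_dupes = int(df.duplicated(subset=["State", "District"]).sum())
        if n_dupes:
            problems.append(f"{n_dupes} duplicate State/District rows")

    return problems

# Deliberately broken illustrative rows
sample = pd.DataFrame({
    "State": ["S1", "S1", "S2"],
    "District": ["D1", "D1", "D2"],
    "Latitude": [19.0, 19.0, "n/a"],
    "Longitude": [72.8, 72.8, 150.0],
})
for problem in validate_districts(sample, ["State", "District", "Latitude", "Longitude", "Population"]):
    print(problem)
```

A longitude of 150 or a latitude of 80 usually means the two columns were swapped or a value was mistyped, which is exactly the kind of error that makes a marker vanish off the map.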

3. Create A Basic District Map

Plotly can draw points on map tiles with px.scatter_mapbox.

python
fig = px.scatter_mapbox(
    df,
    lat="Latitude",
    lon="Longitude",
    hover_name="District",
    hover_data=["State", "Population", "literacy_rate"],
    zoom=3.8,
    height=650,
    mapbox_style="carto-positron",
    title="District Sample Map",
)

fig.show()

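The carto-positron style works without a Mapbox token, which makes it convenient for notebooks and small teaching apps. Recent Plotly releases (5.24 and later) also offer px.scatter_map, a MapLibre-based replacement that takes map_style instead of mapbox_style; px.scatter_mapbox still works in those versions but may emit a deprecation warning.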

4. Encode Population With Marker Size

A dashboard becomes more useful when visual properties carry meaning.

Here, marker size represents population:

python
fig = px.scatter_mapbox(
    df,
    lat="Latitude",
    lon="Longitude",
    size="Population",
    size_max=35,
    hover_name="District",
    hover_data=["State", "Population"],
    zoom=3.8,
    height=650,
    mapbox_style="carto-positron",
    title="Population By District",
)

fig.show()

size_max sets the maximum marker size for the size encoding (the Plotly Express default is 20). Keep it modest so the most populous districts do not cover their neighbors.
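It helps to know how the size encoding scales. Plotly Express draws size-encoded markers with area sizing (marker sizemode "area"), so a marker's area is proportional to its value and its width grows with the square root. The hypothetical helper below only does that arithmetic under this assumption; it does not call Plotly.

```python
import math

def relative_marker_widths(values):
    """Width of each marker as a fraction of the widest one.

    Assumes area-based sizing: marker area is proportional to the value,
    so width (diameter) is proportional to the square root of the value.
    """
    largest = max(values)
    return [math.sqrt(v / largest) for v in values]

# Illustrative populations, not taken from the sample CSV
populations = [250_000, 1_000_000, 4_000_000]
print(relative_marker_widths(populations))  # [0.25, 0.5, 1.0]
```

The smallest district here has one sixteenth of the largest population but gets a marker a quarter as wide. Because size_max scales every marker together, lowering it shrinks the big bubbles without changing these ratios.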

5. Add Color For A Second Metric

Now encode literacy_rate with color:

python
fig = px.scatter_mapbox(
    df,
    lat="Latitude",
    lon="Longitude",
    size="Population",
    color="literacy_rate",
    size_max=35,
    color_continuous_scale="Viridis",
    hover_name="District",
    hover_data={
        "State": True,
        "Population": ":,",
        "literacy_rate": ":.1f",
        "Latitude": False,
        "Longitude": False,
    },
    zoom=3.8,
    height=650,
    mapbox_style="carto-positron",
    title="Population Size And Literacy Rate Color",
)

fig.show()

This creates a two-metric visualization:

  • larger bubbles mean larger population
  • on the Viridis scale, darker purple marks lower literacy and brighter yellow marks higher literacy

6. Filter To One State

Dashboards usually need filters.

Filter the data to one state:

python
state_name = "Maharashtra"
state_df = df[df["State"] == state_name]

fig = px.scatter_mapbox(
    state_df,
    lat="Latitude",
    lon="Longitude",
    size="Population",
    color="internet_household_pct",
    size_max=35,
    color_continuous_scale="Plasma",
    hover_name="District",
    hover_data=["State", "Population", "Households_with_Internet"],
    zoom=5.5,
    height=650,
    mapbox_style="carto-positron",
    title=f"Internet Access Sample Metrics In {state_name}",
)

fig.show()

The same charting logic works for a national view and a state-level view.
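One caveat: each figure rescales its color bar to the rows it receives, so the same shade can mean different values in the national view and in a state view. If you want colors to be comparable across views, you can compute the range once from the full table and pass it to Plotly Express through its range_color parameter. The helper name global_color_range and the rows below are illustrative assumptions.

```python
from __future__ import annotations

import pandas as pd

def global_color_range(df: pd.DataFrame, metric: str) -> tuple[float, float]:
    """Min and max of metric over the full table, ignoring missing values."""
    values = df[metric].dropna()
    return float(values.min()), float(values.max())

# Illustrative rows only, not the sample CSV
df = pd.DataFrame({
    "State": ["State A", "State A", "State B"],
    "internet_household_pct": [22.5, 31.0, 40.0],
})
state_df = df[df["State"] == "State A"]

color_range = global_color_range(df, "internet_household_pct")
print(color_range)  # (22.5, 40.0)
```

Passing range_color=color_range to the px.scatter_mapbox call for state_df keeps the color bar at 22.5 to 40.0; without it, the State A map would stretch its colors over 22.5 to 31.0 only.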

7. Build A Reusable Plot Function

Instead of rewriting the same Plotly call, create a function.

python
def plot_district_map(data, primary_metric, secondary_metric, title, zoom):
    fig = px.scatter_mapbox(
        data,
        lat="Latitude",
        lon="Longitude",
        size=primary_metric,
        color=secondary_metric,
        size_max=35,
        color_continuous_scale="Viridis",
        hover_name="District",
        hover_data={
            "State": True,
            primary_metric: ":,",
            secondary_metric: ":.2f" if "pct" in secondary_metric or "rate" in secondary_metric else ":,",
            "Latitude": False,
            "Longitude": False,
        },
        zoom=zoom,
        height=700,
        mapbox_style="carto-positron",
        title=title,
    )

    fig.update_layout(
        margin={"r": 0, "t": 50, "l": 0, "b": 0},
        coloraxis_colorbar_title=secondary_metric.replace("_", " ").title(),
    )

    return fig

Test it:

python
fig = plot_district_map(
    df,
    primary_metric="Population",
    secondary_metric="literacy_rate",
    title="District Population And Literacy Rate",
    zoom=3.8,
)

fig.show()

This function is the heart of the dashboard.
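The inline conditional that picks a hover format is easy to misread. A small helper makes the rule explicit and testable. The names hover_format and build_metric_hover are hypothetical; the strings they return are the same d3-style format specifiers the function already uses (":.2f" for two decimals, ":," for thousands separators). Unlike the original, this version also applies the rule to the size metric.

```python
def hover_format(metric: str) -> str:
    """Pick a hover format for a metric column name.

    Percentage and rate columns get two decimals; everything else
    gets thousands separators.
    """
    if "pct" in metric or "rate" in metric:
        return ":.2f"
    return ":,"

def build_metric_hover(primary_metric: str, secondary_metric: str) -> dict:
    """hover_data mapping for plot_district_map, with coordinates hidden."""
    return {
        "State": True,
        primary_metric: hover_format(primary_metric),
        secondary_metric: hover_format(secondary_metric),
        "Latitude": False,
        "Longitude": False,
    }

print(build_metric_hover("Population", "literacy_rate"))
```

Inside plot_district_map you would then pass hover_data=build_metric_hover(primary_metric, secondary_metric). If the user picks the same column for size and color, the dictionary simply ends up with one entry for it.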

8. Choose Metrics Programmatically

Create a list of numeric columns users can select.

python
protected_columns = {"Latitude", "Longitude"}

numeric_metrics = [
    col
    for col in df.select_dtypes(include="number").columns
    if col not in protected_columns
]

print(numeric_metrics)

Example output:

text
['Population', 'Male_Literate', 'Female_Literate', 'Households_with_Internet', ...]

This lets the dashboard remain flexible even if you add more metrics later.

9. Build The Streamlit App

Create a file named app.py:

python
import pandas as pd
import plotly.express as px
import streamlit as st

@st.cache_data
def load_data():
    return pd.read_csv("plotly_india_district_sample.csv")

def plot_district_map(data, primary_metric, secondary_metric, title, zoom):
    fig = px.scatter_mapbox(
        data,
        lat="Latitude",
        lon="Longitude",
        size=primary_metric,
        color=secondary_metric,
        size_max=35,
        color_continuous_scale="Viridis",
        hover_name="District",
        hover_data={
            "State": True,
            primary_metric: ":,",
            secondary_metric: ":.2f" if "pct" in secondary_metric or "rate" in secondary_metric else ":,",
            "Latitude": False,
            "Longitude": False,
        },
        zoom=zoom,
        height=700,
        mapbox_style="carto-positron",
        title=title,
    )

    fig.update_layout(
        margin={"r": 0, "t": 50, "l": 0, "b": 0},
        coloraxis_colorbar_title=secondary_metric.replace("_", " ").title(),
    )
    return fig

# set_page_config should be the first Streamlit command the script runs
st.set_page_config(page_title="India District Metrics Dashboard", layout="wide")

df = load_data()

st.title("India District Metrics Dashboard")
st.caption("Original practice dataset with approximate locations and synthetic indicators.")

states = ["Overall India"] + sorted(df["State"].unique())

numeric_metrics = [
    col
    for col in df.select_dtypes(include="number").columns
    if col not in {"Latitude", "Longitude"}
]

selected_state = st.sidebar.selectbox("Select a state", states)
primary_metric = st.sidebar.selectbox("Marker size metric", numeric_metrics, index=numeric_metrics.index("Population"))
secondary_metric = st.sidebar.selectbox("Marker color metric", numeric_metrics, index=numeric_metrics.index("literacy_rate"))

if selected_state == "Overall India":
    filtered = df
    zoom = 3.8
    title = f"All Districts: Size = {primary_metric}, Color = {secondary_metric}"
else:
    filtered = df[df["State"] == selected_state]
    zoom = 5.2
    title = f"{selected_state}: Size = {primary_metric}, Color = {secondary_metric}"

left, right = st.columns([3, 1])

with left:
    fig = plot_district_map(filtered, primary_metric, secondary_metric, title, zoom)
    st.plotly_chart(fig, use_container_width=True)

with right:
    st.subheader("Selected Data")
    st.metric("Districts", len(filtered))
    st.metric("Total Population", f"{int(filtered['Population'].sum()):,}")
    st.metric("Avg Literacy Rate", f"{filtered['literacy_rate'].mean():.1f}%")
    st.metric("Avg Internet Household %", f"{filtered['internet_household_pct'].mean():.1f}%")

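# dict.fromkeys removes a repeated column if the same metric is picked for size and color
table_columns = list(dict.fromkeys(["State", "District", primary_metric, secondary_metric]))
st.dataframe(
    filtered[table_columns].sort_values(primary_metric, ascending=False),
    use_container_width=True,
)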

Run the app:

bash
streamlit run app.py
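One fragile spot in app.py: numeric_metrics.index("Population") raises ValueError if that column is ever renamed or dropped, and the app stops on startup. A hypothetical default_index helper falls back to a safe position instead.

```python
def default_index(options: list, preferred: str, fallback: int = 0) -> int:
    """Index of preferred in options, or fallback if it is missing."""
    try:
        return options.index(preferred)
    except ValueError:
        return fallback

metrics = ["Population", "Households", "literacy_rate"]
print(default_index(metrics, "literacy_rate"))  # 2
print(default_index(metrics, "internet_pct"))   # 0
```

In the sidebar you would write index=default_index(numeric_metrics, "literacy_rate") instead of calling numeric_metrics.index directly.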

10. How The Dashboard Works

The dashboard has three layers:

  1. Data loading
  2. User controls
  3. Plot rendering

The data loading layer uses @st.cache_data so Streamlit does not reread the CSV on every interaction.

The sidebar controls change:

  • which rows are displayed
  • which column controls marker size
  • which column controls marker color

The Plotly function receives those choices and returns a new figure.
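Because the control logic is plain Python, you can pull it out of the Streamlit script and test it without a browser. Below is a sketch of the state-selection branch as a standalone function; select_view is a name we made up, and the zoom levels and titles mirror the ones in app.py.

```python
import pandas as pd

ALL_STATES = "Overall India"

def select_view(df: pd.DataFrame, state: str, primary_metric: str, secondary_metric: str):
    """Return (rows, zoom, title) for the chosen state, mirroring app.py."""
    label = f"Size = {primary_metric}, Color = {secondary_metric}"
    if state == ALL_STATES:
        return df, 3.8, f"All Districts: {label}"
    rows = df[df["State"] == state]
    return rows, 5.2, f"{state}: {label}"

# Illustrative rows only
sample = pd.DataFrame({"State": ["A", "A", "B"], "District": ["a1", "a2", "b1"]})
rows, zoom, title = select_view(sample, "A", "Population", "literacy_rate")
print(len(rows), zoom, title)  # 2 5.2 A: Size = Population, Color = literacy_rate
```

In app.py, the if/else block would then shrink to filtered, zoom, title = select_view(df, selected_state, primary_metric, secondary_metric).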

11. Improve The Hover Tooltip

Hover labels are where an interactive chart pays off: they show the exact values that marker size and color only approximate.

Try adding more context:

python
fig = px.scatter_mapbox(
    filtered,
    lat="Latitude",
    lon="Longitude",
    size=primary_metric,
    color=secondary_metric,
    hover_name="District",
    hover_data={
        "State": True,
        "Population": ":,",
        "Households": ":,",
        "literacy_rate": ":.1f",
        "internet_household_pct": ":.1f",
        "urban_household_pct": ":.1f",
        "Latitude": False,
        "Longitude": False,
    },
    mapbox_style="carto-positron",
)

Hide coordinates unless they are analytically useful. Most users care more about the district name and metrics than raw latitude and longitude.
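If you reuse the tooltip across datasets, a fixed dictionary breaks as soon as one of its columns is missing, because Plotly Express rejects hover_data names it cannot find in the data frame. A minimal sketch that only includes context columns that exist; build_hover_data and CONTEXT_FORMATS are hypothetical names, and the formats copy the dictionary above.

```python
# Context columns and their hover formats, copied from the example above
CONTEXT_FORMATS = {
    "Population": ":,",
    "Households": ":,",
    "literacy_rate": ":.1f",
    "internet_household_pct": ":.1f",
    "urban_household_pct": ":.1f",
}

def build_hover_data(columns) -> dict:
    """Tooltip spec: State plus any context columns present; coordinates hidden."""
    available = set(columns)
    hover = {"State": True} if "State" in available else {}
    for col, fmt in CONTEXT_FORMATS.items():
        if col in available:
            hover[col] = fmt
    # Only hide coordinates that exist, so every key is a real column
    for col in ("Latitude", "Longitude"):
        if col in available:
            hover[col] = False
    return hover

print(build_hover_data(["State", "District", "Population", "Latitude", "Longitude"]))
# {'State': True, 'Population': ':,', 'Latitude': False, 'Longitude': False}
```

In the app, hover_data=build_hover_data(filtered.columns) works because a pandas column index is iterable.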

12. Add A Ranking Chart

A map answers "where".

A bar chart answers "who is highest or lowest".

Add a ranking chart below the map:

python
top_districts = filtered.nlargest(10, secondary_metric)

bar_fig = px.bar(
    top_districts,
    x=secondary_metric,
    y="District",
    color=secondary_metric,
    orientation="h",
    title=f"Top Districts By {secondary_metric}",
)

bar_fig.update_layout(yaxis={"categoryorder": "total ascending"})
st.plotly_chart(bar_fig, use_container_width=True)

This turns the app from a map-only demo into a small analytical dashboard.
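The chart above only answers the "highest" half of the question. A minimal sketch that returns both ends of the ranking, so you could draw a second bar chart for the lowest districts; rank_districts is a hypothetical helper, and rows with a missing metric are dropped before ranking.

```python
import pandas as pd

def rank_districts(df: pd.DataFrame, metric: str, n: int = 10):
    """Return (top, bottom): the n highest and n lowest districts by metric."""
    ranked = df.dropna(subset=[metric])
    top = ranked.nlargest(n, metric)
    bottom = ranked.nsmallest(n, metric)
    return top, bottom

# Illustrative rows only
sample = pd.DataFrame({
    "District": ["a", "b", "c", "d"],
    "literacy_rate": [61.2, 88.0, None, 74.5],
})
top, bottom = rank_districts(sample, "literacy_rate", n=2)
print(top["District"].tolist())     # ['b', 'd']
print(bottom["District"].tolist())  # ['a', 'd']
```

Note that district d appears in both lists: when a state has fewer than 2n districts, the top and bottom rankings overlap, which is worth mentioning in the chart title or caption.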

13. Common Problems

The map does not display

Check these items:

  • Latitude and Longitude are numeric
  • the values are not missing
  • the map style is token-free, such as carto-positron

Markers are too large

Lower size_max:

python
size_max=20

Explanation

  • size_max is a keyword argument of px.scatter_mapbox, not a standalone variable; pass it inside the plotting call, for example in plot_district_map
  • It sets the maximum marker size used for the size encoding, and Plotly Express defaults it to 20
  • Every marker is scaled relative to the largest value, so a lower size_max shrinks all bubbles together while keeping their relative sizes

The app reruns too often

Streamlit reruns the whole script from top to bottom whenever a widget value changes; that is expected behavior. To keep each rerun fast, use @st.cache_data for loading data and keep expensive transformations inside cached functions.

The chart has too many hover fields

Use hover_data to hide columns:

python
hover_data={"Latitude": False, "Longitude": False}

Explanation

  • hover_data accepts a dictionary that maps column names to True (show), False (hide), or a format string such as ":,"
  • Setting "Latitude" and "Longitude" to False removes the coordinates that Plotly Express adds to map tooltips by default
  • The district name from hover_name and any other columns you list stay visible, so the tooltip focuses on the metrics
  • The same dictionary works in a notebook or inside the Streamlit app

14. Practice Tasks

Try these improvements:

  1. Add a dropdown that switches the color scale between Viridis, Plasma, and Turbo.
  2. Add a checkbox to show only districts above a selected literacy rate.
  3. Add a slider for minimum population.
  4. Create a second tab for ranking charts.
  5. Add a download button for the filtered data.

Example filter:

python
minimum_population = st.sidebar.slider(
    "Minimum population",
    min_value=int(df["Population"].min()),
    max_value=int(df["Population"].max()),
    value=int(df["Population"].min()),
)

filtered = filtered[filtered["Population"] >= minimum_population]

Explanation

  • Creates a slider in the Streamlit sidebar for a minimum population threshold
  • Takes the slider range from the smallest and largest populations in the full dataset, and starts at the minimum so no rows are dropped initially
  • Keeps only rows whose population is at or above the selected threshold
  • Because the range comes from the full dataset rather than the selected state, a high threshold can leave filtered empty, so check filtered.empty before plotting or computing averages
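Practice tasks 2, 3 and 5 share one idea: keep the row filtering in a plain function, then hand the result to the widgets. A sketch under those assumptions; apply_filters, to_csv_bytes and their parameter names are hypothetical, and the CSV bytes are intended for Streamlit's st.download_button, whose data argument accepts bytes.

```python
from typing import Optional

import pandas as pd

def apply_filters(
    df: pd.DataFrame,
    min_population: int = 0,
    min_literacy: Optional[float] = None,
) -> pd.DataFrame:
    """Keep rows at or above the population threshold and, optionally, the literacy threshold."""
    mask = df["Population"] >= min_population
    if min_literacy is not None:
        mask = mask & (df["literacy_rate"] >= min_literacy)
    return df[mask]

def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Encode a frame as UTF-8 CSV bytes without the index column."""
    return df.to_csv(index=False).encode("utf-8")

# Illustrative rows only
sample = pd.DataFrame({
    "District": ["a", "b", "c"],
    "Population": [50_000, 900_000, 2_000_000],
    "literacy_rate": [81.0, 66.5, 79.0],
})
kept = apply_filters(sample, min_population=100_000, min_literacy=70.0)
print(kept["District"].tolist())  # ['c']
if kept.empty:
    print("No districts match these filters.")
print(to_csv_bytes(kept).decode("utf-8"))
```

In app.py, the download could then be st.download_button("Download filtered data", to_csv_bytes(filtered), file_name="filtered_districts.csv", mime="text/csv"), where the file name is just an example.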

Final Takeaway

This project teaches a practical visualization workflow:

  • clean and validate tabular data
  • map numeric columns to visual encodings
  • use hover labels for context
  • wrap Plotly logic in reusable functions
  • expose filters through Streamlit controls

That pattern is reusable for many real dashboards: public datasets, operations monitoring, sales territories, service locations, logistics, and education analytics.

Frequently Asked Questions

What is the purpose of the interactive district dashboard for India?
The dashboard allows users to choose a state or view all districts, select metrics for marker size and color, inspect each district by hovering over the map, and run the result as a Streamlit web app.
What tools are used to build the interactive dashboard?
The dashboard is built using Plotly Express for visualization and Streamlit for creating the web app.
What dataset is used in this guide?
The guide uses a sample CSV file named 'plotly_india_district_sample.csv', which contains approximate locations and synthetic metrics for practice.
What are some of the expected columns in the dataset?
Expected columns include State, District, Latitude, Longitude, Population, Households, Households_with_Internet, Households_with_Computer, Housholds_with_Electric_Lighting, Workers, sex_ratio, literacy_rate, internet_household_pct, and urban_household_pct.
How do you handle the misspelled column name in the dataset?
The column 'Housholds_with_Electric_Lighting' is kept with its misspelling to teach how to handle imperfect source schemas, although in a real project, it may be renamed.
