Build an Interactive India District Dashboard with Plotly and Streamlit
Plotly is useful when a static chart is not enough.
In this project, you will build an interactive district-level dashboard for India-style geographic data. The dashboard lets a user:
- choose a state or view all districts
- choose one metric for marker size
- choose another metric for marker color
- inspect each district by hovering over the map
- run the result as a Streamlit web app
This is an original teaching project. The sample CSV included with this guide uses approximate locations and synthetic metrics for practice. It is not copied from a course notebook or proprietary dataset.
Files Used In This Guide
Use this CSV file:
plotly_india_district_sample.csv
Place it in the same folder as your notebook or script.
If you keep it in a data/ folder, load it like this:
df = pd.read_csv("data/plotly_india_district_sample.csv")
What You Will Build
By the end, you will have:
- a cleaned district metrics table
- a reusable plot_district_map() function
- an all-India interactive bubble map
- a state-filtered map
- a Streamlit dashboard with sidebar controls
The final app will use Plotly Express and Streamlit:
pip install pandas plotly streamlit
1. Load The Dataset
Start with Pandas and Plotly Express:
import pandas as pd
import plotly.express as px
Load the CSV:
df = pd.read_csv("plotly_india_district_sample.csv")
print(df.head())
print(df.shape)
Expected columns include:
- State
- District
- Latitude
- Longitude
- Population
- Households
- Households_with_Internet
- Households_with_Computer
- Housholds_with_Electric_Lighting
- Workers
- sex_ratio
- literacy_rate
- internet_household_pct
- urban_household_pct
The column Housholds_with_Electric_Lighting keeps the same misspelling that often appears in raw public data extracts. In a real project, you may rename it. In this tutorial, we keep it visible so you learn how to handle imperfect source schemas.
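If you do decide to rename the column in your own project, one option is a small rename map applied right after loading. This is only a sketch: the COLUMN_FIXES mapping and the fix_column_names helper are hypothetical names, and the tiny frame stands in for the real CSV.

```python
import pandas as pd

# Hypothetical mapping from raw column names to corrected names.
COLUMN_FIXES = {
    "Housholds_with_Electric_Lighting": "Households_with_Electric_Lighting",
}


def fix_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with known misspellings corrected; other columns pass through."""
    return df.rename(columns=COLUMN_FIXES)


# Tiny stand-in frame so the example runs without the CSV.
raw = pd.DataFrame({"District": ["A"], "Housholds_with_Electric_Lighting": [1200]})
clean = fix_column_names(raw)
print(list(clean.columns))
```

The rest of this guide keeps the raw spelling, so apply a rename like this only if you also update every later reference to the column.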
2. Validate The Data
Before plotting, check whether the map has valid coordinates and numeric metrics.
required_columns = [
    "State",
    "District",
    "Latitude",
    "Longitude",
    "Population",
    "Households",
    "Households_with_Internet",
    "Households_with_Computer",
    "Housholds_with_Electric_Lighting",
    "Workers",
    "sex_ratio",
    "literacy_rate",
    "internet_household_pct",
    "urban_household_pct",
]
missing_columns = [col for col in required_columns if col not in df.columns]
print("Missing columns:", missing_columns)
print(df[["Latitude", "Longitude"]].isna().sum())
print(df.duplicated(subset=["State", "District"]).sum())
For this sample dataset, you should see:
- no missing required columns
- no missing coordinates
- no duplicate state-district pairs
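The checks above cover missing columns, missing coordinates, and duplicates. A possible extension, sketched here under the assumption that bad values may arrive as text or fall outside valid latitude and longitude ranges, collects those counts in one report. The validate_districts helper and the sample frame are hypothetical.

```python
import pandas as pd


def validate_districts(df: pd.DataFrame, numeric_columns: list[str]) -> dict:
    """Collect basic data-quality counts without modifying the frame."""
    # Coerce to numbers; anything that cannot be parsed becomes NaN.
    coerced = df[numeric_columns].apply(pd.to_numeric, errors="coerce")
    lat, lon = coerced["Latitude"], coerced["Longitude"]
    # NaN compares False in between(), so missing coordinates count as bad.
    in_range = lat.between(-90, 90) & lon.between(-180, 180)
    return {
        "non_numeric_or_missing": int(coerced.isna().sum().sum()),
        "bad_coordinates": int((~in_range).sum()),
        "duplicate_pairs": int(df.duplicated(subset=["State", "District"]).sum()),
    }


# Hypothetical sample with one problem of each kind.
sample = pd.DataFrame({
    "State": ["X", "X", "Y"],
    "District": ["A", "A", "B"],
    "Latitude": [19.0, 19.0, 95.0],       # 95 is outside the valid range
    "Longitude": [73.0, 73.0, "n/a"],     # "n/a" cannot be parsed
    "Population": [1000, 1000, 2000],
})
report = validate_districts(sample, ["Latitude", "Longitude", "Population"])
print(report)
```

A report of all zeros means the frame is safe to pass to the plotting calls that follow.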
3. Create A Basic District Map
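Plotly can draw points on map tiles with px.scatter_mapbox. Newer Plotly releases (5.24 and later) deprecate it in favor of px.scatter_map, which takes map_style instead of mapbox_style; this guide uses the older name throughout.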
fig = px.scatter_mapbox(
    df,
    lat="Latitude",
    lon="Longitude",
    hover_name="District",
    hover_data=["State", "Population", "literacy_rate"],
    zoom=3.8,
    height=650,
    mapbox_style="carto-positron",
    title="District Sample Map",
)
fig.show()
The carto-positron style works without a Mapbox token, which makes it convenient for notebooks and small teaching apps.
4. Encode Population With Marker Size
A dashboard becomes more useful when visual properties carry meaning.
Here, marker size represents population:
fig = px.scatter_mapbox(
    df,
    lat="Latitude",
    lon="Longitude",
    size="Population",
    size_max=35,
    hover_name="District",
    hover_data=["State", "Population"],
    zoom=3.8,
    height=650,
    mapbox_style="carto-positron",
    title="Population By District",
)
fig.show()
Use size_max to prevent the largest districts from covering the whole map.
5. Add Color For A Second Metric
Now encode literacy_rate with color:
fig = px.scatter_mapbox(
    df,
    lat="Latitude",
    lon="Longitude",
    size="Population",
    color="literacy_rate",
    size_max=35,
    color_continuous_scale="Viridis",
    hover_name="District",
    hover_data={
        "State": True,
        "Population": ":,",
        "literacy_rate": ":.1f",
        "Latitude": False,
        "Longitude": False,
    },
    zoom=3.8,
    height=650,
    mapbox_style="carto-positron",
    title="Population Size And Literacy Rate Color",
)
fig.show()
This creates a two-metric visualization:
- larger bubbles mean larger population
- on the Viridis scale used here, brighter yellow means a higher literacy rate and darker purple means a lower one
6. Filter To One State
Dashboards usually need filters.
Filter the data to one state:
state_name = "Maharashtra"
state_df = df[df["State"] == state_name]
fig = px.scatter_mapbox(
    state_df,
    lat="Latitude",
    lon="Longitude",
    size="Population",
    color="internet_household_pct",
    size_max=35,
    color_continuous_scale="Plasma",
    hover_name="District",
    hover_data=["State", "Population", "Households_with_Internet"],
    zoom=5.5,
    height=650,
    mapbox_style="carto-positron",
    title=f"Internet Access Sample Metrics In {state_name}",
)
fig.show()
The same charting logic works for a national view and a state-level view.
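An exact string comparison returns an empty frame when the state name has a typo, different casing, or stray spaces, and the map then renders with no markers. One way to make that failure visible is a small filter helper. This is a sketch: filter_state is a hypothetical name, and the sample frame stands in for the CSV.

```python
import pandas as pd


def filter_state(df: pd.DataFrame, state_name: str) -> pd.DataFrame:
    """Return the rows for one state, ignoring case and surrounding spaces."""
    wanted = state_name.strip().casefold()
    mask = df["State"].str.strip().str.casefold() == wanted
    if not mask.any():
        known = ", ".join(sorted(df["State"].unique()))
        raise ValueError(f"No districts found for {state_name!r}. Known states: {known}")
    return df[mask]


# Hypothetical sample; note the trailing space in the second row.
sample = pd.DataFrame({
    "State": ["Maharashtra", "Maharashtra ", "Kerala"],
    "District": ["Pune", "Nagpur", "Kochi"],
})
state_df = filter_state(sample, "maharashtra")
print(state_df["District"].tolist())

try:
    filter_state(sample, "Maharastra")  # misspelled on purpose
except ValueError as err:
    print(err)
```

Raising early is a design choice; in a dashboard you might prefer to show a warning and fall back to the national view instead.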
7. Build A Reusable Plot Function
Instead of rewriting the same Plotly call, create a function.
def plot_district_map(data, primary_metric, secondary_metric, title, zoom):
    fig = px.scatter_mapbox(
        data,
        lat="Latitude",
        lon="Longitude",
        size=primary_metric,
        color=secondary_metric,
        size_max=35,
        color_continuous_scale="Viridis",
        hover_name="District",
        hover_data={
            "State": True,
            primary_metric: ":,",
            secondary_metric: ":.2f" if "pct" in secondary_metric or "rate" in secondary_metric else ":,",
            "Latitude": False,
            "Longitude": False,
        },
        zoom=zoom,
        height=700,
        mapbox_style="carto-positron",
        title=title,
    )
    fig.update_layout(
        margin={"r": 0, "t": 50, "l": 0, "b": 0},
        coloraxis_colorbar_title=secondary_metric.replace("_", " ").title(),
    )
    return fig
Test it:
fig = plot_district_map(
    df,
    primary_metric="Population",
    secondary_metric="literacy_rate",
    title="District Population And Literacy Rate",
    zoom=3.8,
)
fig.show()
This function is the heart of the dashboard.
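The inline format choice in the function matches substrings, so a name containing "rate" anywhere would get two decimals even if it holds counts. A stricter variant, sketched below with a hypothetical hover_format helper, matches suffixes instead. It assumes percentage and rate columns end in _pct or _rate, as they do in the sample CSV; Male_Literate is a made-up count column used only to show the difference.

```python
def hover_format(metric: str) -> str:
    """Pick a hover format string from the column name (naming-convention heuristic)."""
    name = metric.lower()
    # Suffix match: "male_literate" contains "rate" but does not end in "_rate".
    if name.endswith(("_pct", "_rate")):
        return ":.2f"
    return ":,"


for column in ["literacy_rate", "internet_household_pct", "Population", "Male_Literate"]:
    print(column, hover_format(column))
```

Inside plot_district_map, the secondary metric entry in hover_data could then be written as secondary_metric: hover_format(secondary_metric).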
8. Choose Metrics Programmatically
Create a list of numeric columns users can select.
protected_columns = {"Latitude", "Longitude"}
numeric_metrics = [
    col
    for col in df.select_dtypes(include="number").columns
    if col not in protected_columns
]
print(numeric_metrics)
Example output:
['Population', 'Households', 'Households_with_Internet', ...]
This lets the dashboard remain flexible even if you add more metrics later.
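Raw column names such as internet_household_pct are not very readable in a dropdown. A small label helper can tidy them for display while the underlying column name stays unchanged. This is a sketch; metric_label is a hypothetical name.

```python
def metric_label(column: str) -> str:
    """Turn a column name such as 'internet_household_pct' into a display label."""
    label = column.replace("_", " ").title()
    # Show percentage columns with a % sign instead of the "Pct" suffix.
    if label.endswith(" Pct"):
        label = label[: -len(" Pct")] + " %"
    return label


for column in ["Population", "literacy_rate", "internet_household_pct"]:
    print(column, "->", metric_label(column))
```

Streamlit's selectbox accepts a format_func argument, so a helper like this can be passed as format_func=metric_label to change only what the user sees.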
9. Build The Streamlit App
Create a file named app.py:
import pandas as pd
import plotly.express as px
import streamlit as st

# Call set_page_config before any other Streamlit command;
# older Streamlit releases raise an error otherwise.
st.set_page_config(page_title="India District Metrics Dashboard", layout="wide")


@st.cache_data
def load_data():
    # Cached so the CSV is not reread on every widget interaction.
    return pd.read_csv("plotly_india_district_sample.csv")


def plot_district_map(data, primary_metric, secondary_metric, title, zoom):
    fig = px.scatter_mapbox(
        data,
        lat="Latitude",
        lon="Longitude",
        size=primary_metric,
        color=secondary_metric,
        size_max=35,
        color_continuous_scale="Viridis",
        hover_name="District",
        hover_data={
            "State": True,
            primary_metric: ":,",
            secondary_metric: ":.2f" if "pct" in secondary_metric or "rate" in secondary_metric else ":,",
            "Latitude": False,
            "Longitude": False,
        },
        zoom=zoom,
        height=700,
        mapbox_style="carto-positron",
        title=title,
    )
    fig.update_layout(margin={"r": 0, "t": 50, "l": 0, "b": 0})
    return fig


df = load_data()
st.title("India District Metrics Dashboard")
st.caption("Original practice dataset with approximate locations and synthetic indicators.")

states = ["Overall India"] + sorted(df["State"].unique())
numeric_metrics = [
    col
    for col in df.select_dtypes(include="number").columns
    if col not in {"Latitude", "Longitude"}
]

# Sidebar controls
selected_state = st.sidebar.selectbox("Select a state", states)
primary_metric = st.sidebar.selectbox("Marker size metric", numeric_metrics, index=numeric_metrics.index("Population"))
secondary_metric = st.sidebar.selectbox("Marker color metric", numeric_metrics, index=numeric_metrics.index("literacy_rate"))

if selected_state == "Overall India":
    filtered = df
    zoom = 3.8
    title = f"All Districts: Size = {primary_metric}, Color = {secondary_metric}"
else:
    filtered = df[df["State"] == selected_state]
    zoom = 5.2
    title = f"{selected_state}: Size = {primary_metric}, Color = {secondary_metric}"

left, right = st.columns([3, 1])
with left:
    fig = plot_district_map(filtered, primary_metric, secondary_metric, title, zoom)
    st.plotly_chart(fig, use_container_width=True)
with right:
    st.subheader("Selected Data")
    st.metric("Districts", len(filtered))
    st.metric("Total Population", f"{int(filtered['Population'].sum()):,}")
    st.metric("Avg Literacy Rate", f"{filtered['literacy_rate'].mean():.1f}%")
    st.metric("Avg Internet Household %", f"{filtered['internet_household_pct'].mean():.1f}%")
    # Drop repeated names so picking the same metric twice does not
    # create duplicate columns, which sort_values rejects.
    table_columns = list(dict.fromkeys(["State", "District", primary_metric, secondary_metric]))
    st.dataframe(
        filtered[table_columns].sort_values(primary_metric, ascending=False),
        use_container_width=True,
    )
Run the app:
streamlit run app.py
10. How The Dashboard Works
The dashboard has three layers:
- Data loading
- User controls
- Plot rendering
The data loading layer uses @st.cache_data so Streamlit does not reread the CSV on every interaction.
The sidebar controls change:
- which rows are displayed
- which column controls marker size
- which column controls marker color
The Plotly function receives those choices and returns a new figure.
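Keeping the summary numbers in a plain function, separate from the Streamlit calls, makes that layer easy to test on its own. The sketch below uses a hypothetical summarize helper and also handles an empty selection, where a mean would otherwise come back as NaN and render as "nan%".

```python
import pandas as pd


def summarize(filtered: pd.DataFrame) -> dict:
    """Compute the sidebar summary numbers without touching Streamlit."""
    if filtered.empty:
        # No rows selected: avoid NaN averages in the display.
        return {"districts": 0, "total_population": 0, "avg_literacy_rate": None}
    return {
        "districts": len(filtered),
        "total_population": int(filtered["Population"].sum()),
        "avg_literacy_rate": round(float(filtered["literacy_rate"].mean()), 1),
    }


# Hypothetical two-district selection.
sample = pd.DataFrame({
    "District": ["A", "B"],
    "Population": [1000, 3000],
    "literacy_rate": [80.0, 71.0],
})
summary = summarize(sample)
print(summary)
print(summarize(sample.iloc[0:0]))  # empty selection
```

In app.py, the st.metric calls could read from this dictionary and show a placeholder such as "n/a" when a value is None.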
11. Improve The Hover Tooltip
Hover labels are where interactive charts become useful.
Try adding more context:
fig = px.scatter_mapbox(
    filtered,
    lat="Latitude",
    lon="Longitude",
    size=primary_metric,
    color=secondary_metric,
    hover_name="District",
    hover_data={
        "State": True,
        "Population": ":,",
        "Households": ":,",
        "literacy_rate": ":.1f",
        "internet_household_pct": ":.1f",
        "urban_household_pct": ":.1f",
        "Latitude": False,
        "Longitude": False,
    },
    mapbox_style="carto-positron",
)
Hide coordinates unless they are analytically useful. Most users care more about the district name and metrics than raw latitude and longitude.
12. Add A Ranking Chart
A map answers "where".
A bar chart answers "who is highest or lowest".
Add a ranking chart below the map:
top_districts = filtered.nlargest(10, secondary_metric)
bar_fig = px.bar(
    top_districts,
    x=secondary_metric,
    y="District",
    color=secondary_metric,
    orientation="h",
    title=f"Top Districts By {secondary_metric}",
)
bar_fig.update_layout(yaxis={"categoryorder": "total ascending"})
st.plotly_chart(bar_fig, use_container_width=True)
This turns the app from a map-only demo into a small analytical dashboard.
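The snippet above ranks only the highest values. To answer the "lowest" half of the question as well, pandas offers nsmallest alongside nlargest. A minimal sketch, using a hypothetical top_and_bottom helper and made-up values:

```python
import pandas as pd


def top_and_bottom(data: pd.DataFrame, metric: str, n: int = 10):
    """Return the n highest and n lowest rows for a metric."""
    return data.nlargest(n, metric), data.nsmallest(n, metric)


# Hypothetical five-district sample.
sample = pd.DataFrame({
    "District": ["A", "B", "C", "D", "E"],
    "literacy_rate": [60.0, 90.0, 75.0, 82.0, 68.0],
})
top, bottom = top_and_bottom(sample, "literacy_rate", n=2)
print(top["District"].tolist())
print(bottom["District"].tolist())
```

Each frame can be passed to px.bar in the same way as top_districts. For a state with fewer than n districts, both calls simply return every row.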
13. Common Problems
The map does not display
Check these items:
- Latitude and Longitude are numeric
- the values are not missing
- the map style is token-free, such as carto-positron
Markers are too large
Lower size_max:
size_max=20
Explanation
- size_max is a keyword argument of px.scatter_mapbox, not a standalone variable
- It sets the maximum marker size, which the largest value in the size column receives; smaller values scale down from there
- Lowering it from 35 to 20 shrinks every bubble, which reduces overlap in dense regions
- 20 is also the Plotly Express default, so removing the argument has the same effect
The app reruns too often
Streamlit reruns the script when a widget changes. Use @st.cache_data for loading data and keep expensive transformations inside cached functions.
The chart has too many hover fields
Use hover_data to hide columns:
hover_data={"Latitude": False, "Longitude": False}
Explanation
- hover_data accepts a dictionary that controls which columns appear in the hover tooltip
- Setting "Latitude" and "Longitude" to False hides those columns, even though they are still used to position the markers
- Other keys can be set to True to show a column, or to a format string such as ":.1f" to show it formatted
- Hiding coordinates keeps the tooltip focused on the district name and metrics
14. Practice Tasks
Try these improvements:
- Add a dropdown that switches the color scale between Viridis, Plasma, and Turbo.
- Add a checkbox to show only districts above a selected literacy rate.
- Add a slider for minimum population.
- Create a second tab for ranking charts.
- Add a download button for the filtered data.
Example filter:
minimum_population = st.sidebar.slider(
    "Minimum population",
    min_value=int(df["Population"].min()),
    max_value=int(df["Population"].max()),
    value=int(df["Population"].min()),
)
filtered = filtered[filtered["Population"] >= minimum_population]
Explanation
- Creates an interactive slider widget in Streamlit sidebar to select minimum population threshold
- Sets slider range from the minimum to maximum population values in the dataset
- Initializes slider to the minimum population value by default
- Filters the dataset to include only rows where population meets or exceeds the selected minimum threshold
- Updates the filtered dataframe based on user selection for downstream analysis or visualization
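Two of the other tasks, the literacy threshold and the download button, can be prepared the same way: as plain functions that the Streamlit widgets call. The sketch below is one possible shape; apply_filters and to_csv_bytes are hypothetical names, and the sample frame stands in for the CSV.

```python
import pandas as pd


def apply_filters(df: pd.DataFrame, min_population: int = 0, min_literacy=None) -> pd.DataFrame:
    """Apply the population filter and, if given, the literacy threshold."""
    out = df[df["Population"] >= min_population]
    if min_literacy is not None:
        out = out[out["literacy_rate"] >= min_literacy]
    return out


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Encode a frame as UTF-8 CSV bytes, the form a download widget expects."""
    return df.to_csv(index=False).encode("utf-8")


# Hypothetical three-district sample.
sample = pd.DataFrame({
    "District": ["A", "B", "C"],
    "Population": [500, 1500, 2500],
    "literacy_rate": [65.0, 72.0, 88.0],
})
selected = apply_filters(sample, min_population=1000, min_literacy=80.0)
payload = to_csv_bytes(selected)
print(selected["District"].tolist())
print(len(payload), "bytes")
```

In the app, the slider and checkbox values would feed apply_filters, and st.download_button could take data=to_csv_bytes(filtered) with mime="text/csv" and a file name of your choice.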
Final Takeaway
This project teaches a practical visualization workflow:
- clean and validate tabular data
- map numeric columns to visual encodings
- use hover labels for context
- wrap Plotly logic in reusable functions
- expose filters through Streamlit controls
That pattern is reusable for many real dashboards: public datasets, operations monitoring, sales territories, service locations, logistics, and education analytics.
