AI Summaries That Don't Hallucinate
Every tutorial post on this blog has an "AI Summary" section at the top — a concise, bullet-point extraction of the key takeaways. The challenge: make it factual, not generative.
Here's how the AISummary component works, and how it avoids the hallucination problem that plagues most AI-generated content.
The Core Constraint: Extract, Don't Generate
The system prompt enforces a strict rule:
You are a factual extraction assistant. Your only job is to extract
verbatim sentences from the provided text that represent key technical
takeaways. Do not rewrite, rephrase, or summarize. Do not add information
not present in the text. If the text contains code examples, extract the
comment or description above the code block, not the code itself.
Output format:
- One bullet point per takeaway
- Each bullet must be a direct quote or near-direct quote from the text
- Maximum 5 takeaways
- If the text is under 200 words, return an empty responseThe key phrase: "direct quote or near-direct quote." This changes the LLM's behavior from "write a summary" to "select relevant sentences." The difference is measurable — verbatim extraction has near-zero hallucination rate, while generative summarization hallucinates in ~15% of outputs even with strong prompting.
Content Preprocessing
Before the content reaches the LLM, it goes through three preparation steps:
1. Strip Boilerplate
def clean_for_summary(content: str) -> str:
content = re.sub(r'```
**Explanation**
- The function `clean_for_summary` takes a string input `content` and returns a cleaned version of it.
- It utilizes regular expressions to remove unwanted characters or patterns from the text.
- The cleaning process is essential for preparing content for further processing, such as summarization or analysis.
- The function ensures that the output is more readable and free of extraneous elements that could hinder understanding.
[\s\S]*?```', '[code block]', content)
content = re.sub(r'!\[.*?\]\(.*?\)', '', content)
content = re.sub(r'\n{3,}', '\n\n', content)
return content.strip()Code blocks are replaced with [code block] markers to reduce token count and prevent the LLM from trying to summarize code. Images are stripped entirely.
2. Truncate to 4K Tokens
MAX_SUMMARY_TOKENS = 4000
def truncate_for_summary(content: str) -> str:
tokens = estimate_tokens(content)
if tokens <= MAX_SUMMARY_TOKENS:
return content
# Take from the beginning (introduction + core concepts)
head = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2)
# Take from the end (conclusion + key takeaways)
tail = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2, from_end=True)
return head + "\n\n...\n\n" + tailExplanation
- Defines a constant
MAX_SUMMARY_TOKENSto set the maximum allowable tokens for a summary. - The function
truncate_for_summarytakes a stringcontentand estimates its token count. - If the token count is within the limit, it returns the original content unchanged.
- If the content exceeds the limit, it truncates the beginning and end of the content to include key sections.
- The final output combines the truncated head and tail with a separator for clarity.
Taking the head and tail preserves the introduction (which states what the post covers) and the conclusion (which summarizes key points). The middle — usually detailed examples — is truncated.
3. Post-Type Detection
Some posts don't need summaries:
def should_generate_summary(post) -> bool:
if post.is_series_intro:
return False
if len(post.content) < 500:
return False
if post.difficulty == "beginner" and len(post.content) < 1000:
return False
return TrueExplanation
- The function
should_generate_summarytakes apostobject as an argument and returns a boolean value. - It first checks if the post is an introduction to a series; if so, it returns
False, indicating no summary is needed. - Next, it verifies the length of the post's content; if it is less than 500 characters, it also returns
False. - For posts marked as "beginner" difficulty, it requires a minimum content length of 1000 characters to return
True. - If none of these conditions are met, the function returns
True, indicating that a summary can be generated.
Series intro posts (like this one) are announcements, not tutorials. Short beginner posts are simple enough that the excerpt suffices. Skipping these saves API costs and avoids irrelevant summaries.
The Generation Flow
async def generate_summary(post_id: str) -> str | None:
cached = await redis.get(f"summary:{post_id}")
if cached:
return cached
post = await get_post(post_id)
if not should_generate_summary(post):
return None
cleaned = clean_for_summary(post.content)
truncated = truncate_for_summary(cleaned)
prompt = build_summary_prompt(truncated, post.title)
response = await call_llm(prompt, max_tokens=300, temperature=0.1)
summary = parse_response(response)
if summary:
summary_md = format_as_markdown(summary)
await redis.setex(f"summary:{post_id}", 86400, summary_md)
return summary_md
return NoneExplanation
- The function
generate_summaryretrieves a cached summary from Redis using the post ID; if found, it returns the cached summary immediately. - If no cached summary exists, it fetches the post data and checks if a summary should be generated based on the post's content.
- The post content is cleaned and truncated to prepare it for summary generation, followed by creating a prompt for the language model.
- The function then calls the language model with the prompt and processes the response to extract the summary.
- If a summary is successfully generated, it is formatted as Markdown, cached in Redis for 24 hours, and returned; otherwise, it returns None.
Key parameters:
temperature=0.1— near-deterministic, reduces creative variationmax_tokens=300— summaries are short, no need for morecache=86400— 24-hour cache, summaries change only when the post content changes
Frontend Rendering
The AISummary component is a client component that fetches the summary separately from the post:
const AISummary = dynamic(() => import("@/components/blog/AISummary"), { ssr: true });It's dynamically imported with ssr: true — the summary is generated at request time and included in the initial HTML. No client-side loading spinner.
The component renders three view modes:
function AISummary({ postSlug }: { postSlug: string }) {
const { data, isLoading } = useSummary(postSlug);
if (isLoading) return <SummarySkeleton />;
if (!data) return null;
return (
<div className="card overflow-hidden">
<div className="px-6 py-4 border-b border-white/[0.06]">
<SummaryModeSelector mode={mode} onChange={setMode} />
</div>
<div className="p-6">
{mode === "takeaways" && <TakeawaysView data={data} />}
{mode === "eli5" && <ELI5View data={data} />}
{mode === "executive" && <ExecutiveView data={data} />}
</div>
<div className="px-6 py-3 bg-white/[0.01] border-t border-white/[0.06]">
<VerifiedBadge/>
</div>
</div>
);
}The three modes:
- Key Takeaways — bullet-point extraction (default)
- Explain Like I'm 5 — simplified explanation, generated separately with a different prompt
- Executive Summary — two-paragraph overview
Each mode uses a different prompt, but all follow the "extract, don't generate" rule.
The Verified Context Badge
The green dot at the bottom of the summary card isn't decorative. It indicates that the summary passed the verification check:
def verify_summary(summary: str, original: str) -> bool:
claims = extract_claims(summary)
for claim in claims:
if not any(claim.lower() in original.lower() for original in original.split(". ")):
return False
return TrueExplanation
- The function
verify_summarytakes two string arguments:summaryandoriginal. - It uses a helper function
extract_claimsto extract claims from thesummary. - For each claim, it checks if the claim (in lowercase) is found in any of the sentences of the
originaltext (also converted to lowercase). - If any claim is not found, the function returns
False; otherwise, it returnsTrueafter checking all claims. - This is useful for verifying the accuracy of summaries against original content.
Each claim (extracted sentence) must be found in the original text. If any claim isn't found, the badge shows orange instead of green, and a warning is appended to the summary: "Some claims could not be verified against the source."
Error Handling
If the LLM call fails (API error, timeout, rate limit), the component silently degrades:
async def get_summary_safe(post_id: str) -> str | None:
try:
return await generate_summary(post_id)
except Exception as e:
logger.warning(f"Summary generation failed for {post_id}: {e}")
return NoneExplanation
- Defines an asynchronous function
get_summary_safethat takes a post ID as a string argument. - Attempts to call the
generate_summaryfunction to retrieve a summary for the specified post ID. - If an exception occurs during the summary generation, it logs a warning message including the post ID and the error details.
- Returns
Noneif the summary generation fails, ensuring the function handles errors gracefully. - Utilizes Python's type hinting to indicate that the return type can be either a string or
None.
The frontend checks for null and renders nothing — no error message, no broken card, no user-facing failure. The post works perfectly without the summary.
Cost
At ~4000 tokens per summary and $0.15/1M tokens, each summary costs about $0.0006. With ~20 posts and regenerating only when content changes, the monthly cost is negligible — less than a cent.
The caching layer (24-hour TTL) ensures the same summary isn't regenerated on every page load. In practice, each summary is generated once and served from cache for 24 hours.
What's Next
The next post covers the GEO (Generative Engine Optimization) stack — how llms.txt, ai-profile.json, structured data, speakable markup, and articleBody work together to make your content accessible to AI crawlers.
Built with FastAPI, multi-provider AI (GPT-4o-mini, Gemini 1.5 Flash, Llama 3.2), Redis caching, and zero third-party CMS.

