How does the AI Summary component avoid hallucination?

The AI Summary component avoids hallucination by enforcing a strict rule to extract verbatim sentences from the provided text, rather than generating or rephrasing content.

What is the key phrase used to change the LLM's behavior?

The key phrase is 'direct quote or near-direct quote,' which changes the LLM's behavior from 'write a summary' to 'select relevant sentences.'

What preprocessing steps are taken before content reaches the LLM?

Before reaching the LLM, the content is stripped of boilerplate, code blocks are replaced with markers, and images are removed to reduce token count and prevent code summarization.

What happens if the text is under 200 words?

If the text is under 200 words, the AI Summary component returns an empty response.

AI Summaries: Avoiding Hallucination in | Madhu Dadi

AI Summaries That Don't Hallucinate

Every tutorial post on this blog has an "AI Summary" section at the top — a concise, bullet-point extraction of the key takeaways. The challenge: make it factual, not generative.

Here's how the AISummary component works, and how it avoids the hallucination problem that plagues most AI-generated content.

The Core Constraint: Extract, Don't Generate

The system prompt enforces a strict rule:

text

You are a factual extraction assistant. Your only job is to extract
verbatim sentences from the provided text that represent key technical
takeaways. Do not rewrite, rephrase, or summarize. Do not add information
not present in the text. If the text contains code examples, extract the
comment or description above the code block, not the code itself.

Output format:
- One bullet point per takeaway
- Each bullet must be a direct quote or near-direct quote from the text
- Maximum 5 takeaways
- If the text is under 200 words, return an empty response

The key phrase: "direct quote or near-direct quote." This changes the LLM's behavior from "write a summary" to "select relevant sentences." The difference is measurable — verbatim extraction has near-zero hallucination rate, while generative summarization hallucinates in ~15% of outputs even with strong prompting.

Content Preprocessing

Before the content reaches the LLM, it goes through three preparation steps:

1. Strip Boilerplate

python

def clean_for_summary(content: str) -> str:
    content = re.sub(r'```
**Explanation**

- The function `clean_for_summary` takes a string input `content` and returns a cleaned version of it.  
- It utilizes regular expressions to remove unwanted characters or patterns from the text.  
- The cleaning process is essential for preparing content for further processing, such as summarization or analysis.  
- The function ensures that the output is more readable and free of extraneous elements that could hinder understanding.
[\s\S]*?```', '[code block]', content)
    content = re.sub(r'!\[.*?\]\(.*?\)', '', content)
    content = re.sub(r'\n{3,}', '\n\n', content)
    return content.strip()

Code blocks are replaced with [code block] markers to reduce token count and prevent the LLM from trying to summarize code. Images are stripped entirely.

2. Truncate to 4K Tokens

python

MAX_SUMMARY_TOKENS = 4000

def truncate_for_summary(content: str) -> str:
    tokens = estimate_tokens(content)
    if tokens <= MAX_SUMMARY_TOKENS:
        return content

    # Take from the beginning (introduction + core concepts)
    head = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2)
    # Take from the end (conclusion + key takeaways)
    tail = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2, from_end=True)

    return head + "\n\n...\n\n" + tail

Explanation

Defines a constant MAX_SUMMARY_TOKENS to set the maximum allowable tokens for a summary.
The function truncate_for_summary takes a string content and estimates its token count.
If the token count is within the limit, it returns the original content unchanged.
If the content exceeds the limit, it truncates the beginning and end of the content to include key sections.
The final output combines the truncated head and tail with a separator for clarity.

Taking the head and tail preserves the introduction (which states what the post covers) and the conclusion (which summarizes key points). The middle — usually detailed examples — is truncated.

3. Post-Type Detection

Some posts don't need summaries:

python

def should_generate_summary(post) -> bool:
    if post.is_series_intro:
        return False
    if len(post.content) < 500:
        return False
    if post.difficulty == "beginner" and len(post.content) < 1000:
        return False
    return True

Explanation

The function should_generate_summary takes a post object as an argument and returns a boolean value.
It first checks if the post is an introduction to a series; if so, it returns False, indicating no summary is needed.
Next, it verifies the length of the post's content; if it is less than 500 characters, it also returns False.
For posts marked as "beginner" difficulty, it requires a minimum content length of 1000 characters to return True.
If none of these conditions are met, the function returns True, indicating that a summary can be generated.

Series intro posts (like this one) are announcements, not tutorials. Short beginner posts are simple enough that the excerpt suffices. Skipping these saves API costs and avoids irrelevant summaries.

The Generation Flow

python

async def generate_summary(post_id: str) -> str | None:
    cached = await redis.get(f"summary:{post_id}")
    if cached:
        return cached

    post = await get_post(post_id)
    if not should_generate_summary(post):
        return None

    cleaned = clean_for_summary(post.content)
    truncated = truncate_for_summary(cleaned)
    prompt = build_summary_prompt(truncated, post.title)

    response = await call_llm(prompt, max_tokens=300, temperature=0.1)
    summary = parse_response(response)

    if summary:
        summary_md = format_as_markdown(summary)
        await redis.setex(f"summary:{post_id}", 86400, summary_md)
        return summary_md

    return None

Explanation

The function generate_summary retrieves a cached summary from Redis using the post ID; if found, it returns the cached summary immediately.
If no cached summary exists, it fetches the post data and checks if a summary should be generated based on the post's content.
The post content is cleaned and truncated to prepare it for summary generation, followed by creating a prompt for the language model.
The function then calls the language model with the prompt and processes the response to extract the summary.
If a summary is successfully generated, it is formatted as Markdown, cached in Redis for 24 hours, and returned; otherwise, it returns None.

Key parameters:

temperature=0.1 — near-deterministic, reduces creative variation
max_tokens=300 — summaries are short, no need for more
cache=86400 — 24-hour cache, summaries change only when the post content changes

Frontend Rendering

The AISummary component is a client component that fetches the summary separately from the post:

tsx

const AISummary = dynamic(() => import("@/components/blog/AISummary"), { ssr: true });

It's dynamically imported with ssr: true — the summary is generated at request time and included in the initial HTML. No client-side loading spinner.

The component renders three view modes:

tsx

function AISummary({ postSlug }: { postSlug: string }) {
    const { data, isLoading } = useSummary(postSlug);

    if (isLoading) return <SummarySkeleton />;
    if (!data) return null;

    return (
        <div className="card overflow-hidden">
            <div className="px-6 py-4 border-b border-white/[0.06]">
                <SummaryModeSelector mode={mode} onChange={setMode} />
            </div>
            <div className="p-6">
                {mode === "takeaways" && <TakeawaysView data={data} />}
                {mode === "eli5" && <ELI5View data={data} />}
                {mode === "executive" && <ExecutiveView data={data} />}
            </div>
            <div className="px-6 py-3 bg-white/[0.01] border-t border-white/[0.06]">
                <VerifiedBadge/>
            </div>
        </div>
    );
}

The three modes:

Key Takeaways — bullet-point extraction (default)
Explain Like I'm 5 — simplified explanation, generated separately with a different prompt
Executive Summary — two-paragraph overview

Each mode uses a different prompt, but all follow the "extract, don't generate" rule.

The Verified Context Badge

The green dot at the bottom of the summary card isn't decorative. It indicates that the summary passed the verification check:

python

def verify_summary(summary: str, original: str) -> bool:
    claims = extract_claims(summary)
    for claim in claims:
        if not any(claim.lower() in original.lower() for original in original.split(". ")):
            return False
    return True

Explanation

The function verify_summary takes two string arguments: summary and original.
It uses a helper function extract_claims to extract claims from the summary.
For each claim, it checks if the claim (in lowercase) is found in any of the sentences of the original text (also converted to lowercase).
If any claim is not found, the function returns False; otherwise, it returns True after checking all claims.
This is useful for verifying the accuracy of summaries against original content.

Each claim (extracted sentence) must be found in the original text. If any claim isn't found, the badge shows orange instead of green, and a warning is appended to the summary: "Some claims could not be verified against the source."

Error Handling

If the LLM call fails (API error, timeout, rate limit), the component silently degrades:

python

async def get_summary_safe(post_id: str) -> str | None:
    try:
        return await generate_summary(post_id)
    except Exception as e:
        logger.warning(f"Summary generation failed for {post_id}: {e}")
        return None

Explanation

Defines an asynchronous function get_summary_safe that takes a post ID as a string argument.
Attempts to call the generate_summary function to retrieve a summary for the specified post ID.
If an exception occurs during the summary generation, it logs a warning message including the post ID and the error details.
Returns None if the summary generation fails, ensuring the function handles errors gracefully.
Utilizes Python's type hinting to indicate that the return type can be either a string or None.

The frontend checks for null and renders nothing — no error message, no broken card, no user-facing failure. The post works perfectly without the summary.

Cost

At ~4000 tokens per summary and $0.15/1M tokens, each summary costs about $0.0006. With ~20 posts and regenerating only when content changes, the monthly cost is negligible — less than a cent.

The caching layer (24-hour TTL) ensures the same summary isn't regenerated on every page load. In practice, each summary is generated once and served from cache for 24 hours.

What's Next

The next post covers the GEO (Generative Engine Optimization) stack — how llms.txt, ai-profile.json, structured data, speakable markup, and articleBody work together to make your content accessible to AI crawlers.

Built with FastAPI, multi-provider AI (GPT-4o-mini, Gemini 1.5 Flash, Llama 3.2), Redis caching, and zero third-party CMS.

AI Summaries: Avoiding Hallucination in Content Extraction

AI Insights

AI Summaries That Don't Hallucinate

The Core Constraint: Extract, Don't Generate

Content Preprocessing

1. Strip Boilerplate

2. Truncate to 4K Tokens

3. Post-Type Detection

The Generation Flow

Frontend Rendering

The Verified Context Badge

Error Handling

Cost

What's Next