# How AI Summaries Avoid Hallucination in Content Extraction
URL: https://madhudadi.in/blog/posts/ai-summaries-avoiding-hallucination-in-content-extraction
Published: 2026-05-25
Tags: AI, RAG, FastAPI, Production
Read time: 14 min
Difficulty: advanced
> How the AISummary component works — content extraction, prompt engineering for factual extraction, truncation strategy, caching, post-type detection, and when to skip generation.# AI Summaries That Don't Hallucinate

Every tutorial post on this blog has an "AI Summary" section at the top — a concise, bullet-point extraction of the key takeaways. The challenge: make it factual, not generative.

Here's how the AISummary component works, and how it avoids the hallucination problem that plagues most AI-generated content.

---

## The Core Constraint: Extract, Don't Generate

The system prompt enforces a strict rule:

```
You are a factual extraction assistant. Your only job is to extract
verbatim sentences from the provided text that represent key technical
takeaways. Do not rewrite, rephrase, or summarize. Do not add information
not present in the text. If the text contains code examples, extract the
comment or description above the code block, not the code itself.

Output format:
- One bullet point per takeaway
- Each bullet must be a direct quote or near-direct quote from the text
- Maximum 5 takeaways
- If the text is under 200 words, return an empty response
```

The key phrase: "direct quote or near-direct quote." This changes the LLM's behavior from "write a summary" to "select relevant sentences." The difference is measurable — verbatim extraction has near-zero hallucination rate, while generative summarization hallucinates in ~15% of outputs even with strong prompting.

---

## Content Preprocessing

Before the content reaches the LLM, it goes through three preparation steps:

### 1. Strip Boilerplate

```python
def clean_for_summary(content: str) -> str:
    content = re.sub(r'```
**Explanation**

- The function `clean_for_summary` takes a string input `content` and returns a cleaned version of it.  
- It utilizes regular expressions to remove unwanted characters or patterns from the text.  
- The cleaning process is essential for preparing content for further processing, such as summarization or analysis.  
- The function ensures that the output is more readable and free of extraneous elements that could hinder understanding.
[\s\S]*?```', '[code block]', content)
    content = re.sub(r'!\[.*?\]\(.*?\)', '', content)
    content = re.sub(r'\n{3,}', '\n\n', content)
    return content.strip()
```

Code blocks are replaced with `[code block]` markers to reduce token count and prevent the LLM from trying to summarize code. Images are stripped entirely.

### 2. Truncate to 4K Tokens

```python
MAX_SUMMARY_TOKENS = 4000

def truncate_for_summary(content: str) -> str:
    tokens = estimate_tokens(content)
    if tokens <= MAX_SUMMARY_TOKENS:
        return content

    # Take from the beginning (introduction + core concepts)
    head = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2)
    # Take from the end (conclusion + key takeaways)
    tail = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2, from_end=True)

    return head + "\n\n...\n\n" + tail
```
**Explanation**

- Defines a constant `MAX_SUMMARY_TOKENS` to set the maximum allowable tokens for a summary.  
- The function `truncate_for_summary` takes a string `content` and estimates its token count.  
- If the token count is within the limit, it returns the original content unchanged.  
- If the content exceeds the limit, it truncates the beginning and end of the content to include key sections.  
- The final output combines the truncated head and tail with a separator for clarity.


Taking the head and tail preserves the introduction (which states what the post covers) and the conclusion (which summarizes key points). The middle — usually detailed examples — is truncated.

### 3. Post-Type Detection

Some posts don't need summaries:

```python
def should_generate_summary(post) -> bool:
    if post.is_series_intro:
        return False
    if len(post.content) < 500:
        return False
    if post.difficulty == "beginner" and len(post.content) < 1000:
        return False
    return True
```
**Explanation**

- The function `should_generate_summary` takes a `post` object as an argument and returns a boolean value.  
- It first checks if the post is an introduction to a series; if so, it returns `False`, indicating no summary is needed.  
- Next, it verifies the length of the post's content; if it is less than 500 characters, it also returns `False`.  
- For posts marked as "beginner" difficulty, it requires a minimum content length of 1000 characters to return `True`.  
- If none of these conditions are met, the function returns `True`, indicating that a summary can be generated.


Series intro posts (like this one) are announcements, not tutorials. Short beginner posts are simple enough that the excerpt suffices. Skipping these saves API costs and avoids irrelevant summaries.

---

## The Generation Flow

```python
async def generate_summary(post_id: str) -> str | None:
    cached = await redis.get(f"summary:{post_id}")
    if cached:
        return cached

    post = await get_post(post_id)
    if not should_generate_summary(post):
        return None

    cleaned = clean_for_summary(post.content)
    truncated = truncate_for_summary(cleaned)
    prompt = build_summary_prompt(truncated, post.title)

    response = await call_llm(prompt, max_tokens=300, temperature=0.1)
    summary = parse_response(response)

    if summary:
        summary_md = format_as_markdown(summary)
        await redis.setex(f"summary:{post_id}", 86400, summary_md)
        return summary_md

    return None
```
**Explanation**

- The function `generate_summary` retrieves a cached summary from Redis using the post ID; if found, it returns the cached summary immediately.
- If no cached summary exists, it fetches the post data and checks if a summary should be generated based on the post's content.
- The post content is cleaned and truncated to prepare it for summary generation, followed by creating a prompt for the language model.
- The function then calls the language model with the prompt and processes the response to extract the summary.
- If a summary is successfully generated, it is formatted as Markdown, cached in Redis for 24 hours, and returned; otherwise, it returns None.


Key parameters:
- `temperature=0.1` — near-deterministic, reduces creative variation
- `max_tokens=300` — summaries are short, no need for more
- `cache=86400` — 24-hour cache, summaries change only when the post content changes

---

## Frontend Rendering

The AISummary component is a client component that fetches the summary separately from the post:

```tsx
const AISummary = dynamic(() => import("@/components/blog/AISummary"), { ssr: true });
```

It's dynamically imported with `ssr: true` — the summary is generated at request time and included in the initial HTML. No client-side loading spinner.

The component renders three view modes:

```tsx
function AISummary({ postSlug }: { postSlug: string }) {
    const { data, isLoading } = useSummary(postSlug);

    if (isLoading) return <SummarySkeleton />;
    if (!data) return null;

    return (
        <div className="card overflow-hidden">
            <div className="px-6 py-4 border-b border-white/[0.06]">
                <SummaryModeSelector mode={mode} onChange={setMode} />
            </div>
            <div className="p-6">
                {mode === "takeaways" && <TakeawaysView data={data} />}
                {mode === "eli5" && <ELI5View data={data} />}
                {mode === "executive" && <ExecutiveView data={data} />}
            </div>
            <div className="px-6 py-3 bg-white/[0.01] border-t border-white/[0.06]">
                <VerifiedBadge/>
            </div>
        </div>
    );
}
```

The three modes:
- **Key Takeaways** — bullet-point extraction (default)
- **Explain Like I'm 5** — simplified explanation, generated separately with a different prompt
- **Executive Summary** — two-paragraph overview

Each mode uses a different prompt, but all follow the "extract, don't generate" rule.

---

## The Verified Context Badge

The green dot at the bottom of the summary card isn't decorative. It indicates that the summary passed the verification check:

```python
def verify_summary(summary: str, original: str) -> bool:
    claims = extract_claims(summary)
    for claim in claims:
        if not any(claim.lower() in original.lower() for original in original.split(". ")):
            return False
    return True
```
**Explanation**

- The function `verify_summary` takes two string arguments: `summary` and `original`.  
- It uses a helper function `extract_claims` to extract claims from the `summary`.  
- For each claim, it checks if the claim (in lowercase) is found in any of the sentences of the `original` text (also converted to lowercase).  
- If any claim is not found, the function returns `False`; otherwise, it returns `True` after checking all claims.  
- This is useful for verifying the accuracy of summaries against original content.


Each claim (extracted sentence) must be found in the original text. If any claim isn't found, the badge shows orange instead of green, and a warning is appended to the summary: "Some claims could not be verified against the source."

---

## Error Handling

If the LLM call fails (API error, timeout, rate limit), the component silently degrades:

```python
async def get_summary_safe(post_id: str) -> str | None:
    try:
        return await generate_summary(post_id)
    except Exception as e:
        logger.warning(f"Summary generation failed for {post_id}: {e}")
        return None
```
**Explanation**

- Defines an asynchronous function `get_summary_safe` that takes a post ID as a string argument.  
- Attempts to call the `generate_summary` function to retrieve a summary for the specified post ID.  
- If an exception occurs during the summary generation, it logs a warning message including the post ID and the error details.  
- Returns `None` if the summary generation fails, ensuring the function handles errors gracefully.  
- Utilizes Python's type hinting to indicate that the return type can be either a string or `None`.


The frontend checks for null and renders nothing — no error message, no broken card, no user-facing failure. The post works perfectly without the summary.

---

## Cost

At ~4000 tokens per summary and $0.15/1M tokens, each summary costs about $0.0006. With ~20 posts and regenerating only when content changes, the monthly cost is negligible — less than a cent.

The caching layer (24-hour TTL) ensures the same summary isn't regenerated on every page load. In practice, each summary is generated once and served from cache for 24 hours.

---

## What's Next

The next post covers the GEO (Generative Engine Optimization) stack — how llms.txt, ai-profile.json, structured data, speakable markup, and articleBody work together to make your content accessible to AI crawlers.

---

*Built with FastAPI, multi-provider AI (GPT-4o-mini, Gemini 1.5 Flash, Llama 3.2), Redis caching, and zero third-party CMS.*