# How AI Summaries Avoid Hallucination in Content Extraction
URL: https://madhudadi.in/blog/posts/ai-summaries-avoiding-hallucination-in-content-extraction
Published: 2026-05-25
Tags: AI, RAG, FastAPI, Production
Read time: 14 min
Difficulty: advanced
> How the AISummary component works: content extraction, prompt engineering for factual extraction, truncation strategy, caching, post-type detection, and when to skip generation.

# AI Summaries That Don't Hallucinate
Every tutorial post on this blog has an "AI Summary" section at the top — a concise, bullet-point extraction of the key takeaways. The challenge: make it factual, not generative.
Here's how the AISummary component works, and how it avoids the hallucination problem that plagues most AI-generated content.
---
## The Core Constraint: Extract, Don't Generate
The system prompt enforces a strict rule:
```
You are a factual extraction assistant. Your only job is to extract
verbatim sentences from the provided text that represent key technical
takeaways. Do not rewrite, rephrase, or summarize. Do not add information
not present in the text. If the text contains code examples, extract the
comment or description above the code block, not the code itself.
Output format:
- One bullet point per takeaway
- Each bullet must be a direct quote or near-direct quote from the text
- Maximum 5 takeaways
- If the text is under 200 words, return an empty response
```
The key phrase is "direct quote or near-direct quote." It changes the LLM's task from "write a summary" to "select relevant sentences." The difference is measurable: verbatim extraction has a near-zero hallucination rate, while generative summarization hallucinates in ~15% of outputs even with strong prompting.
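The prompt asks for quotes, but a prompt alone cannot guarantee them. As a minimal sketch of how the "direct or near-direct quote" rule could be checked after generation, the helper below compares each bullet against the source text. The function name `is_grounded` and the 0.9 similarity threshold are assumptions for illustration, not part of the AISummary component described here.

```python
import re
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_grounded(bullet: str, source: str, threshold: float = 0.9) -> bool:
    """Return True if `bullet` is a direct or near-direct quote from `source`.

    Hypothetical helper: the 0.9 threshold is an assumed value.
    """
    quote = _normalize(bullet.lstrip("-* "))
    text = _normalize(source)
    if not quote:
        return False
    if quote in text:  # verbatim match
        return True
    # Near-direct quote: compare against each sentence of the source.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return any(
        SequenceMatcher(None, quote, sentence).ratio() >= threshold
        for sentence in sentences
    )
```

Bullets that fail the check could be dropped before the summary is cached, so a stray generated sentence never reaches the page.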
---
## Content Preprocessing
Before the content reaches the LLM, it goes through three preparation steps:
### 1. Strip Boilerplate
```python
import re

def clean_for_summary(content: str) -> str:
    # Replace fenced code blocks with a short marker
    content = re.sub(r'```[\s\S]*?```', '[code block]', content)
    # Drop Markdown images entirely
    content = re.sub(r'!\[.*?\]\(.*?\)', '', content)
    # Collapse runs of three or more newlines into a single blank line
    content = re.sub(r'\n{3,}', '\n\n', content)
    return content.strip()
```
**Explanation**
- The function `clean_for_summary` takes a string `content` and returns a cleaned version of it.
- It applies three regular-expression substitutions: fenced code blocks become `[code block]`, Markdown images are removed, and runs of three or more newlines collapse to one blank line.
- Leading and trailing whitespace is stripped before the result is returned.
- The cleaned text is shorter and free of elements the LLM should not try to summarize.
Code blocks are replaced with `[code block]` markers to reduce token count and prevent the LLM from trying to summarize code. Images are stripped entirely.
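To see the three substitutions in action, here is a small usage sketch. The sample text and image URL are made up, and the code-fence pattern is assembled from a `FENCE` constant (equivalent to the regex above) only so the example itself contains no literal triple backticks.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example fence-safe


def clean_for_summary(content: str) -> str:
    # Same three substitutions as the function above.
    content = re.sub(FENCE + r"[\s\S]*?" + FENCE, "[code block]", content)
    content = re.sub(r"!\[.*?\]\(.*?\)", "", content)
    content = re.sub(r"\n{3,}", "\n\n", content)
    return content.strip()


# Made-up sample post: a paragraph, a code block, an image, a paragraph.
sample = (
    "Intro paragraph.\n\n"
    + FENCE + "python\nprint('hi')\n" + FENCE + "\n\n"
    "![diagram](https://example.com/d.png)\n\n\n"
    "Closing paragraph.\n"
)

cleaned = clean_for_summary(sample)
print(cleaned)
```

The printed result keeps both paragraphs, with a single `[code block]` line between them and no trace of the image.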
### 2. Truncate to 4K Tokens
```python
MAX_SUMMARY_TOKENS = 4000

def truncate_for_summary(content: str) -> str:
    tokens = estimate_tokens(content)
    if tokens <= MAX_SUMMARY_TOKENS:
        return content
    # Take from the beginning (introduction + core concepts)
    head = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2)
    # Take from the end (conclusion + key takeaways)
    tail = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2, from_end=True)
    return head + "\n\n...\n\n" + tail
```
**Explanation**
- Defines a constant `MAX_SUMMARY_TOKENS` to set the maximum allowable tokens for a summary.
- The function `truncate_for_summary` takes a string `content` and estimates its token count.
- If the token count is within the limit, it returns the original content unchanged.
- If the content exceeds the limit, it keeps the first and last 2,000 tokens and drops the middle.
- The final output combines the truncated head and tail with a separator for clarity.
Taking the head and tail preserves the introduction (which states what the post covers) and the conclusion (which summarizes key points). The middle — usually detailed examples — is truncated.
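The helpers `estimate_tokens` and `truncate_to_tokens` are not shown in this post. A minimal sketch of what they might look like, assuming a rough heuristic of four characters per token rather than a real tokenizer (the constant `CHARS_PER_TOKEN` and both function bodies are assumptions):

```python
CHARS_PER_TOKEN = 4  # assumed average for English text; a real tokenizer would differ
MAX_SUMMARY_TOKENS = 4000


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def truncate_to_tokens(text: str, max_tokens: int, from_end: bool = False) -> str:
    """Keep roughly `max_tokens` tokens from the start (or end) of `text`."""
    budget = max_tokens * CHARS_PER_TOKEN
    if budget <= 0:
        return ""
    if len(text) <= budget:
        return text
    return text[-budget:] if from_end else text[:budget]


def truncate_for_summary(content: str) -> str:
    # Same head + tail logic as above, wired to the sketched helpers.
    if estimate_tokens(content) <= MAX_SUMMARY_TOKENS:
        return content
    head = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2)
    tail = truncate_to_tokens(content, MAX_SUMMARY_TOKENS // 2, from_end=True)
    return head + "\n\n...\n\n" + tail
```

With this heuristic, a 30,000-character post keeps its first and last 8,000 characters. Character slicing can cut a word or sentence in half, so a production version would probably snap to paragraph boundaries.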
### 3. Post-Type Detection
Some posts don't need summaries:
```python
def should_generate_summary(post) -> bool:
    if post.is_series_intro:
        return False
    if len(post.content) < 500:
        return False
    if post.difficulty == "beginner" and len(post.content) < 1000:
        return False
    return True
```
**Explanation**
- The function `should_generate_summary` takes a `post` object as an argument and returns a boolean value.
- It first checks if the post is an introduction to a series; if so, it returns `False`, indicating no summary is needed.
- Next, it verifies the length of the post's content; if it is less than 500 characters, it also returns `False`.
- For posts marked as "beginner" difficulty, it requires a minimum content length of 1000 characters to return `True`.
- If none of these conditions are met, the function returns `True`, indicating that a summary can be generated.
Series intro posts (like this one) are announcements, not tutorials. Short beginner posts are simple enough that the excerpt suffices. Skipping these saves API costs and avoids irrelevant summaries.
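The thresholds are easiest to check at their boundaries. The sketch below exercises the function with a hypothetical `Post` dataclass that carries only the three fields the check reads. The real post model is not shown here and likely has more.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical minimal model with only the fields the check uses.
    content: str
    difficulty: str = "intermediate"
    is_series_intro: bool = False


def should_generate_summary(post: Post) -> bool:
    if post.is_series_intro:
        return False
    if len(post.content) < 500:
        return False
    if post.difficulty == "beginner" and len(post.content) < 1000:
        return False
    return True


cases = {
    "series intro": Post("x" * 2000, is_series_intro=True),
    "too short": Post("x" * 499),
    "short beginner": Post("x" * 800, difficulty="beginner"),
    "long beginner": Post("x" * 1000, difficulty="beginner"),
    "regular": Post("x" * 500),
}
for name, post in cases.items():
    print(f"{name}: {should_generate_summary(post)}")
```

Only the last two cases produce a summary. Note that `len(post.content)` counts characters, whereas the system prompt's cutoff is expressed in words.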
---
## The Generation Flow
```python
async def generate_summary(post_id: str) -> str | None:
    cached = await redis.get(f"summary:{post_id}")
    if cached:
        return cached
    post = await get_post(post_id)
    if not should_generate_summary(post):
        return None
    cleaned = clean_for_summary(post.content)
    truncated = truncate_for_summary(cleaned)
    prompt = build_summary_prompt(truncated, post.title)
    response = await call_llm(prompt, max_tokens=300, temperature=0.1)
    summary = parse_response(response)
    if summary:
        summary_md = format_as_markdown(summary)
        await redis.setex(f"summary:{post_id}", 86400, summary_md)
        return summary_md
    return None
```
**Explanation**
- The function `generate_summary` retrieves a cached summary from Redis using the post ID; if found, it returns the cached summary immediately.
- If no cached summary exists, it fetches the post data and checks if a summary should be generated based on the post's content.
- The post content is cleaned and truncated to prepare it for summary generation, followed by creating a prompt for the language model.
- The function then calls the language model with the prompt and processes the response to extract the summary.
- If a summary is successfully generated, it is formatted as Markdown, cached in Redis for 24 hours, and returned; otherwise, it returns None.
Key parameters:
- `temperature=0.1`: near-deterministic, reduces creative variation
- `max_tokens=300`: summaries are short, so there is no need for more
- `86400` (the TTL in seconds passed to `redis.setex`): a 24-hour cache, since summaries change only when the post content changes
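The flow calls `parse_response` and `format_as_markdown` without showing them. A minimal sketch of how they might work, assuming `call_llm` returns the model's raw text and that the model follows the bullet format from the system prompt (both function bodies and the `MAX_TAKEAWAYS` constant are assumptions):

```python
import re

MAX_TAKEAWAYS = 5  # mirrors "Maximum 5 takeaways" in the system prompt


def parse_response(response: str) -> list[str]:
    """Collect bullet lines from the raw LLM text; returns [] if there are none."""
    bullets = []
    for line in response.splitlines():
        # Accept "-", "*" or "•" as the bullet marker and ignore any other line.
        match = re.match(r"^\s*[-*•]\s+(.*\S)\s*$", line)
        if match:
            bullets.append(match.group(1))
    return bullets[:MAX_TAKEAWAYS]


def format_as_markdown(bullets: list[str]) -> str:
    """Render the takeaways as a Markdown bullet list."""
    return "\n".join(f"- {bullet}" for bullet in bullets)
```

Returning an empty list when no bullets are found fits the `if summary:` check in the flow: an empty response, which the prompt requests for texts under 200 words, is never cached or rendered.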
---
## Frontend Rendering
The AISummary component is a client component that fetches the summary separately from the post:
```tsx
const AISummary = dynamic(() => import("@/components/blog/AISummary"), { ssr: true });
```
It's dynamically imported with `ssr: true` — the summary is generated at request time and included in the initial HTML. No client-side loading spinner.
The component renders three view modes:
```tsx
function AISummary({ postSlug }: { postSlug: string }) {
const { data, isLoading } = useSummary(postSlug);
if (isLoading) return