
Building a Self-Correcting Fact-Checker with DSPy

By Raja Patnaik

LLMs can sound confident even when they’re wrong. Retrieval‑Augmented Generation (RAG) helps, but it doesn’t guarantee the model will stick to the retrieved evidence. In this post, we’ll build a self‑correcting fact‑checker in DSPy that:

  1. retrieves evidence from Wikipedia,
  2. answers only from that evidence, and
  3. verifies the answer.

If verification flags unsupported claims, the system automatically retries with feedback until the answer is fully supported, using dspy.Refine. The entire code sample is on GitHub.


Why DSPy for self‑correction?

DSPy lets you compose LM programs from small modules (e.g., ChainOfThought), then add search, verification, and retry as code - not prompt hacks. We configure OpenAI’s gpt-4o-mini in one line and use Refine to drive iterative improvement until a reward function says we’re done. (DSPy)


What we’ll build

  • Retriever: WikipediaRetriever queries the Wikipedia API and returns passages.

    Note: DSPy’s Retrieve expects the retriever to return items with a .long_text attribute (we use dotdict). (GitHub)

  • Generator: GenerateAnswer produces a concise answer only from the given context.

  • Verifier: VerifyAnswer lists any claims not supported by the context.

  • Refiner: dspy.Refine wraps the generator and automatically retries with feedback until the verifier returns “None” (no unsupported claims).
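To make the retriever's contract concrete, here is a framework-free sketch of the return shape `dspy.Retrieve` expects. The `dotdict` below is a minimal stand-in for the one DSPy ships, and `to_passages` plus the extra `title`/`url` fields are illustrative, not part of any DSPy API:

```python
class dotdict(dict):
    """Minimal stand-in for DSPy's dotdict: a dict with attribute access."""
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)

def to_passages(raw_results, max_chars=1500):
    """Shape raw (title, url, text) tuples into objects dspy.Retrieve can consume."""
    passages = []
    for title, url, text in raw_results:
        passages.append(dotdict(
            long_text=text[:max_chars],  # the attribute dspy.Retrieve reads
            title=title,                 # extra metadata (illustrative)
            url=url,
        ))
    return passages

passages = to_passages([
    ("Apollo 11", "https://en.wikipedia.org/wiki/Apollo_11",
     "Apollo 11 was the American spaceflight that first landed humans on the Moon."),
])
print(passages[0].long_text[:9])  # -> Apollo 11
```

Any retriever that returns a list of such objects will slot into `dspy.Retrieve` without adapter code.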


Setup

pip install dspy-ai wikipedia
# Set your key:
# Windows (persist): setx OPENAI_API_KEY "sk-..."
# PowerShell (session): $env:OPENAI_API_KEY="sk-..."
# macOS/Linux: export OPENAI_API_KEY="sk-..."

The full script

See fact_check_rag.py in the GitHub repository. Key parts:

1) Configure the LM and retriever

import os
import dspy

lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ.get("OPENAI_API_KEY"))
dspy.configure(lm=lm)  # sets the default LM globally
wiki_rm = WikipediaRetriever(max_chars_per_passage=1500, language="en")  # defined in the script
dspy.settings.configure(rm=wiki_rm)  # registers the retriever for dspy.Retrieve

2) Retrieval → Generation → Verification

  • self.retrieve = dspy.Retrieve(k=4) calls our retriever and returns the passages.
  • self.generate_answer = dspy.ChainOfThought(GenerateAnswer) writes the answer.
  • self.verify_answer = dspy.ChainOfThought(VerifyAnswer) checks support.
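Stripped of DSPy specifics, the forward pass is just these three calls in sequence. The stubs below stand in for the actual `dspy.Retrieve` / `dspy.ChainOfThought` modules; the function names and return shapes are illustrative only:

```python
def run_pipeline(question, retrieve, generate_answer, verify_answer, k=4):
    """Retrieval -> generation -> verification, as plain function calls.

    The three callables are stand-ins for the DSPy modules in the real script.
    """
    passages = retrieve(question, k=k)            # evidence
    context = "\n\n".join(passages)
    answer = generate_answer(context, question)   # answer grounded in context
    unsupported = verify_answer(context, answer)  # "None" if fully supported
    return {"answer": answer, "unsupported_claims": unsupported, "context": passages}

# Toy stubs, just to exercise the control flow:
result = run_pipeline(
    "When did Apollo 11 land?",
    retrieve=lambda q, k: ["Apollo 11 landed on July 20, 1969."],
    generate_answer=lambda ctx, q: "July 20, 1969",
    verify_answer=lambda ctx, ans: "None",
)
print(result["unsupported_claims"])  # -> None
```

The point of the separation is that each stage can be swapped or tested in isolation.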

3) Self‑correction with Refine

We define a small reward function that calls the verifier. If it returns “None”, the reward is 1.0 and Refine stops; otherwise, Refine injects feedback and retries up to N times:

self.refine_generate = dspy.Refine(
    module=self.generate_answer,
    N=max_attempts,
    reward_fn=reward_fn,
    threshold=1.0,
)
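The reward callback itself can be a string check on the verifier's output. A sketch, assuming the verifier writes its findings to an `unsupported_claims` field and that the callback receives the module's input args plus its prediction; treat both shapes as assumptions and check the `dspy.Refine` docs for your installed version:

```python
from types import SimpleNamespace

def make_reward_fn(verify_answer):
    """Build a Refine reward callback that runs the verifier on each attempt."""
    def reward_fn(args, pred):
        # args is assumed to carry the module inputs (including the retrieved
        # context); pred is the generator's attempted answer.
        verdict = verify_answer(context=args["context"], answer=pred.answer)
        claims = verdict.unsupported_claims.strip()
        return 1.0 if claims.lower() == "none" else 0.0
    return reward_fn

# Demo with a stub verifier that always passes:
stub_verifier = lambda context, answer: SimpleNamespace(unsupported_claims="None")
fn = make_reward_fn(stub_verifier)
print(fn({"context": "..."}, SimpleNamespace(answer="July 20, 1969")))  # -> 1.0
```

Because the threshold is 1.0, a single unsupported claim drops the reward to 0.0 and triggers another attempt with feedback.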

Running it

Try:

“When did Apollo 11 land on the Moon, and who were the astronauts involved?”

You’ll see:

  • Final Answer - concise and grounded in the retrieved Wikipedia passages.
  • Unsupported Claims - should print None if verification passes.
  • Context used - the exact passages (with titles + URLs) that grounded the answer.

Customize

  • Swap the question for any topic Wikipedia covers well.
  • Adjust k_passages (retrieval breadth) and max_attempts (strictness).
  • Adapt the verifier’s wording if you want stricter or looser checks.

Takeaways

  • Separate concerns: retrieval (evidence), generation (answer), verification (checks).
  • Make it self‑correcting: wrap generation with Refine and a simple reward.
  • Mind the interface: custom retrievers should return objects with .long_text for smooth integration with dspy.Retrieve. (GitHub)