Formal Verification of Chain-of-Thought Reasoning
Chain-of-Thought (CoT) prompting dramatically improves LLM reasoning. But how do we know each step in the chain is valid? This post explores formal verification approaches to CoT.
The AI industry's response to hallucinations has been: train harder, fine-tune more, add RLHF. But this approach has a fundamental flaw — you can't train probability to be certainty.
A Simple Question for a Complex Future
Imagine this: You ask an AI to write a program. It works perfectly. But when you look at the code, you see this:
```python
print((lambda p,q,m:(lambda n,phi:(lambda e,d:(pow(m,e,n),pow(pow(m,e,n),d,n)))(65537,pow(65537,-1,phi)))(p*q,(p-1)*(q-1)))(61,53,42))
```
What does this do? It encrypts and decrypts the number 42 using RSA. It works correctly. But can you verify that just by reading it?
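One way to see what the one-liner is doing is to unpack it into named steps. The sketch below is my expansion, not the original author's code; the variable names simply mirror the lambda parameters (p, q, m, n, phi, e, d), and it needs Python 3.8+ for the modular inverse via pow(e, -1, phi).

```python
# Readable unpacking of the one-liner above (a sketch; the arithmetic is the same).
p, q, m = 61, 53, 42               # two small primes and the message
n = p * q                          # RSA modulus: 3233
phi = (p - 1) * (q - 1)            # Euler's totient of n: 3120
e = 65537                          # public exponent
d = pow(e, -1, phi)                # private exponent: inverse of e mod phi (Python 3.8+)

ciphertext = pow(m, e, n)          # encryption: m^e mod n
plaintext = pow(ciphertext, d, n)  # decryption: c^d mod n

assert plaintext == m              # round-trip check: we recover 42
print(ciphertext, plaintext)
```

Even in this readable form, the assert only shows that a single message round-trips. It tests the code; it does not prove it correct for every input. That gap between testing and proof is what the rest of this post is about.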
This is the future of code. And we need to prepare.