Lead · Fundamentals
RLHF teaches Claude what responses humans prefer. Constitutional AI teaches it what responses are actually right. Their combination is what makes Claude both helpful and honest.
Sophie Marlowe
·
June 03, 2026
A freshly trained language model is like a scholar who has read a vast amount of books but has no idea what "humans want." It can generate text — but not necessarily helpful, safe, or honest text. So how did Anthropic turn it into Claude?
## Pre-training Is Just the Starting Point
All large language models begin with "pre-training" — exposing the model to massive text data to learn...