Original analysis at the intersection of AI alignment, cognitive science, and system design.
The usual diagnosis is that sycophancy is a values problem, a training problem, or a reward model problem. This essay argues it's a perception problem: alignment techniques haven't solved it because they correct the output of a system that cannot perceive what it should be optimizing for.
Read the essay