Temporal Predictors of Outcome in Reasoning Language Models
We probe how early a Large Language Model (LLM) internally commits to its eventual outcome by training linear classifiers on hidden states after the first 𝑡 reasoning tokens, and find that eventual correctness is highly predictable after only a few tokens. We further show that the drop in predictive accuracy on harder questions reflects a selection artifact: hard items are disproportionately represented among long chains of thought (CoTs).
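As a concrete illustration of the probing setup, the sketch below trains a linear classifier (logistic regression) on the hidden state taken after the first 𝑡 reasoning tokens to predict eventual correctness. All names here are illustrative assumptions rather than the released code: the model identifier, the `records` format (prompt, chain of thought, correctness label), and the choice of last-layer, last-position hidden state.

```python
# Minimal sketch of the probing setup (assumptions: HuggingFace causal LM,
# records of the form {"prompt": str, "cot": str, "correct": 0/1}).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed reasoning model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).eval()

def hidden_state_at(prompt: str, cot: str, t: int) -> torch.Tensor:
    """Last-layer hidden state after the first t reasoning tokens of the CoT."""
    ids = tok(prompt + cot, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    ids = ids[:, : n_prompt + t]                       # truncate the CoT at t tokens
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[-1][0, -1].float()        # final position, last layer

def probe_accuracy(records, t: int) -> float:
    """Cross-validated accuracy of a linear probe trained at reasoning step t."""
    X = torch.stack([hidden_state_at(r["prompt"], r["cot"], t) for r in records]).numpy()
    y = [r["correct"] for r in records]
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5).mean()

# Example sweep over reasoning prefixes (records supplied by the evaluation set):
# for t in (4, 16, 64, 256):
#     print(t, probe_accuracy(records, t))
```

Sweeping 𝑡 in this way yields the accuracy-versus-token-position curves the abstract refers to; comparing curves restricted to short versus long CoTs is what exposes the selection artifact.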