Training loss vs. steps
I  Sharp drop — token frequencies, punctuation
II  Steady descent — grammar → facts → reasoning
III  Plateau — diminishing returns, data/model saturation
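
The three phases above can be illustrated with a toy curve. The sketch below is only an illustration under stated assumptions: the power-law form of `synthetic_loss`, its constants, and the `sharp`/`flat` thresholds in `phase` are all hypothetical choices that do not come from the figure. It labels a step by how much the loss falls over the next step.

```python
def synthetic_loss(step, floor=2.0, scale=8.0, exponent=0.5):
    """Hypothetical loss curve: floor + scale * step**(-exponent).

    The functional form and constants are assumptions for illustration only.
    """
    return floor + scale * step ** (-exponent)


def phase(step, sharp=1e-2, flat=1e-4):
    """Label a step I, II, or III by the size of the per-step loss decrease.

    The thresholds `sharp` and `flat` are arbitrary illustrative cutoffs.
    """
    drop = synthetic_loss(step) - synthetic_loss(step + 1)
    if drop > sharp:
        return "I"    # sharp drop
    if drop > flat:
        return "II"   # steady descent
    return "III"      # plateau: diminishing returns


if __name__ == "__main__":
    for s in (1, 500, 100_000):
        print(s, round(synthetic_loss(s), 4), phase(s))
```

On a real run the boundaries would be read off the logged loss rather than a closed-form curve, and a smoothed slope is usually more robust than a single-step difference.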