May 24, 2024

Not All Language Model Features Are Linear

From MIT

Recent work has proposed the linear representation hypothesis: that language models perform computation by manipulating one-dimensional representations of concepts (“features”) in activation space.
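To make "one-dimensional representation" concrete, here is a minimal sketch of what the hypothesis would imply, assuming a feature corresponds to a single direction in activation space and its value is the scalar projection onto that direction. The names `direction`, `write_feature`, and `read_feature`, and the width of 8, are illustrative assumptions and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # hypothetical activation width, chosen only for illustration

# Hypothetical unit-norm direction standing in for one "feature".
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)


def write_feature(intensity, other):
    """Build an activation with `intensity` along `direction`,
    plus whatever part of `other` is orthogonal to that direction."""
    orthogonal = other - np.dot(other, direction) * direction
    return intensity * direction + orthogonal


def read_feature(activation):
    """Recover the scalar feature value by projecting onto `direction`."""
    return float(np.dot(activation, direction))


activation = write_feature(2.5, rng.normal(size=dim))
recovered = read_feature(activation)
print(recovered)
```

Under this picture a single number fully describes the feature. A feature that is not linear in this sense would need more than one coordinate to describe, so no single projection like `read_feature` would capture it.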
