From MIT
Not all language model features are linear.
Recent work has proposed the linear representation hypothesis: that language models perform computation by manipulating one-dimensional representations of concepts (“features”) in activation space.
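To make "one-dimensional representation" concrete, here is a minimal sketch. It assumes a feature corresponds to a single direction in activation space, so that its value is the scalar projection of the activation onto that direction. The names `read_feature`, `write_feature`, and `d_model`, and the random vectors, are hypothetical illustrations and are not taken from the paper.

```python
import numpy as np


def read_feature(activation, direction):
    """Return the scalar value of a feature: the projection onto its unit direction."""
    unit = direction / np.linalg.norm(direction)
    return float(activation @ unit)


def write_feature(activation, direction, value):
    """Set the feature's scalar value, leaving the orthogonal component unchanged."""
    unit = direction / np.linalg.norm(direction)
    return activation + (value - activation @ unit) * unit


# Hypothetical setup: a 16-dimensional activation space and a random feature direction.
rng = np.random.default_rng(0)
d_model = 16
direction = rng.normal(size=d_model)
activation = rng.normal(size=d_model)

# "Manipulating" the feature means moving the activation along one line only.
steered = write_feature(activation, direction, 3.0)
```

Under this assumption, reading `steered` back with `read_feature` recovers the value that was written, and everything orthogonal to `direction` is untouched. A feature that needs more than one scalar to describe would not fit this picture.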