Testing two families of large language models (LLMs) (GPT and LLaMA2) on a battery of measurements spanning different theory of mind abilities, Strachan et al. find that the performance of LLMs can mirror that of humans on most of these tasks. The authors explored potential reasons for this.
Comments are closed.