Apple researchers develop AI that can ‘see’ and understand screen context

Apple researchers have developed a new artificial intelligence system that can understand ambiguous references to on-screen entities as well as conversational and background context, enabling more natural interactions with voice assistants, according to a paper published on Friday.

The system, called ReALM (Reference Resolution As Language Modeling), leverages large language models to convert the complex task of reference resolution — including understanding references to visual elements on a screen — into a pure language modeling problem. This allows ReALM to achieve substantial performance gains compared to existing methods.

Blog