They reconstructed Feynman’s “restaurant problem” and proved that his solution was mathematically optimal. The challenge belongs to a class of problems known as “explore versus exploit” decisions—a tradeoff that appears everywhere from choosing restaurants and dating partners to scientific research and artificial intelligence. Explore too much, and you waste opportunities enjoying known good options. Exploit too soon, and you might miss something even better.
Feynman’s restaurant problem is an instance of what is known as an optimal stopping problem (7, 8). As such, it falls in the same category as the famous secretary problem (9), in which an interviewer seeks to maximize the probability of hiring the best candidate for a position but can only evaluate those candidates relative to one another. This problem can be translated to the dining setting by assuming the goal is to maximize the probability of selecting the best restaurant over a series of meals. However, Feynman’s problem differs from the classic secretary problem in three ways: the distribution from which the restaurants are drawn is known, the diner is able to return to restaurants that they visited previously, and the goal is to maximize the total score across nights rather than the probability of identifying the single best option.
Feynman’s restaurant problem is also closely related to the finite-horizon multi-armed bandit problem (10, 11), in which a decision-maker is presented with a set of options that differ in their payoffs (such as different arms of a gambling machine) and seeks to maximize the total payoff received from trying those options a fixed number of times. Again, this could be translated to the dining context, treating the restaurants as the different options. The key difference here is that in the multiarmed bandit problem the payoffs are usually stochastic, with a distribution around the true value, while in Feynman’s problem the true value of a restaurant is directly observed. Like the multiarmed bandit problem, Feynman’s restaurant problem creates a tension between exploring new options and exploiting knowledge acquired so far, but does so without dealing with uncertain observations.
Optimal stopping problems often arise in everyday life, appearing not just in choosing what to eat, but in finding a home, deciding who to marry, selecting a parking spot, and knowing when to quit a job (12). Extensive literatures have explored how people solve variants of the secretary problem (13 – 17) and the multiarmed bandit problem (18 – 25). Feynman’s restaurant problem is a valuable addition to this canon: by removing uncertain observations, it makes it possible to study the explore–exploit tradeoff in a particularly pure manner, and the existence of closed-form expressions for the optimal policy facilitates its comparison to human behavior. In fact, previous behavioral experiments have used optimal stopping problems that are similar to (26 – 28) or variants on (29 and 30) Feynman’s restaurant problem (for a detailed breakdown see Discussion). Here, we make use of the optimal solutions we derived for multiple variants of this problem, together with an innovative experimental design that allows us to get an unusually clear picture of people’s behavior and to draw direct parallels between results in the psychological literature and the solution found by Feynman. As a consequence, we hope to not just re-solve the problem that Feynman first posed more than 40 y ago, but to resolve the question of how people perform such tasks.