Explanations as Model Reconciliation

Do we want a world where AI agents converse and collaborate with us as naturally as we do with one another? For many obvious reasons, the answer is yes. Achieving this requires building AI agents with “explainable behavior”—the ability to explain their decisions and actions in terms humans can understand. Explanation is not merely a fundamental human trait; it is also instrumental to our collective progress. To engineer such agents, we should start from the foundations of human interaction and communication.

Entering this domain inevitably brings us to the Theory of Mind (ToM), formulated by Premack and Woodruff in 1978. ToM offers an indispensable framework for understanding human cognition and social dynamics: it is the capacity to attribute mental states, such as beliefs and intentions, to others, even when those states differ from one’s own. This capacity is essential for anticipating others’ behavior, and it lays the groundwork for AI agents capable of nuanced human interaction.

Social interactions, however complex, benefit from this ability: correctly attributing mental models mitigates misunderstandings and makes interactions smoother. For example, when formulating a shared plan or goal, ToM is the core competency required; each participant must understand the other’s intentions and coordinate actions accordingly to fulfill a mutual objective.

It’s worth noting that effective communication about differing mental states relies on a common linguistic foundation: both parties’ mental models must be expressible in mutually comprehensible terms. This underscores ToM’s role as an essential socio-cognitive skill, one we exercise intuitively in everyday interaction as part of our social fabric. For a fuller discussion of the development and significance of ToM, Baron-Cohen’s 1999 publication is a valuable resource.

Within AI, the Model Reconciliation Problem (MRP), introduced by Chakraborti et al. in 2017, builds on the foundational importance of Theory of Mind (ToM). In planning and MRP, a mental model is formulated as a PDDL (Planning Domain Definition Language) specification encompassing the fluents, predicates, objects, and permissible actions of a given problem. A key assumption is that the agent’s model is correct and complete, while the human user’s model may contain inaccuracies or gaps.
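To make this formulation concrete, a planning model of this kind can be sketched as a small record of fluents, an initial state, a goal, and actions. This is an illustrative reduction, not real PDDL syntax, and the toy domain (`has_key`, `unlock`, and so on) is entirely hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy domain; a stand-in for a PDDL model, not a parser.
@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # fluents that must hold to execute
    effects: frozenset        # fluents made true by executing

@dataclass
class Model:
    fluents: frozenset        # all fluents this model knows about
    initial: frozenset        # fluents true in the initial state
    goal: frozenset           # fluents required at the end
    actions: dict             # action name -> Action

unlock = Action("unlock", frozenset({"has_key"}), frozenset({"door_open"}))
walk = Action("walk", frozenset({"door_open"}), frozenset({"at_goal"}))

# The agent's model is assumed correct and complete...
agent_model = Model(frozenset({"has_key", "door_open", "at_goal"}),
                    frozenset({"has_key"}), frozenset({"at_goal"}),
                    {"unlock": unlock, "walk": walk})

# ...while the human's model may have gaps (no 'unlock' action here).
human_model = Model(frozenset({"door_open", "at_goal"}),
                    frozenset(), frozenset({"at_goal"}),
                    {"walk": walk})

# The model difference is exactly what an explanation must convey.
missing_actions = agent_model.actions.keys() - human_model.actions.keys()
print(missing_actions)  # prints {'unlock'}
```

Representing each model as explicit sets makes the “model difference” a literal set difference, which is the intuition the next paragraphs build on.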

In standard MRP scenarios, the need for explanation arises when the agent’s plan cannot be understood under the human user’s model, often because that model lacks elements necessary for the plan’s validity. The agent then aims to “reconcile” the model differences by supplying the missing information, enabling the human user to update their model and subsequently understand the plan.
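The reconciliation step above can be sketched end to end. In this self-contained toy (all names hypothetical), a model is a dict of initial facts and actions; the agent greedily transfers every fact and action the human model lacks and returns the transferred pieces as the explanation. Note that the actual MRP formulation searches for a *minimal* such set via model-space search, which this greedy sketch deliberately skips.

```python
# Illustrative reconciliation sketch. A model is a dict with initial
# facts and actions mapping to (preconditions, effects) over fluents.
def plan_valid(model, plan, goal):
    """True iff the plan executes (each action's preconditions hold
    in turn) and reaches the goal, under simple set-based semantics."""
    state = set(model["initial"])
    for name in plan:
        if name not in model["actions"]:
            return False
        pre, eff = model["actions"][name]
        if not pre <= state:
            return False
        state |= eff
    return goal <= state

def reconcile(agent, human, plan, goal):
    """Greedy sketch: copy initial facts and actions the human model
    lacks, then return what was sent as the explanation. (MRP proper
    searches for a minimal explanation instead.)"""
    explanation = []
    for fact in agent["initial"] - human["initial"]:
        human["initial"].add(fact)
        explanation.append(("fact", fact))
    for name, defn in agent["actions"].items():
        if name not in human["actions"]:
            human["actions"][name] = defn
            explanation.append(("action", name))
    return explanation if plan_valid(human, plan, goal) else None

# Example: the human knows neither the key nor the 'unlock' action.
agent = {"initial": {"has_key"},
         "actions": {"unlock": (frozenset({"has_key"}), {"door_open"}),
                     "walk": (frozenset({"door_open"}), {"at_goal"})}}
human = {"initial": set(),
         "actions": {"walk": (frozenset({"door_open"}), {"at_goal"})}}
plan, goal = ["unlock", "walk"], {"at_goal"}

assert not plan_valid(human, plan, goal)   # plan is inexplicable at first
explanation = reconcile(agent, human, plan, goal)
print(explanation)  # prints [('fact', 'has_key'), ('action', 'unlock')]
```

After reconciliation the plan checks out in the updated human model, mirroring the idea that an explanation is precisely the model update that makes the plan comprehensible.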

Notably, MRP inherently acknowledges that the human user has an individualized model of the planning problem. Explanations are framed in terms of these model discrepancies, rendering model reconciliation a pivotal mechanism in explanation generation. This approach is particularly relevant in social contexts requiring intentional communication of information, highlighting the natural alignment of MRP with human social interaction dynamics.


  1. Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences.
  2. Baron-Cohen, S. (1999). The Evolution of a Theory of Mind. MIT Press.
  3. Chakraborti, T., et al. (2017). Plan Explanations as Model Reconciliation: Moving Beyond Explanation as Soliloquy. In Proceedings of the 26th International Joint Conference on Artificial Intelligence.