Ever since the birth of Artificial Intelligence (AI) at the Dartmouth workshop in 1956, researchers have debated the exact role that AI will play, and should play, in society. While some have envisioned a romanticized version of AI, incorporated into the narratives of 20th-century movies, successful AI developments are often closer to J. C. R. Licklider’s vision of AI, which emphasizes a collaborative relationship between humans and AI and focuses on hybrid human-AI decision making.
In the Multi-Agent Systems group at MPI-SWS, we study multi-agent sequential decision making using formal frameworks that can capture nuances often present in human-AI collaborative settings. Specifically, we study different aspects of agent-to-agent interaction in settings where agents share a common goal, but can have different perceptions of reality. The overall goal is to design a more effective AI decision maker that accounts for the behavior of its collaborators and compensates for their imperfections. To achieve this goal, the AI decision maker can use steering policies to nudge its collaborators to adopt better policies, i.e., policies that lead to an improved joint outcome. In what follows, we summarize some of our recent results related to this agenda.
Accounting for misaligned world-views. An effective way to model behavioral differences between humans and modern AI tools (based on machine learning) is through a model that captures the misalignment in how the agents perceive their environment. Using this approach, we have proposed a new computational model, called the Multi-View Decision Process, suitable for modeling two-agent cooperative scenarios in which agents agree on their goals, but disagree on how their actions affect the state of the world [1]. This framework enables us to formally analyze the utility of accounting for the misalignment in agents’ world-views when only one of the agents has a correct model of the world. Our results show that modeling such a misalignment is not only beneficial, but critical. The main takeaway is that, to facilitate a more successful collaboration among agents, it is not sufficient to make one agent (more) accurate in its world-view: naively improving the accuracy of one agent can degrade the joint performance unless one explicitly accounts for the imperfections of the other agent. To this end, we have developed an algorithm for finding an approximately optimal steering policy for the agent with the correct world-view.
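The effect described above can be illustrated with a minimal one-shot sketch. It is not the algorithm from [1]: the setting is reduced to a single joint decision, the success probabilities in `P_TRUE` and `P_HUMAN` are made-up numbers, and all function names are hypothetical. The AI knows the true model, while the human best-responds to the AI's action under its own (incorrect) model.

```python
# Illustrative assumption: P[a1][a2] is the probability that the shared goal
# is reached when the AI plays a1 and the human plays a2.
P_TRUE = [[0.9, 0.1], [0.5, 0.6]]   # correct model, known to the AI
P_HUMAN = [[0.2, 0.8], [0.3, 0.7]]  # human's misaligned belief


def human_best_response(a1, p_human):
    """Action the human picks after seeing a1, judged by its own model."""
    return max(range(len(p_human[a1])), key=lambda a2: p_human[a1][a2])


def naive_ai_action(p_true):
    """AI's half of the joint optimum under the true model.

    This ignores the fact that the human may not play the matching action.
    """
    pairs = [(a1, a2) for a1 in range(len(p_true))
             for a2 in range(len(p_true[a1]))]
    best_a1, _ = max(pairs, key=lambda p: p_true[p[0]][p[1]])
    return best_a1


def true_value(a1, p_true, p_human):
    """True success probability once the human has responded to a1."""
    return p_true[a1][human_best_response(a1, p_human)]


def steering_ai_action(p_true, p_human):
    """AI action that maximizes the true value given the human's response."""
    return max(range(len(p_true)),
               key=lambda a1: true_value(a1, p_true, p_human))


naive_value = true_value(naive_ai_action(P_TRUE), P_TRUE, P_HUMAN)
steering_value = true_value(steering_ai_action(P_TRUE, P_HUMAN), P_TRUE, P_HUMAN)
print(naive_value, steering_value)
```

With these illustrative numbers, the naive AI aims for the joint optimum (success probability 0.9) but ends up at 0.1 because the human deviates, whereas the steering AI settles for an action whose realized value is 0.6.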
Adapting to a non-stationary collaborator. In addition to accounting for a misalignment in world-views, decision makers must also account for the effects of their behavior on other agents. Namely, decision makers respond to each other's behavior, leading to behavior that is non-stationary, i.e., that changes over time. In the context of human-AI collaboration, this might happen if the human agent changes their behavior over time, for example, as they learn to interact with the AI agent. Such non-stationary behavior of the human agent could have a negative impact on the collaboration, and can lead to substantially worse performance unless the AI agent adapts to the changing behavior of the human agent. We can model this situation with a two-agent setting similar to the one presented above, but which allows agents to change their behavior as they interact over time [2]. The agent with the correct world-view now has to adapt to the non-stationary behavior of its collaborator. We have proposed a learning procedure that has provable guarantees on the joint performance under the assumption that the behavior of the other agent does not change abruptly over time. We have shown that this assumption is not trivial to relax: obtaining the same guarantees without it would require solving a computationally intractable problem.
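To make the role of slow drift concrete, here is a small sketch under strong simplifying assumptions. It is not the learning procedure from [2]: the interaction is a repeated two-action coordination game, the drift schedule in `human_policy` is invented, and the adaptive AI is assumed to observe the human's policy from the previous round. All names are hypothetical.

```python
# Cooperative payoff: both agents receive 1 when their actions match, else 0.
PAYOFF = [[1.0, 0.0], [0.0, 1.0]]
ROUNDS = 9


def human_policy(t):
    """Assumed drift: probability that the human plays action 1 in round t."""
    return (t + 1) / 10


def expected_payoff(a1, q):
    """Expected joint payoff of AI action a1 when the human plays 1 w.p. q."""
    return (1 - q) * PAYOFF[a1][0] + q * PAYOFF[a1][1]


def best_response(q):
    """AI action with the highest expected payoff (ties go to action 0)."""
    return max(range(len(PAYOFF)), key=lambda a1: expected_payoff(a1, q))


def total_payoff(adaptive):
    """Sum of expected payoffs over all rounds.

    The adaptive AI best-responds to the previous round's human policy;
    the static AI best-responds to the initial policy forever.
    """
    total = 0.0
    for t in range(ROUNDS):
        observed = human_policy(max(t - 1, 0)) if adaptive else human_policy(0)
        total += expected_payoff(best_response(observed), human_policy(t))
    return total


static_total = total_payoff(adaptive=False)
adaptive_total = total_payoff(adaptive=True)
print(static_total, adaptive_total)
```

Because the human's policy moves by only 0.1 per round, a one-round-old observation is still a good guide, and the adaptive AI collects about 6.3 in total expected payoff against about 4.5 for the static one. If the policy could jump arbitrarily between rounds, the lagged observation would carry no such guarantee.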
Steering via environment design. The previous two cases consider indirect steering policies, in which the agent with the correct model implicitly influences the behavior of its collaborator by acting in the world. A more explicit influence is obtained if the actions of this agent directly change the world-view of its collaborator. In the context of human-AI collaboration, the AI agent could shape the environment to nudge the human agent to adopt a more efficient decision policy. This can be done through reward shaping, i.e., by making some actions more costly for humans in terms of effort, or through dynamics shaping, i.e., by changing the perceived influence that the human’s actions have on the world. In machine learning terminology, such a steering strategy is nothing but a form of adversarial attack by the AI agent (the attacker) on the human agent. In our recent work [3], we have characterized how to optimally perform these types of attacks and how costly they are from the attacker’s point of view.
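Reward shaping and its cost can be sketched in the simplest possible case, a single decision with one reward per action. This is an illustration only: it is not the optimization studied in [3], the modification below is one simple choice and is not claimed to be the cheapest, and the reward values, the margin `eps`, and the function names are assumptions made for the example.

```python
def shape_rewards(rewards, target, eps):
    """Make every non-target action at least eps worse than the target.

    Only competing actions are made more costly; the target's reward is
    left unchanged. Returns the modified reward list.
    """
    cap = rewards[target] - eps
    return [r if a == target else min(r, cap) for a, r in enumerate(rewards)]


def attack_cost(original, shaped):
    """Total absolute change in rewards (an L1 cost for the attacker)."""
    return sum(abs(o - s) for o, s in zip(original, shaped))


def greedy_action(rewards):
    """Action chosen by an agent that maximizes its perceived reward."""
    return max(range(len(rewards)), key=lambda a: rewards[a])


original = [1.0, 0.5, 0.8]  # made-up rewards; action 0 is preferred initially
shaped = shape_rewards(original, target=1, eps=0.1)
print(greedy_action(original), greedy_action(shaped), attack_cost(original, shaped))
```

After shaping, the greedy agent switches from action 0 to the target action 1, and the attacker pays roughly 1.0 in total reward change. Raising the target's reward instead could be cheaper here, which is why characterizing the optimal attack is a non-trivial question.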
References:
[1] Dimitrakakis, C., Parkes, D.C., Radanovic, G. and Tylkin, P., 2017. Multi-view Decision Processes: The Helper-AI Problem. In Advances in Neural Information Processing Systems.
[2] Radanovic, G., Devidze, R., Parkes, D.C. and Singla, A., 2019. Learning to Collaborate in Markov Decision Processes. In International Conference on Machine Learning.
[3] Rakhsha, A., Radanovic, G., Devidze, R., Zhu, X. and Singla, A., 2020. Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning. In International Conference on Machine Learning.