Research Overview

Our lab focuses on enabling robots to interact robustly with users in unconstrained and dynamic environments. We draw upon insights from artificial intelligence, human-robot interaction, procedural content generation, and quality diversity optimization to make fundamental advances in both developing interactive robots that assist users in complex, real-world tasks, as well as in generating complex, diverse, and realistic scenarios that effectively test the developed systems to enhance their robustness.

Research Contribution #1: Robot Assistance in Human Environments

In order for robots to be effective assistants in human environments, they need to be able to adapt to the human user, based on the user’s physical characteristics, their individualized preferences, and their priorities on how to execute the task. For example, a robot that helps a post-stroke patient with hair combing should recognize and follow the user’s hair style. A manufacturing robot that supports human workers in assembly manufacturing should anticipate which tool a worker is going to need next. While much prior work has focused on learning from demonstration, in many such tasks human demonstrations are tedious, time-consuming, or impractical.

We have designed algorithms that efficiently infer the the user’s physical and mental states. This is made possible with very little data by using compact representations of these states and prior knowledge in the form of dominant user preferences and signal temporal logic. Our models have enabled general-purpose robotic arms to proactively assist users in a variety of tasks, such as IKEA assembly, hair combing, and meal preparation.

Design and Evaluation of a Hair Combing System Using a General-Purpose Robotic Arm

N. Dennler, E. Shin, M. Matarić, S. Nikolaidis

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

Abstract

This work introduces an approach for automatic hair combing by a lightweight robot. For people living with limited mobility, dexterity, or chronic fatigue, combing hair is often a difficult task that negatively impacts personal routines. We propose a modular system for enabling general robot manipulators to assist with a hair-combing task. The system consists of three main components. The first component is the segmentation module, which segments the location of hair in space. The second component is the path planning module that proposes automatically-generated paths through hair based on user input. The final component creates a trajectory for the robot to execute. We quantitatively evaluate the effectiveness of the paths planned by the system with 48 users and qualitatively evaluate the system with 30 users watching videos of the robot performing a hair-combing task in the physical world. The system is shown to effectively comb different hairstyles.

Citation
@INPROCEEDINGS{dennler2021combing,
  author={Dennler, Nathaniel and Shin, Eura and Matarić, Maja and Nikolaidis, Stefanos},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
   title={Design and Evaluation of a Hair Combing System Using a General-Purpose Robotic Arm},
   year={2021},
  volume={},
  number={},
  pages={3739-3746},
  doi={10.1109/IROS51168.2021.9636768},
  abstract={This work introduces an approach for automatic hair combing by a lightweight robot. For people living with limited mobility, dexterity, or chronic fatigue, combing hair is often a difficult task that negatively impacts personal routines. We propose a modular system for enabling general robot manipulators to assist with a hair-combing task. The system consists of three main components. The first component is the segmentation module, which segments the location of hair in space. The second component is the path planning module that proposes automatically-generated paths through hair based on user input. The final component creates a trajectory for the robot to execute. We quantitatively evaluate the effectiveness of the paths planned by the system with 48 users and qualitatively evaluate the system with 30 users watching videos of the robot performing a hair-combing task in the physical world. The system is shown to effectively comb different hairstyles.},
}

Two-Stage Clustering of Human Preferences for Action Prediction in Assembly Tasks

H. Nemlekar, J. Modi, S. Gupta, S. Nikolaidis

2021 IEEE International Conference on Robotics and Automation (ICRA), May 2021

Abstract

To effectively assist human workers in assembly tasks a robot must proactively offer support by inferring their preferences in sequencing the task actions. Previous work has focused on learning the dominant preferences of human workers for simple tasks largely based on their intended goal. However, people may have preferences at different resolutions: they may share the same high-level preference for the order of the sub-tasks but differ in the sequence of individual actions. We propose a two-stage approach for learning and inferring the preferences of human operators based on the sequence of sub-tasks and actions. We conduct an IKEA assembly study and demonstrate how our approach is able to learn the dominant preferences in a complex task. We show that our approach improves the prediction of human actions through cross-validation. Lastly, we show that our two-stage approach improves the efficiency of task execution in an online experiment, and demonstrate its applicability in a real-world robot-assisted IKEA assembly.

Citation
@article{nemlekar2021twostage,
  title={Two-Stage Clustering of Human Preferences for Action Prediction in
         Assembly Tasks},
  author={Heramb Nemlekar and Jignesh Modi and Satyandra K. Gupta and Stefanos Nikolaidis},
  journal={2021 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2021},
  month={May},
  url={https://arxiv.org/abs/2103.14994},
  abstract={To effectively assist human workers in assembly tasks a robot must proactively offer support by inferring their preferences in sequencing the task actions. Previous work has focused on learning the dominant preferences of human workers for simple tasks largely based on their intended goal. However, people may have preferences at different resolutions: they may share the same high-level preference for the order of the sub-tasks but differ in the sequence of individual actions. We propose a two-stage approach for learning and inferring the preferences of human operators based on the sequence of sub-tasks and actions. We conduct an IKEA assembly study and demonstrate how our approach is able to learn the dominant preferences in a complex task. We show that our approach improves the prediction of human actions through cross-validation. Lastly, we show that our two-stage approach improves the efficiency of task execution in an online experiment, and demonstrate its applicability in a real-world robot-assisted IKEA assembly.}
}

Personalizing User Engagement Dynamics in a Non-Verbal Communication Game for Cerebral Palsy

N. Dennler, C. Yunis, J. Realmuto, T. Sanger, S. Nikolaidis, M. Matarić

2021 30th IEEE International Conference on Robot Human Interactive Communication (RO-MAN), 2021

Abstract

Children and adults with cerebral palsy (CP) can have involuntary upper limb movements as a consequence of the symptoms that characterize their motor disability, leading to difficulties in communicating with caretakers and peers. We describe how a socially assistive robot may help individuals with CP to practice non-verbal communicative gestures using an active orthosis in a one-on-one number-guessing game. We performed a user study and data collection with participants with CP; we found that participants preferred an embodied robot over a screen-based agent, and we used the participant data to train personalized models of participant engagement dynamics that can be used to select personalized robot actions. Our work highlights the benefit of personalized models in the engagement of users with CP with a socially assistive robot and offers design insights for future work in this area.

Citation
@inproceedings{dennler2021personalizing,
    author={Dennler, Nathaniel and Yunis, Catherine and Realmuto, Jonathan and Sanger, Terence and Nikolaidis, Stefanos and Matarić, Maja},
    booktitle={2021 30th IEEE International Conference on Robot Human Interactive Communication (RO-MAN)},
     title={Personalizing User Engagement Dynamics in a Non-Verbal Communication Game for Cerebral Palsy},
    year={2021},
    pages={873-879},
    doi={10.1109/RO-MAN50785.2021.9515466},
    abstract={Children and adults with cerebral palsy (CP) can have involuntary upper limb movements as a consequence of the symptoms that characterize their motor disability, leading to difficulties in communicating with caretakers and peers. We describe how a socially assistive robot may help individuals with CP to practice non-verbal communicative gestures using an active orthosis in a one-on-one number-guessing game. We performed a user study and data collection with participants with CP; we found that participants preferred an embodied robot over a screen-based agent, and we used the participant data to train personalized models of participant engagement dynamics that can be used to select personalized robot actions. Our work highlights the benefit of personalized models in the engagement of users with CP with a socially assistive robot and offers design insights for future work in this area.}
}

Learning from Demonstrations using Signal Temporal Logic

A. Puranic, J. Deshmukh, S. Nikolaidis

Conference on Robot Learning, November 2020

Abstract

We present a model-based reinforcement learning framework for robot locomotion that achieves walking based on only 4.5 minutes of data collected on a quadruped robot. To accurately model the robot’s dynamics over a long horizon, we introduce a loss function that tracks the model’s prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks, simply by changing the reward function.1 To the best of our knowledge, our approach is more than an order of magnitude more sample efficient than current model-free methods.

Citation
@InProceedings{puranic2020signal,
  title = {Learning from Demonstrations using Signal Temporal Logic},
  author = {Aniruddh Puranic and Jyotirmoy Deshmukh and Stefanos Nikolaidis},
  booktitle = {Conference on Robot Learning},
  year = {2020},
  month = {November},
  abstract = {We present a model-based reinforcement learning framework for robot locomotion that achieves walking based on only 4.5 minutes of data collected on a quadruped robot. To accurately model the robot’s dynamics over a long horizon, we introduce a loss function that tracks the model’s prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks, simply by changing the reward function.1 To the best of our knowledge, our approach is more than an order of magnitude more sample efficient than current model-free methods.}
}

Fair Contextual Multi-Armed Bandits: Theory and Experiments

Y. Chen, A. Cuellar, H. Luo, J. Modi, H. Nemlekar, S. Nikolaidis

36th Conference on Uncertainty in Artificial Intelligence (UAI), August 2020

Abstract

When an AI system interacts with multiple users, it frequently needs to make allocation decisions. For instance, a virtual agent decides whom to pay attention to in a group, or a factory robot selects a worker to deliver a part.Demonstrating fairness in decision making is essential for such systems to be broadly accepted. We introduce a Multi-Armed Bandit algorithm with fairness constraints, where fairness is defined as a minimum rate at which a task or a resource is assigned to a user. The proposed algorithm uses contextual information about the users and the task and makes no assumptions on how the losses capturing the performance of different users are generated. We provide theoretical guarantees of performance and empirical results from simulation and an online user study. The results highlight the benefit of accounting for contexts in fair decision making, especially when users perform better at some contexts and worse at others.

Citation
@InProceedings{chen2020experiments,
  title = {Fair Contextual Multi-Armed Bandits: Theory and Experiments},
  author = {Chen, Yifang and Cuellar, Alex and Luo, Haipeng and Modi, Jignesh and Nemlekar, Heramb and Nikolaidis, Stefanos},
  booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages = {181--190},
  year = {2020},
  editor = {Jonas Peters and David Sontag},
  volume = {124},
  series = {Proceedings of Machine Learning Research},
  month = {03--06 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v124/chen20a/chen20a.pdf},
  url = { http://proceedings.mlr.press/v124/chen20a.html },
  abstract = {When an AI system interacts with multiple users, it frequently needs to make allocation decisions. For instance, a virtual agent decides whom to pay attention to in a group, or a factory robot selects a worker to deliver a part.Demonstrating fairness in decision making is essential for such systems to be broadly accepted. We introduce a Multi-Armed Bandit algorithm with fairness constraints, where fairness is defined as a minimum rate at which a task or a resource is assigned to a user. The proposed algorithm uses contextual information about the users and the task and makes no assumptions on how the losses capturing the performance of different users are generated. We provide theoretical guarantees of performance and empirical results from simulation and an online user study. The results highlight the benefit of accounting for contexts in fair decision making, especially when users perform better at some contexts and worse at others.},
}

Multi-Armed Bandits with Fairness Constraints for Distributing Resources to Human Teammates

H. Claure, Y. Chen, J. Modi, M. Jung, S. Nikolaidis

2020 ACM/IEEE International Conference on Human-Robot Interaction, March 2020

Abstract

How should a robot that collaborates with multiple people decide upon the distribution of resources (e.g. social attention, or parts needed for an assembly)? People are uniquely attuned to how resources are distributed. A decision to distribute more resources to one team member than another might be perceived as unfair with potentially detrimental effects for trust. We introduce a multi-armed bandit algorithm with fairness constraints, where a robot distributes resources to human teammates of different skill levels. In this problem, the robot does not know the skill level of each human teammate, but learns it by observing their performance over time. We define fairness as a constraint on the minimum rate that each human teammate is selected throughout the task. We provide theoretical guarantees on performance and perform a large-scale user study, where we adjust the level of fairness in our algorithm. Results show that fairness in resource distribution has a significant effect on users' trust in the system.

Citation
@inproceedings{claure2020bandits,
  author = {Claure, Houston and Chen, Yifang and Modi, Jignesh and Jung, Malte and Nikolaidis, Stefanos},
  title = {Multi-Armed Bandits with Fairness Constraints for Distributing Resources to Human Teammates},
  year = {2020},
  month = {March},
  isbn = {9781450367462},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3319502.3374806},
  doi = {10.1145/3319502.3374806},
  abstract = {How should a robot that collaborates with multiple people decide upon the distribution of resources (e.g. social attention, or parts needed for an assembly)? People are uniquely attuned to how resources are distributed. A decision to distribute more resources to one team member than another might be perceived as unfair with potentially detrimental effects for trust. We introduce a multi-armed bandit algorithm with fairness constraints, where a robot distributes resources to human teammates of different skill levels. In this problem, the robot does not know the skill level of each human teammate, but learns it by observing their performance over time. We define fairness as a constraint on the minimum rate that each human teammate is selected throughout the task. We provide theoretical guarantees on performance and perform a large-scale user study, where we adjust the level of fairness in our algorithm. Results show that fairness in resource distribution has a significant effect on users' trust in the system.},
  booktitle = {Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction},
  pages = {299–308},
  numpages = {10},
  keywords = {reinforcement learning, multi-armed bandits, trust, fairness},
  location = {Cambridge, United Kingdom},
  series = {HRI '20}
}

Learning Collaborative Action Plans from Unlabeled YouTube Videos

H. Zhang, P. Lai, S. Paul, S. Kothawade, S. Nikolaidis

Robotics Research, The 19th International Symposium, ISRR 2019, October 2019

Abstract

Videos from the World Wide Web provide a rich source of information that robots could use to acquire knowledge about manipulation tasks. Previous work has focused on generating action sequences from unconstrained videos for a single robot performing manipulation tasks by itself. However, robots operating in the same physical space with people need to not only perform actions autonomously, but also coordinate seamlessly with their human counterparts. This often requires representing and executing collaborative manipulation actions, such as handing over a tool or holding an object for the other agent. We present a system for knowledge acquisition of collaborative manipulation action plans that outputs commands to the robot in the form of visual sentence. We show the performance of the system in 12 unlabeled action clips taken from collaborative cooking videos on YouTube. We view this as the first step towards extracting collaborative manipulation action sequences from unconstrained, unlabeled online videos.

Citation
@inproceedings {zhang2019youtube,
  author={Hejia Zhang and Po-Jen Lai and Sayan Paul and Suraj Kothawade and Stefanos Nikolaidis},
  title={Learning Collaborative Action Plans from Unlabeled YouTube Videos},
  booktitle={Robotics Research, The 19th International Symposium, {ISRR} 2019},
  location={Hanoi, Vietnam},
  year={2019},
  month={October},
  abstract={Videos from the World Wide Web provide a rich source of information that robots could use to acquire knowledge about manipulation tasks. Previous work has focused on generating action sequences from unconstrained videos for a single robot performing manipulation tasks by itself. However, robots operating in the same physical space with people need to not only perform actions autonomously, but also coordinate seamlessly with their human counterparts. This often requires representing and executing collaborative manipulation actions, such as handing over a tool or holding an object for the other agent. We present a system for knowledge acquisition of collaborative manipulation action plans that outputs commands to the robot in the form of visual sentence. We show the performance of the system in 12 unlabeled action clips taken from collaborative cooking videos on YouTube. We view this as the first step towards extracting collaborative manipulation action sequences from unconstrained, unlabeled online videos.},
}

Research Contribution #2: Automatic Scenario Generation

For HRI systems to be widely accepted and used, they need to perform robustly in human environments. Traditionally, human-robot interaction (HRI) algorithms are tested with human subject experiments. While these experiments are fundamental in exploring and evaluating human-robot interactions and can lead to exciting and unpredictable behaviors, they are often limited in the number of environments and human actions that they can cover. Exhaustive search of human actions and environments is also computationally prohibitive. This highlights a critical need for automatically generating HRI scenarios; failure to do so can lead to infrequent scenarios that are undiscovered during testing but occur in large-scale real-world deployments, resulting in potentially costly failures.

Our goal is to find scenarios that are diverse, complex and realistic. Our insight is to formulate this as a quality diversity problem, where the goal is to generate scenarios that are diverse with respect to specific measures of interest. Drawing upon insights from procedural content generation, we integrate state-of-the-art quality diversity algorithms with generative models trained with human examples to generate diverse scenarios in simulation that are also complex and realistic.

Deep Surrogate Assisted Generation of Environments

V. Bhatt*, B. Tjanaka*, M. C. Fontaine*, S. Nikolaidis

Neural Information Processing Systems (NeurIPS), November 2022

Evaluating Human-Robot Interaction Algorithms in Shared Autonomy via Quality Diversity Scenario Generation

M. Fontaine, S. Nikolaidis

ACM Transactions on Human-Robot Interaction, July 2021

Abstract

The growth of scale and complexity of interactions between humans and robots highlights the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring diverse scenarios of humans and robots interacting in simulation can improve understanding of the robotic system and avoid potentially costly failures in real-world settings. We formulate this problem as a quality diversity (QD) problem, where the goal is to discover diverse failure scenarios by simultaneously exploring both environments and human actions. We focus on the shared autonomy domain, where the robot attempts to infer the goal of a human operator, and adopt the QD algorithms CMA-ME and MAP-Elites to generate scenarios for two published algorithms in this domain: shared autonomy via hindsight optimization and linear policy blending. Some of the generated scenarios confirm previous theoretical findings, while others are surprising and bring about a new understanding of state-of-the-art implementations. Our experiments show that the QD algorithms CMA-ME and MAP-Elites outperform Monte-Carlo simulation and optimization based methods in effectively searching the scenario space, highlighting their promise for automatic evaluation of algorithms in human-robot interaction.

Citation
@article{fontaine2021evaluating,
  author = {Fontaine, Matthew C. and Nikolaidis, Stefanos},
  title = {Evaluating Human-Robot Interaction Algorithms in Shared Autonomy via Quality Diversity Scenario Generation},
  year = {2021},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3476412},
  doi = {10.1145/3476412},
  abstract = {The growth of scale and complexity of interactions between humans and robots highlights the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring diverse scenarios of humans and robots interacting in simulation can improve understanding of the robotic system and avoid potentially costly failures in real-world settings. We formulate this problem as a quality diversity (QD) problem, where the goal is to discover diverse failure scenarios by simultaneously exploring both environments and human actions. We focus on the shared autonomy domain, where the robot attempts to infer the goal of a human operator, and adopt the QD algorithms CMA-ME and MAP-Elites to generate scenarios for two published algorithms in this domain: shared autonomy via hindsight optimization and linear policy blending. Some of the generated scenarios confirm previous theoretical findings, while others are surprising and bring about a new understanding of state-of-the-art implementations. Our experiments show that the QD algorithms CMA-ME and MAP-Elites outperform Monte-Carlo simulation and optimization based methods in effectively searching the scenario space, highlighting their promise for automatic evaluation of algorithms in human-robot interaction.},
  note = {Just Accepted},
  journal = {J. Hum.-Robot Interact.},
  month = {jul}
}

On the Importance of Environments in Human-Robot Coordination

M. Fontaine*, Y. Hsu*, Y. Zhang*, B. Tjanaka, S. Nikolaidis

Robotics: Science and Systems, July 2021

Abstract

When studying robots collaborating with humans, much of the focus has been on robot policies that coordinate fluently with human teammates in collaborative tasks. However, less emphasis has been placed on the effect of the environment on coordination behaviors. To thoroughly explore environments that result in diverse behaviors, we propose a framework for procedural generation of environments that are (1) stylistically similar to human-authored environments, (2) guaranteed to be solvable by the human-robot team, and (3) diverse with respect to coordination measures. We analyze the procedurally generated environments in the Overcooked benchmark domain via simulation and an online user study. Results show that the environments result in qualitatively different emerging behaviors and statistically significant differences in collaborative fluency metrics, even when the robot runs the same planning algorithm.

Citation
@inproceedings{fontaine2021importance,
    title={On the Importance of Environments in Human-Robot Coordination},
    author={Matthew C. Fontaine and Ya-Chuan Hsu and Yulun Zhang and Bryon Tjanaka and Stefanos Nikolaidis},
    year={2021},
    month={July},
    url={https://arxiv.org/abs/2106.10853},
    booktitle={Proceedings of Robotics: Science and Systems},
    doi={10.15607/RSS.2021.XVII.038},
    abstract={When studying robots collaborating with humans, much of the focus has been on robot policies that coordinate fluently with human teammates in collaborative tasks. However, less emphasis has been placed on the effect of the environment on coordination behaviors. To thoroughly explore environments that result in diverse behaviors, we propose a framework for procedural generation of environments that are (1) stylistically similar to human-authored environments, (2) guaranteed to be solvable by the human-robot team, and (3) diverse with respect to coordination measures. We analyze the procedurally generated environments in the Overcooked benchmark domain via simulation and an online user study. Results show that the environments result in qualitatively different emerging behaviors and statistically significant differences in collaborative fluency metrics, even when the robot runs the same planning algorithm.}
}

A Quality Diversity Approach to Automatically Generating Human-Robot Interaction Scenarios in Shared Autonomy

M. Fontaine, S. Nikolaidis

Robotics: Science and Systems, July 2021

Abstract

The growth of scale and complexity of interactions between humans and robots highlights the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring diverse scenarios of humans and robots interacting in simulation can improve understanding of the robotic system and avoid potentially costly failures in real-world settings. We formulate this problem as a quality diversity (QD) problem, where the goal is to discover diverse failure scenarios by simultaneously exploring both environments and human actions. We focus on the shared autonomy domain, where the robot attempts to infer the goal of a human operator, and adopt the QD algorithm MAP-Elites to generate scenarios for two published algorithms in this domain: shared autonomy via hindsight optimization and linear policy blending. Some of the generated scenarios confirm previous theoretical findings, while others are surprising and bring about a new understanding of state-of-the-art implementations. Our experiments show that MAP-Elites outperforms Monte-Carlo simulation and optimization based methods in effectively searching the scenario space, highlighting its promise for automatic evaluation of algorithms in human-robot interaction.

Citation
@inproceedings{fontaine2021shared,
    title={A Quality Diversity Approach to Automatically Generating Human-Robot
           Interaction Scenarios in Shared Autonomy},
    author={Matthew C. Fontaine and Stefanos Nikolaidis},
    year={2021},
    month={July},
    url={https://arxiv.org/abs/2012.04283},
    booktitle={Proceedings of Robotics: Science and Systems},
    doi={10.15607/RSS.2021.XVII.036},
    abstract={The growth of scale and complexity of interactions between humans and robots highlights the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring diverse scenarios of humans and robots interacting in simulation can improve understanding of the robotic system and avoid potentially costly failures in real-world settings. We formulate this problem as a quality diversity (QD) problem, where the goal is to discover diverse failure scenarios by simultaneously exploring both environments and human actions. We focus on the shared autonomy domain, where the robot attempts to infer the goal of a human operator, and adopt the QD algorithm MAP-Elites to generate scenarios for two published algorithms in this domain: shared autonomy via hindsight optimization and linear policy blending. Some of the generated scenarios confirm previous theoretical findings, while others are surprising and bring about a new understanding of state-of-the-art implementations. Our experiments show that MAP-Elites outperforms Monte-Carlo simulation and optimization based methods in effectively searching the scenario space, highlighting its promise for automatic evaluation of algorithms in human-robot interaction.}
}

Illuminating Mario Scenes in the Latent Space of a Generative Adversarial Network

M. Fontaine, R. Liu, A. Khalifa, J. Modi, J. Togelius, A. Hoover, S. Nikolaidis

AAAI Conference on Artificial Intelligence, February 2021

Abstract

Generative adversarial networks (GANs) are quickly becoming a ubiquitous approach to procedurally generating video game levels. While GAN generated levels are stylistically similar to human-authored examples, human designers often want to explore the generative design space of GANs to extract interesting levels. However, human designers find latent vectors opaque and would rather explore along dimensions the designer specifies, such as number of enemies or obstacles. We propose using state-of-the-art quality diversity algorithms designed to optimize continuous spaces, i.e. MAP-Elites with a directional variation operator and Covariance Matrix Adaptation MAP-Elites, to efficiently explore the latent space of a GAN to extract levels that vary across a set of specified gameplay measures. In the benchmark domain of Super Mario Bros, we demonstrate how designers may specify gameplay measures to our system and extract high-quality (playable) levels with a diverse range of level mechanics, while still maintaining stylistic similarity to human authored examples. An online user study shows how the different mechanics of the automatically generated levels affect subjective ratings of their perceived difficulty and appearance.

Citation
@article{fontaine2021illuminating,
  title={Illuminating Mario Scenes in the Latent Space of a Generative Adversarial Network},
  volume={35},
  url={https://ojs.aaai.org/index.php/AAAI/article/view/16740},
  journal={AAAI Conference on Artificial Intelligence},
  author={Matthew C. Fontaine and Ruilin Liu and Ahmed Khalifa and Jignesh Modi and Julian Togelius and Amy K. Hoover and Stefanos Nikolaidis},
  year={2021},
  month={February},
  abstract={Generative adversarial networks (GANs) are quickly becoming a ubiquitous approach to procedurally generating video game levels. While GAN generated levels are stylistically similar to human-authored examples, human designers often want to explore the generative design space of GANs to extract interesting levels. However, human designers find latent vectors opaque and would rather explore along dimensions the designer specifies, such as number of enemies or obstacles. We propose using state-of-the-art quality diversity algorithms designed to optimize continuous spaces, i.e. MAP-Elites with a directional variation operator and Covariance Matrix Adaptation MAP-Elites, to efficiently explore the latent space of a GAN to extract levels that vary across a set of specified gameplay measures. In the benchmark domain of Super Mario Bros, we demonstrate how designers may specify gameplay measures to our system and extract high-quality (playable) levels with a diverse range of level mechanics, while still maintaining stylistic similarity to human authored examples. An online user study shows how the different mechanics of the automatically generated levels affect subjective ratings of their perceived difficulty and appearance.}
}

Video Game Level Repair via Mixed Integer Linear Programming

H. Zhang*, M. Fontaine*, A. Hoover, J. Togelius, B. Dilkina, S. Nikolaidis

AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, October 2020

Abstract

Recent advancements in procedural content generation via machine learning enable the generation of video-game levels that are aesthetically similar to human-authored examples. However, the generated levels are often unplayable without additional editing. We propose a “generate-then-repair” framework for automatic generation of playable levels adhering to specific styles. The framework constructs levels using a generative adversarial network (GAN) trained with human-authored examples and repairs them using a mixed-integer linear program (MIP) with playability constraints. A key component of the framework is computing minimum cost edits between the GAN generated level and the solution of the MIP solver, which we cast as a minimum cost network flow problem. Results show that the proposed framework generates a diverse range of playable levels, that capture the spatial relationships between objects exhibited in the human-authored levels.

Citation
@article{zhang2020repair,
  title={Video Game Level Repair via Mixed Integer Linear Programming},
  volume={16},
  url={https://ojs.aaai.org/index.php/AIIDE/article/view/7424},
  abstract={Recent advancements in procedural content generation via machine learning enable the generation of video-game levels that are aesthetically similar to human-authored examples. However, the generated levels are often unplayable without additional editing. We propose a “generate-then-repair” framework for automatic generation of playable levels adhering to specific styles. The framework constructs levels using a generative adversarial network (GAN) trained with human-authored examples and repairs them using a mixed-integer linear program (MIP) with playability constraints. A key component of the framework is computing minimum cost edits between the GAN generated level and the solution of the MIP solver, which we cast as a minimum cost network flow problem. Results show that the proposed framework generates a diverse range of playable levels, that capture the spatial relationships between objects exhibited in the human-authored levels.},
  number={1},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment},
  author={Zhang, Hejia and Fontaine, Matthew and Hoover, Amy and Togelius, Julian and Dilkina, Bistra and Nikolaidis, Stefanos},
  year={2020},
  month={October},
  pages={151-158}
}

Research Contribution #3: Quality Diversity Optimization

Searching the vast space of HRI scenarios requires algorithms that explore efficiently very high-dimensional continuous domains. We focus on a class of stochastic optimization algorithms, named quality diversity (QD) algorithms, which search for a range of high-quality solutions that are diverse with respect to measures of interest. Current state-of-the-art QD algorithms find new solutions by perturbing existing high-quality solutions with Gaussian noise, or through variation operators that exploit local correlations. However, these algorithms face difficulties when the low-dimensional measure space that we wish to cover is distorted, e.g., when uniformly sampling the space of scenarios results in scenarios that are nearly identical with respect to the measures that we wish to have diversity for.

Our insight is that we can leverage the adaptation properties of the CMA-ES derivative-free optimization algorithm to dynamically adapt the step size of our search based on how our archive of solutions is changing. Using this insight we propose a new class of algorithms that bring together the self-adaptation of CMA-ES with the archiving properties of MAP-Elites to approximate a natural gradient of the quality diversity objective. Our algorithms can also leverage the gradient information of the objective and measure functions to achieve significant gains in performance.

Quality diversity algorithms at work.

Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning

B. Tjanaka, M. Fontaine, J. Togelius, S. Nikolaidis

Genetic and Evolutionary Computation Conference, 2022

Abstract

Consider a walking agent that must adapt to damage. To approach this task, we can train a collection of policies and have the agent select a suitable policy when damaged. Training this collection may be viewed as a quality diversity (QD) optimization problem, where we search for solutions (policies) which maximize an objective (walking forward) while spanning a set of measures (measurable characteristics). Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures. However, such gradients are typically unavailable in RL settings due to non-differentiable environments. To apply DQD in RL settings, we propose to approximate objective and measure gradients with evolution strategies and actor-critic methods. We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks. One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks. These results provide insight into the limitations of CMA-MEGA in domains that require rigorous optimization of the objective and where exact gradients are unavailable.

Citation
@inproceedings{10.1145/3512290.3528705,
  author = {Tjanaka, Bryon and Fontaine, Matthew C. and Togelius, Julian and Nikolaidis, Stefanos},
  title = {Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning},
  year = {2022},
  isbn = {9781450392372},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3512290.3528705},
  doi = {10.1145/3512290.3528705},
  abstract = {Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent polices. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference},
  pages = {1102–1111},
  numpages = {10},
  keywords = {neuroevolution, reinforcement learning, quality diversity},
  location = {Boston, Massachusetts},
  series = {GECCO '22}
}

Differentiable Quality Diversity

M. Fontaine, S. Nikolaidis

Advances in Neural Information Processing Systems, 2021

NeurIPS 2021 Oral

Abstract

Quality diversity (QD) is a growing branch of stochastic optimization research that studies the problem of generating an archive of solutions that maximize a given objective function but are also diverse with respect to a set of specified measure functions. However, even when these functions are differentiable, QD algorithms treat them as "black boxes", ignoring gradient information. We present the differentiable quality diversity (DQD) problem, a special case of QD, where both the objective and measure functions are first order differentiable. We then present MAP-Elites via Gradient Arborescence (MEGA), a DQD algorithm that leverages gradient information to efficiently explore the joint range of the objective and measure functions. Results in two QD benchmark domains and in searching the latent space of a StyleGAN show that MEGA significantly outperforms state-of-the-art QD algorithms, highlighting DQD's promise for efficient quality diversity optimization when gradient information is available.

Citation
@inproceedings{fontaine2021dqd,
  author = {Fontaine, Matthew and Nikolaidis, Stefanos},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {M. Ranzato and A. Beygelzimer and Y. Dauphin and P.S. Liang and J. Wortman Vaughan},
  pages = {10040--10052},
  publisher = {Curran Associates, Inc.},
  title = {Differentiable Quality Diversity},
  url = {https://proceedings.neurips.cc/paper/2021/file/532923f11ac97d3e7cb0130315b067dc-Paper.pdf},
  volume = {34},
  year = {2021},
  abstract={Quality diversity (QD) is a growing branch of stochastic optimization research that studies the problem of generating an archive of solutions that maximize a given objective function but are also diverse with respect to a set of specified measure functions. However, even when these functions are differentiable, QD algorithms treat them as "black boxes", ignoring gradient information. We present the differentiable quality diversity (DQD) problem, a special case of QD, where both the objective and measure functions are first order differentiable. We then present MAP-Elites via Gradient Arborescence (MEGA), a DQD algorithm that leverages gradient information to efficiently explore the joint range of the objective and measure functions. Results in two QD benchmark domains and in searching the latent space of a StyleGAN show that MEGA significantly outperforms state-of-the-art QD algorithms, highlighting DQD's promise for efficient quality diversity optimization when gradient information is available.}
}

Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space

M. Fontaine, J. Togelius, S. Nikolaidis, A. Hoover

2020 Genetic and Evolutionary Computation Conference, June 2020

Abstract

We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. While quality diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are designed to generate a diverse range of solutions, these algorithms require a large number of evaluations for exploration of continuous spaces. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments based on standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains.

Citation
@inproceedings{fontaine2020covariance,
  author = {Fontaine, Matthew C. and Togelius, Julian and Nikolaidis, Stefanos and Hoover, Amy K.},
  title = {Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space},
  year = {2020},
  month = {June},
  isbn = {9781450371285},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3377930.3390232},
  doi = {10.1145/3377930.3390232},
  abstract = {We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. While quality diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are designed to generate a diverse range of solutions, these algorithms require a large number of evaluations for exploration of continuous spaces. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments based on standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains.},
  booktitle = {Proceedings of the 2020 Genetic and Evolutionary Computation Conference},
  pages = {94–102},
  numpages = {9},
  keywords = {hearthstone, evolutionary algorithms, MAP-Elites, optimization, quality diversity, illumination algorithms},
  location = {Canc'{u}n, Mexico},
  series = {GECCO '20}
}