Publications

2026

Conferences

Discount model search for quality diversity optimization in high-dimensional measure spaces

B. Tjanaka, H. Chen, M. Fontaine, and S. Nikolaidis

International Conference on Learning Representations (ICLR), 2026, oral presentation

Abstract

Quality diversity (QD) optimization searches for a collection of solutions that optimize an objective while attaining diverse outputs of a user-specified, vector-valued measure function. Contemporary QD algorithms are typically limited to low-dimensional measures because high-dimensional measures are prone to distortion, where many solutions found by the QD algorithm map to similar measures. For example, the state-of-the-art CMA-MAE algorithm guides measure space exploration with a histogram in measure space that records so-called discount values. However, CMA-MAE stagnates in domains with high-dimensional measure spaces because solutions with similar measures fall into the same histogram cell and hence receive the same discount value. To address these limitations, we propose Discount Model Search (DMS), which guides exploration with a model that provides a smooth, continuous representation of discount values. In high-dimensional measure spaces, this model enables DMS to distinguish between solutions with similar measures and thus continue exploration. We show that DMS facilitates new capabilities for QD algorithms by introducing two new domains where the measure space is the high-dimensional space of images, which enables users to specify their desired measures by providing a dataset of images rather than hand-designing the measure function. Results in these domains and on high-dimensional benchmarks show that DMS outperforms CMA-MAE and other existing black-box QD algorithms.

Soft quality-diversity optimization

S. Hedayatian and S. Nikolaidis

International Conference on Learning Representations (ICLR), 2026

arxiv

Abstract

Quality-Diversity (QD) algorithms constitute a branch of optimization that is concerned with discovering a diverse and high-quality set of solutions to an optimization problem. Current QD methods commonly maintain diversity by dividing the behavior space into discrete regions, ensuring that solutions are distributed across different parts of the space. The QD problem is then solved by searching for the best solution in each region. This approach to QD optimization poses challenges in large solution spaces, where storing many solutions is impractical, and in high-dimensional behavior spaces, where discretization becomes ineffective due to the curse of dimensionality. We present an alternative framing of the QD problem, called Soft QD, that sidesteps the need for discretizations. We validate this formulation by demonstrating its desirable properties, such as monotonicity, and by relating its limiting behavior to the widely used QD Score metric. Furthermore, we leverage it to derive a novel differentiable QD algorithm, Soft QD Using Approximated Diversity (SQUAD), and demonstrate empirically that it is competitive with current state of the art methods on standard benchmarks while offering better scalability to higher dimensional problems. Source code is available at https://github.com/conflictednerd/soft-qd

AutoQD: Automatic discovery of diverse behaviors with quality-diversity optimization

S. Hedayatian and S. Nikolaidis

International Conference on Learning Representations (ICLR), 2026

arxiv

Abstract

Quality-Diversity (QD) algorithms have shown remarkable success in discovering diverse, high-performing solutions, but rely heavily on hand-crafted behavioral descriptors that constrain exploration to predefined notions of diversity. Leveraging the equivalence between policies and occupancy measures, we present a theoretically grounded approach to automatically generate behavioral descriptors by embedding the occupancy measures of policies in Markov Decision Processes. Our method, AutoQD, leverages random Fourier features to approximate the Maximum Mean Discrepancy (MMD) between policy occupancy measures, creating embeddings whose distances reflect meaningful behavioral differences. A low-dimensional projection of these embeddings that captures the most behaviorally significant dimensions can then be used as behavioral descriptors for CMA-MAE, a state of the art blackbox QD method, to discover diverse policies. We prove that our embeddings converge to true MMD distances between occupancy measures as the number of sampled trajectories and embedding dimensions increase. Through experiments in multiple continuous control tasks we demonstrate AutoQD's ability in discovering diverse policies without predefined behavioral descriptors, presenting a well-motivated alternative to prior methods in unsupervised Reinforcement Learning and QD optimization. Our approach opens new possibilities for open-ended learning and automated behavior discovery in sequential decision making settings without requiring domain-specific knowledge. Source code is available at https://github.com/conflictednerd/autoqd-code

Algorithmic prompt generation for diverse human-like teaming and communication with large language models

S. Srikanth, V. Bhatt, B. Zhang, W. Hager, C. M. Lewis, K. P. Sycara, A. Tabrez, and S. Nikolaidis

Genetic and Evolutionary Computation Conference (GECCO), 2026, poster

arxiv

Abstract

Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, necessitating synthetic models of multiple diverse human behaviors. Recently, agents powered by Large Language Models (LLMs) have been shown to emulate human-like behavior in social settings. However, obtaining a large set of diverse behaviors requires manual effort in the form of designing prompts. On the other hand, Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that generate diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment (n=54 participants), that humans exhibit diverse coordination and communication behavior in this domain. We then show that our approach can effectively replicate trends from human teaming data and also capture behaviors that are not easily observed without collecting large amounts of data. Our findings highlight the combination of QD and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.

Theory of mind guided strategy adaptation for zero-shot coordination

A. Ni, S. Stepputtis, S. Nikolaidis, M. Lewis, K. Sycara, and W. Kim

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2026

arxiv

Abstract

A central challenge in multi-agent reinforcement learning is enabling agents to adapt to previously unseen teammates in a zero-shot fashion. Prior work in zero-shot coordination often follows a two-stage process, first generating a diverse training pool of partner agents, and then training a best-response agent to collaborate effectively with the entire training pool. While many previous works have achieved strong performance by devising better ways to diversify the partner agent pool, there has been less emphasis on how to leverage this pool to build an adaptive agent. One limitation is that the best-response agent may converge to a static, generalist policy that performs reasonably well across diverse teammates, rather than learning a more adaptive, specialist policy that can better adapt to teammates and achieve higher synergy. To address this, we propose an adaptive ensemble agent that uses Theory-of-Mind-based best-response selection to first infer its teammate's intentions and then select the most suitable policy from a policy ensemble. We conduct experiments in the Overcooked environment to evaluate zero-shot coordination performance under both fully and partially observable settings. The empirical results demonstrate the superiority of our method over a single best-response baseline.

QD-MAPPER: A quality diversity framework to automatically evaluate multi-agent path finding algorithms in diverse maps

C. Qian, V. Bhatt, M. Fontaine, S. Nikolaidis, and J. Li

International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2026

arxiv

Abstract

We use the Quality Diversity (QD) algorithm with Neural Cellular Automata (NCA) to automatically evaluate Multi-Agent Path Finding (MAPF) algorithms by generating diverse maps. Previously, researchers typically evaluated MAPF algorithms on a set of specific, human-designed maps at the initial stage of algorithm design. However, such fixed maps may not cover all scenarios, and algorithms may overfit to the small set of maps. To seek further improvements, systematic evaluations on a diverse suite of maps are needed. In this work, we propose Quality-Diversity Multi-Agent Path Finding Performance EvaluatoR (QD-MAPPER), a general framework that takes advantage of the QD algorithm to comprehensively understand the performance of MAPF algorithms by generating maps with patterns. The framework enables fair comparisons between two MAPF algorithms and provides further information on the selection between two algorithms and on the design of the algorithms. Empirically, we employ this technique to evaluate and compare the behavior of different types of MAPF algorithms, including search-based, priority-based, rule-based, and learning-based algorithms. Through both single-algorithm experiments and comparisons between algorithms, researchers can identify patterns on which each MAPF algorithm excels and detect disparities in runtime or success rates between different algorithms.

2025

Journals

Proactive Contingency-Aware Task Allocation and Scheduling in Multi-Robot Multi-Human Cells via Hindsight Optimization

N. Dhanaraj, H. Nemlekar, S. Nikolaidis, and S. K. Gupta

IEEE Transactions on Automation Science and Engineering

IEEE Xplore

Abstract

Multi-robot systems are becoming more common in various real-world applications, such as manufacturing and warehouse logistics. However, task allocation and scheduling for a multi-agent team face complex challenges due to the need to simultaneously consider time-extended tasks, task constraints, and uncertainties in execution. Potential task failures or contingencies can add additional tasks to recover from the failures, and reactively addressing contingencies can decrease teaming efficiency. To efficiently and proactively consider contingencies, this paper proposes treating the problem as a multi-robot task allocation under uncertainty problem. We suggest a hierarchical approach that divides the problem into two layers. We use a mathematical programming formulation for the lower layer to find the optimal solution for a deterministic multi-robot task allocation problem with known task outcomes. The higher-layer search intelligently generates more likely combinations of contingency scenarios and calls the inner-level search repeatedly to find the optimal task allocation sequence for the given scenario. We validate our results in simulation for manufacturing applications and demonstrate that our method can reduce the effect of potential delays from contingencies.

Note to Practitioners: Automation engineers interested in deploying robotic cells in low-volume applications need to consider contingency handling. When the occurrence of contingencies can be characterized as probability distributions, it is often useful to consider using a proactive approach for task allocation and scheduling. To implement our algorithm, automation engineers will need to develop a hierarchical task network specified by domain experts that models task constraints and a task-agent duration model, which may be generated from simulation environments. Furthermore, they must identify tasks that can result in contingencies and describe them with a probabilistic model. This model can be generated from historical data and/or real-world experiments. Lastly, for addressing the contingency, the practitioner will need to specify a task procedure to recover from a specific contingency type. To run the algorithm, we found that repeatedly approximating the best proactive task allocation for a fixed computation budget and dispatching the best tasks worked well. The computation budget required to approximate the best task allocation is directly affected by the number of contingency scenarios that can be sampled. Therefore, the practitioner must determine a suitable computational budget empirically based on the number of contingencies that can occur.

Designing robot identity: The role of voice, clothing, and task on robot gender perception

N. Dennler, M. Kian, S. Nikolaidis, and M. Mataric

International Journal of Social Robotics (IJSR), 2025

arxiv

Abstract

Perceptions of gender are a significant aspect of human-human interaction, and gender has wide-reaching social implications for robots deployed in contexts where they are expected to interact with humans. This work explored two flexible modalities for communicating gender in robots--voice and appearance--and we studied their individual and combined influences on a robot's perceived gender. We evaluated the perception of a robot's gender through three video-based studies. First, we conducted a study (n=65) on the gender perception of robot voices by varying speaker identity and pitch. Second, we conducted a study (n=93) on the gender perception of robot clothing designed for two different tasks. Finally, building on the results of the first two studies, we completed a large integrative video-based study (n=273) involving two human-robot interaction tasks. We found that voice and clothing can be used to reliably establish a robot's perceived gender, and that combining these two modalities can have different effects on the robot's perceived gender. Taken together, these results inform the design of robot voices and clothing as individual and interacting components in the perceptions of robot gender.

Conferences

ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation

E. Zhao, V. Raval, H. Zhang, J. Mao, Z. Shangguan, S. Nikolaidis, Y. Wang, and D. Seita

Conference on Robot Learning (CoRL), 2025

arxiv

Abstract

Vision-Language Models (VLMs) have revolutionized artificial intelligence and robotics due to their commonsense reasoning capabilities. In robotic manipulation, VLMs are used primarily as high-level planners, but recent work has also studied their lower-level reasoning ability, which refers to making decisions about precise robot movements. However, the community currently lacks a clear and common benchmark that can evaluate how well VLMs can aid low-level reasoning in robotics. Consequently, we propose a novel benchmark, ManipBench, to evaluate the low-level robot manipulation reasoning capabilities of VLMs across various dimensions, including how well they understand object-object interactions and deformable object manipulation. We extensively test 35 common and state-of-the-art VLM families on our benchmark, including variants to test different model sizes. The performance of VLMs significantly varies across tasks, and there is a strong correlation between this performance and trends in our real-world manipulation tasks. It also shows that there remains a significant gap between these models and human-level understanding.

Adaptively Coordinating with Novel Partners via Learned Latent Strategies

B. Li, S. Shi, L. Romero, H. Li, Y. Xie, W. Kim, S. Nikolaidis, M. Lewis, K. Sycara, and S. Stepputtis

Neural Information Processing Systems (NeurIPS), 2025

arxiv

Abstract

Adaptation is the cornerstone of effective collaboration among heterogeneous team members. In human-agent teams, artificial agents need to adapt to their human partners in real time, as individuals often have unique preferences and policies that may change dynamically throughout interactions. This becomes particularly challenging in tasks with time pressure and complex strategic spaces, where identifying partner behaviors and selecting suitable responses is difficult. In this work, we introduce a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to a broad range of potential partner strategies in real time. Our approach encodes strategies with a variational autoencoder to learn a latent strategy space from agent trajectory data, identifies distinct strategy types through clustering, and trains a cooperator agent conditioned on these clusters by generating partners of each strategy type. For online adaptation to novel partners, we leverage a fixed-share regret minimization algorithm that dynamically infers and adjusts the partner's strategy estimation during interaction. We evaluate our method in a modified version of the Overcooked domain, a complex collaborative cooking environment that requires effective coordination between two players with a diverse potential strategy space. Through these experiments and an online user study, we demonstrate that our proposed agent achieves state-of-the-art performance compared to existing baselines when paired with novel human and agent teammates.

Multi-objective covariance matrix adaptation MAP-Annealing

S. Zhao and S. Nikolaidis

Genetic and Evolutionary Computation Conference (GECCO), July 2025

ACM Digital Library arxiv

Abstract

Quality-Diversity (QD) optimization is an emerging field that focuses on finding a set of behaviorally diverse and high-quality solutions. While the quality is typically defined w.r.t. a single objective function, recent work on Multi-Objective Quality-Diversity (MOQD) extends QD optimization to simultaneously optimize multiple objective functions. This opens up multi-objective applications for QD, such as generating a diverse set of game maps that maximize difficulty, realism, or other properties. Existing MOQD algorithms use non-adaptive methods such as mutation and crossover to search for non-dominated solutions and construct an archive of Pareto Sets (PS). However, recent work in QD has demonstrated enhanced performance through the use of covariance-based evolution strategies for adaptive solution search. We propose bringing this insight into the MOQD problem, and introduce MO-CMA-MAE, a new MOQD algorithm that leverages Covariance Matrix Adaptation-Evolution Strategies (CMA-ES) to optimize the hypervolume associated with every PS within the archive. We test MO-CMA-MAE on three MOQD domains, and for generating maps of a co-operative video game, showing significant improvements in performance.

Modeling Personalized Difficulty of Rehabilitation Exercises Using Causal Trees

N. Dennler, Z. Shi, U. Yoo, S. Nikolaidis, and M. Mataric

IEEE/RAS International Conference on Rehabilitation Robotics (ICORR)

arxiv

Abstract

Rehabilitation robots are often used in game-like interactions for rehabilitation to increase a person's motivation to complete rehabilitation exercises. By adjusting exercise difficulty for a specific user throughout the exercise interaction, robots can maximize both the user's rehabilitation outcomes and their motivation throughout the exercise. Previous approaches have assumed exercises have generic difficulty values that apply to all users equally; however, we identified that stroke survivors have varied and unique perceptions of exercise difficulty. For example, some stroke survivors found reaching vertically more difficult than reaching farther but lower, while others found reaching farther more challenging than reaching vertically. In this paper, we formulate a causal tree-based method to calculate exercise difficulty based on the user's performance. We find that this approach accurately models exercise difficulty and provides a readily interpretable model of why that exercise is difficult for both users and caretakers.

Integrating field of view in human-aware collaborative planning

Y. Hsu, M. Defranco, R. Patel, and S. Nikolaidis

IEEE International Conference on Robotics and Automation (ICRA), 2025

arxiv

Abstract

In human-robot collaboration (HRC), it is crucial for robot agents to consider humans’ knowledge of their surroundings. In reality, humans possess a narrow field of view (FOV), limiting their perception. However, research on HRC often overlooks this aspect and presumes an omniscient human collaborator. Our study addresses the challenge of adapting to the evolving subtask intent of humans while accounting for their limited FOV. We integrate FOV within the human-aware probabilistic planning framework. To account for large state spaces due to considering FOV, we propose a hierarchical online planner that efficiently finds approximate solutions while enabling the robot to explore low-level action trajectories that enter the human FOV, influencing their intended subtask. Through a user study with our adapted cooking domain, we demonstrate that our FOV-aware planner reduces the human’s interruptions and redundant actions during collaboration by adapting to human perception limitations. We extend these findings to a virtual reality kitchen environment, where we observe similar collaborative behaviors.

MOE-Hair: Toward soft and compliant contact-rich hair manipulation and care

U. Yoo, N. Dennler, M. Mataric, S. Nikolaidis, J. Oh, and J. Ichnowski

ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2025

Best Systems Paper finalist

ACM Digital Library arxiv

Abstract

Hair-care robots have the potential to alleviate labor shortages in elderly care and enable those with limited mobility to express their identities through hair styling. In this work, we highlight two advantages that soft robotic manipulators have in hair-care applications: safety through mechanical compliance and sensing through observing deformation. To demonstrate these advantages, we introduce a soft robotic end-effector which we call Multi-finger Omnidirectional End-effector (MOE) for hair-care applications. We validate that in hair-grasping tasks, MOE exerts 74.1% less force on the head while being able to grasp a similar amount of hair compared to rigid grippers. We further demonstrate that we can reliably estimate the mesh shape of MOE during interaction with a head and that we can infer useful information about the head such as its occluded shape. The results suggest that soft robots are uniquely advantaged in hair-care tasks.

Contrastive learning from exploratory actions: Leveraging natural interactions for preference elicitation

N. Dennler, S. Nikolaidis, and M. Mataric

ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2025

Best Technical Paper finalist

arxiv

Abstract

People have a variety of preferences for how robots behave. To understand and reason about these preferences, robots aim to learn a reward function that describes how aligned robot behaviors are with a user's preferences. Good representations of a robot's behavior can significantly reduce the time and effort required for a user to teach the robot their preferences. Specifying these representations -- what 'features' of the robot's behavior matter to users -- remains a difficult problem: features learned from raw data lack semantic meaning, and features learned from user data require users to engage in tedious labeling processes. Our key insight is that users tasked with customizing a robot are intrinsically motivated to produce labels through exploratory search; they explore behaviors that they find interesting and ignore behaviors that are irrelevant. To harness this novel data source of exploratory actions, we propose contrastive learning from exploratory actions (CLEA) to learn trajectory features that are aligned with features that users care about. We learned CLEA features from exploratory actions users performed in an open-ended signal design activity (N=25) with a Kuri robot, and evaluated CLEA features through a second user study with a different set of users (N=42). CLEA features outperformed self-supervised features when eliciting user preferences over four metrics: completeness, simplicity, minimality, and explainability.

2024

Journals

Selecting source tasks for transfer learning of human preferences

H. Nemlekar, N. Sivagnanadasan, S. Banga, S. K. Gupta, and S. Nikolaidis

Robotics and Automation Letters (RA-L), 2024

IEEE Xplore

Abstract

We address the challenge of transferring human preferences for action selection from simpler source tasks to complex target tasks. Our goal is to enable robots to support humans proactively by predicting their actions — without requiring demonstrations of their preferred action sequences in the target task. Previous research has relied on human experts to design or select a simple source task that can be used to effectively learn and transfer human preferences to a known target. However, identifying such source tasks for new target tasks can demand substantial human effort. Thus, we focus on automating the selection of source tasks, introducing two new metrics. Our first metric selects source tasks in which human preferences can be accurately learned from demonstrations, while our second metric selects source tasks in which the learned preferences, although not as accurate, can match the preferred human actions in the target task. We evaluate our metrics in simulated tasks and two human-led assembly studies. Our results indicate that selecting high-scoring source tasks on either metric improves the accuracy of predicting human actions in the target task. Notably, tasks chosen by our second metric can be simpler than the first, sacrificing learning accuracy but preserving prediction accuracy.

Covariance matrix adaptation MAP-Annealing: Theory and experiments

S. Zhao, B. Tjanaka, M. Fontaine, and S. Nikolaidis

ACM Transactions on Evolutionary Learning and Optimization (TELO), 2024

ACM Digital Library

Abstract

Single-objective optimization algorithms search for the single highest-quality solution with respect to an objective. Quality diversity (QD) optimization algorithms, such as Covariance Matrix Adaptation MAP-Elites (CMA-ME), search for a collection of solutions that are both high-quality with respect to an objective and diverse with respect to specified measure functions. However, CMA-ME suffers from three major limitations highlighted by the QD community: prematurely abandoning the objective in favor of exploration, struggling to explore flat objectives, and having poor performance for low-resolution archives. We propose a new quality diversity algorithm, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), and its differentiable quality diversity variant, Covariance Matrix Adaptation MAP-Annealing via a Gradient Arborescence (CMA-MAEGA), that address all three limitations. We provide theoretical justifications for the new algorithm with respect to each limitation. Our theory informs our experiments, which support the theory and show that CMA-MAE achieves state-of-the-art performance and robustness on standard QD benchmark and reinforcement learning domains.

Conferences

Improving user experience in preference-based optimization of reward functions for assistive robots

N. Dennler, Z. Shi, S. Nikolaidis, and M. Mataric

International Symposium of Robotics Research (ISRR), 2024

arxiv

Abstract

Assistive robots interact with humans and must adapt to different users' preferences to be effective. An easy and effective technique to learn non-expert users' preferences is through rankings of robot behaviors, for example, robot movement trajectories or gestures. Existing techniques focus on generating trajectories for users to rank that maximize the outcome of the preference learning process. However, the generated trajectories do not appear to reflect the user's preference over repeated interactions. In this work, we design an algorithm to generate trajectories for users to rank that we call Covariance Matrix Adaptation Evolution Strategies with Information Gain (CMA-ES-IG). CMA-ES-IG prioritizes the user's experience of the preference learning process. We show that users find our algorithm more intuitive and easier to use than previous approaches across both physical and social robot tasks.

GPT-Fabric: Smoothing and folding fabric by leveraging pre-trained foundation models

V. Raval, E. Zhao, H. Zhang, S. Nikolaidis, and D. Seita

International Symposium of Robotics Research (ISRR), 2024

arxiv

Abstract

Fabric manipulation has applications in folding blankets, handling patient clothing, and protecting items with covers. It is challenging for robots to perform fabric manipulation since fabrics have infinite-dimensional configuration spaces, complex dynamics, and may be in folded or crumpled configurations with severe self-occlusions. Prior work on robotic fabric manipulation relies either on heavily engineered setups or learning-based approaches that create and train on robot-fabric interaction data. In this paper, we propose GPT-Fabric for the canonical tasks of fabric smoothing and folding, where GPT directly outputs an action informing a robot where to grasp and pull a fabric. We perform extensive experiments in simulation to test GPT-Fabric against prior methods for smoothing and folding. GPT-Fabric matches the state-of-the-art in fabric smoothing, and also achieves comparable performance with most prior fabric folding methods tested, even without explicitly training on a fabric-specific dataset (i.e., zero-shot manipulation). Furthermore, we apply GPT-Fabric in physical experiments over 10 smoothing and 12 folding rollouts. Our results suggest that GPT-Fabric is a promising approach for high-precision fabric manipulation tasks.

DIVA: Training adaptive agents in open-ended simulators via semi-supervised environment design

R. Costales and S. Nikolaidis

Neural Information Processing Systems (NeurIPS), 2024

Signal temporal logic-guided apprenticeship learning

A. Puranic, J. Deshmukh, and S. Nikolaidis

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

IEEE Xplore arxiv

Abstract

Apprenticeship learning crucially depends on effectively learning rewards, and hence control policies, from user demonstrations. Of particular difficulty is the setting where the desired task consists of a number of sub-goals with temporal dependencies. The quality of inferred rewards, and hence policies, is typically limited by the quality of demonstrations, and poor inference of these can lead to undesirable outcomes. In this paper, we show how temporal logic specifications that describe high-level task objectives are encoded in a graph to define a temporal-based metric that reasons about behaviors of demonstrators and the learner agent to improve the quality of inferred rewards and policies. Through experiments on a diverse set of robot manipulator simulations, we show how our framework overcomes the drawbacks of prior literature by drastically reducing the number of demonstrations required to learn a control policy.

BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning

T. Huang, N. R. Sontakke, N. Kannabiran, I. Essa, S. Nikolaidis, D. Hong, and S. Ha

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

arxiv

Abstract

Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning process by fine-tuning from a previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same number of timesteps compared to vanilla domain randomization or Bayesian DR.

Preference elicitation and incorporation for human-robot task scheduling

N. Dhanaraj, M. Jeon, J. H. Kang, S. Nikolaidis, and S. K. Gupta

IEEE 20th International Conference on Automation Science and Engineering (CASE), 2024

IEEE Xplore

Abstract

In this work, we address the challenge of incorporating human preferences into the task-scheduling process for human-robot teams. Humans have various individual preferences that can be influenced by context and situational information. Incorporating these preferences can lead to improved team performance. Our main contribution is a framework that helps elicit and incorporate preferences during task scheduling. We achieve this by proposing 1) a constraint programming method to generate a range of plans, 2) an intelligent approach for selecting and presenting task schedules based on task features, and 3) a preference incorporation method that uses large language models to convert preferences into soft constraints. Our results demonstrate that we can efficiently generate diverse plans for preference elicitation and incorporate them into the task-scheduling process. We evaluate our framework using an assembly-inspired case study and show how it can effectively incorporate complex and realistic preferences. Our implementation can be found at github.com/RROS-Lab/Human-Robot-Preference-Planning.

Selecting source tasks for transfer learning of human preferences

H. Nemlekar, N. Sivagnanadasan, S. Banga, N. Dhanaraj, S. K. Gupta, and S. Nikolaidis

IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2024

Abstract

Multi-robot task allocation under uncertainty via hindsight optimization

N. Dhanaraj, J. H. Kang, A. Mukherjee, H. Nemlekar, S. Nikolaidis, and S. K. Gupta

International Conference on Robotics and Automation (ICRA), 2024

IEEE Xplore

Abstract

Multi-robot systems are becoming increasingly prevalent in various real-world applications, such as manufacturing and warehouse logistics. These systems face complex challenges in 1) task allocation due to factors like time-extended tasks, and agent specialization, and 2) uncertainties in task execution. Potential task failures can add further contingency tasks to recover from the failure, thereby causing delays. This paper addresses the problem of Multi-Robot Task Allocation under Uncertainty by proposing a hierarchical approach that decouples the problem into two levels. We use a low-level optimization formulation to find the optimal solution for a deterministic multi-robot task allocation problem with known task outcomes. The higher-level search intelligently generates more likely combinations of failures and calls the inner-level search repeatedly to find the optimal task allocation sequence, given the known outcomes. We validate our results in simulation for a manufacturing domain and demonstrate that our method can reduce the effect of potential delays from contingencies. We show that our algorithm is computationally efficient while improving average makespan compared to other baselines.

Guidance graph optimization for lifelong multi-agent path finding

Y. Zhang, H. Jian, V. Bhatt, S. Nikolaidis, and J. Li

The International Joint Conference on Artificial Intelligence (IJCAI), August 2024

arxiv

Abstract

We study how to use guidance to improve the throughput of lifelong Multi-Agent Path Finding (MAPF). Previous studies have demonstrated that while incorporating guidance, such as highways, can accelerate MAPF algorithms, this often results in a trade-off with solution quality. In addition, how to generate good guidance automatically remains largely unexplored, with current methods falling short of surpassing manually designed ones. In this work, we introduce the directed guidance graph as a versatile representation of guidance for lifelong MAPF, framing Guidance Graph Optimization (GGO) as the task of optimizing its edge weights. We present two GGO algorithms to automatically generate guidance for arbitrary lifelong MAPF algorithms and maps. The first method directly solves GGO by employing CMA-ES, a black-box optimization algorithm. The second method, PIU, optimizes an update model capable of generating guidance, demonstrating the ability to transfer optimized guidance graphs to larger maps with similar layouts. Empirically, we show that (1) our guidance graphs improve the throughput of three representative lifelong MAPF algorithms in four benchmark maps, and (2) our update model can generate guidance graphs for as large as 93 x 91 maps and as many as 3000 agents.

Density descent for diversity optimization

D. Lee, A. Palaparthi, M. Fontaine, and S. Nikolaidis

Genetic and Evolutionary Computation Conference (GECCO), July 2024

arxiv

Abstract

Diversity optimization seeks to discover a set of solutions that elicit diverse features. Prior work has proposed Novelty Search (NS), which, given a current set of solutions, seeks to expand the set by finding points in areas of low density in the feature space. However, to estimate density, NS relies on a heuristic that considers the k-nearest neighbors of the search point in the feature space, which yields a weaker stability guarantee. We propose Density Descent Search (DDS), an algorithm that explores the feature space via gradient descent on a continuous density estimate of the feature space that also provides stronger stability guarantee. We experiment with DDS and two density estimation methods: kernel density estimation (KDE) and continuous normalizing flow (CNF). On several standard diversity optimization benchmarks, DDS outperforms NS, the recently proposed MAP-Annealing algorithm, and other state-of-the-art baselines. Additionally, we prove that DDS with KDE provides stronger stability guarantees than NS, making it more suitable for adaptive optimizers. Furthermore, we prove that NS is a special case of DDS that descends a KDE of the feature space.

Proximal policy gradient arborescence for quality diversity reinforcement learning

S. Batra, B. Tjanaka, M. Fontaine, A. Petrenko, S. Nikolaidis, and G. Sukhatme

International Conference on Learning Representations (ICLR), 2024

ICLR 2024 Spotlight Presentation

arxiv

Abstract

Training generally capable agents that perform well in unseen dynamic environments is a long-term goal of robot learning. Quality Diversity Reinforcement Learning (QD-RL) is an emerging class of reinforcement learning (RL) algorithms that blend insights from Quality Diversity (QD) and RL to produce a collection of high performing and behaviorally diverse policies with respect to a behavioral embedding. Existing QD-RL approaches have thus far taken advantage of sample-efficient off-policy RL algorithms. However, recent advances in high-throughput, massively parallelized robotic simulators have opened the door for algorithms that can take advantage of such parallelism, and it is unclear how to scale existing off-policy QD-RL methods to these new data-rich regimes. In this work, we take the first steps to combine on-policy RL methods, specifically Proximal Policy Optimization (PPO), that can leverage massive parallelism, with QD, and propose a new QD-RL method with these high-throughput simulators and on-policy training in mind. Our proposed Proximal Policy Gradient Arborescence (PPGA) algorithm yields a 4x improvement over baselines on the challenging humanoid domain.

Quality-diversity generative sampling for learning with synthetic data

A. Chang, M. Fontaine, S. Booth, M. Mataric, and S. Nikolaidis

AAAI Conference on Artificial Intelligence (AAAI), 2024

PDF arxiv

Abstract

Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, despite the data coming from a biased generator. QDGS is a model-agnostic framework that uses prompt guidance to optimize a quality objective across measures of diversity for synthetically generated data, without fine-tuning the generative model. Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. Code available at: github.com/Cylumn/qd-generative-sampling

2023¶

Journals ¶

A Metric for Characterizing the Arm Nonuse Workspace in Poststroke Individuals Using a Robot Arm

N. Dennler, A. Cain, E. D. Guzmann, C. Chiu, C. J. Winstein, S. Nikolaidis, M. J. Matarić

Science Robotics, 2023

Science Robotics

Abstract

An overreliance on the less-affected limb for functional tasks at the expense of the paretic limb and in spite of recovered capacity is an often-observed phenomenon in survivors of hemispheric stroke. The difference between capacity for use and actual spontaneous use is referred to as arm nonuse. Obtaining an ecologically valid evaluation of arm nonuse is challenging because it requires the observation of spontaneous arm choice for different tasks, which can easily be influenced by instructions, presumed expectations, and awareness that one is being tested. To better quantify arm nonuse, we developed the bimanual arm reaching test with a robot (BARTR) for quantitatively assessing arm nonuse in chronic stroke survivors. The BARTR is an instrument that uses a robot arm as a means of remote and unbiased data collection of nuanced spatial data for clinical evaluations of arm nonuse. This approach shows promise for determining the efficacy of interventions designed to reduce paretic arm nonuse and enhance functional recovery after stroke. We show that the BARTR satisfies the criteria of an appropriate metric for neurorehabilitative contexts: It is valid, reliable, and simple to use. Interacting with a robot can quantify otherwise hard-to-measure clinical metrics.

Multi-robot geometric task-and-motion planning for collaborative manipulation tasks

H. Zhang, S.-H. Chan, J. Zhong, J. Li, P. Kolapo, S. Koenig, Z. Agioutantis, S. Schafrik, and S. Nikolaidis

Autonomous Robots (AURO)

Springer arxiv

Abstract

We address multi-robot geometric task-and-motion planning (MR-GTAMP) problems in synchronous, monotone setups. The goal of the MR-GTAMP problem is to move objects with multiple robots to goal regions in the presence of other movable objects. We focus on collaborative manipulation tasks where the robots have to adopt intelligent collaboration strategies to be successful and effective, i.e., decide which robot should move which objects to which positions, and perform collaborative actions, such as handovers. To endow robots with these collaboration capabilities, we propose to first collect occlusion and reachability information for each robot by calling motion-planning algorithms. We then propose a method that uses the collected information to build a graph structure which captures the precedence of the manipulations of different objects and supports the implementation of a mixed-integer program to guide the search for highly effective collaborative task-and-motion plans. The search process for collaborative task-and-motion plans is based on a Monte-Carlo Tree Search (MCTS) exploration strategy to achieve exploration-exploitation balance. We evaluate our framework in two challenging MR-GTAMP domains and show that it outperforms two state-of-the-art baselines with respect to the planning time, the resulting plan length and the number of objects moved. We also show that our framework can be applied to underground mining operations where a robotic arm needs to coordinate with an autonomous roof bolter. We demonstrate plan execution in two roof-bolting scenarios both in simulation and on robots.

Training Diverse High-Dimensional Controllers by Scaling Covariance Matrix Adaptation MAP-Annealing

B. Tjanaka, M. C. Fontaine, D. H. Lee, A. Kalkar, S. Nikolaidis

Robotics and Automation Letters (RA-L), 2023

IEEE Xplore arXiv Website

Abstract

Pre-training a diverse set of neural network controllers in simulation has enabled robots to adapt online to damage in robot locomotion tasks. However, finding diverse, high-performing controllers requires expensive network training and extensive tuning of a large number of hyperparameters. On the other hand, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), an evolution strategies (ES)-based quality diversity algorithm, does not have these limitations and has achieved state-of-the-art performance on standard QD benchmarks. However, CMA-MAE cannot scale to modern neural network controllers due to its quadratic complexity. We leverage efficient approximation methods in ES to propose three new CMA-MAE variants that scale to high dimensions. Our experiments show that the variants outperform ES-based baselines in benchmark robotic locomotion tasks, while being comparable with or exceeding state-of-the-art deep reinforcement learning-based quality diversity algorithms.

Conferences ¶

Arbitrarily Scalable Environment Generators via Neural Cellular Automata

Y. Zhang, M. C. Fontaine, V. Bhatt, S. Nikolaidis, and J. Li

Conference on Neural Information Processing Systems (NeurIPS), 2023

NeurIPS arxiv

Abstract

We study the problem of generating arbitrarily large environments to improve the throughput of multi-robot systems. Prior work proposes Quality Diversity (QD) algorithms as an effective method for optimizing the environments of automated warehouses. However, these approaches optimize only relatively small environments, falling short when it comes to replicating real-world warehouse sizes. The challenge arises from the exponential increase in the search space as the environment size increases. Additionally, the previous methods have only been tested with up to 350 robots in simulations, while practical warehouses could host thousands of robots. In this paper, instead of optimizing environments, we propose to optimize Neural Cellular Automata (NCA) environment generators via QD algorithms. We train a collection of NCA generators with QD algorithms in small environments and then generate arbitrarily large environments from the generators at test time. We show that NCA environment generators maintain consistent, regularized patterns regardless of environment size, significantly enhancing the scalability of multi-robot systems in two different domains with up to 2,350 robots. Additionally, we demonstrate that our method scales a single-agent reinforcement learning policy to arbitrarily large environments with similar patterns. We include the source code at https://github.com/lunjohnzhang/warehouse_env_gen_nca_public.

The RoSiD Tool: Empowering Users to Design Multimodal Signals for Human-robot Collaboration

N. Dennler, D. Delgado, D. Zeng, S. Nikolaidis, and M. Mataric

18th International Symposium on Experimental Robotics (ISER), November 2023

arxiv

Abstract

Robots that cooperate with humans must be effective at communicating with them. However, people have varied preferences for communication based on many contextual factors, such as culture, environment, and past experience. To communicate effectively, robots must take those factors into consideration. In this work, we present the Robot Signal Design (RoSiD) tool to empower people to easily self-specify communicative preferences for collaborative robots. We show through a participatory design study that the RoSiD tool enables users to create signals that align with their communicative preferences, and we illuminate how this tool can be further improved.

Surrogate Assisted Generation of Human-Robot Interaction Scenarios

V. Bhatt, H. Nemlekar, M. C. Fontaine, B. Tjanaka, H. Zhang, Y. Hsu, and S. Nikolaidis

Conference on Robot Learning, November 2023

CoRL 2023 Oral

arXiv

Abstract

As human-robot interaction (HRI) systems advance, so does the difficulty of evaluating and understanding the strengths and limitations of these systems in different environments and with different users. To this end, previous methods have algorithmically generated diverse scenarios that reveal system failures in a shared control teleoperation task. However, these methods require directly evaluating generated scenarios by simulating robot policies and human actions. The computational cost of these evaluations limits their applicability in more complex domains. Thus, we propose augmenting scenario generation systems with surrogate models that predict both human and robot behaviors. In the shared control teleoperation domain and a more complex shared workspace collaboration task, we show that surrogate assisted scenario generation efficiently synthesizes diverse datasets of challenging scenarios. We demonstrate that these failures are reproducible in real-world interactions.

Multi-robot coordination and layout design for automated warehousing

Y. Zhang, M. Fontaine, V. Bhatt, S. Nikolaidis, J. Li

The International Joint Conference on Artificial Intelligence (IJCAI), August 2023

arXiv GitHub

Abstract

With the rapid progress in Multi-Agent Path Finding (MAPF), researchers have studied how MAPF algorithms can be deployed to coordinate hundreds of robots in large automated warehouses. While most works try to improve the throughput of such warehouses by developing better MAPF algorithms, we focus on improving the throughput by optimizing the warehouse layout. We show that, even with state-of-the-art MAPF algorithms, commonly used human-designed layouts can lead to congestion for warehouses with large numbers of robots and thus have limited scalability. We extend existing automatic scenario generation methods to optimize warehouse layouts. Results show that our optimized warehouse layouts (1) reduce traffic congestion and thus improve throughput, (2) improve the scalability of the automated warehouses by doubling the number of robots in some cases, and (3) are capable of generating layouts with user-specified diversity measures.

Covariance matrix adaptation MAP-annealing

M. C. Fontaine and S. Nikolaidis

Genetic and Evolutionary Computation Conference (GECCO), July 2023

Best Evolutionary Machine Learning Paper Award

ACM Digital Library GitHub

Abstract

Single-objective optimization algorithms search for the single highest-quality solution with respect to an objective. Quality diversity (QD) optimization algorithms, such as Covariance Matrix Adaptation MAP-Elites (CMA-ME), search for a collection of solutions that are both high-quality with respect to an objective and diverse with respect to specified measure functions. However, CMA-ME suffers from three major limitations highlighted by the QD community: prematurely abandoning the objective in favor of exploration, struggling to explore flat objectives, and having poor performance for low-resolution archives. We propose a new quality diversity algorithm, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), that addresses all three limitations. We provide theoretical justifications for the new algorithm with respect to each limitation. Our theory informs our experiments, which support the theory and show that CMA-MAE achieves state-of-the-art performance and robustness.

pyribs: A Bare-Bones Python Library for Quality Diversity Optimization

B. Tjanaka, M. C. Fontaine, D. H. Lee, Y. Zhang, N. R. Balam, N. Dennler, S. S. Garlanka, N. D. Klapsis, S. Nikolaidis

Genetic and Evolutionary Computation Conference (GECCO), July 2023

ACM Digital Library arXiv GitHub Website

Abstract

Recent years have seen a rise in the popularity of quality diversity (QD) optimization, a branch of optimization that seeks to find a collection of diverse, high-performing solutions to a given problem. To grow further, we believe the QD community faces two challenges: developing a framework to represent the field's growing array of algorithms, and implementing that framework in software that supports a range of researchers and practitioners. To address these challenges, we have developed pyribs, a library built on a highly modular conceptual QD framework. By replacing components in the conceptual framework, and hence in pyribs, users can compose algorithms from across the QD literature; equally important, they can identify unexplored algorithm variations. Furthermore, pyribs makes this framework simple, flexible, and accessible, with a user-friendly API supported by extensive documentation and tutorials. This paper overviews the creation of pyribs, focusing on the conceptual framework that it implements and the design principles that have guided the library's development.

PATO: Policy assisted teleoperation for scalable robot data collection

S. Dass, K. Pertsch, H. Zhang, Y. Lee, J. Lim, and S. Nikolaidis

Robotics: Science and Systems (RSS), July 2023

arXiv Website

Abstract

Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research. However, collecting large-scale robotic data is much more expensive and slower as each operator can control only a single robot at a time. To make this costly data collection process efficient and scalable, we propose Policy Assisted TeleOperation (PATO), a system which automates part of the demonstration collection process using a learned assistive policy. PATO autonomously executes repetitive behaviors in data collection and asks for human input only when it is uncertain about which subtask or behavior to execute. We conduct teleoperation user studies both with a real robot and a simulated robot fleet and demonstrate that our assisted teleoperation system reduces human operators’ mental load while improving data collection efficiency. Further, it enables a single operator to control multiple robots in parallel, which is a first step towards scalable robotic data collection.

Inverse reinforcement learning framework for transferring task sequencing policies from humans to robots in manufacturing applications

O. Manyar, Z. McNulty, S. Nikolaidis, and S. Gupta

Proceedings of the International Conference on Robotics and Automation (ICRA), May 2023

IEEE Xplore GitHub

Abstract

In this work, we present an inverse reinforcement learning approach for solving the problem of task sequencing for robots in complex manufacturing processes. Our proposed framework is adaptable to variations in process and can perform sequencing for entirely new parts. We prescribe an approach to capture feature interactions in a demonstration dataset based on a metric that computes feature interaction coverage. We then actively learn the expert's policy by keeping the expert in the loop. Our training and testing results reveal that our model can successfully learn the expert's policy. We demonstrate the performance of our method on a real-world manufacturing application where we transfer the policy for task sequencing to a manipulator. Our experiments show that the robot can perform these tasks to produce human-competitive performance. Code and video can be found at: https://sites.google.com/usc.edu/irlfortasksequencing

Contingency-aware task assignment and scheduling for human-robot teams

N. Dhanaraj, S. Narayan, S. Nikolaidis, and S. Gupta

Proceedings of the International Conference on Robotics and Automation (ICRA), May 2023

IEEE Xplore

Abstract

We consider the problem of task assignment and scheduling for human-robot teams to enable the efficient completion of complex problems, such as satellite assembly. In high-mix, low volume settings, we must enable the human-robot team to handle uncertainty due to changing task requirements, potential failures, and delays to maintain task completion efficiency. We make two contributions: (1) we account for the complex interaction of uncertainty that stems from the tasks and the agents using a multi-agent concurrent MDP framework, and (2) we use Mixed Integer Linear Programs and contingency sampling to approximate action values for task assignment. Our results show that our online algorithm is computationally efficient while making optimal task assignments compared to a value iteration baseline. We evaluate our method on a 24-task representative assembly and a real-world 60-task satellite assembly, and we show that we can find an assignment that results in a near-optimal makespan.

Transfer learning of human preferences for proactive robot assistance in assembly tasks

H. Nemlekar, A. Guan, N. Dhanaraj, S. Gupta, and S. Nikolaidis

Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2023

Best Systems Paper Award Finalist

ACM Digital Library

Abstract

We focus on enabling robots to proactively assist humans in assembly tasks by adapting to their preferred sequence of actions. Much work on robot adaptation requires human demonstrations of the task. However, human demonstrations of real-world assemblies can be tedious and time-consuming. Thus, we propose learning human preferences from demonstrations in a shorter, canonical task to predict user actions in the actual assembly task. The proposed system uses the preference model learned from the canonical task as a prior and updates the model through interaction when predictions are inaccurate. We evaluate the proposed system in simulated assembly tasks and in a real-world human-robot assembly study and we show that both transferring the preference model from the canonical task, as well as updating the model online, contribute to improved accuracy in human action prediction. This enables the robot to proactively assist users, significantly reduce their idle time, and improve their experience working with the robot, compared to a reactive robot.

2022¶

Journals ¶

Learning performance graphs from demonstrations via task-based evaluations

A. Puranic, J. Deshmukh, and S. Nikolaidis

Robotics and Automation Letters (RA-L), 2022

IEEE Xplore arXiv

Abstract

In the paradigm of robot learning-from-demonstrations (LfD), understanding and evaluating the demonstrated behaviors plays a critical role in extracting control policies for robots. Without this knowledge, a robot may infer incorrect reward functions that lead to undesirable or unsafe control policies. Prior work has used temporal logic specifications, manually ranked by human experts based on their importance, to learn reward functions from imperfect/suboptimal demonstrations. To overcome reliance on expert rankings, we propose a novel algorithm that learns from demonstrations, a partial ordering of provided specifications in the form of a performance graph. Through various experiments, including simulation of industrial mobile robots, we show that extracting reward functions with the learned graph results in robot policies similar to those generated with the manually specified orderings. We also show in a user study that the learned orderings match the orderings or rankings by participants for demonstrations in a simulated driving domain. These results show that we can accurately evaluate demonstrations with respect to provided task specifications from a small set of imperfect data with minimal expert input.

Evaluating Human-Robot Interaction Algorithms in Shared Autonomy via Quality Diversity Scenario Generation

M. C. Fontaine, S. Nikolaidis

ACM Transactions on Human-Robot Interaction, 2022

ACM Digital Library

Abstract

The growth of scale and complexity of interactions between humans and robots highlights the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring diverse scenarios of humans and robots interacting in simulation can improve understanding of the robotic system and avoid potentially costly failures in real-world settings. We formulate this problem as a quality diversity (QD) problem, where the goal is to discover diverse failure scenarios by simultaneously exploring both environments and human actions. We focus on the shared autonomy domain, where the robot attempts to infer the goal of a human operator, and adopt the QD algorithms CMA-ME and MAP-Elites to generate scenarios for two published algorithms in this domain: shared autonomy via hindsight optimization and linear policy blending. Some of the generated scenarios confirm previous theoretical findings, while others are surprising and bring about a new understanding of state-of-the-art implementations. Our experiments show that the QD algorithms CMA-ME and MAP-Elites outperform Monte-Carlo simulation and optimization based methods in effectively searching the scenario space, highlighting their promise for automatic evaluation of algorithms in human-robot interaction.

Using Design Metaphors to Understand User Expectations of Socially Interactive Robot Embodiments

N. Dennler, C. Ruan, J. Hadiwijoyo, B. Chen, S. Nikolaidis, M. Mataric

ACM Transactions on Human-Robot Interaction, 2022

arXiv

Abstract

The physical design of a robot suggests expectations of that robot's functionality for human users and collaborators. When those expectations align with the true capabilities of the robot, interaction with the robot is enhanced. However, misalignment of those expectations can result in an unsatisfying interaction. This paper uses Mechanical Turk to evaluate user expectation through the use of design metaphors as applied to a wide range of robot embodiments. The first study (N=382) associates crowd-sourced design metaphors to different robot embodiments. The second study (N=803) assesses initial social expectations of robot embodiments. The final study (N=805) addresses the degree of abstraction of the design metaphors and the functional expectations projected on robot embodiments. Together, these results can guide robot designers toward aligning user expectations with true robot capabilities, facilitating positive human-robot interaction.

Preference-Driven Texture Modeling Through Interactive Generation and Search

S. Lu, M. Zheng, M. C. Fontaine, S. Nikolaidis, H. Culbertson

IEEE Transactions on Haptics, 2022

Best Paper Award Finalist

IEEE Xplore

Abstract

Data-driven texture modeling and rendering has pushed the limit of realism in haptics. However, the lack of haptic texture databases, difficulties of model interpolation and expansion, and the complexity of real textures prevent data-driven methods from capturing a large variety of textures and from customizing models to suit specific output hardware or user needs. This work proposes an interactive texture generation and search framework driven by user input. We design a GAN-based texture model generator, which can create a wide range of texture models using Auto-Regressive processes. Our interactive texture search method, which we call preference-driven, follows an evolutionary strategy given guidance from user's preferred feedback within a set of generated texture models. We implemented this framework on a 3D haptic device and conducted a two-phase user study to evaluate the efficiency and accuracy of our method for previously unmodeled textures. The results showed that by comparing the feel of real and generated virtual textures, users can follow an evolutionary process to efficiently find a virtual texture model that matched or exceeded the realism of a data-driven model. Furthermore, for 4 out of 5 real textures, 80% of the preference-driven models from participants were rated comparable to the data-driven models.

Conferences ¶

Deep Surrogate Assisted Generation of Environments

V. Bhatt*, B. Tjanaka*, M. C. Fontaine*, S. Nikolaidis

Neural Information Processing Systems (NeurIPS), November 2022

arXiv GitHub Supplemental Website

A MIP-based approach for multi-robot geometric task-and-motion planning

H. Zhang, S.-H. Chan, J. Zhong, J. Li, S. Koenig, S. Nikolaidis

Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), 2022

Towards transferring human preferences from canonical to actual assembly tasks

H. Nemlekar, R. Guan, G. Luo, S. Gupta, S. Nikolaidis

Proceedings of the IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), 2022

IEEE Xplore arXiv

Abstract

To assist human users according to their individual preference in assembly tasks, robots typically require user demonstrations in the given task. However, providing demonstrations in actual assembly tasks can be tedious and time-consuming. Our thesis is that we can learn user preferences in assembly tasks from demonstrations in a representative canonical task. Inspired by previous work in economy of human movement, we propose to represent user preferences as a linear function of abstract task-agnostic features, such as movement and physical and mental effort required by the user. For each user, we learn their preference from demonstrations in a canonical task and use the learned preference to anticipate their actions in the actual assembly task without any user demonstrations in the actual task. We evaluate our proposed method in a model-airplane assembly study and show that preferences can be effectively transferred from canonical to actual assembly tasks, enabling robots to anticipate user actions.

Human-guided goal assignment to effectively manage workload for a smart robotic assistant

N. Dhanaraj, R. Malhan, H. Nemlekar, S. Nikolaidis, and S. Gupta

Proceedings of the IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), 2022

Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning

B. Tjanaka, M. C. Fontaine, J. Togelius, S. Nikolaidis

Genetic and Evolutionary Computation Conference, 2022

Website ACM Digital Library

Abstract

Consider a walking agent that must adapt to damage. To approach this task, we can train a collection of policies and have the agent select a suitable policy when damaged. Training this collection may be viewed as a quality diversity (QD) optimization problem, where we search for solutions (policies) which maximize an objective (walking forward) while spanning a set of measures (measurable characteristics). Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available for the objective and measures. However, such gradients are typically unavailable in RL settings due to non-differentiable environments. To apply DQD in RL settings, we propose to approximate objective and measure gradients with evolution strategies and actor-critic methods. We develop two variants of the DQD algorithm CMA-MEGA, each with different gradient approximations, and evaluate them on four simulated walking tasks. One variant achieves comparable performance (QD score) with the state-of-the-art PGA-MAP-Elites in two tasks. The other variant performs comparably in all tasks but is less efficient than PGA-MAP-Elites in two tasks. These results provide insight into the limitations of CMA-MEGA in domains that require rigorous optimization of the objective and where exact gradients are unavailable.

Deep Surrogate Assisted MAP-Elites for Automated Hearthstone Deckbuilding

Y. Zhang, M. C. Fontaine, A. Hoover, S. Nikolaidis

Genetic and Evolutionary Computation Conference, 2022

ACM Digital Library arXiv

Abstract

We study the problem of efficiently generating high-quality and diverse content in games. Previous work on automated deckbuilding in Hearthstone shows that the quality diversity algorithm MAP-Elites can generate a collection of high-performing decks with diverse strategic gameplay. However, MAP-Elites requires a large number of expensive evaluations to discover a diverse collection of decks. We propose assisting MAP-Elites with a deep surrogate model trained online to predict game outcomes with respect to candidate decks. MAP-Elites discovers a diverse dataset to improve the surrogate model accuracy, while the surrogate model helps guide MAP-Elites towards promising new content. In a Hearthstone deckbuilding case study, we show that our approach improves the sample efficiency of MAP-Elites and outperforms a model trained offline with random decks, as well as a linear surrogate model baseline, setting a new state-of-the-art for quality diversity approaches in automated Hearthstone deckbuilding. We include the source code for all the experiments at: https://github.com/icaros-usc/EvoStone2.

Illuminating Diverse Neural Cellular Automata for Level Generation

S. Earle, J. Snider, M. Fontaine, S. Nikolaidis, J. Togelius

Genetic and Evolutionary Computation Conference, 2022

ACM Digital Library arXiv

Abstract

We present a method of generating diverse collections of neural cellular automata (NCA) to design video game levels. While NCAs have so far only been trained via supervised learning, we present a quality diversity (QD) approach to generating a collection of NCA level generators. By framing the problem as a QD problem, our approach can train diverse level generators, whose output levels vary based on aesthetic or functional criteria. To efficiently generate NCAs, we train generators via Covariance Matrix Adaptation MAP-Elites (CMA-ME), a quality diversity algorithm which specializes in continuous search spaces. We apply our new method to generate level generators for several 2D tile-based games: a maze game, Sokoban, and Zelda. Our results show that CMA-ME can generate small NCAs that are diverse yet capable, often satisfying complex solvability criteria for deterministic agents. We compare against a Compositional Pattern-Producing Network (CPPN) baseline trained to produce diverse collections of generators and show that the NCA representation yields a better exploration of level-space.

2021

Journals

Autonomy in Physical Human-Robot Interaction: A Brief Survey

M. Selvaggio, M. Cognetti, S. Nikolaidis, S. Ivaldi, B. Siciliano

IEEE Robotics and Automation Letters, 2021

IEEE Xplore

Abstract

Sharing the control of a robotic system with an autonomous controller allows a human to reduce his/her cognitive and physical workload during the execution of a task. In recent years, the development of inference and learning techniques has widened the spectrum of applications of shared control (SC) approaches, leading to robotic systems that are capable of seamless adaptation of their autonomy level. In this perspective, shared autonomy (SA) can be defined as the design paradigm that enables this adapting behavior of the robotic system. This letter collects the latest results achieved by the research community in the field of SC and SA with special emphasis on physical human-robot interaction (pHRI). Architectures and methods developed for SC and SA are discussed throughout the letter, highlighting the key aspects of each methodology. A discussion about open issues concludes this letter.

Learning From Demonstrations Using Signal Temporal Logic in Stochastic and Continuous Domains

A. Gopinath Puranic, J. V. Deshmukh, S. Nikolaidis

IEEE Robotics and Automation Letters, 2021

IEEE Xplore

Abstract

Learning control policies that are safe, robust and interpretable is a prominent challenge in developing robotic systems. Learning-from-demonstrations with formal logic is an emerging paradigm in reinforcement learning to estimate rewards and extract robot control policies that seek to overcome this challenge. In this approach, we assume that mission-level specifications for the robotic system are expressed in a suitable temporal logic such as Signal Temporal Logic (STL). The main idea is to automatically infer rewards from user demonstrations (that could be suboptimal or incomplete) by evaluating and ranking them w.r.t. the given STL specifications. In contrast to existing work that focuses on deterministic environments and discrete state spaces, in this letter, we propose significant extensions that tackle stochastic environments and continuous state spaces.

Conferences

Differentiable Quality Diversity

M. C. Fontaine, S. Nikolaidis

Advances in Neural Information Processing Systems, 2021

NeurIPS 2021 Oral

NeurIPS arXiv GitHub

Abstract

Quality diversity (QD) is a growing branch of stochastic optimization research that studies the problem of generating an archive of solutions that maximize a given objective function but are also diverse with respect to a set of specified measure functions. However, even when these functions are differentiable, QD algorithms treat them as "black boxes", ignoring gradient information. We present the differentiable quality diversity (DQD) problem, a special case of QD, where both the objective and measure functions are first order differentiable. We then present MAP-Elites via Gradient Arborescence (MEGA), a DQD algorithm that leverages gradient information to efficiently explore the joint range of the objective and measure functions. Results in two QD benchmark domains and in searching the latent space of a StyleGAN show that MEGA significantly outperforms state-of-the-art QD algorithms, highlighting DQD's promise for efficient quality diversity optimization when gradient information is available.

Design and Evaluation of a Hair Combing System Using a General-Purpose Robotic Arm

N. Dennler, E. Shin, M. Matarić, S. Nikolaidis

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

IEEE Xplore arXiv

Abstract

This work introduces an approach for automatic hair combing by a lightweight robot. For people living with limited mobility, dexterity, or chronic fatigue, combing hair is often a difficult task that negatively impacts personal routines. We propose a modular system for enabling general robot manipulators to assist with a hair-combing task. The system consists of three main components. The first component is the segmentation module, which segments the location of hair in space. The second component is the path planning module that proposes automatically-generated paths through hair based on user input. The final component creates a trajectory for the robot to execute. We quantitatively evaluate the effectiveness of the paths planned by the system with 48 users and qualitatively evaluate the system with 30 users watching videos of the robot performing a hair-combing task in the physical world. The system is shown to effectively comb different hairstyles.

Robotic Lime Picking by Considering Leaves as Permeable Obstacles

H. Nemlekar, Z. Liu, S. Kothawade, S. Niyaz, B. Raghavan, S. Nikolaidis

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

IEEE Xplore arXiv

Abstract

The problem of robotic lime picking is challenging; lime plants have dense foliage which makes it difficult for a robotic arm to grasp a lime without coming in contact with leaves. Existing approaches either do not consider leaves, or treat them as obstacles and completely avoid them, often resulting in undesirable or infeasible plans. We focus on reaching a lime in the presence of dense foliage by considering the leaves of a plant as 'permeable obstacles' with a collision cost. We then adapt the rapidly exploring random tree star (RRT*) algorithm for the problem of fruit harvesting by incorporating the cost of collision with leaves into the path cost. To reduce the time required for finding low-cost paths to goal, we bias the growth of the tree using an artificial potential field (APF). We compare our proposed method with prior work in a 2-D environment and a 6-DOF robot simulation. Our experiments and a real-world demonstration on a robotic lime picking task demonstrate the applicability of our approach.
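The idea of folding leaf collisions into the path cost can be illustrated with a minimal Python sketch. It is not the authors' implementation: the disc model for leaves, the per-waypoint penalty, and the names `path_cost` and `leaf_penalty` are assumptions made only for illustration.

```python
import math


def path_cost(path, leaves, leaf_penalty=0.5):
    """Path length plus a penalty for each waypoint that lies inside a leaf.

    path: list of (x, y) waypoints.
    leaves: list of (cx, cy, radius) discs standing in for leaves
            (a hypothetical 2-D model, not taken from the paper).
    """
    # Euclidean length of the piecewise-linear path.
    length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    # Count waypoint/leaf overlaps; leaves are permeable, so they add cost
    # instead of making the path infeasible.
    hits = sum(
        1
        for p in path
        for (cx, cy, r) in leaves
        if math.dist(p, (cx, cy)) <= r
    )
    return length + leaf_penalty * hits
```

A planner such as RRT* could compare candidate paths with a cost of this shape, so that a short path brushing one leaf may still beat a long detour around the foliage.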

On the Importance of Environments in Human-Robot Coordination

M. C. Fontaine*, Y. Hsu*, Y. Zhang*, B. Tjanaka, S. Nikolaidis

Robotics: Science and Systems, July 2021

arXiv Website

Abstract

When studying robots collaborating with humans, much of the focus has been on robot policies that coordinate fluently with human teammates in collaborative tasks. However, less emphasis has been placed on the effect of the environment on coordination behaviors. To thoroughly explore environments that result in diverse behaviors, we propose a framework for procedural generation of environments that are (1) stylistically similar to human-authored environments, (2) guaranteed to be solvable by the human-robot team, and (3) diverse with respect to coordination measures. We analyze the procedurally generated environments in the Overcooked benchmark domain via simulation and an online user study. Results show that the environments result in qualitatively different emerging behaviors and statistically significant differences in collaborative fluency metrics, even when the robot runs the same planning algorithm.

A Quality Diversity Approach to Automatically Generating Human-Robot Interaction Scenarios in Shared Autonomy

M. C. Fontaine, S. Nikolaidis

Robotics: Science and Systems, July 2021

arXiv

Abstract

The growth of scale and complexity of interactions between humans and robots highlights the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring diverse scenarios of humans and robots interacting in simulation can improve understanding of the robotic system and avoid potentially costly failures in real-world settings. We formulate this problem as a quality diversity (QD) problem, where the goal is to discover diverse failure scenarios by simultaneously exploring both environments and human actions. We focus on the shared autonomy domain, where the robot attempts to infer the goal of a human operator, and adopt the QD algorithm MAP-Elites to generate scenarios for two published algorithms in this domain: shared autonomy via hindsight optimization and linear policy blending. Some of the generated scenarios confirm previous theoretical findings, while others are surprising and bring about a new understanding of state-of-the-art implementations. Our experiments show that MAP-Elites outperforms Monte-Carlo simulation and optimization based methods in effectively searching the scenario space, highlighting its promise for automatic evaluation of algorithms in human-robot interaction.

Personalizing User Engagement Dynamics in a Non-Verbal Communication Game for Cerebral Palsy

N. Dennler, C. Yunis, J. Realmuto, T. Sanger, S. Nikolaidis, M. Matarić

2021 30th IEEE International Conference on Robot Human Interactive Communication (RO-MAN), 2021

IEEE Xplore arXiv

Abstract

Children and adults with cerebral palsy (CP) can have involuntary upper limb movements as a consequence of the symptoms that characterize their motor disability, leading to difficulties in communicating with caretakers and peers. We describe how a socially assistive robot may help individuals with CP to practice non-verbal communicative gestures using an active orthosis in a one-on-one number-guessing game. We performed a user study and data collection with participants with CP; we found that participants preferred an embodied robot over a screen-based agent, and we used the participant data to train personalized models of participant engagement dynamics that can be used to select personalized robot actions. Our work highlights the benefit of personalized models in the engagement of users with CP with a socially assistive robot and offers design insights for future work in this area.

Learning Collaborative Pushing and Grasping Policies in Dense Clutter

B. Tang, M. Corsaro, G. Konidaris, S. Nikolaidis, S. Tellex

2021 IEEE International Conference on Robotics and Automation (ICRA), May 2021

PDF

Abstract

Robots must reason about pushing and grasping in order to engage in flexible manipulation in cluttered environments. Earlier works on learning pushing and grasping only consider each operation in isolation or are limited to top-down grasping and bin-picking. We train a robot to learn joint planar pushing and 6-degree-of-freedom (6-DoF) grasping policies by self-supervision. Two separate deep neural networks are trained to map from 3D visual observations to actions with a Q-learning framework. With collaborative pushes and expanded grasping action space, our system can deal with cluttered scenes with a wide variety of objects (e.g. grasping a plate from the side after pushing away surrounding obstacles). We compare our system to the state-of-the-art baseline model VPG in simulation and outperform it with 10% higher action efficiency and 20% higher grasp success rate. We then demonstrate our system on a KUKA LBR iiwa arm with a Robotiq 3-finger gripper.

Two-Stage Clustering of Human Preferences for Action Prediction in Assembly Tasks

H. Nemlekar, J. Modi, S. Gupta, S. Nikolaidis

2021 IEEE International Conference on Robotics and Automation (ICRA), May 2021

arXiv

Abstract

To effectively assist human workers in assembly tasks, a robot must proactively offer support by inferring their preferences in sequencing the task actions. Previous work has focused on learning the dominant preferences of human workers for simple tasks largely based on their intended goal. However, people may have preferences at different resolutions: they may share the same high-level preference for the order of the sub-tasks but differ in the sequence of individual actions. We propose a two-stage approach for learning and inferring the preferences of human operators based on the sequence of sub-tasks and actions. We conduct an IKEA assembly study and demonstrate how our approach is able to learn the dominant preferences in a complex task. We show that our approach improves the prediction of human actions through cross-validation. Lastly, we show that our two-stage approach improves the efficiency of task execution in an online experiment, and demonstrate its applicability in a real-world robot-assisted IKEA assembly.

Illuminating Mario Scenes in the Latent Space of a Generative Adversarial Network

M. C. Fontaine, R. Liu, A. Khalifa, J. Modi, J. Togelius, A. Hoover, S. Nikolaidis

AAAI Conference on Artificial Intelligence, February 2021

AAAI arXiv GitHub

Abstract

Generative adversarial networks (GANs) are quickly becoming a ubiquitous approach to procedurally generating video game levels. While GAN-generated levels are stylistically similar to human-authored examples, human designers often want to explore the generative design space of GANs to extract interesting levels. However, human designers find latent vectors opaque and would rather explore along dimensions the designer specifies, such as number of enemies or obstacles. We propose using state-of-the-art quality diversity algorithms designed to optimize continuous spaces, i.e. MAP-Elites with a directional variation operator and Covariance Matrix Adaptation MAP-Elites, to efficiently explore the latent space of a GAN to extract levels that vary across a set of specified gameplay measures. In the benchmark domain of Super Mario Bros, we demonstrate how designers may specify gameplay measures to our system and extract high-quality (playable) levels with a diverse range of level mechanics, while still maintaining stylistic similarity to human-authored examples. An online user study shows how the different mechanics of the automatically generated levels affect subjective ratings of their perceived difficulty and appearance.

2020

Journals

Trust-Aware Decision Making for Human-Robot Collaboration: Model Learning and Planning

M. Chen*, S. Nikolaidis*, H. Soh, D. Hsu, S. Srinivasa

ACM Transactions on Human-Robot Interaction, January 2020

PDF ACM Digital Library

Abstract

Trust in autonomy is essential for effective human-robot collaboration and user adoption of autonomous systems such as robot assistants. This article introduces a computational model that integrates trust into robot decision making. Specifically, we learn from data a partially observable Markov decision process (POMDP) with human trust as a latent variable. The trust-POMDP model provides a principled approach for the robot to (i) infer the trust of a human teammate through interaction, (ii) reason about the effect of its own actions on human trust, and (iii) choose actions that maximize team performance over the long term. We validated the model through human subject experiments on a table clearing task in simulation (201 participants) and with a real robot (20 participants). In our studies, the robot builds human trust by manipulating low-risk objects first. Interestingly, the robot sometimes fails intentionally to modulate human trust and achieve the best team performance. These results show that the trust-POMDP calibrates trust to improve human-robot team performance over the long term. Further, they highlight that maximizing trust alone does not always lead to the best performance.

Conferences

Learning from Demonstrations using Signal Temporal Logic

A. Puranic, J. Deshmukh, S. Nikolaidis

Conference on Robot Learning, November 2020

CoRL arXiv YouTube

Abstract

We present a model-based reinforcement learning framework for robot locomotion that achieves walking based on only 4.5 minutes of data collected on a quadruped robot. To accurately model the robot’s dynamics over a long horizon, we introduce a loss function that tracks the model’s prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks, simply by changing the reward function. To the best of our knowledge, our approach is more than an order of magnitude more sample efficient than current model-free methods.

Robot Learning in Mixed Adversarial and Collaborative Settings

S. Yoon, S. Nikolaidis

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), January 2020

PDF IEEE Xplore

Abstract

Previous work has shown that interacting with a human adversary can significantly improve the efficiency of the learning process in robot grasping. However, people are not consistent in applying adversarial forces; instead they may alternate between acting antagonistically with the robot or helping the robot achieve its tasks. We propose a physical framework for robot learning in a mixed adversarial/collaborative setting, where a second agent may act as a collaborator or as an antagonist, unbeknownst to the robot. The framework leverages prior estimates of the reward function to infer whether the actions of the second agent are collaborative or adversarial. Integrating the inference in an adversarial learning algorithm can significantly improve the robustness of learned grasps in a manipulation task.

Video Game Level Repair via Mixed Integer Linear Programming

H. Zhang*, M. C. Fontaine*, A. Hoover, J. Togelius, B. Dilkina, S. Nikolaidis

AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, October 2020

AAAI

Abstract

Recent advancements in procedural content generation via machine learning enable the generation of video-game levels that are aesthetically similar to human-authored examples. However, the generated levels are often unplayable without additional editing. We propose a “generate-then-repair” framework for automatic generation of playable levels adhering to specific styles. The framework constructs levels using a generative adversarial network (GAN) trained with human-authored examples and repairs them using a mixed-integer linear program (MIP) with playability constraints. A key component of the framework is computing minimum cost edits between the GAN-generated level and the solution of the MIP solver, which we cast as a minimum cost network flow problem. Results show that the proposed framework generates a diverse range of playable levels that capture the spatial relationships between objects exhibited in the human-authored levels.

Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space

M. C. Fontaine, J. Togelius, S. Nikolaidis, A. Hoover

2020 Genetic and Evolutionary Computation Conference, June 2020

arXiv ACM Digital Library

Abstract

We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. While quality diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are designed to generate a diverse range of solutions, these algorithms require a large number of evaluations for exploration of continuous spaces. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments based on standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains.
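The "archiving and mapping" side of a MAP-Elites-style algorithm can be sketched in a few lines of Python. This is a generic grid archive under stated assumptions, not the CMA-ME implementation: the function names, the uniform grid, and the tuple-keyed dictionary are hypothetical choices, and the CMA-ES sampling and adaptation steps are omitted entirely.

```python
def cell_index(measures, lower, upper, cells_per_dim):
    """Map a measure vector to a tuple of grid coordinates."""
    index = []
    for m, lo, hi in zip(measures, lower, upper):
        frac = (m - lo) / (hi - lo)
        i = int(frac * cells_per_dim)
        # Clamp so that out-of-range measures land in a boundary cell.
        index.append(min(max(i, 0), cells_per_dim - 1))
    return tuple(index)


def try_insert(archive, solution, objective, measures, lower, upper, cells_per_dim):
    """Keep the solution only if its cell is empty or it beats the incumbent."""
    key = cell_index(measures, lower, upper, cells_per_dim)
    incumbent = archive.get(key)
    if incumbent is None or objective > incumbent[0]:
        archive[key] = (objective, solution)
        return True
    return False
```

In this sketch, each cell holds at most one solution, so the archive grows when a new region of measure space is reached and improves when a better solution is found for an occupied cell.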

Fair Contextual Multi-Armed Bandits: Theory and Experiments

Y. Chen, A. Cuellar, H. Luo, J. Modi, H. Nemlekar, S. Nikolaidis

36th Conference on Uncertainty in Artificial Intelligence (UAI), August 2020

arXiv PMLR

Abstract

When an AI system interacts with multiple users, it frequently needs to make allocation decisions. For instance, a virtual agent decides whom to pay attention to in a group, or a factory robot selects a worker to deliver a part. Demonstrating fairness in decision making is essential for such systems to be broadly accepted. We introduce a Multi-Armed Bandit algorithm with fairness constraints, where fairness is defined as a minimum rate at which a task or a resource is assigned to a user. The proposed algorithm uses contextual information about the users and the task and makes no assumptions on how the losses capturing the performance of different users are generated. We provide theoretical guarantees of performance and empirical results from simulation and an online user study. The results highlight the benefit of accounting for contexts in fair decision making, especially when users perform better at some contexts and worse at others.

The Fair Contextual Multi-Armed Bandit

Y. Chen, A. Cuellar, H. Luo, J. Modi, H. Nemlekar, S. Nikolaidis

19th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS) (short paper), May 2020

PDF ACM Digital Library

Abstract

When an AI system interacts with multiple users, it frequently needs to make allocation decisions. For instance, a virtual agent decides whom to pay attention to in a group setting, or a factory robot selects a worker to deliver a part. Demonstrating fairness in decision making is essential for such systems to be broadly accepted. We introduce a Multi-Armed Bandit algorithm with fairness constraints, where fairness is defined as a minimum rate that a task or a resource is assigned to a user. The proposed algorithm uses contextual information about the users and the task and makes no assumptions on how the losses capturing the performance of different users are generated. We view this as an exciting step towards including fairness constraints in resource allocation decisions.

Multi-Armed Bandits with Fairness Constraints for Distributing Resources to Human Teammates

H. Claure, Y. Chen, J. Modi, M. Jung, S. Nikolaidis

2020 ACM/IEEE International Conference on Human-Robot Interaction, March 2020

ACM Digital Library arXiv

Abstract

How should a robot that collaborates with multiple people decide upon the distribution of resources (e.g. social attention, or parts needed for an assembly)? People are uniquely attuned to how resources are distributed. A decision to distribute more resources to one team member than another might be perceived as unfair with potentially detrimental effects for trust. We introduce a multi-armed bandit algorithm with fairness constraints, where a robot distributes resources to human teammates of different skill levels. In this problem, the robot does not know the skill level of each human teammate, but learns it by observing their performance over time. We define fairness as a constraint on the minimum rate that each human teammate is selected throughout the task. We provide theoretical guarantees on performance and perform a large-scale user study, where we adjust the level of fairness in our algorithm. Results show that fairness in resource distribution has a significant effect on users' trust in the system.
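One simple way to picture a minimum-selection-rate constraint is shown in the Python sketch below. It is an illustrative assumption, not the algorithm from the paper: every teammate receives a guaranteed probability `min_rate`, and the remaining probability mass goes to whichever teammate currently has the highest score (for example, an upper confidence bound on estimated skill). The function name and the scoring input are hypothetical.

```python
def selection_probabilities(scores, min_rate):
    """Give every teammate at least `min_rate`, and the leftover mass to the top scorer."""
    k = len(scores)
    if not 0 <= min_rate * k <= 1:
        raise ValueError("min_rate must satisfy 0 <= min_rate * k <= 1")
    # Start from the guaranteed floor for every teammate.
    probs = [min_rate] * k
    # Assign the remaining probability to the current best-scoring teammate.
    best = max(range(k), key=lambda i: scores[i])
    probs[best] += 1.0 - min_rate * k
    return probs
```

Setting `min_rate` to 0 recovers a purely greedy choice on the scores, while setting it to `1 / k` yields a uniform allocation, which shows how a single parameter can trade performance against fairness in this sketch.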

Communicating Robot Goals via Haptic Feedback in Manipulation Tasks

R. Pocius, N. Zamani, H. Culbertson, S. Nikolaidis

2020 ACM/IEEE International Conference on Human-Robot Interaction, March 2020

ACM Digital Library

Abstract

In shared autonomy, human teleoperation blends with intelligent robot autonomy to create robot control. This combination enables assistive robot manipulators to help human operators by predicting and reaching the human's desired target. However, this reduces the control authority of the user and the transparency of the interaction. This negatively affects their willingness to use the system. We propose haptic feedback as a seamless and natural way for the robot to communicate information to the user and assist them in completing the task. A proof-of-concept demonstration of our system illustrates the effectiveness of haptic feedback in communicating the robot's goals to the user. We hypothesize that this can be an effective way to improve performance in teleoperated manipulation tasks, while retaining the control authority of the user.

Preprints

Robot Learning and Execution of Collaborative Manipulation Plans from YouTube Videos

H. Zhang, S. Nikolaidis

CoRR, February 2019

arXiv

Abstract

People often watch videos on the web to learn how to cook new recipes, assemble furniture or repair a computer. We wish to enable robots with the very same capability. This is challenging; there is a large variation in manipulation actions and some videos even involve multiple persons, who collaborate by sharing and exchanging objects and tools. Furthermore, the learned representations need to be general enough to be transferable to robotic systems. On the other hand, previous work has shown that the space of human manipulation actions has a linguistic, hierarchical structure that relates actions to manipulated objects and tools. Building upon this theory of language for action, we propose a framework for understanding and executing demonstrated action sequences from full-length, unconstrained cooking videos on the web. The framework takes as input a cooking video annotated with object labels and bounding boxes, and outputs a collaborative manipulation action plan for one or more robotic arms. We demonstrate performance of the system in a standardized dataset of 100 YouTube cooking videos, as well as in three full-length YouTube videos that include collaborative actions between two participants. We additionally propose an open-source platform for executing the learned plans in a simulation environment as well as with an actual robotic arm.

2019¶

Conferences ¶

Robot Learning via Human Adversarial Games

J. Duan*, Q. Wang*, L. Pinto, C. Jay Kuo, S. Nikolaidis

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019

Best Cognitive Robotics Paper Award Nomination

PDF IEEE Xplore arXiv

Abstract

Much work in robotics has focused on “human-in-the-loop” learning techniques that improve the efficiency of the learning process. However, these algorithms have made the strong assumption of a cooperating human supervisor that assists the robot. In reality, human observers tend to also act in an adversarial manner towards deployed robotic systems. We show that this can in fact improve the robustness of the learned models by proposing a physical framework that leverages perturbations applied by a human adversary, guiding the robot towards more robust models. In a manipulation task, we show that grasping success improves significantly when the robot trains with a human adversary as compared to training in a self-supervised manner.

Learning Collaborative Action Plans from Unlabeled YouTube Videos

H. Zhang, P. Lai, S. Paul, S. Kothawade, S. Nikolaidis

Robotics Research, The 19th International Symposium, ISRR 2019, October 2019

PDF

Abstract

Videos from the World Wide Web provide a rich source of information that robots could use to acquire knowledge about manipulation tasks. Previous work has focused on generating action sequences from unconstrained videos for a single robot performing manipulation tasks by itself. However, robots operating in the same physical space with people need to not only perform actions autonomously, but also coordinate seamlessly with their human counterparts. This often requires representing and executing collaborative manipulation actions, such as handing over a tool or holding an object for the other agent. We present a system for knowledge acquisition of collaborative manipulation action plans that outputs commands to the robot in the form of a visual sentence. We show the performance of the system in 12 unlabeled action clips taken from collaborative cooking videos on YouTube. We view this as the first step towards extracting collaborative manipulation action sequences from unconstrained, unlabeled online videos.

Surprise! Predicting Infant Visual Attention in a Socially Assistive Robot Contingent Learning Paradigm

L. Klein, L. Itti, B. Smith, M. Rosales, S. Nikolaidis, M. Matarić

2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), October 2019

PDF IEEE Xplore

Abstract

Early intervention to address developmental disability in infants has the potential to promote improved outcomes in neurodevelopmental structure and function [1]. Researchers are starting to explore Socially Assistive Robotics (SAR) as a tool for delivering early interventions that are synergistic with and enhance human-administered therapy. For SAR to be effective, the robot must be able to consistently attract the attention of the infant in order to engage the infant in a desired activity. This work presents the analysis of eye gaze tracking data from five 6-8 month old infants interacting with a Nao robot that kicked its leg as a contingent reward for infant leg movement. We evaluate a Bayesian model of low-level surprise on video data from the infants' head-mounted camera and on the timing of robot behaviors as a predictor of infant visual attention. The results demonstrate that over 67% of infant gaze locations were in areas the model evaluated to be more surprising than average. We also present an initial exploration using surprise to predict the extent to which the robot attracts infant visual attention during specific intervals in the study. This work is the first to validate the surprise model on infants; our results indicate the potential for using surprise to inform robot behaviors that attract infant attention during SAR interactions.

Robot Object Referencing through Legible Situated Projections

T. Weng, L. Perlmutter, S. Nikolaidis, S. Srinivasa, M. Cakmak

2019 International Conference on Robotics and Automation (ICRA), May 2019

PDF IEEE Xplore

Abstract

The ability to reference objects in the environment is a key communication skill that robots need for complex, task-oriented human-robot collaborations. In this paper we explore the use of projections, which are a powerful communication channel for robot-to-human information transfer as they allow for situated, instantaneous, and parallelized visual referencing. We focus on the question of what makes a good projection for referencing a target object. To that end, we mathematically formulate legibility of projections intended to reference an object, and propose alternative arrow-object match functions for optimally computing the placement of an arrow to indicate a target object in a cluttered scene. We implement our approach on a PR2 robot with a head-mounted projector. Through an online (48 participants) and an in-person (12 participants) user study we validate the effectiveness of our approach, identify the types of scenes where projections may fail, and characterize the differences between alternative match functions.

Demonstrations ¶

Robot-Assisted Hair Brushing

E. Shin, H. Zhang, R. Pocius, N. Dennler, H. Culbertson, N. Zamani, S. Nikolaidis

NeurIPS 2019 Demonstrations

2017¶

Conferences ¶

Game-Theoretic Modeling of Human Adaptation in Human-Robot Collaboration

S. Nikolaidis, S. Nath, A. Procaccia, S. Srinivasa

2017 ACM/IEEE International Conference on Human-Robot Interaction, March 2017

PDF ACM Digital Library

Abstract

In human-robot teams, humans often start with an inaccurate model of the robot capabilities. As they interact with the robot, they infer the robot's capabilities and partially adapt to the robot, i.e., they might change their actions based on the observed outcomes and the robot's actions, without replicating the robot's policy. We present a game-theoretic model of human partial adaptation to the robot, where the human responds to the robot's actions by maximizing a reward function that changes stochastically over time, capturing the evolution of their expectations of the robot's capabilities. The robot can then use this model to decide optimally between taking actions that reveal its capabilities to the human and taking the best action given the information that the human currently has. We prove that under certain observability assumptions, the optimal policy can be computed efficiently. We demonstrate through a human subject experiment that the proposed model significantly improves human-robot team performance, compared to policies that assume complete adaptation of the human to the robot.

Human-Robot Mutual Adaptation in Shared Autonomy

S. Nikolaidis, Y. Zhu, D. Hsu, S. Srinivasa

2017 ACM/IEEE International Conference on Human-Robot Interaction, March 2017

PDF ACM Digital Library

Abstract

Shared autonomy integrates user input with robot autonomy in order to control a robot and help the user to complete a task. Our work aims to improve the performance of such a human-robot team: the robot tries to guide the human towards an effective strategy, sometimes against the human's own preference, while still retaining their trust. We achieve this through a principled human-robot mutual adaptation formalism. We integrate a bounded-memory adaptation model of the human into a partially observable stochastic decision model, which enables the robot to adapt to an adaptable human. When the human is adaptable, the robot guides the human towards a good strategy, maybe unknown to the human in advance. When the human is stubborn and not adaptable, the robot complies with the human's preference in order to retain their trust. In the shared autonomy setting, unlike many other common human-robot collaboration settings, only the robot actions can change the physical state of the world, and the human and robot goals are not fully observable. We address these challenges and show in a human subject experiment that the proposed mutual adaptation formalism improves human-robot team performance, while retaining a high level of user trust in the robot, compared to the common approach of having the robot strictly follow participants' preferences.

2016¶

Conferences ¶

Formalizing Human-Robot Mutual Adaptation: A Bounded Memory Model

S. Nikolaidis, A. Kuznetsov, D. Hsu, S. Srinivasa

The Eleventh ACM/IEEE International Conference on Human Robot Interaction, March 2016

PDF ACM Digital Library

Abstract

Mutual adaptation is critical for effective team collaboration. This paper presents a formalism for human-robot mutual adaptation in collaborative tasks. We propose the bounded-memory adaptation model (BAM), which captures human adaptive behaviours based on a bounded memory assumption. We integrate BAM into a partially observable stochastic model, which enables robot adaptation to the human. When the human is adaptive, the robot will guide the human towards a new, optimal collaborative strategy unknown to the human in advance. When the human is not willing to change their strategy, the robot adapts to the human in order to retain human trust. Human subject experiments indicate that the proposed formalism can significantly improve the effectiveness of human-robot teams, while human subject ratings on the robot performance and trust are comparable to those achieved by cross training, a state-of-the-art human-robot team training practice.

2009¶

Conferences ¶

Optimal arrangement of ceiling cameras for home service robots using genetic algorithms

S. Nikolaidis, T. Arai

RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication, September 2009

PDF IEEE Xplore

Abstract

In the near future robots will be used in home environments to provide assistance for the elderly and challenged people. As home environments are complicated, external sensors like ceiling cameras need to be placed in the environment to provide the robot with information about its position. The pose of cameras influences the area covered by the cameras, as well as the error of the robot localization. We examine the problem of finding the arrangement of ceiling cameras in home environments that maximizes the area covered and minimizes the localization error. Genetic algorithms are proposed for the single and multi-objective optimization problem. Simulation results indicate that we can obtain the optimal arrangement of cameras that satisfies the given objectives and the required constraints.

Optimal camera placement considering mobile robot trajectory

S. Nikolaidis, R. Ueda, A. Hayashi, T. Arai

2008 IEEE International Conference on Robotics and Biomimetics, February 2009

PDF IEEE Xplore

Abstract

In the near future robots will be used in home environments to provide assistance for the elderly and challenged people. The arrangement of sensors greatly influences the quality of information provided to the robot. We, therefore, examine the problem of the optimal arrangement of vision sensors for the case of a robot following a pre-defined path. A methodology to evaluate the arrangement of sensors is proposed, focusing on the case of a home environment with ceiling cameras. Simulation results indicate that we can obtain a sub-optimal and practical arrangement with the minimum number of sensors that satisfies the necessary condition.

Global Pose Estimation of Multiple Cameras with Particle Filters

R. Ueda, S. Nikolaidis, A. Hayashi, T. Arai

Distributed Autonomous Robotic Systems 8, January 2009

PDF Springer

Abstract

Though image processing algorithms are sophisticated and provided as software libraries, it is still difficult to assure that complicated programs can work in various situations. In this paper, we propose a novel global pose estimation method for network cameras to actualize auto-calibration. This method uses native information from images. The sets of partial information are integrated with particle filters. Though some kinds of limitation still exist in the method, we can verify that the particle filters can deal with the nonlinearity of estimation with the experiment.

2008¶

Conferences ¶

Real-Time Detection And Visualization of Clarinet Bad Sounds

A. Gkiokas, K. Perifanos, S. Nikolaidis

11th Int. Conference on Digital Audio Effects (DAFx-08), September 2008

PDF

Abstract

This paper describes an approach to real-time performance visualization in the context of music education. A tool is described that produces sound visualizations during a student performance that are intuitively linked to common mistakes frequently observed in the performances of novice to intermediate students. The paper discusses the case of clarinet students. Nevertheless, the approach is also well suited for a wide range of wind or other instruments where similar mistakes are often encountered.

2007¶

Conferences ¶

RFID Based Object Localization System Using Ceiling Cameras with Particle Filter

P. Kamol, S. Nikolaidis, R. Ueda, T. Arai

Future Generation Communication and Networking (FGCN 2007), December 2007

PDF IEEE Xplore

Abstract

In this paper, we propose an object localization method for home environments. This method utilizes RFID equipment, a mobile robot and some ceiling cameras. The RFID system estimates a rough position of each object. The autonomous robot with RFID antennas explores the environment so as to detect other objects on the floor. Each object with an attached RFID tag is then recognized by utilizing its feature information stored in this tag. Finally, the precise localization of each object is achieved by the ceiling cameras with particle filters. The accuracy and the robustness of the proposed method are verified through an experiment.
