Learning by Demonstration: A Human-Inspired Approach


Overview

Among humans, teaching a task is a complex process that relies on multiple means of interaction and learning, on the part of both the teacher and the learner. Used together, these modalities make teaching and learning effective. In the robotics domain, task teaching has mostly been addressed using only one, or very few, of these interaction modes. This project presents an approach for teaching robots that relies on the key features and the general strategy people use when teaching each other: first give a demonstration, then allow the learner to refine the acquired capabilities by practicing under the teacher's supervision for a small number of trials. Depending on the quality of the learned task, the teacher may either demonstrate it again or provide specific feedback during the learner's practice trials for further refinement. Also, as people do during demonstrations, the teacher can provide simple instructions and informative cues that improve learning performance. Thus, instructive demonstrations, generalization over multiple demonstrations, and practice trials are essential features of a successful human-robot teaching approach.

Introduction

The goal of this project is to develop a flexible mechanism that allows a robot to learn and refine representations of high-level tasks from interaction with a human teacher, based on a set of underlying capabilities (behaviors) already available to the robot.

Since people are very good at learning from a teacher's training, we are interested in the key features that make this process efficient, and we seek to develop a similar robot teaching strategy. Human teachers rely on the concurrent use of multiple instructive modalities, primarily demonstration, verbal instruction, attentional cues, and gestures. On the part of the learner, the process is also more complex than a one-shot teaching experience. Students are typically given a demonstration of the task and then perform a set of practice trials under the supervision of the teacher, in order to show what was learned. If needed, during these runs the teacher provides feedback cues to indicate corrections (irrelevant actions or missing parts of the task). Alternatively, the teacher may provide additional demonstrations that the learner can use for generalization. Most of these aspects are overlooked in the majority of robot teaching approaches, which focus on only one or very few of these instructive and learning modalities. We believe that addressing these issues significantly eases and improves the learning process by conveying more information about the task, while at the same time allowing for a very flexible robot teaching approach. The overall strategy for learning and refining task representations is presented in Figure 1.

Figure 1: The overall strategy for learning and refining task representations.

Teaching Process

As the method for task teaching we use teaching by experience: the robot actively participates in the demonstration provided by the teacher and experiences the task through its own sensors. The advantage of this approach is that it directly provides the robot with the data necessary for learning. In the mobile robot domain, task demonstrations are achieved by having the robot follow and interact with the teacher.

An important challenge for any learning by demonstration method is distinguishing between the relevant and irrelevant information being perceived. Placing the entire responsibility on the learner to decide which observations are relevant increases the complexity of the problem and leads to more complicated, sometimes ineffective solutions. Therefore, in order for the robot to learn a task effectively, we also allow the teacher to provide additional information, beyond the perceived demonstration experience, through verbal instruction (a sketch of how these cues might be handled follows the list):

  • HERE - indicates moments in time during the demonstration when the environment presents aspects that are relevant for the task.

  • TAKE, DROP - instructions that prompt the robot to perform certain actions during the demonstration (in this case, picking up and dropping small objects), actions that would otherwise be impossible to trigger in a teacher-following-only learning approach.

  • START, DONE - instructions that signal the beginning and the end of a demonstration, respectively.
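
As an illustration of how these cues might be folded into the robot's demonstration record, the sketch below keeps a timestamped observation log and interprets each instruction. The Observation structure, the cue-handling details, and the choice to mark the most recent observation on a HERE cue are assumptions made for this example, not the implemented system.

    from dataclasses import dataclass, field

    @dataclass
    class Observation:
        """One snapshot of the perceived environment during the demonstration."""
        time: float
        state: dict             # perceived environment features (illustrative)
        relevant: bool = False  # set when the teacher says HERE near this moment

    @dataclass
    class DemonstrationLog:
        observations: list = field(default_factory=list)
        recording: bool = False

        def on_cue(self, cue: str) -> None:
            """Interpret a verbal instruction from the teacher."""
            if cue == "START":
                self.recording = True
            elif cue == "DONE":
                self.recording = False
            elif cue == "HERE" and self.observations:
                # Mark the most recent observation as task-relevant.
                self.observations[-1].relevant = True
            elif cue in ("TAKE", "DROP"):
                # Trigger the corresponding manipulation behavior (Pick Up / Drop).
                # Actual behavior activation is outside the scope of this sketch.
                pass

        def on_observation(self, obs: Observation) -> None:
            if self.recording:
                self.observations.append(obs)
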
Learning

The ability to learn from the observations gathered during a demonstration relies on the robot's ability to relate the observed states of the environment to the known effects of its own skills, a capability provided by the particular behavior-based architecture we developed. This architecture provides a simple and natural way of representing complex tasks and sequences of behaviors in the form of networks of abstract behaviors. The abstract behaviors embed representations of a behavior's goals in the form of abstracted environmental states; they are a key feature of our architecture and a critical element for learning from experience.
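
The sketch below is one plausible way to represent abstract behaviors and behavior networks in code. The class names, the predicate-based goal test, and the simple precedence links are illustrative assumptions; the actual architecture is described in paper [1] below.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class AbstractBehavior:
        """Pairs a primitive skill with a goal test over abstracted environment states."""
        name: str                          # e.g. "Track(Green)" (hypothetical name)
        goal_met: Callable[[Dict], bool]   # postcondition evaluated on observed state
        primitive: str                     # underlying primitive behavior it abstracts

    @dataclass
    class BehaviorNetwork:
        """A task as a network of abstract behaviors with precedence links."""
        nodes: List[AbstractBehavior] = field(default_factory=list)
        links: List[Tuple[int, int]] = field(default_factory=list)  # (predecessor, successor)

        def add_step(self, behavior: AbstractBehavior) -> int:
            """Append a step and link it after the previous one."""
            self.nodes.append(behavior)
            if len(self.nodes) > 1:
                self.links.append((len(self.nodes) - 2, len(self.nodes) - 1))
            return len(self.nodes) - 1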

In order to learn a task, the robot has to create a link between its perception (observations) and the behaviors in its own repertoire that would achieve the same observed effects.

During the demonstration, while the robot follows the human teacher, all of its available behaviors continuously monitor the status of their postconditions (without executing any of their actions). Whenever the observations match a primitive's goals, the corresponding abstract behavior signals an example of the robot having seen something it is also able to do. The feedback cues received from the teacher are used in conjunction with these observations to eliminate the irrelevant ones. More details on the learning algorithm can be found in paper [2] below.
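
A minimal sketch of this monitoring step, under the same illustrative assumptions, is shown below: each behavior exposes only a goal test that is evaluated against incoming observations, and detections that are not close to a HERE cue are discarded as irrelevant. The timestamp-matching tolerance is a placeholder and not part of the actual algorithm described in [2].

    def observe_demonstration(behaviors, observation_stream, here_times, tolerance=0.5):
        """Build an ordered list of candidate task steps from one demonstration.

        behaviors:          abstract behaviors, each with .name and .goal_met(state)
        observation_stream: iterable of (time, state) pairs gathered while following the teacher
        here_times:         times at which the teacher said HERE
        """
        steps = []
        for time, state in observation_stream:
            for behavior in behaviors:
                # The behavior's actions are never executed here; only its
                # postcondition is checked against the current observation.
                if behavior.goal_met(state):
                    steps.append((time, behavior.name))
        # Keep only goal detections close to a HERE cue; treat the rest as irrelevant.
        return [(t, name) for (t, name) in steps
                if any(abs(t - ht) <= tolerance for ht in here_times)]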

Generalization

An important capability that allows humans to learn effectively is the ability to generalize over multiple demonstrations. The reason we are interested in giving a robot similar generalization abilities is that its limited sensing capabilities, the quality of the teacher's demonstration, and particularities of the environment generally prevent the robot from correctly learning a task from a single trial. The two main inaccuracies that occur in the learning process are the inclusion of irrelevant steps (false positives) and the omission of relevant steps (false negatives).

Our approach to generalization is to build a task representation that encodes the specifics of each input example but, most importantly, highlights the parts common to all of them. As a measure of similarity we use the longest sequence of nodes common to the topological forms of the sample tasks. Based on this information we construct a generalized topology in which nodes common to both tasks are merged, while the others appear as alternate paths in the graph, as shown in Figure 2; a sketch of this merge step follows the figure.

Figure 2: Training samples, the longest-common-sequence table, and the resulting generalized topology.
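
For illustration, the sketch below treats each training sample as a plain ordered list of step names, computes their longest common sequence with standard dynamic programming, and merges two samples so that shared steps become common nodes while the remaining steps become alternate paths. The list-of-steps representation and the tuple-based output are simplifications of the behavior networks actually used.

    def longest_common_sequence(a, b):
        """Dynamic-programming LCS over two lists of step names."""
        table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) - 1, -1, -1):
            for j in range(len(b) - 1, -1, -1):
                if a[i] == b[j]:
                    table[i][j] = table[i + 1][j + 1] + 1
                else:
                    table[i][j] = max(table[i + 1][j], table[i][j + 1])
        common, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                common.append(a[i]); i += 1; j += 1
            elif table[i + 1][j] >= table[i][j + 1]:
                i += 1
            else:
                j += 1
        return common

    def generalize(a, b):
        """Merge two demonstrations: shared steps are kept as single nodes,
        the steps in between appear as alternate paths."""
        common = longest_common_sequence(a, b)
        merged, i, j = [], 0, 0
        for step in common:
            ai, bj = a.index(step, i), b.index(step, j)
            if a[i:ai] or b[j:bj]:
                merged.append(("alternates", a[i:ai], b[j:bj]))
            merged.append(("shared", step))
            i, j = ai + 1, bj + 1
        if a[i:] or b[j:]:
            merged.append(("alternates", a[i:], b[j:]))
        return merged

    # Example with hypothetical step names:
    # generalize(["Track(G)", "PickUp(O)", "Track(P)", "Drop"],
    #            ["Track(LG)", "PickUp(O)", "Track(P)", "Drop"])
    # -> [("alternates", ["Track(G)"], ["Track(LG)"]),
    #     ("shared", "PickUp(O)"), ("shared", "Track(P)"), ("shared", "Drop")]
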
Practice

Generalization over several training examples helps identify the steps that were observed most often and are thus most likely part of the task. However, repeated observation of irrelevant steps may inadvertently bias the learner toward including them in the representation. Also, limitations in the robot's sensing capabilities and particular structures in the environment may prevent it from observing steps that are relevant. Practice trials allow the teacher to observe the robot's execution and to point more precisely to where problems occurred, providing feedback cues that either remove irrelevant steps from the task representation or add relevant steps that were missed.
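
A hypothetical sketch of how such feedback could edit a learned task is shown below. The cue names WRONG and MISSING and the flat list-of-steps representation are placeholders introduced for this example, not the actual feedback vocabulary.

    def apply_feedback(task_steps, feedback):
        """Refine a learned task after a supervised practice run.

        task_steps: ordered list of step names, e.g. ["Track(G)", "PickUp(O)", ...]
        feedback:   list of (cue, index, new_step) tuples collected from the teacher;
                    new_step is only used for MISSING cues
        """
        steps = list(task_steps)
        # Apply corrections from the end of the task toward the beginning so that
        # earlier indices remain valid after each edit.
        for cue, index, new_step in sorted(feedback, key=lambda f: f[1], reverse=True):
            if cue == "WRONG":
                del steps[index]                # the teacher marked this step as irrelevant
            elif cue == "MISSING":
                steps.insert(index, new_step)   # the teacher indicated a missed step
        return steps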

Videos

The first set of videos shows one-shot teaching-by-demonstration experiences, in which all the observations gathered by the robot are considered relevant to the task.

                                 Visit targets (long sequences) task   Slalom task   Object manipulation task
  Environmental setup            (image)                               (image)       (image)
  Demonstration stage video      AVI                                   AVI           AVI
  Execution stage video          AVI                                   AVI           AVI
  Learned task representation    (image)                               (image)       (image)

The second set of videos demonstrates the robot's ability to generalize across multiple demonstrations, using three consecutive demonstrations performed in different environmental setups. The environments are purposely designed to contain incorrect steps and inconsistencies. A further set of videos shows how already learned/generalized tasks can be refined through practice under the teacher's supervision. The robot has a behavior set that allows it to track cylindrical colored targets (reaching a particular distance and orientation relative to them) and to pick up and drop small colored objects.

1) Learning by generalization from several examples

The environment consists of a set of cylindrical targets in colors that the robot is able to perceive. The teacher leads the robot around these targets, while also instructing it when to pick up or drop a small orange box. The task to be learned is as follows: go to either the Green (G) or the Light Green (LG) target, pick up an Orange (O) box, go between the Yellow (Y) and Red (R) targets, go to the Pink (P) target, drop the box there, then go to the Light Orange (LO) target and come back to the Light Green target. The sketched courses of the three demonstrations show that none of them corresponds exactly to the target task. Besides containing unnecessary steps (such as a final visit to a Green target in the first trial), these training runs also contain inconsistencies, such as visits to the Light Orange target occurring at different stages of the demonstrations.

                                 First demonstration   Second demonstration   Third demonstration
  Environmental setup            (image)               (image)                (image)
  Demonstration stage video      AVI [9.13 MB]         AVI [9.82 MB]          AVI [9.33 MB]
  Execution stage video          AVI [9.74 MB]         AVI [7.66 MB]          AVI [12.4 MB]


2) Learning from practice and teacher feedback

The network generalized from the three demonstrations does not yet represent the task intended by the user. The missing part is a visit to the Light Orange target, which should happen right after dropping the box and before going to the Light Green target. Simple feedback during a robot practice run is enough to refine the network to the desired structure (middle column below).

For a second experiment, we assume (starting from the first demonstration above) that for a different object transport task the teacher considers the initial visit to a Green target to be wrong, since a Light Green target should be visited instead, and that the visit to the Light Orange target is likewise wrong and not part of the task. The sketched trajectory with the teacher's interventions, and the corresponding videos, are in the right column below.

                                 Generalized task:     Feedback after the     Feedback after the
                                 new environment       third demonstration    first demonstration
  Environmental setup            (image)               (image)                (image)
  Practice run video             -                     AVI [12.9 MB]          AVI [17.4 MB]
  Execution stage video          AVI [7.43 MB]         AVI [11.2 MB]          AVI [8.32 MB]

Publications

  1. Monica Nicolescu, Maja J. Matarić, "A Hierarchical Architecture for Behavior-Based Robots", Proceedings, First International Joint Conference on Autonomous Agents and Multi-Agent Systems, pages 227-233, Bologna, Italy, July 15-19, 2002. [PS], [PDF]

  2. Monica Nicolescu, Maja J. Matarić, "Learning and Interacting in Human-Robot Domains", Special Issue of IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, Vol. 31, No. 5, pages 419-430, Chelsea C. White and Kerstin Dautenhahn, Eds., September 2001. [PS], [PDF]

  3. Monica Nicolescu, Maja J. Matarić, "Natural Methods for Learning and Generalization in Human-Robot Domains", to appear in Proceedings, Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, Melbourne, Australia, July 14-18, 2003. [PDF]

Support

This work is supported by DARPA Grant DABT63-99-1-0015 under the Mobile Autonomous Robot Software (MARS) program, the National Science Foundation Grant No. 9896322, and by the ONR Defense University Research Instrumentation Program Grant.

Contact

Monica Nicolescu