Modern robots are physically capable of performing complex and time-extended tasks, such as search and rescue and structure construction. A prime example is the iRobot PackBot mobile manipulator, designed to withstand rough environments and to interact with complex objects using its six-degree-of-freedom arm. As capable as such robots are, programming them to accomplish complex tasks in dynamic environments is extremely challenging.
Not only is it difficult for a roboticist to create control code, but in many situations skills from experts in other fields must be transferred to the robot. Programming by demonstration, also known as imitation learning, is one approach to solving this problem of robot programming. Initially the robot acquires new skills and tasks by learning from a human teacher. Skills refer to motor control policies, such as grasping an object, while tasks consist of symbolic concepts, such as ``find object X''. Typically, a series of skills accomplishes a task.
This work takes an incremental memory-based approach to task learning from human demonstrations and practice. Assuming a robot has received no prior training, an initial demonstration of a task is performed by a human expert who teleoperates the robot. During the demonstration the robot monitors and records local features, which are defined a priori. The resulting ``memory'' of the demonstration serves as a baseline for further practice, which is accomplished through interactive reinforcement learning. The robot can reuse this experience in future training sessions to incrementally learn more complex tasks.
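The recorded ``memory'' can be thought of as a time-stamped sequence of feature observations paired with the teleoperated commands. The following is a minimal sketch of such a structure, assuming 2-D feature measurements (distance and rotation relative to the robot) and velocity commands; all names here are illustrative, not from the original system.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Observation:
    """One time step of the demonstration 'memory'."""
    timestamp: float
    features: dict   # feature name -> (distance, rotation) relative to the robot
    action: tuple    # teleoperated command, e.g. (linear_vel, angular_vel)

@dataclass
class Demonstration:
    observations: list = field(default_factory=list)

    def record(self, features, action):
        """Append one monitored time step to the demonstration memory."""
        self.observations.append(Observation(time.time(), features, action))

# Example: two steps of a teleoperated approach toward a colored box
demo = Demonstration()
demo.record({"red_box": (2.0, 0.1)}, (0.5, 0.0))
demo.record({"red_box": (1.5, 0.1)}, (0.5, 0.0))
```

The stored sequence is exactly what later stages (segmentation and regression) operate on.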
Generalization across similar environments is achieved through ego-centric features [Bentivegna, 2000]. These features refer to symbolic objects that can be observed by filtering sensor data. Example features include walls, corners, colored boxes, and a person's face. Task relevant features are defined prior to training. Future work will study methods to automatically determine relevant features from demonstrations.
Following a task demonstration, the robot has a time series of features from which to learn an action policy. Applying a learning algorithm to the entire feature sequence is not suitable, due to the complexity of the task. Instead, the demonstration is decomposed into segments, where each segment is bounded by distinct changes in the feature sequence. The derivative of a feature's pose (its distance and rotation relative to the robot) remains constant within a segment. When the derivative changes significantly, most likely due to a change in the robot's motion, a new segment begins.
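This segmentation rule can be sketched in a few lines of Python: wherever the discrete derivative of a feature value shifts by more than a threshold, a new segment starts. The threshold value and function name are assumptions for illustration, not parameters from the original work.

```python
def segment(series, threshold=0.2):
    """Split a feature time series where the first difference (discrete
    derivative) of the feature's value changes significantly."""
    boundaries = [0]
    prev_delta = None
    for i in range(1, len(series)):
        delta = series[i] - series[i - 1]
        if prev_delta is not None and abs(delta - prev_delta) > threshold:
            boundaries.append(i)  # derivative shifted: start a new segment
        prev_delta = delta
    boundaries.append(len(series))
    return [series[a:b] for a, b in zip(boundaries, boundaries[1:])]

# Distance to a wall: a steady approach, then the robot stops
distances = [4.0, 3.5, 3.0, 2.5, 2.5, 2.5]
segs = segment(distances)  # two segments: approaching, then stationary
```

In practice each feature contributes several such series (distance and rotation), and a boundary in any of them ends the current segment.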
A regression learning algorithm is applied to each segment in order to learn the control function that maps features to actions. The result is a compressed representation of the feature sequence, in the form of a small sequence of learned ``skills''. Segmentation further increases the applicability of the demonstrated task to various situations, as the pattern of features does not need to be identical between environments, and the robot can repeat and reuse learned ``skills''.
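As a concrete illustration, one of the simplest regressors that could be fit per segment is a linear map from a feature value to a commanded action. This is a minimal sketch under that assumption; the original work does not specify this particular regressor.

```python
def fit_segment(xs, ys):
    """Least-squares fit of action = a * feature + b over one segment."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var if var else 0.0
    b = my - a * mx
    return a, b

# A segment in which commanded speed scales with distance to the feature
dists  = [2.0, 1.5, 1.0, 0.5]
speeds = [0.4, 0.3, 0.2, 0.1]
a, b = fit_segment(dists, speeds)  # recovers speed = 0.2 * distance
```

Each fitted `(a, b)` pair is far more compact than the raw time series it replaces, which is what makes the skill representation a compression of the demonstration.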
A single demonstration of a task is insufficient for a robot to learn a correct policy. Solutions to this problem include performing multiple demonstrations, at the expense of the human teacher's time, or running many automated practice trials. Instead of these two extremes, a middle ground is reached where the robot practices a demonstration with some guidance from a human observer. Using the initial policy to bootstrap the learning process, the robot then follows the reinforcement learning paradigm. By attempting to accomplish the demonstrated task autonomously, the robot is able to gather more information and further refine each skill. During each practice trial, a human observer may also interject positive or negative reinforcement based on the robot's performance. The purpose of this interjection is to further reduce the time to convergence, and to guarantee that the robot learns a correct policy.
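One simple way to realize such interjected feedback is to add the observer's signal to the environment reward inside a standard temporal-difference update. The sketch below uses tabular Q-learning for concreteness; the action names, state labels, and learning rates are illustrative assumptions, not the original system's values.

```python
ACTIONS = ["closer", "farther", "turn"]

def td_update(q, state, action, env_reward, human_reward, next_state,
              alpha=0.1, gamma=0.9):
    """One Q-learning step; the observer's feedback is simply added to
    the environment reward before the usual temporal-difference update."""
    reward = env_reward + human_reward
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

q = {}
# The observer presses "+1" as the robot moves toward the target feature
td_update(q, "far_from_box", "closer", env_reward=0.0, human_reward=1.0,
          next_state="near_box")
```

Because the human signal enters as extra reward, it biases learning toward the observer's judgment without replacing the environment's own feedback.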
In summary, the proposed algorithm will accomplish task level learning with minimal human demonstrations. Symbolic features provide generalization across similar environments, and are used to decompose the demonstration into segments. Policy refinement is achieved through interactive reinforcement learning, thereby reducing the convergence time and improving the end result.
Both learning by demonstration and life-long learning generate a policy, or state-action mapping, that the robot should follow. This can be summarized as learning what action to take in each state, where the state comprises various sensor data and internal information, and the actions are commands to effectors and actuators. This is a very robot-centric viewpoint of learning, and it works well in many situations. However, it does not scale well to complex and diverse tasks and environments, where the potential set of states and tasks becomes very large.
By approaching the problem in a slightly different manner, one can overcome these difficulties. Instead of a robot-centric approach, take a feature-relative approach where all the actions a robot can perform are relative to some environmental feature. For example, a robot operating indoors may ``turn a corner", ``cross a doorway", or ``traverse a hallway". A robot in an outdoor environment may ``approach a rock", ``follow a sidewalk", or ``cross a road". All these cases contain an operator and a feature. Interestingly, the range of operators, or actions, for a mobile manipulator is relatively small. All operators move the robot or manipulator closer to, farther from, or at an angle to a feature. The set of operators can therefore be defined to be reasonably small, and can be implemented with simple functions relative to environmental features.
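To make the operator idea concrete, here is a minimal sketch of two such operators in Python, assuming poses are 2-D points; the function names `approach` and `retreat` and the step size are illustrative assumptions, not part of the original work.

```python
import math

def approach(robot_pose, feature_pose, step=0.1):
    """Move the robot a small step toward a feature; the motion is
    expressed relative to the feature, not in world coordinates."""
    rx, ry = robot_pose
    fx, fy = feature_pose
    dist = math.hypot(fx - rx, fy - ry)
    if dist <= step:
        return feature_pose
    t = step / dist
    return (rx + t * (fx - rx), ry + t * (fy - ry))

def retreat(robot_pose, feature_pose, step=0.1):
    """Move the robot a small step directly away from a feature."""
    rx, ry = robot_pose
    fx, fy = feature_pose
    dist = math.hypot(fx - rx, fy - ry)
    t = step / dist if dist else 0.0
    return (rx - t * (fx - rx), ry - t * (fy - ry))

pose = (0.0, 0.0)
pose = approach(pose, (1.0, 0.0), step=0.25)  # 0.25 m toward the feature
```

Because each operator takes the feature's pose as a parameter, the same small set of functions applies unchanged to corners, doorways, rocks, or roads.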
By taking this feature-relative task learning approach, all the low-level motor commands are abstracted away. The robot no longer learns motor commands, but instead learns a few parameters for a function relating the pose of the robot to the pose of a feature. The motor controllers must still be present; however, these can be programmed by a competent roboticist. One can also argue that such controllers should be programmed by a professional who knows the intricacies of the robot and can create a robust, foolproof controller.
Features within the environment must be recognized by the robot. The difficulty is that preprogramming a static set of features will limit the range and complexity of the environments within which the robot can operate. A better approach is to allow the robot to learn which features are relevant. However, this is a non-trivial problem, and one that forms the crux upon which feature-relative task learning rests. The benefit is that learning a complex task policy is reduced to learning features, which can be reused in numerous tasks and environments. This formulation also conveniently merges life-long learning and learning by demonstration into a single framework. During demonstrations the goal of the robot is to recognize features, learn new features, and assign appropriate parameters to the action function.
This work will focus on mobile manipulators as this type of robot is a nice compromise between pure mobile robots and humanoids. Mobile manipulators are becoming very common in a variety of applications. The general platform morphology is relatively easy to use and comfortable for humans to understand and interact with.
The process of teaching the robot will take place through an abstract graphical interface. This decouples the physical robot from the teaching process, thereby making it hardware independent. The instructor will only have access to information that is also available to the robot. This prevents the person from making decisions based on sensory data that the robot cannot detect, and forces the instructor to teach in terms of what the robot can understand. A graphical interface also creates a single unified point of contact for human-robot interaction. It is much easier to create a human-friendly graphical interface than to design all robots to have the same characteristics.
This work will be validated on a mobile manipulator robot in two distinct tasks: search and rescue, and human rehabilitation. Search and rescue involves finding an important object, usually a person, in an unstructured and potentially dangerous environment. While the range of tasks the robot needs to understand is limited, the range of environments is vast. Human rehabilitation requires the robot to learn exercises from a nurse and teach them to a human patient. The robot must also monitor the person's progress and performance. This will take place in constrained, hospital-like environments; however, the range of exercises the robot must know will be large.
"Demonstration-Based Behavior and Task Learning". In Working Notes, AAAI Spring Symposium "To Boldly Go Where No Human-Robot Team Has Gone Before", Stanford, CA, March 2006.
"Behavior-Based Segmentation of Demonstrated Tasks". In International Conference on Development and Learning (ICDL), Bloomington, IN, May 2006.
This project is funded by: