Tripartite structures consisting of a context, an action, and a result are common in several areas of artificial intelligence.
In planning, such structures have been called STRIPS operators
[Russell and Norvig, 1995].
In constructivist learning they have been called schemas
[Chaput et al., 2003],
and in behavior-based systems (BBS) they have been called competence modules [Maes, 1989].
Here we present an algorithm that combines Chaput et al.'s CLASM architecture with Maes' action selection mechanism based on activation spreading.
Our algorithm can do one-shot learning of behaviors that are stateful, i.e., able to handle the hidden state problem, and it has some of the characteristics of traditional planners, e.g., it can optimize behaviors by extending causal links backward in time and avoid redundant actions.
The SOM takes its input from sensor readings and actuator states, normalized between 0 and 1.
As in the CLASM architecture, we build a hierarchy of SOMs by adding SOMs that take as their input the activations of the elements of the underlying SOMs, as illustrated below.
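The stacking described above can be sketched as follows. This is a minimal illustration under assumed details (element counts, a Gaussian activation function, and random untrained weights are all hypothetical, not from the original architecture): the bottom SOM receives the normalized sensor vector, and the SOM above it receives the activations of the bottom SOM's elements as its input.

```python
import numpy as np

def activations(weights, x):
    """Gaussian activation of each SOM element for input x.

    weights: (n_elements, input_dim) array of element weight vectors.
    x: (input_dim,) input vector; returns values in (0, 1].
    """
    d = np.linalg.norm(weights - x, axis=1)
    return np.exp(-d ** 2)

rng = np.random.default_rng(0)
level0 = rng.random((8, 4))   # 8 elements over a 4-d sensor vector
level1 = rng.random((4, 8))   # 4 elements over level-0's 8 activations

sensors = np.array([0.2, 0.9, 0.1, 0.5])   # readings normalized to [0, 1]
a0 = activations(level0, sensors)           # bottom-level element activations
a1 = activations(level1, a0)                # fed upward as the next level's input
```

Each additional level repeats the same step: the activation vector of one SOM becomes the input vector of the SOM above it.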
Instance-based methods have been successful in learning solutions to partially observable Markov decision processes [McCallum, 1996].
Our algorithm is closely related to these methods in that it stores data describing state and action sequences.
Inspired by instance-based learning methods, we have adapted the activation spreading mechanism used in Mataric's work on Toto [Mataric', 1992], originally used to make information about geographical proximity instantly available, to work in a state-action-time space.
The new activation spreading mechanism provides data about previously visited states to the action selection function.
The resulting behavior can thus deal with hidden state problems.
Based on the activation spreading mechanism for action selection, our algorithm
for control and learning uses a three-step cycle:
Maes has presented an action selection mechanism that combines characteristics of traditional planners and reactive systems, producing fast and robust behavior while still allowing prediction and planning to take place.
Maes' action selection mechanism is based on competence modules, tuples (c_i, a_i, d_i, A_i), where c_i is a list of preconditions that must be fulfilled before a competence module can become active, a_i and d_i are lists of the effects of the module's actions in terms of an add list and a delete list, and A_i is the module's activation level.
Activation is spread through a network of competence modules by means of successor links, predecessor links, and conflicter links.
Generally the module with the highest activation is selected and its related actions are executed.
Activation originates in the sensors and, by spreading forward, promotes modules close to the current sensory state.
Activation also originates in goal states and, by spreading backward, promotes modules that change the
input state toward the goal state.
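The selection scheme described above can be sketched in a few lines. This is a hedged illustration, not Maes' implementation: the class and function names are invented, the successor/predecessor/conflicter links are reduced to a single round of forward and backward injection, and the weighting of the two sources is a free parameter.

```python
from dataclasses import dataclass

@dataclass
class CompetenceModule:
    preconditions: frozenset     # c_i: propositions that must hold
    add_list: frozenset          # a_i: propositions the action makes true
    delete_list: frozenset      # d_i: propositions the action makes false
    activation: float = 0.0      # A_i: current activation level

def spread_activation(modules, state, goal, fwd=1.0, bwd=1.0):
    """Inject activation from the current sensory state (forward)
    and from the goal state (backward)."""
    for m in modules:
        # Forward: promote modules whose preconditions match the state.
        m.activation += fwd * len(m.preconditions & state)
        # Backward: promote modules whose effects move toward the goal.
        m.activation += bwd * len(m.add_list & goal)

def select(modules, state):
    """Pick the executable module with the highest activation."""
    ready = [m for m in modules if m.preconditions <= state]
    return max(ready, key=lambda m: m.activation, default=None)

# Hypothetical two-step task: open a door, then go inside.
open_door = CompetenceModule(frozenset({"at_door"}), frozenset({"door_open"}), frozenset())
go_inside = CompetenceModule(frozenset({"door_open"}), frozenset({"inside"}), frozenset())
modules = [open_door, go_inside]
spread_activation(modules, state={"at_door"}, goal={"inside"})
chosen = select(modules, {"at_door"})
```

With the state {"at_door"} only open_door is executable, so it is selected even though go_inside also receives backward activation from the goal.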
The SOM elements related to a specific action embody the same information as Maes' competence modules.
By spreading activation forward and backward through the SOM hierarchy, the action selection characteristics of Maes' mechanism can be reproduced in learned behaviors.
Hierarchical Self-Organizing Maps for Stateful Robot Behaviors
We must remember that in nature there are neither rewards nor punishments; there are consequences. [Robert G. Ingersoll, Some Reasons Why]
Research Objectives
Behavior-based systems (BBS)
[Arkin, 1998]
arrange robot controllers into a collection of task-achieving modules or behaviors that, when properly designed, can
produce robust, repeatable and reliable overall behavior for a robot.
BBS have proved successful and have become popular both in research and in commercial applications.
There are, however, recognized limitations to the general applicability of BBS, in particular regarding the automated generation and reuse of behaviors, and the use of behaviors as symbolic operators in planning [Nicolescu and Mataric', 2002].
Hierarchical SOMs for Robot Control
The fundamental data-structure underlying our control algorithm is a growing self-organizing map (SOM).
Inspired by the CLASM architecture, our architecture uses a SOM with an input vector that is divided into two distinct sub-vectors, T1 and T2, as shown in the image below.
The sub-vectors describe the activations of the inputs at two different points in time, thus creating a temporal association between two input states.
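The split input vector can be sketched as follows. This is a minimal assumption-laden sketch, not the paper's implementation: the class name, element count, distance metric, and learning rate are all illustrative, and the growing aspect of the SOM is omitted. Each element's weight vector concatenates T1 and T2 sub-vectors, so the best-matching element encodes a transition between two consecutive input states.

```python
import numpy as np

class TemporalSOM:
    def __init__(self, n_elements, input_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Each element stores concatenated [T1 | T2] weights in [0, 1].
        self.weights = rng.random((n_elements, 2 * input_dim))
        self.input_dim = input_dim

    def best_match(self, input_t1, input_t2):
        """Index of the element closest to the (T1, T2) input pair."""
        x = np.concatenate([input_t1, input_t2])
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def train_step(self, input_t1, input_t2, lr=0.1):
        """Move the winning element toward the observed transition."""
        x = np.concatenate([input_t1, input_t2])
        i = self.best_match(input_t1, input_t2)
        self.weights[i] += lr * (x - self.weights[i])
        return i
```

Repeatedly presenting the same (T1, T2) pair pulls one element's weights onto that transition, giving it a stored temporal association between the two input states.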
The dominant SOM element in each level spreads activation downward to the underlying SOM elements.
The amount of activation diffused to each element is proportional to the relevant weight in the T2 sub-vector of the dominant SOM element's connection weights.
The image below illustrates how activation spreading in a hierarchy through convergence and diffusion can lead to stateful behaviors by letting a state observed at time T1 influence the activation values of the bottom-level SOMs over the following times, T2, T3, and T4.
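The downward diffusion step can be sketched as below. This is an illustrative assumption about the mechanism's details (the normalization of the shares, in particular, is a guess): the dominant element at the upper level distributes its activation to the lower-level elements in proportion to the weights of its T2 sub-vector.

```python
import numpy as np

def diffuse_down(upper_weights, upper_activation, n_lower):
    """Spread the dominant upper element's activation to the level below.

    upper_weights: (n_upper, 2 * n_lower) array of [T1 | T2] weights,
    where the T2 half refers to the lower level's element activations.
    upper_activation: (n_upper,) activations of the upper-level elements.
    Returns a per-lower-element activation boost.
    """
    dominant = int(np.argmax(upper_activation))
    t2 = upper_weights[dominant, n_lower:]   # dominant element's T2 sub-vector
    share = t2 / t2.sum()                    # proportional split of activation
    return upper_activation[dominant] * share
```

Applied level by level, this lets a state observed at T1 keep biasing the bottom-level activations over the following time steps.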
Bibliography
by Torbjørn S Dahl.