Tripartite structures consisting of a context, an action, and a result are common in several areas of artificial intelligence.
In planning, such structures have been called STRIPS operators
[Russell and Norvig, 1995].
In constructivist learning they have been called schemas
[Chaput et al., 2003],
and in behavior-based systems (BBS) they have been called competence modules [Maes, 1989].
Here we present an algorithm that combines Chaput et al.'s CLASM architecture with Maes' action selection mechanism based on activation spreading.
Our algorithm can do one-shot learning of behaviors that are stateful, i.e., able to handle the hidden state problem, and it has some of the characteristics of traditional planners, e.g., it can optimize behaviors by extending causal links backward in time and avoid redundant actions.
The SOM takes its input from sensor readings and actuator states, normalized between 0 and 1.
As in the CLASM architecture, we build a hierarchy of SOMs by adding SOMs that take as their input the activations of the elements of the underlying SOMs, as illustrated below.
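The stacking described above can be sketched as follows. This is a minimal illustration under assumed details (element counts, a Gaussian activation function, and random untrained weights are all hypothetical, not from the original architecture): the bottom SOM receives the normalized sensor vector, and the SOM above it receives the activations of the bottom SOM's elements as its input.

```python
import numpy as np

def activations(weights, x):
    """Gaussian activation of each SOM element for input x.

    weights: (n_elements, input_dim) array of element weight vectors.
    x: (input_dim,) input vector; returns values in (0, 1].
    """
    d = np.linalg.norm(weights - x, axis=1)
    return np.exp(-d ** 2)

rng = np.random.default_rng(0)
level0 = rng.random((8, 4))   # 8 elements over a 4-d sensor vector
level1 = rng.random((4, 8))   # 4 elements over level-0's 8 activations

sensors = np.array([0.2, 0.9, 0.1, 0.5])   # readings normalized to [0, 1]
a0 = activations(level0, sensors)           # bottom-level element activations
a1 = activations(level1, a0)                # fed upward as the next level's input
```

Each additional level repeats the same step: the activation vector of one SOM becomes the input vector of the SOM above it.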
Instance-based methods have been successful in learning solutions to partially observable Markov decision processes [McCallum, 1996].
Our algorithm is closely related to these methods in that it stores data describing state and action sequences.
Inspired by instance-based learning methods, we have adapted the activation spreading mechanism used in Mataric's work on Toto [Mataric', 1992], originally used to make information about geographical proximity instantly available, to work in a state-action-time space.
The new activation spreading mechanism provides data about previously visited states to the action selection function.
The resulting behavior can thus deal with hidden state problems.
Based on the activation spreading mechanism for action selection, our algorithm
for control and learning uses a three-step cycle:
Maes has presented an action selection mechanism that combines characteristics of traditional planners and reactive systems, producing fast and robust behavior while still allowing prediction and planning to take place.
Maes' action selection mechanism is based on competence modules, tuples (c_i, a_i, d_i, A_i), where c_i is a list of preconditions that must be fulfilled before a competence module can become active, a_i and d_i are lists of the effects of the module's actions in terms of an add list and a delete list, and A_i is the module's activation level.
Activation is spread through a network of competence modules by means of successor links, predecessor links, and conflicter links.
Generally the module with the highest activation is selected and its related actions are executed.
Activation originates in the sensors and, by spreading forward, promotes modules close to the current sensory state.
Activation also originates in goal states and, by spreading backward, promotes modules that change the
input state toward the goal state.
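The selection scheme described above can be sketched in a few lines. This is a hedged illustration, not Maes' implementation: the class and function names are invented, the successor/predecessor/conflicter links are reduced to a single round of forward and backward injection, and the weighting of the two sources is a free parameter.

```python
from dataclasses import dataclass

@dataclass
class CompetenceModule:
    preconditions: frozenset     # c_i: propositions that must hold
    add_list: frozenset          # a_i: propositions the action makes true
    delete_list: frozenset      # d_i: propositions the action makes false
    activation: float = 0.0      # A_i: current activation level

def spread_activation(modules, state, goal, fwd=1.0, bwd=1.0):
    """Inject activation from the current sensory state (forward)
    and from the goal state (backward)."""
    for m in modules:
        # Forward: promote modules whose preconditions match the state.
        m.activation += fwd * len(m.preconditions & state)
        # Backward: promote modules whose effects move toward the goal.
        m.activation += bwd * len(m.add_list & goal)

def select(modules, state):
    """Pick the executable module with the highest activation."""
    ready = [m for m in modules if m.preconditions <= state]
    return max(ready, key=lambda m: m.activation, default=None)

# Hypothetical two-step task: open a door, then go inside.
open_door = CompetenceModule(frozenset({"at_door"}), frozenset({"door_open"}), frozenset())
go_inside = CompetenceModule(frozenset({"door_open"}), frozenset({"inside"}), frozenset())
modules = [open_door, go_inside]
spread_activation(modules, state={"at_door"}, goal={"inside"})
chosen = select(modules, {"at_door"})
```

With the state {"at_door"} only open_door is executable, so it is selected even though go_inside also receives backward activation from the goal.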
The SOM elements related to a specific action embody the same information as Maes' competence modules.
By spreading activation forward and backward through the SOM hierarchy, the action selection characteristics of Maes' mechanism can be reproduced in learned behaviors.
Hierarchical Self-Organizing Maps for Stateful Robot Behaviors
We must remember that in nature there are neither rewards nor punishments; there are consequences. [Robert G. Ingersoll, Some Reasons Why]
Research Objectives
Behavior-based systems (BBS)
[Arkin, 1998]
arrange robot controllers into a collection of task-achieving modules or behaviors that, when properly designed, can
produce robust, repeatable and reliable overall behavior for a robot.
BBS have proved successful and have become popular both in research and in commercial applications.
There are, however, recognized limitations to the general applicability of BBS, in particular regarding the automated generation and reuse of behaviors, and the use of behaviors as symbolic operators in planning [Nicolescu and Mataric', 2002].
Hierarchical SOMs for Robot Control
The fundamental data-structure underlying our control algorithm is a growing self-organizing map (SOM).
Inspired by the CLASM architecture, our architecture uses a SOM with an input vector that is divided into two distinct sub-vectors, T1 and T2, as shown in the image below.
The sub-vectors describe the activations of the inputs at two different points in time, thus creating a temporal association between two input states.
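The split input vector can be sketched as follows. This is a minimal assumption-laden sketch, not the paper's implementation: the class name, element count, distance metric, and learning rate are all illustrative, and the growing aspect of the SOM is omitted. Each element's weight vector concatenates T1 and T2 sub-vectors, so the best-matching element encodes a transition between two consecutive input states.

```python
import numpy as np

class TemporalSOM:
    def __init__(self, n_elements, input_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Each element stores concatenated [T1 | T2] weights in [0, 1].
        self.weights = rng.random((n_elements, 2 * input_dim))
        self.input_dim = input_dim

    def best_match(self, input_t1, input_t2):
        """Index of the element closest to the (T1, T2) input pair."""
        x = np.concatenate([input_t1, input_t2])
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def train_step(self, input_t1, input_t2, lr=0.1):
        """Move the winning element toward the observed transition."""
        x = np.concatenate([input_t1, input_t2])
        i = self.best_match(input_t1, input_t2)
        self.weights[i] += lr * (x - self.weights[i])
        return i
```

Repeatedly presenting the same (T1, T2) pair pulls one element's weights onto that transition, giving it a stored temporal association between the two input states.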
The dominant SOM element in each level spreads activation downward to the underlying SOM elements.
The amount of activation diffused to each element is proportional to the relevant weight in the T2 sub-vector of the dominant SOM element's connection weights.
The image below illustrates how activation spreading in a hierarchy through convergence and diffusion can lead to stateful behaviors by letting a state observed at time T1 influence the activation values of the bottom-level SOMs over the following times, T2, T3, and T4.
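The downward diffusion step can be sketched as below. This is an illustrative assumption about the mechanism's details (the normalization of the shares, in particular, is a guess): the dominant element at the upper level distributes its activation to the lower-level elements in proportion to the weights of its T2 sub-vector.

```python
import numpy as np

def diffuse_down(upper_weights, upper_activation, n_lower):
    """Spread the dominant upper element's activation to the level below.

    upper_weights: (n_upper, 2 * n_lower) array of [T1 | T2] weights,
    where the T2 half refers to the lower level's element activations.
    upper_activation: (n_upper,) activations of the upper-level elements.
    Returns a per-lower-element activation boost.
    """
    dominant = int(np.argmax(upper_activation))
    t2 = upper_weights[dominant, n_lower:]   # dominant element's T2 sub-vector
    share = t2 / t2.sum()                    # proportional split of activation
    return upper_activation[dominant] * share
```

Applied level by level, this lets a state observed at T1 keep biasing the bottom-level activations over the following time steps.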
Bibliography
by Torbjørn S Dahl.