Introduction
The focus of this work is on modeling the interaction dynamics of mobile robotic systems, i.e., modeling properties of the situated system that arise or evolve during execution as a result of the interaction between the robot(s) and the environment. In particular, we explore on-line modeling (i.e., modeling occurring within the lifetime of a task) in the stochastic and uncertain domain of physically embodied single-robot and multi-robot systems.
Our approach employs Augmented Markov Models (AMMs), a technique we have developed based on semi-Markov chains, but having additional statistics associated with links and nodes. We have developed an algorithm that constructs AMMs with little computational overhead, making it feasible for real-time applications (i.e., within the lifetime of a task). We have demonstrated AMMs in a number of robotics applications: fault detection, affiliation determination, hierarchy restructuring, regime detection, and reward maximization. The AMM-based evaluations used in these applications include statistical hypothesis tests and expectation calculations from Markov chain theory. Each of the applications is experimentally verified using up to four physical mobile robots performing elements of a foraging (collection) task.
The encompassing theme in this work is to provide pragmatic and theoretically sound, general-purpose tools for improving the performance of mobile robotic systems.
Goals
The goals of this project are:
- To model on-line the interaction dynamics between a real-world robot and its environment.
- To use the models in exploring applications not currently addressed in the literature.
Overview of Approach
- Each robot creates an Augmented Markov Model (AMM) during the execution of a task.
- The model states represent behaviors that the robot performs.
- The models capture the dynamics of controller execution in the robot's environment.
Augmented Markov Models (AMMs)
Augmented Markov Models are essentially probabilistic finite state machines augmented with statistics on link and node usage. In particular, an AMM is a subclass of semi-Markov chain. Our model construction algorithm is also able to capture the hidden state associated with second (and higher) order transitions. Below we describe the implementation of AMMs and the model construction algorithm. An AMM is a 5-tuple < S,A,B,L,T >
- S, a set of symbols
{s1, s2, . . . , sM} recognized by the network.
- A, a set of states
(or nodes) {a1, a2, . . . , aN}, each having
four attributes:
- ais,
the symbol recognized by the state.
- aim,
average time steps spent in the state.
- ais2,
variance associated with aim.
- aip,
probability of remaining in ai in the next time step.
- B, an N by M transition
matrix, where bi,k is the next state given ai and sk.
- L, a set of directed
links {l1, l2, . . . , lP}, having six attributes:
- lif,
state from which the link begins.
- lit,
state to which the link connects.
- lid,
number of times the link has been traversed.
- liS,
total time spent in lit, after first traversing
the link.
- liS2,
sum of squares of durations comprising liS.
- lip,
probability of using the link, given the system is in state lif.
- T, a set of elements
{t1, t2, . . . , tQ}, each storing data on
a two-link transition entering and leaving a state, and having five attributes:
- tie,
the link entering the state.
- til,
the link leaving the state.
- tid,
the number of times ti has been traversed.
- tiS,
total time steps spent in the state that til connects
to, after first traversing ti. tiS2,
sum of squares of durations comprising tiS.
Characteristics of AMMs:
- Parsimonious.
- Computationally inexpensive to generate.
- Incrementally generated, allowing anytime use.
- Statistically-based node splitting reduces inconsistencies between training data and model.
Model use with mobile robots:
At each time step, the input
symbol to the AMM uniquely represents a particular subset of the robot's active
behaviors.
AMM Construction Algorithm
A stream of symbols belonging
to S provides the data for model construction. Let sym be the
current input symbol, oldsym the previous input symbol, and ac
the current state. For each input symbol do:
if sym == oldsym then
remain in ac and update aip and the appropriate lS, lp, and tS
else
if sym is in S then
S = S unioned with {sym}
create a new state ai with ais = sym
add the new transition to B
endif
create a link from ac to the new state if necessary
create the two-link transition in T if necessary
update acm, acs2, and ld, lS, lS2, lp, td, tS, tS2 as appropriate
do node-splitting if necessary
transition to the new state
endif
Node-splitting:
For the current state, calculate
the binomial mean and variance for transitions in T with the same in-link.
If this mean is an outlier, then split the current state and attach the current
in-link (with its associated out-links, as indicated in T) to the new state.
Using T, make all probabilities consistent. Update T.
AMM Applications:
Fault Detection:
- Focus on faults that keep the robot in one AMM state: sensor failures, actuator failures, robot becoming physically impeded.
- A statistical confidence interval for the mean time in a state signals outliers as potential faults.
Group Affiliation:
- In order to coordinate activity, robots may need to determine whether they are behaviorally identical.
- Isomorphic AMMs indicate that the robots perform identical tasks.
- Similar AMM transition probabilities indicate comparable experiences in performing the task.
Dynamic Leader Selection:
- Dynamically reorganize a group hierarchy to make the best performing robot the leader.
- Important when the capabilities of the robots are not known or may change due, for example, to failures.
- Performance data is extracted from AMMs.
Regime Detection:
- Detect when there is a significant shift in the global environment that may necessitate a change in the robot's behavior.
- Challenges: robot has no a priori data about the environment and is limited to local sensing.
Reward Maximization:
- The robot must perform optimally in a stochastic, non-stationary environment.
- Challenges: robot must estimate environmental state with no a priori data, and with only limited local sensing. Additionally, the non-stationarity of the environment may range from an abrupt change to a gradual shift.
Experimental Task: Foraging
- Basic task: Robots search for pucks and bring them Home while avoiding obstacles and other robots.
- Dynamic leader selection: Robots are organized in a hierarchy with the most dominant being allowed to deliver its puck first.
- Regime detection: Robots first collect one color of puck then switch to another color when they determine the first is depleted.
- Reward Maximization: There are two colors of pucks, each with different reward values. When the robot finds a puck, it must determine if collecting it gives more reward or leaving it in search of a different color.
Relevant Publications
-
Dani Goldberg and Maja J Mataric', (2000) "Maximizing Reward in a Non-Stationary Mobile Robot Environment," Submitted to Autonomous Agents and Multi-Agent Systems, special issue on the best of Agents 2000 (The Fourth International Conference on Autonomous Agents, Barcelona, Spain, June, 2000). [PDF]
-
Dani Goldberg and Maja J Mataric', (2000) "Learning Multiple Models for Reward Maximization," Proceedings, The Seventeenth International Conference on Machine Learning (ICML-2000), Stanford University, June 29-July 2. [PDF]
-
Dani Goldberg and Maja J Mataric', (2000) "Detecting Regime Changes with a Mobile Robot using Multiple Models," USC Institute for Robotics and Intelligent Systems Technical Report IRIS-00-382. [PDF]
-
Dani Goldberg and Maja J Mataric', (2000) "Reward Maximization in a Non-Stationary Mobile Robot Environment," Proceedings, The Fourth International Conference on Autonomous Agents (Agents 2000), Barcelona, Spain, June 3-7. [PDF]
-
Dani Goldberg and Maja J Mataric', (1999) "Mobile Robot Group Coordination Using a Model of Interaction Dynamics," Proceedings of the SPIE: Sensor Fusion and Decentralized Control in Robotic Systems II, Boston, Massachusetts, pp. 63-73.
-
Dani Goldberg and Maja J Mataric', (1999) "Augmented Markov Models," USC Institute for Robotics and Intelligent Systems Technical Report IRIS-99-367. [PDF]
-
Dani Goldberg and Maja J Mataric', (1999) "Coordinating Mobile Robot Group Behavior Using a Model of Interaction Dynamics," Proceedings, The Third International Conference on Autonomous Agents (Agents '99), Seattle, Washington, May 1-5. [PDF]