Introduction
The focus of this work is on modeling the interaction dynamics of mobile robotic systems, i.e., modeling properties of the situated system that arise or evolve during execution as a result of the interaction between the robot(s) and the environment. In particular, we explore online modeling (i.e., modeling occurring within the lifetime of a task) in the stochastic and uncertain domain of physically embodied singlerobot and multirobot systems.
Our approach employs Augmented Markov Models (AMMs), a technique we have developed based on semiMarkov chains, but having additional statistics associated with links and nodes. We have developed an algorithm that constructs AMMs with little computational overhead, making it feasible for realtime applications (i.e., within the lifetime of a task). We have demonstrated AMMs in a number of robotics applications: fault detection, affiliation determination, hierarchy restructuring, regime detection, and reward maximization. The AMMbased evaluations used in these applications include statistical hypothesis tests and expectation calculations from Markov chain theory. Each of the applications is experimentally verified using up to four physical mobile robots performing elements of a foraging (collection) task.
The encompassing theme in this work is to provide pragmatic and theoretically sound, generalpurpose tools for improving the performance of mobile robotic systems.
Goals
The goals of this project are:
 To model online the interaction dynamics between a realworld robot and its environment.
 To use the models in exploring applications not currently addressed in the literature.
Overview of Approach
 Each robot creates an Augmented Markov Model (AMM) during the execution of a task.
 The model states represent behaviors that the robot performs.
 The models capture the dynamics of controller execution in the robot's environment.
Augmented Markov Models (AMMs)
Augmented Markov Models are essentially probabilistic finite state machines augmented with statistics on link and node usage. In particular, an AMM is a subclass of semiMarkov chain. Our model construction algorithm is also able to capture the hidden state associated with second (and higher) order transitions. Below we describe the implementation of AMMs and the model construction algorithm. An AMM is a 5tuple < S,A,B,L,T >
 S, a set of symbols
{s_{1}, s_{2}, . . . , s_{M}} recognized by the network.
 A, a set of states
(or nodes) {a_{1}, a_{2}, . . . , a_{N}}, each having
four attributes:
 a_{i}^{s},
the symbol recognized by the state.
 a_{i}^{m},
average time steps spent in the state.
 a_{i}^{s2},
variance associated with a_{i}^{m}.
 a_{i}^{p},
probability of remaining in a_{i} in the next time step.
 B, an N by M transition
matrix, where b_{i,k} is the next state given a_{i} and s_{k}.
 L, a set of directed
links {l_{1}, l_{2}, . . . , l_{P}}, having six attributes:
 l_{i}^{f},
state from which the link begins.
 l_{i}^{t},
state to which the link connects.
 l_{i}^{d},
number of times the link has been traversed.
 l_{i}^{S},
total time spent in l_{i}^{t}, after first traversing
the link.
 l_{i}^{S2},
sum of squares of durations comprising l_{i}^{S}.
 l_{i}^{p},
probability of using the link, given the system is in state l_{i}^{f}.
 T, a set of elements
{t_{1}, t_{2}, . . . , t_{Q}}, each storing data on
a twolink transition entering and leaving a state, and having five attributes:
 t_{i}^{e},
the link entering the state.
 t_{i}^{l},
the link leaving the state.
 t_{i}^{d},
the number of times t_{i} has been traversed.
 t_{i}^{S},
total time steps spent in the state that t_{i}^{l} connects
to, after first traversing t_{i}. t_{i}^{S2},
sum of squares of durations comprising t_{i}^{S}.
Characteristics of AMMs:
 Parsimonious.
 Computationally inexpensive to generate.
 Incrementally generated, allowing anytime use.
 Statisticallybased node splitting reduces inconsistencies between training data and model.
Model use with mobile robots:
At each time step, the input
symbol to the AMM uniquely represents a particular subset of the robot's active
behaviors.
AMM Construction Algorithm
A stream of symbols belonging
to S provides the data for model construction. Let sym be the
current input symbol, oldsym the previous input symbol, and a_{c}
the current state. For each input symbol do:
if sym == oldsym then
remain in a_{c} and update a_{i}^{p} and the appropriate l^{S}, l^{p}, and t^{S}
else
if sym is in S then
S = S unioned with {sym}
create a new state a_{i} with a_{i}^{s} = sym
add the new transition to B
endif
create a link from a_{c} to the new state if necessary
create the twolink transition in T if necessary
update a_{c}^{m}, a_{c}^{s2}, and l^{d}, l^{S}, l^{S2}, l^{p}, t^{d}, t^{S}, t^{S2} as appropriate
do nodesplitting if necessary
transition to the new state
endif
Nodesplitting:
For the current state, calculate
the binomial mean and variance for transitions in T with the same inlink.
If this mean is an outlier, then split the current state and attach the current
inlink (with its associated outlinks, as indicated in T) to the new state.
Using T, make all probabilities consistent. Update T.
AMM Applications:
Fault Detection:
 Focus on faults that keep the robot in one AMM state: sensor failures, actuator failures, robot becoming physically impeded.
 A statistical confidence interval for the mean time in a state signals outliers as potential faults.
Group Affiliation:
 In order to coordinate activity, robots may need to determine whether they are behaviorally identical.
 Isomorphic AMMs indicate that the robots perform identical tasks.
 Similar AMM transition probabilities indicate comparable experiences in performing the task.
Dynamic Leader Selection:
 Dynamically reorganize a group hierarchy to make the best performing robot the leader.
 Important when the capabilities of the robots are not known or may change due, for example, to failures.
 Performance data is extracted from AMMs.
Regime Detection:
 Detect when there is a significant shift in the global environment that may necessitate a change in the robot's behavior.
 Challenges: robot has no a priori data about the environment and is limited to local sensing.
Reward Maximization:
 The robot must perform optimally in a stochastic, nonstationary environment.
 Challenges: robot must estimate environmental state with no a priori data, and with only limited local sensing. Additionally, the nonstationarity of the environment may range from an abrupt change to a gradual shift.
Experimental Task: Foraging
 Basic task: Robots search for pucks and bring them Home while avoiding obstacles and other robots.
 Dynamic leader selection: Robots are organized in a hierarchy with the most dominant being allowed to deliver its puck first.
 Regime detection: Robots first collect one color of puck then switch to another color when they determine the first is depleted.
 Reward Maximization: There are two colors of pucks, each with different reward values. When the robot finds a puck, it must determine if collecting it gives more reward or leaving it in search of a different color.
Relevant Publications

Dani Goldberg and Maja J Mataric', (2000) "Maximizing Reward in a NonStationary Mobile Robot Environment," Submitted to Autonomous Agents and MultiAgent Systems, special issue on the best of Agents 2000 (The Fourth International Conference on Autonomous Agents, Barcelona, Spain, June, 2000). [PDF]

Dani Goldberg and Maja J Mataric', (2000) "Learning Multiple Models for Reward Maximization," Proceedings, The Seventeenth International Conference on Machine Learning (ICML2000), Stanford University, June 29July 2. [PDF]

Dani Goldberg and Maja J Mataric', (2000) "Detecting Regime Changes with a Mobile Robot using Multiple Models," USC Institute for Robotics and Intelligent Systems Technical Report IRIS00382. [PDF]

Dani Goldberg and Maja J Mataric', (2000) "Reward Maximization in a NonStationary Mobile Robot Environment," Proceedings, The Fourth International Conference on Autonomous Agents (Agents 2000), Barcelona, Spain, June 37. [PDF]

Dani Goldberg and Maja J Mataric', (1999) "Mobile Robot Group Coordination Using a Model of Interaction Dynamics," Proceedings of the SPIE: Sensor Fusion and Decentralized Control in Robotic Systems II, Boston, Massachusetts, pp. 6373.

Dani Goldberg and Maja J Mataric', (1999) "Augmented Markov Models," USC Institute for Robotics and Intelligent Systems Technical Report IRIS99367. [PDF]

Dani Goldberg and Maja J Mataric', (1999) "Coordinating Mobile Robot Group Behavior Using a Model of Interaction Dynamics," Proceedings, The Third International Conference on Autonomous Agents (Agents '99), Seattle, Washington, May 15. [PDF]