Socially Assistive Robotics (SAR) is an area of Human-Robot Interaction (HRI) that focuses on aiding users through social rather than physical interaction between the robot and the human user. SAR has the potential to enhance the quality of life for large user populations, including the elderly, people with physical impairments, and those involved in rehabilitation therapy. The world's population is growing older, introducing a wide array of societal challenges: it is estimated that in 2050 there will be three times more people over the age of 85 than there are today, many of whom are expected to need physical and cognitive assistance.
The demand for such assistance is quickly outpacing the supply of available and affordable human care. As the elderly population continues to grow, a great deal of attention and research will be dedicated to assistive systems that allow the elderly to live independently in their own homes. As such, the main purpose of socially assistive robotics technology is not to replace human care-givers, but rather to provide assistance where human assistance is not available or affordable.
Our focus is on designing and studying the effectiveness of socially assistive robotics technology towards providing affordable customized care for individuals in need of assistance, with particular emphasis on how the user’s intrinsic motivation can be influenced by a socially assistive robot in order to maximize the probability of success of the therapeutic intervention.
This project is an experimental implementation of a socially assistive robot whose purpose is to motivate users to exercise by engaging in a simple seated arm exercise scenario. The overall goal of the pilot study is to evaluate and validate the effectiveness of the probabilistic models of user activity in order to gain insight for future, long-term studies involving the intended user populations in need of such customized assistance, such as older adults and/or stroke patients.
In this project, we extended our previous work with the elderly performing "chair exercises" guided by a socially assistive robot. The exercise scenario utilized a socially assistive robot to instruct, evaluate, and encourage users to perform simple arm gesture exercises. The scenario was one-on-one, allowing the robot to focus its attention on the single user in order to provide timely, accurate feedback, and to maximize the effectiveness of the exercise session for the user. In the setup, the user was seated in a chair in front of the robot; the user and robot faced each other. The developed activity monitoring system affords the robot the ability to track the user's arm movements; the use of the Kinect sensor (as opposed to a monocular camera) extends our previous work in that arm motion is no longer restricted to the plane at the sides of the body (i.e., poses may be non-planar).
During exercise sessions, the robot asked the user to perform simple seated arm gesture exercises. This type of seated exercise, called "chair exercise" or "chair aerobics", is commonly practiced in senior living facilities and provides grounding for our exercise system. Chair exercises are highly regarded for their accessibility to those with low mobility, for their safety as they reduce the possibility of injury due to falling from improper balance, and for their health benefits such as improved flexibility, muscle strength, ability to perform everyday tasks, and even memory recall.
The robot monitored the user in a workout game designed to help keep the user engaged and motivated throughout the exercise sessions. The robot filled the role of a traditional exercise instructor by showing the user which arm gesture exercises to perform and asking the user to imitate them. The robot gave the user feedback in real time, providing corrections when appropriate (e.g., "Raise your left arm and lower your right arm" or "Bend your left forearm inwards a little"), and praise in response to each successful imitation (e.g., "Great job!" or "Now you've got the hang of it!").
The user was able to communicate with the robot through a wireless button control interface -- the popular Wiimote remote control, which communicates via Bluetooth -- with the button labels modified to suit our exercise system. Two buttons are available for the user to respond to prompts from the robot, labeled "Yes" and "No", and one button allows the user to request a rest break at any time during the interaction.
Our previous work described above utilized a monocular camera and 2-D body segmentation to recognize exercise poses; however, this 2-D approach limited these poses to the plane of a backdrop behind the user. To extend this for future investigations, we used the Microsoft Kinect to perform full 3-D body pose estimation. With this added functionality, we extended our library of desired exercise poses to include those in front of the body (anterior), increasing the number of exercises from 25 (in our previous work) to 100 (in our current work).
Each of the 100 selected exercise poses was performed by a healthy young adult volunteer. We recorded the camera image (from the Kinect) of the volunteer performing each exercise. These recorded poses were then displayed by the robot for the user to match. We asked five healthy young adult participants to match each exercise pose displayed by the robot. Visual (camera image) and kinematic (estimated body pose) data from the Kinect were recorded for each participant throughout the data collection. Each participant would first assume an idle pose (hands at side), then match the displayed pose. Kinematic data from the matched pose served as a "positive" (correct) sample of the displayed pose (30 Kinect frames were used per sample); all kinematic data in between poses served as "negative" (incorrect) samples of the displayed pose. This approach was taken to capture participants' perception of the pose (as opposed to the pose as instructed by the researchers), since this is how the elderly participants in the exercise scenario would experience the interaction.
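The segmentation of recorded frames into positive and negative samples can be sketched as follows. This is an illustrative reconstruction, not our actual data pipeline; the frame and annotation structures (`frames`, `pose_windows`) are assumptions:

```python
# Hypothetical sketch: split a recorded stream of Kinect frames into
# positive samples (30-frame windows at each matched pose) and negative
# samples (all frames in between poses).
def split_samples(frames, pose_windows, window=30):
    """frames: list of per-frame kinematic vectors.
    pose_windows: list of (pose_id, start_index) pairs marking where each
    matched pose begins; the next `window` frames are positives for it.
    Returns (positives, negatives), where positives maps pose_id -> frames.
    """
    positives = {}
    covered = set()
    for pose_id, start in pose_windows:
        idx = range(start, start + window)
        positives.setdefault(pose_id, []).extend(frames[i] for i in idx)
        covered.update(idx)
    # Everything outside the matched-pose windows counts as negative data.
    negatives = [f for i, f in enumerate(frames) if i not in covered]
    return positives, negatives
```

With a 100-frame recording and two annotated poses, 60 frames become positives and the remaining 40 become negatives.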
Labeled (positive) data were then used to train a probabilistic binary classifier for each pose. In this implementation, each exercise pose was represented by a vector (of size 12) containing 3-D Gaussian (normal) distributions over the positions (x, y, z) of four of the user's joints: (1) left hand, (2) left elbow, (3) right hand, and (4) right elbow. These positions were scaled and normalized based on limb (arm) length, and were expressed relative to the user's neck position; the orientation (roll, pitch, and yaw) of the neck frame was set to that of the "world" frame to normalize for user posture. The robot could then select an exercise pose to perform, and compare the fit of the user's upper-body kinematic state to the model of that pose. If the user's body pose was within a parameterized number (for our purposes, 2) of standard deviations of the mean for each joint, the classifier would respond with a "positive" match; if the user's body pose was outside this number of standard deviations of the mean for at least one joint, the classifier would respond with a "negative" (non-)match. In the case of a negative match, the robot could then refer to the difference in poses, and could provide spoken feedback (as in our previous work) based on whichever joints were in error.
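The classifier described above can be sketched in a few lines. This is a minimal reconstruction under stated assumptions: joint names, the data layout (12-element vectors, already neck-relative and arm-length-normalized), and the per-coordinate independence of the Gaussians are illustrative choices, not our exact implementation:

```python
# Hypothetical sketch of the per-pose Gaussian match classifier.
from statistics import mean, stdev

# 4 joints x (x, y, z) = 12 coordinates per pose vector.
JOINTS = ["l_hand", "l_elbow", "r_hand", "r_elbow"]

def fit_pose_model(samples):
    """Fit an independent Gaussian to each of the 12 coordinates.

    samples: list of 12-element vectors (neck-relative, normalized).
    Returns parallel lists of per-coordinate means and standard deviations.
    """
    dims = list(zip(*samples))  # transpose into 12 lists of values
    mus = [mean(d) for d in dims]
    sigmas = [stdev(d) for d in dims]
    return mus, sigmas

def matches(pose, mus, sigmas, n_std=2.0):
    """Positive match iff every coordinate lies within n_std deviations."""
    return all(abs(x - mu) <= n_std * sigma
               for x, mu, sigma in zip(pose, mus, sigmas))

def joints_in_error(pose, mus, sigmas, n_std=2.0):
    """Names of joints with an out-of-range coordinate, which the robot
    can use to phrase corrective spoken feedback."""
    bad = set()
    for i, (x, mu, sigma) in enumerate(zip(pose, mus, sigmas)):
        if abs(x - mu) > n_std * sigma:
            bad.add(JOINTS[i // 3])
    return sorted(bad)
```

A negative match thus comes with the offending joint names for free, which is what allows feedback such as "Raise your left arm".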
To validate our model of 3-D body pose estimation and matching, we measured the performance of our probabilistic binary classifier of user pose match using the metrics of sensitivity and specificity. The higher the sensitivity and specificity, the fewer poses are misclassified.
To do this, we performed leave-one-out cross-validation over the aforementioned collected data -- that is, we trained our model using N - 1 samples of labeled training data, and then used the trained model to classify the N-th sample; we did this iteratively for each sample in the collected data (a computationally expensive process) to generate a confusion matrix containing the counts of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN).
We collected positive samples for 100 specified poses (each sample consisting of 30 Kinect frames) from five different participants (for a total of 700 positive samples, or 21,000 sample frames); Kinect frames in between the specified poses served as negative samples.
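The leave-one-out procedure can be sketched generically as below. The `fit`/`classify` interface is an assumption for illustration (with `fit` training on positive samples only, as in the Gaussian model, and `classify` returning a binary match); it is not our exact code:

```python
# Hypothetical sketch of leave-one-out cross-validation producing
# confusion-matrix counts for a binary pose classifier.
def leave_one_out_confusion(samples, labels, fit, classify):
    """samples/labels: parallel lists; each label is True (positive for
    the pose) or False. For each held-out sample, the model is re-trained
    on the remaining positive samples and then used to classify it.
    Returns a dict of TP, FN, FP, TN counts.
    """
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Train on all positives except the held-out sample.
        train_pos = [s for j, (s, lab) in enumerate(zip(samples, labels))
                     if j != i and lab]
        model = fit(train_pos)
        pred = classify(x, model)
        if y and pred:
            counts["TP"] += 1
        elif y:
            counts["FN"] += 1
        elif pred:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts
```

Re-fitting the model once per held-out sample is what makes the procedure computationally expensive, as noted above.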
Using leave-one-out cross-validation, we generated the following confusion matrix (presented in list format):
Sensitivity (or "True Positive Rate") is the probability that the model will correctly classify a sample pose as the specified pose; it is expressed as the ratio of the sample poses that were correctly classified by the model as a particular pose (TP) to all of the sample poses that were labeled as a particular pose (TP + FN):

Sensitivity = TP / (TP + FN)
Thus, with high sensitivity, fewer poses will be misclassified as NOT the specified pose. This means that the system is 94.7% confident in its classification of the specified pose when giving positive feedback to the user.
Specificity (or "True Negative Rate") is the probability that the model will correctly classify a sample pose as NOT the specified pose; it is the ratio of the sample poses that were correctly classified by the model as NOT a particular pose (TN) to all of the sample poses that were labeled as NOT a particular pose (TN + FP):

Specificity = TN / (TN + FP)
Thus, with high specificity, fewer poses will be misclassified as the specified pose. This means that the system is 94.0% confident in its classification of NOT the specified pose when giving negative (corrective) feedback to the user.
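Given the four confusion-matrix counts, both metrics reduce to simple ratios. The sketch below uses illustrative (hypothetical) counts chosen only to show the arithmetic; they are not the study's actual confusion matrix:

```python
# Sensitivity and specificity from confusion-matrix counts.
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# Illustrative counts only -- NOT the study's actual data:
print(sensitivity(947, 53))  # 0.947
print(specificity(940, 60))  # 0.94
```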
A small pilot study was conducted based on the interaction scenario described above. Older adult participants performed the described exercise scenario for a duration of 10 minutes; the participants were monitored by the Kinect sensor (mounted on the robot) and received speech feedback from the robot. This feasibility study served solely as an end-to-end proof-of-concept implementation of the Kinect-based Gaussian pose-match classifier in the robot exercise instructor context. As such, no general conclusions can be drawn about the exercise system; however, many qualitative and anecdotal results are promising.
Seven participants (five female, two male) were recruited for the pilot study. Participant ethnicity was very diverse, and included Caucasian, African-American, Hispanic, Asian, Indian, Italian, and Armenian. All of the participants were seniors over the age of sixty-five.
In a questionnaire given after the exercise scenario, each participant was asked to evaluate terms that best described his/her perception of the robot's level of entertainment, value or usefulness, physical and social attractiveness, utility, intelligence, partnership, motivation, and social presence. Each participant was also asked about his/her level of familiarity with technology. The rating scale was a ten-point Likert scale, with one end of the scale representing "Very Strongly Disagree" and the other end representing "Very Strongly Agree".
While no statistically significant conclusions can be drawn from the survey questions (due to the small size of the population), the responses are encouraging. Participants indicated that they perceived the robot to be highly useful, attractive, and intelligent, and that the robot highly motivated them to exercise. Participants rated the robot as providing a low-to-moderate level of entertainment; this is likely due to repetitive phrasing selected by the robot when providing feedback to the participant over the 10-minute interaction, and suggests room for improvement in speech content generation in future work. On average, participants reported low-to-moderate familiarity with technology.
One participant stopped the session halfway through due to fatigue; thus, six participants completed the entire exercise session. The questionnaire was administered to all seven participants. This failure case is interesting and challenging, and provides an opportunity to explore models of user fatigue in future work.
We have presented the design, implementation, and evaluation of a probabilistic (Gaussian) binary classifier for user activity, specifically, user arm pose matching to an exercise pose displayed by a robot exercise instructor. A feasibility study was conducted with a small group of elderly participants. The results are encouraging, and further promote the use of socially assistive robots in motivating users to perform simple physical exercises. Future work includes providing more interesting/entertaining feedback to the user, as well as investigating models of user fatigue. We would also like to increase the number of participants, and extend the duration and number of sessions of the study to investigate the long-term effectiveness of the socially assistive robot exercise system.