Abstract:
Human communication is multimodal in nature. In everyday environments, people interact with other humans and with their surroundings using more than one modality or medium of communication. They speak, use gestures, and look at objects to interact with their environment and with other humans. By listening to different voice tones and observing gaze and arm movements, people pick up communication cues. A discussion between two people involves vocal communication, hand gestures, head gestures, facial cues, etc. [1]. By textbook definition, the synergistic use of these interaction methods is known as multimodal interaction [2]. For example, a wheelchair user might instruct the smart wheelchair or an assistant to go forward, as shown in Fig. 1(a), while indicating with the hand gesture shown in the figure that he or she wants to go slowly. Similarly, as in Fig. 1(b), a person might give someone directions with the vocal command 'that way' while gesturing the direction with his or her hand.
In most Human-Robot Interaction (HRI) research, human interactions are assumed to be unimodal, which forces researchers to ignore the information carried by the other modalities. Incorporating these modalities would provide an additional dimension for interpreting human-robot interactions. This article provides a concise description of how to adapt the concept of multimodal interaction to human-robot applications.