Multi-modal conversations — the essence of human-computer interaction

Aziz Ghadiali
5 min read · Nov 19, 2021
Photo by Mark Farías on Unsplash

Everyone’s heard of it, some have felt the pain of interacting with it, but many know the promise. Conversational AI is the next wave of technological innovation, giving people interactions beyond the interface. Yet for all that promise, one of the biggest open questions is how to use it the right way.

I’m sharing my personal reflection on what I believe conversational AI affords us and how to best understand the ways in which we can successfully utilize it to re-imagine human-computer experiences.

Starting with the basics — defining “human-computer conversations”

One interesting approach to understanding how conversational AI fits into human-computer interactions is to consider the interactions between humans and computers as goal-oriented conversations where each party is conversing with the other through a variety of modalities in order to achieve some known (or partially known) objective. Perhaps we can separate this “conversation” into three distinct pieces: the context of interaction, the abilities of each party, and the modes of interaction.

Context

Context is the cornerstone of UX design because it helps us understand the situation of a user and computer when they are conversing. Contextual information includes a wide variety of factors such as the environment, mood, time of day, previous interactions, and much more.

Why is it important?

Any effective conversation requires contextual knowledge. A primary reason it’s harder to communicate with another person over text is that you have less context to work with. Similarly, systems need to be designed to capture enough context around the current interaction to know what job needs to be done and how best to do it.

Ability

In every human-computer conversation, both parties bring certain abilities to the interaction. Humans can bring things like wisdom, creativity, empathy, physical abilities, and the ability to process and make sense of ambiguous information. Computers on the other hand bring archival capabilities, automation, data manipulation, and other methods of augmenting human intelligence.

Each party also brings a set of constraints. Humans bring an inability to remember large amounts of information accurately and process it quickly. Computers bring an inability to understand and operate on human emotion and creativity or anything else that we don’t understand enough about humans to program.

Why is it important?

A deep understanding of what both the human and the computer can do in a conversation clarifies each party’s role and shows how the strengths of each can be maximized.

Modes of interaction

Humans have always had different ways of interacting with each other, whether talking, writing, or sending visual signals through hand gestures or facial expressions. Similarly, we have different ways of interacting with computers: we use keyboards, mice, and touch to talk to the computer, and the computer uses multimedia displays, haptics, and other sensory outputs to respond to us. The goal of these interaction modalities is to help us and the computer communicate successfully about our goals, contexts, and abilities.

Why is it important?

The chosen modes of interaction are often directly coupled with a particular type of interface or device (technology-first). To design an effective conversation between a human and a computer, we need to take a context-first approach instead, letting the context of interaction guide which interaction modalities and devices we use (human-first).
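To make the context-first idea a little more concrete, here is a minimal sketch in Python. Everything in it is my own illustration rather than part of any real framework: the `Context` fields, the modality names, and the rules in `choose_modalities` are all assumptions, and a real system would draw on far richer context (mood, time of day, previous interactions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A deliberately tiny, hypothetical slice of interaction context."""
    hands_free: bool        # e.g. the user is driving or cooking
    noisy: bool             # speech is hard to capture or hear
    screen_available: bool  # a display is within the user's view


def choose_modalities(ctx: Context) -> dict[str, list[str]]:
    """Let the context decide the modalities, not the device."""
    inputs: list[str] = []
    outputs: list[str] = []

    # Voice only makes sense when speech can be heard in both directions.
    if not ctx.noisy:
        inputs.append("voice")
        outputs.append("speech")

    # Touch needs free hands and something to touch.
    if ctx.screen_available and not ctx.hands_free:
        inputs.append("touch")

    if ctx.screen_available:
        outputs.append("display")

    # Fall back to haptics when no other output channel fits.
    if not outputs:
        outputs.append("haptics")

    return {"input": inputs, "output": outputs}


driving = Context(hands_free=True, noisy=False, screen_available=True)
print(choose_modalities(driving))
```

The point of the sketch is the direction of the dependency: the function takes the situation as input and returns the modalities, rather than starting from a device and asking what it can do.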

Where does conversational AI fit in?

Conversational AI is a new mode of interaction that provides computers with the ability to converse with people in a more human-centric way and understand more information about the context of interaction to better work with the user.

One of the most common terms used today to talk about conversational AI is the conversational user interface (CUI), a separate interface for pure voice or chat-based interactions between a human and computer. One of the biggest issues with the term CUI is that it misrepresents conversational AI, which is a new interaction modality rather than a separate interface. When the touchscreen was introduced, we did not call it a touch user interface. It was a new way for us to converse with computers using our fingers. The term CUI does an injustice to what conversational AI really is: a new mode of interaction that lets us converse with a computer in a way closer to how we interact with other humans. It is the first mode of interaction that makes computers better at adapting to our way of conversing, rather than us adapting to theirs.

Why is this differentiation important?

Once we understand that conversational AI is not here to replace our current modes of interaction but to help us create more well-rounded human-computer experiences, we can integrate the technology in a more holistic way. By understanding what conversational AI affords us as an interaction modality rather than an interface, we can maximize its utility in products and services instead of creating customer service chatbots that are little more than glorified wizards.

We need to move away from a mindset in which the focal points of conversational AI are voice agents, chatbots, or IVRs that live in separate interfaces. Instead, we can analyze how the conversations people already have with technology can be augmented or re-architected to use this new modality in place of inferior ones. Rather than building applications that only use voice, we can build a connected ecosystem of visual and non-visual computer interfaces that supports multiple interaction modalities at once, allowing the user to switch freely between them.
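One way to picture switching freely between interaction modes is a single conversation whose state is shared across modalities. The Python sketch below is only an illustration under my own assumptions: `Event`, `Conversation`, and the intent labels are hypothetical names, and a real system would need actual speech and touch recognition in front of it.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    modality: str  # how the user spoke to the computer: "voice", "touch", ...
    intent: str    # hypothetical intent label, e.g. "select_item"
    value: str     # what the user asked for


@dataclass
class Conversation:
    """One conversation, one shared state, any number of modalities."""
    state: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def handle(self, event: Event) -> None:
        # The same handler runs regardless of modality, so the user can
        # start a task by voice and finish it by touch (or the reverse).
        self.state[event.intent] = event.value
        self.history.append(event.modality)


order = Conversation()
order.handle(Event("voice", "select_item", "pizza"))  # "I'd like a pizza"
order.handle(Event("touch", "select_size", "large"))  # taps the "large" button
print(order.state)
print(order.history)
```

Here voice is not a separate interface with its own flow. It is one of several ways of contributing to the same conversation.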

The challenge then becomes understanding how to combine these interaction modalities so that together they improve the conversations we have with technology every day.

This is one of many perspectives on the state of human-computer interaction and conversational AI today. I’d love to hear your thoughts and critiques in the comments to further this discussion!

Aziz Ghadiali

Using my time to explore innovative ideas with conversational AI. Master of Human-Computer Interaction from Carnegie Mellon.