Multimodal Interaction Design: Voice + Touch for Enterprise Environments
Lead UX Designer | Interaction Design & Prototyping
Emerging Technology | Voice + Touch Interface Design
Business Challenge
Conference room technology had become powerful but inaccessible. Non-technical users — educators, office workers, meeting organizers — were intimidated by AV systems designed for specialist installers. The question wasn't whether voice control was technically possible. It was whether we could layer voice and touch modalities in a way that felt trustworthy, discoverable, and recoverable for users who had never interacted with a voice-enabled system before.
My Role
Led interaction design for Extron's voice-enabled conference room experience, integrating Amazon Alexa with existing touch panel interfaces. Responsible for defining the multimodal interaction model — determining when voice should lead, when touch should confirm, and how the system should communicate state, intent, and failure across both channels simultaneously.
Impact At A Glance
✓ Designed Extron's first voice-enabled interaction model — establishing patterns for discoverability, feedback, and error recovery across voice and touch simultaneously
✓ Solved the "what do I say?" problem — created signifier systems and help overlays that made voice capabilities legible to non-technical users in shared physical environments
✓ Established multimodal state communication patterns — LED feedback, on-screen confirmation, and audible response working in concert to build user trust and confidence
✓ Prototyped cross-channel failure recovery — designed graceful degradation paths when voice recognition failed, ensuring users could always fall back to touch without losing context
Discoverability display in conference room — users need to know voice is available before they can use it.
Strategic Challenge
The Design Problem: Voice interfaces in 2019 had a fundamental discoverability problem — users couldn't know what commands were available, couldn't tell if the system was listening, and had no reliable way to recover when something went wrong.
The Real Challenge: Layering modalities without creating confusion
Adding voice to an existing touch interface created three distinct interaction design problems:
Discoverability: How does a user in a shared room know voice commands are available and what they can say?
State communication: How does the system signal across both voice and touch simultaneously that it heard, understood, and executed a command?
Trust and recovery: When voice fails — misrecognition, ambient noise, unsupported commands — how does the system recover gracefully without leaving the user stranded?
The Strategic Frame: This wasn't a technology integration project. It was a trust design problem — how do you make a novel interaction modality feel safe and reliable for users who have no prior mental model for it?
Anatomy of an Alexa Skill
Alexa Overlay (automatically appears and retreats with user voice commands)
Voice command icon
LED light which flashes to provide additional feedback
What the system “heard” the user say
The command being executed
Help icon - when pressed reveals more information about Alexa and commands
Cancel icon - when pressed will cancel the command
As part of the multimodal experience, we used the touch panels found in most meeting rooms to provide feedback, additional controls, and assistance.
Process
I approached this as a modality design problem rather than a feature addition — the question wasn't how to add voice to the existing touch interface, but how to define the appropriate role of each modality within a coherent interaction model.
Phase 1: Modality Mapping — Defined which tasks were appropriate for voice versus touch versus both, based on context, cognitive load, and physical environment constraints
Phase 2: State Communication Design — Designed the feedback system spanning LED hardware signals, on-screen overlays, and audible responses to communicate system state across all channels simultaneously
Phase 3: Discoverability Patterns — Designed signifier systems and contextual help overlays making voice capabilities legible without requiring prior training
Phase 4: Failure Recovery Design — Prototyped graceful degradation paths for misrecognition, unsupported commands, and ambient noise scenarios
Alexa Extended Help Overlay
Help arrives just as the user needs it
If the user selects the help icon from the Alexa overlay, they receive additional information on how to interact with a particular command.
Key Decisions
- The most intuitive model for non-technical users was to let voice initiate and the touch panel confirm, giving users a visual checkpoint before irreversible actions. This reduced anxiety about accidental commands and built trust incrementally.
- We displayed what the system "heard" on screen before executing the command. This gives users a moment to catch a misrecognition and correct it. This single pattern resolved the most common failure mode in early testing.
- Rather than a global help screen, the extended help overlay appeared in context with the specific command being executed, showing alternative phrasings and add-ons relevant to what the user was already doing. Help that arrives when you need it is fundamentally different from documentation you have to seek out.
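The "voice initiates, touch confirms" and "show what was heard before executing" decisions can be illustrated with a small pending-command object. This is a sketch only, not the shipped implementation: `PendingCommand`, `confirm`, `cancel`, and `start_meeting` are hypothetical names, and treating confirmation as an explicit tap (rather than a timed window with a Cancel option) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingCommand:
    """A voice command held for review on the touch panel (hypothetical model)."""
    heard_text: str                # transcript shown to the user before execution
    action: Callable[[], str]      # work that runs only after confirmation
    status: str = "pending"        # pending -> executed | cancelled
    log: List[str] = field(default_factory=list)

    def overlay_text(self) -> str:
        # What the overlay would display so the user can catch a misrecognition.
        return f'Heard: "{self.heard_text}"'

    def confirm(self) -> str:
        # Touch confirms: run the action once, and only from the pending state.
        if self.status != "pending":
            raise RuntimeError(f"cannot confirm a {self.status} command")
        result = self.action()
        self.status = "executed"
        self.log.append(result)
        return result

    def cancel(self) -> None:
        # Cancel icon: drop the command without running the action.
        if self.status != "pending":
            raise RuntimeError(f"cannot cancel a {self.status} command")
        self.status = "cancelled"


def start_meeting() -> str:
    # Placeholder for the real room-control call, which is not described here.
    return "meeting started"
```

In this sketch a misheard command never reaches `action`: the user reads `overlay_text()`, taps Cancel, and can retry by voice or continue on the touch panel.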
Results
✓ Established Extron's first systematic multimodal interaction model — patterns for voice + touch layering adopted as the foundation for future voice-enabled products
✓ Resolved the core trust problem — by showing system understanding before execution and providing contextual recovery paths, users who had never used voice controls could operate the system confidently without training
✓ Created reusable patterns applicable beyond AV — the discoverability, state communication, and failure recovery frameworks developed here directly inform how I approach AI interaction design today, where the same fundamental challenges of trust, transparency, and graceful degradation apply
Sample Skill: Start The Meeting
1. The user triggers Alexa with some variant of: "Alexa, can you start the meeting?"
   a. The front LED on the touch panel flashes to indicate it is "listening".
2. The touch panel shows what it heard the user say.
3. If successful, the touch panel shows it executing the command.
   a. The front LED on the touch panel glows solid green to indicate it was successful.
   b. Alexa responds audibly with some variant of "Starting your Test Meeting now".
4. The Alexa modal retreats after 5 seconds of inactivity.
Prototype of Alexa On Screen Skill process
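The Start The Meeting walkthrough can also be read as a small table mapping each phase to what the LED, the screen, and the voice channel present. The following is a minimal sketch under stated assumptions: the phase names, the `Feedback` tuple, and the screen strings are hypothetical, and the LED behaviour for the idle, heard, and executing phases is assumed, since the walkthrough only specifies flashing while listening and solid green on success.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional


class Phase(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    HEARD = "heard"
    EXECUTING = "executing"
    SUCCESS = "success"


class Feedback(NamedTuple):
    led: str                # front LED behaviour
    screen: str             # overlay content on the touch panel
    speech: Optional[str]   # audible response, if any


RETREAT_AFTER_SECONDS = 5.0  # from the walkthrough: modal retreats after 5 s idle


def feedback_for(phase: Phase, heard: str = "") -> Feedback:
    """Return what each channel presents for a phase (illustrative mapping)."""
    table: Dict[Phase, Feedback] = {
        Phase.IDLE: Feedback("off", "", None),                             # assumed
        Phase.LISTENING: Feedback("flashing", "listening", None),
        Phase.HEARD: Feedback("flashing", f'heard: "{heard}"', None),      # LED assumed
        Phase.EXECUTING: Feedback("flashing", "executing command", None),  # LED assumed
        Phase.SUCCESS: Feedback("solid green", "command complete",
                                "Starting your Test Meeting now"),
    }
    return table[phase]


def overlay_visible(seconds_since_last_activity: float) -> bool:
    # The overlay stays up until the inactivity window has elapsed.
    return seconds_since_last_activity < RETREAT_AFTER_SECONDS
```

Keeping all three channels in one table makes it easy to check that no phase leaves a channel silent by accident, which is the property the state communication design was after.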