We’re all familiar with the sci-fi parallels often drawn around ADAS and autonomous vehicles. Every year at CES journalists reach for words like “futuristic” and “Jetsons-like”, comparing modern vehicles to James Bond’s gadget-filled spy cars or the Batmobile. For many engineers working on vehicle interiors, however, the real reference point was something else entirely: Knight Rider.
KITT wasn’t just fast or autonomous. It communicated. It spoke, listened, warned, joked, and interacted with its driver. The car was not simply a machine; it was part of the experience.
Let me elaborate.
Human senses shape our holistic experience. It's not only about how the car looks, but how it makes you feel when you're inside. Sound plays a huge role in that. Throwing a party? You want music with deep rhythm, bass and volume. Unless it's a dinner party, that is, in which case smooth jazz might be more apropos. Heading to the spa calls for something calmer, perhaps some whale sounds or ambient music. But if the house is on fire, you want the loudest, most obnoxious alarm imaginable to get you out of the building to safety.
Sound is inherent to our situational awareness, our mood, and our understanding of the world around us. Audio, both inside and outside the vehicle, is therefore becoming increasingly important, and it’s a topic we explore across the agenda in Detroit this June.
Last year we heard from Professor Michael Nees of Lafayette College in one of our expert tutorials on which audio signals are most effective for warning drivers and how to avoid confusing or irritating them with alerts. One example that stuck with me was research on "alarm fatigue" in hospitals. A study by researchers at Johns Hopkins found clinicians could be exposed to thousands of device alarms per day, the vast majority of which were non-actionable. Over time the constant beeping becomes background noise, meaning genuinely urgent alerts risk being missed. Translate this to automotive, and safety-critical alerts can be ignored or turned off altogether (he touched on this research in the blog he wrote for us previously).

He later joined a panel to discuss designing for distraction-free driving, and this year returns to moderate the discussion Ready or Not: What Takeover Readiness Really Means in the Shift to L3. Joining him will be experts from IIHS, Bosch, Cirrus Logic, JTEKT, and General Motors. The panel will examine the role of audio in takeover alerts as vehicles transition toward Level 3 automation. Audio alerts are an essential part of most multimodal warning systems, but they also raise broader questions about how vehicles communicate clearly and effectively with drivers.
Of course, not everyone in the car needs those alerts. Most passengers have headphones today, but what if they didn’t need them? What if passengers could listen to different content without distracting the driver or each other, while still being able to talk normally without yelling at someone to remove their headphones?
In the track AI-Native IVI & Multi-Passenger Interaction, GPU Audio will explore real-time audio processing using GPUs, NPUs and APUs already present in modern infotainment systems. Advances in GPU-based audio processing are allowing complex audio workloads to run on shared infotainment compute platforms rather than dedicated DSP hardware. This shift enables significantly more sophisticated real-time signal processing without requiring additional hardware. At the same time, new signal-processing techniques are enabling distinct audio “bubbles” within the cabin. By modelling the acoustic environment of the vehicle and carefully controlling constructive and destructive interference, systems can create isolated listening zones that minimise sound leakage between occupants. The result is a more personalised, multi-passenger audio experience that reflects the shift toward the cabin as a shared digital space rather than a single listening environment.
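To make the interference idea concrete, here is a minimal sketch, assuming a free-field acoustic model and illustrative seat geometry; none of this reflects GPU Audio's actual pipeline. A second speaker plays a delayed, inverted copy of a tone, which cancels at a seat equidistant from both sources while remaining clearly audible at another seat.

```python
# Minimal sketch of the destructive-interference idea behind in-cabin audio
# "bubbles". The geometry, names, and 1/r attenuation model are illustrative
# assumptions, not any vendor's algorithm.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
FS = 48_000             # sample rate, Hz

def tone_at(distance_m, freq_hz, duration_s, invert=False):
    """Tone received at `distance_m` from a point source (free-field model)."""
    t = np.arange(int(FS * duration_s)) / FS
    delay = distance_m / SPEED_OF_SOUND        # propagation delay, s
    amplitude = 1.0 / max(distance_m, 0.1)     # simple 1/r attenuation
    sign = -1.0 if invert else 1.0
    return sign * amplitude * np.sin(2 * np.pi * freq_hz * (t - delay))

freq = 1000.0            # 1 kHz test tone
quiet_seat = 1.0         # both speakers 1 m from the seat we want quiet
other_seat = (0.5, 1.5)  # distances from each speaker to another seat

# Speaker B plays an inverted copy; at the equidistant seat the waves cancel.
at_quiet = tone_at(quiet_seat, freq, 0.05) + tone_at(quiet_seat, freq, 0.05, invert=True)
at_other = tone_at(other_seat[0], freq, 0.05) + tone_at(other_seat[1], freq, 0.05, invert=True)

print(f"RMS at quiet seat: {np.sqrt(np.mean(at_quiet**2)):.4f}")  # ~0 (cancelled)
print(f"RMS at other seat: {np.sqrt(np.mean(at_other**2)):.4f}")  # clearly audible
```

A real system would solve this for broadband audio, reflections, and moving heads across many speakers, which is exactly the kind of workload that motivates running the processing on GPUs rather than fixed-function DSPs.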
Voice interaction is another important part of this picture. In the same track, Myelin Foundry will discuss AI-native entertainment experiences such as karaoke (car-aoke?). Voice activity detection, pitch tracking and speech interaction enable interactive entertainment including collaborative games, conversational interaction and multi-passenger engagement.
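For readers curious what those building blocks look like, here is a rough sketch of the two primitives named above, an energy-based voice activity detector and an autocorrelation pitch tracker, of the kind a karaoke scoring feature might build on. The thresholds and frame sizes are illustrative assumptions, not Myelin Foundry's implementation.

```python
# Toy VAD and pitch tracker; thresholds and parameters are illustrative only.
import numpy as np

FS = 16_000  # sample rate, Hz

def is_voice_active(frame, energy_threshold=1e-3):
    """Crude VAD: flag a frame as speech if its mean energy exceeds a threshold."""
    return np.mean(frame.astype(np.float64) ** 2) > energy_threshold

def estimate_pitch(frame, fmin=80.0, fmax=400.0):
    """Estimate fundamental frequency from the autocorrelation peak within
    the typical range of the speaking/singing voice."""
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(FS / fmax), int(FS / fmin)   # lag range for 80-400 Hz
    lag = lo + int(np.argmax(corr[lo:hi]))
    return FS / lag

# Usage: a 220 Hz "sung" note in a 32 ms frame.
t = np.arange(int(FS * 0.032)) / FS
frame = 0.5 * np.sin(2 * np.pi * 220.0 * t)
if is_voice_active(frame):
    print(f"Estimated pitch: {estimate_pitch(frame):.1f} Hz")  # ~220 Hz
```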
While we are on the topic of voice interaction, I have a question for the community. Is anyone exploring voice detection or microphone analysis within the DMS sensing stack to help identify intoxicated drivers through altered or slurred speech patterns? I have not found much evidence of this yet and would be interested to understand why. Feel free to drop me a message on LinkedIn if you have thoughts.
Combining speech signals with vision-based sensing allows systems to model passenger engagement and dynamically manage interactions between multiple occupants.
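To illustrate that sentence, here is a toy late-fusion sketch, entirely my own construction rather than any vendor's method: per-occupant audio and vision estimates are blended into an engagement score, which an assistant could use to decide whom to address.

```python
# Hypothetical late fusion of audio and vision cues per occupant; the seat
# names, weights, and score ranges are all assumptions for illustration.
from dataclasses import dataclass

@dataclass
class OccupantSignals:
    seat: str
    speaking_prob: float    # from the microphone array / VAD, 0..1
    gaze_on_display: float  # from the camera-based DMS/OMS stack, 0..1

def engagement_score(s: OccupantSignals, w_audio=0.6, w_vision=0.4) -> float:
    """Late fusion: a weighted blend of independent audio and vision estimates."""
    return w_audio * s.speaking_prob + w_vision * s.gaze_on_display

occupants = [
    OccupantSignals("driver", speaking_prob=0.1, gaze_on_display=0.0),
    OccupantSignals("rear-left", speaking_prob=0.9, gaze_on_display=0.8),
]
# Address the reply to whoever is most engaged with the assistant right now.
target = max(occupants, key=engagement_score)
print(f"Route assistant response to: {target.seat}")
```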
China, unsurprisingly, is already moving quickly in this direction. Last year in Hefei we hosted a NIO demo car and enjoyed a closing keynote from Frank Du, Senior Director of Architecture and Innovation of the Digital Cockpit Department. He introduced many Western attendees to NOMI, an interactive companion mounted on top of the dashboard. The small rotating display acts as a visual avatar for the vehicle’s voice assistant, turning to face occupants, responding to voice commands, and controlling features like music, climate, and navigation.
While many Western OEMs are still debating whether vehicles should integrate with Siri or Alexa, NIO has built its own native in-car AI assistant. The animated dashboard module that represents NOMI is an optional add-on costing roughly $700 (US), and it has become one of the brand’s signature features. Owners can even personalise the assistant with different animations and accessories, turning it into something closer to a digital companion than a traditional voice interface and creating an ongoing stream of post-purchase revenue for NIO. Genius.
Finally, for those wondering about the underlying architectures, the track The Intelligent Cockpit: Multi-Modal Sensing and Fusion for Enhanced UX will include a presentation from Texas Instruments titled Building the Backbone for Intelligent Cockpits: Ethernet Ring Architecture for Multi-Modal Sensor Fusion and Audio Integration.
As vehicles move toward zonal architectures and high-speed Ethernet backbones, the network infrastructure inside the vehicle is becoming just as important as the sensors themselves. Audio streams are increasingly being carried alongside camera feeds, biometric sensing, and other interior data within deterministic network architectures built on technologies such as Time-Sensitive Networking (TSN) and Audio Video Bridging (AVB). This enables synchronized, low-latency communication between distributed zone control modules and central compute platforms, allowing audio systems, voice interfaces, and multimodal sensing stacks to operate reliably as part of a unified cockpit platform.
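The scheduling idea that makes this determinism possible can be sketched in a few lines. Below is a simplified model of the time-aware shaper from IEEE 802.1Qbv (one of the TSN mechanisms): a repeating cycle is divided into slots, and each slot opens the transmission gate for only certain traffic classes, so an audio frame's worst-case wait is bounded. The cycle length, slot sizes, and class names are illustrative assumptions, not taken from TI's talk or any real configuration.

```python
# Simplified gate-control list for a TSN time-aware shaper; all numbers
# and class names are illustrative assumptions.
CYCLE_NS = 1_000_000  # 1 ms cycle, repeated forever

# (slot duration in ns, traffic classes whose gate is open during that slot)
GATE_CONTROL_LIST = [
    (250_000, {"audio"}),        # reserved slot: audio frames only
    (250_000, {"camera"}),       # reserved slot: camera/sensor frames
    (500_000, {"best_effort"}),  # everything else shares the remainder
]

def open_classes(time_ns: int) -> set[str]:
    """Return the traffic classes allowed to transmit at a given time."""
    offset = time_ns % CYCLE_NS
    for duration, classes in GATE_CONTROL_LIST:
        if offset < duration:
            return classes
        offset -= duration
    return set()

# An audio frame arriving 100 us into the cycle goes out immediately; one
# arriving at 300 us waits for the next audio slot: delayed, but by a known,
# bounded amount, which is the point of deterministic transport.
print(open_classes(100_000))  # {'audio'}
print(open_classes(300_000))  # {'camera'}
```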
Audio sits at the intersection of safety, interaction, and experience. The same systems that deliver takeover alerts or safety warnings must also support natural voice interaction and immersive entertainment. Microphone arrays, spatial audio processing, and multimodal sensing are increasingly working together to interpret occupant behaviour, respond to spoken commands, and shape the overall atmosphere of the cabin. Whether it is warning a driver at the right moment, enabling conversational interfaces, or creating personalised sound environments for passengers, audio is becoming a core component of how intelligent cabins communicate with the people inside them.
This is why we are expanding both the agenda and the show floor to explore the topic further. Alongside the conference sessions, attendees will be able to see and interact with new technologies on the exhibition floor, from next-generation audio processing platforms to multimodal cockpit sensing systems and interactive passenger experiences. Companies like Syntiant will be showcasing advanced audio, sensing, and edge-AI technologies that demonstrate how intelligent in-cabin and exterior sound awareness is evolving in next-generation vehicles. These demonstrations provide a chance to move beyond theory and see how emerging approaches to voice interaction, spatial audio, and multimodal sensing are being translated into real vehicle architectures. As the industry continues to rethink the role of the vehicle interior, audio is quickly emerging as one of the key technologies shaping the intelligent cockpit.
Syntiant develops ultra-low-power neural processors, sensors, and AI models that enable machines to continuously hear and understand the world around them. In the automotive industry, this technology allows vehicles to support always-on voice interaction—enabling drivers and passengers to control functions like doors, trunks, and infotainment through simple voice commands. At the same time, Syntiant’s edge AI solutions help vehicles remain aware of critical sounds outside the car, such as approaching emergency vehicles, even in challenging conditions like wind, rain, or road noise. By bringing intelligent audio sensing directly to the edge, Syntiant helps create safer, more intuitive, and more comfortable driving experiences for the next generation of smart vehicles.
DID YOU KNOW?
Ticket holders for InCabin USA also get complimentary access to the AutoSens USA conference. AutoSens is the leading event for the ADAS and AD ecosystem, covering sensing and perception modalities, architectures, testing and validation, and much more besides.