Development
9 min read

AI software development insights from a real-time voice app prototype

Nick Hu
August 19, 2025

This post accompanies our AI case study with Total Inter Action, diving deeper into the technical process, the learnings from the build, and what it takes to develop AI software in a real-world scenario. Here, I break down how we built a real-time, voice-based AI prototype using GPT-4o, LiveKit, Pinecone, and more.

Building a real-time AI voice app

We were engaged by Total Inter Action to deliver a prototype application that simulates a sales role-play call using real-time audio, powered by a custom GPT agent.

The goal was to create a dynamic, voice-based training tool that enables users to practise sales conversations with preloaded customer profiles and receive structured AI-generated feedback.

This prototype was a proving ground, combining emerging tools like LiveKit Agents, OpenAI GPT, Pinecone, AWS, and React Native to explore what’s technically possible.

Core technologies behind the AI prototype

This was an exciting opportunity to experiment with technologies that are still very new to many developers, especially those looking to develop AI software with real-time capabilities. Some of these tools are incredibly powerful but also come with sharp edges, particularly once you start integrating them into real-time applications. Here’s the stack that powered the prototype:

Backend

  • LiveKit: Real-time WebRTC-based audio communication
  • LiveKit Agent: Custom-coded AI audio bot hosted on AWS EC2
  • OpenAI GPT: Handles real-time audio input and output directly, with no text conversion required
  • OpenAI Embeddings + Pinecone: Instruction sets and customer profiles converted into embeddings and stored in Pinecone for fast retrieval
  • AWS EC2: Hosts the agent and Express server for token generation
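
To make the token-generation piece concrete, here is a minimal sketch of an Express endpoint built on livekit-server-sdk. The route path, query parameters, and TTL are illustrative rather than what we shipped, and in recent SDK versions toJwt() returns a promise.

```ts
import express from 'express';
import { AccessToken } from 'livekit-server-sdk';

const app = express();

// Issues a short-lived LiveKit join token for a given room and participant identity.
app.get('/token', async (req, res) => {
  const { room, identity } = req.query as { room: string; identity: string };

  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!,
    { identity, ttl: '10m' }, // keep tokens short-lived; clients request a fresh one to rejoin
  );
  token.addGrant({ roomJoin: true, room });

  res.json({ token: await token.toJwt() });
});

app.listen(3000);
```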

Frontend

  • React Native + Expo: Framework for cross-platform app development and deployment
  • Tamagui: UI component library for rapid prototyping
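
On the client side, joining a role-play looks roughly like the sketch below. It uses the underlying livekit-client API for brevity; in the React Native app the @livekit/react-native wrapper sits on top of this, and the helper name and token endpoint are our own illustrations.

```ts
import { Room, RoomEvent } from 'livekit-client';

// Hypothetical helper: fetch a join token from the Express endpoint, then enter the room.
async function joinRolePlayRoom(serverUrl: string, tokenUrl: string): Promise<Room> {
  const { token } = await (await fetch(tokenUrl)).json();

  const room = new Room();
  room.on(RoomEvent.TrackSubscribed, () => {
    // The agent's audio track arrives here; route it to playback.
  });

  await room.connect(serverUrl, token);
  await room.localParticipant.setMicrophoneEnabled(true); // publish the user's microphone
  return room;
}
```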

Highlights and technical insights

Once we had the core pieces connected and a working end-to-end prototype, the real learning began. This phase surfaced the most interesting (and occasionally surprising) parts of the project, where the reality of real-time AI collided with user experience, infrastructure limitations, and practical constraints.

Real-time audio interactions

GPT enabled near-instant voice conversations. Because it accepts audio input and responds directly via audio, there was no need for text-to-speech or speech-to-text bridges. This reduced latency and made for a far more natural conversational experience, with tone, pacing, and contextual awareness all coming through.

The result was natural, voice-based role-plays driven entirely by GPT, without relying on a traditional chatbot interface.
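
In our build the LiveKit Agent owned this audio path, but a stripped-down sketch of talking to the Realtime API directly shows why no speech-to-text or text-to-speech bridge is needed: audio goes in and audio comes out over a single WebSocket. Event names follow the public Realtime API beta at the time of writing; the instructions and voice are placeholders.

```ts
import WebSocket from 'ws';

// A single WebSocket carries audio in both directions: no STT/TTS hop in the middle.
const ws = new WebSocket('wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview', {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'OpenAI-Beta': 'realtime=v1',
  },
});

ws.on('open', () => {
  // Configure the session once: persona instructions, voice, and server-side turn detection.
  // Microphone audio is then streamed up as input_audio_buffer.append events.
  ws.send(JSON.stringify({
    type: 'session.update',
    session: {
      instructions: 'You are the customer in a sales role-play. Stay in character.',
      voice: 'alloy',
      turn_detection: { type: 'server_vad' },
    },
  }));
});

ws.on('message', (raw) => {
  const event = JSON.parse(raw.toString());
  if (event.type === 'response.audio.delta') {
    // event.delta is a base64-encoded PCM16 chunk: decode and stream it to the speaker.
  }
});
```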

LiveKit Agent functions

Using LiveKit Agent functions gave us real-time control over app behaviour, including:

  • Moving between role-play stages
  • Generating JSON-structured feedback at session end

This functionality proved essential in linking the AI with the app’s frontend, creating a seamless flow.
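
Conceptually, those agent functions behave like OpenAI-style tool definitions the model can call mid-conversation. The sketch below is illustrative only: the function names and JSON-schema shapes are ours, and LiveKit Agents provides its own way of registering callable functions.

```ts
// Illustrative function (tool) definitions the agent can call during a role-play.
const agentTools = [
  {
    type: 'function',
    name: 'advance_stage',
    description: 'Move the role-play to the next stage (e.g. discovery to objection handling).',
    parameters: {
      type: 'object',
      properties: {
        stage: { type: 'string', enum: ['intro', 'discovery', 'objections', 'close'] },
      },
      required: ['stage'],
    },
  },
  {
    type: 'function',
    name: 'end_session_feedback',
    description: 'Return structured feedback as JSON when the role-play finishes.',
    parameters: {
      type: 'object',
      properties: {
        strengths: { type: 'array', items: { type: 'string' } },
        improvements: { type: 'array', items: { type: 'string' } },
        score: { type: 'number' },
      },
      required: ['strengths', 'improvements', 'score'],
    },
  },
];
```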

Dynamic persona management with Pinecone

Rather than hardcoding customer profiles, we stored them as vector embeddings. Pinecone enabled fast, dynamic lookup and context switching based on the user’s role-play selection. This allowed the AI to understand the context of the interaction, retrieve the most relevant profile data from the vector store, and adapt dynamically to the situation.
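
A simplified version of that lookup, using the OpenAI and Pinecone Node SDKs, is sketched below. The index name, embedding model, and metadata layout are assumptions for illustration.

```ts
import OpenAI from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';

const openai = new OpenAI();
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const profiles = pinecone.index('customer-profiles'); // index name is illustrative

// Embed the user's role-play selection and pull back the closest stored persona.
async function loadPersona(selection: string) {
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small', // model choice is an assumption
    input: selection,
  });

  const result = await profiles.query({
    vector: embedding.data[0].embedding,
    topK: 1,
    includeMetadata: true,
  });

  return result.matches?.[0]?.metadata; // persona text/fields stored as metadata
}
```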


AI software development process learnings

Building this prototype wasn’t just about stitching components together; it was a deep dive into the unpredictable realities of AI voice interaction. Our aim was to create something that felt genuinely human to talk to, but that meant facing the friction points head-on.

From reconnection issues to surprise accessibility quirks, this section unpacks what worked, what didn’t, and what we’d approach differently next time.

Cost per conversation

Real-time GPT usage cost between $1 and $3 per session. This would need to be optimised for scale, as cheaper models typically require separate speech layers, increasing complexity and latency.

Accessibility considerations

Issues like high contrast mode hiding text reminded us to test rigorously across device accessibility settings. These settings must be factored into any production-ready version.

Connection management

We discovered limitations when reconnecting agents via ECS-hosted services in the app environment. EC2 handled this better, but with tradeoffs in cost and scalability. More work is needed to resolve this in scalable deployments.
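
On the client, the practical mitigation is to listen for LiveKit’s connection events and recover gracefully rather than letting a dropped agent end the session silently. A rough sketch, with the rejoin logic left as a callback:

```ts
import { Room, RoomEvent } from 'livekit-client';

// Client-side handling so a network blip or recycled agent doesn't silently end the role-play.
function watchConnection(room: Room, rejoin: () => Promise<void>) {
  room.on(RoomEvent.Reconnecting, () => {
    // Surface a "reconnecting" state in the UI instead of freezing the session.
  });

  room.on(RoomEvent.Reconnected, () => {
    // Resume the conversation; the agent should still be in the room.
  });

  room.on(RoomEvent.Disconnected, () => {
    // If the server closed the room (e.g. the agent process was replaced),
    // request a fresh token and rejoin rather than leaving the user stranded.
    void rejoin();
  });
}
```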

Prompt engineering

Prompt design was critical. From triggering the right functions to keeping the AI "in character," precise and well-scoped prompts made the system function reliably.
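
As a flavour of what “well-scoped” meant in practice, here is a condensed, illustrative instruction block. The real prompts were longer and tuned per persona, and the function names refer back to the earlier tool sketch.

```ts
// Condensed, illustrative instruction block; {{profile}} is replaced at runtime
// with the persona retrieved from Pinecone.
const systemInstructions = `
You are playing the customer described in the profile below. Stay in character at all times.
- Never reveal that you are an AI or break the role-play.
- Only call advance_stage when the salesperson has clearly completed the current stage.
- Only call end_session_feedback after the user explicitly ends the call.
- Keep replies short and conversational, as a real customer would be on the phone.

Customer profile:
{{profile}}
`;
```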

Technical details and challenges

By the time we’d integrated the core components and had a working prototype, we’d uncovered some interesting engineering challenges:

Session Lifecycle: Each role-play needed to run in an isolated room with cleanup processes to avoid dangling sessions or cost blowouts.
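
A sketch of that lifecycle using livekit-server-sdk’s RoomServiceClient: one room per session, with an empty-room timeout so abandoned sessions clean themselves up. The naming scheme and timeout value are illustrative.

```ts
import { RoomServiceClient } from 'livekit-server-sdk';

const rooms = new RoomServiceClient(
  process.env.LIVEKIT_URL!,
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!,
);

// One isolated room per role-play, auto-deleted shortly after everyone leaves,
// so an abandoned session can't keep an agent (and its GPT usage) alive.
async function createRolePlayRoom(sessionId: string) {
  return rooms.createRoom({
    name: `roleplay-${sessionId}`,
    emptyTimeout: 60, // seconds to wait after the last participant disconnects
  });
}

async function endRolePlayRoom(sessionId: string) {
  await rooms.deleteRoom(`roleplay-${sessionId}`);
}
```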

Loading Delays: Profile loading introduced 2–6s delays due to Pinecone queries; preload optimisation is a future opportunity.
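
One straightforward preload option is to warm an in-memory cache while the user is still choosing a scenario, so the Pinecone query cost is paid before the call starts. A minimal sketch, reusing the illustrative loadPersona helper from earlier:

```ts
// loadPersona is the illustrative Pinecone lookup sketched earlier.
declare function loadPersona(selection: string): Promise<unknown>;

// Simple in-memory cache so each persona pays the embedding + Pinecone cost only once.
const personaCache = new Map<string, unknown>();

async function getPersona(selection: string) {
  if (!personaCache.has(selection)) {
    personaCache.set(selection, await loadPersona(selection));
  }
  return personaCache.get(selection);
}

// Warm the cache for the scenarios on screen while the user is still choosing one.
async function preloadPersonas(selections: string[]) {
  await Promise.all(selections.map(getPersona));
}
```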

Audio Visualisation: Limited access to low-level audio APIs in Expo made implementing a visualiser challenging.

Pause Limitation: GPT didn’t support pausing mid-session. Workarounds would require app-side or agent-side handling.
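
An app-side workaround could look like the sketch below: mute the user’s microphone and signal the agent over a data message to hold its turn. The message shape and topic are our own convention, not a LiveKit or OpenAI feature, and the publishData options shown match recent livekit-client versions.

```ts
import { Room } from 'livekit-client';

// App-side "pause": mute the user's mic and ask the agent to hold via a data message.
async function setPaused(room: Room, paused: boolean) {
  await room.localParticipant.setMicrophoneEnabled(!paused);
  await room.localParticipant.publishData(
    new TextEncoder().encode(JSON.stringify({ type: 'session.pause', paused })),
    { reliable: true, topic: 'control' },
  );
}
```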

What’s next for AI voice app development

While the prototype successfully demonstrated the potential of real-time AI-driven voice interactions, there’s still plenty of room to refine and optimise the experience before it's ready for production. Further exploration is underway to:

  • Reduce latency and costs
  • Enable more robust reconnections and session control
  • Expand testing tools (e.g. partial role-plays, scripted flows, or low-cost LLM validation)

This project demonstrated what’s possible with AI voice interaction today and what still needs refinement before productisation.

AI voice interaction is no longer just a concept; it’s possible to build today. But making it robust enough for production still requires thoughtful design, rigorous testing, and a willingness to prototype, adapt, and learn.

We're using hands-on exploration to help clients test early, iterate fast, and make smarter decisions about their AI investments.

Want to develop AI software?

Interested in building your own AI prototype? We help Australian organisations validate ideas fast with time-boxed AI software development services. Reach out to us at hello@airteam.com.au or via our contact form.
