Designing an interactive audio narrative for children

“Alexa, open The Messlins,” says one of my kids to the smart speaker set up in our living room. After eight months of research, design and production, I was about to witness my children interacting with my voice-enabled audio story for the very first time.

With a background in performance and media production, I have always been driven to find new ways to connect with an audience, whether taking the stage to do sketch comedy, giving live readings of my self-published children’s book, or producing audio-visual content for broadcast, film, and web. In 2019, I began to explore the ways in which storytelling can transform into storyliving, an undeniable shift that we have been seeing more of over the last decade. This exploration eventually led me to Ryerson University’s Media Production MA program.

I completed my master’s degree in 2020 with a research and design focus on interactive audio narratives using a smart speaker. Interactive stories told through a smart speaker are a relatively new way to engage an audience. As part of my research, I developed a prototype for an interactive audio narrative for children titled “The Messlins,” using Amazon’s voice-enabled device, the Echo. The story used a branching narrative design. Branching narrative is a form of storytelling that allows audiences to decide the path and outcome of the story. A child can decide how the story unfolds by voicing their choice when prompted by the characters heard in the story. Later, I will describe further why the Amazon Echo was chosen as the device for the design and user-testing of the prototype.

As a parent, my world presently revolves around all things that educate and entertain my 6-year-old twins, so it made the most sense to write a story for that age group. Through my research, I learned that children between the ages of 4 and 7 have developed the skills to use their imagination and play make-believe (Calvert and Wilson 291; Vygotsky 1967). Many of us can recall a time in our own childhoods when a parent read us a bedtime story. Even though my mother or father had the responsibility of reading the words on the page, I was always encouraged to ask questions, add sound effects or speak “directly” with the characters. The beautiful thing about an audio narrative is that it opens a world of possibility, allowing listeners to paint their own imagery of the storyworld and characters in their minds.

Two young children listen to a story told using Alexa.

Since the prototype was designed to be an interactive experience, it was important to understand what motivates a young child to interact with the story. One consideration was a child’s ability to communicate verbally. Storytelling with elements of play can help a child with their communication skills (Phillips 4; Ryokai and Cassell 2). Throughout the story of “The Messlins”, children are meant to interact with the characters as well as move around their physical space as if they are in the story. The design of the characters was another important factor when considering a child’s motivation to interact with the story. The main characters are siblings who would be described as tweens, as it has been found that young children respond better to older characters that they can look up to (Miller 2014). As a side note, the inspiration for the main characters with whom the user interacts came from the adventurous spirit of both my son and daughter, so I decided to create a young male and female character with equal status. I also wanted to pay a small tribute to my heritage as a Filipino-German-Canadian, so I created another main character with a German name while weaving in Filipino references throughout the script.

My years of experience as a video producer taught me to always start with a script outline, a “blueprint” that ensures flow and connectivity between the beginning, middle and end. An interactive story with multiple plots and endings adds an extra layer of complexity. Before writing the audio script, I first mapped out the main plot points of the story. These evolved into a branching map, which looks very much like a tree, hence the name. Unlike a linear plotline, stories that employ a branching narrative require a lot of work and pre-planning to ensure that every available choice makes sense for the story progression (Crawford 117). I personally like to get literally hands-on for a project like this, so to begin I created the branching map by hand with lots of different coloured markers and a large piece of paper; this is where being a parent with an abundance of craft material comes in handy! I then created a digital version of the branching map using a diagramming tool called Lucidchart.

There is one differentiating factor that sets interactive content apart from a linear audio story: the ability to make choices. In a branching narrative, the listener can influence the direction of the story. A narrative designer must consider how a user will interact with the system and factor in as many outcomes as possible (Crawford 31). The element of play involved in an interactive experience also changes the role of the child. When a child simply listens to a linear story, this is considered a passive activity. However, in an interactive story a child is an active participant who can be viewed as either a player, user, learner, or a combination of the three (Markopoulos et al. 28). Designing for an active participant requires empathising with them, in this case the child, and seeing the story through their eyes. In the case of my prototype, it would be hearing it through their ears.

Once an idea is sparked, the hardest part of the creative design process for me is often simply starting. I was carrying out two roles in parallel: the writer of a fairytale story and the voice user interface (VUI) designer of an interactive experience. The writer in me needed to think about how to tell an engaging story that hit all the major beats. If the experience wasn’t entertaining, I would lose the interest of the listener (Buurman 1). Meanwhile, my inner VUI designer had to consider the conversation between the user and the device from beginning to end (Pearl 8). For no particular reason, other than it being one of the most popular brands of smart speakers that I also happen to own, I chose the Amazon Echo, which uses the virtual assistant AI technology called Alexa.

Voice-enabled apps that are built for the Echo are called “Skills.”  Early in the design process, I knew that I wanted a narrator to introduce the Skill before the story began. I also wanted it to be a voice that was different from the voice assistant (in this case, Alexa) so that the user would know that the experience had officially begun. This meant that the narration portion would have to be scripted and recorded. 

Once I had established how the Skill would launch, mapping out the branches was a relatively fluid process. I would complete one whole branch from start to finish before going on to another branch. Some branching narrative designers believe that it is best to follow one branch through to the end; otherwise, one may run the risk of creating several half-complete branches (LudoNarraCon 2020). In addition, this approach ensured that every path was given the same importance in the design process. To keep the map from branching out exponentially, I used a technique referred to as foldback, which involves linking certain points of different branches together (Crawford 121). Once completed, the first version of the branching map had a total of 31 scenes, including the opening narration and credits.
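To make the foldback idea concrete, here is a minimal sketch in Python of how a branching map could be written down as data. The scene names and choices are hypothetical placeholders, not the actual scenes from “The Messlins,” and the prototype itself was built visually rather than in code.

```python
from typing import Dict, List

# Hypothetical scenes: each scene maps a spoken choice to the next scene.
# A scene with no choices is an ending.
STORY_MAP: Dict[str, Dict[str, str]] = {
    "opening": {"go to the study": "study", "go to the kitchen": "kitchen"},
    "study": {"open the book": "secret_passage"},
    # Foldback: the kitchen branch rejoins the study branch here.
    "kitchen": {"look in the pantry": "secret_passage"},
    "secret_passage": {
        "follow the footprints": "ending_found",
        "turn back": "ending_escaped",
    },
    "ending_found": {},
    "ending_escaped": {},
}


def all_paths(scene: str, story_map: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Return every route from the given scene to an ending."""
    choices = story_map[scene]
    if not choices:  # an ending: the route stops here
        return [[scene]]
    paths: List[List[str]] = []
    for next_scene in choices.values():
        for tail in all_paths(next_scene, story_map):
            paths.append([scene] + tail)
    return paths


if __name__ == "__main__":
    for path in all_paths("opening", STORY_MAP):
        print(" -> ".join(path))
```

In this toy map, six scenes support four distinct routes through the story. Without the foldback, the secret passage and both endings would have to be written once per branch, for a total of nine scenes, and the gap widens with every additional choice point.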

With the map created, I was then able to write the script, a process which was once again twofold: the beginning and end of each scene were driven by principles of VUI design, while the middle section was all about the narrative. The front and back end of each scene were treated as conversational markers (Pearl 40). This means that a scene opens with positive feedback to the user indicating that the choice they had made in the previous scene was heard and accepted. It also means that the same scene would eventually end with one of the characters prompting the user to make a choice for the next scene. I opted for the characters to suggest how to respond as it has been found that providing examples is a better user experience (Pearl 21).
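The three-part scene structure described above can be sketched as a small Python data class. This is only an illustration under my own assumptions: the field names and the sample lines are hypothetical, and in the prototype these parts were recorded by voice actors rather than assembled as text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    acknowledgement: str        # opening marker: confirms the previous choice was heard
    narrative: str              # the middle: story, jokes, character moments
    prompt: str                 # closing marker: a character asks for the next choice
    example_replies: List[str]  # suggested phrases the child can repeat

    def script(self) -> str:
        """Join the parts in the order the listener hears them."""
        examples = " or ".join(f'"{reply}"' for reply in self.example_replies)
        return (
            f"{self.acknowledgement} {self.narrative} "
            f"{self.prompt} You can say {examples}."
        )


# Hypothetical scene content, for illustration only.
scene = Scene(
    acknowledgement="Great choice, the study it is!",
    narrative="The door creaks open and books are scattered everywhere.",
    prompt="Where should we look next?",
    example_replies=["Look under the desk", "Check the bookshelf"],
)

if __name__ == "__main__":
    print(scene.script())
```

Keeping the acknowledgement and the prompt as separate fields makes it easy to check that no scene is missing either conversational marker before recording begins.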

The original branching map included two distinct choices for the user at every major plot point. My original thought was that it was best to keep the answer short and simple so as to make it easier for the child to repeat. I later discovered that short commands like “Study” can be problematic for a voice-driven application, and that it is easier for the system to recognise phrases such as “Go to the Study” (Pearl 131). Of course, we can’t forget about the middle section of each scene; this was my moment to shine as a storyteller and comedian. I was making a prototype for kids after all, and humour can make an experience more enjoyable (Hall and Maeda 95), but it is important to note that children’s humour is different from adult humour (Miller 2014). Fortunately, I have a strong background in sketch comedy which makes me keenly aware of comedic timing and the art of being silly. I added little moments in the script that would hopefully elicit a giggle from the listener, whether child or parent.

When the university closed its doors due to the Covid-19 pandemic, I had to be proactive to ensure that the production process would continue as smoothly as possible. Unfortunately, the social distancing rule that was put in place in mid-March 2020 made in-person recording sessions impossible and I had to pivot my approach. I planned a virtual sound production that required building a temporary sound booth in my basement, delivering equipment to the homes of my voice talent, directing them over a video conference call, and uploading the audio files onto a shared drive which my sound engineer could access.

While I was able to find alternate methods to keep production going, it did not come without its challenges and setbacks. The sound booth that I was able to build at home was the perfect size, but I was unable to cover it with blankets from top to bottom, which would have been ideal for cutting out external sounds. This meant that there were moments when noise in the home, like the furnace turning on, would get picked up by the microphone. Directing the voice talent remotely was also challenging. I had to listen to my voice actors through my computer speakers, so there was no way of really knowing how the recordings sounded until I received the files. Fortunately, my voice actors also live together, which made it easier to match their ambient sound. The recordings that my husband and I did at home also matched. Adjusting the reverb and adding ambient sounds created the illusion that the characters were in the same space. This is where having a sound engineer who can make this happen is incredibly important.

The audio clips were made up of various sound elements which included voice recordings, original and stock music, and sound effects. The voice recordings were provided by myself, my husband, and two friends. Since this was an interactive audio story for children, I wanted all the characters, including the mischievous creatures, to be lively and likeable. The high-energy delivery of the character voices was intended to keep a young listener engaged. As sound can affect us psychologically and emotionally (Treasure 2020), I also wanted to keep any of the characters from sounding too menacing or scary for a child. I had a similar objective for the music selection. All the stock music in the prototype consists of instrumental pieces from a family-friendly music library. The main theme song of the story includes original lyrics that I wrote with three characteristics in mind: simple, fun, and catchy.

The sound effects evolved throughout production. I had originally planned for sound effects that would help support each scene, such as ambient sounds of a room (e.g. the sound of feet walking down a long hallway) and the characters’ movements and interactions with objects (e.g. a character pulls a cork out of a bottle). Further research on VUI design prompted me to consider how sound effects could be used to give the user context on how to interact with the story (Pearl 62). For instance, a twinkling sound effect was later added to cue the user whenever it was time to make a choice in the branching narrative. This type of sound effect is considered an earcon because it is a piece of information that lets the user know that they are expected to respond to the system (“Applying sound to UI” 2020). Another addition to the audio mix was a set of hero sounds, which played whenever the user had selected a choice that would advance the narrative closer to success. This could also be considered a form of feedback, which has been found to be effective in a VUI designed for children (Cieślak 2020).
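For readers curious how such audio cues might be sequenced in a hand-coded Skill, here is a rough Python sketch that assembles an SSML response from audio clips. Alexa’s SSML does support the `<speak>` and `<audio src="..."/>` tags, but everything else here is an assumption for illustration: the URLs are placeholders, the function name is hypothetical, the placement of the earcon at the very end of the scene is my own choice for the sketch, and the actual prototype was assembled in a visual tool rather than written this way.

```python
from xml.sax.saxutils import quoteattr

# Placeholder URLs, not real assets.
EARCON_URL = "https://example.com/audio/twinkle.mp3"
HERO_SOUND_URL = "https://example.com/audio/hero.mp3"


def build_scene_ssml(scene_clip_url: str, advanced_story: bool) -> str:
    """Order the clips for one scene: optional hero sound, scene audio, earcon."""
    clips = []
    if advanced_story:
        # Feedback: the previous choice moved the story closer to success.
        clips.append(HERO_SOUND_URL)
    clips.append(scene_clip_url)
    # Earcon: signals that it is the listener's turn to respond.
    clips.append(EARCON_URL)
    audio_tags = "".join(f"<audio src={quoteattr(url)}/>" for url in clips)
    return f"<speak>{audio_tags}</speak>"


if __name__ == "__main__":
    print(build_scene_ssml("https://example.com/audio/scene_07.mp3", True))
```

The point of the sketch is the ordering: feedback about the last choice comes first, the story content sits in the middle, and the earcon is the final thing heard before the device starts listening.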

Despite all the unexpected ways in which I had to shift production, overall I was pleased with the end result. Once the produced audio clips were ready, I incorporated them into the conversational map that I created with Voiceflow. I was then able to test out the prototype with my children using our home smart speaker and observe how they interacted with the story. Watching them respond intuitively to the characters and get up to take part in some of the activities brought a huge smile to my face. As a creator, there is no better feeling than seeing the content elicit a reaction from the audience and, on a new level, seeing the audience interact with the content. Observing my children playing with the interactive audio narrative allowed me to confirm what worked and what didn’t. Their interactions and reactions showed that receiving feedback from the characters affirmed their role as the choice-makers. Music played its most important role as a cue for commands: the children knew that they could respond once the music had ended. Their attentiveness throughout the story was due in large part to their required involvement in determining the story progression (Miller 2014; Sperring and Strandvall 233).

The prototype test also revealed what needed to be improved. Using vocal commands to communicate with a smart speaker is intuitive, but the moment the voice assistant cannot pick up a command, which happens more often with children, is when it becomes frustrating for a user. I also found it interesting that my children were more invested in chasing after the Messlins than in doing the side activities. For them, that was the main objective of the game, and anything beyond that was an obstacle. In terms of the character design, I spent considerably more time on the development of the prince and princess characters because I had assumed a child would feel more connected to them because of their age and shared goal of finding the Messlins. So, it came as a surprise that my children were more engaged whenever a Messlin was in the scene. Although the Messlins had considerably fewer lines compared to the other characters, they were the most memorable characters because they were, as one of my kids described them, “silly and funny.”

The Messlins

It is an exciting time for voice technology, and with huge brands like Disney and Hasbro starting to develop their own content for voice-enabled devices, it’s evident that we are just witnessing the beginning of a new way to connect with audiences. It is especially important these days, with everyone at home and relying more on screen time. Providing children with a new way to interact with content while also encouraging them to imagine the scene in their own minds is fulfilling to me as both a creator and a parent. In the coming months, I look forward to testing out the prototype with other users through the Ryerson University Transmedia Zone Incubator program, of which I am currently a member. Further testing is an integral part of the process and I hope to gain valuable new insight into the user experience.
Much like a branching narrative, where this creative process leads me to next is still unknown, but I’m looking forward to the adventure nonetheless. 


Buurman, H. A. Virtual Storytelling: Emotions for the narrator. MS thesis. University of Twente, 2007.

Calvert, Sandra L., and Barbara J. Wilson. The Handbook of Children, Media, and Development. Wiley-Blackwell, GB, 2008.

Cieślak, Katarzyna. “How to design a Voice-First game for kids? – Voice Tech Global Medium.” Medium, 24 Jan. 2020, first-game-for-kids-79ec345d0c35.

Crawford, Chris. Chris Crawford on interactive storytelling. New Riders, 2012.

Hall, Erika, and John Maeda. “Conversational design.” (2018).

LudoNarraCon. “Developing Branching Narratives.” YouTube, uploaded by Fellow Traveller, 5 May 2020.

“Applying sound to UI.” 20 March 2020.

Markopoulos, Panos, et al. Evaluating children’s interactive products: principles and practices for interaction designers. Elsevier, 2008.

Miller, Carolyn H. Digital Storytelling: A Creator’s Guide to Interactive Entertainment. Focal Press/Elsevier, US, 2014.

Pearl, Cathy. Designing voice user interfaces: principles of conversational experiences. O’Reilly Media, Inc., 2016.

Phillips, Louise. “Storytelling: The seeds of children’s creativity.” Australasian Journal of Early Childhood 25.3 (2000): 1-5.

Ryokai, Kimiko, and Justine Cassell. “Computer support for children’s collaborative fantasy play and storytelling.” CSCL. 1999.

Sperring, Susanne, and Tommy Strandvall. “Viewers’ Experiences of a TV Quiz show with Integrated Interactivity.” International Journal of Human-Computer Interaction: Social Interactive Television, vol. 24, no. 2, 2008, pp. 214-235.

“The evolution of video games as a storytelling medium, and the role of narrative in modern games.” 6 Apr. 2020.

Treasure, Julian. “The 4 ways sound affects us.” 13 Mar. 2020.

Vygotsky, Lev S. “Play and its role in the mental development of the child.” Soviet psychology 5.3 (1967): 6-18.

Inga is a media professional who has pursued a broad and deep range of experience throughout her career of over 15 years.  She completed her MA in Media Production (Ryerson University, 2020) with a research focus on interactive audio narratives using conversational design and voice technology. Inga is the newest ambassador of the Women In Voice Canadian chapter to help build and support the Canadian voice tech community.  She recently joined the EdTech Office at the University of Toronto as the Senior Educational Technologist, where she leads the team's media production efforts.
