The Voice in the Machine
I recently posted about using AI-generated art as starting points in composing my book covers. Following on that theme, I want to share what got me looking at AI as a tool in the first place, and what broke the ice.
Like many people, I had reservations about using AI for three reasons: whether its quality was good enough, its impact on professional artists (including voice actors), and audience judgment (a severe case of imposter syndrome heaped on top of the imposter syndrome that already comes with being an independent author).
What can AI actually do these days?
Let's talk about quality. There are a number of narration solutions out there, ranging from awful to almost lifelike. A complicating factor is that certain platforms will only accept their preferred solutions. For example, ACX, which lets you publish audiobooks to Audible, only accepts digital narration that it creates… you can't come to them with an already AI-narrated book. Spotify is similar, except it only accepts digitally narrated books if they were narrated by Google Play's service.
Google Play will narrate your ebooks for free right now, but… their sample voices… These sound horrid. They are certainly not "best in class".
I started playing around with elevenlabs.io, and from their free quota I was able to try out some of my book paragraphs with a number of their voices.
I found the quality far superior, and it was quality I was happy to use for my books. I love working with Andromeda (one of the voice options), for she indeed has a warm and lovely voice. Here's an example paragraph from the third book (currently in production), The Tides of Artalon. To set up what's happening without spoilers: the POV character is a young woman who has fallen in love with an elf hunter. A few days earlier, they shared a telepathic link, and he saw something he should not have: she had fallen in love with him. She has been angry with him for several days now, and he has just apologized to her for seeing what he should not have seen.
Is the voice as good as a professional voice actor? No. Is it better than an amateur voice actor? I think so. Making an audiobook also involves more than the voice; there's the sound engineering for the recording, making sure levels are right, that no background noise is intrusive, etc. The results I'm getting with elevenlabs.io are better than I could get recording it myself. I know; I've tried.
So… quality? Check. Now, what about cost? Google Play is free, but it doesn't pass the quality check (and I didn't dig deep enough to find out whether I could guide its pronunciation of the fantasy names and places, or guide its emotional performance). Hiring a professional to do the voice acting and sound engineering runs between $250 and $300 per hour of finished content.
Let's do the math. Lightfall is 12 hours, and I have 2 more books of that length. Covenant is 14 hours. The Tides of Artalon and Myth and Incarnation are expected to approach 20 hours each. That's… 90 hours of content not counting the remaining book I need to finish nor the short stories. Let's say $275 per hour… that's $24,750. Yeah… no. I'm not a publishing house. I'm an independent writer.
I've seen some complaints on Reddit that elevenlabs.io is overpriced compared to other tools out there. I'm not sure it is. Based on its subscription model, and how often I'm re-rendering paragraphs and coaching it to get the emotional delivery I'm looking for, I expect I'll spend less than $2K to release all 6 published books and the two short stories on audio. That's roughly an order of magnitude less expensive, with quality that approaches human levels: more than good enough for someone to listen and enjoy the story.
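For anyone who wants to check the cost comparison in the last two paragraphs, here's a minimal Python sketch of the arithmetic. It assumes the rough hour estimates above and the $275 midpoint rate; the two unnamed 12-hour books get hypothetical placeholder labels, and the $2K figure is my estimate, not a quote.

```python
# Rough hour estimates from the post. The two unnamed 12-hour books are
# given hypothetical placeholder labels.
book_hours = {
    "Lightfall": 12,
    "12-hour book A (placeholder)": 12,
    "12-hour book B (placeholder)": 12,
    "Covenant": 14,
    "The Tides of Artalon": 20,
    "Myth and Incarnation": 20,
}

RATE_PER_FINISHED_HOUR = 275  # midpoint of the $250-$300 range for a professional
AI_BUDGET_CEILING = 2_000     # estimated upper bound on the elevenlabs.io spend

total_hours = sum(book_hours.values())
human_cost = total_hours * RATE_PER_FINISHED_HOUR
# Because the AI spend is expected to come in under the ceiling,
# the real ratio is at least this large.
ratio = human_cost / AI_BUDGET_CEILING

print(f"{total_hours} hours -> ${human_cost:,} for human narration")
print(f"at least {ratio:.1f}x the AI budget")
```

This prints 90 hours and $24,750, and a ratio of about 12x, which is where "roughly an order of magnitude" comes from.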
Professional Voice Actors
What about the impact on the professional community? I'm not going to repeat the whole discussion here, since I think I hit on it in my last post about AI art. I think there will always be a market for voice actors, though I expect that market to get tougher. I wasn't hiring one anyway, so for me it was either AI or no audiobook at all. Taking a principled stand to not use these tools isn't going to stop them. They are here to stay, and I'd be foolish not to take advantage of the capabilities they bring for sharing my art with others. Like with the AI-generated images, the audio delivery is not my art. My art is the story itself, and the rest of it is packaging and delivery.
Judgment
I already had to overcome imposter syndrome when choosing to go the self-publishing route. There will always be some bias out there that "Oh, you're not with a publisher. You're not a real author." Yes, there's a lot of self-published drivel out there. There is no gatekeeper except the market itself. But there are also plenty of folks finding success as independent authors. So I had to get over that fear and just do it…
…this feels similar to me (especially with the AI paintings). The fear of, "Oh, you're just doing what anyone can do." Maybe, maybe not. You still have to have vision. You still need a quality story.
Ultimately, one has to get over the fear of judgment. I decided to focus on the art and make the best world I can make. Hopefully that becomes apparent to the readership.
The Process
So what's it like working with AI? It starts to feel like working with a partner. Sometimes she's breathtakingly brilliant. Sometimes she's unshakably stupid.
The elevenlabs.io tool has a Projects feature, where you create your book with individual chapters. I copy and paste each chapter into the project as I get to it, then download the final MP3 when finished.
For words she just can't pronounce correctly, I use the speech generator tool to find a phonetic spelling that she gets right most of the time. (When she occasionally gets it wrong, I just re-render the paragraph.) One real-world word she gets wrong too often: sidhe. So before starting the project, I make a working copy of the book's Word doc and do a global find-replace of every instance of "sidhe" with "shee". Other examples: Keruhn (she wants to say Keruhin) became Keroon. Oriand (she wants to rhyme it with Orion) became Ori-ahnd. Kallanistan (she rhymes it with Afghanistan), intended as an adjective describing “something from Kallanista”, becomes KallanEEstan. I keep a OneNote page with all my standard substitutions.
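I do this with Word's find-replace, but for anyone who would rather script the step, here's a minimal Python sketch of the same substitution pass on plain text, using the substitutions above. The table and function names are hypothetical, and working on plain text rather than a .docx file is an assumption.

```python
import re

# Phonetic substitutions from the post; the table name is hypothetical.
PRONUNCIATIONS = {
    "sidhe": "shee",
    "Keruhn": "Keroon",
    "Oriand": "Ori-ahnd",
    "Kallanistan": "KallanEEstan",
}

def apply_pronunciations(text: str, table: dict[str, str]) -> str:
    """Replace each whole-word occurrence of a key with its phonetic spelling."""
    # Longest keys first, so alternation prefers the longer name
    # when one name is a prefix of another.
    keys = sorted(table, key=len, reverse=True)
    # \b word boundaries keep names embedded in longer words untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)

sample = "The sidhe of Keruhn rode beside Oriand, bearing Kallanistan steel."
print(apply_pronunciations(sample, PRONUNCIATIONS))
```

Like a default find-replace, the match is case-sensitive, so a sentence-initial "Sidhe" would need its own entry in the table.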
She doesn't preserve italics. So if I want to distinguish "He didn't want to do this." from the same sentence with "didn't" in italics, I put the stressed word in double quotes instead: "He "didn't" want to do this." This usually helps her know which word to stress.
And sometimes the inflection and emotion are just off for the scene. That's harder to coach deliberately, since the engine takes its guidance from context, so sometimes I have to re-render a paragraph four or five times before it settles on the right delivery.
I'm thrilled with the quality of the audiobooks I'm putting out. Two (Lightfall and Covenant) are already available for purchase on Google Play, along with a short story (Complete Me). The only limitation at this time is that Google Play seems to be the only platform that will host third-party-generated digital narration. But that's okay… I'm willing to accept the limit in favor of the quality of the product. And I'm hopeful that over time, more and more platforms will open up.
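If a manuscript is exported in a form that keeps italics as markup, the quote trick can be automated. Here's a minimal Python sketch that assumes italics arrive as HTML `<i>` or `<em>` tags (an assumption, for example from an HTML export); the names are hypothetical, and I do this step by hand.

```python
import re

# Assumption: italics survive as <i>...</i> or <em>...</em> tags.
# \1 requires the closing tag to match the opening one.
ITALIC_TAG = re.compile(r"<(i|em)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)

def italics_to_quotes(text: str) -> str:
    """Swap each italicized span for the same words wrapped in double quotes."""
    return ITALIC_TAG.sub(r'"\2"', text)

print(italics_to_quotes("He <i>didn't</i> want to do this."))
```

The non-greedy `.*?` keeps two italic spans in the same sentence from merging into one quoted run.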
In the meantime, I have a world to tend.