Audio · March 7, 2026 · 9 min read

How Audiobook Creators Actually Process Their Recordings

From raw voice files to polished audiobooks—the real workflow behind what narrators and indie creators do to make audiobooks sound professional.

If you've ever tried recording yourself reading something aloud and then listened back, you know the vibe: it sounds nothing like the smooth, effortless audiobooks you buy on Audible. Maybe there's background hum. Maybe you can hear every breath. Maybe your voice volume jumps all over the place.

That's because audiobook production isn't just about reading well—it's about a ton of invisible audio work that happens after the recording. And if you're thinking about narrating your own book (or starting a side gig as a narrator), understanding the actual workflow matters.

Here's what really happens between "record" and "publish."

Step 1: Recording in a Format That Won't Bite You Later

Most audiobook narrators record in WAV at 44.1kHz or 48kHz sample rate, 24-bit depth. That's not because they're audiophile snobs—it's because WAV is lossless and gives you room to edit without degrading quality.

You don't want to record straight to MP3. MP3 compression is lossy, and every time you edit and re-export, you lose a bit more fidelity. It's like photocopying a photocopy—fine once, ugly after five rounds.

But eventually, you will convert to MP3 or M4B (the audiobook-friendly format). That comes at the end. For now, keep everything in WAV so you can chop, tweak, and re-export cleanly.
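
If you want to confirm a take really was captured at those settings before you start editing, here's a minimal sketch using Python's standard-library wave module. The function names are made up for illustration, and it assumes plain uncompressed PCM files; 32-bit float recordings would need a different reader.

```python
import wave

# Sample rates and bit depth named in the text above.
ACCEPTED_RATES = {44100, 48000}
ACCEPTED_BIT_DEPTH = 24


def describe_wav(path):
    """Return (sample_rate, bit_depth, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def is_recording_format_ok(path):
    """True when the file matches the recording settings recommended above."""
    rate, bits, _channels = describe_wav(path)
    return rate in ACCEPTED_RATES and bits == ACCEPTED_BIT_DEPTH
```

Running is_recording_format_ok over a session folder is a cheap way to catch a take that was accidentally recorded at 16-bit.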

And if you need to split chapters or prep different takes for comparison, having a tool that can convert between formats quickly without re-encoding hell is a lifesaver.

Step 2: Cleaning Up the Raw Audio

This is where the magic (or tedium) happens. Raw narration is full of stuff you don't want:

Breaths between sentences (some are fine, but loud gasps aren't)
Mouth clicks and lip smacks
Background hum from AC, computer fans, or traffic
Retakes where you stumbled on a word
Long pauses where you lost your place

Professional narrators spend 2-4 hours editing per finished hour of audio. Some of that is trimming out the bad takes. Some is using tools like iZotope RX to surgically remove clicks and pops. Some is just listening closely and cutting breaths that sound too wet or distracting.

Here's a pro trick: record 5 seconds of "room tone" (your recording space with nobody speaking or moving) at the start of every session. Then use that as a noise profile in your audio editor. Tools like Audacity or Adobe Audition can "learn" what your background noise sounds like and subtract it from the whole file. It's not perfect, and heavy settings can make your voice sound watery, but at moderate settings it's shockingly effective.
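
Before leaning on noise reduction, it helps to know how loud that room tone actually is. The sketch below is not what Audacity or Audition do internally; it only measures the RMS level of the first few seconds of a file, in dB relative to full scale. The function name is hypothetical, it assumes a 16-bit or 24-bit PCM WAV, and it lumps all channels together.

```python
import math
import wave


def room_tone_dbfs(path, seconds=5.0):
    """RMS level (dBFS) of the first `seconds` of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        if width < 2:
            raise ValueError("8-bit WAV is unsigned; not handled in this sketch")
        frames = wav.readframes(int(seconds * wav.getframerate()))

    # WAV samples are little-endian signed integers; scale them to -1.0..1.0.
    full_scale = float(1 << (8 * width - 1))
    samples = [
        int.from_bytes(frames[i:i + width], "little", signed=True) / full_scale
        for i in range(0, len(frames), width)
    ]
    if not samples:
        raise ValueError("file contains no audio")

    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)
```

A lower (more negative) number means a quieter room. If the value creeps up between sessions, something in the space changed: a fan, a fridge, an open window.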

Step 3: Normalizing Levels So Your Voice Doesn't Whisper-Shout

Even if you think you're reading at a consistent volume, your waveform will show otherwise. Quiet passages dip. Dialogue gets louder. If you don't fix this, listeners will constantly adjust their volume—annoying.

Audiobook platforms like ACX (Amazon's audiobook distributor) have strict loudness requirements:

Peak levels no higher than -3 dB
RMS (average loudness) between -23 dB and -18 dB
No clipping, no silence longer than a few seconds

If you submit files outside those specs, they get rejected. So narrators use normalization and compression (the audio kind, not file compression) to keep everything in the sweet spot.
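
To make the pass/fail logic concrete, here's a rough sketch of a level check. It assumes you've already measured peak and RMS in your editor, and it assumes ACX's published limits are a -3 dB peak ceiling and an RMS window of -23 to -18 dB. The function and constant names are illustrative, not part of any ACX tool.

```python
# Assumed limits, taken from ACX's published submission requirements.
PEAK_CEILING_DB = -3.0
RMS_RANGE_DB = (-23.0, -18.0)


def level_problems(peak_db, rms_db):
    """Return a list of human-readable problems; an empty list means it passes."""
    problems = []
    if peak_db > PEAK_CEILING_DB:
        problems.append(
            f"peak {peak_db:.1f} dB is above the {PEAK_CEILING_DB:.1f} dB ceiling"
        )
    quietest, loudest = RMS_RANGE_DB
    if rms_db < quietest:
        problems.append(f"RMS {rms_db:.1f} dB is too quiet (below {quietest:.1f} dB)")
    elif rms_db > loudest:
        problems.append(f"RMS {rms_db:.1f} dB is too loud (above {loudest:.1f} dB)")
    return problems
```

For example, level_problems(-3.5, -20.0) returns an empty list, while a file peaking at -1 dB with an RMS of -25 dB comes back with two complaints.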

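Most DAWs (digital audio workstations) have a "normalize to RMS" function. You set the target (say, -20 dB RMS), hit apply, and it applies one gain change to the entire recording so the average lands on the target. That fixes the overall level, not the swings within it, so check your peaks afterward: if the boost pushed them past the ceiling, that's where compression or a limiter comes in.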
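
The arithmetic behind RMS normalization is simple enough to sketch. This is not any particular DAW's implementation, just the dB math, with a hypothetical function name and an assumed -3 dB peak ceiling.

```python
def rms_normalize_gain(rms_db, peak_db, target_rms_db=-20.0, peak_ceiling_db=-3.0):
    """Return (gain_db, linear_factor, peaks_fit) for a single gain change.

    gain_db        how many dB to add so the RMS lands on the target
    linear_factor  the same gain as a multiplier for raw sample values
    peaks_fit      False if the boosted peaks would exceed the ceiling
    """
    gain_db = target_rms_db - rms_db
    linear_factor = 10 ** (gain_db / 20)
    new_peak_db = peak_db + gain_db
    return gain_db, linear_factor, new_peak_db <= peak_ceiling_db
```

Say a chapter measures -26 dB RMS with peaks at -8 dB. Hitting -20 dB RMS takes +6 dB of gain, which is roughly a 2x multiplier on the samples, and it moves the peaks to -2 dB. That's over the ceiling, so gain alone won't do it: you'd tame the peaks with compression or a limiter first.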

Step 4: Splitting Into Chapters

Nobody wants one giant 8-hour audio file. Audiobooks are split by chapter, sometimes by section. That means you're exporting 15-30 separate files per book.

If you recorded chapter-by-chapter, great—you already have separate files. But if you recorded in long sessions (which is more efficient), you'll need to split the master file at the right timestamps.

Some narrators use markers in their DAW to label chapter breaks while recording. Then they export each segment as its own file. Others just eyeball the waveform and cut manually.
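
If you have the chapter timestamps written down, the split itself can be scripted. Here's a minimal sketch with Python's standard-library wave module. The function name and the Chapter_NN.wav naming are assumptions for illustration, it expects plain PCM WAV, and it reads each chapter fully into memory, which is fine for a sketch but worth rethinking for very long files.

```python
import wave


def split_wav(master_path, markers_sec, out_pattern="Chapter_{:02d}.wav"):
    """Cut master_path at each marker (in seconds); write one WAV per segment.

    Returns the list of files written, in chapter order.
    """
    written = []
    with wave.open(master_path, "rb") as src:
        params = src.getparams()
        rate, total = src.getframerate(), src.getnframes()

        # Convert markers to frame positions, dropping any outside the file.
        inner = sorted({int(t * rate) for t in markers_sec if 0 < int(t * rate) < total})
        cuts = [0] + inner + [total]

        for index, (start, end) in enumerate(zip(cuts, cuts[1:]), start=1):
            src.setpos(start)
            frames = src.readframes(end - start)
            out_path = out_pattern.format(index)
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)    # same rate, bit depth, channels
                dst.writeframes(frames)  # frame count in the header is corrected on write
            written.append(out_path)
    return written
```

Because the audio is copied frame for frame, nothing is re-encoded: the chapter files are bit-identical slices of the master.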

Either way, you end up with a folder full of files like Chapter_01.wav, Chapter_02.wav, etc. Then you batch-convert them all to MP3 or M4B for distribution. If you need to batch-process audio files without clicking 30 times, automation is your friend.
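
One way to automate the batch step is to script a command-line encoder. The sketch below assumes ffmpeg with the libmp3lame encoder is installed (the article doesn't name a tool, so that's my assumption) and that your files follow the Chapter_NN.wav pattern. It only builds the commands so you can inspect them before running anything.

```python
from pathlib import Path


def mp3_commands(folder, bitrate="192k", sample_rate=44100):
    """Build one ffmpeg command per Chapter_*.wav in `folder`, sorted by name."""
    commands = []
    for wav_path in sorted(Path(folder).glob("Chapter_*.wav")):
        mp3_path = wav_path.with_suffix(".mp3")
        commands.append([
            "ffmpeg",
            "-i", str(wav_path),       # lossless source
            "-codec:a", "libmp3lame",  # MP3 encoder
            "-b:a", bitrate,           # constant bitrate target
            "-ar", str(sample_rate),   # output sample rate in Hz
            str(mp3_path),
        ])
    return commands
```

To actually run them, loop over the result with subprocess.run(cmd, check=True). Keeping the build step separate from the run step makes it easy to print the commands and sanity-check the first one.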

Step 5: Adding Metadata (So Listeners Know What They're Hearing)

MP3 and M4B files can store metadata—title, author, narrator, chapter titles, cover art. Platforms like Audible pull this info to display in the app. If you skip this step, your audiobook shows up as "Unknown Artist" with no chapter navigation. Not a great look.

Tools like Mp3tag (Windows, with a Mac version) or Kid3 (Windows, Mac, and Linux) let you batch-edit ID3 tags across all your chapter files at once. You set the album name (book title), artist (author), and individual track titles (chapter names).

For M4B files (which are basically AAC audio wrapped in an MP4 container), you can also embed chapter markers so listeners can skip between sections like they're using a DVD menu. It's a nice touch that makes your audiobook feel more polished.

Step 6: Final Export and Quality Check

Before you upload anywhere, you listen to the whole thing. Not in your DAW—export it to MP3 or M4B and listen on the same device your audience will use. Phone speakers, earbuds, car stereo.

Why? Because playback on different devices can reveal problems you didn't hear in your studio monitors. Maybe there's a low hum that only shows up on phone speakers. Maybe the bass is muddy in a car. You catch it now or you catch it in 1-star reviews.

Some narrators also run their files through ACX Check (a free plugin for Audacity) to verify they meet platform specs before uploading. It scans peak levels, RMS, and silence length. If it passes, you're good to go.

What Format Do You Actually Publish In?

Depends where you're uploading:

ACX (Amazon/Audible): MP3, 192 kbps or higher, CBR, 44.1kHz, mono or stereo
Findaway Voices: MP3 or M4B, similar specs
Self-hosting or Patreon: M4B is nice because it bundles chapters + cover art in one file

M4B is technically superior for audiobooks because it supports chapter markers and bookmarking (so listeners can pause and resume exactly where they left off). But not all platforms accept it, so MP3 is the safe fallback.

If you recorded in WAV (which you should have), converting to MP3 or M4B is a one-time lossy step. Do it once, at the end, with high quality settings (320 kbps for MP3, 128-192 kbps for AAC/M4B). That way you preserve as much as possible from your lossless source.
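
If you're wondering what those bitrates mean for file size, it's straightforward arithmetic: bitrate times duration. A quick sketch, with hypothetical function names, that ignores the small overhead from headers and metadata:

```python
def compressed_size_mb(bitrate_kbps, hours):
    """Approximate size in MB of a constant-bitrate file (1 MB = 1,000,000 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * hours * 3600 / 1_000_000


def wav_size_mb(sample_rate, bit_depth, channels, hours):
    """Approximate size in MB of uncompressed PCM audio."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return bytes_per_second * hours * 3600 / 1_000_000
```

For an illustrative 7-hour book: 192 kbps MP3 works out to about 605 MB, 320 kbps to about 1,008 MB, and the mono 48kHz/24-bit WAV masters to roughly 3,629 MB. That gap is why you edit in WAV but ship something smaller.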

Tools the Pros Actually Use

You don't need a $2,000 software suite to make a good audiobook. Here's what working narrators rely on:

Audacity (free) — handles recording, editing, noise reduction, normalization
Reaper ($60) — more powerful DAW if you want advanced features
iZotope RX ($$$) — the gold standard for cleaning up mouth noise and background sounds
ACX Check plugin (free) — verifies your files meet Audible's specs
Mp3tag or Kid3 (free) — batch metadata editing

And honestly? For format conversion, chapter splitting, or quick batch jobs, having a fast audio converter that doesn't make you install anything is clutch. You're already spending hours on the narration and editing—don't add more software headaches.

How Long Does This All Take?

Let's say you're narrating a 60,000-word book. That's roughly 6-7 hours of finished audio. Here's the real time breakdown:

Recording: 10-12 hours (you'll do retakes, breaks, etc.)
Editing: 12-28 hours (2-4x the finished length)
Proofing: 6-7 hours (a full listen at 1.5x speed, plus time to stop and log issues)
Exporting and metadata: 2-3 hours

Total: 30-50 hours for a ~7-hour audiobook. That's why audiobook narrators charge $200-400 per finished hour for indie projects. It's not just reading—it's a production pipeline.
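
Here's that math spelled out, using the stage ranges from the breakdown above. The names are illustrative, and the "effective hourly rate" is my own back-of-the-envelope framing: what the per-finished-hour fee works out to once you divide by hours actually worked.

```python
# (low, high) hours per stage, copied from the breakdown above.
STAGES = {
    "recording": (10, 12),
    "editing": (12, 28),
    "proofing": (6, 7),
    "exporting and metadata": (2, 3),
}


def total_hours(stages):
    """Sum the low and high ends of every stage."""
    low = sum(lo for lo, _hi in stages.values())
    high = sum(hi for _lo, hi in stages.values())
    return low, high


def effective_hourly_rate(finished_hours, rate_per_finished_hour, hours_worked):
    """What a per-finished-hour fee pays per hour of actual work."""
    return finished_hours * rate_per_finished_hour / hours_worked
```

total_hours(STAGES) gives (30, 50). For a 7-hour book, $200 per finished hour over 50 working hours is $28 an hour; $400 per finished hour over 30 working hours is about $93 an hour.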

What About AI Narration?

Look, AI voices are getting scary good. Google's WaveNet, ElevenLabs, and others can produce narration that sounds almost human. Some indie authors are already using them to cut costs.

But here's the thing: AI still struggles with pacing, emotion, and character voices. It can read a technical manual just fine. It falls apart with dialogue-heavy fiction or anything that needs nuance.

Plus, listeners notice. Audiobook communities are vocal about preferring human narrators. So while AI can handle the grunt work (maybe generating chapter splits or suggesting cuts), you're not replacing human narrators anytime soon.

Final Thoughts: It's a Craft, Not Just a Recording

Making an audiobook sound professional takes way more than a decent mic and a quiet room. It's editing out breaths, balancing levels, exporting in the right format, adding metadata, and doing a final quality pass.

If you're doing this yourself—whether you're an author narrating your own book or a freelance narrator building a portfolio—respect the process. It's tedious, but it's what separates amateur recordings from stuff people actually want to listen to for hours.

And if you need help managing the technical side—like converting files, splitting chapters, or batch-processing metadata—there are tools that make it way less painful. Because the last thing you want after 40 hours of narration work is to fight with audio software for another three.

Frequently Asked Questions

What audio format should I use to record audiobooks?

Record in WAV at 44.1kHz or 48kHz, 24-bit for editing flexibility. Export final chapters as MP3 320kbps or M4B (AAC 128-192kbps) for distribution. WAV gives you clean source files, MP3/M4B are what listeners actually use.

How do professional narrators remove background noise?

Record 5 seconds of room tone (silence) at the start, then use that profile in noise reduction plugins like iZotope RX or Audacity's Noise Reduction. They capture your room's acoustic fingerprint and subtract it. Manual EQ cuts at 80Hz and below also help remove rumble.

Do I need expensive software to edit audiobooks?

No. Audacity is free and handles 90% of audiobook work—cutting, normalizing, exporting chapters. If you want pro polish, iZotope RX (noise repair) or Reaper (full DAW, $60) are solid upgrades. Adobe Audition is overkill unless you're already subscribed.

How long does it take to edit one hour of audiobook narration?

Expect 2-4 hours of editing per finished hour. That includes removing breaths, mouth clicks, retakes, normalizing levels, and exporting. Veterans can hit 1.5x, beginners might need 5+ hours. It speeds up once you have templates and shortcuts down.

What is RMS and why do audiobook platforms care about it?

RMS (Root Mean Square) measures average loudness. ACX requires -18 to -23 dB RMS so all audiobooks sound consistent. Too quiet and listeners max out volume; too loud and it clips. Use a loudness meter plugin or normalization filter to hit the target range before export.
