Audio · March 6, 2026 · 8 min read

How Audiobook Creators Actually Process Their Recordings

Ever wonder what happens between recording a voice and publishing an audiobook? Here's the real workflow indie narrators use to turn raw audio into polished files.


So you've spent six hours in your blanket-fort studio recording a chapter. Great! Now comes the part nobody warns you about: turning that 400 MB WAV file full of mouth clicks, room tone, and that one cough into something people will actually want to listen to for twelve hours straight.

If you're curious about how audiobook narrators (especially the indies) go from raw recordings to finished files, you're in the right place. This isn't theory — it's what actually happens in the day-to-day grind of audiobook production.

Step 1: The Raw Recording (and Why It Matters)

Here's the thing: you can't fix everything in post. If your recording is trash, no amount of plugins will save you. Most pros record in WAV or FLAC at 48 kHz / 24-bit because it gives them headroom for editing. You'll downconvert later, but starting with high quality means fewer digital artifacts when you start cutting things up.
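To put numbers on that headroom: each bit of depth adds roughly 6 dB between full scale and the smallest step the file can represent. The sketch below works out both the theoretical dynamic range and the raw file size per hour. The helper names are made up for illustration, and it assumes a mono file and ignores the few bytes of WAV header.

```python
import math

def wav_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Raw PCM data size for one hour of audio (WAV header ignored)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600

def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Full scale vs. one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bit_depth)

for rate, bits in [(48_000, 24), (44_100, 16)]:
    mb = wav_bytes_per_hour(rate, bits) / 1_000_000
    print(f"{rate} Hz / {bits}-bit mono: {mb:.0f} MB per hour, "
          f"~{theoretical_dynamic_range_db(bits):.0f} dB dynamic range")
```

At 48 kHz / 24-bit mono that comes to about 518 MB of raw audio per hour and roughly 144 dB of theoretical range, versus about 96 dB at 16-bit. That margin is the "headroom" people mean.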

And yes, recording in a closet surrounded by clothes actually works. Bedrooms with hard floors? Not so much. Acoustics matter more than your microphone brand (though a decent USB mic like a Blue Yeti or an Audio-Technica AT2020USB+, the USB version of the AT2020, goes a long way).

Step 2: Editing Out the Mistakes

This is where the time disappears.

Narrators go through the entire recording and punch out every mistake — the re-takes, the stumbles, the dog barking in the background. Some people do this as they record (punch-and-roll technique), but most record in longer takes and edit afterward.

Professional tip: leave a few seconds of silence before and after each mistake. Makes it way easier to find in the waveform.
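Those deliberate pauses are easy to find by script, too. Here's a minimal sketch that lists every stretch where the level stays below a threshold for at least two seconds. It assumes you've already decoded the recording into floats between -1.0 and 1.0 (the stdlib `wave` module or any audio library can do that); the function name, the -50 dBFS threshold, and the 50 ms window are illustrative assumptions to tune against your own room tone.

```python
import math

def find_silent_gaps(samples, sample_rate, threshold_db=-50.0, min_gap_s=2.0, window_s=0.05):
    """Return (start_s, end_s) spans where windowed RMS stays below threshold_db
    for at least min_gap_s seconds."""
    window = max(1, int(sample_rate * window_s))
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    gaps = []
    gap_start = None
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms < threshold:
            if gap_start is None:
                gap_start = i  # silence begins
        else:
            if gap_start is not None and (i - gap_start) / sample_rate >= min_gap_s:
                gaps.append((gap_start / sample_rate, i / sample_rate))
            gap_start = None
    # A gap that runs to the end of the file still counts
    if gap_start is not None and (len(samples) - gap_start) / sample_rate >= min_gap_s:
        gaps.append((gap_start / sample_rate, len(samples) / sample_rate))
    return gaps
```

Short breaths between sentences fall under `min_gap_s` and are ignored, so what's left is (mostly) your mistake markers.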

Tools? Audacity is free and surprisingly capable. Pros often use Reaper ($60, absurdly underpriced) or Adobe Audition ($22/month, which adds up). Some swear by TwistedWave on Mac.

Step 3: Noise Reduction and Room Tone

Even the quietest room has some background noise: air conditioning, computer fans, the hum of existence. At the start of each recording session, record 10-15 seconds of pure silence in your recording space. This is called room tone, and you'll use it to fill gaps where you've cut things out.
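Filling a gap with room tone instead of digital zeros keeps the background from "dropping out" under the listener. This minimal sketch replaces a range of samples with looped room tone and crossfades a few samples at each edge so there's no click. The function name and the 64-sample fade are hypothetical choices, and it assumes float samples in -1.0..1.0.

```python
def fill_with_room_tone(samples, start, end, room_tone, fade=64):
    """Replace samples[start:end] with looped room tone, crossfading `fade` samples at each edge."""
    if not room_tone:
        raise ValueError("room_tone must not be empty")
    out = list(samples)
    length = end - start
    fade = min(fade, length // 2)
    for k in range(length):
        tone = room_tone[k % len(room_tone)]  # loop the room tone if the gap is longer
        if k < fade:                           # blend in from the original audio
            w = (k + 1) / (fade + 1)
        elif k >= length - fade:               # blend back out to the original audio
            w = (length - k) / (fade + 1)
        else:
            w = 1.0
        out[start + k] = (1 - w) * samples[start + k] + w * tone
    return out
```

Looping works fine for short gaps; for long ones, a longer room-tone take avoids an audible repeat.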

Most editors also run a light noise reduction pass. The trick is subtlety — overdo it and your voice sounds like it's underwater. In Audacity, you capture a noise profile from your room tone, then apply it with reduction set to around 6-9 dB (not the default 12).
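To see why 6-9 dB sounds gentler than 12, it helps to turn decibels into gain: 6 dB of reduction multiplies the noise by about 0.5, 12 dB by about 0.25. The sketch below is a deliberately simple broadband noise gate, not the spectral noise reduction Audacity actually performs: it measures the room-tone level, then turns down any short window that sits near that level. The function name, the 6 dB margin, and the 20 ms window are assumptions for illustration, and a real gate would ramp its gain smoothly instead of switching.

```python
import math

def gentle_gate(samples, sample_rate, room_tone, reduction_db=6.0, margin_db=6.0, window_s=0.02):
    """Attenuate windows whose level is close to the room tone by reduction_db.

    Simple broadband gate for illustration only (no attack/release smoothing,
    no per-frequency processing).
    """
    def rms(xs):
        return math.sqrt(sum(x * x for x in xs) / len(xs)) if xs else 0.0

    floor = rms(room_tone)                      # the "noise profile", reduced to one level
    threshold = floor * 10 ** (margin_db / 20)  # treat anything within margin_db as noise
    gain = 10 ** (-reduction_db / 20)           # 6 dB -> ~0.50, 12 dB -> ~0.25
    window = max(1, int(sample_rate * window_s))
    out = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        g = gain if rms(chunk) < threshold else 1.0
        out.extend(x * g for x in chunk)
    return out
```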

Step 4: Dealing With Breaths, Clicks, and Pops

Mouth noises are the bane of every narrator's existence.

Some people manually cut every breath. Others reduce breath volume by 6-10 dB instead of removing them entirely (more natural). There are plugins like Breath Control and De-Clicker, but a lot of narrators just zoom in and manually delete or fade them because automation tends to be aggressive.
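"Reduce it by 6-10 dB" is easy to do by hand in a DAW, and easy to sketch in code. This version turns a selected region down by `reduction_db` with short linear gain ramps at each end, so the breath stays audible but recedes. The function name and the 10 ms ramp are hypothetical; you'd still pick the breath's start and end times by ear.

```python
def attenuate_region(samples, sample_rate, start_s, end_s, reduction_db=8.0, fade_ms=10.0):
    """Turn a breath down by reduction_db instead of deleting it, with short gain ramps."""
    out = list(samples)
    start = round(start_s * sample_rate)
    end = min(len(samples), round(end_s * sample_rate))
    length = end - start
    target = 10 ** (-reduction_db / 20)  # e.g. 8 dB -> ~0.40
    ramp = max(1, int(sample_rate * fade_ms / 1000))
    ramp = min(ramp, max(1, length // 2))
    for k in range(length):
        if k < ramp:                 # ramp down from unity to the target gain
            g = 1 + (target - 1) * (k + 1) / ramp
        elif k >= length - ramp:     # ramp back up toward unity
            g = 1 + (target - 1) * (length - k) / ramp
        else:
            g = target
        out[start + k] = samples[start + k] * g
    return out
```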

Pro move: keep a glass of room-temperature water nearby while recording. Cold water tightens your throat. Sugary drinks make mouth clicks worse. Plain water, sipped between paragraphs, is your friend.

Step 5: Mastering (Making It Sound Consistent)

Your recording needs to hit certain technical specs for platforms like Audible (ACX) or Findaway Voices. The big three requirements:

  • Peak level: no higher than -3 dB (most narrators land between -3 dB and -6 dB, with no clipping)
  • RMS (average loudness): between -18 dB and -23 dB
  • Noise floor: below -60 dB (ideally -65 dB or lower)
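You can pre-check all three numbers yourself before uploading. This is a rough sketch, not ACX's own measurement: it assumes float samples in -1.0..1.0, reports levels in dBFS, and approximates the noise floor as the quietest 0.5-second window (for example, room tone between sentences). The function and constant names are hypothetical.

```python
import math

# Targets from the list above
PEAK_MAX_DB = -3.0
RMS_MIN_DB, RMS_MAX_DB = -23.0, -18.0
NOISE_FLOOR_MAX_DB = -60.0

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def to_db(x):
    """Linear amplitude (1.0 = full scale) -> dBFS."""
    return 20 * math.log10(x) if x > 0 else float("-inf")

def acx_report(samples, sample_rate, window_s=0.5):
    peak_db = to_db(max(abs(x) for x in samples))
    rms_db = to_db(rms(samples))
    # Noise floor approximated as the RMS of the quietest window
    window = max(1, int(sample_rate * window_s))
    floor_db = min(to_db(rms(samples[i:i + window]))
                   for i in range(0, len(samples), window))
    return {
        "peak_db": peak_db, "rms_db": rms_db, "noise_floor_db": floor_db,
        "peak_ok": peak_db <= PEAK_MAX_DB,
        "rms_ok": RMS_MIN_DB <= rms_db <= RMS_MAX_DB,
        "noise_ok": floor_db <= NOISE_FLOOR_MAX_DB,
    }
```

Treat a pass here as "probably fine" and a fail as "definitely look closer"; the platform's own checker has the final word.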

Most narrators use a mastering chain: EQ (cut lows below 80 Hz, boost presence around 3-5 kHz if needed), compression (gentle, 2:1 or 3:1 ratio), then a limiter to catch peaks. There are free ACX-specific mastering presets floating around the internet that do 90% of the work.
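The first two stages of that chain fit in a few lines. This sketch assumes the widely used biquad formulas from the RBJ Audio EQ Cookbook for the 80 Hz high-pass, and a hard-knee static curve for the compressor; the -20 dB threshold is an illustrative assumption, and attack/release smoothing, the presence boost, and the limiter are left out.

```python
import math

def highpass(samples, sample_rate, cutoff_hz=80.0, q=0.7071):
    """2nd-order high-pass biquad (RBJ cookbook coefficients), Direct Form I."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    # Coefficients normalized by a0
    b0 = (1 + cos_w0) / 2 / a0
    b1 = -(1 + cos_w0) / a0
    b2 = (1 + cos_w0) / 2 / a0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=2.0):
    """Static hard-knee compressor curve: gain (dB) to apply at a given input level."""
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    return over / ratio - over  # e.g. 10 dB over at 2:1 -> -5 dB of gain reduction
```

Rumble and DC offset below 80 Hz disappear while the voice band passes essentially untouched, and at 2:1 every 2 dB above the threshold comes out as 1 dB.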

If all this sounds like gibberish, there's also the "Loudness Normalization" feature in Audacity that gets you pretty close automatically. Not perfect, but honestly good enough for most indie projects.
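Here's the basic idea behind one simple flavor of normalization (plain RMS scaling, not necessarily Audacity's exact algorithm): scale the whole file so its average level lands on a target, then see where the peaks ended up. The -20 dB target, chosen from the middle of the ACX RMS window, and the function name are assumptions.

```python
import math

def normalize_rms(samples, target_db=-20.0):
    """Scale audio so whole-file RMS equals target_db; return (scaled, new_peak_dbfs)."""
    current = math.sqrt(sum(x * x for x in samples) / len(samples))
    if current == 0:
        raise ValueError("cannot normalize digital silence")
    gain = 10 ** (target_db / 20) / current
    out = [x * gain for x in samples]
    peak_db = 20 * math.log10(max(abs(x) for x in out))
    return out, peak_db
```

If the returned peak comes back above -3 dB, normalization alone isn't enough and you still need the limiter from the mastering chain. That's the "not perfect" part.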

Step 6: Converting and Exporting

Here's where things get practical.

Most platforms want MP3 files at 192 kbps constant bitrate (CBR), 44.1 kHz, and recommend mono. Stereo adds nothing for a single voice, and 320 kbps just makes bigger files. Mono 192 CBR is the sweet spot for file size vs. quality in spoken word.
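The size side of that trade-off is simple arithmetic: a CBR MP3 writes the same number of bits every second, so its size depends only on bitrate and duration, not on channel count. A quick sketch (the helper name is hypothetical):

```python
def mp3_megabytes_per_hour(bitrate_kbps: int) -> float:
    """Size of one hour of CBR MP3 in megabytes (1 MB = 1,000,000 bytes)."""
    return bitrate_kbps * 1000 / 8 * 3600 / 1_000_000

for kbps in (192, 320):
    print(f"{kbps} kbps CBR: {mp3_megabytes_per_hour(kbps):.1f} MB per finished hour")
```

That's about 86.4 MB per finished hour at 192 kbps versus 144 MB at 320 kbps. A mono file at 192 kbps gets all of those bits for its one channel, while a stereo file at the same bitrate splits them.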

Some narrators also create an M4B version (audiobook format with chapter markers) for personal use or direct sales, but MP3 is what you'll upload to distributors.
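If you do build an M4B, the chapter markers can come from a small text file. This sketch writes one in ffmpeg's FFMETADATA format from a list of (title, duration in seconds) pairs, escaping the characters that format treats specially. The function name is hypothetical, and it assumes you already know each chapter's duration.

```python
def _escape(value: str) -> str:
    # FFMETADATA escapes '=', ';', '#', '\' and newlines with a backslash
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value

def ffmetadata_chapters(chapters):
    """Build FFMETADATA1 text with one [CHAPTER] block per (title, duration_seconds)."""
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration_s in chapters:
        end_ms = start_ms + round(duration_s * 1000)
        lines += ["[CHAPTER]", "TIMEBASE=1/1000",
                  f"START={start_ms}", f"END={end_ms}", f"title={_escape(title)}"]
        start_ms = end_ms  # next chapter starts where this one ends
    return "\n".join(lines) + "\n"
```

Saved as `chapters.txt`, it can then be attached with something along the lines of `ffmpeg -i book.m4a -i chapters.txt -map_metadata 1 -c copy book.m4b`. Double-check the options against the ffmpeg documentation for your version.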

If you've got a bunch of chapter files that need batch converting, audio conversion tools can save you hours of repetitive exporting. You can also merge audio files if you recorded in shorter segments and want to combine them before final export.
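If your tool of choice is the free ffmpeg command-line program, batch conversion and merging boil down to building the right commands. This sketch only builds them (run each one with `subprocess.run(cmd, check=True)`). It assumes ffmpeg with the libmp3lame encoder is installed, that chapter files sort into the right order by name (ch01.wav, ch02.wav, ...), and that all segments being merged share the same format. The helper names are hypothetical.

```python
from pathlib import Path

def mp3_cmd(src: Path, dst: Path) -> list[str]:
    """ffmpeg command for one file: mono, 44.1 kHz, 192 kbps constant bitrate."""
    # With libmp3lame, -b:a selects CBR (VBR would use -q:a instead)
    return ["ffmpeg", "-i", str(src), "-ac", "1", "-ar", "44100",
            "-c:a", "libmp3lame", "-b:a", "192k", str(dst)]

def batch_cmds(folder: Path) -> list[list[str]]:
    """One conversion command per WAV in the folder, in sorted order."""
    return [mp3_cmd(wav, wav.with_suffix(".mp3")) for wav in sorted(folder.glob("*.wav"))]

def concat_list(parts: list[Path]) -> str:
    """Contents of a list file for ffmpeg's concat demuxer, one `file '...'` line per segment."""
    lines = []
    for p in parts:
        quoted = p.as_posix().replace("'", "'\\''")  # escape single quotes in file names
        lines.append(f"file '{quoted}'\n")
    return "".join(lines)

def merge_cmd(list_file: Path, dst: Path) -> list[str]:
    """Join the listed segments into one file without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file), "-c", "copy", str(dst)]
```

Merging the WAV segments first and converting once at the end avoids stacking two rounds of MP3 encoding.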

Step 7: QA (Quality Assurance)

Before you upload, listen back. Not to the whole thing (unless you're a masochist), but spot-check random sections. Are the levels consistent? Any weird pops you missed? Does chapter 8 sound way quieter than chapter 3?
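The "is chapter 8 quieter than chapter 3?" question is one your ears can miss and a script can catch. This sketch compares each chapter's overall RMS level against the median chapter and flags anything more than 2 dB off. The 2 dB tolerance and the function names are assumptions; the per-chapter levels would come from your own measurements.

```python
import math
import statistics

def rms_db(samples):
    """Whole-file RMS level in dBFS for float samples in -1.0..1.0."""
    return 20 * math.log10(math.sqrt(sum(x * x for x in samples) / len(samples)))

def flag_outliers(chapter_levels_db, tolerance_db=2.0):
    """Return chapter names whose level differs from the median chapter by more than tolerance_db."""
    median = statistics.median(chapter_levels_db.values())
    return [name for name, db in chapter_levels_db.items() if abs(db - median) > tolerance_db]
```

For example, with levels of -20.1, -19.8, -24.5, and -20.3 dB for chapters 3, 7, 8, and 9, only chapter 8 gets flagged.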

A lot of narrators also run their final MP3s through ACX's own Audio Lab tool (free online) to confirm they pass technical requirements before submitting. Nothing worse than uploading 15 files and getting rejected for a -58 dB noise floor.

How Long Does All This Take?

Industry standard: expect to spend 2-4 hours editing and mastering for every finished hour of audio. Beginners can easily hit 6-8 hours per finished hour. Pros with clean recording technique and templates might get it down to 1-2 hours.

A 10-hour audiobook? That's 20-40 hours of post-production. Which is why audiobook narration rates are what they are (typically $200-400 per finished hour for established narrators, or royalty share for newbies).
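The same arithmetic as a tiny calculator, in case you want to plug in your own numbers. The 2-4 hour range comes from above; the recording ratio (booth hours per finished hour) isn't given here, so whatever you pass in is your own assumption, and the function names are hypothetical.

```python
def post_production_hours(finished_hours, low_ratio=2.0, high_ratio=4.0):
    """Range of editing/mastering hours for a book of the given finished length."""
    return finished_hours * low_ratio, finished_hours * high_ratio

def effective_hourly_rate(rate_per_finished_hour, post_ratio, recording_ratio):
    """What a per-finished-hour rate works out to per hour actually worked.

    recording_ratio is hypothetical: measure your own booth time per finished hour.
    """
    return rate_per_finished_hour / (post_ratio + recording_ratio)
```

At $300 per finished hour, 3 hours of post, and an assumed 2 hours of recording per finished hour, you're effectively earning $60 an hour.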

The Tools You Actually Need

Let's be real about what's essential vs nice-to-have:

Must-have:

  • DAW (Audacity is free, Reaper is $60)
  • Decent USB mic ($100-300 range is fine)
  • Quiet recording space (closets work)
  • Headphones for monitoring

Nice-to-have:

  • Pop filter ($10-20)
  • Mic stand or boom arm
  • Acoustic treatment (blankets count)
  • Plugins for noise reduction (iZotope RX is the gold standard but costs $$$)

You don't need a $2,000 setup to make good audiobooks. You need a quiet space, decent technique, and patience.

What Nobody Tells You

Recording the book is the fun part. Editing is where you earn your money. Some narrators burn out not because they hate performing, but because they underestimate how tedious the post-production grind becomes after the 50th hour of staring at waveforms.

Also: your first few audiobooks will take forever. That's normal. You're learning where your workflow bottlenecks are, building templates, and training your ears. By book three or four, you'll be way faster.

And here's the uncomfortable truth: sometimes it's worth paying someone else to do the editing. If you charge $300/finished hour and editing takes you 4 hours per finished hour, hiring an editor at $50/hour costs $200 per finished hour. You keep $100 and get those 4 hours back; if you can fill them with paid narration, the math starts to make sense. (Though many narrators find the editing meditative once they get into a rhythm. Your mileage may vary.)
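Spelled out with the example numbers above (the function name is hypothetical):

```python
def outsource_math(rate_per_finished_hour, edit_hours_per_finished_hour, editor_hourly):
    """What hiring out the editing costs and frees up, per finished hour."""
    cost = edit_hours_per_finished_hour * editor_hourly
    return {
        "editing_cost": cost,
        "you_keep": rate_per_finished_hour - cost,
        "hours_freed": edit_hours_per_finished_hour,
    }
```

With $300, 4 hours, and $50/hour, editing costs $200 and you keep $100 while freeing 4 hours. The break-even point is simple: outsourcing pays off whenever a freed hour earns you more than the editor's hourly rate.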

Final Thought

Audiobook production is weirdly satisfying once you get the hang of it. There's something about taking messy raw audio and sculpting it into something listenable that hits the same part of the brain as organizing a messy closet.

But it's also a lot of work. If you're thinking about getting into narration, budget your time accordingly. The performance is maybe 40% of the job. The rest is sitting in front of a screen, zooming into waveforms, and cutting out the sound of you swallowing.

Welcome to the glamorous world of audiobook creation.

Frequently Asked Questions

What audio format do audiobook platforms accept?
Most platforms want MP3 files at 192 kbps (constant bitrate), 44.1 kHz sample rate, mono. M4B is handy for direct sales, but MP3 is the safest bet for distributors like ACX. Variable bitrate gets rejected, and while stereo is usually accepted, mono is the recommended choice.
Do I need expensive software to edit audiobooks?
No. Audacity is free and handles 90% of what most narrators need. Pros often use Reaper ($60) or Adobe Audition ($22/month), but you can absolutely start with free tools. The expensive stuff (like iZotope RX) is nice for advanced noise reduction but not required.
How long does it take to process a finished hour of audio?
Industry rule of thumb: 2-4 hours of editing per finished hour. Beginners might take 6-8 hours. Pros with good recording technique and templates can get it down to 1-2 hours. A 10-hour audiobook means 20-40 hours in post-production.
Should audiobooks be mono or stereo?
Mono. A single voice doesn't benefit from stereo, mono WAV masters are half the size of stereo ones, and at a fixed MP3 bitrate mono spends every bit on the one channel that matters. Major platforms such as ACX recommend mono for spoken word content. Save stereo for music and sound effects.