Audio · March 25, 2026 · 8 min read

How Audiobook Creators Actually Process Their Recordings

Behind the scenes of audiobook production: the editing, mastering, and format conversion pipeline that turns raw voice recordings into polished audio.

If you've ever wondered how audiobook narrators go from sitting in a booth reading a manuscript to that polished, professional file you download from Audible, you're not alone. The process is more involved than most people think, and it takes a lot more than hitting "record" and "export."

I've talked to dozens of indie audiobook creators over the years, and their workflows vary wildly. But there are common steps that almost everyone follows. Let me walk you through the real production pipeline.

Step 1: Recording (and why format matters immediately)

Most narrators record directly into their DAW (Audacity, Adobe Audition, Reaper, Pro Tools). The smart ones record in WAV format at 44.1kHz, 16-bit minimum. Some go higher (48kHz or 24-bit). That's more than spoken word strictly needs, though 24-bit does leave more headroom for recording at conservative levels.

Why WAV? Because it's uncompressed. You're going to edit the hell out of these files (removing breaths, fixing mistakes, adding room tone), and every time you edit and re-save a lossy format like MP3, it gets re-encoded and picks up new artifacts. WAV gives you a clean master.

Here's the thing: raw recordings are huge. A single chapter might be 300-500 MB. That's fine. Storage is cheap. Quality loss is forever.
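Those sizes come from simple arithmetic. Uncompressed PCM audio stores sample rate × bytes per sample × channels bytes every second. The sketch below uses a hypothetical helper, `wav_megabytes`, which ignores the small WAV header and reports decimal megabytes.

```python
def wav_megabytes(minutes: float, sample_rate: int = 44_100,
                  bit_depth: int = 16, channels: int = 1) -> float:
    """Approximate size of uncompressed PCM WAV audio in decimal MB.

    Ignores the WAV header (a few dozen bytes), which is negligible here.
    """
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return bytes_per_second * minutes * 60 / 1_000_000


if __name__ == "__main__":
    for label, args in [
        ("44.1 kHz / 16-bit mono", (44_100, 16, 1)),
        ("48 kHz / 24-bit mono", (48_000, 24, 1)),
        ("48 kHz / 24-bit stereo", (48_000, 24, 2)),
    ]:
        print(f"{label}: {wav_megabytes(60, *args):.0f} MB per hour")
```

At 44.1kHz/16-bit mono this works out to about 318 MB per hour. A 300-500 MB chapter therefore holds roughly an hour to an hour and a half of raw takes. At 48kHz/24-bit mono the figure rises to about 518 MB per hour.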

Step 2: The brutal editing phase

This is where the magic (and tedium) happens. Professional audiobook editors spend 3-4 hours editing for every 1 hour of finished audio. That's not an exaggeration.

What are they actually doing?

  • Removing mistakes: Flubbed words, re-takes, coughs, dog barks, neighbor's lawnmower
  • Taming breaths: Not removing all of them (that sounds robotic), just the loud, distracting ones
  • Mouth clicks and lip smacks: These are the bane of audiobook production. Staying hydrated helps, but you'll still need to manually cut dozens of them
  • Pacing adjustments: Trimming dead air, adding pauses where the narrator rushed
  • Consistency checks: Making sure volume is consistent across chapters (some people get quieter as they tire)

A lot of this is done by hand. Yes, there are plugins that claim to automate breath removal and de-clicking, but they're not perfect. Over-process, and your narrator sounds like a lifeless robot. Under-process, and it sounds amateurish.

Step 3: Noise reduction (the tricky part)

Even in a treated recording booth, there's background noise. AC hum, computer fan whir, refrigerator buzz in the next room. Noise reduction tools can remove most of it—but there's an art to using them subtly.

Most pros use a two-pass approach:

  1. Capture a "noise profile" from a silent section of the recording (where the narrator isn't speaking)
  2. Apply noise reduction at 50-70% strength—not 100%

Why not go full strength? Because aggressive noise reduction introduces a hollow, underwater quality. The goal is to reduce distractions, not sterilize the audio. You want some natural room tone—it makes the narration feel present and human.
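Here is a deliberately simplified sketch of the "partial strength" idea. Real noise reduction uses the captured profile to work on each frequency band separately. This stand-in is a broadband downward gate instead, so treat it as an illustration of the two passes and the strength setting, not as a replacement for your DAW's tool. The names (`gentle_gate`, `strength`, `margin_db`) are hypothetical, and the samples are assumed to be floats in [-1.0, 1.0].

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gentle_gate(samples, noise_section, strength=0.6, frame=1024, margin_db=6.0):
    """Simplified broadband stand-in for profile-based noise reduction.

    Pass 1: measure the noise level from a section with no speech.
    Pass 2: frames that sit near that level are scaled to (1 - strength)
    of their amplitude instead of being muted completely.
    """
    noise_rms = rms(noise_section)
    threshold = noise_rms * 10 ** (margin_db / 20)  # a little above the noise
    reduced_gain = 1.0 - strength  # partial reduction keeps some room tone
    out = []
    prev_gain = 1.0
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        target = reduced_gain if rms(chunk) <= threshold else 1.0
        n = len(chunk)
        for i, s in enumerate(chunk):
            # Ramp from the previous frame's gain to avoid clicks at boundaries
            g = prev_gain + (target - prev_gain) * (i + 1) / n
            out.append(s * g)
        prev_gain = target
    return out
```

With `strength=0.6`, noise-only stretches drop to 40% of their amplitude (about -8dB) instead of falling to digital silence. That follows the same reasoning as the 50-70% rule: some natural room tone survives.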

Step 4: Mastering and loudness standards

Now we get technical. Audiobook distributors (like ACX, which feeds Audible, Amazon, and iTunes) have strict loudness requirements:

  • RMS (average loudness): Must be between -23dB and -18dB
  • Peak levels: Cannot exceed -3dB
  • Noise floor: Must be below -60dB
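You can run a rough pre-upload check against those three numbers with the sketch below. It assumes you have already decoded the audio into floats in [-1.0, 1.0] (Python's standard `wave` module can read the raw frames). It also assumes you can supply a few seconds of room tone for the noise-floor measurement. Because it only measures whole-file RMS and sample peaks, it simplifies what ACX's own checks or a dedicated meter will report. `acx_check` is a hypothetical name.

```python
import math


def dbfs(level):
    """Linear amplitude (1.0 = digital full scale) to dBFS."""
    return 20 * math.log10(level) if level > 0 else float("-inf")


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def acx_check(samples, room_tone):
    """Measure float samples in [-1.0, 1.0] against the ACX numbers above.

    room_tone: a stretch of the same recording with no speech, used for the
    noise floor. Whole-file RMS and sample peaks only (a simplification).
    """
    report = {
        "rms_db": dbfs(rms(samples)),
        "peak_db": dbfs(max(abs(s) for s in samples)),
        "noise_floor_db": dbfs(rms(room_tone)),
    }
    report["passes"] = (
        -23 <= report["rms_db"] <= -18      # average loudness window
        and report["peak_db"] <= -3         # peak ceiling
        and report["noise_floor_db"] < -60  # noise floor limit
    )
    return report
```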

If your file doesn't meet these specs, ACX will reject it. And trust me—getting a 10-hour audiobook rejected after weeks of work is soul-crushing.

This is where audio normalization tools come in. They analyze your entire file and adjust the volume to hit those targets. They're usually paired with a limiter so the boost doesn't push peaks past -3dB or into clipping. Most DAWs have this built in, but standalone tools often give you finer control over the targets.
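Normalization needs care because a plain gain change moves everything by the same number of dB. The sketch below uses a hypothetical `normalization_plan` with an assumed -20dB RMS target, which sits inside ACX's window. It predicts where peaks and the noise floor will land once a file is boosted to that target.

```python
def normalization_plan(rms_db, peak_db, noise_floor_db,
                       target_rms_db=-20.0, peak_ceiling_db=-3.0,
                       noise_limit_db=-60.0):
    """Predict what a plain gain change does to a file's levels.

    A gain change shifts RMS, peaks and noise floor by the same number of dB,
    so boosting a quiet file to hit the RMS target also lifts its peaks and
    its noise. target_rms_db=-20 is an assumed middle-of-the-range target.
    """
    gain_db = target_rms_db - rms_db
    new_peak_db = peak_db + gain_db
    new_noise_db = noise_floor_db + gain_db
    return {
        "gain_db": gain_db,
        "linear_gain": 10 ** (gain_db / 20),                  # multiply samples by this
        "new_peak_db": new_peak_db,
        "needs_limiting": new_peak_db > peak_ceiling_db,      # peaks need a limiter
        "noise_floor_fails": new_noise_db > noise_limit_db,   # noise now too loud
    }


if __name__ == "__main__":
    # A quiet, slightly noisy chapter (hypothetical measurements)
    print(normalization_plan(rms_db=-26.0, peak_db=-8.0, noise_floor_db=-65.0))
```

Take a chapter measured at -26dB RMS, -8dB peak and -65dB noise floor. Reaching the target takes +6dB of gain. That puts peaks at -2dB, so a limiter is needed, and lifts the noise floor to -59dB, which is now over ACX's limit. This is one reason to get the noise floor well down in Step 3 before mastering.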

Step 5: Splitting into chapters

A finished audiobook might be 10+ hours long, but you don't deliver it as one massive file. Most creators split it into individual chapter files (easier to upload, easier for listeners to navigate).

Each chapter gets its own WAV file, labeled consistently:

  • BookTitle_Chapter01.wav
  • BookTitle_Chapter02.wav
  • BookTitle_Chapter03.wav

Many narrators also add an "opening credits" file (title, author, narrator) and a "closing credits" file (copyright info, production notes). If you're distributing through ACX, these aren't optional: ACX requires both.
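The two-digit numbering in those filenames does real work. In a plain alphabetical sort (like Python's `sorted()`), "Chapter10" lands before "Chapter2", while "Chapter02" and "Chapter10" sort correctly. The hypothetical helper below follows the BookTitle_ChapterNN pattern and generates the names.

```python
def chapter_filenames(book_title, chapter_count, ext="wav"):
    """Consistently named chapter files, zero-padded so that a plain
    alphabetical sort matches chapter order."""
    width = max(2, len(str(chapter_count)))  # 2 digits, or 3 for 100+ chapters
    return [
        f"{book_title}_Chapter{n:0{width}d}.{ext}"
        for n in range(1, chapter_count + 1)
    ]


if __name__ == "__main__":
    for name in chapter_filenames("BookTitle", 3):
        print(name)
```

`chapter_filenames("BookTitle", 3)` returns the three names listed above. The helper doesn't generate credits files, so their names are up to you.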

Step 6: Format conversion for distribution

Here's where most creators hit a wall. You've got pristine WAV files, but distributors want MP3 or M4B. And not just any MP3—they have specific requirements:

  • MP3: Constant bitrate (CBR), 192 kbps or higher, 44.1kHz, mono or stereo
  • M4B: AAC codec, 64-128 kbps, with chapter markers embedded

Why M4B for audiobooks? Because it supports bookmarking. When you pause an audiobook on your phone and come back days later, M4B remembers exactly where you left off. MP3 doesn't do that reliably.

Most creators convert WAV to MP3 using FFmpeg or online tools, then use specialized software (like Chapter and Verse or Audiobook Builder) to create the M4B with embedded chapters.
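The FFmpeg route for the WAV-to-MP3 step looks roughly like this. The sketch only builds the argument list, and the file names and mono default are assumptions. With FFmpeg's libmp3lame encoder, setting a bitrate with `-b:a` gives constant bitrate, while `-q:a` would switch to VBR.

```python
import shlex


def mp3_command(wav_path, mp3_path, bitrate_kbps=192, sample_rate=44_100, channels=1):
    """FFmpeg arguments for a constant-bitrate MP3 via the libmp3lame encoder.

    With libmp3lame, -b:a requests CBR; using -q:a instead would produce VBR,
    which the MP3 spec above rules out.
    """
    return [
        "ffmpeg", "-y",              # -y: overwrite an existing output file
        "-i", wav_path,
        "-codec:a", "libmp3lame",
        "-b:a", f"{bitrate_kbps}k",  # constant 192 kbps by default
        "-ar", str(sample_rate),     # resample to 44.1 kHz if the master differs
        "-ac", str(channels),        # 1 = mono (an assumption; use 2 for stereo)
        mp3_path,
    ]


if __name__ == "__main__":
    cmd = mp3_command("BookTitle_Chapter01.wav", "BookTitle_Chapter01.mp3")
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```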

But here's the dirty secret: a lot of indie creators skip M4B entirely and just upload MP3 files to ACX. It works. ACX converts them on the backend. But if you're selling directly to listeners (outside of ACX), M4B is the professional choice.
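If you'd rather not use a dedicated app for the M4B, FFmpeg can also embed chapters. It reads a plain-text FFMETADATA file with one [CHAPTER] block per chapter. The sketch below builds that file from chapter titles and durations. The helper name and example values are hypothetical. It assumes the chapters have been joined into one audio file in the same order.

```python
def ffmetadata_chapters(book_title, chapters):
    """Build FFmpeg's FFMETADATA text with one [CHAPTER] block per entry.

    chapters: (title, duration_seconds) pairs in playback order, assuming the
    chapter audio is joined end to end into one file in that same order.
    """
    def esc(text):
        # '=', ';', '#', backslash and newline must be backslash-escaped;
        # the backslash itself is handled first so it isn't escaped twice.
        for ch in "\\=;#\n":
            text = text.replace(ch, "\\" + ch)
        return text

    lines = [";FFMETADATA1", f"title={esc(book_title)}", ""]
    start_ms = 0
    for title, seconds in chapters:
        end_ms = start_ms + round(seconds * 1000)  # TIMEBASE=1/1000 means milliseconds
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={esc(title)}",
            "",
        ]
        start_ms = end_ms
    return "\n".join(lines)


if __name__ == "__main__":
    print(ffmetadata_chapters("My Book", [("Opening Credits", 32.0), ("Chapter 1", 1815.4)]))
```

Save the output as chapters.txt. A command along the lines of `ffmpeg -i book.wav -i chapters.txt -map 0:a -map_metadata 1 -map_chapters 1 -c:a aac -b:a 64k book.m4b` should then produce an M4B with those markers. Check the flags against your FFmpeg version. Before distributing, confirm that the chapters show up in a real audiobook player.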

Step 7: Quality control (listen to the whole thing)

Yes, really. Professional audiobook producers listen to the entire finished product before uploading. They're checking for:

  • Accidentally left-in mistakes
  • Volume inconsistencies between chapters
  • Audio glitches or clipping
  • Chapter markers in the wrong place

This takes hours. For a 10-hour audiobook, that's 10 hours of playback. Most people speed it up to 1.5x or 2x, but you still have to actually listen.

And if you find a problem? Back to the DAW, fix it, re-export, re-convert, and listen again. It's why audiobook production is slow and why pros charge $200-400 per finished hour.

The indie shortcut: doing it all yourself

A lot of self-published authors narrate and produce their own audiobooks to save money. It's totally doable, but expect a steep learning curve. Plan on:

  • 1 week to record a 10-hour book (if you're efficient)
  • 3-4 weeks to edit and master
  • 1-2 days for QC and final tweaks
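Turning this article's rules of thumb into numbers helps with planning. The hypothetical helper below takes its figures from the sections above. It estimates editing time at 3-4 hours per finished hour, the length of one full QC listen at a chosen playback speed, and the cost of outsourcing at $200-400 per finished hour.

```python
def production_budget(finished_hours, edit_ratio=(3, 4), qc_speed=1.5, rate_pfh=(200, 400)):
    """Rough planning numbers built from the figures quoted in this article.

    edit_ratio: hours of editing per finished hour (the 3-4 hour rule of thumb)
    qc_speed:   playback speed for the full quality-control listen
    rate_pfh:   what pros charge per finished hour, in dollars
    """
    low, high = edit_ratio
    return {
        "editing_hours": (finished_hours * low, finished_hours * high),
        "qc_listen_hours": finished_hours / qc_speed,  # one pass, no re-listens
        "outsourcing_cost_usd": (finished_hours * rate_pfh[0], finished_hours * rate_pfh[1]),
    }


if __name__ == "__main__":
    for speed in (1.0, 1.5, 2.0):
        print(speed, production_budget(10, qc_speed=speed))
```

For a 10-hour book that comes to 30-40 hours of editing and $2,000-4,000 if you hire the work out. A single QC pass takes 10 hours at 1x, about 6 hours 40 minutes at 1.5x, or 5 hours at 2x. Re-listens after fixes come on top of that.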

Tools you'll need:

  • A decent USB mic ($100-300)
  • A quiet recording space (closet full of clothes works surprisingly well)
  • A DAW (Audacity is free and good enough)
  • Noise reduction and normalization tools (built into most DAWs)
  • A reliable audio converter for format changes

If you're serious about audiobook production, consider investing in a pop filter, a boom arm, and basic acoustic treatment (foam panels or blankets). The difference in audio quality is massive.

Why format conversion is the final boss

After all that work—recording, editing, mastering—nothing is more frustrating than having ACX reject your upload because of a technical encoding issue. It happens all the time.

Common culprits:

  • Variable bitrate (VBR) MP3s when they require constant bitrate (CBR)
  • Sample rate mismatch (you recorded at 48kHz but converted to 44.1kHz incorrectly)
  • Peak levels exceeding -3dB, either because your converter didn't apply limiting or because MP3 encoding nudged peaks slightly above the WAV master's level
  • Embedded metadata that doesn't match ACX's requirements
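You can check for the VBR problem yourself. Every MP3 frame header records its own bitrate, so a CBR file shows one bitrate throughout and a VBR file shows several. The sketch below walks the frame headers using only the standard library. It handles only MPEG-1 Layer III, the flavor used for 44.1kHz MP3s. It skips a leading ID3v2 tag and the encoder's Xing/Info frame, but it is not a full parser. The function names are hypothetical, and dedicated tools such as MediaInfo give a more thorough report.

```python
# MPEG-1 Layer III lookup tables, indexed by the header's bit fields.
# Index 0 ("free format") and 15 ("bad") are not valid bitrates here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44_100, 48_000, 32_000, None]


def skip_id3v2(data):
    """Offset just past a leading ID3v2 tag, or 0 if there isn't one."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # Tag size is "syncsafe": 4 bytes carrying 7 bits each
        size = (data[6] << 21) | (data[7] << 14) | (data[8] << 7) | data[9]
        footer = 10 if data[5] & 0x10 else 0
        return 10 + size + footer
    return 0


def scan_mp3_frames(data):
    """Walk MPEG-1 Layer III frames; return (bitrates, sample_rates) seen.

    More than one bitrate means VBR. The encoder's Xing/Info tag frame,
    if present, is skipped because it isn't audio.
    """
    bitrates, rates = set(), set()
    pos = skip_id3v2(data)
    while pos + 4 <= len(data):
        b1, b2 = data[pos + 1], data[pos + 2]
        # 11-bit sync, version bits 11 (MPEG-1), layer bits 01 (Layer III)
        if data[pos] != 0xFF or (b1 & 0xFE) != 0xFA:
            pos += 1  # not a frame header: resync one byte at a time
            continue
        kbps = BITRATES_KBPS[b2 >> 4]
        rate = SAMPLE_RATES[(b2 >> 2) & 0x03]
        if kbps is None or rate is None:
            pos += 1
            continue
        padding = (b2 >> 1) & 0x01
        tag_area = data[pos + 4:pos + 44]  # Xing/Info sits right after side info
        if b"Xing" not in tag_area and b"Info" not in tag_area:
            bitrates.add(kbps)
            rates.add(rate)
        pos += 144 * kbps * 1000 // rate + padding  # Layer III frame size in bytes
    return bitrates, rates
```

To use it, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. More than one bitrate in the first set means the file is VBR. Anything other than {44100} in the second set means the file doesn't match the 44.1kHz spec.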

This is why so many audiobook creators use dedicated conversion tools rather than just hitting "export" in their DAW. You need fine-grained control over bitrate, sample rate, channels, and loudness normalization.

And if you're producing multiple audiobooks, you want batch processing—converting 20+ chapter files in one pass, with identical settings, so everything is consistent.
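If FFmpeg is installed, batch conversion doesn't need special software. A short script can apply the same settings to every chapter. The sketch below runs as a dry run by default and only prints the commands. It assumes hypothetical `wav/` and `mp3/` folders, with zero-padded chapter names so that sorted order matches chapter order.

```python
import shlex
import subprocess
from pathlib import Path


def batch_commands(wav_paths, out_dir, bitrate_kbps=192, sample_rate=44_100, channels=1):
    """One FFmpeg command per WAV, every file getting identical settings.

    Files are processed in sorted order, which matches chapter order when the
    names are zero-padded (BookTitle_Chapter01.wav, BookTitle_Chapter02.wav, ...).
    """
    commands = []
    for wav in sorted(Path(p) for p in wav_paths):
        mp3 = Path(out_dir) / (wav.stem + ".mp3")
        commands.append([
            "ffmpeg", "-y", "-i", str(wav),
            "-codec:a", "libmp3lame", "-b:a", f"{bitrate_kbps}k",  # -b:a => CBR
            "-ar", str(sample_rate), "-ac", str(channels),
            str(mp3),
        ])
    return commands


def run_batch(commands, dry_run=True):
    """Print the commands (default) or run them, stopping on the first error."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical folder layout: masters in ./wav, MP3s written to ./mp3
    run_batch(batch_commands(Path("wav").glob("*.wav"), "mp3"))
```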

The reality of audiobook production

Look, audiobook production is work. It's not glamorous. It's hours of staring at waveforms, trimming breaths, and re-listening to the same paragraph 15 times to make sure it sounds natural.

But the result—a polished, professional audiobook that listeners actually enjoy—is worth it. The difference between an amateur production (inconsistent volume, distracting mouth noises, poor pacing) and a professional one is night and day. Listeners notice. Reviewers definitely notice.

If you're just starting out, expect your first audiobook to take twice as long as you think. But by your third or fourth, you'll have a system. You'll know which plugins work, which shortcuts save time, and which quality standards you can't compromise on.

And you'll have a new appreciation for every audiobook you listen to—because now you know exactly how much effort went into making it sound effortless.

Frequently Asked Questions

What audio format should I record audiobooks in?
Record in WAV at 44.1kHz/16-bit or higher. WAV is uncompressed, giving you maximum editing flexibility. You can convert to MP3 or M4B later for distribution, but always keep your master files in WAV.

How loud should my finished audiobook be?
ACX (Amazon/Audible) requires audiobooks to measure between -23dB and -18dB RMS, with peaks no higher than -3dB and a noise floor below -60dB. Most platforms follow similar standards. Use audio normalization tools to hit these targets consistently.

Can I use noise reduction on voice recordings without making it sound robotic?
Yes, but subtlety is everything. Use noise reduction at 50-70% strength rather than 100%. Remove constant background hum (AC, computer fans) but leave natural room tone. Over-processing makes voices sound hollow and artificial.

What is the M4B format and why do audiobooks use it?
M4B is essentially an AAC audio file with chapter markers and bookmarking support. Unlike with MP3, players that support M4B remember your playback position when you close the app. It's the standard for audiobooks because listeners rarely finish a 10-hour book in one sitting.

How do professional narrators remove mouth clicks and breaths?
Most use a combination of manual editing (cutting out loud breaths) and subtle de-clicking plugins. The goal is not to remove all breaths, which sounds unnatural, but to reduce the distracting ones. Pro tip: staying hydrated and avoiding dairy before recording sessions can noticeably reduce mouth noise.