Auto sound design
Automatically mix music, voice, and SFX levels. The agent detects audio types and sets balanced volumes with fades.
Get a balanced audio mix without touching faders. Auto sound design detects music, voice, and SFX by name/pattern, then sets appropriate levels and fades.
What it does
- Scans timeline audio tracks
- Classifies each clip by name and track layout:
- Voice — interview, dialogue, narration
- Music — background tracks, beds
- SFX — sound effects, stingers
- Other — uncategorized audio
- Sets volume targets per type
- Adds fade in/out where needed
- Optionally mutes embedded video audio when clean voice audio exists
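The muting rule in the last item can be sketched in a few lines. This is a minimal illustration, not the agent's actual implementation. It assumes a hypothetical `Clip` record where track 0 holds audio embedded in the video (as described under "Audio type detection") and each clip already has a type.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    track: int      # hypothetical convention: 0 = audio embedded in the video clip
    kind: str       # "voice", "music", "sfx", or "other"
    muted: bool = False


def mute_embedded_if_clean_voice(clips: list[Clip]) -> list[Clip]:
    """Mute embedded (track 0) audio only when a separate voice clip exists."""
    has_clean_voice = any(c.kind == "voice" and c.track != 0 for c in clips)
    if has_clean_voice:
        for c in clips:
            if c.track == 0:
                c.muted = True
    return clips
```

If the timeline has camera audio plus music but no separate voice clip, nothing is muted, so the camera audio stays as the only dialogue source.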
When to use it
- Mixed footage — camera audio + music + voiceover
- Interviews with BGM — keep music ducked under voice
- Tutorials — keep narration clear, with music in a supporting role
- Social clips — punchy, music-forward mix with voice still audible
How to use
Quick mix:
"Auto sound design"
"Balance the audio levels"
With specifics:
"Mix with voice upfront"
"Music-forward mix for TikTok"
"Auto sound design and mute any embedded video audio"
Volume targets
The agent applies:
| Type | Target | Notes |
|------|--------|-------|
| Voice | -6 to -12 LUFS | Clear and prominent |
| Music | -20 to -24 LUFS | Supporting, doesn't mask voice |
| SFX | -12 to -18 LUFS | Audible but not jarring |
| Embedded video | Often muted | Use clean separate audio when available |
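As a rough illustration of what these targets mean in practice: if a clip's integrated loudness were already known (for example, read from a separate loudness meter; the agent itself does not analyze audio, per Limitations), the gain needed to reach a target is just the difference in dB, because a gain of X dB shifts loudness by X LU. The sketch below aims at the midpoint of each range; the midpoint choice and the `gain_to_target` helper are assumptions for illustration only.

```python
# Target ranges from the table above, as (quietest, loudest) in LUFS.
TARGETS_LUFS = {
    "voice": (-12.0, -6.0),
    "music": (-24.0, -20.0),
    "sfx": (-18.0, -12.0),
}


def gain_to_target(kind: str, measured_lufs: float) -> tuple[float, float]:
    """Return (gain in dB, linear amplitude factor) to reach the range midpoint.

    Hypothetical helper: assumes measured_lufs comes from an external meter.
    """
    low, high = TARGETS_LUFS[kind]
    target = (low + high) / 2
    gain_db = target - measured_lufs
    linear = 10 ** (gain_db / 20)  # dB to amplitude ratio
    return gain_db, linear
```

For example, a voice clip measured at -20 LUFS needs about +11 dB (an amplitude factor of roughly 3.5) to land at -9 LUFS.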
Fades
Auto-adds:
- Music — 1-2 second fade in/out
- SFX — Quick fade in/out
- Voice — Subtle fade-in when a clip starts abruptly
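A fade is a gain envelope that ramps from 0 to 1 at the start of a clip and back to 0 at the end. The sketch below uses linear ramps and picks concrete durations that the list above only describes loosely: 1.5 s for music (inside the 1-2 s range), plus 0.05 s for SFX and 0.1 s for voice, which are hypothetical stand-ins for "quick" and "subtle".

```python
# Hypothetical fade lengths in seconds; only the music range comes from the docs.
FADE_SECONDS = {"music": 1.5, "sfx": 0.05, "voice": 0.1}


def fades_for(kind: str, starts_abruptly: bool = False) -> tuple[float, float]:
    """Return (fade_in, fade_out) seconds for a clip type."""
    if kind == "voice":
        # Voice only gets a fade-in, and only when the clip starts abruptly.
        return (FADE_SECONDS["voice"], 0.0) if starts_abruptly else (0.0, 0.0)
    seconds = FADE_SECONDS.get(kind, 0.0)
    return (seconds, seconds)


def fade_gain(t: float, duration: float, fade_in: float, fade_out: float) -> float:
    """Linear fade envelope: gain in [0, 1] at time t within a clip."""
    if t < 0 or t > duration:
        return 0.0
    gain = 1.0
    if fade_in > 0 and t < fade_in:
        gain = min(gain, t / fade_in)
    if fade_out > 0 and t > duration - fade_out:
        gain = min(gain, (duration - t) / fade_out)
    return gain
```

Taking the minimum of the two ramps keeps the envelope valid even when a clip is shorter than its combined fade-in and fade-out.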
Audio type detection
Based on:
- Asset names — "music," "bgm," "voice," "VO," "sfx"
- Track position — audio track 0 (audio embedded in the video) vs. separate audio tracks
- Audio characteristics — stereo, mono, branching
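A name-based classifier along these lines might look like the sketch below. The keyword lists combine the asset-name examples above with the category descriptions under "What it does"; the whole-word matching, the check order, and the "embedded" fallback for track 0 are assumptions, not the agent's documented behavior.

```python
import re

# Hypothetical keyword table built from the examples in this page.
KEYWORDS = {
    "voice": ("voice", "vo", "interview", "dialogue", "narration"),
    "music": ("music", "bgm", "bed", "beds"),
    "sfx": ("sfx", "stinger", "stingers"),
}


def classify(name: str, track: int) -> str:
    """Guess an audio type from the asset name, then fall back to track position."""
    # Split on non-alphanumerics so "vo" matches "VO_take2" but not "volume".
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    for kind, words in KEYWORDS.items():
        if any(word in tokens for word in words):
            return kind
    if track == 0:
        return "embedded"  # audio that came with the video clip
    return "other"
```

Matching whole tokens rather than substrings is why clear names such as "music-upbeat" or "voice-intro" classify reliably, while an unlabeled "whoosh.wav" falls through to "other".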
Ducking and sidechain
When voice + music coexist:
- Voice remains clear (-6 to -12 LUFS)
- Music drops to support (-20 to -24 LUFS)
- Relative levels preserve intelligibility; music is set lower per clip rather than ducked dynamically (see Limitations)
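The gap these targets guarantee follows directly from the two ranges: the worst case pairs the quietest voice with the loudest music, and the best case pairs the loudest voice with the quietest music. A small sketch of that arithmetic, using the ranges from the table (the function name is illustrative):

```python
VOICE_RANGE = (-12.0, -6.0)   # (quietest, loudest) LUFS
MUSIC_RANGE = (-24.0, -20.0)


def separation_range(voice: tuple[float, float], music: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest voice-over-music gap, in LU."""
    smallest = voice[0] - music[1]  # quietest voice vs. loudest music
    largest = voice[1] - music[0]   # loudest voice vs. quietest music
    return smallest, largest
```

With the documented ranges this gives a gap of 8 to 18 LU, so voice always sits at least 8 LU above music when both clips land inside their targets.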
Tips
- Name assets clearly — names like "music-upbeat" or "voice-intro" help detection
- Separate tracks — keep voice on its own track for clean detection
- Preview after mix — adjust if specific moments need more/less
- Manual override — fine-tune any clip by hand after the auto-mix; the auto-mix changes only the clips it processes
Limitations
- Uses filename heuristics, not audio analysis (doesn't "hear" the content)
- Sets constant levels per clip (no dynamic ducking during playback)
- Works on clip volumes, not track automation
See also
- Text-to-speech voiceover — add clean voice audio
- Stock sounds and music — background tracks and SFX
- Timeline audio editing — manual volume and fade control