stereo: add Mixed Mode joint stereo; improve throughput #110
Open
nschimme wants to merge 1 commit into
…ughput Introduce JOINT_MIXED (mode 3) as the new default joint stereo mode. Per scale-factor band it selects IS above 5.5 kHz, M/S where channels are correlated, or L/R otherwise, recovering stereo image vs. forced IS. Fix a band-silencing bug in midside(): the thrside zeroing ran on already-M/S-coded bands, producing L=R=0 on decode. Add the !ms guard. Guard enrgSum/Diff against negative rounding before sqrt (NaN fix). Keep IS available at low sample rates by capping the frequency floor to 70% of Nyquist. Factor repeated M/S, IS, and channel-zeroing transforms into apply_ms(), apply_is(), and zero_channel() helpers. Single-pass energy accumulation and restrict-qualified pointers reduce per-band work. Expose Mixed Mode in the CLI (--joint 3) and GUI dropdown. Move the Joint Stereo dropdown below the stereo checkboxes in the GUI.
Collaborator
That's quite some change, eh? I admit that I don't have a clue what's going on here.
Mixed Mode Joint Stereo + Throughput Optimization
Summary
This PR introduces a new Mixed Mode joint-stereo coding path that selects
between Intensity Stereo (IS), Mid/Side (M/S), and plain L/R per scale factor
band, and makes it the default joint mode. It also rewrites the stereo module
for throughput, fixes two real correctness defects, adds a low-sample-rate
robustness fix, and removes duplicated code. Legacy joint modes
(None / M/S / IS) remain bit-identical; Mixed Mode is faster than the previous
default and reconstructs a truer stereo image.
Motivation
The previous default was forced Intensity Stereo (`--joint 2`), which
discards the inter-channel phase relationship across the whole spectrum to bank
bits for monaural spectral fidelity. That collapses the stereo image more than
necessary at the frequencies where phase still matters. Mixed Mode keeps M/S
(phase-preserving) below a crossover and only uses IS above it, recovering
stereo fidelity while keeping the bit savings where they don't hurt.
Benchmark Results
See results
[Image: Windows screenshot]
What's in this PR
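Mixed Mode (`JOINT_MIXED`, mode 3): per-band IS/M/S/LR selection,
exposed in the CLI (`--joint 3`) and GUI (Joint Stereo dropdown), and made the
default.
Correctness: fix the band-silencing bug in `midside()`; guard the
single-pass energy math against negative rounding before `sqrt`.
Throughput: single-pass energy accumulation, `restrict` pointers, hoisted
invariants.
Cleanup: factor the repeated M/S, IS, and channel-zeroing transforms
into shared helpers; drop a dead clamp.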
Design: Mixed Mode decision logic
Decision hierarchy
Per scale factor band, with IS and M/S mutually exclusive:
1. IS for bands above the frequency floor.
2. M/S where the channels are correlated.
3. L/R otherwise.
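The hierarchy can be sketched as a small selection function. This is only an illustration under stated assumptions: the enum, the function name, and the three boolean inputs are hypothetical, and the fall-through from a rejected IS band to the M/S check is an assumption rather than something the PR text spells out.

```c
/* Hypothetical names for illustration; not the identifiers used in the PR. */
typedef enum { BAND_LR, BAND_MS, BAND_IS } band_mode;

/*
 * above_floor: band lies at or above the IS frequency floor
 * is_ok:       band passes the intensity-stereo gate
 * ms_ok:       band passes the mid/side gate
 *
 * IS and M/S are mutually exclusive: at most one is returned per band.
 */
static band_mode choose_band_mode(int above_floor, int is_ok, int ms_ok)
{
    if (above_floor && is_ok)
        return BAND_IS;   /* IS is only considered above the floor */
    if (ms_ok)
        return BAND_MS;   /* assumption: bands not IS-coded may still use M/S */
    return BAND_LR;       /* plain left/right otherwise */
}
```

A band below the floor never becomes IS, even if its IS gate would pass.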
Intensity Stereo
Frequency floor at 5.5 kHz: the floor balances bit
savings against phase fidelity for music; below it, phase carries the stereo
image and IS is not used.
Floor capped at 70% of Nyquist: at low sample rates
(e.g. 16 kHz) a hard 5.5 kHz floor can exceed the top band and disable IS for
the whole frame; the cap keeps an IS region available. At ≥44.1 kHz the cap
is well above 5.5 kHz, so common rates are unchanged.
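A minimal sketch of the capped floor, assuming the cap is simply the smaller of 5.5 kHz and 70% of half the sample rate. The helper name is hypothetical, and the PR's actual code may express the floor in band indices rather than Hz.

```c
/* Hypothetical helper: IS frequency floor in Hz for a given sample rate. */
static double is_floor_hz(double sample_rate_hz)
{
    const double floor_hz = 5500.0;                        /* nominal floor */
    const double cap_hz   = 0.70 * (sample_rate_hz / 2.0); /* 70% of Nyquist */

    /* The cap only lowers the floor; it never raises it above 5.5 kHz. */
    return floor_hz < cap_hz ? floor_hz : cap_hz;
}
```

Under this assumption an 8 kHz stream gets a floor near 2.8 kHz, while at 44.1 kHz the cap (about 15.4 kHz) is inactive and the floor stays at 5.5 kHz.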
Threshold: `isthr = (0.18 / quality) + 1.0`, clamped to `sqrt(2)`. Linear
(not quality²) scaling retains more phase at low quality. The per-band gate is
`ethr = (sqrt(enrgL) + sqrt(enrgR))² / isthr`.
… is dropped entirely (`HCB_ZERO`) rather than IS-coded.
Mid/Side
`thrmid = (thr075 · 0.85) / quality + 1.0` (with `thr075 = 0.09`, i.e.
`0.0765 / quality + 1.0`), clamped; the `0.85` makes Mixed Mode's M/S slightly
tighter than pure `JOINT_MS` since IS carries the high end.
`thrside = 0.1 / quality`, clamped to `0.3`.
M/S gate: `min(enrgL, enrgR) · thrmid >= max(enrgSum · 0.25, enrgDiff · 0.25)`. The `0.25` compensates for the `0.5·(L±R)` mid/side scaling.
Correctness fixes
`midside()` band-silencing bug. The "zero the far-quieter channel"
(`thrside`) step ran unconditionally, including on bands already M/S-coded.
After M/S the buffers hold mid/side, so zeroing one, while still signalling
`ms_used=1`, makes the decoder reconstruct `L = R = 0` and silences the band.
Added the `!ms` guard that `mixed()` already had. Verified against the FFmpeg
reference decoder: M/S reconstruction is the butterfly `(L,R) = (M+S, M−S)`,
confirming the failure mode and the fix.
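For reference, a minimal sketch of the butterfly pair, using the `0.5·(L±R)` encoder scaling and the `(M+S, M−S)` decoder reconstruction quoted in this description. The function names are hypothetical and the sketch works on single samples rather than band buffers.

```c
/* Encoder side: mid/side with 0.5 scaling. */
static void ms_encode(double l, double r, double *m, double *s)
{
    *m = 0.5 * (l + r);
    *s = 0.5 * (l - r);
}

/* Decoder side: (L, R) = (M + S, M - S). */
static void ms_decode(double m, double s, double *l, double *r)
{
    *l = m + s;
    *r = m - s;
}
```

Once a band has gone through `ms_encode`, its two buffers hold mid and side rather than left and right, which is why a later step that treats them as channels has to be skipped for M/S-coded bands.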
`enrgSum/Diff = enrgL + enrgR ± 2·enrgLR` is mathematically non-negative but
can round slightly below zero for `L ≈ ±R`. In `stereo()`/`mixed()` those feed
`sqrt`, so they are now clamped to 0 (NaN guard). `midside()` has no `sqrt`
consumer, so no clamp is needed there.
Throughput optimization
The energy loops accumulate `enrgL`, `enrgR`, and the cross term `enrgLR` in a
single pass, then derive sum/diff energies via `|l±r|² = l² + r² ± 2·l·r`
(instead of separate sum-of-squares passes). Pointers are `restrict`-qualified
and invariant work is hoisted out of the per-band loops. These changes preserve
the algorithm's output while reducing per-band work.
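A sketch of the single-pass accumulation, assuming `double` samples and a hypothetical result struct; the field names follow the identifiers used in this description, but the real code's types and layout may differ.

```c
#include <stddef.h>

typedef struct {
    double enrgL, enrgR, enrgLR;  /* sum of l*l, r*r, and l*r over the band */
    double enrgSum, enrgDiff;     /* sum of (l+r)^2 and (l-r)^2 */
} band_energy;

static band_energy band_energy_single_pass(const double *restrict l,
                                           const double *restrict r,
                                           size_t n)
{
    band_energy e = {0};

    /* One pass over the band: three running sums. */
    for (size_t i = 0; i < n; i++) {
        e.enrgL  += l[i] * l[i];
        e.enrgR  += r[i] * r[i];
        e.enrgLR += l[i] * r[i];
    }

    /* |l +/- r|^2 = l^2 + r^2 +/- 2*l*r, summed over the band. */
    e.enrgSum  = e.enrgL + e.enrgR + 2.0 * e.enrgLR;
    e.enrgDiff = e.enrgL + e.enrgR - 2.0 * e.enrgLR;
    return e;
}
```

For `l = {1, 2}` and `r = {3, -1}` this gives `enrgSum = 17` and `enrgDiff = 13`, matching the direct sums `(4² + 1²)` and `((−2)² + 3²)`.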
Code cleanup
The mid/side butterfly, the intensity-stereo combine, and the channel-zeroing
loop were copy-pasted across `stereo()`, `midside()`, and `mixed()`. They are now
single-sourced in `apply_ms()`, `apply_is()`, and `zero_channel()`. This removes
the duplication that let the `midside()`/`mixed()` guard drift apart in the first
place, and shrinks the object code. Comments were trimmed to explain only
non-obvious rationale.
Validation & benchmarking
Validated with the `faac-benchmark` suite (candidate build vs. a master baseline
build) plus direct bit-exact checks.
`--joint 0/1/2` (None/M/S/IS) produce byte-identical output to master
(md5-checked); only the new `--joint 3` differs, and only at low sample rates
does the IS-floor cap change anything.
… stimuli measured roughly +3% overall (CI is positive but noisy run-to-run;
treat the local A/B as authoritative).
… `audio` mode cannot see the stereo image (decoding to stereo vs. mono scores
identically), so quality is tracked with a windowed inter-channel coherence
error (`ic_err`, lower = truer). Mixed Mode improves it vs. master
(CI Stereo Image Δ ≈ +0.0038), and the fixes above improve it further.
… stereo-coding changes only redistribute bits; MOS is therefore at parity
(and is treated as a regression floor, not a target, since maximizing it would
reward stereo collapse).
… `ms_mask_present` semantics, and the intensity-sign / `ms_used` coupling were
confirmed against the FFmpeg reference decoder.
Defaults & compatibility
The default joint mode changes from `JOINT_IS` to `JOINT_MIXED`. API
consumers that never set `jointmode` will get Mixed Mode, which changes
bitstream contents (not validity) versus before.
`--joint` and the GUI dropdown expose all four modes (None / M/S / IS / Mixed).
Investigated but not adopted
`ms_mask_present = 2` (all-bands-M/S, no per-band mask). Not viable for
Mixed Mode: value 2 sets the mask for every band, which would invert the
intensity sign of every IS band (the decoder applies `c *= 1 − 2·ms_mask`) and
force M/S onto the phase-protected low bands. Only valid for an all-M/S frame,
which this design never produces.
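The sign coupling is easy to see from the quoted decoder expression. A tiny sketch, with a hypothetical function name and `c` standing for the intensity scale factor as in the expression above:

```c
/* Decoder-side sign handling as quoted above: c *= 1 - 2 * ms_mask. */
static double intensity_scale(double c, int ms_mask)
{
    return c * (1 - 2 * ms_mask);
}
```

With `ms_mask = 0` the scale is unchanged; with `ms_mask = 1` it is negated, which is the inversion that `ms_mask_present = 2` would apply to every IS band.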
… `isthr`, `thrmid`, and `thrside` (over plentiful and bit-starved bitrates)
found the existing defaults already at a flat optimum: every change was
≤0.004 MOS, within the benchmark's measurement noise and far below its 0.05
"minor regression" threshold. No retuning win; the defaults stand.
Caveats
… the gold standard remains a subjective MUSHRA/ABX listening test.
… Mixed Mode's advantage is relative and grows with available bits.