Decimation of streamlines#3410
Conversation
The cubic Hermite (tension Catmull-Rom) basis used to reconstruct streamline trajectories can now yield, alongside the interpolated position, the first derivative of position with respect to the interpolation coefficient mu. The parametric derivative normalises to a unit tangent and its norm gives the local parametric speed, both of which downstream spline-aware decimation and distance computations require. The augmentation is purely additive: derivative basis weights are populated in the same set() pass, the existing position weights and value()/coef() results are byte-for-byte unchanged, and current callers are numerically unaffected. A focused unit test validates the analytic derivative against central differences across several tensions and parameters and checks position/tangent continuity across segment joins. Plan segment (stage 3.1 — augmented interpolator): > 3.1. Alteration of the code that performs interpolation of streamline > trajectories such that it may optionally yield both an interpolated position > between two points and the derivative with respect to the interpolation > coefficient at that location (which can be normalized to produce an interpolated > unit vector tangent). Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
Cubic Hermite interpolation up to a streamline endpoint requires one control point beyond it, and six call sites independently hand-rolled the linear reflection that supplies it, several of them building a throwaway padded copy of the streamline to do so. This introduces a single non-owning SplineView wrapper that precomputes the two reflected ghosts once and exposes them through a signed-index operator[], yielding the front ghost at index -1 and the back ghost at size(). Every consumer now draws its endpoint ghosts from this one source of truth and indexes the view directly, eliminating the duplicated reflection arithmetic and the temporary padded copies. The reflection formula is unchanged, so the generated geometry and all golden outputs are preserved. Plan segment (stage 3.2 — centralised reflected-ghost wrapper): > 3.2. Identify all locations in the code base where additional reflected ghost > vertices are inserted at the head and tail of each streamline to facilitate > cubic interpolation. Centralise this operation to a class that wraps a const > MR::DWI::Tractography::Streamline<>&, overloading operator[] in such a way that > the two pre-calculated reflected ghost vertices are yielded at index -1 and > Streamline<>::size() (unless overridden by a superior proposal during planning > phase). If confirmed during planning, implement change to how those reflected > ghost vertices are generated. > > Planning outcome: reflected-ghost generation is KEPT as linear reflection > (2·P0 − P1); this stage centralises only, with no behaviour change. Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
Introduce MR::DWI::Tractography::curvature(), a dedicated per-vertex
curvature estimator (1/mm) under cpp/core/dwi/tractography/curvature.{h,cpp},
replacing the inline computation in the TWI mapper. Curvature magnitude is
read directly from arc-length-smoothed first and second derivatives via a
single weighted local-polynomial (Savitzky-Golay, with a Gaussian-weighted
variant) rather than from renormalised, turn-angle tangents, so magnitude is
preserved. The smoothing scale auto-tunes per streamline from the turn-angle
autocorrelation, using a tight window for smooth anatomy and a wide one for
decorrelated probabilistic wiggle, bracketed by the sampling step and total
length; FIXED and short-streamline fallbacks remain available. The
load_factors curvature branch is reduced to one call plus an identity factor
copy, deleting the O(N^3) all-pairs arc-length matrix, the dense Gaussian
tangent smoothing and the curvature_smoothing_mm constant. This intentionally
alters tckmap -contrast curvature output: the factor still means per-vertex
curvature in 1/mm, but is now magnitude-preserving and auto-tuned rather than
attenuated and fixed-kernel smoothed.
Plan segment (stage 3.4 — implement curvature plan):
> 3.4. Read file "~/OneDrive/Documents/MRtrix3/tckdecimate/curvature.md" and
> implement planned changes.
Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
Compute the symmetric Hausdorff distance between the tension Catmull-Rom splines of two streamlines, interpreting both via the stage-3.2 ghost view and the stage-3.1 augmented Hermite interpolator. Each directed sweep upsamples one spline to probe points and locates the nearest point on the other spline by a warm-started marching Newton-Raphson on the foot-point condition, advancing the foot monotonically under the proximity and co-orientation assumptions for O(1) amortised cost per probe; the Newton step uses the Gauss-Newton Jacobian f'(s) ~= -||P'(s)||^2 (the stage-3.1 derivative; no analytic second derivative required), with a bracketed local-search fallback on non-convergence. A curvature-aware rule sets the probe spacing from the robust minimum radius of curvature, the inter-vertex spacing, and an optional Hausdorff threshold so the chord-vs-arc discretisation error stays far below the quantity of interest. Degenerate inputs fall back to vertex/endpoint distances and never yield NaN. Plan segment (stage 3.5 — spline Hausdorff distance): > 3.5. Write a function for computing the Hausdorff distance between two > streamlines based on CatmullRom spline interpolation. This initial version of > the function must assume that the two streamlines are proximal along their > entire lengths and their vertex orders are not reversed with respect to one > another, such that one can find from the start vertex of streamline A the > nearest point on the spline of streamline B by commencing Newton-Raphson from > the first vertex of streamline B, then step to the next vertex on streamline A > and use the prior closest point on streamline B as the initialisation point for > Newton-Raphson. Use the augmented interpolator from step 3.1 to provide > derivative of position on streamline B as a function of interpolation > coefficient. Determine, given information on streamline inter-vertex spacing / > optional threshold to be applied to Hausdorff distance / knowledge of minimal > radius of curvature of streamlines the appropriate ratio by which to up-sample > streamlines prior to Hausdorff computation. Perform symmetric computation. Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
Expose a single-pass, curvature-adaptive a-priori streamline decimator as a new tckresample option, -decimate_fast. It places a length- and curvature-dependent number of vertices along each streamline by integrating a cost density rho = 1 + lambda*kappa^(1/4) over arc length and equidistributing the cumulative cost, so vertices cluster where the trajectory bends most sharply. The user knob mu sets the number of output vertices generated per unit curvature-weighted arc length, with stage-3.7 calibration mapping it to an mm error target. Output vertices are evaluated on the original tension-Catmull-Rom spline through the shared reflected-ghost view, so the decode-time reconstruction is consistent by construction; both endpoints are retained exactly, the operation never upsamples, degenerate inputs never yield NaN, and the whole pass is O(N). Plan segment (stage 3.6 — fast decimation): > 3.6. Implement "fast" streamline decimation, using option 7.2: curvature-adaptive > a priori sampling. Provide a suitable user-accessible parameter that allows > tuning of number of vertices to generate per streamline based on length and > curvature. > > Planning outcome: exposed as a new `tckresample` option (-decimate_fast), not a > standalone tckdecimate command. Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
Introduce a standalone command that calibrates the fast streamline decimator's dimensionless density knob against the geometric error it incurs. For each value in a user-specified mu range, every streamline is decimated with the shipped DecimateFast and the symmetric spline Hausdorff distance (mm) between the original and decimated trajectory is measured, reusing both the stage-3.6 decimator and the stage-3.5 hausdorff verbatim. The distribution of those per-streamline distances is reduced, per mu, to the percentiles [50, 75, 95, 99, 99.9, 100] together with the mean output/input vertex ratio, so a value of mu can be chosen that bounds decimation error for a given dataset. The heavy logic lives in cpp/core/dwi/tractography/ behind a thread-queued source -> multi -worker -> sink sweep, leaving the cpp/cmd/ wrapper trivial. Plan segment (stage 3.7 — calibration command): > 3.7. Write a command that allows for calibration of the decimation algorithm > parameter created in step 3.5. It takes a tractogram as input, executes the > decimation algorithm for a range of different values of this parameter, and > computes Hausdorff distances between the decimated and empirical versions. For > each value it then reports percentiles of the distribution of Hausdorff > distances across the streamlines in the tractogram: [50, 75, 95, 99, 99.9, 100]. > Place as much of this code in a suitable location in > cpp/core/dwi/tractography/ so that the invoking cpp/cmd/ command file is trivial. > > Planning correction: the tunable parameter is created in step 3.6 (3.5 is the > Hausdorff function); calibration targets the 3.6 parameter and uses 3.5 to > measure error. Exposed as a standalone command. Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
Introduce a slow streamline decimator that builds a near-minimal on-curve control set whose tension-Catmull-Rom reconstruction stays within a user tolerance epsilon (mm) of the original spline at every input vertex, exposed as the new tckresample option -decimate_slow. Control points are inserted greedily at the on-curve foot of the worst-deviating vertex and then slid along the original spline to tighten the local fit; the two endpoints are retained exactly. Per-vertex nearest-point feet are carried across iterations and only the arc adjacent to a freshly inserted knot is re-solved, with a running per-interval max/argmax, so the encoder avoids the quadratic per-iteration rescan. The warm-started foot-point solver shared with the spline Hausdorff distance is promoted to a single header. Lyche-Morken knot removal and Hoschek parameter correction are left as documented comments at their natural call sites rather than implemented. Plan segment (stage 3.8 — slow decimation): > 3.8. Implement "slow" streamline decimation, using option 3.2: greedy knot > insertion & slide refinement. Utilise streamline tangent interpolation as > implemented in step 3.1 if appropriate. Possible computational optimisation: > given each iteration yields for each input vertex the interpolated nearest > position on the output streamline, re-compute from these an updated suitable > initial search position for the next iteration based on insertion of one new > vertex into the streamline; this is intended to avoid a quadratic nearest-point > check per iteration. Do not attempt to implement Lyche-Morken knot removal or > Hoschek-style correction, but add code comments where they would be applicable > and what benefit they may provide. > > Planning outcome: exposed as a new `tckresample` option (-decimate_slow). Note > "option 3.2" here refers to the design discussion's Problem 2 / Method B > (greedy knot insertion + slide), distinct from plan stage 3.2. Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
The robust streamline curvature estimator auto-tunes its smoothing scale from the lag-2 autocorrelation of per-step turn angles, assuming each vertex is an independent geometric sample. iFOD2 output with the default downsampling disabled violates this: contiguous runs of sub-step vertices are samples of one analytically computed arc, which mis-scales the mean-step-normalised scale ladder and biases the autocorrelation toward "smooth", causing systematic under-smoothing. This adds a vertices_per_parent_arc field to CurvatureConfig and a configure_from_properties() helper that reads the iFOD2 samples_per_step and downsample_factor metadata, adjusts the scale ladder and decimates the autocorrelation to the parent-arc granularity, and warns once when an adjustment applies. The configuration is threaded through DecimateFast, get_resampler/tckresample, and the Hausdorff/decimate-calibrate path, with unit tests covering the metadata mapping and the noise-suppression effect. Session prompts: 1. > In commit 40a43d8, Claude implemented an algorithm for robustly estimating streamline curvature based on local weighted polynomial fitting. It includes an automatic tuning of scale based on autocorrelation. User has concern about prospect of erroneous tuning of this scale for certain streamlines data. With the iFOD2 tracking algorithm, if the default streamline downsampling before export is explicitly disabled, then sets of (by default) 3 steps will be intrinsically autocorrelated, because contiguous sets of three steps are in fact samples from a singularly computed arc. Comment on the prospect of erroneous calibration if such data are provided as input to that function. Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
…trics Rename the tckstats command to tck2metric and restructure it after the tensor2metric pattern: a single compulsory tractogram argument plus optional per-metric export options, all computed in one serialised pass via an ordered multi-threaded queue that preserves input streamline order for streamed outputs. Streamline length is exported per-streamline (-length) or as a mean to stdout (-mean_length); streamline curvature is exported as a per-streamline mean (-mean_curvature) or per-vertex to a .tsf file (-vertex_curvature), using the dedicated local-curvature function. The legacy length-only interface (the statistics table, -output and -histogram) is removed, and the dwigradcheck and peakscheck Python commands now call tck2metric -mean_length. Tests, CMake registrations and command documentation are updated accordingly. Session prompts: 1. > Rename command tckstats to tck2metric. Similarly to command tensor2metric, the only compulsory command-line argument is a path to an input tractogram; multiple command-line options can be specified for exporting an individual metric to file, and all are generated utilising a single serialised scan through the input data. An ordered multi-threading queue must be used to ensure that any output metrics that involve a streamed data format export their per-streamline data in the same order as the input tractogram. Streamline length is the first metric to be computed and exported, as per current tckstats functionality. The second metric to be added is mean streamline curvature, as computed using the dedicated function for computing local streamline curvature in commit 40a43d8. The third metric is per-vertex streamline curvature yielded by that function, which must be exported using a .tsf file. 2. > Change tck2metric interface for exporting streamline lengths. Many options have been carried over from previous tckstats command where streamline length was the exclusive metric of interest, but their interface no longer makes sense when there are multiple such metrics. Functionality of option -dump is now how behaviour of exporting streamline length as a metric of interest should behave. Remove option -histogram. Remove option -output. Add option -mean_length, which computes the mean streamline length and exports to stdout; Python commands dwigradcheck and peakscheck should make use of this functionality. Pretty-printout of statistics on streamline lengths should be removed. Generated-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Streamline length could previously be measured only as the sum of chord lengths between successive vertices. This adds a LengthMethod enumeration that selects between that piecewise-linear chord sum and the arc length of the interpolating cubic Catmull-Rom spline, the latter evaluated by a fixed five-node Gauss-Legendre quadrature over each segment using reflected ghost endpoints. To support it efficiently, Math::Hermite gains a processing-type mode so it can emit derivative-only weights, and the shared SplineProcessingType enumeration moves into its own header for reuse by both the matrix and Hermite interpolators. The length() interface now requires an explicit method argument; all existing call sites select the chord sum to preserve current behaviour. Session prompts: 1. > Streamline length is computed in MR::DWI::Tractography::length() > (cpp/core/dwi/tractography/streamline.h) based on a sum of differences > between successive 3-vectors. Streamlines are however commonly > interpreted as discrete samples on a Catmull-Rom spline. Comment on > whether an approach exists for computing the length of a streamline > based on this continuous representation that does not involve realising > a large number of vertex samples along the spline. 2. > Implement an enumeration that separates streamline length calculation > between chord sum (linear) and Gauss-Legendre quadrature on the cubic > Catmull-Rom spline. Implement the variant of the > MR::DWI::Tractography::length() function that utilises the latter. > Modify the Hermite operator to optionally yield derivative-only in > addition to value-only and value-and-derivative for this function to > make use of. Report on chosen technique for selecting the up-sampling > ratio for this quadrature integration. At all invocation sites, choose > the chord sum version of this function to restore compilation, but > provide a table listing all such call sites. Generated-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Generalise tckdecimatecalibrate so it can evaluate either of the two tckresample decimation resamplers, selected with a new -algorithm option (default fast). The fast variant continues to sweep the density knob mu and report the resulting spline-Hausdorff error distribution, while the new slow path sweeps Hausdorff-distance tolerances and reports the compression achieved at each, using the measured Hausdorff percentiles to verify the tolerance bound. Both variants now also report the mean output vertex count and the total time spent inside the decimator, summed per parameter value, to expose the cost-versus-fidelity trade-off. The shared sweep, accumulation and reporting machinery is templated over the decimator type so the two algorithms differ only in the resampler instantiated. Session prompts: 1. > In command tckdecimatecalibrate, add command-line argument to switch > between evaluation of the "slow" vs. "fast" decimation resamplers. When > evaluating the "fast" variant (to be implemented), report for each > user-specified Hausdorff distance tolerance the compression ratio. For > both "slow" and "fast" variants, report the sum of computation times for > resampling streamlines using each individual decimation parameter (mu for > fast variant, Hausdorff distance for slow variant). Additionally suggest > any other metrics that may be of utility to the user in assessing these > algorithms. Generated-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After incrementally adding vertices to a streamline and optimising those vertex positions to minimise Hausdorff distace between the under-sampled streamline and the original, this new step iteratively removes vertices while still satisfying the original Hausdorff distance threshold. Prompt: > In cpp/core/dwi/tractography/resampling/decimate_slow.cpp, implement Lyche-Morken knot removal from line 306. Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
The previous warm-start marching approach would monotonically advance foot searches along the target streamline, causing it to stick on closed loops and report distances on the order of the loop diameter. The Gauss-Newton foot-point solver would also overshoot across high-curvature targets, converging to antipodal points instead of true nearest feet. Replace the marching warm start with arc-length correspondence seeding: each probe at fractional arc length f along the source is seeded at the target's position matching the same fractional arc length, removing monotone drift on loops. Add backtracking line search to the Gauss-Newton solver to prevent overshooting on high-curvature regions. Cost remains sub-quadratic: cumulative-chord tables build once in O(n+m), then each of n+m probes inverts arc length in O(log m) time. Prompt: > Claude is provided with file ~/src/test_data/tck2metric/select_5.tck. Command tckdecimatecalibrate reports Hausdorff distances orders of magnitude greater than the user-specified tolerance for the slow decimate operation. Investigate potential source. Investigate whether this is similar to, or is the same, problem as reported by Claude RE: deviations exceeding specified threshold prior to Lyche-Morken vertex removal. Ensure that Hausdorff computation during decimation is performed based on vertices of the input streamline rather than the output. Evaluate the nearest-point warm-start mechanism requested by the user and implemented by Claude to see if that may be the case of excessive reported Hausdorff distances, and whether a superior warm-start algorithm is possible (user was trying to prevent quadratic cost of finding closest vertex prior to optimisations on the interpolated spline). Questions: > Q: The warm-start fix resolves the 202 mm artifact. But the Lyche-Morken removal (added earlier) keeps vertex-level deviation ≤ tolerance while letting the continuous reconstruction sag up to ~6× tolerance between the sparser control points at tight tolerances. This fails tckdecimatecalibrate_slow's `p100 ≤ 1.5×tolerance` check and contradicts the command's "reconstruction bounded within tolerance... spline-level error respects the bound" description (which holds without removal: p100=0.0125 vs 0.123 at tol 0.02). How should I reconcile this? > A: Make removal spline-aware Generated-by: Claude Opus 4.8 <noreply@anthropic.com>
| #include <vector> | ||
|
|
||
| #include "command.h" | ||
| #include "memory.h" |
There was a problem hiding this comment.
warning: included header command.h is not used directly [misc-include-cleaner]
| #include "memory.h" | |
| #include "memory.h" |
|
|
||
| #include "command.h" | ||
| #include "memory.h" | ||
| #include "progressbar.h" |
There was a problem hiding this comment.
warning: included header memory.h is not used directly [misc-include-cleaner]
| #include "progressbar.h" | |
| #include "progressbar.h" |
| // clang-format off | ||
| void usage() { | ||
|
|
||
| AUTHOR = "Robert E. Smith (robert.smith@florey.edu.au)"; |
There was a problem hiding this comment.
warning: no header providing "MR::App::AUTHOR" is directly included [misc-include-cleaner]
cpp/cmd/tck2metric.cpp:20:
- #include "command.h"
+ #include "app.h"
+ #include "command.h"|
|
||
| AUTHOR = "Robert E. Smith (robert.smith@florey.edu.au)"; | ||
|
|
||
| SYNOPSIS = "Compute one or more per-streamline metrics for a tractogram"; |
There was a problem hiding this comment.
warning: no header providing "MR::App::SYNOPSIS" is directly included [misc-include-cleaner]
SYNOPSIS = "Compute one or more per-streamline metrics for a tractogram";
^|
|
||
| SYNOPSIS = "Compute one or more per-streamline metrics for a tractogram"; | ||
|
|
||
| DESCRIPTION |
There was a problem hiding this comment.
warning: no header providing "MR::App::DESCRIPTION" is directly included [misc-include-cleaner]
DESCRIPTION
^| Receiver receiver( | ||
| properties, length_path.has_value(), mean_curvature_path.has_value(), vertex_curvature_path, header_count); | ||
| Thread::run_ordered_queue( | ||
| reader, Thread::batch(Streamline<value_type>()), Thread::multi(worker), Thread::batch(Metrics()), receiver); |
There was a problem hiding this comment.
warning: no header providing "MR::Thread::batch" is directly included [misc-include-cleaner]
cpp/cmd/tck2metric.cpp:23:
- #include "types.h"
+ #include "thread_queue.h"
+ #include "types.h"| Receiver receiver( | ||
| properties, length_path.has_value(), mean_curvature_path.has_value(), vertex_curvature_path, header_count); | ||
| Thread::run_ordered_queue( | ||
| reader, Thread::batch(Streamline<value_type>()), Thread::multi(worker), Thread::batch(Metrics()), receiver); |
There was a problem hiding this comment.
warning: no header providing "MR::Thread::batch" is directly included [misc-include-cleaner]
reader, Thread::batch(Streamline<value_type>()), Thread::multi(worker), Thread::batch(Metrics()), receiver);
^| if (get_options("ignorezero").empty() && (receiver.empty_count() != 0 || receiver.zero_length_count() != 0)) { | ||
| std::string s("read"); | ||
| if (receiver.empty_count() != 0) { | ||
| s += " " + str(receiver.empty_count()) + " empty streamlines"; |
There was a problem hiding this comment.
warning: no header providing "MR::str" is directly included [misc-include-cleaner]
s += " " + str(receiver.empty_count()) + " empty streamlines";
^| } | ||
| if (receiver.zero_length_count() != 0) | ||
| s += " " + str(receiver.zero_length_count()) + " streamlines with zero length (one vertex only)"; | ||
| WARN(s); |
There was a problem hiding this comment.
warning: no header providing "WARN" is directly included [misc-include-cleaner]
WARN(s);
^| if (mean_length) { | ||
| const default_type sum_weights = receiver.sum_weight(); | ||
| const float value = sum_weights != 0.0 ? static_cast<float>(receiver.sum_length() / sum_weights) : NaNF; | ||
| std::cout << str(value) << "\n"; |
There was a problem hiding this comment.
warning: no header providing "MR::str" is directly included [misc-include-cleaner]
std::cout << str(value) << "\n";
^
This was a tinkering-with-Claude scratch-an-itch feature branch. Posting here as draft so that it doesn't get forgotten as there's probably some features worth preserving, but right now I need to clear out my parallel work terminals and hard drive space, so it might have to sit here idle for a while.
Main aspiration: In attempting #411 and including a couple of tractogram formats whose focus is compression, I was reminded of a long-standing idea on how to decrease tractogram storage space. The premise is as follows. We apply a specific interpretation to streamlines data, being: a Catmull-Rom "Hermite" spline, with a fixed tension term, and a specific mechanism for facilitating that interpolation all the way to the streamline endpoints. What we want is a way to encode streamline data that recovers that same finite continuous path within a permissible tolerance, but requiring less data.
One way to attempt this is to find an alternative set of vertices, for which imposing the very same spline interpretation yields a path for which the discrepancy to the original spline is less than some threshold. It would have the added benefit of still being an ordered set of 3-vectors without fancy binary encoding so could be visualised as-is. This is however unfortunately quite expensive due to the Hausdorff distance computation, despite some trickery to make it as efficient as possible; likely too expensive to use in
tckgen.There's a second approach that lives in a similar sort of space. The imprecision associated with a cubic fit grows as (curvature)^4; so rather than distributing vertices along a streamline equal in distance, one can instead distribute them in such a way that this error term is equally distributed. Works OK but not paradigm-shifting.
Here these are both just exposed through
tckresample.Some other neat little features included along the way:
Tractographynamespace.tck2metricsubsumestckstatsand can do curvature or length.To merge, would need a more thorough code review than what it's had thus far, remove the
tckdecimatecalibratecommand, and import new test data (MRtrix3/test_data@463e8c1).