Currently, `create` builds in-memory arrays for all of the fixed (VCF) fields from the two input VCZs (`variant_contig`, `variant_position`, `variant_allele`, `variant_id`, `variant_quality`, and `variant_filter`) in order to merge them into a single VCZ.
It would be better to load only the first three (`variant_contig`, `variant_position`, `variant_allele`), keep the resulting indexes (and sort order) in memory, and then process the other variant fields sequentially, one at a time. This is similar to `normalise`, which first computes an index and then applies it to the dataset.
See #95 for more details and timings.
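
A rough sketch of the proposed approach, using plain NumPy arrays in dicts as stand-ins for the two input VCZs. The helper names (`compute_merge_order`, `merge_field`) are hypothetical and not part of the existing code. For brevity the sort key uses only contig and position; `variant_allele` would also need to be loaded to match identical variants.

```python
import numpy as np

# Stand-ins for the two input VCZs (assumption: contigs are already
# encoded as integer indexes into a shared contig list).
vcz_a = {
    "variant_contig": np.array([0, 0, 1]),
    "variant_position": np.array([10, 30, 5]),
    "variant_quality": np.array([1.0, 2.0, 3.0]),
}
vcz_b = {
    "variant_contig": np.array([0, 1]),
    "variant_position": np.array([20, 1]),
    "variant_quality": np.array([4.0, 5.0]),
}


def compute_merge_order(a, b):
    """Load only the key fields and return the merged sort order."""
    contig = np.concatenate([a["variant_contig"], b["variant_contig"]])
    position = np.concatenate([a["variant_position"], b["variant_position"]])
    # np.lexsort treats the LAST key as the primary key:
    # sort by contig first, then by position within each contig.
    return np.lexsort((position, contig))


def merge_field(a, b, name, order):
    """Load one field from both inputs, merge it, and let it go out of scope."""
    return np.concatenate([a[name], b[name]])[order]


# Step 1: compute the index once and keep it in memory.
order = compute_merge_order(vcz_a, vcz_b)

# Step 2: process the remaining fields one at a time using that index.
merged_quality = merge_field(vcz_a, vcz_b, "variant_quality", order)
```

With this structure only the key fields and one additional field are held in memory at any time, rather than all six fixed fields at once.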