
match

The match command calculates source match rates for inferred tracts.

For each tract, it compares the target sample named in that tract with every source sample in the input VCF and reports the mean match rate across the source individuals. The tract file must contain at least four columns: chromosome, start, end, and sample. Any additional columns are ignored. The target and source sample lists must not overlap. For phased target labels, names ending in _1 or _2 are interpreted as haplotypes of the corresponding base sample; for example, a label such as ind01_1 refers to a haplotype of ind01.
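The input rules above can be sketched in Python. This is a minimal illustration, not sstar2's own code: the helper names `base_sample`, `read_tracts`, and `check_disjoint` are hypothetical, and the sketch assumes tab-separated tract lines, skips blank and `#` comment lines, and treats only an exact trailing `_1` or `_2` as a haplotype suffix.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical rule for this sketch: a trailing "_1" or "_2" marks a haplotype of the base sample.
_HAP_SUFFIX = re.compile(r"^(.+)_([12])$")


def base_sample(label: str) -> Tuple[str, Optional[int]]:
    """Return (base sample, haplotype number) for a target label.

    Labels ending in _1 or _2 are split into the base sample and the
    haplotype number; any other label is returned unchanged with None.
    """
    match = _HAP_SUFFIX.match(label)
    if match:
        return match.group(1), int(match.group(2))
    return label, None


def read_tracts(lines: Iterable[str]) -> List[Tuple[str, int, int, str]]:
    """Parse tract lines, keeping chrom, start, end, sample and ignoring extra columns."""
    tracts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skipping blank and comment lines is an assumption of this sketch
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"tract line has fewer than 4 columns: {line!r}")
        chrom, start, end, sample = fields[:4]
        tracts.append((chrom, int(start), int(end), sample))
    return tracts


def check_disjoint(tgt_samples: Iterable[str], src_samples: Iterable[str]) -> None:
    """Raise if any sample appears in both the target and source lists."""
    overlap = set(tgt_samples) & set(src_samples)
    if overlap:
        raise ValueError(f"samples in both target and source lists: {sorted(overlap)}")
```

Because the suffix rule in this sketch is purely lexical, an unphased sample whose name happens to end in `_1` would be split the same way; the sketch does not try to distinguish that case.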

Tract coordinates are interpreted as BED-style intervals. A VCF position is included in a tract when:

start < POS <= end

If a tract contains no variants from the VCF, match_rate is written as NA.
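The inclusion rule maps a 0-based, half-open BED interval onto 1-based VCF positions, so a tract with start 100 and end 200 includes positions 101 through 200 but not 100. A minimal Python sketch of the rule and of the NA case follows; the helper names are hypothetical, and the numeric formatting in `format_match_rate` is an assumption rather than sstar2's documented output format.

```python
from typing import Iterable, List, Optional


def positions_in_tract(positions: Iterable[int], start: int, end: int) -> List[int]:
    """Return the 1-based VCF positions inside a BED-style tract.

    BED intervals are 0-based and half-open, so [start, end) covers the
    1-based positions start + 1 through end, i.e. start < POS <= end.
    """
    return [pos for pos in positions if start < pos <= end]


def format_match_rate(rate: Optional[float]) -> str:
    """Render a match rate; None (no variants in the tract) becomes NA.

    The numeric formatting here is an assumption of this sketch.
    """
    return "NA" if rate is None else str(rate)


# Positions 100 and 201 lie outside the tract; 101, 150 and 200 lie inside.
print(positions_in_tract([100, 101, 150, 200, 201], start=100, end=200))  # prints [101, 150, 200]
```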

Example

Users can calculate the match rate between each inferred tract and a Neanderthal sample with the following command:

sstar2 match --vcf examples/data/sstar2.example.biallelic.snps.vcf.gz \
             --tgt examples/data/tgt.samples.list \
             --src examples/data/nean.samples.list \
             --tract-file examples/results/sstar2.example.inferred.tracts.bed \
             --output examples/results/sstar2.example.nean.match.rate.bed

The resulting match rates are written to examples/results/sstar2.example.nean.match.rate.bed.

Outputs

The output is a tab-separated BED file without a header:

chrom  start  end  sample  match_rate
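A sketch for loading the output with the Python standard library, assuming exactly the five tab-separated columns listed above in that order. `read_match_rates` is a hypothetical helper, and it maps `NA` to `None` so that downstream code can skip tracts without variants.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Column order as documented; the file itself has no header line.
COLUMNS = ("chrom", "start", "end", "sample", "match_rate")


def read_match_rates(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one record per row of a headerless, tab-separated match-rate BED file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        rec: Dict[str, object] = dict(zip(COLUMNS, row))
        rec["start"] = int(row[1])
        rec["end"] = int(row[2])
        # Tracts without any VCF variants are written as NA; map them to None.
        rec["match_rate"] = None if row[4] == "NA" else float(row[4])
        yield rec


example = io.StringIO("1\t0\t100\ttgt1_1\t0.75\n1\t200\t300\ttgt1_2\tNA\n")
for record in read_match_rates(example):
    print(record)
```

For a real file, open it with `open(path, newline="")` as the `csv` module recommends and pass the handle to `read_match_rates`.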

Settings

Argument      Description
--vcf         Path to the input VCF file.
--tgt         Path to the target individual list.
--src         Path to the source individual list.
--tract-file  Path to the inferred tract BED file produced by sstar2 infer.
--output      Path to the output match-rate BED file.
--ploidy      Ploidy used to normalize dosage differences. Default: 2.
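This page states only that the reported value is the mean match rate across source individuals and that --ploidy normalizes dosage differences. The sketch below assumes one plausible formula, a per-source match rate of 1 - mean(|d_tgt - d_src| / ploidy) over the tract's variants using integer alt-allele dosages, averaged across sources. It illustrates that assumption and is not sstar2's confirmed implementation; it also ignores missing genotypes and phased haplotype targets.

```python
from statistics import mean
from typing import Optional, Sequence


def source_match_rate(tgt: Sequence[int], src: Sequence[int], ploidy: int = 2) -> float:
    """ASSUMED per-source match rate: 1 - mean(|tgt - src| / ploidy) over the tract's variants.

    tgt and src are alt-allele dosages (0..ploidy) at the same sites.
    """
    if len(tgt) != len(src):
        raise ValueError("dosage vectors must cover the same sites")
    return 1.0 - mean(abs(t - s) / ploidy for t, s in zip(tgt, src))


def tract_match_rate(
    tgt: Sequence[int], sources: Sequence[Sequence[int]], ploidy: int = 2
) -> Optional[float]:
    """Mean of per-source match rates; None (written as NA) when the tract has no variants."""
    if len(tgt) == 0:
        return None
    if not sources:
        raise ValueError("at least one source individual is required")
    return mean(source_match_rate(tgt, src, ploidy) for src in sources)


# One identical source (rate 1.0) and one differing by a full dosage at two of four sites (rate 0.5).
print(tract_match_rate([2, 1, 0, 2], [[2, 1, 0, 2], [0, 1, 2, 2]]))  # prints 0.75
```

Dividing by the ploidy keeps each per-site difference between 0 and 1, so under this assumption the match rate also stays between 0 and 1 regardless of the --ploidy value.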