Skip to content

Calculate source match rates

Input

To calculate source match rates, users should provide a VCF file containing genotypes from the reference, target, and source populations (e.g. test.match.rate.data.vcf). Users also need to provide three files containing names of individuals from the reference, target and source populations (e.g. ref.ind.list, tgt.ind.list and nean.ind.list) for analysis. The file (e.g. test.match.rate.score.exp.results) containing S* scores from sstar score is also required.

Users can calculate source match rates with the following command:

sstar matchrate --vcf test.match.rate.data.vcf --ref ref.ind.list --tgt tgt.ind.list --src nean.ind.list --score test.match.rate.score.exp.results --output test.match.rate.results

The expected result above can be found in test.match.rate.exp.results.

Output

An example for the output is below:

chrom start end sample match_rate src_sample
21 9400000 9450000 NA06986 0.0454545 Nean

The meanings of the first to fourth columns are the same as those in the output from sstar score. The meanings of the remaining columns:

  • The match_rate column is the source match rate on the current region.
  • The src_sample column is the name of the individual from the source population for calculating the source match percentage.

Settings

By default, sstar assumes the reference allele is the ancestral allele and the alternative allele is the dervied allele. Users can use the argument --anc-allele with a BED format file (e.g. test.anc.allele.bed) to define the ancestral allele for each variant. If --anc-allele is used, then variants without ancestral allele information will be removed. Users also can provide a BED file (e.g. test.mapped.region.bed) defining non-overlapping mapped regions with the argument --mapped-region. Finally, users can use --thread to specifiy numbers of CPUs in order to speed up the calculation. If users do not specify numbers of CPUs.