Calculate source match rates
Input
To calculate source match rates, users should provide a VCF file containing genotypes from the reference, target, and source populations (e.g. test.match.rate.data.vcf). Users also need to provide three files containing names of individuals from the reference, target and source populations (e.g. ref.ind.list, tgt.ind.list and nean.ind.list) for analysis. The file (e.g. test.match.rate.score.exp.results) containing S* scores from sstar score is also required.
Users can calculate source match rates with the following command:
sstar matchrate --vcf test.match.rate.data.vcf --ref ref.ind.list --tgt tgt.ind.list --src nean.ind.list --score test.match.rate.score.exp.results --output test.match.rate.results
The expected result above can be found in test.match.rate.exp.results.
Output
An example for the output is below:
| chrom | start | end | sample | match_rate | src_sample | 
|---|---|---|---|---|---|
| 21 | 9400000 | 9450000 | NA06986 | 0.0454545 | Nean | 
The meanings of the first to fourth columns are the same as those in the output from sstar score. The meanings of the remaining columns:
- The match_ratecolumn is the source match rate on the current region.
- The src_samplecolumn is the name of the individual from the source population for calculating the source match percentage.
Settings
By default, sstar assumes the reference allele is the ancestral allele and the alternative allele is the dervied allele. Users can use the argument --anc-allele with a BED format file (e.g. test.anc.allele.bed) to define the ancestral allele for each variant. If --anc-allele is used, then variants without ancestral allele information will be removed. Users also can provide a BED file (e.g. test.mapped.region.bed) defining non-overlapping mapped regions with the argument --mapped-region. Finally, users can use --thread to specifiy numbers of CPUs in order to speed up the calculation. If users do not specify numbers of CPUs.