Calculate S* scores

Input

To calculate S* scores, users should provide a VCF file containing genotypes from the reference and target populations (e.g. test.score.data.vcf). Users also need to provide two files containing names of individuals from the reference and target populations (e.g. test.ref.ind.list and test.tgt.ind.list) for analysis.
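Before running the analysis, it can help to confirm that every name in the individual lists actually appears in the VCF. The sketch below is illustrative only and is not part of sstar. It assumes the list files contain one individual name per line (the format is not spelled out above), and the helper names `vcf_samples`, `read_names`, and `missing_individuals` are hypothetical.

```python
def vcf_samples(vcf_lines):
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; sample names follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def read_names(list_lines):
    """Read individual names, ASSUMING one name per line (blank lines skipped)."""
    return [line.strip() for line in list_lines if line.strip()]


def missing_individuals(vcf_lines, list_lines):
    """Return the listed names that do not appear among the VCF samples."""
    samples = set(vcf_samples(vcf_lines))
    return [name for name in read_names(list_lines) if name not in samples]


# Tiny in-memory example; with real files, pass open(...) handles instead.
header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\n",
]
print(missing_individuals(header, ["ind1\n", "ind3\n"]))  # ['ind3']
```

An empty result means every listed individual was found in the VCF header.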

Users can calculate S* scores with the following command:

    sstar score --vcf test.data.vcf --ref test.ref.ind.list --tgt test.tgt.ind.list --output test.score.results

The expected output of the command above can be found in test.score.exp.results.
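When the scoring step is part of a larger pipeline, the same command can be assembled from a script. The following is a minimal sketch that uses only the flags shown above. The function name `score_command` is hypothetical, and actually running the command requires sstar to be installed and on the PATH.

```python
import shlex


def score_command(vcf, ref, tgt, output):
    """Build the argument list for `sstar score` from the four file paths."""
    return [
        "sstar", "score",
        "--vcf", vcf,
        "--ref", ref,
        "--tgt", tgt,
        "--output", output,
    ]


cmd = score_command(
    "test.data.vcf", "test.ref.ind.list", "test.tgt.ind.list", "test.score.results"
)
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))

# To run it for real (requires sstar on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with file names that contain spaces.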

Output

An example of the output is shown below:

    chrom start end sample S*_score region_ind_SNP_number S*_SNP_number S*_SNPs
    21 0 50000 ind1 51470 11 6 2309,25354,26654,29724,40809,45079

The meaning of each column:

  • The chrom column is the name of the chromosome.
  • The start column is the start position of the current window for calculating S* scores.
  • The end column is the end position of the current window for calculating S* scores.
  • The sample column is the name of the individual.
  • The S*_score column is the estimated S* score.
  • The region_ind_SNP_number column is the number of shared derived variants in the current window between all the individuals from the reference populations and the current individual from the target population.
  • The S*_SNP_number column is the number of S* SNPs found in the current individual.
  • The S*_SNPs column lists the positions of the S* SNPs found in the current individual, separated by commas.
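The columns described above can be read into typed fields for downstream analysis. The sketch below parses the example row shown earlier. The names `ScoreRow` and `parse_row` are hypothetical, and the code only handles rows shaped like that example, so any other values the tool may write (for instance placeholders in windows without S* SNPs) would need extra handling.

```python
from typing import List, NamedTuple


class ScoreRow(NamedTuple):
    chrom: str
    start: int
    end: int
    sample: str
    score: float
    region_ind_snp_number: int
    sstar_snp_number: int
    sstar_snps: List[int]


def parse_row(line):
    """Parse one data row of the score output into a ScoreRow."""
    # split() with no argument accepts both tab- and space-separated fields.
    chrom, start, end, sample, score, n_region, n_sstar, snps = line.split()
    positions = [int(p) for p in snps.split(",")]
    return ScoreRow(
        chrom, int(start), int(end), sample,
        float(score), int(n_region), int(n_sstar), positions,
    )


row = parse_row("21 0 50000 ind1 51470 11 6 2309,25354,26654,29724,40809,45079")
# The number of listed positions should match the S*_SNP_number column.
print(row.sample, len(row.sstar_snps) == row.sstar_snp_number)  # ind1 True
```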

Settings

By default, sstar assumes that the reference allele is the ancestral allele and the alternative allele is the derived allele. Users can pass a BED format file (e.g. test.anc.allele.bed) with the argument --anc-allele to define the ancestral allele for each variant. If --anc-allele is used, variants without ancestral allele information are removed.

sstar also uses a default window size of 50,000 bp and a default step size of 10,000 bp when calculating S* scores. Users can change these settings with the arguments --win-len and --win-step.

Finally, users can use --thread to specify the number of CPUs to use, which speeds up the calculation.
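To illustrate how the window size and step size interact, the sketch below enumerates overlapping windows using the default values of --win-len and --win-step. It is not sstar's own implementation. The function name `windows` is hypothetical, and the choices to start at a given position and to drop a trailing partial window are assumptions made only for this example.

```python
def windows(first, last, win_len=50_000, win_step=10_000):
    """Yield (start, end) pairs for full windows lying within first..last.

    ASSUMPTION: a trailing window that would extend past `last` is dropped;
    the actual tool may treat region boundaries differently.
    """
    start = first
    while start + win_len <= last:
        yield start, start + win_len
        start += win_step


for start, end in windows(0, 80_000):
    print(start, end)
# 0 50000
# 10000 60000
# 20000 70000
# 30000 80000
```

With the defaults, consecutive windows overlap by 40,000 bp, so most positions fall into five windows and the same S* SNP can appear in several rows of the output.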