infer
The infer command identifies candidate introgressed tracts from genotype data.
It reads preprocessing settings from the sstar2 configuration file, computes features for genomic windows, and applies a trained ONNX model. Windows whose observed S* score is greater than the predicted score are written to a BED file.
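The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not the sstar2 implementation: the `Window` record, its field names, and the `select_tracts` helper are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Window(NamedTuple):
    """Hypothetical record for one genomic window of one sample."""
    chrom: str
    start: int
    end: int
    sample: str
    observed: float   # observed S* score for the window
    predicted: float  # S* score predicted by the trained model


def select_tracts(windows: Iterable[Window]) -> List[Tuple[str, int, int, str]]:
    """Keep windows whose observed score is strictly greater than the predicted one."""
    return [
        (w.chrom, w.start, w.end, w.sample)
        for w in windows
        if w.observed > w.predicted
    ]
```

A window whose observed score only equals the predicted score is not kept in this sketch, because the rule is stated as "greater than".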
Example
Given the trained model from the train command and the configuration file, introgressed tracts can be inferred with the following command:
sstar2 infer --model examples/results/sstar2.example.trained.model.onnx \
--config examples/data/sstar2.example.config.yaml \
--feat-file examples/results/sstar2.example.inference.features.tsv \
--pred-file examples/results/sstar2.example.pred.tsv \
--tract-file examples/results/sstar2.example.inferred.tracts.bed
The inferred tracts are written to the file given by --tract-file. Two additional files are generated: one records the features used for prediction, and the other contains the observed and predicted S* scores.
Outputs
- --feat-file: feature TSV generated from the input genotype data specified in the configuration file.
- --pred-file: prediction TSV containing the observed and predicted S* scores.
- --tract-file: BED file of inferred tracts.
The BED file is tab-separated and has no header:
chrom start end sample
The start coordinate is written in BED-style 0-based format.
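As a rough sketch of how the tract file could be read downstream, the Python snippet below parses the four headerless, tab-separated columns. The `read_tracts` and `tract_length` helpers and the example rows are made up for illustration. The length calculation assumes the standard BED convention of a half-open interval (0-based start, exclusive end), which the text above confirms only for the start coordinate.

```python
import csv
import io


def read_tracts(handle):
    """Parse a headerless, tab-separated BED file with columns chrom, start, end, sample."""
    tracts = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, start, end, sample = row[:4]
        tracts.append(
            {"chrom": chrom, "start": int(start), "end": int(end), "sample": sample}
        )
    return tracts


def tract_length(tract):
    """Length in base pairs, assuming a half-open BED interval (assumption)."""
    return tract["end"] - tract["start"]


# Made-up rows, only to show the layout; real output comes from --tract-file.
example = "1\t0\t50000\tind1\n1\t100000\t150000\tind2\n"
tracts = read_tracts(io.StringIO(example))
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("examples/results/sstar2.example.inferred.tracts.bed")`.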
Settings
| Argument | Description |
|---|---|
| --model | Path to the trained model file. |
| --config | Path to the sstar2 configuration YAML file. |
| --feat-file | Path to the feature TSV file. |
| --pred-file | Path to the prediction TSV file. |
| --tract-file | Path to the output BED file. |
| --match-bonus | Bonus for matching genotypes between two variants. Default: 5000. |
| --max-mismatch | Maximum genotype distance allowed before a pair is discarded. Default: 5. |
| --mismatch-penalty | Penalty for mismatching genotypes between two variants. Default: -10000. |
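
The three scoring arguments match the constants of the classic S* pairwise score, so a simplified sketch of how they might interact is given below. This assumes sstar2 follows the classic formulation, in which a matching pair also gains the physical distance between the two variants. The `pair_score` function and its parameter names are hypothetical, and other rules of the statistic (such as a minimum distance between variants) are left out.

```python
import math


def pair_score(genotype_distance, bp_distance,
               match_bonus=5000, max_mismatch=5, mismatch_penalty=-10000):
    """Score one pair of variants under the classic S* rules (assumed, simplified).

    genotype_distance: number of genotype differences between the two variants
    bp_distance: physical distance between the two variants in base pairs
    """
    if genotype_distance > max_mismatch:
        # Too different: the pair is discarded and can never be part of a best subset.
        return -math.inf
    if genotype_distance == 0:
        # Perfectly matching genotypes: bonus plus the distance spanned.
        return match_bonus + bp_distance
    # Between 1 and max_mismatch differences: fixed penalty.
    return mismatch_penalty
```

With the defaults, `pair_score(0, 1200)` gives 6200, `pair_score(3, 1200)` gives -10000, and `pair_score(6, 1200)` gives negative infinity.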