train

The train command trains a quantile regression model for S*-based archaic introgression detection.

It reads a Demes demographic model without introgression and an sstar2 configuration file. If the training feature table is not already present, sstar2 train first simulates training data and writes the feature table. It then trains a GradientBoostingRegressor model to predict S*_score from Region_ind_SNP_number.
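The regression step can be illustrated with a minimal sketch using scikit-learn's GradientBoostingRegressor with the quantile loss. This is not the sstar2 implementation: the data below is synthetic, and the quantile level (0.99) and random seed are assumptions chosen only for illustration. The two feature names, Region_ind_SNP_number and S*_score, are taken from the description above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the training feature table (NOT real sstar2 output).
rng = np.random.default_rng(0)
n_rows = 2000
region_ind_snp_number = rng.integers(50, 500, size=n_rows)
s_star_score = 1000.0 * np.sqrt(region_ind_snp_number) + rng.normal(0.0, 3000.0, size=n_rows)

# scikit-learn expects a 2-D feature matrix, here with a single column.
X = region_ind_snp_number.reshape(-1, 1).astype(float)

# Assumed quantile level; the value sstar2 actually uses is not stated above.
QUANTILE = 0.99

# loss="quantile" fits the conditional quantile given by alpha
# instead of the conditional mean.
model = GradientBoostingRegressor(loss="quantile", alpha=QUANTILE, random_state=0)
model.fit(X, s_star_score)

# Predicted quantile of S*_score for each SNP count in the training data.
threshold = model.predict(X)

# Fraction of training rows whose score lies at or below the fitted quantile.
coverage = float(np.mean(s_star_score <= threshold))
print(f"empirical coverage: {coverage:.3f}")
```

Because the model is trained on data simulated without introgression, a fitted upper quantile of this kind can serve as a null expectation of S*_score for a given number of SNPs.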

The path of the training feature table is derived from the model output path:

<output-prefix>.training.features.tsv

If this file already exists, sstar2 train reuses it instead of running simulation again.
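The reuse check described above can be sketched as follows. The function names are hypothetical, and the output prefix is passed in explicitly because the rule sstar2 uses to derive it from --output is not stated here.

```python
from pathlib import Path

# Suffix taken from the file name pattern shown above.
FEATURES_SUFFIX = ".training.features.tsv"


def features_table_path(output_prefix: str) -> Path:
    """Return the feature table path for a given output prefix (hypothetical helper)."""
    return Path(output_prefix + FEATURES_SUFFIX)


def needs_simulation(output_prefix: str) -> bool:
    """Return True when no feature table exists yet, so simulation would have to run first."""
    return not features_table_path(output_prefix).is_file()
```

In practice, deleting or renaming an existing feature table is what forces a fresh simulation on the next run.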

Example

Using the example demographic model and example configuration file, users can train a quantile regression model with the following command:

sstar2 train --demes examples/data/HumanNeanderthal_4G21_wo_introgression.yaml \
             --config examples/data/sstar2.example.config.yaml \
             --output examples/results/sstar2.example.trained.model.onnx

The trained model can be found here and the features used to train it are available here.

Outputs

  • --output: trained ONNX model file.
  • <output-prefix>.training.features.tsv: simulated training feature table.

Settings

Argument            Description
--demes             Path to the Demes demographic model file.
--config            Path to the sstar2 configuration YAML file.
--output            Path to the trained model output file.
--match-bonus       Bonus for matching genotypes between two variants. Default: 5000.
--max-mismatch      Maximum genotype distance allowed before a pair is discarded. Default: 5.
--mismatch-penalty  Penalty for mismatching genotypes between two variants. Default: -10000.
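
The three scoring arguments correspond to the pairwise variant score in the classic S* formulation. The sketch below assumes that formulation: a perfect genotype match scores the bonus plus the base-pair distance, a small mismatch scores the penalty, and a larger mismatch discards the pair. The function name is hypothetical and sstar2's actual scoring may differ in detail.

```python
import math


def pair_score(
    genotype_distance: int,
    bp_distance: int,
    match_bonus: int = 5000,
    max_mismatch: int = 5,
    mismatch_penalty: int = -10000,
) -> float:
    """Score one pair of variants, assuming the classic S* scoring rule."""
    if genotype_distance > max_mismatch:
        # Pair is discarded: it can never be part of a high-scoring variant set.
        return -math.inf
    if genotype_distance == 0:
        # Matching genotypes: bonus plus the physical distance between the variants.
        return match_bonus + bp_distance
    # Genotype distance between 1 and max_mismatch: fixed penalty.
    return mismatch_penalty


print(pair_score(0, 1200))  # matching genotypes
print(pair_score(3, 1200))  # tolerated mismatch
print(pair_score(6, 1200))  # discarded pair
```

With the defaults, these three calls give 6200, -10000, and -inf respectively.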