This section collects all data sets used during the development of FATHMM-indel. Most data are freely available to download.
Data sets are released as tab-separated VCF files with the columns "CHROM", "POS", "REF", "ALT", and "CLASS". The "CLASS" column encodes the functional impact of an indel: 0 for neutral and 1 for pathogenic.
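As an illustration of the layout described above, the following Python sketch reads such a file into typed records. It assumes one indel per line with exactly the five tab-separated columns listed, and it tolerates an optional header row or "#" comment lines, since this page does not say whether the files contain them. The names `Indel` and `read_indels` and the two example records are hypothetical and are not part of FATHMM-indel.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Indel(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    pathogenic: bool  # CLASS column: 1 = pathogenic, 0 = neutral


def read_indels(handle: TextIO) -> Iterator[Indel]:
    """Yield one Indel per data line of a tab-separated file.

    Assumption: the columns appear in the order CHROM, POS, REF, ALT, CLASS.
    Blank lines, '#' comment lines, and a 'CHROM' header row are skipped.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#") or row[0].upper() == "CHROM":
            continue
        chrom, pos, ref, alt, cls = row[:5]
        if cls not in ("0", "1"):
            raise ValueError(f"unexpected CLASS value: {cls!r}")
        yield Indel(chrom, int(pos), ref, alt, cls == "1")


# Made-up records, for illustration only (not taken from the released data).
example = (
    "CHROM\tPOS\tREF\tALT\tCLASS\n"
    "1\t12345\tAT\tA\t1\n"
    "2\t67890\tG\tGTT\t0\n"
)
records = list(read_indels(io.StringIO(example)))
```

To read a downloaded file instead, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.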
Training data set. Pathogenic non-coding indels were collected from the ClinVar database, whilst neutral non-coding examples were drawn from the Exome Variant Server (EVS) project. Please use the following link to download the training set: Training Data
Benchmark data sets. Pathogenic non-coding indels were collected from the Human Gene Mutation Database (HGMD), whereas neutral non-coding instances were taken from EVS. We cannot release the HGMD indels; consequently, the following link provides only the neutral class of our benchmark data set (853 EVS indels): Benchmark Data
To further validate our computational model, we used FATHMM-indel to prioritise non-coding indels from the 1000 Genomes (1KG) Project. Use the following link to download the 1KG data (1 466 000 indels): 1KG Data