Clinical+ VCF
The Clinical+ VCF format eliminates the ambiguity of standard VCFs while providing comprehensive information in a single file.
The Clinical+ VCF format was created by Sequencing.com for use in clinical applications. Clinical+ VCFs resolve the ambiguity that exists in standard VCF data while still maintaining a manageable file size.
Standard VCF files include data for a chromosomal coordinate only if a variant (alt) allele is detected. Ambiguity arises when a chromosomal coordinate is not included in a VCF, because its absence could mean either that the reference allele was detected or that there was a no-call at that coordinate. This ambiguity is not acceptable for clinical applications, because the difference between a reference allele and a no-call can have a significant impact on the interpretation of the data.
Clinical+ VCF resolves this limitation by recording each no-call and indicating its cause, such as a low genotype quality score, low coverage or a conflicting prediction. A coordinate listed in a Clinical+ VCF therefore has one of two possible results: a variant (alt) allele was detected, or the coordinate is a no-call. If a chromosomal coordinate is not listed, the call was the same as the reference allele.
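The interpretation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `records` dictionary, the `interpret` function, and the use of the genotype `./.` plus a free-text reason to mark a no-call are all hypothetical, since the exact encoding Clinical+ VCFs use for no-calls is not described here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory view of a Clinical+ VCF:
# (chromosome, position) -> (genotype, no-call reason).
# Assumption: a no-call is marked with the genotype "./." and a free-text
# reason; the real Clinical+ encoding may differ.
Record = Tuple[str, Optional[str]]


def interpret(records: Dict[Tuple[str, int], Record], chrom: str, pos: int) -> str:
    """Classify a coordinate as variant, no-call, or reference."""
    record = records.get((chrom, pos))
    if record is None:
        # Coordinate not listed: the call matched the reference allele.
        return "reference"
    genotype, reason = record
    if genotype == "./.":
        return f"no-call ({reason or 'cause not given'})"
    return "variant"


records = {
    ("chr1", 1000): ("0/1", None),
    ("chr1", 2000): ("./.", "low coverage"),
}

print(interpret(records, "chr1", 1000))
print(interpret(records, "chr1", 2000))
print(interpret(records, "chr1", 3000))
```

The point of the sketch is that a lookup never returns an ambiguous answer: a missing key always means reference, while a no-call is always an explicit record.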
While similar to the gVCF (Genome VCF) format, Clinical+ VCFs are considerably smaller. gVCFs include three possible results (variant, reference, no-call), but the ‘reference’ result is extraneous and can be safely excluded because the same information can be obtained from the reference genome. Since the majority of calls in a gVCF are reference, excluding reference calls while still identifying no-calls means that a Clinical+ VCF provides the same data in a much smaller file.
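As a rough sketch of why dropping reference calls shrinks the file without losing information, the snippet below filters a toy list of gVCF-style calls. The record layout and the `drop_reference_calls` helper are hypothetical, and the example assumes the usual VCF genotype notation in which `0/0` is a homozygous reference call and `./.` is a no-call; real gVCFs typically group reference calls into blocks rather than listing one record per coordinate.

```python
from typing import Dict, List

# Hypothetical gVCF-style calls, simplified to one record per coordinate.
# Assumption: "0/0" marks a homozygous reference call and "./." a no-call.
gvcf_calls: List[Dict[str, object]] = [
    {"pos": 100, "genotype": "0/0"},
    {"pos": 101, "genotype": "0/1"},
    {"pos": 102, "genotype": "./."},
    {"pos": 103, "genotype": "0/0"},
]


def drop_reference_calls(calls: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep variants and no-calls; drop homozygous reference calls."""
    return [call for call in calls if call["genotype"] != "0/0"]


kept = drop_reference_calls(gvcf_calls)
print(len(gvcf_calls), "records in,", len(kept), "records kept")
```

Every dropped record can be recovered from the reference genome, while the no-call at position 102 stays explicit.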
Clinical+ VCFs also include comprehensive information that may have clinical relevance. This includes:
- Annotation with SnpEff, HGVS and VEP
- Addition of reference SNP IDs (rs) with SnpSift
- Interpretation with ClinVar
Clinical+ VCFs can be automatically generated from most genetic data files (such as FASTA, FASTQ, BAM and SEQUENCE.TXT) using the following apps:
There are three versions of Clinical+ VCFs:
Clinical+ WGS VCF
- Included: Calls, No-Calls and Coordinates not interrogated
- Coordinates not interrogated included as blocks
- Excluded: Homozygous Reference calls
Clinical+ Exome VCF
- Included: Calls, No-Calls and Coordinates not interrogated
- Coordinates not interrogated included as blocks
- Excluded: Homozygous Reference calls
Clinical+ Array VCF
- Included: Calls (including Homozygous Reference) and No-Calls
- Excluded: Coordinates not interrogated
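One consequence of the three versions is that an absent coordinate means something different depending on the version. The sketch below encodes the included and excluded categories listed above and derives what absence implies. The `ClinicalPlusVersion` class, the `VERSIONS` table and the `meaning_of_absent` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClinicalPlusVersion:
    """Which categories a Clinical+ VCF version lists explicitly."""
    includes_hom_ref: bool           # homozygous reference calls
    includes_not_interrogated: bool  # coordinates not interrogated


# Transcribed from the included/excluded lists for each version.
VERSIONS: Dict[str, ClinicalPlusVersion] = {
    "WGS": ClinicalPlusVersion(includes_hom_ref=False, includes_not_interrogated=True),
    "Exome": ClinicalPlusVersion(includes_hom_ref=False, includes_not_interrogated=True),
    "Array": ClinicalPlusVersion(includes_hom_ref=True, includes_not_interrogated=False),
}


def meaning_of_absent(version: str) -> List[str]:
    """Return the categories an unlisted coordinate can belong to."""
    spec = VERSIONS[version]
    excluded = []
    if not spec.includes_hom_ref:
        excluded.append("homozygous reference")
    if not spec.includes_not_interrogated:
        excluded.append("not interrogated")
    return excluded


for name in VERSIONS:
    print(name, "->", meaning_of_absent(name))
```

In each version exactly one category is excluded, so absence stays unambiguous: homozygous reference for WGS and Exome, not interrogated for Array.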