January 15, 2020

How To Use Whole Genome Sequencing Data

A straightforward guide for using whole genome sequencing data files from Dante Labs. If you've had your genome sequenced, you've discovered that your genome is far more than just one file.

How To Decide Which Genome Data File Is Best To Use With DNA Analysis Reports

When you have your whole genome sequenced, your genome can't fit into a single file. Instead, you'll receive several files in several different file formats. Dante Labs, for example, provides your genome spread across many different files and file formats.

This guide will help you understand what type of data each of the files provides and which files are best to use with DNA analysis apps and DNA reports.

 

How To Download Genome Sequencing Data 

Your genome sequencing data files are going to be massive. They'll range in size from around 30 GB (FASTQ and BAM files) to around 1 GB (VCFs).

While Dante Labs allows its customers to download all of the raw genome sequencing data files, downloading a 30 GB file can sometimes overwhelm your local computer. The first step is to make sure your computer has enough free hard drive space. You can then download your data files directly from your Dante Labs account.
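As a minimal sketch of the free-space check described above, the Python snippet below compares the free space in a folder against the size of the file you plan to download. The function name, the 5 GB safety margin, and the 30 GB figure in the usage line are illustrative assumptions, not part of any Dante Labs or Sequencing.com tool.

```python
import shutil

GB = 1000 ** 3  # bytes per gigabyte (decimal)


def has_room_for(download_gb, folder=".", safety_margin_gb=5):
    """Return True if `folder` has space for the download plus a margin.

    `safety_margin_gb` is an arbitrary cushion chosen for this sketch.
    """
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= (download_gb + safety_margin_gb) * GB


if __name__ == "__main__":
    # 30 GB is the approximate FASTQ/BAM size mentioned in this guide.
    print("Enough space for a 30 GB file:", has_room_for(30))
```

If the check reports that there isn't enough room, free up space or pick a different drive before generating a download link.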

Dante Labs has set each download link to expire 60 seconds after it's generated. This means you have to start the download immediately and can't save the download link for future use. If you are going to use the link in a download accelerator, make sure you copy and paste the link quickly so that the download starts before the 60 seconds are up.
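If you prefer a script to a browser or download accelerator, one possible approach is to open the link the moment you copy it and stream the response straight to disk, so a 30 GB file never has to fit in memory. This is only a sketch using Python's standard library. The function name and the placeholder link are hypothetical, and it assumes, as the paragraph above implies, that a download started within the 60 seconds is allowed to finish.

```python
import shutil
import urllib.request


def download_now(url, destination, chunk_bytes=1024 * 1024):
    """Open `url` immediately and stream it to `destination` in chunks."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # copyfileobj reads and writes one chunk at a time, so memory use
        # stays small no matter how large the file is.
        shutil.copyfileobj(response, out, chunk_bytes)
    return destination


if __name__ == "__main__":
    # Hypothetical placeholder: paste the freshly generated link here.
    link = input("Paste the download link: ").strip()
    download_now(link, "genome_download.bin")
```

The script has no retry logic because an expired link can't be reused. If the download fails, generate a new link and run it again.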

If you don't want to deal with the hassle of downloading and storing your genome files on your computer, we provide an alternative. To learn more, check out the next section for importing and storing genome sequencing data.

 

How to Import and Store Genome Sequencing Data For Free

You can use Sequencing.com's automatic importer to easily import all of your Dante Labs data files directly from your Dante Labs account into your Sequencing.com account.

Storage is unlimited, secure and free. This means you can import and save your genome data in your Sequencing.com account without having to worry about hard drive storage space or paying for cloud storage.

Once your genome data is imported into your Sequencing.com account, it's protected by our privacy and ownership policies. In short, you own your data and we help you keep it safe.

To import your data, simply go to the Upload Center and click the Dante Labs button.

 

How To Enhance Your Genome Data

One Genome is a new technology that automatically combines the highest quality data from each of your genome sequencing files into a single enhanced virtual genome.

Learn more about enhancing your genome

 

How To Use Genome Sequencing Data Files and File Formats

The table below provides important information about the genome sequencing data files most commonly provided by Dante Labs and other genome sequencing laboratories.

File Type | Filename | About | Relevance for DNA Analysis Apps | Notes
File Type: FASTQ
Filename: *.fq.gz or *.fastq.gz

About: A FASTQ file is the raw data from the sequencing machine. All other file types can be generated from the FASTQ. A genome is most often provided as a pair (2 large FASTQ files, one with R1 in the filename and the other with R2), but Dante Labs may also provide a genome as 14 or more smaller FASTQ files.

When your FASTQ files are imported into your Sequencing.com account, our system automatically detects all of the FASTQ files that compose a genome and links them together as a single dataset. When you use a FASTQ file with an app, simply select one of your FASTQ files and the app will analyze the entire dataset.

Number of FASTQ files per genome: most often 2 but could be as many as 16.

Relevance for DNA Analysis Apps: Excellent to use with apps.

Our system will automatically generate all of the different genetic variation data needed by the app. Instead of running an app once for each of your VCF files, you only need to run it once with your FASTQ.

Notes: Once imported into your Sequencing.com account, our system automatically identifies and links FASTQ files from the same genome together as a dataset.

If your genome is composed of 4 or more FASTQ files, your FASTQ files will only be linked together as a single genome if you use our automatic importer (by clicking the 'Dante Labs' button in the Upload Center).
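To get a feel for how a set of FASTQ files breaks down into R1/R2 pairs, the sketch below groups filenames by the R1 or R2 tag mentioned above. It is an illustration only and not the logic Sequencing.com uses to link files. The filenames are made up, and the assumption that the tag is set off by `_`, `.` or `-` may not hold for every lab's naming scheme.

```python
import re
from collections import defaultdict

# Assumption for this sketch: the read tag appears as R1 or R2 with a
# separator character ('_', '.' or '-') on each side.
READ_TAG = re.compile(r"(?<=[_.-])R([12])(?=[_.-])")


def pair_fastq_files(filenames):
    """Group FASTQ filenames into {"R1": ..., "R2": ...} pairs."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = READ_TAG.search(name)
        if match is None:
            continue  # no R1/R2 tag found, so leave this file ungrouped
        # Mask the read number so both mates of a pair share one key.
        key = name[:match.start()] + "R?" + name[match.end():]
        pairs[key]["R" + match.group(1)] = name
    return dict(pairs)


if __name__ == "__main__":
    example = [  # hypothetical filenames
        "genome_L01_R1.fq.gz", "genome_L01_R2.fq.gz",
        "genome_L02_R1.fq.gz", "genome_L02_R2.fq.gz",
    ]
    for key, mates in pair_fastq_files(example).items():
        print(key, mates)
```

A pair that is missing its R1 or R2 entry after grouping is a hint that a file did not finish downloading.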

File Type: BAM
Filename: *.bam

About: A BAM file is generated by aligning the FASTQ files to the reference genome. BAM is still considered raw DNA data.

Number of BAM files per genome: 1

Relevance for DNA Analysis Apps: Excellent to use with apps.

Our system will automatically generate all of the different genetic variation data needed by the app.

Notes: If your FASTQ files are already stored in your Sequencing.com account then this file isn't necessary to store. But for those who love data, it's still nice to have.

File Type: SNP VCF
Filename: *.snp.vcf.gz

About: VCF files are generated by analyzing the BAM file. The SNP VCF contains data on single nucleotide variations. The SNP VCF provided by Dante may be a regular VCF, which means it does not contain data on homozygous reference calls (SNPs that have the same result as the reference genome). This is done to provide a smaller, more manageable file.

Dante Labs may also provide a genome VCF, which will contain data for all calls, even those that are homozygous reference as well as those that are low quality. Genome VCFs are usually around 5-10 GB with gz compression, while regular VCFs are considerably smaller in file size. A regular VCF and a genome VCF will both have a filename that ends in .snp.vcf.gz so you cannot use the filename to differentiate.

Number of SNP VCF files per genome: 1

Relevance for DNA Analysis Apps: The best VCF file to use with apps.

If you don't have a FASTQ or BAM, use this file with apps.

Most DNA analysis apps primarily analyze SNP data, which is the type of data provided by this file.
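Because a regular VCF and a genome VCF share the same .snp.vcf.gz ending, one way to tell them apart is to look inside the file. The sketch below is a heuristic under stated assumptions, not an official check. It treats a file as a genome VCF if any of its early records is a homozygous reference call, recognized either by a placeholder ALT value (`.`, `<NON_REF>` or `<*>`, conventions used by common variant callers) or by a `0/0` genotype. The function name and the record limit are arbitrary choices for this example.

```python
import gzip

# ALT values that commonly mark reference-only records in genome VCFs.
REFERENCE_ALTS = {".", "<NON_REF>", "<*>"}
HOM_REF_GENOTYPES = {"0/0", "0|0"}


def looks_like_genome_vcf(path, max_records=10000):
    """Heuristically report whether a gzipped VCF has hom-ref records."""
    seen = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # not a complete VCF record
            if fields[4] in REFERENCE_ALTS:
                return True
            # Columns 9 and 10 hold FORMAT and the first sample, if present.
            if len(fields) >= 10 and fields[8].split(":")[0] == "GT":
                if fields[9].split(":")[0] in HOM_REF_GENOTYPES:
                    return True
            seen += 1
            if seen >= max_records:
                break
    return False
```

Only the first `max_records` records are inspected, which keeps the check fast on multi-gigabyte files. File size is a useful cross-check, since genome VCFs are considerably larger.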

 
File Type: INDEL VCF
Filename: *.indel.vcf.gz

About: Contains data on insertion and deletion variations.

Number of INDEL VCF files per genome: 1

Relevance for DNA Analysis Apps: Usually not the best file to use with apps.

If you don't have access to your FASTQ or BAM and want to view the insertions and deletions in your genome, this file can be used with Genome Explorer.

 
File Type: CNV VCF
Filename: *.cnv.vcf.gz

About: Contains data on copy number variations.

Number of CNV VCF files per genome: 1

Relevance for DNA Analysis Apps: Do not use with apps except for Genome Explorer.

Genome Explorer can be used to view and search the copy number variation data contained in this file.

 
File Type: SV VCF
Filename: *.sv.vcf.gz

About: Contains data on structural variations such as duplications and rearrangements. This file also contains data on very large insertions and deletions.

Number of SV VCF files per genome: 1

Relevance for DNA Analysis Apps: Do not use with apps except for Genome Explorer.

Genome Explorer can be used to view and search the structural variation data contained in this file.

 
File Type: MITO VCF
Filename: *.mito.vcf.gz

About: Contains data on mitochondrial heteroplasmy. This data file may only be provided under special circumstances, such as if it is specially ordered.

Number of MITO VCF files per genome: 1

Relevance for DNA Analysis Apps: Do not use with apps except for Genome Explorer.

Genome Explorer can be used to view and search the mitochondrial heteroplasmy data contained in this file.

Notes: If you weren't provided with a mito.vcf.gz file, our One Genome Technology can generate this file from either your FASTQ or BAM.

File Type: BAI
Filename: *.bam.bai
About: Contains index data.
Relevance for DNA Analysis Apps: Incompatible with all apps.
Notes: This file does not provide any relevant data beyond what is already provided by the BAM or FASTQ. While some third-party software may use BAI files, Sequencing.com does not.

File Type: TBI
Filename: *.vcf.gz.tbi
About: Contains index data.
Relevance for DNA Analysis Apps: Incompatible with all apps.
Notes: These files do not provide any relevant data beyond what is already provided by the BAM or FASTQ. While some third-party software may use TBI files, Sequencing.com does not.

File Type: PDF
Filename: *.pdf
About: Contains genetic reports.
Relevance for DNA Analysis Apps: Incompatible with apps (as these are analyzed reports).
Notes: These files can be stored in your account, securely shared with others and downloaded from your account whenever needed.
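The filename patterns in the table can be turned into a small lookup that labels each file in a download folder and suggests which one to use with apps. The sketch below is illustrative only. The function names are made up, and ranking FASTQ ahead of BAM is an assumption drawn from the note that a BAM isn't necessary once the FASTQ files are stored. The table itself rates both as excellent and recommends the SNP VCF only when neither is available.

```python
# Filename endings taken from the table above, mapped to file types.
SUFFIX_TO_TYPE = [
    (".fq.gz", "FASTQ"),
    (".fastq.gz", "FASTQ"),
    (".bam.bai", "BAI"),
    (".bam", "BAM"),
    (".snp.vcf.gz", "SNP VCF"),
    (".indel.vcf.gz", "INDEL VCF"),
    (".cnv.vcf.gz", "CNV VCF"),
    (".sv.vcf.gz", "SV VCF"),
    (".mito.vcf.gz", "MITO VCF"),
    (".vcf.gz.tbi", "TBI"),
    (".pdf", "PDF"),
]

# Assumed preference order for DNA analysis apps (see lead-in).
APP_PREFERENCE = ["FASTQ", "BAM", "SNP VCF"]


def classify(filename):
    """Return the file type for a filename, or None if unrecognized."""
    lower = filename.lower()
    for suffix, file_type in SUFFIX_TO_TYPE:
        if lower.endswith(suffix):
            return file_type
    return None


def best_file_for_apps(filenames):
    """Pick the first file of the most preferred type, or None."""
    for wanted in APP_PREFERENCE:
        for name in filenames:
            if classify(name) == wanted:
                return name
    return None


if __name__ == "__main__":
    files = ["me.indel.vcf.gz", "me.snp.vcf.gz", "me.snp.vcf.gz.tbi"]
    print(best_file_for_apps(files))
```

Index files (BAI, TBI) and PDF reports are classified but never suggested, matching the table's note that they are incompatible with apps.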

 

Dante Labs is a registered trademark of Dante Labs, Inc. The use of the name and logo is for compatibility information only and does not imply approval or endorsement of Sequencing.com by Dante Labs, Inc.
