How To Use Whole Genome Sequencing Data
The straightforward guide to using whole genome sequencing data files from genome sequencing test providers such as Sequencing.com, Dante Labs, Nebula Genomics and GeneDx.
How To Decide Which Genome Data File Is Best To Use With DNA Analysis Reports
When you have your whole genome sequenced, your genome can’t fit into a single file. Instead, you’ll receive several files in several different file formats. Dante Labs and Nebula Genomics, for example, both provide your genome spread across many different files and file formats.
This guide will help you understand what type of data each of the files provides and which files are best to use with DNA analysis apps and DNA reports.
How To Download Genome Sequencing Data such as those from Dante Labs and Nebula Genomics
Your genome sequencing data files are going to be massive. They’ll range in size from around 30 GB (FASTQ and BAM files) to around 1 GB (VCFs).
While Dante Labs, Nebula Genomics, Full Genomics, and most genome sequencing laboratories allow their customers to download all of the raw genome sequencing data files, downloading a 30 GB file can sometimes overwhelm your local computer. The first step is to make sure your computer has enough free hard drive space. You can then download your data files directly from your Dante Labs account.
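As a minimal sketch of that first step, the Python snippet below checks whether a folder has room for a download of a given size. The function name `has_room_for_download` and the 5 GB safety margin are illustrative assumptions, not part of any provider's tooling. The 30 GB and 1 GB figures are the approximate sizes quoted above.

```python
import shutil

GB = 10**9  # decimal gigabyte, the unit download sizes are usually quoted in


def has_room_for_download(directory, file_size_gb, safety_margin_gb=5.0):
    """Return True if `directory` has space for the file plus a safety margin.

    The safety margin is an arbitrary assumption; adjust it to taste.
    """
    free_bytes = shutil.disk_usage(directory).free
    needed_bytes = (file_size_gb + safety_margin_gb) * GB
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    # Approximate sizes taken from the guide: ~30 GB FASTQ/BAM, ~1 GB VCF.
    for label, size_gb in [("FASTQ or BAM", 30), ("VCF", 1)]:
        verdict = "enough space" if has_room_for_download(".", size_gb) else "NOT enough space"
        print(f"{label} (~{size_gb} GB): {verdict}")
```

Run it from the folder where you plan to save the files. Remember that a genome delivered as several FASTQ files needs room for all of them.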
Dante Labs has set each download link to expire 60 seconds after it’s generated. This means you have to immediately start the download and can’t save the download link for future use. If you are going to use the link in a download accelerator, make sure you copy and paste the link quickly so that the download starts within 60 seconds.
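If you prefer a script to a download accelerator, the sketch below starts streaming a link to disk as soon as it is called. It assumes, as the paragraph above implies, that a transfer which starts before the link expires is allowed to finish. The function name `download_now` and the `.part` suffix are illustrative choices, and the script does not know how Dante Labs generates its links, so you still need to paste in a fresh one yourself.

```python
import os
import shutil
import urllib.request


def download_now(url, destination, chunk_size=1024 * 1024):
    """Stream `url` to `destination` in chunks and return the bytes saved.

    Data is written to "<destination>.part" first so that an interrupted
    transfer is never mistaken for a complete genome file.
    """
    partial = destination + ".part"
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        # Copy in fixed-size chunks so a 30 GB file never has to fit in memory.
        shutil.copyfileobj(response, out, chunk_size)
    os.replace(partial, destination)  # rename only after the transfer finishes
    return os.path.getsize(destination)


# Example call (fresh_link is a placeholder for the URL you just copied):
# download_now(fresh_link, "genome_R1.fastq.gz")
```

Have the command ready before you generate the link, then paste the link and run it right away.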
If you don’t want to deal with the hassle of downloading and storing your genome files on your computer, we provide an alternative. To learn more, check out the next section for importing and storing genome sequencing data.
Dante Labs Download Problems [UPDATED 2022]
Around September 2020, some Dante Labs customers started to notice that they were seeing errors when trying to download data from their Dante Labs account.
As of 2022, Dante Labs customers continue to report that they are still unable to download their raw DNA data files from their Dante Labs account.
While there has been no official announcement, Dante’s support representative stated that going forward, Dante will no longer allow files to be downloaded for free. Instead, the only way to download some of the raw DNA files from your Dante Labs account will be to sign up for and maintain a Dante Labs subscription. We further discuss this change in our Dante Labs review.
How to Import and Store Genome Sequencing Data For Free
You can use Sequencing.com’s automatic importer to easily import all of your Dante Labs and Nebula Genomics data files directly into your Sequencing.com account.
Storage is unlimited, secure, and free. This means you can import and save your genome data in your Sequencing.com account without having to worry about hard drive storage space or paying for cloud storage.
Once your genome data is imported into your Sequencing.com account, it’s protected by our privacy and ownership policies. In short, you own your data and we help you keep it safe.
To import your data, simply go to the Upload Center and click the Dante Labs button.
How To Enhance Your Genome Data
One Genome is a new technology that automatically combines the highest-quality data from each of your genome sequencing files into a single enhanced virtual genome.
Learn more about enhancing your genome
Alternatives to Dante Labs, Nebula and Full Genomes
There are several providers of whole genome sequencing. While our DNA test provider comparison provides insight into the most popular DNA testing and genome sequencing services, you can also now order whole genome sequencing from Sequencing.com.
We provide our own clinical-grade 30x whole genome sequencing as part of our Whole Genome Sequencing service. Our genome sequencing service obtains data on 3 billion chromosomal coordinates including all autosomes (chromosomes 1-22) as well as the X, Y (males only) and MT chromosomes.
Since diploid data is provided, the total amount of data obtained is on approximately 6 billion chromosomal coordinates.
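The arithmetic behind these figures can be sketched in a few lines of Python. The 3 billion coordinates and the 30x depth come from the text above. The function names are illustrative, and the sketch takes 30x in its usual sense of an average read depth of 30 per position. The diploid total is approximate because the Y and MT chromosomes are not present in two copies.

```python
HAPLOID_COORDINATES = 3_000_000_000  # chromosomal coordinates quoted above
COPIES_PER_COORDINATE = 2            # diploid: one copy from each parent
AVERAGE_DEPTH = 30                   # "30x" sequencing


def diploid_coordinates(haploid=HAPLOID_COORDINATES, copies=COPIES_PER_COORDINATE):
    """Approximate number of coordinates once both copies are counted."""
    return haploid * copies


def sequenced_bases(haploid=HAPLOID_COORDINATES, depth=AVERAGE_DEPTH):
    """Approximate number of bases read by the sequencer at a given depth."""
    return haploid * depth


print(f"{diploid_coordinates():,} diploid coordinates")
print(f"{sequenced_bases():,} bases read at {AVERAGE_DEPTH}x")
```

The second figure helps explain why the raw FASTQ and BAM files are so much larger than the VCFs: every position is read about 30 times, and each base is stored with a quality score.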
Whole Genome Sequencing obtains data on:
- SNVs/SNPs (single nucleotide variants)
- INDELs (insertion deletion variants)
- CNVs (copy number variations)
- SVs (structural variations)
- Mitochondrial heteroplasmy
Data is aligned to GRCh38.p13 + rCRS MT and is provided in the following files and formats:
- Paired FASTQ
- Genome VCF (SNPs + INDELs)
- CNV VCF
- SV VCF
- TXT (Ultimate Compatibility File)
The Ultimate Compatibility File is a universally compatible txt file designed to work with third-party sites. For more information about this special file, please see our FAQs.
How To Use Genome Sequencing Data Files and File Formats
The table below provides important information about the genome sequencing data files most commonly provided by Dante Labs, Nebula Genomics, Sequencing.com, and other genome sequencing laboratories.
File Type | Filename | About | Relevance for DNA Analysis Apps | Notes |
---|---|---|---|---|
FASTQ | *.fq.gz or *.fastq.gz | A FASTQ file is the raw data from the sequencing machine. All other file types can be generated from the FASTQ. A genome is most often provided as a pair (2 large FASTQ files, one with R1 in the filename and the other with R2) but Dante Labs may also provide a genome as 14 or more smaller FASTQ files. When your FASTQ files are imported into your Sequencing.com account, our system automatically detects all of the FASTQ files that compose a genome and links them together as a single dataset. When you use a FASTQ file with an app, simply select one of your FASTQ files and the app will analyze the entire dataset. Number of FASTQ files per genome: most often 2 but could be as many as 16. | Excellent to use with apps. Our system will automatically generate all of the different genetic variation data needed by the app, so instead of running an app once for each of your VCFs, you only need to run it once with your FASTQ. | Once your FASTQ files are imported into your Sequencing.com account, our system automatically identifies and links FASTQ files from the same genome together as a dataset. If your genome is composed of 4 or more FASTQ files, your FASTQ files will only be linked together as a single genome if you use our automatic importer (by clicking the ‘Dante Labs’ button in the Upload Center). |
BAM | *.bam | A BAM file is a binary file generated by aligning the FASTQ files to the reference genome. BAM is still considered raw DNA data. Unlike FASTQs and VCFs, BAMs are not delivered with additional gzip (.gz) compression, because the BAM format is already compressed internally. Number of BAM files per genome: 1 | Excellent to use with apps. Our system will automatically generate all of the different genetic variation data needed by the app. | If your FASTQ files are already stored in your Sequencing.com account then this file isn’t necessary to store. But for those who love data, it’s still nice to have. |
CRAM | *.cram | A CRAM is very similar to a BAM. A sequencing service will usually provide a BAM or a CRAM but not both (since they are so similar). A CRAM file is generated by aligning the FASTQ files to the reference genome. CRAM is considered raw DNA data. Like BAMs, CRAMs are not delivered with additional gzip (.gz) compression, because the CRAM format is already compressed internally. Number of CRAM files per genome: 1 | Excellent to use with apps. Our system will automatically generate all of the different genetic variation data needed by the app. | If your FASTQ files are already stored in your Sequencing.com account then this file isn’t necessary to store. But for those who love data, it’s still nice to have. |
SNP VCF | *.snp.vcf.gz | VCF files are generated by analyzing the BAM file. The SNP VCF contains data on single nucleotide variations. The SNP VCF provided by Dante may be a regular VCF, which means it does not contain data on homozygous reference calls (SNPs that have the same result as the reference genome). This is done to provide a smaller, more manageable file. Dante Labs may also provide a genome VCF, which will contain data for all calls, even those that are homozygous reference as well as those that are low quality. Genome VCFs are usually around 5-10 GB with gzip (gz) compression, while regular VCFs are considerably smaller in file size. A regular VCF and a genome VCF will both have a filename that ends in .snp.vcf.gz so you cannot use the filename to differentiate. Number of SNP VCF files per genome: 1 | The best VCF file to use with apps. If you don’t have a FASTQ or BAM, use this file with apps. Most DNA analysis apps primarily analyze SNP data, which is the type of data provided by this file. | |
INDEL VCF | *.indel.vcf.gz | Contains data on insertion and deletion variations. Number of INDEL VCF files per genome: 1 | Usually not the best file to use with apps. If you don’t have access to your FASTQ or BAM and want the insertions and deletions in your genome analyzed, this file can be used with Genome Explorer. | |
CNV VCF | *.cnv.vcf.gz | Contains data on copy number variations. Number of CNV VCF files per genome: 1 | Do not use with apps except for Genome Explorer. Genome Explorer can be used to view and search the copy number variation data contained in this file. | |
SV VCF | *.sv.vcf.gz | Contains data on structural variations such as duplications and rearrangements. This file also contains data on very large insertions and deletions. Number of SV VCF files per genome: 1 | Do not use with apps except for Genome Explorer. Genome Explorer can be used to view and search the structural variation data contained in this file. | |
MITO VCF | *.mito.vcf.gz | Contains data on mitochondrial heteroplasmy. This data file may only be provided under special circumstances, such as if it is specially ordered. Number of MITO VCF files per genome: 1 | Do not use with apps except for Genome Explorer. Genome Explorer can be used to view and search the mitochondrial heteroplasmy data contained in this file. | If you weren’t provided with a mito.vcf.gz file, our One Genome Technology can generate this file from either your FASTQ or BAM. |
BAI | *.bam.bai | Contains index data. | Incompatible with all apps. | This file does not provide any relevant data beyond what is already provided by the BAM or FASTQ. While some third-party software may use BAI files, Sequencing.com does not. |
TBI | *.vcf.gz.tbi | Contains index data. | Incompatible with all apps. | These files do not provide any relevant data beyond what is already provided by the BAM or FASTQ. While some third-party software may use TBI files, Sequencing.com does not. |
| | | Contains genetic reports. | Incompatible with apps (as these are analyzed reports). | These files can be stored in your account, securely shared with others, and downloaded from your account whenever needed. |
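To apply the table to a folder of downloaded files, a small Python sketch like the one below can sort filenames by the suffixes listed above. The names `SUFFIX_TABLE`, `classify`, and `looks_like_genome_vcf` are illustrative, and the guidance strings are paraphrased from the table. Because a regular VCF and a genome VCF can share the same `.snp.vcf.gz` ending, the second function peeks inside the file instead. It relies on a heuristic assumption: a single-sample regular VCF normally lists only variant calls, so finding a homozygous reference genotype (`0/0`) among the first records suggests a genome VCF. Treat the result as a hint, not a guarantee.

```python
# Suffix -> (file type, guidance paraphrased from the table above).
# More specific suffixes come first so "*.vcf.gz.tbi" and "*.bam.bai"
# are recognized as index files rather than as a VCF or a BAM.
SUFFIX_TABLE = [
    (".vcf.gz.tbi", "TBI", "index file; incompatible with apps"),
    (".bam.bai", "BAI", "index file; incompatible with apps"),
    (".snp.vcf.gz", "SNP VCF", "best VCF to use with apps"),
    (".indel.vcf.gz", "INDEL VCF", "usually not the best choice; works with Genome Explorer"),
    (".cnv.vcf.gz", "CNV VCF", "Genome Explorer only"),
    (".sv.vcf.gz", "SV VCF", "Genome Explorer only"),
    (".mito.vcf.gz", "MITO VCF", "Genome Explorer only"),
    (".fastq.gz", "FASTQ", "excellent to use with apps"),
    (".fq.gz", "FASTQ", "excellent to use with apps"),
    (".bam", "BAM", "excellent to use with apps"),
    (".cram", "CRAM", "excellent to use with apps"),
]


def classify(filename):
    """Return (file type, guidance) for a filename, based on its suffix."""
    name = filename.lower()
    for suffix, file_type, guidance in SUFFIX_TABLE:
        if name.endswith(suffix):
            return file_type, guidance
    return "unknown", "not listed in the table"


def looks_like_genome_vcf(lines, max_records=10_000):
    """Heuristic: True if an early record is a homozygous reference call."""
    seen = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 10:  # need the FORMAT column and one sample column
            keys = fields[8].split(":")
            if "GT" in keys:
                genotype = fields[9].split(":")[keys.index("GT")]
                if genotype in ("0/0", "0|0"):
                    return True
        seen += 1
        if seen >= max_records:
            break
    return False


# Example usage on a gzip-compressed VCF (the path is a placeholder):
# import gzip
# with gzip.open("sample.snp.vcf.gz", "rt") as handle:
#     print(looks_like_genome_vcf(handle))
```

Only the first 10,000 records are inspected, so the check stays fast even on a multi-gigabyte file.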
Dante Labs is a registered trademark of Dante Labs, Inc. The use of the name and logo are for compatibility information only and does not imply approval or endorsement of Sequencing.com by Dante Labs, Inc.