January 15, 2020

How To Use Whole Genome Sequencing Data

How To Use Whole Genome Sequencing Raw Data Files From Dante Labs

A straightforward guide for using whole genome sequencing data files from Dante Labs. If you've had your genome sequenced, you've discovered that your genome is far more than just one file.

When you have your whole genome sequenced, the results can't fit into a single file. Instead, you'll receive several files in several different formats. Dante Labs, for example, spreads your genome data across many files and file formats.

This guide will help you understand what type of data each of the files provides and which files are best to use with DNA analysis apps.

 

How To Download Genome Sequencing Data 

Your genome sequencing data files are going to be massive. They'll range in size from around 30 GB (FASTQ and BAM files) to around 1 GB (VCFs).

While Dante Labs allows its customers to download all of the raw genome sequencing data files, downloading a 30 GB file can sometimes overwhelm your local computer. The first step is to make sure your computer has enough free hard drive space. You can then download your data files directly from your Dante Labs account.
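The free-space check can be scripted before you start a download. The following is a minimal sketch using Python's standard library. The function name `has_room_for` and the 5 GB safety margin are illustrative assumptions, and the sizes are the approximate figures given above.

```python
import shutil

GB = 1000 ** 3  # bytes per gigabyte (decimal)


def has_room_for(download_dir: str, file_size_gb: float, margin_gb: float = 5.0) -> bool:
    """Return True if download_dir has room for the file plus a safety margin.

    margin_gb is an assumed buffer, not a Dante Labs recommendation.
    """
    free_bytes = shutil.disk_usage(download_dir).free
    return free_bytes >= (file_size_gb + margin_gb) * GB


if __name__ == "__main__":
    # Approximate sizes from this guide: ~30 GB for FASTQ/BAM, ~1 GB for VCFs.
    for label, size_gb in [("FASTQ/BAM", 30), ("VCF", 1)]:
        verdict = "enough space" if has_room_for(".", size_gb) else "not enough space"
        print(f"{label} (~{size_gb} GB): {verdict}")
```

Pass the folder you plan to download into as `download_dir` so the check runs against the correct drive.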

Dante Labs has set each download link to expire 60 seconds after it's generated. This means you have to start the download immediately and can't save the download link for future use. If you are going to use the link in a download accelerator, make sure you copy and paste the link quickly so that the download starts before the 60 seconds are up.
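If you prefer a script to a browser or download accelerator, the idea is the same: open the link as soon as you have it and stream the file to disk in chunks so a 30 GB file never has to fit in memory. This is only a sketch. The function name `download_now` is hypothetical, the URL is whatever link your Dante Labs account generates, and it assumes a download that has already started keeps running after the link expires.

```python
import urllib.request


def download_now(url: str, dest_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Stream url to dest_path in chunks and return the number of bytes written.

    Call this right after copying the link, since the link itself is short-lived.
    """
    written = 0
    with urllib.request.urlopen(url, timeout=30) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read means the download is complete
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Comparing the returned byte count with the file size shown in your account is a quick way to spot a download that was cut short.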

If you don't want to deal with the hassle of downloading and storing your genome files on your computer, we provide an alternative. To learn more, check out the next section for importing and storing genome sequencing data.

 

How to Import and Store Genome Sequencing Data For Free

You can use Sequencing.com's automatic importer to easily import all of your Dante Labs data files directly from your Dante Labs account into your Sequencing.com account.

Storage is unlimited, secure and free. This means you can import and save your genome data in your Sequencing.com account without having to worry about hard drive storage space or paying for cloud storage.

Once your genome data is imported into your Sequencing.com account, it's protected by our privacy and ownership policies. In short, you own your data and we help you keep it safe.

To import your data, simply go to the Upload Center and click the Dante Labs button.

 

How To Enhance Your Genome Data

One Genome is a new technology that automatically combines the highest quality data from each of your genome sequencing files into a single enhanced virtual genome.

Learn more about enhancing your genome

 

How To Use Genome Sequencing Data Files and File Formats

The table below provides important information about the genome sequencing data files most commonly provided by Dante Labs and other genome sequencing laboratories.

File Type | Filename | About | Relevance for DNA Analysis Apps | Notes
FASTQ *.fq.gz or *.fastq.gz

A FASTQ file is the raw data from the sequencing machine. All other file types can be generated from the FASTQ. A genome is most often provided as a pair of large FASTQ files (one with R1 in the filename and the other with R2), but Dante Labs may also provide a genome as 14 or more smaller FASTQ files.

When your FASTQ files are imported into your Sequencing.com account, our system automatically detects all of the FASTQ files that compose a genome and links them together as a single dataset. When you use a FASTQ file with an app, simply select one of your FASTQ files and the app will analyze the entire dataset.

Number of FASTQ files per genome: most often 2 but could be as many as 16.

Excellent to use with apps.

Our system will automatically generate all of the different genetic variation data needed by the app. Instead of running an app once for each of your VCF files, you only need to run it once with your FASTQ.

Once imported into your Sequencing.com account, our system automatically identifies and links FASTQ files from the same genome together as a dataset.

If your genome is composed of 4 or more FASTQ files, your FASTQ files will only be linked together as a single genome if you use our automatic importer (by clicking the 'Dante Labs' button in the Upload Center).

BAM *.bam

A BAM file is generated by aligning the FASTQ files to the reference genome. BAM is still considered raw DNA data.

Number of BAM files per genome: 1

Excellent to use with apps.

Our system will automatically generate all of the different genetic variation data needed by the app.

If your FASTQ files are already stored in your Sequencing.com account then this file isn't necessary to store. But for those who love data, it's still nice to have.
SNP VCF *.snp.vcf.gz

VCF files are generated by analyzing the BAM file. The SNP VCF contains data on single nucleotide variations. The SNP VCF provided by Dante may be a regular VCF, which means it does not contain data on homozygous reference calls (SNPs that have the same result as the reference genome). This is done to provide a smaller, more manageable file.

Dante Labs may also provide a genome VCF, which contains data for all calls, including homozygous reference calls and low-quality calls. Genome VCFs are usually around 5-10 GB with gz compression, while regular VCFs are considerably smaller. A regular VCF and a genome VCF will both have a filename that ends in .snp.vcf.gz, so you cannot use the filename to tell them apart.

Number of SNP VCF files per genome: 1

The best VCF file to use with apps.

If you don't have a FASTQ or BAM, use this file with apps.

Most DNA analysis apps primarily analyze SNP data, which is the type of data provided by this file.

 
INDEL VCF *.indel.vcf.gz

Contains data on insertion and deletion variations.

Number of INDEL VCF files per genome: 1

Usually not the best file to use with apps.

If you don't have access to your FASTQ or BAM and want to view the insertions and deletions in your genome, this file can be used with Genome Explorer.

 
CNV VCF *.cnv.vcf.gz

Contains data on copy number variations.

Number of CNV VCF files per genome: 1

Do not use with apps except for Genome Explorer.

Genome Explorer can be used to view and search the copy number variation data contained in this file.

 
SV VCF *.sv.vcf.gz

Contains data on structural variations such as duplications and rearrangements. This file also contains data on very large insertions and deletions.

Number of SV VCF files per genome: 1

Do not use with apps except for Genome Explorer.

Genome Explorer can be used to view and search the structural variation data contained in this file.

 
MITO VCF *.mito.vcf.gz

Contains data on mitochondrial heteroplasmy. This data file may only be provided under special circumstances, such as if it is specially ordered.

Number of MITO VCF files per genome: 1

Do not use with apps except for Genome Explorer.

Genome Explorer can be used to view and search the mitochondrial heteroplasmy data contained in this file.

If you weren't provided with a mito.vcf.gz file, our One Genome Technology can generate this file from either your FASTQ or BAM.
BAI *.bam.bai

Contains index data.

Incompatible with all apps.

This file does not provide any relevant data beyond what is already provided by the BAM or FASTQ. While some third-party software may use BAI files, Sequencing.com does not.
TBI *.vcf.gz.tbi

Contains index data.

Incompatible with all apps.

These files do not provide any relevant data beyond what is already provided by the BAM or FASTQ. While some third-party software may use TBI files, Sequencing.com does not.
PDF *.pdf

Contains genetic reports.

Incompatible with apps (as these are analyzed reports).

These files can be stored in your account, securely shared with others and downloaded from your account whenever needed.
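The filename patterns in the table can be turned into a small lookup for sorting a folder of downloaded files. The sketch below is illustrative only: the names `classify` and `pick_for_apps` are hypothetical, and listing FASTQ ahead of BAM is an arbitrary choice, since the table rates both as excellent for apps.

```python
# Suffixes taken from the Filename column of the table above.
# Index suffixes come first so "x.snp.vcf.gz.tbi" is not mistaken for a VCF.
SUFFIX_TO_TYPE = [
    (".vcf.gz.tbi", "TBI"),
    (".bam.bai", "BAI"),
    (".snp.vcf.gz", "SNP VCF"),
    (".indel.vcf.gz", "INDEL VCF"),
    (".cnv.vcf.gz", "CNV VCF"),
    (".sv.vcf.gz", "SV VCF"),
    (".mito.vcf.gz", "MITO VCF"),
    (".fastq.gz", "FASTQ"),
    (".fq.gz", "FASTQ"),
    (".bam", "BAM"),
    (".pdf", "PDF"),
]

# Preference order for DNA analysis apps, following the table:
# FASTQ or BAM if available, otherwise the SNP VCF.
APP_PREFERENCE = ["FASTQ", "BAM", "SNP VCF"]


def classify(filename: str) -> str:
    """Return the table's file type for a filename, or 'unknown'."""
    name = filename.lower()
    for suffix, file_type in SUFFIX_TO_TYPE:
        if name.endswith(suffix):
            return file_type
    return "unknown"


def pick_for_apps(filenames):
    """Return (file_type, files) for the most preferred type that is present."""
    by_type = {}
    for filename in filenames:
        by_type.setdefault(classify(filename), []).append(filename)
    for file_type in APP_PREFERENCE:
        if file_type in by_type:
            return file_type, sorted(by_type[file_type])
    return None, []


if __name__ == "__main__":
    files = ["me.snp.vcf.gz", "me.snp.vcf.gz.tbi", "me.indel.vcf.gz", "me.pdf"]
    for f in files:
        print(f, "->", classify(f))
    print("Use with apps:", pick_for_apps(files))
```

Note that the suffix alone cannot distinguish a regular VCF from a genome VCF, since both end in .snp.vcf.gz.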

 

Dante Labs is a registered trademark of Dante Labs, Inc. The use of the name and logo is for compatibility information only and does not imply approval or endorsement of Sequencing.com by Dante Labs, Inc.
