Whole genome sequencing is a genetic testing technology that obtains comprehensive data on every gene and all of your chromosomes in your DNA. While other DNA tests obtain data on either one gene (PCR) or spots within your DNA (genotyping microarrays), whole genome sequencing is different.
WGS is the acronym for whole genome sequencing. It’s a commonly used abbreviation when discussing whole genome sequencing. They are used interchangeably as they mean the same thing.
WGS obtains data on every chromosomal coordinate from the beginning of chromosome 1 to the end of chromosome 1. Sequencing then obtains data on all of chromosome 2. Then chromosome 3 and so on. It also includes full sequencing of the sex chromosomes (chromosome X and, in males, chromosome Y) as well as the mitochondrial chromosome.
WGS is different from the types of DNA tests used by laboratories such as MyHeritage, 23andMe, and FamilyTreeDNA. Those companies use a type of genetic test known as genotyping using DNA microarrays. Genotyping obtains data on ‘spots’ throughout the genome. For example, 23andMe’s DNA test obtains data on around 700,000 data points throughout a person’s genome while WGS obtained data on around 3 billion data points (100% of the genome).
This means that the types of tests performed by many of the direct-to-consumer genetic testing companies actually test less than 0.1% of the genome!
You may have seen WGS appear with a number and an ‘x’ before it, such as 0.4x WGS or 30x WGS. Those numbers refer to the ‘depth’ of sequencing. 30x WGS means that the genome has been sequenced 30 times. Another way to state is this that the genome was sequenced to a depth of 30.
Why would someone need to sequence their genome more than once? The ‘30x’ actually means that the laboratory that sequenced the genome repeated the sequencing 30 times. The data from each of the 30 sequences are combined together and analyzed by a powerful computer.
While sequencing a genome one time may include errors in the data, having access to multiple copies of the same sequence enables the computer to quickly exclude any errors. When there are 30 copies of the sequence, which is 30x WGS, the computer software is able to generate a single highly accurate genome sequence.
As an example, imagine if you were to copy the dictionary from the first page to the last page. No matter how careful a person is, everyone would end up making some errors. Now imagine if you copied the dictionary another 29 times. You’d definitely have a very sore hand, and you’d have 30 copies that each have some errors.
In copy 1, an error may be made on page 488 but in the other 29 copies, page 488 was copied perfectly. If a computer had access to all 30 copies that you wrote, it would be able to easily identify and exclude the error on page 488 since it has 29 other copies to compare it to.
Now imagine copying six billion letters. This is what a sequencing machine does because each person’s genome contains six billion letters! While an advanced sequencing technology may be 99.9998% accurate, this means that an error will occur 0.0001% of the time.
An error rate of 0.0001% while sequencing six billion letters in a genome means that there will be 6,000 errors every time a genome is sequenced! But if a genome is sequenced 30 separate times, a computer program is able to easily identify those errors and create a single, highly accurate sequence.
This is why sequencing a genome 30 times is so important. It means the sequence that you’ll receive is highly accurate!
If 30x means sequencing the same genome 30 times, what does 0.4x mean? We answer this question in our other blog post: What is 0.4x and 30x Whole Genome Sequencing?
During WGS, a DNA sample is collected from a person. The sample contains chromosomal DNA as well as DNA contained in the mitochondria. The DNA is then broken into smaller fragments, and making use of new technologies that allow for quick sequencing, the combination of nucleotides (A, T, C, and G bases) are analyzed.
Once the DNA is sequenced, it is then put back together in the correct order using bioinformatics approaches. This step is referred to as “assembly.” When the genome assembly is completed, researchers, physicians, and citizen scientists can conduct further analysis such as determining whether genetic variations are associated with health conditions.
There are several methods of DNA sequencing available. Whole genome sequencing and whole exome sequencing are the two methods most used in healthcare and research to identify genetic variations. However, whole genome sequencing should not be confused with whole exome sequencing because whole exome sequencing only analyzes less than 1% of the genome.
Both approaches can use third-generation or second-generation sequencing technologies. SMRT sequencing, Illumina dye sequencing, and pyrosequencing are examples of these technologies. Other technologies developed recently use novel approaches. For example, in nanopore sequencing, a strand of DNA is passed through a protein nanopore offering real-time and direct DNA sequencing.
It’s huge! A single genome contains around six billion letters.
When your genome is sequenced, it is stored as data files such as fastq, bam and vcf. The file containing a single person’s whole genome can be more than 300 GB in size! That means one person’s genome is larger than the entire hard drive of many computers!
This very large size usually means it is difficult to store your genome on your computer and instead most people need to store their genome in the cloud.
Sequencing.com is designed to provide just this: safe, confidential storage of your whole genome so you and your healthcare providers can always securely access and obtain value from it.
The very first human genome sequencing was completed in 2003 as part of the international scientific research Human Genome Project. It cost $2.7 billion and took 15 years.
Today, it can be accomplished in as little as a few weeks and is increasingly priced at less than $1,000. As a result of this incredible price reduction, whole genome sequencing is becoming an affordable option for many people.
This is important because whole genome sequencing is our most powerful tool for testing for genetic disorders such as mutations that drive cancer development, heart disease, medication reactions and even tracking infectious disease outbreaks.
As a result of this, the Food and Drug Administration (FDA) is laying the foundation for the use of whole-genome sequencing in public health by identifying the genomic sequence of pathogens associate in foodborne illness outbreaks.
Whole genome sequencing is also becoming particularly useful in creating personalized treatment plans for patients with cancer and some genetic conditions. It is also is starting to become the norm for citizen scientists and other consumers looking to learn a little bit more about their health.
In fact, using Sequencing.com, consumers who have had whole genome sequencing can upload their genome sequencing data, store it for free and then analyze their entire genome with apps from our DNA App Store.
Our Ultimate Genome Sequencing service provides affordable, clinical-grade whole genome sequencing. You can purchase it directly from our website.
By using our apps with a whole genome sequence, consumers can learn about their health, nutrition, fitness, beauty, and ancestry in the most accurate way possible since all of their DNA is available for analysis.