Whole Genome Shotgun Sequencing
By Dr. Brandon Colby MD, a physician-expert in the fields of Genomics and Personalized Preventive Medicine.
For most of recorded history, humans didn’t understand how genetics worked or what DNA was. But in recent decades, the field of genomics has made incredible advancements and developed new techniques to study and understand DNA, among them DNA sequencing.
As you might imagine, DNA sequencing today is quite different from what it was just a few decades ago. Shotgun sequencing was one of the earliest methods used to sequence full genomes. Let’s learn more about this fascinating technology that's changed the world!
What Does Shotgun Sequencing Mean?
As its name suggests, shotgun sequencing is a method used for DNA sequencing. The term “shotgun sequencing” originated as a result of the similarity between the random, explosive firing of a shotgun and the random manner in which this sequencing approach breaks DNA into smaller fragments before sequencing its base pairs.1
Shotgun sequencing has been used for a relatively long time: it was first used in 1981 to sequence the genome of the cauliflower mosaic virus, a virus that infects plants. Later on, many different laboratories used the approach to sequence the genomes of microbes such as Haemophilus influenzae, Saccharomyces cerevisiae, and Escherichia coli K-12.
Sequencing microbial genomes may seem simple now, but the fact that scientists could sequence entire genomes decades ago was groundbreaking. At the time, even relatively simple genetic material from viruses, bacteria, plasmids, genes, and plant genomes was extremely challenging to sequence, let alone complex genomes such as human DNA.
However, early sequencing methods such as shotgun sequencing enabled researchers to undertake large-scale sequencing projects long before modern sequencing technologies became available. Traditional shotgun sequencing is based on Sanger sequencing (the chain termination method), one of the first sequencing methods ever developed. Because Sanger sequencing can only read relatively short stretches of DNA, typically up to about 1,000 base pairs, longer sequences have to be broken into pieces that are sequenced separately and then put back together. Chromosome walking, which works through a long DNA strand piece by piece, and shotgun sequencing, which uses random fragments, are the two main strategies for doing this.
How Whole Genome Shotgun Sequencing Works
In shotgun sequencing, the DNA being analyzed is randomly broken into many smaller pieces, which are then sequenced separately using the chain termination method. Several rounds of fragmentation and sequencing are performed on the same DNA sample to obtain many overlapping reads that together cover the entire target genome.
Once this is done, bioinformatics programs compare the reads and look for places where the end of one read matches the beginning of another. Sequence assembly software uses these overlaps to join the reads in the correct order, producing longer continuous stretches of reconstructed sequence called contigs.
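To make the idea concrete, here is a minimal Python sketch that simulates random fragmentation. It is a simplified model, not a laboratory protocol: it assumes a short hypothetical genome, a single strand, fixed-length reads, uniformly random start positions, and no sequencing errors. The helper `shotgun_reads` and its parameters are hypothetical names chosen for illustration.

```python
import random

def shotgun_reads(genome, read_len, coverage, seed=0):
    """Simulate shotgun sequencing of a toy genome (hypothetical helper).

    Assumptions: one strand only, no sequencing errors, fixed read length,
    start positions drawn uniformly at random. Returns (start, read) pairs;
    a real experiment never gets to see `start`.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    # Enough reads that total sequenced bases ~= coverage * genome length.
    n_reads = round(coverage * len(genome) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((start, genome[start:start + read_len]))
    return reads

genome = "ATGCGTACGTTAGCCGATAGGCTA"  # hypothetical 24-bp toy genome
for start, read in sorted(shotgun_reads(genome, read_len=10, coverage=5)):
    print(start, read)
```

Because the start positions are random, some bases end up covered many times while others may be missed. In a real experiment the start positions are unknown, and recovering the original order is exactly what the assembly step has to do.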
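Overlap-based assembly can be illustrated with a simple greedy strategy: repeatedly merge the two sequences that share the longest suffix-to-prefix overlap until no overlaps remain, leaving contigs. This Python sketch assumes error-free reads from a single strand of a hypothetical 24-bp genome and a hand-picked minimum overlap; the function names are hypothetical, and real assemblers use more robust methods (such as overlap graphs or de Bruijn graphs) that cope with errors and repeats.

```python
def overlap_len(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Greedy overlap assembly of error-free reads into contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads lying entirely inside another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        # Find the ordered pair (i, j) with the longest overlap of contig i onto contig j.
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # no overlaps left: each remaining string is a contig
        merged = contigs[best_i] + contigs[best_j][best_n:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        # Discard anything already contained in the newly merged contig.
        contigs = [c for c in rest if c not in merged] + [merged]

reads = ["ATGCGTACGT", "GTACGTTAGC", "GTTAGCCGAT", "GCCGATAGGC", "CGATAGGCTA"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGCCGATAGGCTA']
```

The greedy choice is easy to follow, but on repetitive DNA the longest overlap is not always the true one, which is one reason production assemblers are more sophisticated.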
As DNA sequencing became more refined and researchers attempted to sequence larger genomes, a technique called double-barrelled shotgun sequencing, or pairwise end sequencing, came into wide use.
In double-barrelled shotgun sequencing, both ends of each DNA fragment are sequenced. Because the approximate length of the fragment is known, each pair of reads also reveals roughly how far apart its two sequences lie in the genome. This makes reassembling the original target genome much faster and lets read pairs bridge gaps and repetitive regions that a single read cannot span. This technique is also known as paired-end sequencing, and it can be combined with traditional shotgun sequencing to fill in gaps in sequence information.
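The paired-end idea can be sketched in Python under simplifying assumptions: the insert (fragment) size is known exactly rather than as a size distribution, contig A lies to the left of contig B, and both contigs are in forward orientation. All function names are hypothetical. The second read comes from the opposite DNA strand, which is why it is reverse-complemented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read in the usual 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def read_pair(fragment, read_len):
    """Sequence read_len bases inward from each end of one fragment."""
    return fragment[:read_len], reverse_complement(fragment[-read_len:])

def estimate_gap(insert_size, read_len, contig_a_len, pos_in_a, pos_in_b):
    """Estimate the gap between contig A (left) and contig B (right) from one read pair.

    pos_in_a: 0-based start of read 1 in contig A.
    pos_in_b: 0-based start, in contig B, of the reverse complement of read 2.
    The fragment covers (contig_a_len - pos_in_a) bases of A, then the gap,
    then (pos_in_b + read_len) bases of B; together these equal insert_size.
    """
    return insert_size - (contig_a_len - pos_in_a) - (pos_in_b + read_len)

print(read_pair("ACGTTTTTGGCA", 3))  # ('ACG', 'TGC')
# A 50-bp fragment whose read 1 starts 30 bp into a 40-bp contig A, and whose
# read 2 maps 22 bp into contig B, implies a 10-bp gap between the contigs.
print(estimate_gap(insert_size=50, read_len=8, contig_a_len=40, pos_in_a=30, pos_in_b=22))  # 10
```

This is how read pairs can order and space contigs even when no single read crosses the gap between them.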
Shotgun sequencing has certain disadvantages.2 It requires significant software and computing resources, and it’s prone to ambiguities and sequencing errors, especially when it’s used to sequence complex genomes without a reference sequence. Complex genomes contain many repetitive sequences, in which the same stretch of nucleotides appears in several places. When a repeat is longer than the reads, reads from different copies look identical, so the assembly software cannot tell where each read belongs, which increases the risk of genome assembly errors.
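The repeat problem can be demonstrated with two hypothetical toy genomes that contain three copies of the same 8-bp repeat but differ in the order of the unique stretches between them. The sketch assumes ideal, error-free reads starting at every position. When reads are no longer than the repeat plus one base, both genomes produce exactly the same collection of reads, so no assembler could tell them apart; reads long enough to span a whole repeat copy plus its neighbors resolve the ambiguity.

```python
from collections import Counter

def kmer_spectrum(seq, k):
    """Count every length-k substring: an idealized, error-free read at every position."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

REPEAT = "ACGTTGCA"                        # hypothetical 8-bp repeated element
X, Y, Z, W = "TTTT", "GGG", "CCC", "AAAA"  # hypothetical unique stretches

genome_1 = X + REPEAT + Y + REPEAT + Z + REPEAT + W
genome_2 = X + REPEAT + Z + REPEAT + Y + REPEAT + W  # Y and Z swapped

for read_len in (6, 12):
    same = kmer_spectrum(genome_1, read_len) == kmer_spectrum(genome_2, read_len)
    print(f"read length {read_len}: reads identical for both genomes = {same}")
# read length 6: reads identical for both genomes = True
# read length 12: reads identical for both genomes = False
```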
Another variation of this method is hierarchical shotgun sequencing, which can be used to sequence larger genomes. In this method, the genome is first broken into large fragments, which are cloned into bacterial artificial chromosomes (BACs) or P1-derived artificial chromosomes (PACs). Overlapping clones are then ordered into contigs to create a preliminary physical map of the genome. Polymerase chain reaction (PCR) can also be used to amplify small genomic regions.
Then, each mapped clone is sheared into smaller pieces and sequenced using the shotgun method to complete the sequencing process. Because a low-resolution physical map has to be created first, hierarchical shotgun sequencing is slower than whole genome shotgun sequencing; however, it requires fewer computational resources and relies less on complex assembly algorithms.
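The advantage of having a map first can be sketched in Python. Because the physical map already fixes the order of the clones, joining them only requires comparing each clone with its neighbor rather than every sequence with every other. This sketch assumes each clone has already been shotgun-sequenced and assembled into a single error-free sequence and that the map positions are exact; the clone sequences, positions, and function names are hypothetical. A real pipeline would also demand substantial overlaps and check them against the map.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def join_mapped_clones(clones):
    """Join locally assembled clone sequences in physical-map order.

    clones: (map_position, sequence) pairs; the map supplies the order,
    so only neighboring clones are ever compared.
    """
    ordered = [seq for _, seq in sorted(clones)]
    genome = ordered[0]
    for seq in ordered[1:]:
        genome += seq[overlap_len(genome, seq):]  # append only the non-overlapping part
    return genome

clones = [(16, "ATAGGCTA"), (0, "ATGCGTACGTTA"), (8, "GTTAGCCGATAG")]  # hypothetical clones
print(join_mapped_clones(clones))  # ATGCGTACGTTAGCCGATAGGCTA
```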
How the Shotgun Approach Differs from Whole Genome Sequencing
Shotgun sequencing was one of the earliest techniques developed to determine the exact order of nucleotides (base pairs) in the DNA of different organisms. However, recent years have seen the development of many new and innovative sequencing technologies for whole genome sequencing that have cut sequencing costs while making the process more accessible.
So, does shotgun sequencing still have a place in modern genomics? Let's find out.
Is Shotgun Sequencing Still Used Today?
Whole genome shotgun sequencing was used by Craig Venter, founder of Celera Genomics, in the company’s effort to sequence the human genome.3 Celera combined publicly available data from the Human Genome Project with its own sequence data, generated on hundreds of sequencing machines, and used computer software to assemble the human genome.
The resulting sequence was broadly similar to that produced by the Human Genome Project. In June 2000, Venter, renowned geneticist Francis Collins, and then-President of the United States Bill Clinton announced the completion of working drafts of the human genome.4
Shotgun sequencing can be used to improve the accuracy of pre-existing sequence data, such as the human reference genome. It can also be used to correct errors or fill in data gaps left by other DNA sequencing methods.
Newer sequencing technologies, commonly known as next-generation sequencing (NGS) or high-throughput sequencing, have become more widely used in recent years. These techniques are highly automated and massively parallel, which makes them more affordable and less time-consuming than traditional sequencing methods. The shotgun strategy has not disappeared, however: NGS-based WGS still typically breaks DNA into random fragments and sequences them in parallel, and it is most commonly paired with short reads rather than long reads.5
Newer techniques are far more efficient than Sanger methods because they provide greater coverage in a shorter period of time. However, traditional Sanger sequencing is still commonly used to sequence specific DNA regions or genes, such as the PROC gene.
Are All Genome Sequencing Technologies The Same?
Short answer: no. Over the years, biotechnology companies and researchers have developed many different sequencing technologies, and numerous others are being developed even as you read this.
How Illumina Sequencing Differs from Nanopore Sequencing
Illumina dye sequencing uses sequencing by synthesis (SBS), which is based on reversible dye terminators. In this method, DNA molecules are attached to a slide (flow cell) and amplified to form clusters of identical copies. In each cycle, fluorescently labeled reversible terminator bases (RT-bases) are added; each strand incorporates a single base, a camera records the color emitted by each cluster to identify that base, and the dye and terminating group are then removed and washed away so the next base can be added.
This cycle is repeated once for each base in the read, and because millions of clusters are imaged at the same time, very large numbers of DNA fragments can be sequenced in parallel. With modern technologies, Illumina dye sequencing can resequence a human genome at 30x coverage in approximately one day.6
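The cycle-by-cycle logic of sequencing by synthesis can be sketched as a toy base caller: in each cycle, every cluster gives off a signal in one of four color channels, and the brightest channel is taken as that cycle's base. This is a simplified illustration, not Illumina's base-calling software. It assumes one dye channel per base in a hypothetical A, C, G, T order, uses made-up intensities, and ignores real-world complications such as phasing, channel cross-talk, and the two-color chemistry used on some instruments.

```python
CHANNEL_BASES = "ACGT"  # hypothetical: one fluorescent channel per base, in this order

def call_bases(intensities):
    """Toy base caller for sequencing by synthesis.

    intensities[cycle][cluster] is a 4-tuple of channel brightnesses.
    Each cycle adds one base to every cluster's read: the brightest channel.
    """
    n_clusters = len(intensities[0])
    reads = [[] for _ in range(n_clusters)]
    for cycle in intensities:
        for cluster, channels in enumerate(cycle):
            brightest = max(range(len(CHANNEL_BASES)), key=lambda i: channels[i])
            reads[cluster].append(CHANNEL_BASES[brightest])
    return ["".join(bases) for bases in reads]

# Three imaging cycles for two clusters (made-up intensities).
intensities = [
    [(0.9, 0.1, 0.0, 0.1), (0.1, 0.1, 0.8, 0.0)],
    [(0.0, 0.7, 0.1, 0.1), (0.1, 0.1, 0.1, 0.9)],
    [(0.1, 0.0, 0.2, 0.95), (0.85, 0.1, 0.0, 0.1)],
]
print(call_bases(intensities))  # ['ACT', 'GTA']
```

Read length equals the number of cycles, which is why a 150-base read needs 150 rounds of incorporation and imaging.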
Nanopore sequencing, on the other hand, uses an electric current to sequence DNA. A single DNA strand is threaded through a nanopore, a minuscule protein pore set in a membrane through which an ionic current flows. As the strand moves through the pore, the nucleotides inside it disrupt the current in characteristic ways, and software decodes these changes in current to determine the DNA sequence.
Nanopore sequencing can produce long reads quickly and in real time, and it requires relatively little sample preparation to provide reliable results.7
How Illumina Sequencing Differs from BGI's MGI Sequencing
BGI’s MGI sequencing uses a technology called DNBSEQ™. DNA fragments are copied by rolling circle amplification into DNA nanoballs (DNBs), which are loaded onto a patterned array, or chip. Primers and sequencing reagents are pumped across the chip, and images are taken to identify the incorporated nucleotides. MGI’s software then converts these images into DNA sequence data, which can be analyzed and annotated. MGI sequencing has a low amplification error rate and high accuracy.8
BGI-MGI sequencers can produce high-quality sequencing data at lower prices than Illumina sequencers.9 In fact, numerous studies suggest that MGI sequencers are an excellent, affordable alternative to traditional Illumina sequencers.
The Difference Between Short Read and Long Read Sequencing
As their names suggest, the main difference between short-read and long-read sequencing is the length of the DNA reads each method produces. This may not sound like a significant difference, but the human genome contains roughly 3 billion base pairs, so the number of reads needed to sequence it has a huge impact on how long WGS takes and how much it costs.
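The arithmetic behind this is simple: total sequenced bases is approximately coverage × genome size, so the number of reads needed is roughly coverage × genome size ÷ read length. The Python sketch below assumes a human genome size of about 3.1 billion base pairs and ignores duplicate, low-quality, and unmapped reads, so real runs need more data than this minimum. The function name is hypothetical.

```python
import math

def reads_needed(genome_size_bp, read_len_bp, coverage):
    """Minimum reads so that total sequenced bases reach coverage x genome size."""
    return math.ceil(coverage * genome_size_bp / read_len_bp)

HUMAN_GENOME_BP = 3_100_000_000  # approximate human genome size (assumption)

for read_len in (150, 10_000):
    n = reads_needed(HUMAN_GENOME_BP, read_len, 30)
    print(f"{read_len:,} bp reads: {n:,} reads for 30x")
# 150 bp reads: 620,000,000 reads for 30x
# 10,000 bp reads: 9,300,000 reads for 30x
```

For 150 bp paired-end sequencing, 620 million reads corresponds to about 310 million read pairs.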
Short-read sequencing, also known as second-generation sequencing, can produce 1 million to 43 million short DNA reads that contain 50 to 400 bases each. Long-read sequencing or third-generation sequencing, on the other hand, can routinely produce reads of more than 10 kb.10
Many of the most popular NGS platforms that are currently in use deliver short-length reads, which can limit the identification of genetic variants that can play a significant role in human health.
Emerging next-generation platforms that provide long-read sequencing are still under development, and long-read sequencing has already been used to research the origins of specific genetic disorders. These newer methods could lower sequencing costs even further while reducing the time it takes to sequence a genome.
Additionally, long-read sequencing methods require minimal sample processing compared with older methods. These sequencers are also smaller than ever before; some newer sequencers are about the size of a USB flash drive. This could improve portability and make it possible to transport sequencers easily to different healthcare and research institutions, and even to underserved communities in many parts of the world.
What Does Sequencing Depth Mean?
The best ways to reduce errors in shotgun sequencing are to align reads to a reference genome and to ensure sufficient coverage. Coverage, also known as sequencing depth, refers to the number of reads that include a given nucleotide position in the aligned or reassembled DNA sequence.
The breadth of coverage in a sequencing project is the percentage of target bases that have been sequenced at least a specified number of times. Mean coverage, by contrast, describes how many times each base is sequenced on average: at 30x coverage, each position in the target genome is covered by about 30 reads, whereas at 0.4x coverage the average is only 0.4 reads per position, so most bases are not sequenced at all.
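Once reads have been aligned, depth and breadth are straightforward to compute. The Python sketch below represents each aligned read only by its start and end position on the reference (hypothetical values, 0-based half-open intervals) and ignores base and mapping quality; the function names are hypothetical. In practice, tools such as SAMtools compute these statistics directly from alignment files.

```python
from itertools import accumulate

def depth_per_base(genome_len, alignments):
    """Per-position sequencing depth from aligned reads.

    alignments: 0-based, half-open (start, end) intervals, one per aligned read.
    Uses a difference array: +1 where a read starts, -1 just past where it ends.
    """
    diff = [0] * (genome_len + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    return list(accumulate(diff))[:genome_len]

def mean_depth(depths):
    """Average number of reads covering each position."""
    return sum(depths) / len(depths)

def breadth_of_coverage(depths, min_depth):
    """Fraction of target bases covered by at least min_depth reads."""
    return sum(d >= min_depth for d in depths) / len(depths)

depths = depth_per_base(10, [(0, 5), (3, 8), (6, 10)])  # three hypothetical reads
print(depths)                          # [1, 1, 1, 2, 2, 1, 2, 2, 1, 1]
print(mean_depth(depths))              # 1.4
print(breadth_of_coverage(depths, 2))  # 0.4
```

Here the mean depth is 1.4x, yet only 40 percent of positions are covered by two or more reads, which is why depth and breadth are reported separately.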
Higher coverage improves sequencing accuracy. It also helps ensure that variants such as single-nucleotide polymorphisms (SNPs), duplications, and copy number variants (CNVs) are properly identified.
Human DNA is approximately 99.9 percent identical from one person to the next. However, the remaining 0.1 percent, which still amounts to roughly 3 million positions across a genome of about 3 billion base pairs, is what makes each individual’s DNA sequence unique. That’s why it’s so important to achieve a high degree of accuracy when sequencing a human genome.
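A common back-of-the-envelope model for how mean coverage translates into well-sequenced bases is the Lander-Waterman (Poisson) approximation, which assumes reads start uniformly at random across the genome. Real data are less uniform (for example, because of GC bias), so treat these numbers as idealized. The function name is hypothetical.

```python
import math

def fraction_at_least(mean_coverage, min_depth):
    """Poisson (Lander-Waterman) estimate of the fraction of bases covered by
    at least `min_depth` reads, assuming reads start uniformly at random."""
    # P(depth < min_depth) = sum over i < min_depth of e^-C * C^i / i!
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(min_depth)
    )
    return 1 - below

for coverage in (0.4, 5, 30):
    print(f"{coverage}x: {fraction_at_least(coverage, 1):.1%} of bases covered at least once")
# 0.4x: 33.0% of bases covered at least once
# 5x: 99.3% of bases covered at least once
# 30x: 100.0% of bases covered at least once
```

In this model, 0.4x coverage leaves about two thirds of bases unsequenced (e^-0.4 ≈ 0.67), whereas at 30x fewer than 1 in 100,000 bases would be expected to have fewer than 10 supporting reads, which gives variant callers enough evidence to work with.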
What Technology Does Sequencing.com Use?
At Sequencing.com, we use MGI DNBSEQ-T7 genome sequencers to provide 30x whole genome sequencing with 150 bp paired-end reads. We analyze raw genetic data with our own bioinformatics pipeline, which integrates BWA-MEM2 and SAMtools with proprietary software.
The Cost of Whole Genome Sequencing
Initially, sequencing an entire human genome was a massive enterprise that could cost millions of dollars. But thanks to the advent of modern sequencing methods, whole genome sequencing can now cost just a few hundred dollars, and the sample can be taken from the comfort of your own home.
Different companies charge different prices, of course, but it’s important to keep in mind that not all genetic testing is WGS. In many cases, online providers offer DNA testing that doesn’t sequence your genome and only analyzes a small fraction of your DNA. At Sequencing.com, we offer clinical-grade 30x Ultimate Genome Sequencing for just $399.
Difference In Cost Between Shotgun Sequencing and Current WGS
High-throughput sequencing methods have made it possible to parallelize and automate different stages of the sequencing process, which has reduced sequencing costs far beyond what was once thought possible.
The low costs and wide availability of current WGS methods have made DNA sequencing available to millions of people around the world. Large-scale projects, such as the 1000 Genomes Project, have taken advantage of newer sequencing methods to produce high-quality results quickly. As third-generation sequencing methods are improved upon and become available, it’s likely that sequencing costs will become even more affordable than before.
About The Author
Dr. Brandon Colby MD is a US physician specializing in the personalized prevention of disease through the use of genomic technologies. He's an expert in genetic testing, genetic analysis, and precision medicine. Dr. Colby is also the Founder of Sequencing.com and the author of Outsmart Your Genes.
Dr. Colby holds an MD from the Mount Sinai School of Medicine, an MBA from Stanford University's Graduate School of Business, and a degree in Genetics with Honors from the University of Michigan. He is an Affiliate Specialist of the American College of Medical Genetics and Genomics (ACMG), an Associate of the American College of Preventive Medicine (ACPM), and a member of the National Society of Genetic Counselors (NSGC).
References and Sources
1. National Human Genome Research Institute. (n.d.). Shotgun Sequencing. Retrieved December 30, 2020.
2. YourGenome. (2014). What is shotgun sequencing? Retrieved January 2, 2021.
3. Chial, H. (2008). DNA sequencing technologies key to the Human Genome Project. Nature Education, 1(1), 219. Retrieved January 3, 2021.
4. Brown, T.A. (2002). Chapter 6: Sequencing Genomes. In Genomes (2nd ed.). Oxford: Wiley-Liss.
5. Healio Learn Genomics. (n.d.). Whole-Genome Sequencing Methods. Retrieved January 5, 2021.
6. Mardis, E.R. (2008). Next-generation DNA sequencing methods. Annual Review of Genomics and Human Genetics, 9, 387–402.
7. Liu, L., et al. (2012). Comparison of Next-Generation Sequencing Systems. BioMed Research International, 2012, Article ID 251364, 11 pages.
8. MGI Tech Co., Ltd. (n.d.). MGI Sequencers. Retrieved from mgi-tech.com on January 8, 2021.
9. Jeon, S.A., et al. (2019). Comparison of the MGISEQ-2000 and Illumina HiSeq 4000 sequencing platforms for RNA sequencing. Genomics & Informatics, 17(3), e32.
10. Amarasinghe, S.L., et al. (2020). Opportunities and challenges in long-read sequencing data analysis. Genome Biology, 21, 30.