Says Rockefeller University researcher Erich Jarvis, who is also a Howard Hughes Medical Institute investigator, long-read sequencing has led to fewer gaps in genome assemblies. The biological benefits of this technology for his projects include a more accurate assessment of gene duplications and their orthology, and thus a greater understanding of gene family evolution. He points to work by Constantina Theofanopoulou, a Hunter College researcher and visiting professor at Rockefeller University. Long-read sequencing helped her parse the evolutionary history of the oxytocin and vasopressin ligand and receptor families. By studying synteny — long, conserved blocks near the genes of interest — she and her team located orthologous genes across species. “It is impossible to run solid long-range synteny analysis with short reads,” she says.
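As a rough illustration of the synteny idea, and not the method Theofanopoulou's team used, the sketch below picks which of several candidate gene copies in one species is the likely ortholog of a gene in another by counting shared flanking genes. The gene labels, the gene orders and the `window` size are all hypothetical, and the flanking genes are assumed to have been matched across species already.

```python
# Toy gene orders along one chromosome per species. All labels are hypothetical.
SPECIES_A = ["n1", "n2", "n3", "X", "n4", "n5", "n6"]
SPECIES_B = ["n1", "n2", "n3", "copy1", "n4", "n5", "m1", "m2", "copy2", "m3"]


def flanking_genes(order, gene, window=3):
    """Return the set of genes within `window` positions on either side of `gene`."""
    i = order.index(gene)
    return set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])


def shared_neighbors(order_a, gene_a, order_b, gene_b, window=3):
    """Count flanking genes that gene_a and gene_b have in common."""
    return len(flanking_genes(order_a, gene_a, window)
               & flanking_genes(order_b, gene_b, window))


def best_syntenic_match(order_a, gene_a, order_b, candidates, window=3):
    """Pick the candidate in species B whose neighborhood best matches gene_a's."""
    return max(candidates,
               key=lambda c: shared_neighbors(order_a, gene_a, order_b, c, window))


if __name__ == "__main__":
    print(best_syntenic_match(SPECIES_A, "X", SPECIES_B, ["copy1", "copy2"]))
```

In this toy case `copy1` shares five neighbors with `X` and `copy2` shares one, so `copy1` is chosen. The comparison only works if the flanking blocks are assembled contiguously, which is where long reads come in.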
Long-read sequencing has had a big impact on the greater canid community, says Elaine Ostrander from NIH NHGRI, who runs a number of studies in the Dog Genome Project. This impact stems, for example, from the fact that multiple reference sequences are needed that represent different canids — wolves, coyotes and domestic dogs, among others. Given their quite dissimilar histories, different clades of domestic dogs must also be represented. Studying dogs with long reads sheds light on domestication and thus human migration, she says. Although such questions could be approached with assembled sequence from multiple types of canids from around the world and with alignment of that information to domestic dog sequences, says Ostrander, “that is intrinsically error prone when considering wild canids, or ancient canids, and does not accurately reflect history, particularly as it relates to the location and timing of domestication for many canids.”
Says Jarvis, long-read sequencing makes it possible to measure gene network interactions across chromosomes in ways not previously possible. These reads capture G+C-rich regions, which are mainly found in gene regulatory regions. That yields, he says, “a much more complete picture within and across species of the DNA promoter regions that regulate genes.” All of this also matters to the VGP, which Jarvis chairs. Chul Lee, a postdoctoral associate in the Jarvis lab, led the development of methods (ref. 6) to quantify the difference long reads make. The use of long reads squelched thousands of errors from previous genome assemblies of a number of animal species because false gene gains and losses in short-read-based assemblies were corrected.
Isidro Cortes Ciriano and his cancer-genomics-focused team at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) develop computational tools to, for example, assess mutation patterns and genome instability in cancer. Long-read sequencing, he says, delivers ways to study repetitive and complex genomic regions such as centromeric regions, long repeats and complex structural variants. With long reads generated with nanopore sequencing on ONT instruments, they can “resolve those complex genomic aberrations in cancer that are recalcitrant to Illumina sequencing,” he says.
Carolin Sauer, a postdoctoral fellow in the Cortes Ciriano lab, says that long reads are of particular interest to researchers working on cancers with copy number aberrations and unstable genomes, such as esophageal and ovarian cancers. Long-read approaches are generally better for detecting and characterizing the complex genome rearrangements and structural variation typical of many cancers.
Among the gnarly genome sections more readily tackled with long reads, says Patel, are the human genome’s many types of repetitive elements: short tandem repeats of a few hundred base pairs; Alu elements, which can run around 300 base pairs; LINE1 elements, which can be up to six kilobases long; segmentally duplicated regions hundreds of kilobases long; and the megabases of repeats within repeats such as centromeres and ribosomal DNA. These all differ in their mutational processes and regulatory roles.
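To make the size comparison concrete, here is a minimal sketch that checks which of these repeat classes a single read could span end to end. The Alu and LINE1 lengths come from the text above. The other lengths are illustrative picks within the stated ranges, and the read lengths (150 base pairs for a short read, 20 kilobases for a long read) and the 50-base-pair flank on each side are assumptions.

```python
# Representative repeat lengths in base pairs. Alu and LINE1 values follow the
# text; the remaining values are illustrative picks within the stated ranges.
REPEAT_LENGTHS_BP = {
    "short tandem repeat": 300,
    "Alu element": 300,
    "LINE1 element": 6_000,
    "segmental duplication": 200_000,
    "centromeric repeat array": 3_000_000,
}


def spans_repeat(read_length_bp, repeat_length_bp, flank_bp=50):
    """True if a read can cover the repeat plus unique flank on both sides."""
    return read_length_bp >= repeat_length_bp + 2 * flank_bp


def spannable_repeats(read_length_bp, flank_bp=50):
    """List the repeat classes a read of this length could span."""
    return [name for name, length in REPEAT_LENGTHS_BP.items()
            if spans_repeat(read_length_bp, length, flank_bp)]


if __name__ == "__main__":
    for read_length in (150, 20_000):  # assumed short and long read lengths
        print(read_length, spannable_repeats(read_length))
```

Under these assumptions a 150-base-pair read spans none of the listed repeats, while a 20-kilobase read spans everything up to a full-length LINE1 element. Segmental duplications and centromeric arrays remain longer than a single read of that size.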
Long-read sequencing has been “a huge deal” to him and his team, says Fergal Martin, who leads the EMBL-EBI’s eukaryotic annotation team. The vastly higher-quality sequence helps the team to tease out structures such as genes and repetitive sequence. And with long-read RNA sequencing, researchers can describe expressed genes and find gene structures. “So it’s a double win,” he says.
In her microbiome and metagenomics projects, Karoline Faust, a researcher at KU Leuven, works with organisms of known genome sequences and uses ONT’s MinION for “cheap in-house cross-contamination checks.” Right now, to confirm the bugs in the bioreactor are the ones the lab put there, the team needs to use 16S rRNA Sanger sequencing, but that doesn’t distinguish strains or identify fungi. “In my case, cheap and easy contamination checks” and organism identification are the greatest promise of long reads, she says. Price and speed matter in such instances since spotting contamination quickly means one can quickly halt an expensive experiment.
In metagenomics, long reads “have not fully arrived and may still take time,” says University of California, Davis researcher C. Titus Brown. That’s due to the challenges of DNA extraction for long molecules and because complex microbiomes cannot yet be sequenced at sufficient depth. Long-read sequencing successes in metagenomics, he says, mainly involve host-associated microbiomes, which are less complex and involve fewer microbial strains than microbiomes in marine environments, sediment and soil.
To tease out tough-to-find structures in the genome, long-read sequencing has been enormously helpful, say these EMBL-EBI researchers: Fergal Martin (left) leads the eukaryotic annotation team; Carolin Sauer (right) is a postdoctoral fellow in the cancer-genomics-focused lab of Isidro Cortes Ciriano (middle).
Credit: J. Dowling, EMBL-EBI