Bacterial Species In the Age of Next-Generation Sequencing
Rapid expansion of genomic information from next-generation sequencing (NGS) is challenging how the lines are drawn between bacterial species. Unlike eukaryotes, bacteria often fail to fit neatly into a universal concept of species. Methodological and cultivation-related barriers have limited the ability to measure the diversity within and between bacterial species. Advancements in the technology and accessibility of sequencing methods have surmounted many of these challenges and exponentially increased the number of sequenced bacterial genomes. Researchers are now able to observe bacterial genetic diversity in greater resolution.
New Tools to Redefine Species
Beginning in the late 19th century, scientists differentiated bacteria based on morphology, growth requirements and pathogenic potential. Morphology was still commonly used until the middle of the 20th century, as methods evolved to include techniques such as DNA-DNA hybridization, biochemistry and chemotaxonomy. Researchers proposed criteria to demarcate species with these methods, but relied heavily on previously described species groups to determine these thresholds. For example, researchers defined a threshold of 70% DNA-DNA hybridization after Johnson measured the hybridization between and within species and described a “distinct break” at 70% homology in the 1970s. Digital DNA-DNA hybridization, an in silico method introduced in 2010 to estimate DNA-DNA hybridization for genome comparison, also uses this 70% threshold.
Recently, the combination of NGS and bioinformatic tools for whole-genome comparison has allowed researchers to distinguish isolates with greater resolution. NGS methods are high-throughput, so more organisms can be sequenced in less time than with traditional sequencing techniques. The influx of genomic data has led to restructuring of defined taxonomic relationships, including the proposal of the three domains, Archaea, Bacteria and Eukarya. With the cost of whole-genome sequencing declining and the number of publicly available genomes increasing, researchers are using whole-genome average nucleotide identity (ANI) to delineate species. Bioinformaticians can compare genomes to determine phylogenetic relationships and are using these methods in the field of systematic biology, which aims to name and describe organisms and their evolutionary relationships. Despite the increase in data and methods to measure diversity and evolutionary relationships, scientists are still debating how and where to separate bacteria into species. Some have proposed a universal boundary of genomic diversity. For example, Ciufo and colleagues suggested a threshold of 96% ANI between 2 aligned whole-genome sequences as an operational species definition. Others have reexamined individual defined species and proposed subdividing them into multiple species.
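To make the idea of an operational ANI cutoff concrete, the following is a minimal Python sketch of how a fixed threshold could turn pairwise ANI values into candidate species groups. It is an illustration only, not the procedure used by Ciufo and colleagues: the isolate names and ANI values are invented, and the grouping rule (single linkage, where any pair at or above the cutoff ends up in the same group) is an assumption chosen for simplicity.

```python
from itertools import combinations

ANI_THRESHOLD = 96.0  # percent; the operational cutoff discussed in the text


def cluster_by_ani(genomes, ani, threshold=ANI_THRESHOLD):
    """Group genomes by single linkage: any pair with ANI >= threshold shares a group."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Accept the pair stored in either orientation; skip pairs with no value.
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical isolates and ANI values, invented for illustration only.
genomes = ["iso_A", "iso_B", "iso_C", "iso_D"]
pairwise_ani = {
    ("iso_A", "iso_B"): 98.7,
    ("iso_A", "iso_C"): 91.2,
    ("iso_A", "iso_D"): 90.8,
    ("iso_B", "iso_C"): 91.5,
    ("iso_B", "iso_D"): 90.9,
    ("iso_C", "iso_D"): 96.4,
}

species_groups = cluster_by_ani(genomes, pairwise_ani)
print(species_groups)  # [['iso_A', 'iso_B'], ['iso_C', 'iso_D']]
```

Raising or lowering the threshold argument changes how many groups come out, which is one reason the choice of cutoff remains contested.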
These techniques have aided in revising the taxonomy of some pathogens. Over the past few decades, molecular biology techniques have prompted the reclassification of clinically relevant species, such as Borrelia burgdorferi. Burgdorfer et al. discovered the etiological agent of Lyme disease in 1982, when sera from patients with Lyme disease were shown to contain antibodies to a spirochete found in ticks. Similar spirochetes were found in patients with acute Lyme disease. Johnson et al. classified this organism as B. burgdorferi in 1984, after using DNA-DNA hybridization to compare these spirochetes with other Borrelia spirochetes and finding the Lyme-associated strains to be more similar to each other than to other Borrelia species. In 1992, Baranton and colleagues compared 48 isolates described as B. burgdorferi, using DNA-DNA hybridization and other molecular methods, and suggested that there was enough diversity among these strains to classify them as 3 “genospecies,” proposed taxonomic groups based on DNA relatedness. They named the genospecies that most closely resembled reference strains B. burgdorferi sensu stricto and referred to the 3 genospecies collectively, all formerly known as B. burgdorferi, as B. burgdorferi sensu lato. The terms “sensu stricto” and “sensu lato” are Latin for “in the narrow sense” and “in the broad sense,” respectively, and are used to differentiate between new and legacy definitions of species names. Preserving an aspect of the former classification maintains interpretability, especially in clinical settings where it may be difficult to change established names. However, differences in the clinical manifestations of Lyme disease associated with these 3 genospecies demonstrate the significance of accurately resolving pathogenic bacteria. Now, NGS is revealing additional novel species of B. burgdorferi sensu lato.
The pathogen Bacillus cereus has also been reclassified into multiple species. Examples of these species include the foodborne pathogen B. cereus sensu stricto and the bioterrorism agent B. anthracis. NGS allowed high-throughput delineation of new B. cereus sensu lato species with whole-genome ANI methods, and the number of proposed species rose rapidly, with 12 novel species proposed between 2013 and 2017. The studies proposing these novel species used different ANI thresholds, leading to ambiguity and the possibility of isolates belonging to multiple species. Carroll and colleagues observed distributions of whole-genome ANIs and described “natural gaps,” where genomes clustered within 92.5% ANI and yielded species that did not have strains overlapping into multiple groups. They therefore proposed a 92.5% ANI threshold for species within B. cereus sensu lato. Because these species are found in different environments and present differently in patients, correct delimitation and identification are important for correct diagnosis and treatment.
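The notion of a “natural gap” in an ANI distribution can be pictured with a small Python sketch. This is not the analysis Carroll and colleagues performed; it is a toy heuristic, assumed here for illustration, that sorts pairwise ANI values and reports the widest interval containing no observations. The values are invented and do not come from any B. cereus dataset.

```python
def largest_ani_gap(ani_values):
    """Return (lower, upper, midpoint) of the widest gap between consecutive sorted ANI values."""
    values = sorted(set(ani_values))
    if len(values) < 2:
        raise ValueError("need at least two distinct ANI values")
    # Pair each value with its successor and keep the pair that is farthest apart.
    lower, upper = max(zip(values, values[1:]), key=lambda pair: pair[1] - pair[0])
    return lower, upper, (lower + upper) / 2


# Hypothetical pairwise ANI values (percent), invented for illustration only.
pairwise_ani = [88.5, 89.0, 89.5, 90.0, 93.0, 93.5, 94.0, 96.5, 97.0, 98.0]

lower, upper, midpoint = largest_ani_gap(pairwise_ani)
print(f"widest gap: {lower}-{upper}% ANI, candidate threshold near {midpoint}%")
```

A cutoff placed inside such a gap separates the genomes in this toy dataset without leaving any pair close to the boundary, which is the intuition behind choosing a threshold that avoids strains overlapping into multiple groups.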
Clinical Importance of Accurate Species Delineation: The Case of Gardnerella vaginalis
Another species of clinical importance that scientists are reevaluating is Gardnerella vaginalis. Until recently, all bacteria in the genus Gardnerella were considered G. vaginalis, but recent examinations of genetic diversity have led researchers to divide G. vaginalis into multiple species. Gardnerella spp. are associated with adverse health events, like bacterial vaginosis and preterm birth, but are also found in healthy vaginal microbiomes. Leopold, and separately Gardner and Dukes, first described G. vaginalis in the 1950s, using traditional methods like microscopy and Gram staining. Leopold described prostatitis and cervicitis in patients from whom it was isolated, and Gardner and Dukes identified “clue cells” of small bacteria on vaginal epithelial cells by microscopy, indicating cases of vaginitis.
Gardnerella researchers have proposed several methods to resolve G. vaginalis into multiple groups, including sequence variants of specific genes, such as cpn60 or the 16S rRNA gene, and phylogenetic clades. Vaneechoutte and colleagues used the 96% ANI threshold recommended by Ciufo and colleagues to delineate genomes of G. vaginalis into 13 species. Another team of researchers compared different bioinformatic techniques, including amino acid similarity, and found that the Gardnerella genomes they assessed could be divided into 8-14 species, the 14th being an additional genome not assessed by Vaneechoutte and colleagues. This additional species underscores the importance of adequate sampling in order to fully determine the breadth of diversity of a taxon.
Recent studies suggest that variants of Gardnerella, including these newly defined species and amplicon sequence variants of the 16S rRNA gene, may contribute to inconsistencies in the association of Gardnerella with health outcomes. If some Gardnerella species are harmless residents of the vaginal microbiome, whereas others are risk factors for bacterial vaginosis and preterm birth, the previous classification of all Gardnerella as G. vaginalis limits the ability of clinicians to tell when the presence of Gardnerella indicates a health risk and when intervention is appropriate. Additionally, if some species are resistant to certain treatments, clinicians need to be able to screen for and identify different species in order to prescribe effective treatment. Therefore, correct delimitation and identification of pathogenic species are important for correct diagnosis and treatment.