**Abstract**

Microbial single-cell genomics is an innovative approach to studying microbial diversity and symbiosis. It allows us to recover genomes of microbes possessing specific features of interest, or to detect relationships between microbes found in close proximity to each other (one microbe inside the other, or microbes attached to each other). It can be used for linking phages with their bacterial hosts in different kinds of environmental samples, which often contain an enormous diversity of yet uncultured bacterial species and novel bacteriophages. In the typical microbial single-cell genomics workflow, fluorescence-activated cell sorting (FACS) is used to collect bacterial cells of interest based on their cell size, internal granularity, or fluorescence. Femtograms of DNA from each sorted particle are then amplified to the quantities required by standard sequencing library preparation kits. Single-cell assemblies then reveal the presence of phages in the sorted bacterial cells. In the case of highly abundant viral species, single-cell genomics can be coupled with metagenomics (shotgun sequencing of the total microbial community), which can provide insights into bacteria-bacteriophage population fluctuations in time or space. In this chapter, we explain the details of uncovering relationships between bacteriophages and their hosts coming from so-called viral or bacterial dark matter.

**Keywords:** microbial single-cell genomics, single-amplified genomes, fluorescence activated cell sorting, bacteriophages, microbial dark matter

### **1. Introduction**

Bacteriophages (the viruses of bacteria) influence biogeochemistry across all environments on Earth and can also affect our health. They contribute to bacterial evolution, impact ecosystems by killing their bacterial hosts, and have enormous industrial and pharmaceutical potential [1]. Thousands of novel phages are being discovered daily thanks to recent advances in the sequence-based recovery of genomes of yet uncultured microbes in environmental samples [2]. In the typical workflow, viral-like contigs are extracted from sequence assemblies of metagenomes, which contain DNA sequences from all microbes in a given sample, or from viromes, which are samples enriched for particles smaller than 0.2 μm (**Figure 1**) [3]. The largest database of viral-like contigs derived from metagenomes, IMG/VR, currently contains nearly 3 million viral-like contigs, and this number is increasing exponentially [4]. Importantly, even relatively well-studied environments formed by bacterial groups with cultured representatives, such as the human gut, harbor thousands of yet undiscovered phages [5].

#### **Figure 1.** *Sequence-based recovery of microbial genomes.*

For a long time, our knowledge of phages in the human gut was limited to phages with easily culturable hosts. Then, scientists started to compare metagenomic samples from healthy volunteers with samples from patients suffering from different diseases, and many of these diseases turned out to be associated with novel uncultured phages targeting unknown hosts [6]. This suggests that phages have an enormous potential to influence human health indirectly, by shaping the bacterial composition of the human microbiome. Therefore, the idea of employing phages in clinical practice is attracting a lot of scientific attention. Phages isolated by culture methods in the laboratory can be used to eliminate multidrug-resistant pathogenic bacteria affecting organs with low numbers of commensal bacteria, such as the lungs or skin [7]. In the case of more complex microbiomes, such as the human gut, fecal microbiota transplant (FMT) is applied with the aim of changing the whole gut microbiome composition [8]. Each preparation of fecal material from a healthy donor is free of common pathogens but harbors an unknown diversity of bacteria and phages. A few experimental clinical studies showed that a 0.2 μm filtered FMT preparation (containing only phages) can have the same beneficial effects as the traditional unfiltered FMT preparation containing bacterial cells. This suggests that phages play an important role in the restoration of a healthy human gut, but the identity of their bacterial hosts remains unknown [9, 10].

It is intriguing that our knowledge of phage biology is not keeping pace with the sequence-based discoveries. To study the biology of novel phages, we first need to identify their bacterial hosts, which is traditionally done by plaque assays. Plaque assays are the most straightforward method for testing the interaction of phages with culturable bacteria [11]. A bacterial culture is mixed with phage particles in agarose and distributed evenly on a standard agar plate, and after incubation, zones of clearing (plaques) appear in the bacterial lawn on the agarose overlay. Nevertheless, it is widely known that plaque assays do not capture all viruses able to infect the given bacterial strain [12]. The most important drawback of plaque assays is that they cannot be used for uncultured bacteria, and in many environments, as much as 99.9% of bacterial species do not have any previously cultured representatives [13]. Even if a species of interest has several easily culturable representatives, which is typical for human gut bacteria, isolating new strains of the same species can be complicated by bacterial community complexity and strain-specific culture media requirements. Consequently, culture-based approaches for studying phage-bacterial interactions cannot keep up with the rapidly increasing number of novel phages discovered by sequence-based methods.

*Single-Cell Genomics for Uncovering Relationships between Bacteriophages and Their Hosts DOI: http://dx.doi.org/10.5772/intechopen.108118*

### **2. Methods for linking bacteriophages with their hosts**

#### **2.1 Computational methods**

Computational biologists are currently trying to develop highly efficient methods for linking millions of recently discovered viral-like contigs with their bacterial hosts without the need to culture them in the laboratory. The first step is the recovery of high-quality host genomes from metagenomes. If a microbial community is simple enough, it is possible to recover nearly full genomes of bacteria by assembling sequencing reads into longer contigs [14]. However, in environments with a more complex species composition, binning algorithms must be applied to organize the contigs into larger sets, which results in metagenome-assembled genomes (MAGs) (**Figure 1**) [15]. Nowadays, articles published in high-impact journals report hundreds of thousands of new MAGs at once, providing an enormous source of reference genomes of uncultured microbes for the whole scientific community [16–18]. There are initiatives for normalized taxonomic classification of these recently discovered bacterial genomes, for example, the Genome Taxonomy Database (GTDB) project, which allows an objective assessment of the novelty of a MAG based on a large set of single-copy marker genes [19]. Sequence databases currently contain thousands of uncultured bacterial species, and each species surely consists of thousands of strains, most of which have not yet been discovered. Each of these strains can be infected by several bacteriophages, which are also yet to be discovered.
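The binning step mentioned above can be illustrated with a deliberately simplified sketch. It groups contigs by the similarity of their tetranucleotide composition alone, using a greedy pass and a cosine-similarity cutoff. The function names, the toy sequences, and the threshold of 0.9 are all assumptions made for illustration; real binning tools typically combine sequence composition with read coverage and use considerably more robust clustering.

```python
from collections import Counter
from itertools import product
import math

def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector of a DNA sequence (hypothetical helper)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def greedy_bin(contigs, threshold=0.9):
    """Assign each contig to the first bin whose representative profile is
    similar enough; otherwise open a new bin. The threshold is an assumption."""
    bins = []  # list of (representative_profile, [contig names])
    for name, seq in contigs.items():
        profile = kmer_profile(seq)
        for representative, members in bins:
            if cosine(profile, representative) >= threshold:
                members.append(name)
                break
        else:
            bins.append((profile, [name]))
    return [members for _, members in bins]

# Toy contigs with artificially extreme composition (not real data).
toy_contigs = {
    "contig_1": "AT" * 200,
    "contig_2": "GC" * 200,
    "contig_3": "AT" * 150,
}
toy_bins = greedy_bin(toy_contigs)
print(toy_bins)
```

In this toy input, the two AT-rich contigs share a bin and the GC-rich contig forms its own, which mirrors the idea that contigs originating from the same genome tend to have similar composition.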

The genomes of phages and their hosts contain genetic features revealing links between them. These signatures are acquired during phage infection, which can occur in different ways (**Figure 2**). Phages use the bacterial cell machinery to form virions (viral particles) and release them from the cell. This can be done without destroying the host cells (chronic infection), or through host cell lysis (lytic infection). Chronic infections can affect cell growth, whereas aggressive lytic infections can reduce the host population significantly. Some phages can integrate into the host genome and replicate along with the host chromosome as prophages without producing any virions, which is called the lysogenic cycle. In general, phages can switch between lysogenic and lytic cycles depending on ecological factors influencing the growth of their bacterial host (**Figure 2**) [20].

#### **Figure 2.** *Different types of phage infection.*

Integrated prophages can be easily detected in about one third of the genomes of uncultured bacteria; nonetheless, the hosts of lytic phages cannot be identified by this method [21]. Lytic phages can be linked to their uncultured hosts by several genetic signatures found in the bacterial or phage genomes. The first type of genetic signature is the clustered regularly interspaced short palindromic repeats (CRISPR) arrays detectable in bacterial genomes, which represent evidence of a previous phage infection. However, the percentage of bacterial genomes harboring a CRISPR array is quite low; the estimates vary from 30% in species from the human gut microbiome to 5–10% of all bacterial lineages from different environments [22]. The second type of genetic signature providing phage-host links is genes acquired by phages from bacteria during past infections, for example, tRNA sequences [23], sequences of ribosomal proteins [24], or so-called auxiliary metabolic genes [25]. Nevertheless, these sequences do not provide strain-level host resolution and are present only in a small fraction of all viruses [4, 26]. As a consequence, computational host-range predictions failed to reveal hosts for 85% of the millions of bacteriophages in the IMG/VR database [26]. In addition, these computational host-range predictions have another major disadvantage: they show links acquired during past infections, but we do not know how these phages have adapted to constantly evolving bacterial hosts and changing environmental conditions [27, 28].
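The CRISPR-based signature described above can be illustrated with a minimal sketch: spacers extracted from a host CRISPR array are searched for in phage contigs, on both strands, and every hit is recorded as a putative phage-host link. The function names, host and phage labels, and sequences are hypothetical toy values. The sketch also assumes exact matching, whereas real pipelines typically rely on sequence alignment tools that tolerate a small number of mismatches.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (uppercase A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def link_phages_to_hosts(spacers_by_host, phage_contigs):
    """Return {phage: set of hosts} for every exact spacer match on either strand."""
    links = {}
    for phage, contig in phage_contigs.items():
        for host, spacers in spacers_by_host.items():
            for spacer in spacers:
                if spacer in contig or reverse_complement(spacer) in contig:
                    links.setdefault(phage, set()).add(host)
                    break  # one matching spacer is enough to record the link
    return links

# Toy spacers and contigs (illustrative only, not real sequences).
spacer_a = "ACGTTGCAAGGCTTAACCGGTA"
spacers_by_host = {
    "host_A": [spacer_a],
    "host_B": ["TTTTGGGGCCCCAAAATTTTGG"],
}
phage_contigs = {
    # phage_1 carries the host_A protospacer on the opposite strand
    "phage_1": "GGGA" + reverse_complement(spacer_a) + "TTC",
    "phage_2": "AT" * 20,
}
links = link_phages_to_hosts(spacers_by_host, phage_contigs)
print(links)
```

Here only `phage_1` is linked, and only to `host_A`. A phage without any matching spacer, like `phage_2`, stays unassigned, which reflects the limited coverage of this signature noted in the text.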
