#### **2.2 Experimental methods**

Fortunately, a certain portion of the cells collected directly from an environment contain viruses inside the cell or attached to its surface, which provides evidence of a phage-host relationship. There are several experimental approaches for observing these interactions in their natural environment [29]. The first option is digital PCR applied to emulsion droplets containing bacterial cells, in which infection is detected by primers targeting a specific phage [30]. Digital PCR can be applied only to phages with previously known sequences and therefore does not provide insight into the full diversity of phages able to attack a given bacterial species. An example of a method that is not limited to previously known phages is the proximity-ligation method MetaHiC [31]. It combines experimental and computational approaches to reveal which DNA molecules were in physical contact within the same cell, such as the genomes of phages and the hosts they infect. MetaHiC is applied to the whole environmental sample and is therefore limited to highly abundant bacteria and viruses that are detectable by traditional shotgun metagenomic sequencing.
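Digital PCR quantifies a target by counting positive and negative droplets. Because some droplets receive more than one template, the raw fraction of positive droplets is usually corrected with a Poisson model, λ = −ln(1 − p), where p is the fraction of positive droplets and λ is the mean number of copies per droplet. The sketch below shows only this general calculation; it is not the analysis pipeline of [30], and the droplet counts, the droplet volume, and the function names are hypothetical.

```python
import math


def poisson_mean_copies(positive: int, total: int) -> float:
    """Mean target copies per droplet from the fraction of positive droplets.

    Uses the standard digital PCR Poisson correction lambda = -ln(1 - p),
    which accounts for droplets that received more than one template.
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("expected 0 <= positive <= total and total > 0")
    if positive == total:
        # Every droplet is positive: the sample is saturated and cannot be quantified.
        raise ValueError("all droplets positive; dilute the sample")
    p = positive / total
    return -math.log(1.0 - p)


def copies_per_microliter(positive: int, total: int, droplet_volume_nl: float) -> float:
    """Convert the per-droplet mean into a concentration (1 nL = 0.001 uL)."""
    return poisson_mean_copies(positive, total) / (droplet_volume_nl * 1e-3)


# Hypothetical run: 2,000 of 20,000 droplets give a signal with phage-specific primers.
lam = poisson_mean_copies(2_000, 20_000)
print(f"{lam:.4f} copies per droplet")  # 0.1054 copies per droplet
```

When positives are rare the correction is small (λ ≈ 0.105 versus p = 0.1 here), but it grows quickly as p approaches 1.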

#### *Single-Cell Genomics for Uncovering Relationships between Bacteriophages and Their Hosts DOI: http://dx.doi.org/10.5772/intechopen.108118*

The most advantageous approach for studying phage-bacteria interactions is single-cell genomics. In the typical single-cell genomics workflow, fluorescence-activated cell sorting (FACS) is used to collect bacterial cells of interest based on their size, internal granularity, or fluorescence, which the FACS instrument analyzes at a speed of several thousand cells per second. The cells are sorted into 96- or 384-well plates, and DNA is released from the cells by alkaline lysis. Afterward, a mixture of random hexamers and phi29 polymerase is added to the single cells in an isothermal reaction lasting 4–12 hours, which enriches the DNA by whole genome amplification (WGA). The femtograms of DNA from one cell are thus amplified to the quantities required by standard sequencing library preparation kits. The content of each single cell is sequenced separately, resulting in so-called single amplified genomes (SAGs) (**Figure 3**) [32]. Each SAG is then searched for virus-like contigs, which links previously unknown viruses to their uncultured hosts. Microbial single-cell genomics can be targeted toward minor bacterial groups by specific fluorescent probes; thus, unlike shotgun metagenomics, it is not limited to highly abundant species.
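Computationally, the last step of this workflow comes down to pairing each virus-like contig with the taxonomy of the SAG in which it was found. The sketch below assumes that contigs have already been scored by some virus-identification tool and that a host taxon has been assigned to each SAG separately; the `viral_score` field, the length and score thresholds, and all IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int         # contig length in base pairs
    viral_score: float  # hypothetical 0-1 score from a virus-identification tool


def virus_host_links(sags, host_taxonomy, min_score=0.9, min_length=5000):
    """Pair each virus-like contig with the host taxon of the SAG it came from.

    sags: dict of SAG ID -> list of Contig objects
    host_taxonomy: dict of SAG ID -> host taxon assigned to that SAG
    Returns (SAG ID, host taxon, viral contig name) tuples.
    """
    links = []
    for sag_id, contigs in sags.items():
        host = host_taxonomy.get(sag_id, "unclassified")
        for contig in contigs:
            if contig.viral_score >= min_score and contig.length >= min_length:
                links.append((sag_id, host, contig.name))
    return links


# Made-up example: one SAG carries a viral contig, the other does not.
sags = {
    "SAG_A01": [Contig("A01_c1", 1_200_000, 0.02), Contig("A01_c7", 38_000, 0.97)],
    "SAG_B04": [Contig("B04_c1", 850_000, 0.01)],
}
hosts = {"SAG_A01": "Bacteroides", "SAG_B04": "Prevotella"}
print(virus_host_links(sags, hosts))  # [('SAG_A01', 'Bacteroides', 'A01_c7')]
```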

To avoid confusion, it is important to mention that microbial single-cell genomics differs substantially from single-cell genomics of human cells, which is widely used to characterize gene expression in single cells from cancer tissues [33]. Expressed genes in eukaryotic organisms can be amplified through oligo(dT) primers; however, this is not possible for bacteria, whose transcripts lack the stable poly(A) tails that oligo(dT) priming relies on. While single-cell sequencing in cancer research is becoming accessible to small laboratories and is moving to a high-throughput scale [34], single-cell genomics of bacterial cells is routinely performed in only a few laboratories worldwide, although it has the potential to be used more widely, especially where the requirements for sterile FACS sorting can be met.

**Figure 3.** *Single-cell genomics workflow.*

#### **2.3 Assessing host-phage relationships from single-cell genomics data**

Single-cell genomics has been successfully applied to the study of viruses in several habitats, ranging from seawater to hot springs [35, 36]. This approach has shed light on important aspects of viral biology, such as horizontal gene transfer and the ability of viruses to reprogram their hosts' energy metabolism [37, 38].

If a phage-bacteria pair detected on a single sorted particle is highly abundant in the environmental sample, the lifestyle of the phage can be assessed by coupling the single-cell data with metagenomics performed on a longitudinal sample series or on samples collected in close proximity to each other, for example, different layers of a sediment or of a soil. An example is a study on phage-host relationships in a hot spring microbial mat in California characterized by a layer-specific bacterial composition and high cellular density [39]. Single-cell genomics showed that one quarter of the microbial cells in this mat contained viral contigs. Mapping metagenomic reads from different mat layers to the sequences of the virus-host pairs obtained by single-cell genomics revealed low mobility of the viruses across the mat layers and a low copy number of viral genomes relative to their hosts (**Figure 4**). The stable host-phage ratio suggested either that lysogeny was the predominant lifestyle of these phages or that these phages produce only a few virions per infection and therefore do not outnumber their host cells. If the phages were replicating aggressively, their genome coverage would be higher than that of their host. The opposite situation, in which the host genome has higher coverage than the phage, would mean that the phage infects only a fraction of the host population, or is specialized to certain strains of the host species that metagenomics cannot distinguish.
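The reasoning above can be expressed as a simple decision on the phage-to-host coverage ratio, computed separately for each metagenome (for example, each mat layer). The sketch below is illustrative only: the cut-offs `low` and `high`, the function names, and the coverage values are assumptions, not values from the cited study.

```python
def mean_coverage(mapped_bases: int, genome_length: int) -> float:
    """Average read depth: total aligned bases divided by genome length."""
    return mapped_bases / genome_length


def interpret_phage_host_ratio(phage_cov: float, host_cov: float,
                               low: float = 0.5, high: float = 2.0) -> str:
    """Map a phage:host coverage ratio onto the scenarios described in the text.

    The cut-offs `low` and `high` are illustrative assumptions.
    """
    ratio = phage_cov / host_cov
    if ratio > high:
        return "phage > host: aggressive (lytic) replication likely"
    if ratio < low:
        return "host > phage: only part of the host population, or certain strains, infected"
    return "phage ~ host: lysogeny or few virions per infection"


# Hypothetical coverages (phage, host) of one virus-host pair in three mat layers.
layers = {"top": (12.0, 11.0), "middle": (9.5, 10.2), "bottom": (3.1, 3.4)}
for layer, (phage, host) in layers.items():
    print(f"{layer}: ratio {phage / host:.2f} -> {interpret_phage_host_ratio(phage, host)}")
```

A ratio that stays close to 1 in every layer, as in this made-up example, corresponds to the stable host-phage ratio described for the hot spring mat.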

**Figure 4.** *Phage lifestyle assessed from single-cell genomics and metagenomics.*

**Figure 5.** *Active phage infection captured by single-cell genomics.*

In some cases, phage lifestyle can be assessed directly from the final completeness of the bacterial genome and from the time at which the WGA fluorescence passes the critical point (Cp) detectable by the qPCR instrument. Bacterial genomes are not amplified uniformly due to the nature of the WGA reaction: some genomic regions are over-amplified, while others are absent from the final assembly. This drawback is not a major issue for genome analysis, because computational tools such as CheckM estimate genome completeness, so genomes with low completeness can be excluded from subsequent analyses [40]. Normally, quickly amplifying wells (low Cp) yield bacterial genomes with high completeness, while cells in wells with high Cp yield genomes with low completeness (**Figure 5**). If a well on a 384-well plate contains an easily accessible DNA template in multiple copies, such as a phage replicating inside the bacterial cell, the WGA reagents preferentially amplify the phage genome rather than the bacterial genome. Such a well therefore reaches the Cp faster than the other wells on the plate but yields lower bacterial genome completeness (**Figure 5**). This phenomenon was observed in four of 57 single cells from marine surface bacterioplankton, an environment with a high rate of lytic phage infection [35]. In contrast, it was not detected in single cells from the hot spring microbial mat, where the lysogenic lifestyle prevails [39].
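This Cp-completeness pattern can be screened for automatically once each well has a Cp value and a CheckM completeness estimate. In the sketch below, Cp is assumed to be the time at which WGA fluorescence first reaches half of its maximum, and a well is flagged when it amplified at least `cp_margin_h` hours faster than the plate median while its bacterial genome stayed below `max_completeness` percent; both thresholds, the function names, and the plate data are hypothetical.

```python
from statistics import median


def critical_point(times_h, fluorescence):
    """Cp as the first time point at which fluorescence reaches half its maximum
    (half-maximum definition assumed for this sketch)."""
    half_max = max(fluorescence) / 2
    return next(t for t, f in zip(times_h, fluorescence) if f >= half_max)


def flag_possible_active_infections(wells, cp_margin_h=1.0, max_completeness=30.0):
    """Return wells that amplified unusually fast but yielded a poorly complete host genome.

    wells: dict of well ID -> (Cp in hours, bacterial genome completeness in %).
    """
    plate_median_cp = median(cp for cp, _ in wells.values())
    return sorted(
        well
        for well, (cp, completeness) in wells.items()
        if cp <= plate_median_cp - cp_margin_h and completeness <= max_completeness
    )


# Hypothetical amplification curve of one well (hours, relative fluorescence).
print(critical_point([0, 1, 2, 3, 4, 5, 6], [0, 1, 5, 20, 60, 95, 100]))  # 4

# Hypothetical plate: A03 is fast but incomplete; A04 is slow and incomplete (the usual pattern).
plate = {
    "A01": (4.0, 85.0),
    "A02": (4.5, 70.0),
    "A03": (2.5, 12.0),
    "A04": (6.5, 18.0),
    "A05": (4.2, 60.0),
}
print(flag_possible_active_infections(plate))  # ['A03']
```

Flagged wells are only candidates; finding viral contigs in their assemblies would be needed to support an active infection.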

#### **2.4 Viral tagging**

A major advantage of single-cell genomics is that it can be coupled with fluorescent probes targeting a specific subset of the uncultured bacterial community that possesses features of interest (**Figure 3**) [32]. Fluorescent probes provide experimental evidence of the tested feature, for example, the ability of microbes to degrade cellulose [41] or to stimulate the human immune system [42]. In addition, targeted single-cell genomics can enrich low-abundance bacteria with specific features of interest that would not be recovered by metagenome binning [43]. Targeted single-cell genomics can also employ phages as fluorescent tags: phages stained with a generic nucleic acid stain can reveal which bacteria are susceptible to phage attachment. This method is called viral tagging. Viral tagging has a major advantage over detecting phages that are incidentally attached to host cells in nature, because the fluorescence provides evidence that the phages were present as virions at the moment of cell sorting, whereas detection of phages naturally occurring on single cells might be biased toward prophages.

The first viral tagging experiments were performed in the 1990s, when infection of bacteria by stained phages was observed by microscopy [44]. A high-throughput, FACS-based version of viral tagging was developed two decades later [45]. In a study published in Nature in 2014 [46], fluorescently labeled environmental phages were mixed with *Synechococcus*, a marine cyanobacterium that had been cultured in a medium containing a heavy nitrogen isotope before the experiment. Unattached phages were removed by centrifugation before FACS, and some of the remaining free phages were excluded on FACS bi-plots by gating for particles of bacterial cell size. Only fluorescent bacterial cells were sorted, in bulks of thousands of cells, and the isotopically "heavy" host DNA was removed before sequencing to reduce sequencing effort (**Figure 6**). In this way, 26 groups of environmental phages able to infect *Synechococcus* were revealed in a single FACS run, saving a significant amount of work compared with plaque assays.
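The FACS step of such an experiment amounts to a two-parameter gate: keep cell-sized events (excluding free virions that escaped centrifugation) that are brighter than cells never exposed to stained phages. The sketch below illustrates this logic on plain numbers; the forward-scatter window, the 99th-percentile threshold taken from a no-phage control, the function names, and the event values are all assumptions, and in practice gates are drawn in the sorter's software rather than in code.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    fsc: float    # forward scatter, used here as a proxy for particle size (arbitrary units)
    green: float  # fluorescence of the stained phages (arbitrary units)


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def gate_tagged_cells(events, control_events, fsc_min, fsc_max, q=99.0):
    """Keep events inside a bacterial-size window whose fluorescence exceeds the
    q-th percentile of a no-phage control gated on the same size window."""
    def in_size_gate(e):
        return fsc_min <= e.fsc <= fsc_max

    threshold = percentile([e.green for e in control_events if in_size_gate(e)], q)
    return [e for e in events if in_size_gate(e) and e.green > threshold]


# Hypothetical control: cell-sized events from cells never exposed to stained phages.
control = [Event(500, g) for g in (10, 12, 15, 11, 9, 14, 13, 10, 12, 16)]
sample = [
    Event(40, 300),    # free virion or debris: too small
    Event(520, 250),   # cell-sized and bright: tagged
    Event(480, 14),    # cell-sized but dim: not tagged
    Event(5000, 400),  # too large, e.g. a cell aggregate
]
print(gate_tagged_cells(sample, control, fsc_min=200, fsc_max=2000))
# [Event(fsc=520, green=250)]
```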

To make viral tagging accessible to laboratories without FACS instruments, simplified adsorption assays that do not involve fluorescent tagging have recently been developed. One of these methods removes unattached phages by gel electrophoresis [47]. The principal advantage of fluorescent viral tagging over these simplified methods is that phage attachment is confirmed by fluorescence emitted from the bacterial cells, whereas methods without fluorescent tags can lead to sequencing cells with no attached viruses, wasting sequencing resources. Nevertheless, nonfluorescent separation techniques are convenient for communities with simple bacterial and viral compositions, in which they can readily detect the most active bacteriophages.

Viral tagging seems to be a very promising method for uncovering relationships between phages and their bacterial hosts. Its high specificity has been demonstrated by flow cytometry experiments and by computational mining of phage-bacteria links detected on tagged single cells [45, 46, 48]. However, it can be argued that phage adsorption to the cell wall, as demonstrated by the acquisition of fluorescence, does not always lead to infection. For example, several intracellular mechanisms, such as CRISPR immunity, can protect the bacterium from infection after the virus has attached to the cell wall. Nevertheless, the activity of the phage-bacteria links predicted by viral tagging can be assessed by metagenomic analysis of the same environmental samples, as explained above (**Figure 4**). In addition, for culturable bacteria, the ability of attached phages to infect their hosts can be verified by subsequent plaque assays [49].
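One way to probe the CRISPR caveat computationally is to check whether a tagged host already carries spacers matching the attached phage, which would suggest that attachment may not lead to a productive infection. The sketch below assumes the spacers have already been extracted from the host's SAG (for example, with a CRISPR-array finder) and tolerates one mismatch by default; the sequences, function names, and threshold are hypothetical, and real analyses typically rely on an aligner such as BLASTN rather than this brute-force scan.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spacer_hits(spacers, phage_genome, max_mismatches=1):
    """Find host CRISPR spacers matching the phage genome on either strand.

    Returns (spacer index, strand, start) tuples; starts on the "-" strand are
    coordinates in the reverse-complemented genome. Brute-force Hamming scan:
    adequate for a sketch, far too slow for large databases.
    """
    genome = phage_genome.upper()
    strands = {"+": genome, "-": reverse_complement(genome)}
    hits = []
    for i, spacer in enumerate(s.upper() for s in spacers):
        k = len(spacer)
        for strand, seq in strands.items():
            for start in range(len(seq) - k + 1):
                mismatches = sum(a != b for a, b in zip(spacer, seq[start:start + k]))
                if mismatches <= max_mismatches:
                    hits.append((i, strand, start))
    return hits


# Hypothetical spacers from a tagged cell's SAG and a short stand-in phage sequence.
phage = "AAAACCCGGGTTTAGCATCGATCGGGAAATTT"
spacers = ["TAGCATCGATCG", "GGGGGGGGGGGG"]
print(spacer_hits(spacers, phage))
```

A hit does not prove immunity (phages can escape through protospacer mutations), so such matches are best treated as a flag for follow-up.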

Viral tagging has several advantages over detecting phages naturally occurring on single cells. The principal advantage is that phages from one environment can be combined with bacteria from another. In a previous study, the viral tagging method was adapted to the single-cell level and applied to the human gut microbiome. Viral tagging predicted phage-bacteria pairings that could occur during fecal microbiota transplantation, in which viruses from a healthy donor are introduced into a recipient to restore an altered gut microbiome composition [48].
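Cross-sample pairings of this kind are naturally summarized as a bipartite network, with phages on one side and the host taxa of the tagged cells on the other. The sketch below collapses a list of hypothetical (phage, host) links into a phage-to-hosts mapping and reports phages attached to more than one host taxon; the identifiers are made up, and the cited study's own network analysis may differ.

```python
from collections import defaultdict


def host_range(links):
    """Collapse (phage, host) links observed on tagged single cells into
    a mapping phage -> set of host taxa; duplicate observations count once."""
    network = defaultdict(set)
    for phage, host in links:
        network[phage].add(host)
    return dict(network)


# Hypothetical links: donor phage clusters detected on tagged recipient cells.
links = [
    ("vOTU_1", "Bacteroides"),
    ("vOTU_1", "Bacteroides"),
    ("vOTU_1", "Parabacteroides"),
    ("vOTU_2", "Faecalibacterium"),
]
network = host_range(links)
broad_host_range = [phage for phage, hosts in network.items() if len(hosts) > 1]
print(broad_host_range)  # ['vOTU_1']
```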

#### **2.5 Resolving phage-host relationships in complex communities, such as human gut**

The human gut is the most studied microbiome. However, while most of the dominant bacterial groups in the human gut have cultured representatives, most human gut phages have not yet been cultured [50].

Two recent studies on human gut phages revealed an enormous proportion of novel phages. Nearly 190,000 phages, clustered into 54,000 species-like groups, were detected in a set of 11,000 human gut metagenomes, and 92% of them were not found in existing databases [5]. Another study reported 142,000 nonredundant gut phages from 28,000 human gut metagenomes [51]. Essentially, each new metagenomic sequencing run of the human gut microbiome yields some new phages.

Nevertheless, while the number of phages discovered by sequence-based methods is increasing enormously, the transition from a phage genome sequence to its isolation and an understanding of its biology can take years. In 2014, the first crAssphage was discovered in metagenomic sequences [52]. It was computationally associated with a *Bacteroides* host, *Bacteroides* being one of the most abundant bacterial genera in the human gut. The crAssphages were found to form the most widespread phage group in the human gut. Isolation and laboratory propagation of the first crAssphage representative were finally achieved 4 years later [53, 54]. Its isolation was not a simple task because its lifestyle differs from that of typical lytic phages. It forms plaques but has a very small burst size (2.5 plaque-forming units per infected cell), much lower than that of the widely studied *E. coli* phages (up to 300) [55]. The crAssphage does not exist as a prophage, so it does not switch between lytic and lysogenic life cycles. This unusual lifestyle leads to a balanced coexistence with its *Bacteroides* host, which might benefit from the reservoir of auxiliary metabolic genes harbored by crAssphages [53, 54]. The high number of prophages found in the genomes of gut bacteria and the relatively stable composition of the human gut microbiome suggest that the so-called "piggyback-the-winner" model of host-phage interactions is more likely to apply in the gut than the "kill-the-winner" model, which is often observed in aquatic ecosystems and is characterized by significant fluctuations in phage and bacterial populations [56].
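Burst size, the quantity that sets crAssphage apart from typical lytic phages here, is usually estimated from a one-step growth experiment as the number of phages released per infected cell. A minimal sketch of that arithmetic follows; the function name and counts are hypothetical (chosen to reproduce the two burst sizes quoted above), and real protocols add corrections, for example for dilution steps, that are omitted here.

```python
def burst_size(pfu_after_lysis: float, pfu_unadsorbed: float, infected_cells: float) -> float:
    """Average progeny per infected cell in a one-step growth experiment:
    phages present after lysis, minus free phages that never adsorbed,
    divided by the number of infected cells."""
    if infected_cells <= 0:
        raise ValueError("infected_cells must be positive")
    return (pfu_after_lysis - pfu_unadsorbed) / infected_cells


# Hypothetical counts reproducing the burst sizes quoted in the text.
print(burst_size(2.6e6, 1e5, 1e6))   # 2.5 (crAssphage-like)
print(burst_size(3.01e8, 1e6, 1e6))  # 300.0 (E. coli-phage-like)
```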

The single-cell viral tagging technique applied to the human gut microbiome demonstrated on a high-throughput scale that the "kill-the-winner" model is not widespread in this environment [48]. The links obtained from the tagged single cells were used as references for mapping metagenomic reads from a time series of samples collected from every bowel movement over 2 weeks, and from samples collected 1 year apart. The host-phage genome coverage ratios remained very similar for most phages over the 2-week period, but many pairs showed greater changes over 1 year. The results suggested that gut phages move between integrated and lytic states and use small burst sizes to avoid overwhelming their hosts; so far, such small burst sizes have been experimentally demonstrated only for the crAssphage group. The balanced relationship between phages and bacteria in the human gut revealed by single-cell techniques is consistent with previous reports showing that the bacterial composition of the human gut microbiome remains relatively stable for months, although rare events such as travel or enteric infection can influence the community dynamics [57].
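The short-term versus long-term comparison can be sketched as the change in the log2 phage-to-host coverage ratio between two sampling points. The sample labels, the two-fold (1.0 log2 unit) cut-off, the function names, and the coverage values below are illustrative assumptions, not the statistics used in [48].

```python
import math


def log2_ratio(phage_cov: float, host_cov: float) -> float:
    """log2 of the phage:host coverage ratio (0 means equal coverage)."""
    return math.log2(phage_cov / host_cov)


def ratio_shift(series, t0, t1):
    """Absolute change in log2(phage/host) between two time points.

    series: dict of time label -> (phage coverage, host coverage).
    """
    return abs(log2_ratio(*series[t1]) - log2_ratio(*series[t0]))


def unstable_pairs(pairs, t0, t1, min_shift=1.0):
    """Names of virus-host pairs whose ratio changed by at least min_shift log2 units
    (1.0 corresponds to a two-fold change, an illustrative cut-off)."""
    return sorted(name for name, series in pairs.items()
                  if ratio_shift(series, t0, t1) >= min_shift)


# Hypothetical coverages for two virus-host pairs at three sampling points.
pairs = {
    "phageA-hostA": {"day0": (10, 10), "day14": (11, 10), "year1": (40, 10)},
    "phageB-hostB": {"day0": (5, 20), "day14": (5, 18), "year1": (6, 20)},
}
print(unstable_pairs(pairs, "day0", "day14"))  # []
print(unstable_pairs(pairs, "day0", "year1"))  # ['phageA-hostA']
```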

#### **2.6 Novelty of the viruses recovered by single-cell genomics**

Single-cell techniques often lead to the discovery of novel viruses that are not captured by sequencing of viromes (metagenomes of environmental samples enriched for particles smaller than 0.2 μm). For example, previous studies focused on detecting viruses on sorted single cells recovered phages that were below the detection limit in the viromes, while their bacterial hosts were detectable [39, 48]. There are several possible reasons for this observation. The first is the randomness of cell sorting: while metagenomics or viromics always recover the most abundant microbes, cell sorting can by chance recover microbes with low abundance, which would otherwise remain uncharacterized as part of the microbial dark matter [58]. The second might be the "attachment ≠ infection" argument, which holds that not all attached phages successfully infect the cell; therefore, the phages recovered by single-cell genomics might not be the most infectious ones [59]. The third is that the whole genome amplification reaction can recover ssDNA viruses, which are not detectable in traditional viromes because only dsDNA is extracted, so ssDNA phages are missed [60]. The fourth might be the ability of single-cell genomics to recover viruses that are difficult to assemble from traditional virome sequencing data. This was the case for a *Pelagibacter* phage that is the most abundant virus in the world, but whose microdiversity has hindered the metagenomic assembly of its genome despite large sequencing efforts on global ocean microbiomes [61].
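The first reason can be made concrete with a simple probability argument. If cells are sorted at random and independently from a community in which a taxon has relative abundance p, the chance of capturing at least one of its cells among n sorted cells is 1 − (1 − p)^n. The sketch below evaluates this for one 384-well plate; independence, the absence of sorting or amplification failures, and the function names are simplifying assumptions.

```python
def prob_at_least_one(rel_abundance: float, cells_sorted: int) -> float:
    """P(at least one cell of a taxon among n randomly sorted cells) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - rel_abundance) ** cells_sorted


def expected_cells(rel_abundance: float, cells_sorted: int) -> float:
    """Expected number of sorted cells belonging to the taxon: n * p."""
    return rel_abundance * cells_sorted


for p in (0.01, 0.001):
    print(f"p={p}: P(>=1 cell)={prob_at_least_one(p, 384):.3f}, "
          f"expected cells={expected_cells(p, 384):.2f}")
```

Under these assumptions, a taxon at 1% abundance is almost always captured on one plate (about 98%), and even a taxon at 0.1% turns up on roughly one plate in three.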

Novel types of phages with atypical head or tail structures, unusual nucleic acids (such as RNA phages), or unique lifestyles are constantly being discovered [62]. The phage research community involves scientists from all around the world, and there are many efforts to improve our ability to characterize novel phages. For example, the completeness of a viral genome recovered from metagenomes, viromes, or single-cell assemblies can be assessed by computational tools such as CheckV [63]. Several computational tools are also available for the taxonomic classification of phages [64], and a normalized taxonomic classification based on single-copy marker genes has been explored [65]. The International Committee on Taxonomy of Viruses (ICTV) is constantly updating the taxonomy system to accommodate the large number of novel viruses [66]. If metagenomes or single cells sequenced in the past are reanalyzed with new phage-mining computational tools, many more novel phages may be discovered.
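The idea behind reference-based completeness estimates can be illustrated in a few lines: compare the length of a recovered viral contig with the genome length expected from closely related complete genomes. The sketch below is a simplified illustration loosely inspired by CheckV's reference-based estimate, not CheckV's implementation; the expected length is assumed to be known, the function names are hypothetical, and the quality tiers use commonly cited cut-offs (above 90% high, 50–90% medium) stated here as assumptions.

```python
def estimated_completeness(contig_length_bp: int, expected_genome_length_bp: int) -> float:
    """Completeness (%) as contig length relative to the expected genome length
    of related reference genomes, capped at 100%."""
    return min(100.0, 100.0 * contig_length_bp / expected_genome_length_bp)


def quality_tier(completeness: float) -> str:
    """Assign an illustrative quality tier from an estimated completeness."""
    if completeness > 90:
        return "high-quality"
    if completeness >= 50:
        return "medium-quality"
    return "low-quality"


# Hypothetical 30 kb contig whose closest references have ~40 kb genomes.
c = estimated_completeness(30_000, 40_000)
print(c, quality_tier(c))  # 75.0 medium-quality
```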

#### **3. Conclusion**

Identification of the bacterial hosts of phages is important for elucidating the phages' impact on the bacterial community in an environment. Traditional plaque assays provide this information only for culturable bacteria, and computational methods detect genome signatures reflecting past infections in only a limited portion of bacterial or viral species. A certain portion of bacterial cells in each environment contains bacteriophages in their interior or attached to their surface. Microbial single-cell genomics is a high-throughput technique for capturing links between novel phages and their uncultured hosts when they are colocated on the same single particle. The lifestyle of uncultured phages in their natural environment can be inferred from the whole genome amplification curves of the collected single particles or from their genome assemblies. Metagenomics of the same environmental samples from which the single cells were collected provides additional information on phage lifestyle. Phages from one sample can be combined with bacteria from another sample, and their compatibility via phage adsorption can be tested by viral tagging. In summary, microbial single-cell genomics is a useful tool for obtaining important ecological data on bacterial and viral dark matter, which can influence biogeochemistry across all environments on Earth and also affect our health.
