This powerful approach should be generalized to unravel the transmission cycles of any pathogen and their dynamics, which in turn will help in the design and implementation of sustainable, effective, and locally adapted control strategies.

**Keywords:** vector-borne diseases, transmission cycles, vector ecology, behavior, metabarcoding, next-generation sequencing (NGS), blood meals, microbiome, EcoHealth, One Health

*Vector-Borne Diseases - Recent Developments in Epidemiology and Control*

*Metabarcoding: A Powerful Yet Still Underestimated Approach for the Comprehensive Study… DOI: http://dx.doi.org/10.5772/intechopen.89839*

## **1. Introduction**

Vector-borne diseases affecting human health are caused by pathogens that "living organisms" transmit between humans or from animals to humans. These "living organisms" are known as "vectors." They are generally bloodsucking arthropods, such as mosquitoes, ticks, flies, sandflies, fleas, or triatomine bugs. These arthropods ingest disease-producing microorganisms during a blood meal from an infected host (human or animal). They later transmit them to a new host during subsequent blood meals [1]. 

According to the World Health Organization (WHO), vector-borne diseases account for almost 20% of all infectious diseases worldwide. Examples include malaria, dengue, human African trypanosomiasis, leishmaniasis, Chagas disease, yellow fever, Japanese encephalitis, and onchocerciasis. These diseases cause more than 700,000 deaths annually, and more than half of the world's population is estimated to be at risk [1]. They are a major obstacle to development, and the poorest segments of societies and the least-developed countries are the most affected. 

The deadliest vector-borne disease, malaria, causes more than 400,000 deaths annually, mainly among children under 5 years of age. However, the world's fastest-growing vector-borne disease is dengue, whose incidence has increased 30-fold over the last 50 years [1, 2]. An estimated 96 million cases of dengue currently occur each year, and more than 3.9 billion people in over 128 countries are at risk of contracting the disease [1, 3]. 

Chagas disease is one of the primary study models of our research group. The WHO classifies it as a Neglected Tropical Disease (NTD), and it is a major public health problem in Latin America, where 6–7 million people are currently infected [4, 5].

The control of vector-borne diseases relies mainly on programs that target the different vectors. Nevertheless, the efficiency of vector control strategies is closely linked to the local ecology of the vectors [6], which in turn defines local transmission cycles. Sustainable control strategies aimed at disrupting the transmission of vector-borne pathogens therefore require comprehensive knowledge of two things. The first is vector ecology and behavior in the different eco-epidemiological contexts. The second is the local transmission cycles of the pathogens and their dynamics. 

However, acquiring this knowledge is challenging even for a single vector-borne disease. The pathogen may exhibit high genetic diversity, and multiple vector species or subspecies and host species may be involved. In addition, the midgut and/or salivary gland microbiome of the vectors may affect both the development of the pathogen and the vectorial capacity of the vectors. Sometimes, several pathogen species are also involved; for example, leishmaniases are caused by more than 20 *Leishmania* species [7].

The recent advent of Next-Generation Sequencing (NGS) technologies has brought powerful tools with enormous potential. These tools allow all of these components to be identified simultaneously, which helps to understand the eco-epidemiology of vector-borne diseases. Nevertheless, this potential is only just starting to be realized. Here, we present an NGS-based metabarcoding approach that can facilitate the creation of comprehensive host-pathogen networks, integrate important microbiome and coinfection data, identify at-risk situations, and disentangle the transmission cycles of vector-borne pathogens.

## **2. Complexity of vector-borne pathogen transmission cycles and their dynamics**

The ecology and behavior of hosts and vectors in their specific environments shape the transmission cycles of vector-borne pathogens. These cycles are defined by the specific interactions between the vectors, the pathogens, and their hosts, which also act as blood-feeding sources for the vectors [8]. The comprehensive identification of these interactions is therefore critical to disentangle transmission cycles and understand their dynamics. In most cases, an extraordinary diversity of organisms is involved, which makes identifying these interactions challenging.

Chagas disease illustrates this complexity. Its causative agent, a protozoan parasite called *Trypanosoma cruzi*, shows very high genetic diversity and has been classified into seven discrete typing units (DTUs) [9]. More than 140 triatomine species transmit these DTUs, and these species live in a very wide variety of ecotopes and bioclimatic conditions [10]. They transmit the parasite to more than 180 mammalian species, including wild animals, domestic animals, and humans [11, 12]. In parallel, triatomines also take blood meals from animals that are refractory to *T. cruzi* infection, called incompetent hosts, such as birds, reptiles, and amphibians [13, 14] (**Figure 1**).

Finally, the composition of the triatomine midgut microbiome could affect both the establishment and development of the parasite and the vectorial capacity of the triatomines [15], as has been shown for other vectors. For example, in the tsetse fly vector, a microbiome-regulated gut immune barrier directly influences the development of *Trypanosoma brucei*, the agent of African trypanosomiasis [16]. In the same way, the sand fly midgut microbiome is a critical factor for *Leishmania* growth and differentiation to its infective state before disease transmission [17]. The gut microbiome similarly modulates dengue virus infection in *Aedes aegypti* mosquitoes [18, 19], and microbiome manipulation may be used to control virus transmission [20, 21]. Similar observations exist for other vector/pathogen systems. These include ticks and the causative agent of Lyme disease [22], and malaria vectors [23], in which the salivary gland microbiome may also play a role [24].

**Figure 1.**

*Complexity of* T. cruzi *transmission cycles. The parasite is divided into seven genetic subgroups (DTUs), which are transmitted by more than 140 triatomine species to more than 180 mammalian species, including wild animals, domestic animals, and humans. In parallel, triatomines also take blood meals from animals that are refractory to* T. cruzi *infection (incompetent hosts). Figure adapted from [25].*
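The interactions described in this section can be recorded for each vector specimen and then aggregated into weighted edge lists. An edge list is a simple data structure on which vector-host-pathogen networks can be built. The sketch below is illustrative only and is not the analysis used in our studies. It assumes a hypothetical input format: one record per vector specimen, giving its vector subgroup, the blood-meal hosts detected, and the pathogen lineages detected. All names are placeholders. A host-pathogen pair co-detected in the same vector is only a candidate link, not proof that the host was infected.

```python
from collections import Counter


def interaction_counts(specimens):
    """Aggregate per-specimen detections into weighted interaction edges.

    specimens: list of dicts (hypothetical format), one per vector specimen:
      'vector'    -> str, species or genetic subgroup of the vector
      'hosts'     -> set of str, blood-meal sources detected in that vector
      'pathogens' -> set of str, pathogen lineages (e.g., DTUs) detected
    Returns two Counters:
      vector_host[(vector, host)]    = number of specimens with that pair
      host_pathogen[(host, pathogen)] = number of specimens in which the host
                                        and the pathogen were co-detected
    """
    vector_host = Counter()
    host_pathogen = Counter()
    for specimen in specimens:
        for host in specimen["hosts"]:
            # Each specimen contributes at most once per (vector, host) edge,
            # because hosts is a set.
            vector_host[(specimen["vector"], host)] += 1
            # Co-detection within one vector: a candidate host-pathogen link.
            for pathogen in specimen["pathogens"]:
                host_pathogen[(host, pathogen)] += 1
    return vector_host, host_pathogen


if __name__ == "__main__":
    # Placeholder records, not real data.
    specimens = [
        {"vector": "subgroup_A", "hosts": {"host_1", "host_2"}, "pathogens": {"DTU_x"}},
        {"vector": "subgroup_A", "hosts": {"host_1"}, "pathogens": set()},
        {"vector": "subgroup_B", "hosts": {"host_2"}, "pathogens": {"DTU_x"}},
    ]
    vh, hp = interaction_counts(specimens)
    print(dict(vh))
    print(dict(hp))
```

Keeping the two edge types separate makes the caveat explicit. Vector-host edges come from direct detections in each vector. Host-pathogen edges are inferred from co-occurrence and should be weighted or filtered accordingly before any network analysis.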

## **3. Metabarcoding: a highly sensitive and integrative approach to disentangle vector-borne pathogen transmission cycles**

NGS technologies can generate millions of sequencing reads in parallel. This massively parallel capacity can produce reads from fragmented libraries of a specific genome (i.e., genome sequencing) or from a pool of PCR products. Metabarcoding approaches use this capacity to sequence a large number of different amplicons of taxonomically informative genes (barcodes). Metagenomics refers to the identification of all genomes within a particular ecosystem or sample. Metabarcoding, by contrast, aims to identify only a subset of them: those of interest for a particular question. It does so by sequencing millions of amplicons of these barcodes without the need for cloning. The sequences are obtained directly from a mix of amplicons of the different barcodes of interest [26].

For vector-borne pathogens, the vectors alone can therefore serve as biological samples. Well-chosen molecular markers (barcodes) can be amplified from them with universal primer sets to identify the different actors of transmission cycles, such as vertebrate blood sources, the midgut microbiome, pathogen diversity, and vector diversity [27]. The approach can also identify ecological interactions that are not directly involved in the transmission cycles but that help explain vector ecology and transmission dynamics. One example is plant-feeding sources, which some vectors require as a source of energy for routine activities such as flight, mating, and walking, or as a source of protein for egg maturation [28]. 

A schematic representation of the metabarcoding approach for identifying the ecological interactions of disease vectors is given in **Figure 2**. The workflow has six steps:

1. The total DNA (and RNA, if working with RNA pathogens) contained in each vector midgut is purified, together with the salivary glands depending on the kind of vector.
2. The molecular markers (barcodes) of interest are PCR-amplified, after reverse transcription if working with RNA pathogens.
3. A tag (index) is added to each PCR product (amplicon) to identify samples. The same tag is used for all the amplicons obtained from a single sample.
4. The amplicons undergo high-throughput sequencing.
5. Sequencing produces millions of reads.
6. The reads are sorted per sample using the tags added to each amplicon.
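Step 6, sorting reads per sample by their tags, can be sketched in a few lines. This is a minimal illustration, not the pipeline used in [27]. It assumes that a hypothetical 6-base tag sits at the start of every read and that tags must match exactly. The tag sequences and sample names (`TAGS`, `bug_01`, `bug_02`) are placeholders.

```python
from collections import defaultdict

# Hypothetical tag -> sample table. Real tags come from the index set used
# when the library is prepared; these sequences are placeholders.
TAGS = {
    "ACGTAC": "bug_01",
    "TGCATG": "bug_02",
}


def demultiplex(reads, tags, tag_len=6):
    """Sort reads per sample using an exact-match tag at the read start.

    reads: iterable of read sequences (str); the first `tag_len` bases are
           assumed to be the sample tag.
    Returns (per_sample, unassigned):
      per_sample -> dict mapping sample name to its reads with the tag trimmed
      unassigned -> list of reads whose tag matched no known sample
    """
    per_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        sample = tags.get(tag)
        if sample is None:
            unassigned.append(read)  # unknown or error-containing tag
        else:
            per_sample[sample].append(insert)  # keep the barcode sequence only
    return dict(per_sample), unassigned


if __name__ == "__main__":
    reads = ["ACGTACGGATTACA", "TGCATGCCCGGG", "NNNNNNAAAA", "ACGTACTTT"]
    per_sample, unassigned = demultiplex(reads, TAGS)
    print(per_sample)
    print(unassigned)
```

Dedicated demultiplexing software typically also handles dual indices, tolerates sequencing errors in tags, and reads FASTQ files with quality scores. Those refinements are left out here to keep the sorting logic visible. In a metabarcoding analysis, the trimmed reads of each sample would then be assigned to markers and identified taxonomically.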

Currently, the most common systems provide up to 384 different tags and 25 million reads per sequencing run. The depth obtained per molecular marker and per sample (i.e., the number of reads) depends on the number of tagged samples and on the number of markers amplified per sample. For instance, suppose we amplify 10 molecular markers for 100 vector specimens and sequence at a depth of 25 million reads. We would theoretically obtain about 250,000 reads per vector specimen and 25,000 reads per marker and specimen. This kind of multiplexing considerably lowers sequencing costs per sample. 

Downstream analyses with bioinformatics tools, such as those provided on the open-access Galaxy platform [29], make it possible to retrieve and identify the sequences that correspond to each targeted marker for each vector specimen. This approach is therefore extremely powerful for reconstructing pathogen transmission cycles and understanding their dynamics. After adequate analyses, it can reveal all the existing ecological interactions, because it identifies the following simultaneously for each specimen:

- its species or subspecies;
- its blood-feeding source(s);
- the pathogen(s) of interest and their species or lineage(s);
- the composition of its midgut and salivary gland microbiomes;
- its plant-feeding source(s);
- mutations associated with insecticide resistance, etc.

**Figure 2.**

*Schematic representation of the metabarcoding approach for the identification of ecological interactions of disease vectors. Figure adapted from [25].*

## **4. Unraveling *T. cruzi* transmission cycles in the Yucatan peninsula (Mexico): an example of the metabarcoding approach use**

As a proof of concept, we recently performed a pilot study of the metabarcoding approach presented above, using Chagas disease in the Yucatan peninsula (Mexico) as the model [27]. In this region, *Triatoma dimidiata* is the main vector, and different genetic subgroups of this species [30–32] live in sympatry [33]. We selected the following molecular markers for our metabarcoding approach:

- (i) To classify *T. dimidiata* into its genetic subgroups, we used primers targeting the Internal Transcribed Spacer 2 (ITS-2), as previously described [34].
- (ii) To identify blood-feeding sources, we used vertebrate universal primers targeting the 12S rRNA gene [35].
- (iii) For *T. cruzi*, we used primers targeting the mini-exon gene, which allows the parasite to be further classified into its DTUs [36].
- (iv) Finally, we used universal primers targeting the bacterial 16S rRNA gene to identify bacterial microbiome composition [37].

With these markers, we aimed to determine whether there were detectable interaction patterns among the following: the genetic subgroups of *T. dimidiata*, their blood-feeding hosts, infection with *T. cruzi*, the parasite DTUs, and microbiome composition. Such patterns would allow us to elucidate the *T. cruzi* transmission cycles in the study area at finer scales.

This study was based on 14 *T. dimidiata* bugs collected in both wild and domestic ecotopes, and it demonstrates the feasibility and high sensitivity of the proposed approach [27]. For example, we identified an average of 4.9 ± 0.7 blood-feeding species per bug. A single bug contained up to 7 blood-feeding species and 11 blood-feeding individuals. By contrast, current techniques based on direct sequencing of PCR products can identify only the dominant sequence/host in each sample [38], while the addition of a cloning step prior to sequencing generally

