Landscape Genetics: From Classic Molecular Markers to Genomics

*Enéas Ricardo Konzen and Maria Imaculada Zucchi*

## **Abstract**

Landscape genetics combines population genetics and landscape ecology to understand the processes that shape the distribution and organization of human, animal, and plant populations. The field emerged from a large body of studies with classical molecular markers, such as isozymes, RAPD, AFLP, and microsatellites. Those markers enabled the detection of population structure, but a more comprehensive analysis of natural populations was only possible with the development of statistical methods that combine molecular data with environmental variables. More recently, the rapid development of sequencing technologies allowed studies at the genomic level, increasing the resolution of associations with environmental factors. This chapter outlines basic concepts in landscape genetics, the main statistical methods used so far, and the prospects for applying the field to the conservation of natural populations of plant and animal species. Moreover, we briefly describe how the field is applied to understand historical human migrations and how some diseases spread throughout the world.

**Keywords:** molecular studies, environmental variables, population structure, genetic diversity, single nucleotide polymorphisms

#### **1. Introduction**

Population genetic studies deal with allele frequencies and the processes that shape their variation within and among populations. Multiple studies have addressed genetic variation and its structure by screening molecular markers such as allozymes (beginning with Lewontin and Hubby [1]), random amplified polymorphic DNA (RAPD) [2], amplified fragment length polymorphism (AFLP) [3], microsatellites or simple sequence repeats (SSR) [4], inter-simple sequence repeats (ISSR) [5], and single nucleotide polymorphisms (SNP). Allozyme markers initiated a series of population genetic studies, allowing relatively precise estimation of heterozygosity levels because of their codominant nature. Those markers were widely employed until the end of the 1990s. The development of techniques for screening directly at the DNA level accelerated the discovery of countless markers in humans, animals, plants, fungi, and other organisms. RAPD, ISSR, and AFLP are, in general, more limited in describing genetic variation because they are dominant markers, so heterozygotes cannot be distinguished from dominant homozygotes. In contrast, SSR markers have been developed for a diverse set of species, enabling precise estimates of genetic diversity, gene flow, spatial genetic structure, and paternity, as well as linkage and association mapping.

More recently, SNP have arisen as powerful markers for fine-scale genetic diversity, structure, and association mapping studies. Direct comparison of sequences of specific fragments generated by Sanger sequencing allowed the discovery of the first sets of SNP. However, the revolution in sequencing technology of the last decade has provided countless sequences for comparing individuals and deciphering population genetic mechanisms with high accuracy. Next-generation sequencing platforms generate millions of sequences that often yield thousands of SNP markers.

Nonetheless, molecular data alone provide no definitive answers about the evolutionary mechanisms operating in populations. Examining the ecological factors that drive the fate of individuals over generations, and how current mechanisms affect their adaptation or acclimation, is necessary to better understand any species. Adequate statistical methods combining genetic and environmental variables are therefore required. Landscape genetics emerged as a field to improve our understanding of the influence of geographical and environmental variables on the genetic structure of populations [6]. It departs from traditional population genetics by testing more explicitly the influence of landscape and environmental factors, such as altitude, topography, and ground cover, on population processes such as gene flow and population structure [7]. The rapid growth of genome-scale analyses also gave rise to the term landscape genomics, as proposed by Joost et al. [8]. Landscape genomics differs from landscape genetics in that it has become a powerful approach for scanning for genes involved in complex adaptation mechanisms at the population and individual levels [9, 10].

This chapter provides brief concepts covering landscape genetics and genomics. Furthermore, we outline potential applications of landscape genetic studies to the understanding of adaptive traits of plants and animals and how such results may assist in the design of conservation strategies for endangered species. We do not intend to provide an exhaustive overview of landscape genetics studies so far, but rather to contextualize concepts and applications with selected case studies. Moreover, we briefly discuss how landscape genetics is contributing to the understanding of historical human migrations and the spread of human diseases.

#### **2. Molecular markers and population structure studies**

The most popular molecular markers employed in population genetic studies are SSR [4] and SNP. Simple sequence repeats are tandemly repeated motifs of 1–6 bp [11], or up to 10 bp [12], that occur at high frequency in the genomes of all organisms. Plants commonly have AT-type repeats, whereas in animals AC is the most common repeat unit [13]. Microsatellites are characterized by high mutation rates [12], which yield markers with several alleles. SSR are codominant, hypervariable, and inherited in a Mendelian fashion [14], which results in high heterozygosity levels and increases the discriminatory power among individuals and populations. Originally, SSR were developed from DNA libraries that required extensive laboratory work. Currently, however, the easiest way of discovering novel microsatellites is through direct sequencing of genomes and transcriptomes on NGS platforms [12]. With such data available, SNP have become the most studied markers in recent years. SNP are the most abundant polymorphisms in plant and animal genomes. They consist of single base-pair substitutions in the genome sequence, which can occur as transitions or transversions [15]. They can reach much higher density than all other types of markers in genomes. Next-generation sequencing can generate large amounts of sequence data, enabling the detection of thousands of SNP [16].

*DOI: http://dx.doi.org/10.5772/intechopen.92022*

*Methods in Molecular Medicine*

Microsatellites and SNP markers are powerful tools for population genetic analyses. They have been extensively employed in studies with humans as well as animal and plant models and non-model species. The codominance and multiallelic nature of microsatellites make them suitable for estimating variables such as heterozygosity, inbreeding, gene flow, outcrossing rates, differentiation among populations and population structure [17]. SNP markers are generally employed for determining population structure as well, but with much higher density of markers and therefore genomic coverage to explain such subdivision. A series of studies have used SNP to dissect complex traits with QTL mapping and genome-wide association studies (GWAS) [15].
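As a rough illustration of two of the quantities mentioned above, the following sketch computes expected heterozygosity and Wright's F_ST for a single locus from allele frequencies. The function names and example frequencies are hypothetical, and equal subpopulation sizes are assumed; published analyses typically rely on sample-size-corrected estimators rather than these textbook formulas.

```python
import numpy as np


def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H_e = 1 - sum(p_i^2) for one locus."""
    p = np.asarray(allele_freqs, dtype=float)
    return 1.0 - np.sum(p ** 2)


def fst_biallelic(pop_freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    pop_freqs holds the frequency of the same allele in each subpopulation.
    Equal subpopulation sizes are assumed (a simplification).
    """
    p = np.asarray(pop_freqs, dtype=float)
    h_s = np.mean(2.0 * p * (1.0 - p))   # mean within-subpopulation heterozygosity
    p_bar = p.mean()                     # allele frequency in the pooled population
    h_t = 2.0 * p_bar * (1.0 - p_bar)    # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical data: a microsatellite locus with four alleles,
# and a SNP whose allele frequency differs between two subpopulations.
ssr_he = expected_heterozygosity([0.4, 0.3, 0.2, 0.1])
snp_fst = fst_biallelic([0.2, 0.8])
print(f"SSR expected heterozygosity: {ssr_he:.3f}")
print(f"SNP F_ST between two subpopulations: {snp_fst:.3f}")
```

A multiallelic locus can reach a higher expected heterozygosity than a biallelic one, whose maximum is 0.5, which is one way to see why microsatellites are so informative per locus.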

#### **3. The concepts of landscape genetics and landscape genomics**

Landscape genetics is concerned with testing the effects of landscape features on gene flow and population genetic structure. In general, the first landscape genetics studies involved an exploratory phase, with geographically widespread sampling of populations and analysis of the effects of various landscape variables [18]. Landscape features or variables consist of any biotic, climatic, soil, or other conditions that make up the habitat of organisms [6]. Population structure refers to the organization of genetic variation as influenced by a combination of evolutionary forces such as recombination, mutation, drift, natural selection, and historical demographic processes [19]. This leads to the idea that a group of subpopulations that occasionally exchange migrants forms a metapopulation [6].

Current genomic technologies allow the discovery of thousands of SNP markers, which has increased the resolution for studying the association of environmental variables with specific genomic regions and deepened our understanding of evolutionary processes. Genotyping-by-sequencing (GBS) has enabled the discovery of SNP markers even in non-model species that still lack a reference genome [20, 21]. This is where the concept of landscape genomics comes in. Landscape genomics focuses on detecting candidate genes under selection as putative signals of local adaptation. The design of a landscape genomics experiment involves replicated sampling across the environmental factors that might be driving selection, which increases the resolution for detecting candidate loci under selection [10].

#### **4. A briefing on statistical approaches in landscape genetics**

A landscape genetics study normally requires two analytical steps. The first is the analysis of patterns of genetic variation. Next, those patterns are correlated with landscape variables using statistical methods [22]. One of the simplest and most commonly used methods to test for association between environmental variables and genetic data is the Mantel test, originally developed for identifying time-space clustering of diseases [23]. The test uses permutations to assess the significance of the linear correlation coefficient between two pair-wise similarity or dissimilarity matrices [22]. One of the simplest examples of its application in landscape genetics is to correlate the genetic distances between individuals with the geographic distances between their locations [24].
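The permutation logic of the Mantel test can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `mantel_test`, the one-sided alternative (positive correlation), the number of permutations, and the simulated coordinates and "genetic" distances are all assumptions made for the example.

```python
import numpy as np


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """One-sided Mantel test between two square, symmetric distance matrices.

    Returns the observed Pearson correlation of the upper-triangle entries
    and a permutation p-value for the alternative r > 0.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)                 # each pair counted once
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]

    rng = np.random.default_rng(seed)
    n_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        b_perm = b[np.ix_(perm, perm)]           # permute rows and columns together
        r_perm = np.corrcoef(a[iu], b_perm[iu])[0, 1]
        if r_perm >= r_obs:
            n_extreme += 1
    p_value = (n_extreme + 1) / (n_perm + 1)     # observed value counts as one permutation
    return r_obs, p_value


# Simulated example: 12 individuals with random coordinates; the "genetic"
# distance is the geographic distance plus symmetric noise.
rng = np.random.default_rng(42)
coords = rng.random((12, 2))
geo = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
noise = rng.normal(scale=0.05, size=geo.shape)
gen = geo + (noise + noise.T) / 2.0
np.fill_diagonal(gen, 0.0)

r, p = mantel_test(gen, geo)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would break the matrix structure that the test is designed to respect.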

The methods for determining association of genetic data with environmental variables can be broadly categorized into approaches that deal with (i) pair-wise landscape data and (ii) location-specific landscape data, as reviewed by Balkenhol et al. [22]. The development of landscape genomics, however, expanded the range of tests for detecting loci under selection to include genome scans, candidate gene approaches, QTL mapping, and GWAS. Genome scans rely on two classes of methods for detecting loci under selection, differentiation outlier methods and genetic-environment association tests, as reviewed by Storfer et al. [10]. Novel methods are continuously being developed as more genomes are sequenced or resequenced in populations.
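To make the idea of a genetic-environment association scan concrete, the sketch below correlates population allele frequencies with one environmental variable, locus by locus. It is a deliberately naive illustration: the function name `gea_scan` and the toy data are hypothetical, and the scan does not correct for neutral population structure, which dedicated methods are designed to do.

```python
import numpy as np


def gea_scan(freqs, env):
    """Per-locus Pearson correlation between allele frequency and environment.

    freqs: array of shape (n_populations, n_loci) with allele frequencies
    env:   array of shape (n_populations,) with one environmental value
           per population (for example, mean annual precipitation)
    """
    f = np.asarray(freqs, dtype=float)
    e = np.asarray(env, dtype=float)
    f_c = f - f.mean(axis=0)                     # center each locus
    e_c = e - e.mean()                           # center the environmental variable
    denom = np.sqrt((f_c ** 2).sum(axis=0) * (e_c ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (f_c * e_c[:, None]).sum(axis=0) / denom
    return np.nan_to_num(r)                      # loci with no variation get r = 0


# Toy data: four populations along an environmental gradient, three loci.
env = np.array([0.0, 1.0, 2.0, 3.0])
freqs = np.array([
    [0.1, 0.5, 0.9],
    [0.3, 0.5, 0.6],
    [0.5, 0.5, 0.3],
    [0.7, 0.5, 0.0],
])
r = gea_scan(freqs, env)
ranking = np.argsort(-np.abs(r))                 # loci ordered by strength of association
print("correlations:", np.round(r, 3))
print("loci ranked by |r|:", ranking)
```

Loci with the strongest correlations would only be candidates for further testing, since shared demographic history can produce similar patterns without any selection.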

#### **5. Applications of landscape genetics**

Several applications of landscape genetics or genomics can be described. In this section, we briefly present case studies in plant and animal systems. A few examples of studies applied to humans are also given. In general, landscape genetics and genomics studies have revealed associations between geographic, abiotic, and biotic factors and the genetic data obtained by screening molecular markers in populations of diverse organisms. This has increased our power to make detailed inferences about movement, gene flow, and potential adaptation to the landscapes in which populations occur. However, studies for several organisms are still scarce or nonexistent.

Cultivated crops such as maize, soybean, rice, and common bean were domesticated from wild progenitors, and this history is reflected in their current adaptation to distinct environments. Landscape genomics studies have enabled a deeper understanding of the processes shaping their distribution across multiple environments. Common bean (*Phaseolus vulgaris* L.) is an exceptional example of a widespread species native to the Americas. Molecular data from wild germplasm identified two major gene pools, the Andean, from Argentina to Colombia, and the Mesoamerican, from Colombia to Mexico [25, 26]. A third, smaller pool of wild beans is also distinguished in a narrow area between Peru and Ecuador [27]. Microsatellite markers have been widely used to screen the genetic structure of wild and domesticated accessions of common bean, distinguishing both the broad Andean and Mesoamerican gene pools and further subdivisions within each of them [25]. SNP markers from single fragments sequenced by the Sanger method also allowed an accurate distinction between Andean and Mesoamerican accessions, as well as their subdivisions [28]. The detection of SNP markers in specific genomic regions of Andean and Mesoamerican genotypes also made it possible to recognize a parallel domestication event in each of the two major pools [29]. Recent landscape genomics approaches enabled a more detailed description of the major events that determined the range expansion of *P. vulgaris* in the Americas and how they were accompanied by environmental changes [26]. Climatic variability was also associated with differential drought adaptation, and specific SNP markers were statistically associated with root and shoot traits that vary in a Mesoamerican panel of genotypes originating from regions with distinct precipitation regimes throughout the year [30].

Another application of landscape genomics concerns the range expansion and ecological dominance of insect pests. The first step toward understanding them is to characterize population structure, gene flow, and how natural selection is affecting adaptation. Zucchi et al. [31] addressed this problem by examining the population structure of *Piezodorus guildinii*, a soybean pest, in the United States and Brazil. A GBS-based set of SNP markers revealed genetic structure according to the geographic environment of origin. About 10% of loci were under positive selection, and their annotation revealed genes involved in genome reorganization, neuropeptides, and energy mobilization [31]. Addressing this problem should assist future efforts to manage the spread of pests in cultivated crops.

Other equally important questions addressed by landscape genomics concern the consequences of climate change and human intervention for natural populations of wild plants and animals. *Euterpe edulis* Martius is a palm species native to the Atlantic Rain Forest in Brazil, known as heart-of-palm [32]. The species is on the list of species threatened with extinction [33]. Several studies have addressed the genetic diversity and structure of natural populations of this palm (for a compilation, see [34]). Soares et al. [35] studied the genetic diversity and structure of remnant fragments of *E. edulis* in Bahia state and related the data to landscape metrics, such as composition and configuration, and to local variables, including logging activity as a human disturbance variable. No evidence of spatial genetic structure was detected, but distinct genetic clusters could be identified, suggesting reduced gene flow between the fragments studied [35]. Natural populations located in other regions of Brazil, such as São Paulo state, were shown to have high genetic diversity based on microsatellite markers. Adjacent populations established through germplasm collection for management and cultivation showed similar genetic diversity. Those genetic materials could be used to restore overexploited populations [36].

Landscape genetics studies of wild animals have focused on characterizing how they move across their habitats. In terrestrial environments, the landscape genetics of animals has particular features compared with aquatic environments or even with terrestrial plants. Landscape patterns interfere with organism behavior, thereby affecting mating and dispersal and, in turn, population processes [37].

#### **6. Landscape genetics and human populations and diseases**

Genomic technologies have also enabled studies that uncover historical human migrations and the genetic structure and diversity of human populations. For example, a genome-wide study of Malaysian ethnic groups using an SNP array revealed that people from the peninsular area of Malaysia had higher genetic diversity, which the authors associated with a contact zone for recent human migrations in the Asian continent [38]. Such an example suggests an association between the genetic structure of human populations and geographic variables. In fact, Peter et al. [39] show that genetic differentiation generally tends to increase with geographic distance; however, departures from this pattern also occur frequently. Human population structure, then, seems to be quite dynamic.

Landscape genetics has also been employed in epidemiological studies of human diseases. Statistical methods can be used to identify hotspot areas of disease movement [40], which has important implications for designing containment strategies. One challenge, however, has been the application of landscape genetics methods to vector-borne diseases, as reviewed by Hemming-Schroeder et al. [40]. Few studies of human diseases have been dedicated to this goal. One interesting example is the correlation found between the genetic structure of *Aedes mcintoshi*, a major vector of Rift Valley fever in Kenya, and mean precipitation values [41].

In 2020, one of the major global health issues is the new coronavirus disease, COVID-19. Sequencing technologies coupled with landscape genomics approaches have the potential to identify dispersal patterns of the virus in order to contain its spread. Landscape genetic approaches can thus assist the decision-making process.
