**2. Predicting telomere length from genotyping datasets**

#### **2.1 Genome-wide association studies of telomere length**

Variation in telomere length between individuals is explained at least in part by genetics. Telomeropathies are Mendelian diseases characterized by impaired telomere maintenance and caused by defects in genes involved in that maintenance [9]. Genome-wide association studies (GWAS) of leukocyte telomere length have consistently identified loci within genes that have key roles in telomere length regulation. These studies have clarified the connection between genetic variants and telomere length, making it possible to predict telomere length from an individual's genetic information.

In 2021, Codd et al. reported the largest GWAS of telomere length to date, using genetic data from over 472,000 participants in the UK Biobank [10]. The study identified 197 independent genetic variants associated with leukocyte telomere length at 138 genomic loci, of which 108 were newly discovered. Genes involved in regulating telomeres were found at 44 loci, including genes that encode components of the shelterin and CTC1-STN1-TEN1 (CST) complexes. The newly discovered loci also included genes involved in the alternative lengthening of telomeres (ALT) pathway and factors that post-translationally modify key telomere proteins. Genes that regulate telomerase, such as TERC and TERT, were also reported. The study further established a method for predicting telomere length from the identified loci and examined its relationship to age-related disease outcomes across multiple biological traits and chronic pathologies.

Also in 2021, Chang et al. reported the largest GWAS of leukocyte telomere length in an Asian population, among 25,533 Chinese Singaporean individuals [11]. The study identified three variants in or near the POT1, TERF1 and STN1 genes that were associated with telomere length and specific to East Asians. Additionally, the authors reported a significantly increased risk of incident lung cancer with increased genetic telomere length. Upon further analysis stratifying on subtypes, this association was found only in lung adenocarcinoma.

#### *Current Technologies for Measuring or Predicting Telomere Length from Genomic Datasets DOI: http://dx.doi.org/10.5772/intechopen.113048*

These studies provide valuable insights into the genetic determinants of telomere length and its association with various health outcomes. Further research is needed to fully understand the mechanisms underlying these associations and to develop potential interventions to improve health outcomes.
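Prediction of telomere length from associated variants is commonly implemented as a weighted genetic score: the sum of an individual's effect-allele dosages, each weighted by the per-allele effect estimate from a GWAS. The sketch below illustrates that general idea only. It is not the specific method of any study cited above, and the SNP identifiers, weights, and dosages are hypothetical placeholders.

```python
def genetic_score(dosages, weights):
    """Weighted genetic score: sum of effect-allele dosage times per-allele effect.

    dosages: SNP id -> number of effect alleles carried (0, 1, or 2, or an
             imputed value in between).
    weights: SNP id -> per-allele effect on telomere length from a GWAS.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical per-allele effects (illustrative values, not published estimates).
example_weights = {"rs_example_1": 0.10, "rs_example_2": -0.05, "rs_example_3": 0.02}

# Hypothetical genotype of one individual, coded as effect-allele counts.
example_dosages = {"rs_example_1": 2, "rs_example_2": 1, "rs_example_3": 0}

predicted = genetic_score(example_dosages, example_weights)
```

The score is expressed in the same units as the GWAS effect estimates, so the weights should come from studies that report telomere length on a common scale.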

#### **2.2 Predicting telomere length using Mendelian randomization**

In this section, we discuss the emerging use of Mendelian randomization (MR) to predict telomere length from germline genotyping data. MR is an approach that uses genetic variants as instrumental variables to infer causal relationships between an exposure, here telomere length, and an outcome. It is based on Mendel's laws of inheritance and causal inference theory, and it uses instrumental variables to account for unmeasured confounding [12]. MR has recently gained popularity, with a growing number of methodological and applied studies being published, enabled by the increasing availability of genetic data [13].

MR in telomere length studies rests on three basic assumptions: (1) relevance: the genetic variant(s) used as instruments must be strongly associated with telomere length; (2) independence: the genetic variant(s) must not be associated with confounders of the relationship between telomere length and the outcome; and (3) exclusion restriction: the genetic variant(s) should influence the outcome only through their effect on telomere length. The first assumption is directly related to our topic of predicting telomere length from germline variants.
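Instrument strength under the relevance assumption is often summarized per variant with an approximate F-statistic computed from summary statistics, F ≈ (β/SE)², with F > 10 used as a conventional rule of thumb. Neither the statistic nor the cutoff is stated in this chapter, so both are assumptions in the sketch below, and the SNP identifiers and values are hypothetical.

```python
def f_statistic(beta, se):
    """Approximate per-SNP F-statistic from a GWAS effect estimate and its SE."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (beta / se) ** 2


# Hypothetical SNP-telomere length associations (illustrative values only).
instruments = [
    {"snp": "rs_example_1", "beta": 0.080, "se": 0.010},
    {"snp": "rs_example_2", "beta": 0.020, "se": 0.010},
]

F_THRESHOLD = 10.0  # conventional rule of thumb, an assumption here

strong = [
    s["snp"] for s in instruments
    if f_statistic(s["beta"], s["se"]) > F_THRESHOLD
]
```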

To identify germline variants that can be used as instrumental variables for telomere length, a set of screening and filtering criteria can be applied to available genetic datasets. In this section, we outline our approach for selecting genetic instruments for telomere length in European populations. We focus on European populations because they currently account for the largest published genetic datasets.

In January 2023, we searched the GWAS Catalog for telomere length-related studies (https://www.ebi.ac.uk/gwas/efotraits/EFO\_0004505) and identified 303 SNPs from 20 published GWAS. A flowchart illustrates each step of our screening and filtering process (**Figure 1**). We systematically screened these studies using the following exclusion criteria: (1) non-European study populations; (2) telomere length measured in patient populations; (3) telomere length measured in cell types other than leukocytes or PBMCs; and (4) inconsistent units of telomere length measurement. After applying these criteria, 9 of the 20 studies were selected for full-text screening.

During full-text screening, we eliminated 4 of the 9 studies because they were earlier studies with smaller sample sizes that shared study populations with the included studies. We also expanded our list of candidate genetic instruments by adding SNPs from the results and supplementary lists accompanying the published manuscripts. This resulted in a candidate list of 143 SNPs from 5 studies [14–18], which was reduced to a final list of 138 SNPs after meta-analyzing duplicated SNPs.
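One way to collapse a SNP reported by more than one study into a single estimate is a fixed-effect, inverse-variance weighted meta-analysis. The text above does not state which meta-analysis model was used, so the sketch below is an assumption. It also assumes that all estimates are aligned to the same effect allele, are reported in the same units, and come from non-overlapping samples. The records are hypothetical.

```python
import math
from collections import defaultdict


def fixed_effect_meta(estimates):
    """Combine (beta, se) pairs using inverse-variance weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, math.sqrt(1.0 / total)


def merge_duplicates(records):
    """Group records by SNP id and return one pooled (beta, se) per SNP."""
    by_snp = defaultdict(list)
    for r in records:
        by_snp[r["snp"]].append((r["beta"], r["se"]))
    return {snp: fixed_effect_meta(ests) for snp, ests in by_snp.items()}


# Hypothetical records: rs_example_1 appears in two studies.
records = [
    {"snp": "rs_example_1", "study": "study_A", "beta": 0.10, "se": 0.02},
    {"snp": "rs_example_1", "study": "study_B", "beta": 0.06, "se": 0.02},
    {"snp": "rs_example_2", "study": "study_A", "beta": 0.05, "se": 0.01},
]

merged = merge_duplicates(records)
```

A SNP reported only once passes through unchanged, so the merged list has one entry per unique SNP.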

From this final candidate list, we applied SNP-specific filtering criteria to identify strong instruments for telomere length: (1) genome-wide significance, with p-value < 5 × 10<sup>−8</sup>, as required by the relevance assumption; (2) minor allele frequency (MAF) > 0.01, to avoid potential statistical bias from SNPs with low MAF; and (3) pruning for linkage disequilibrium (LD) at R<sup>2</sup> < 0.01, to ensure that the included genetic instruments are independent of each other and that each region of LD is represented by only one SNP.
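The first two criteria are simple per-SNP thresholds and can be sketched as follows. The field names (`pval`, `maf`) and the candidate records are hypothetical. Only the two cutoffs come from the text.

```python
P_THRESHOLD = 5e-8    # genome-wide significance
MAF_THRESHOLD = 0.01  # minimum minor allele frequency


def passes_snp_filters(snp):
    """True if the SNP is genome-wide significant and not rare."""
    return snp["pval"] < P_THRESHOLD and snp["maf"] > MAF_THRESHOLD


# Hypothetical candidate SNPs (illustrative values only).
candidates = [
    {"snp": "rs_example_1", "pval": 3e-12, "maf": 0.25},
    {"snp": "rs_example_2", "pval": 2e-6, "maf": 0.30},   # fails significance
    {"snp": "rs_example_3", "pval": 1e-9, "maf": 0.004},  # fails MAF
]

kept = [s["snp"] for s in candidates if passes_snp_filters(s)]
```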

#### **Figure 1.**

*Identification of genetic instruments of telomere length from GWAS catalog. In January 2023, the GWAS Catalog was searched for the trait "telomere length" (EFO\_0004505), resulting in 303 SNPs from 20 published studies. After screening the abstracts, seven studies were removed due to non-European populations, one due to being conducted on patients, one due to reporting cell-type specific telomere length, and two due to reporting associations in different units. Of the remaining nine studies, four were removed after full-text screening because they used the same study populations as other studies but had smaller sample sizes. This left five studies with a total of 143 SNPs. After meta-analysis to remove duplicate SNPs and SNP-specific filtering, 30 SNPs had genome-wide significance (P < 5 × 10<sup>−8</sup>) and minor allele frequency > 0.01. Of these, 18 were independent (R<sup>2</sup> < 0.01) within a 10 Mb region. The final list consisted of 18 independent SNPs as genetic instruments for telomere length, eligible for Mendelian randomization analysis. Abbreviations: SNP, single-nucleotide polymorphism; GWAS, genome-wide association study; EU, European; TL, telomere length; QC, quality control; MAF, minor allele frequency. N denotes the sample size for studies, and n denotes the sample size for SNPs.*

After applying these criteria, we identified 30 SNPs with genome-wide significant p-value < 5 × 10<sup>−8</sup> and MAF > 0.01. We then performed LD clumping with R<sup>2</sup> < 0.01 to identify the top SNP per 10 Mb region, resulting in a total of 18 SNPs as our final genetic instruments for telomere length (**Table 1**).
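LD clumping of this kind is typically a greedy procedure: SNPs are visited from most to least significant, and a SNP is retained only if it is not in LD with any already retained SNP inside the window. The sketch below is a minimal illustration, not the authors' pipeline. The `r2_lookup` callable is hypothetical and stands in for pairwise R<sup>2</sup> values from a reference panel, which the text does not name. The SNP records are invented.

```python
def ld_clump(snps, r2_lookup, r2_threshold=0.01, window_bp=10_000_000):
    """Greedy clumping: keep the most significant SNP per LD region.

    snps: list of dicts with keys "snp", "chr", "pos", "pval".
    r2_lookup: callable (snp_id_a, snp_id_b) -> pairwise R^2 (hypothetical).
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["pval"]):
        in_ld_with_lead = any(
            snp["chr"] == lead["chr"]
            and abs(snp["pos"] - lead["pos"]) <= window_bp
            and r2_lookup(snp["snp"], lead["snp"]) >= r2_threshold
            for lead in kept
        )
        if not in_ld_with_lead:
            kept.append(snp)
    return kept


# Hypothetical pairwise R^2 values; pairs not listed are treated as 0.
_R2 = {frozenset(("rsA", "rsB")): 0.6}


def example_r2(a, b):
    return _R2.get(frozenset((a, b)), 0.0)


example_snps = [
    {"snp": "rsA", "chr": 5, "pos": 1_000_000, "pval": 1e-20},
    {"snp": "rsB", "chr": 5, "pos": 1_200_000, "pval": 1e-10},   # in LD with rsA
    {"snp": "rsC", "chr": 5, "pos": 50_000_000, "pval": 1e-9},   # outside window
    {"snp": "rsD", "chr": 10, "pos": 1_100_000, "pval": 1e-12},  # other chromosome
]

clumped = [s["snp"] for s in ld_clump(example_snps, example_r2)]
```

In this toy input, rsB is dropped because it lies within 10 Mb of the more significant rsA and exceeds the R<sup>2</sup> threshold, while the other three SNPs are retained.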

With our final genetic instruments for predicting telomere length through genetic variants in hand, we can use these SNPs to conduct Mendelian randomization analyses. This approach allows us to infer causal relationships between telomere length and other phenotypes by using the genetic instruments as proxies for telomere length.
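A widely used estimator for combining several instruments in a two-sample setting is the inverse-variance weighted (IVW) method, which pools per-SNP Wald ratios (SNP-outcome effect divided by SNP-telomere length effect). The chapter does not name a specific estimator, so the choice of fixed-effect IVW here is an assumption, and all effect sizes below are hypothetical.

```python
import math


def ivw_estimate(instruments):
    """Fixed-effect IVW estimate of the exposure's causal effect on the outcome.

    Each instrument is a dict with:
      beta_x: SNP effect on telomere length (exposure)
      beta_y: SNP effect on the outcome
      se_y:   standard error of beta_y
    Weights are beta_x^2 / se_y^2 (first-order approximation).
    """
    numerator = sum(i["beta_x"] * i["beta_y"] / i["se_y"] ** 2 for i in instruments)
    denominator = sum(i["beta_x"] ** 2 / i["se_y"] ** 2 for i in instruments)
    if denominator == 0:
        raise ValueError("instruments have no association with the exposure")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics, harmonized to the same effect allele.
example_instruments = [
    {"snp": "rs_example_1", "beta_x": 0.10, "beta_y": 0.05, "se_y": 0.01},
    {"snp": "rs_example_2", "beta_x": 0.20, "beta_y": 0.10, "se_y": 0.02},
]

estimate, std_error = ivw_estimate(example_instruments)
```

Both toy instruments have a Wald ratio of 0.5, so the pooled estimate is also 0.5 per unit of genetically predicted telomere length. The result is only interpretable causally if the three MR assumptions hold for every instrument.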


*Abbreviations: CHR, chromosome; POS, position; SNPS, single-nucleotide polymorphisms; PVAL, p-value; SE, standard error; REF, reference allele; EFT, effect allele.*

#### **Table 1.**

*Summary statistics of genetic instruments for telomere length.*
