**Abstract**

The gold standard for measuring telomere length is technically challenging, which limits its use in large population studies. Numerous bioinformatics tools have recently been developed to estimate telomere length using high-throughput sequencing data. This allows for scaling up telomere length estimates in large datasets. Telomere length depends substantially on genetics, and many genetic studies have looked at this relationship, which provides an opportunity to predict telomere length from genotyping data. However, in part because environment also significantly affects telomere length, the accuracy of telomere length predictions and estimates made from genomic data remains uncertain. In this chapter, we will summarize currently available bioinformatics tools for predicting or measuring telomere length from genomics datasets, and we will discuss each method's limitations and advantages.

**Keywords:** DNA sequencing, genomics, genetics, population sciences, bioinformatics

### **1. Introduction**

Telomere length, as a biomarker of aging and genome integrity, has frequently been used in aging and health research. Telomere length in blood leukocytes or peripheral blood mononuclear cells (PBMCs) reflects the progressive shortening of telomeres in hematopoietic stem and progenitor cells (HSPCs) and correlates with telomere length in other tissues [1]. Among leukocyte types, naïve T-cells and B-cells have the longest telomeres, whereas NK cells and memory T-cells have shorter telomeres within the same individual [2]. However, consistent with blood telomere length being a representative marker for telomere attrition in other tissues, telomere length in all cell types shows an inverse correlation with participant age. Many age-related diseases are associated with diminished telomere length in leukocytes or PBMCs, including cardiovascular disease [3] and cancer [4]. Accurate and accessible techniques for measuring telomere length are increasingly needed in population studies to evaluate the biological aging process and genome maintenance, and telomere length may represent a target for the development of strategies for early detection and prevention of disease.

The gold standard for measuring telomere length is Terminal Restriction Fragment (TRF) analysis. This approach involves digesting genomic DNA with frequent-cutting restriction enzymes that do not recognize the telomeric repeats, followed by Southern hybridization [5]. The entire TRF process can take up to one week, making it infeasible to apply on a large population scale. Flow cytometry-based fluorescent in situ hybridization (flow-FISH) is frequently used in clinical settings to diagnose patients with telomere-related disease [6]. An advantage of this approach is that it allows telomere length to be determined for specific cell populations, but it requires expertise in flow cytometry and is still labor intensive. Currently, monochrome multiplex quantitative polymerase chain reaction (MM-qPCR) is widely used in population-based studies because it is less labor intensive and has a relatively fast turnaround time [7]. However, even this approach requires careful attention to details that may affect the results of the analysis, and it places demands on sample processing that may not be compatible with population studies [8].

Recently, genomics datasets have been produced from many large population cohorts, enabling new bioinformatics approaches that estimate telomere length at the scale of whole populations. In this chapter, we summarize currently available tools for predicting or measuring telomere length from genomics datasets and discuss each method's limitations and advantages.
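
To make the idea of estimating telomere content from sequencing data concrete, the following is a minimal sketch and not the method of any particular tool. It assumes that reads are available as plain strings, uses the human telomeric repeat TTAGGG (and its reverse complement CCCTAA), and labels a read as telomeric when it carries at least `min_repeats` copies of the repeat. The function names and the threshold value are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: function names and the threshold are assumptions.

# Human telomeric repeat and its reverse complement
# (a read may originate from either strand).
TELOMERE_REPEAT = "TTAGGG"
REVERSE_COMPLEMENT = "CCCTAA"


def count_repeats(read: str) -> int:
    """Return the larger non-overlapping repeat count found on either strand."""
    read = read.upper()
    return max(read.count(TELOMERE_REPEAT), read.count(REVERSE_COMPLEMENT))


def telomeric_fraction(reads, min_repeats: int = 4) -> float:
    """Fraction of reads carrying at least `min_repeats` copies of the repeat.

    `min_repeats` is an illustrative threshold, not a recommended value.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    telomeric = sum(1 for read in reads if count_repeats(read) >= min_repeats)
    return telomeric / len(reads)


if __name__ == "__main__":
    example_reads = [
        "TTAGGG" * 5,           # telomeric, forward strand
        "ACGT" * 8,             # non-telomeric
        "CCCTAA" * 4 + "ACGT",  # telomeric, reverse strand
        "TTAGGGACGTACGT",       # a single repeat, below the threshold
    ]
    print(telomeric_fraction(example_reads))
```

A fraction of this kind is only a relative measure of telomere content. Turning it into a length estimate would likely require further normalization, for example for sequencing depth, which this sketch does not attempt.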
