**7. Glossary of terms**

**Alleles:** Two or more alternative forms of a gene that occupy the same position (locus) on a chromosome.

**Annotation:** Identification of the positions of genes and other features (intron-exon boundaries, regulatory sequences, repeats) within a DNA sequence, together with the assignment of biological information such as gene names and protein products.

**Bioinformatics:** The application of computer science and information technology to develop computationally intensive techniques that improve our understanding of biological processes.

**Complex trait:** Trait influenced by more than one genetic and/or environmental factor.

**cDNA:** Complementary DNA, which is DNA synthesized from a mature mRNA template by reverse transcription.

**DNA sequence:** The order of the four nucleotide bases (A, T, C, G) along a DNA strand, written as a linear string of letters.
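Because a DNA sequence is simply a string over the letters A, C, G and T, basic properties can be computed from it directly. The sketch below assumes a plain Python string without ambiguity codes (such as N); it counts each base and derives the GC content, the fraction of bases that are G or C. The function names are illustrative only.

```python
from collections import Counter

def base_counts(seq: str) -> dict[str, int]:
    """Count occurrences of each nucleotide (A, C, G, T) in a DNA sequence."""
    counts = Counter(seq.upper())
    # Report all four bases, including any that do not occur
    return {base: counts.get(base, 0) for base in "ACGT"}

def gc_content(seq: str) -> float:
    """Fraction of A/C/G/T bases in the sequence that are G or C."""
    counts = base_counts(seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (counts["G"] + counts["C"]) / total

print(base_counts("ATGCGCTA"))  # {'A': 2, 'C': 2, 'G': 2, 'T': 2}
print(gc_content("ATGCGCTA"))   # 0.5
```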

**Ensembl Genome Browser:** A joint project between EMBL-EBI and the Wellcome Trust Sanger Institute to develop a software system that produces and maintains automatic annotation of selected eukaryotic genomes (www.ensembl.org).

**EST:** Expressed sequence tag, a short sequence obtained from single-pass sequencing of a cDNA clone, corresponding to a fragment (~500 bp) of an expressed gene.

**Gene:** A stretch of DNA that codes for a specific protein (or functional RNA) underlying a particular characteristic or function. It is the basic unit of heredity in living organisms.

**Gene Expression Omnibus (GEO):** A public repository, hosted by NCBI, of high-throughput gene expression data generated with hybridization arrays, chips and microarrays.

**Generation time:** The average interval between the birth of an individual and the birth of its offspring.
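As a small worked illustration of this average, the sketch below assumes hypothetical records in which each parent's birth time is paired with the birth time of one of its offspring (in years); the generation time is then the mean of the parent-to-offspring intervals. The function name and data are illustrative only.

```python
def generation_time(parent_birth: list[float], offspring_birth: list[float]) -> float:
    """Mean interval between a parent's birth and the birth of its offspring.

    offspring_birth[i] is assumed to be the birth time of an offspring of
    the parent born at parent_birth[i] (hypothetical paired records).
    """
    if len(parent_birth) != len(offspring_birth) or not parent_birth:
        raise ValueError("need equal-length, non-empty parent/offspring lists")
    intervals = [o - p for p, o in zip(parent_birth, offspring_birth)]
    return sum(intervals) / len(intervals)

# Hypothetical birth years for three parent-offspring pairs
print(generation_time([2010, 2011, 2012], [2013, 2015, 2014]))  # 3.0
```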

**Genome:** The entirety of an organism's hereditary information. It is encoded in DNA or, in some viruses, in RNA.

**Genomics:** Discipline in genetics concerning the study of the genomes of organisms.

**Genotype:** The set of alleles an organism carries, either at the loci that determine a specific characteristic or trait or across its whole genome, i.e. the entire complement of genes inherited from both parents.

**Heritability:** Proportion of phenotypic variation in a population that is due to genetic variation between individuals.
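In quantitative-genetics notation, the proportion described above is broad-sense heritability. Assuming, for illustration, that phenotypic variance splits additively into genetic and environmental components (no genotype-by-environment interaction or covariance):

$$
H^2 = \frac{\sigma^2_G}{\sigma^2_P}, \qquad \sigma^2_P = \sigma^2_G + \sigma^2_E
$$

For example, with hypothetical values $\sigma^2_G = 30$ and $\sigma^2_E = 70$, $H^2 = 30 / (30 + 70) = 0.30$. Breeding studies often report narrow-sense heritability instead, $h^2 = \sigma^2_A / \sigma^2_P$, which counts only the additive genetic variance $\sigma^2_A$.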

**High-throughput:** Processes that handle large numbers of samples or measurements in parallel, usually through high levels of automation and robotics. In sequencing, the term refers to rapid sequencing technologies applied at the scale of whole genomes.

**Locus:** (pl. loci) Specific location of a gene or DNA sequence on a chromosome.

**Marker-assisted selection (MAS):** The use of DNA markers linked to traits of interest to assist in the selection of individuals for breeding purposes.

**Molecular marker:** A specific fragment of DNA that can be identified within the whole genome. Markers can be associated with the position of a particular gene or with the inheritance of a particular characteristic.

**Molecular pathway:** A series of molecular processes connected by their intermediates, such that the products of one process may trigger or participate in the next.

**Morphology:** The visible form and structure of living organisms.

**mRNA:** Messenger RNA, a molecule of RNA encoding a specific protein product. In eukaryotes, mRNA is transcribed from a DNA template in the cell nucleus and then moves to the cytoplasm, where it is translated by ribosomes.

**NCBI:** The National Center for Biotechnology Information, which hosts GenBank, one of the world's largest public sequence databases, and PubMed, an index of biomedical research articles.

**Non-coding:** Components of an organism's genome (DNA sequences) that do not encode protein sequences.

**Phenotype:** Observable physical or biochemical characteristics of an organism, as determined by both genetic makeup and environmental influences.

**Polymerase Chain Reaction (PCR):** A laboratory technique that amplifies a DNA fragment by several orders of magnitude, generating thousands to millions of copies of a particular DNA sequence.
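The "thousands to millions of copies" follows from exponential growth: under ideal conditions each cycle doubles the number of target molecules. The sketch below assumes a constant per-cycle efficiency E between 0 and 1 (1 meaning perfect doubling), so that N = N0 × (1 + E)^n after n cycles; real reactions eventually plateau as reagents are used up, which this simple model ignores.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies after `cycles` rounds of PCR.

    Assumes a constant per-cycle efficiency (1.0 = perfect doubling)
    and ignores the plateau reached in late cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles

print(pcr_copies(1, 10))  # 1024.0     (about a thousand copies)
print(pcr_copies(1, 20))  # 1048576.0  (about a million copies)
```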

**Quantitative trait locus (QTL):** Stretch of DNA containing or linked to genes that underlie a quantitative trait of interest.

**RAD:** Restriction site Associated DNA. RAD tags are the sequences that immediately flank a restriction enzyme site. RAD markers refer to sequence variations identified within RAD tags that can be used as molecular markers (e.g., SNPs).
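To make the idea of a RAD tag concrete, the sketch below scans a known DNA sequence for a restriction site and returns the sequences immediately flanking each occurrence. The recognition sequence (SbfI, CCTGCAGG) and the tag length are illustrative assumptions; in practice RAD tags are obtained by sequencing DNA adjacent to the cut sites rather than extracted from a sequence already in hand.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def rad_tags(seq: str, site: str = "CCTGCAGG", tag_len: int = 5) -> list[tuple[str, str]]:
    """Return (upstream, downstream) flanking sequences for each restriction site.

    tag_len is a hypothetical, illustrative tag length; flanks are
    truncated where they run off either end of the sequence.
    """
    tags = []
    for pos in find_sites(seq, site):
        upstream = seq[max(0, pos - tag_len):pos]
        downstream = seq[pos + len(site):pos + len(site) + tag_len]
        tags.append((upstream, downstream))
    return tags

print(rad_tags("AAAAATTTTTCCTGCAGGACGTACCCCC"))  # [('TTTTT', 'ACGTA')]
```

Comparing such tags across individuals is what reveals the sequence variants (e.g. SNPs) used as RAD markers.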

**Reference genome:** A nucleic acid sequence database, assembled into a whole genome, that serves as a representative example of a species' genome. It is typically used as a guide for assembling new genomes of the same species and for aligning sequencing reads.

**Restriction enzyme:** An enzyme that cuts DNA at a specific recognition site/sequence, referred to as a restriction site.
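A restriction digest can be simulated by locating each recognition site and cutting at a fixed offset within it. The sketch below uses EcoRI as an example: it recognizes GAATTC and cuts between the G and the first A (G^AATTC). Only one strand is modelled, so the staggered ("sticky") ends formed with the complementary strand are not represented.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Split `seq` into fragments at every occurrence of a restriction site.

    cut_offset is the cut position within the site (1 for EcoRI, G^AATTC).
    Only the given strand is modelled.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])
    return fragments

print(digest("TTGAATTCAAAGAATTCGG"))  # ['TTG', 'AATTCAAAG', 'AATTCGG']
```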

**RNA:** Ribonucleic acid. A single-stranded molecule transcribed from one strand of DNA. Unlike DNA, RNA contains the sugar ribose (rather than deoxyribose), and the base thymine (T) is replaced by uracil (U). Several types of RNA are involved mainly in protein synthesis (mRNA, tRNA, rRNA).

**Sequence assembly:** The process of aligning and merging sequence fragments (reads) into a longer sequence that reconstructs the original.
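A minimal illustration of assembly is the greedy overlap approach: repeatedly merge the two fragments with the longest suffix-prefix overlap until no sufficiently long overlaps remain. This is a teaching sketch only; production assemblers typically rely on overlap graphs or de Bruijn graphs and must cope with sequencing errors, repeats, reverse-complemented reads and reads contained within other reads, none of which is handled here. The minimum overlap length is an illustrative parameter.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads by maximal overlap; returns the resulting contigs."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    contigs = list(reads)
    while True:
        best = (0, -1, -1)  # (overlap length, index of left contig, index of right contig)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs  # no remaining overlaps: stop
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTTA"]))  # ['ATGCGTACCTTA']
```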

**Synteny:** Physical co-localization of genetic loci on the same chromosome within an individual or species.

**Transcription:** The process by which mRNA is synthesized from a DNA template, catalyzed by RNA polymerase.
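At the sequence level, transcription can be sketched as a string transformation. Assuming the DNA template strand is written 5'→3', the mRNA (also written 5'→3') is its reverse complement with U in place of T, which is the same as the coding strand with T replaced by U. The function name is illustrative.

```python
# DNA template base -> complementary RNA base
COMPLEMENT_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template: str) -> str:
    """Return the mRNA (5'->3') synthesized from a DNA template strand given 5'->3'.

    RNA polymerase reads the template 3'->5', so the mRNA is the reverse
    complement of the template, with U in place of T.
    """
    return "".join(COMPLEMENT_RNA[base] for base in reversed(template.upper()))

print(transcribe("TTACGCAT"))  # AUGCGUAA
```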

**Translation:** Process in which the messenger RNA (mRNA) produced by transcription is decoded by the ribosome to produce a specific amino acid chain, or polypeptide, that will later fold into an active protein.
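The decoding step can be sketched with a codon table: each successive triplet of mRNA bases (codon) maps to one amino acid until a stop codon is reached. The sketch below assumes the standard genetic code (some organisms and organelles, such as mitochondria, use variant codes) and a reading frame that starts at the first base of the input, rather than searching for a start codon.

```python
BASES = "UCAG"
# Standard genetic code as one-letter amino acid codes, codons ordered
# UUU, UUC, UUA, UUG, UCU, ... GGG; '*' marks a stop codon
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def translate(mrna: str) -> str:
    """Translate an mRNA reading frame from its first base until the first stop codon."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):  # step through complete codons only
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGUUUGGCUAAGG"))  # MFG
```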

**Whole genome sequence:** The complete DNA sequence of the genome of an organism.
