### 4. Principles and challenges in cognitive testing

#### 4.1. Introduction

Cognitive tests measure cognitive performance. They should be considered an adjunct tool in the assessment and management of an underlying neurodegenerative condition. All tests are based on paradigms of how we learn information. To detect deficits, tests are designed to push people until they make errors. A low score does not diagnose dementia, and a high score does not exclude it: a single score cannot be considered in isolation.

Confidence that cognitive tests accurately reflect subject cognition is important. Tests require a wide response distribution and evenness of scale to enable sensitive detection of clinical changes and assessment of the degree of deficits. Sensitivity to cognitive disease and to change over time enables tracking of disease progression, evaluation of treatment effectiveness, and a sustained focus on the symptoms and disease of interest. Measures should be able to capture deficits, have low noise, and relate to biological markers. Characterising early presenters based on neuropsychological test performance should be detailed enough to make sense, but not overly precise; otherwise it can paradoxically complicate assessment and follow-up.

Data are currently lacking on how well tests track amyloid. Longitudinal examination of different trajectories of cognitive decline over time can validate specific biomarker profiles, help to elucidate underlying mechanisms of disease, and predict clinical outcomes. The challenge in observational studies is to be selective yet inclusive of tests that can be operationalised in all participants and are sensitive enough to track changes [7]. Regulatory agencies require that measures are well understood and supported by extensive experience of use [35]. Technology can make it easier to tailor cognitive and functional assessment protocols to the needs of particular populations or settings, and extends the possibility of administering assessments and delivering interventions remotely [36].

Challenges in Dementia Studies

117

http://dx.doi.org/10.5772/intechopen.72866

Cognitive tests cannot extract specific unimodal factors alone; they all tap broad-based processes. No neuropsychological test is orthogonal, because performance is affected by many processes, such as the allocation of attentional resources, language, and executive function. All tests should be empirically derived from actual patients, then refined to improve sensitivity, reduce variability, and simplify use. When developing a test, some overlap between measures is worthwhile to ensure concurrent validity, but there should not be too much correlation either. Some tests are more highly predictive than others: for example, the semantic interference test was highly predictive of decline from MCI to dementia over an average 30-month period, compared with standard memory tests such as memory for passages and visual reproduction [37].

#### 4.2. The importance of pattern recognition

Cognitive testing is not specific for any one neuropathology. The external manifestations captured in test results reflect a combination of neuropathology and cognitive reserve. Patterns of deficits across different sub-scores are important for assessing the underlying pathology, so better testing approaches should distinguish between memory and non-memory cognitive domains. The possibility of a neurodegenerative disease is raised when there is a typical cerebral pattern of spread [38–41], and reduced when the pattern of deficits on sub-scores does not match any neurodegenerative subtype. For example, since living items are the most impaired semantic category in AD, relatively poorer scores in this category compared with others raise the odds of AD. The pattern of scores should be interpreted in the context of the patient's situation, e.g. poor education, a culturally and linguistically diverse background, comorbidities, the conditions of the testing environment, hearing aids, glasses, the tester, etc.

#### 4.3. Difficulties with cognitive testing

Cognitive measures may fail to detect subtle changes or effects of underlying neuropathology because of cognitive reserve, ceiling effects, or floor effects. They should be sufficiently sensitive and specific to detect the effects being tested for, while remaining clinically meaningful. Delayed logical memory and face-name association tests are examples of tests that detect amyloid deposition in the brain well [42, 43].

Cognition is a heterogeneous construct, so while more sensitive and precise measures may emerge, there will be limits to applying them across different cohorts. Reference norms differ for different patient groups; for example, IQ-adjusted norms are used to predict progressive cognitive decline in highly intelligent older individuals [44]. People who have individualised strategies for learning (that is, those with high cognitive reserve) will do much better in general, so neuropsychological testing can be quite noisy. Non-memory tests are generally less predictive of dementia in those with more education. Neuropsychological screening tools like the mini-mental state examination are culturally and linguistically biased even when an interpreter is used [45]. Efficacy can be limited by ceiling effects and by variability in subject performance over time. Cognitive testing may be more subjective than biomarker measurement, as results can be influenced by the behaviour of the person conducting or taking the test, patient fatigue, and the time of day. Cognitive testing is also susceptible to attention deficits, so delirium, depression, and distress can result in scores in the dementia range.
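
The point about reference norms can be made concrete with a small sketch. The norm values and the `adjusted_z` helper below are invented for illustration; real normative tables are stratified by age, education, language and other demographics.

```python
# Sketch: the same raw score means different things against different
# reference norms. The norm values below are invented for illustration,
# not real normative data.

NORMS = {
    # education band -> (mean, standard deviation) of the raw test score
    "low_education": (22.0, 3.0),
    "high_education": (27.0, 2.0),
}

def adjusted_z(raw_score: float, education_band: str) -> float:
    """Express a raw score as a z-score relative to its matching norm group."""
    norm_mean, norm_sd = NORMS[education_band]
    return (raw_score - norm_mean) / norm_sd

# A raw score of 24 is slightly above average against one group's norms,
# but a clear deficit against the other's.
print(round(adjusted_z(24, "low_education"), 2))   # 0.67
print(round(adjusted_z(24, "high_education"), 2))  # -1.5
```

This is why a highly intelligent patient can score "within normal limits" on unadjusted norms while having declined substantially from their own baseline.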

#### 4.4. Non-linear decline trajectory

116 Alzheimer's Disease - The 21st Century Challenge

Cognitive decline in ageing and dementia follows a non-linear trajectory [46]. Over short intervals of only 2–3 years, however, changes may appear linear; the acceleration over time (i.e. the non-linearity) is usually only clearly seen with data points spanning 7 years and beyond. Cognitive scales may be sensitive to early changes but work poorly later, or be sensitive to late-stage changes and work poorly earlier. While considerable work is still needed to establish which tasks are sensitive at particular stages of the preclinical period, the rule of thumb is that the earlier the test, the less precise it is. Still, there is increasing interest in developing tools to detect the earliest manifestations of cognitive decline in order to prescribe remediation strategies or measure the effectiveness of treatment approaches. The more sensitive the measure, the fewer participants are needed in a trial.
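
The link between measure sensitivity and trial size can be illustrated with the standard two-arm sample-size formula. This is a sketch, assuming a two-arm trial comparing means with 80% power and a two-sided alpha of 0.05; the effect sizes are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm for a two-arm trial comparing means.

    effect_size is Cohen's d: the true group difference divided by the
    outcome measure's standard deviation. A less noisy (more sensitive)
    cognitive measure yields a larger d for the same true difference.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# Halving the measurement noise doubles d and cuts the sample roughly fourfold.
print(n_per_arm(0.25))  # 252
print(n_per_arm(0.50))  # 63
```

The required sample scales with 1/d², which is why small gains in measurement precision translate into large savings in recruitment.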

#### 4.5. Composite scoring

Composite testing smooths individual scores into a more stable overall score. Deriving composite scores by combining different tests can put those tests on a more equal footing, reduce noise, and allow a statistically simpler analysis of the relationships between cognitive domains (such as memory) and imaging data. This simplifies studies that make comparisons between groups.
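
A minimal sketch of this approach: each test is standardised against baseline cohort data so that tests on different scales carry equal weight, and the resulting z-scores are averaged. The test names and data below are invented for illustration.

```python
from statistics import mean, stdev

def composite_score(subject_scores: dict, baseline_cohort: dict) -> float:
    """Average of per-test z-scores, standardised against a baseline cohort.

    Standardising first puts tests with different scales on an equal footing;
    averaging then damps test-specific noise.
    """
    zs = [
        (raw - mean(baseline_cohort[test])) / stdev(baseline_cohort[test])
        for test, raw in subject_scores.items()
    ]
    return mean(zs)

# Invented example: two memory tests scored on very different scales.
cohort = {
    "word_list_recall": [8, 10, 12, 10, 10],   # e.g. a 0-15 scale
    "logical_memory":   [20, 25, 30, 25, 25],  # e.g. a 0-50 scale
}
# A participant exactly at the cohort mean on both tests scores 0.
print(composite_score({"word_list_recall": 10, "logical_memory": 25}, cohort))
```

Published composites such as ADNI-Mem use more sophisticated psychometric models, but the underlying idea of equalising scales before combining is the same.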

The best neuropsychological test batteries are not necessarily the longest or the most comprehensive. A certain degree of precision is required, but there may be no need to be overly precise. People dread having their neuropsychological deficits pointed out, and sitting through a battery of tests can be emotionally difficult for them. The size of a battery matters less than how precisely it detects degrees of cognitive deficit.

One way to validate such neuropsychological composite scores is to see whether similar results can be obtained in different cohorts. Memory composite scores like ADNI-Mem have been found to be comparable with other memory measures in predicting cognitive change over time, and could also differentiate changes over time. Such composite scores were associated with neuroimaging parameters [47].

#### 4.6. Serial scoring and practice effects

Serial assessments enable better cognitive evaluation than a single cross-sectional assessment. For example, the trajectory pattern of serial scores helps to differentiate between dementia and delirium. However, serial assessments are subject to practice effects. Practice, or re-test, effects occur in non-demented adults [48]. They involve episodic memory in learning the test content, procedural non-declarative learning in becoming familiar with task procedures, and anxiety reduction through desensitisation. Practice effects are not necessarily a nuisance, as they themselves comprise a test: one study showed that the loss of short-term practice effects portends a worse prognosis after 1 year in patients with MCI [49], and when the Cogstate battery was repeated four times in a day, an attenuated practice effect in non-demented participants detected MCI [50, 51].
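
The idea of treating the practice effect itself as a signal can be sketched as follows. The gain expected in cognitively normal adults and the tolerance are hypothetical placeholders, not validated cut-offs.

```python
def practice_effect(baseline_score: float, retest_score: float) -> float:
    """Short-term practice effect: the score gain on repeat administration."""
    return retest_score - baseline_score

def attenuated_practice(baseline_score: float, retest_score: float,
                        expected_gain: float, tolerance: float) -> bool:
    """Flag a retest gain that falls well below the gain typically seen in
    cognitively normal adults. Thresholds here are illustrative, not validated."""
    return practice_effect(baseline_score, retest_score) < expected_gain - tolerance

# Normal adults usually improve on repetition; minimal gain may itself be
# a signal of early impairment (all values below are invented).
print(attenuated_practice(50, 51, expected_gain=5, tolerance=2))  # True
print(attenuated_practice(50, 56, expected_gain=5, tolerance=2))  # False
```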

The fundamental consideration with any assessment approach in dementia, whether with clinical bedside tests or with biomarkers, is how precisely a measure determines what it is meant to detect. To be used as surrogates for clinical measures, biomarkers need to be validated as reflecting clinical and/or pathological disease processes, taking into account the phase of disease in which they have a high degree of specificity and sensitivity [52, 53]. Standardising procedures will reduce measurement errors in clinical trials; procedures should apply similarly to everyone, whatever their race, language or culture. Ideally, biomarkers and clinical markers should be strongly associated yet independent of each other, so that they can be used both as recruitment criteria and as outcome measures while avoiding circularity. However, validating the relationship between biomarker change and cognitive outcome is an imperfect science, and considerable challenges remain in establishing the relationship between biological and cognitive measures throughout the chronology of the preclinical phase of AD. A measurable biomarker needs to be operable clinically, to have significant clinical implications if results are positive, and to have clinical utility in terms of improving confidence in diagnosis, prognostication or the guidance of treatment options. Unlike cognitive assessments, biomarkers offer more objective results and are considered complementary to memory testing. They are highly valued for their ability to detect underlying structures or neuropathology in vivo. However, the evaluation of biomarkers is an expensive endeavour, and cannot be carried out without collaboration between pharmaceutical companies and public institutions.

The reproducibility of biomarker results can be affected by many factors. For example, biomarkers and cognitive tests can disagree because a biomarker plateaus before cognitive change occurs. Individual biomarkers of amyloid PET, MRI, FDG PET, and CSF in the ADNI cohort vary in their rate of change during disease progression, such that they fit sigmoidal models better than linear models [54]. An ideal biomarker should have a sensitivity, a specificity, and positive and negative predictive values above 80% for whatever it is supposed to be testing for [55, 56]. Biomarkers are expensive, and their risks, benefits and costs have to be discussed with the patient.

#### 5.2. Operationalisation challenges

The challenges in operationalising biomarkers for clinical practice are: standardising techniques; harmonising practices between settings; and developing the infrastructure for community access to them. In applying biomarkers in the clinical setting, we need to consider noise and variability, and whether these present a critical issue in cross-sectional or longitudinal evaluation. Different biomarkers provide different levels of certainty, and are sensitive and specific at different disease stages and in different disease subtypes. Cross-sectional, single time-point measures predict progression and outcomes less well than the multiple measurements of longitudinal data, which in turn limits on-going participation. For most biomarkers, biomarker progression is more strongly associated with cognitive decline than baseline values are [57]. This suggests that clinical trials which require recruiting at-risk subjects could be improved by using biomarker progression rather than baseline values to enrich the study sample. Further studies are warranted to estimate the incremental effectiveness of improving clinical trial statistical power by using biomarker progression criteria.
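
The 80% benchmark for sensitivity, specificity and predictive values mentioned in this section is straightforward to compute from a 2×2 validation table; the sketch below, with invented counts, shows the arithmetic.

```python
def biomarker_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2x2 validation table.

    tp/fn: diseased subjects with positive/negative biomarker results;
    fp/tn: disease-free subjects with positive/negative results.
    """
    metrics = {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
    # The benchmark from the text: all four values above 80%.
    metrics["meets_80_percent_benchmark"] = all(v > 0.80 for v in metrics.values())
    return metrics

# Invented validation counts for illustration.
result = biomarker_metrics(tp=85, fp=10, fn=15, tn=90)
print(result["sensitivity"], result["specificity"])  # 0.85 0.9
print(result["meets_80_percent_benchmark"])          # True
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on disease prevalence in the validation sample, so the same assay can meet the benchmark in a memory clinic yet fail it in a community screening setting.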
