**1. Introduction**

N-of-1 trials, or single-patient trials, focus on one patient, and their main goal is to evaluate whether a treatment is effective for that individual. One motivation for such trials is that each patient serves as his or her own control; another is that patients differ from one another and there is no average patient. This is in contrast to conventional clinical trials, where the aim is to optimize treatment for the average patient. Because the aims differ, the designs used for conventional clinical trials are not appropriate for N-of-1 trials. These trials may appear new, but they are not; they were probably given short shrift and not well publicized. In the last decade or so, there has been increasing interest in N-of-1 trials. Duan et al. [1] raised awareness among clinicians and epidemiologists that N-of-1 trials are potentially useful for informing personalized treatment decisions for patients with chronic conditions. A monograph on this topic in healthcare is [2], which discusses applications to the behavioral sciences and many medical settings, including the economics, ethics, and statistical analysis of running such trials, and how to report results to professional audiences. Scuffham et al. [3] showed how N-of-1 trials can improve patient management and save costs, and Kravitz and Duan [4] provided a user's handbook on implementing such trials. A systematic review of the use of N-of-1 trials in the medical literature is given in [5]. There are many ways to analyze and compare results from N-of-1 trials; see, for example, [6].

Interestingly, and perhaps not unexpectedly, results from N-of-1 trials can be combined to generate group mean effects; [7, 8] demonstrated how this can be done using systematic reviews and meta-analyses of the effects of amphetamine and methylphenidate for attention-deficit hyperactivity disorder. Li et al. [9] provided a systematic review of quality N-of-1 trials published between 1985 and 2013 in the medical literature, based on the CONSORT extension for N-of-1 Trials (CENT), in which they examined factors that influence reporting quality in these trials. In palliative care, Senior et al. [10] designed an N-of-1 trial of a psychostimulant, methylphenidate hydrochloride (MPH) (5 mg bd), compared to placebo as a treatment for fatigue, with a population estimate of the benefit obtained by aggregating multiple single-patient trials (SPTs). Forty patients who had advanced cancer were enrolled through specialist palliative care services in Australia.

Multi-crossover single-patient trials are often employed when the focus is to make the best possible treatment decision for an individual patient [2, 11, 12]. From a clinician's perspective, having clear evidence of the value of one treatment over another (or no treatment) is more useful than knowing the average response. The average response gives the clinician the probability that a treatment will be effective, whereas N-of-1 trials give more certainty about whether the treatment for a particular patient will work or not.

In what follows, we assume that the crossover study has a predetermined number of periods, *p*, and that only one of the treatments is administered in each period. The same treatment may be used in more than one period. We first discuss the case of two treatments and two periods for N-of-1 trials, and then extend the discussion to aggregated N-of-1 trials that evaluate the effects of treatments for the average patient. Treatment groups are generically denoted by *A*, *B*, *C*, and so on.

Many researchers have studied the optimality of crossover designs [13–18]. Optimal designs have been constructed under a variety of statistical models to provide the most accurate inference of the treatment effects. The two-treatment design *AB*, *AA* and their duals *BA*, *BB* are known to be universally optimal for two-period experiments, where the dual of a sequence is the sequence that switches *A* and *B* with the same effect. Similarly, the two-sequence design *ABB* and its dual *BAA*, and the four-sequence design *ABBA*, *AABB* and their duals *BAAB*, *BBAA*, are known to be optimal for three- and four-period experiments, respectively [19, 20].

Applying these two-treatment optimal design results from the literature directly, with *A* replaced by *AB* and *B* by *BA*, would suggest that optimal N-of-1 trials can use the four-sequence design with *ABBA*, *ABAB* and their duals for two within-patient comparisons. Similarly, the two-sequence design with *ABBABA* and its dual may be optimal for three within-patient comparisons, and the four-sequence design with *ABBABAAB*, *ABABBABA* and their duals may be optimal for four within-patient comparisons.
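The substitution described above is mechanical, so it can be checked with a short script. The following is a minimal sketch, not part of the cited work; the helper names `dual` and `expand` are hypothetical and chosen only for illustration.

```python
def dual(seq: str) -> str:
    """Return the dual sequence, obtained by switching A and B."""
    return seq.translate(str.maketrans("AB", "BA"))


def expand(seq: str) -> str:
    """Replace each A by the pair AB and each B by the pair BA."""
    pair = {"A": "AB", "B": "BA"}
    return "".join(pair[treatment] for treatment in seq)


# Base sequences of the two-, three-, and four-period designs quoted in the text.
base_designs = {
    2: ["AB", "AA"],
    3: ["ABB"],
    4: ["ABBA", "AABB"],
}

for n_comparisons, sequences in base_designs.items():
    expanded = [expand(s) for s in sequences]
    duals = [dual(s) for s in expanded]
    print(n_comparisons, expanded, duals)
```

Under these assumptions, the expanded sequences reproduce the ones listed in the text (*ABBA*, *ABAB*; *ABBABA*; *ABBABAAB*, *ABABBABA*), and `dual` yields the remaining sequences of each design.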

However, design issues are not always as straightforward to address. For example, Carriere and Li [21] showed that constructing N-of-1 trials for individualized care from sequences in these repeated measurement designs is not always optimal for estimating individual-based treatment effects. Likewise, Guyatt et al. [22] showed that aggregating a series of N-of-1 trials that are optimal for individual patients can also provide an optimal estimate of the treatment effects for the average patient. For example, in a multi-clinic setting with six-period N-of-1 studies built from three *AB* pairs, all eight possible sequences (2<sup>6/2</sup> = 8) have been used, i.e., *ABABAB*, *ABABBA*, *ABBAAB*, *ABBABA* and their duals, to estimate both individual-based and average treatment effects. However, we show that these do not lead to optimal aggregated N-of-1 trials for estimating the treatment effects for the average patient.
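The count of eight sequences follows from choosing the order *AB* or *BA* independently within each of the three pairs. A minimal sketch of that enumeration is given below; the function name `pair_sequences` is hypothetical and used only for illustration.

```python
from itertools import product


def pair_sequences(n_pairs: int) -> list[str]:
    """List every sequence built from n_pairs blocks, each block being AB or BA."""
    return ["".join(blocks) for blocks in product(("AB", "BA"), repeat=n_pairs)]


sequences = pair_sequences(3)  # three AB pairs, six periods
starting_with_a = [s for s in sequences if s.startswith("A")]

print(len(sequences))    # number of possible six-period sequences
print(starting_with_a)   # the four sequences named in the text
```

The sequences that start with *B* are the duals of those that start with *A*, which accounts for the remaining four.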
