**3.2 Centralized access**

In Yu et al. (2007), the spectrum broker controls the access of secondary users using a threshold rule computed from an MDP formulation whose objective is to minimize the blocking probability of secondary users. To cope with the non-stationarity of traffic conditions, the authors propose a finite-horizon MDP instead of an infinite-horizon one. The drawback is that the policy cannot be computed off-line, which imposes a high computational overhead on the system.

Tang et al. (2009) study several admission control schemes at a centralized spectrum manager. The objective is to meet the traffic demands of secondary users and increase spectrum utilization efficiency while guaranteeing a grade of service, in terms of blocking probability, to primary users. Among the schemes analyzed, the best-performing one is based on a constrained Markov decision process (CMDP).

There are several approaches to this type of problem. One of them is to formulate an MDP in which the expected cost is a linear combination (more precisely, a convex combination) of the blocking probabilities of each class of users. By adjusting the weighting factors we can compute a Pareto front for both blocking probabilities. A Pareto front is the set of values of several coupled objective functions such that, at every point of the set, no objective can be improved without worsening at least one of the others. In this type of access, the Pareto front allows us to fix a blocking probability value for the licensed users and know the best possible performance for unlicensed users.

Incoming traffic is characterized by a classic Poisson model. Licensed users arrive at a rate of *λ<sub>L</sub>* arrivals per unit of time. The arrival rate for unlicensed users is denoted by *λ<sub>U</sub>*. The licensed spectrum managed by the central controller is assumed to be divided into channels (or bands) of equal bandwidth. Each user occupies a single channel. The average holding times for licensed and unlicensed users are given by 1/*μ<sub>L</sub>* and 1/*μ<sub>U</sub>* respectively, where *μ<sub>L</sub>* and *μ<sub>U</sub>* denote the departure rate for each class. Because a Poisson traffic model is considered, both the inter-arrival times and the channel holding times are exponentially distributed random variables for both user classes. The model can easily be extended to include more user classes, the possibility that a user occupies two or more channels, and so on. The procedure is essentially the same, but the Markov chain comprises more states as more features are added to the model. In this model, the state of the Markov chain is determined by the number of channels *k* occupied by licensed users (LU) and the number of channels *s* occupied by secondary users (SU). Because spectrum is a limited resource, there is a finite number *N* of channels. Figure 1 depicts a diagram of the model and its parameters. Note that we can map all the possible combinations of (*k*, *s*) with 0 ≤ *k* ≤ *N*, 0 ≤ *s* ≤ *N* and *k* + *s* ≤ *N* to a single integer *i* such that

$$0 \le i \le \frac{N(N+1)}{2} + N + 1. \tag{4}$$

The number on the right-hand side of (4) is the total number of states. Let *N<sub>T</sub>* denote this number.

Fig. 1. Diagram of the priority-based access model. The system has *N* channels that can be occupied by *k* licensed users (LU) and *s* secondary users (SU) such that *k* + *s* ≤ *N*. The total departure rates for each type of user depend on *k* and *s*.

The model described above is a continuous-time Markov chain. In the framework of MDPs we have to define the actions and the costs of these actions. Let *g*(*i*, *u*) denote the

Centralized access has received less attention than decentralized access in cognitive radio research in general and in the application of MDPs in particular. On the one hand, decentralized access constitutes a harder research challenge because each agent has only partial and sometimes unreliable information about the wireless network and the spectrum bands. This leads to the harder POMDP problems. On the other hand, although centralized access relies on a spectrum broker which generally has full information about the system state, the dimension of the problem grows with the total number of managed channels. Therefore, although the MDP or CMDP problem may be solvable, its dimension imposes a serious computational overhead. This drawback may be overcome by computing the policies off-line. However, when traffic conditions are non-stationary this approach is not applicable, and approximate solutions based on reinforcement learning strategies should be explored. In this work we focus on the application of MDPs to centralized access and on how they can be exploited to balance the grade of service (GoS) of each class of users.
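The state space of the model described in this section, the pairs (*k*, *s*) with *k* + *s* ≤ *N*, can be enumerated and mapped to a single integer with a few lines of code. The following Python sketch is illustrative only. The function names and the ordering of the states (by *k* first, then by *s*) are assumptions, not the chapter's own construction, and indices here run from 0 to *N<sub>T</sub>* − 1, which is one possible convention. The departure rates *k* *μ<sub>L</sub>* and *s* *μ<sub>U</sub>* are also an assumption: they follow from exponential holding times with each user occupying a single channel.

```python
def total_states(n_channels):
    """Number of feasible states: N(N+1)/2 + N + 1, the right-hand side of (4)."""
    return n_channels * (n_channels + 1) // 2 + n_channels + 1


def enumerate_states(n_channels):
    """List every (k, s) with k + s <= N, ordered by k and then by s (assumed order)."""
    return [
        (k, s)
        for k in range(n_channels + 1)
        for s in range(n_channels - k + 1)
    ]


def state_to_index(k, s, n_channels):
    """Closed-form index of (k, s) under the ordering used by enumerate_states.

    States with fewer than k licensed users fill the first
    k*(N+1) - k*(k-1)/2 positions; s is the offset inside the block for k.
    """
    if k < 0 or s < 0 or k + s > n_channels:
        raise ValueError("infeasible state: need k >= 0, s >= 0, k + s <= N")
    return k * (n_channels + 1) - k * (k - 1) // 2 + s


def departure_rates(k, s, mu_l, mu_u):
    """Total departure rates (licensed, unlicensed) in state (k, s).

    Assumes independent exponential holding times, one channel per user.
    """
    return k * mu_l, s * mu_u


if __name__ == "__main__":
    n = 3  # hypothetical number of channels
    for i, (k, s) in enumerate(enumerate_states(n)):
        print(i, (k, s), departure_rates(k, s, mu_l=0.5, mu_u=1.0))
    print("number of states:", total_states(n))
```

A dictionary built from `enumerate_states` would serve equally well as the mapping. The closed form is shown only because it avoids storing the table when *N* is large.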
