3. Example of COI research using NHI data

In this section, we present an example of cost-of-illness research using NHI data. The example, titled "Socioeconomic Cost of Allergies," estimates the socioeconomic costs associated with allergic diseases in South Korea using NHI data [14]. In South Korea, all citizens are compulsory subscribers to the NHI scheme, which is a type of social insurance, and all medical institutions and health professionals are required to submit claims data to the NHI in order to bill for the medical services they provide. As a result, the NHI holds medical information on around 50 million South Koreans. We hope this example will be useful to readers who wish to conduct cost-of-illness studies.

#### 3.1 Study design and cost components

The present example adopts the prevalence-based approach, because it is important to take into account both new and existing patients suffering from allergic diseases during a given period. In addition, this example employs the human capital approach as the valuation method, because it clearly quantifies losses of productivity due to illness based on patients' income levels. Moreover, it is better suited to ensuring the objectivity of the results, as it reduces the scope for researcher bias. As for perspective, this study adopts the societal perspective and estimates both the direct costs paid by the insurer and patients and the society-wide losses of productivity.

Methodology of Estimating Socioeconomic Burden of Disease Using National Health Insurance… DOI: http://dx.doi.org/10.5772/intechopen.89895

As the purpose of this study is to estimate the full scope of the socioeconomic costs generated by allergic diseases in South Korea, it estimates both direct and indirect costs. The direct costs are divided into healthcare and non-healthcare costs, as in previous studies that adopted the societal perspective. The indirect costs consist of losses of productivity. More specifically, the direct healthcare costs include the costs incurred by outpatients and hospitalized patients, encompassing covered costs paid by the insurer, copayments made by patients, non-covered costs, and prescription costs. The direct non-healthcare costs comprise the expenses associated with visiting medical institutions, whether as outpatients or hospitalized patients, and receiving services for the treatment and management of allergic diseases, including the costs of transportation and caregiving. The indirect costs, or losses of productivity, are estimated by treating the number of hospitalization days as working days lost and the time spent on outpatient visits as working time lost. The losses of future income due to premature death are estimated for patients aged 15–69 (patients outside this age bracket are excluded because, in accordance with the law, they belong to the non-working-age population). Due to the absence of objective data, however, intangible costs are not estimated in this example. Table 5 summarizes the cost components estimated in this study.


Table 5. Components of estimated costs in this example.

#### 3.2 Data source and case definition

To analyze the socioeconomic costs of allergic diseases in South Korea, this study used the 2014 National Patient Sample (NPS), which is derived from NHI data collected by the HIRA. In South Korea, almost all citizens (98% or more) are compulsory subscribers to the NHI scheme, which is a type of social insurance, and all medical institutions are required to submit claims data to the HIRA in order to bill for the medical services they provide to patients. Consequently, the HIRA holds medical information on around 50 million South Koreans. The NPS is a sample of patients drawn from the claims data held by the HIRA; it is an abridged version of the claims data that contains one year of information on the medical treatments and prescriptions of the sampled patients. It covers about 1.4 million patients, a 3% sample of all patients.

NHI data are administrative data, and prevalence estimates derived from them depend on the case definition of the disease. In other words, prevalence rates can vary dramatically depending on how cases are defined. Most previous studies that used administrative data, including NHI data, relied on primary diagnoses to estimate prevalence rates. This approach, however, carries the risk of either underestimating or overestimating prevalence. This study therefore applied more rigorous criteria in defining cases. First, it identified and extracted patients whose primary and secondary diagnoses were recorded using the ICD-10 codes for allergic diseases. Of these patients, it retained those who had been hospitalized or had made at least two outpatient visits for allergic diseases and who had been prescribed drugs commonly used to treat allergic diseases (as indicated in their insurance billing records), such as nedocromil sodium, oral steroids, and Ventolin. Only patients meeting these criteria were included in the study as cases.
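The case definition can be sketched as a filter over claims records. This is a minimal illustration, not the study's actual code: the record layout (`patient_id`, `dx_codes`, `setting`, `drugs`), the ICD-10 prefixes, and the drug names are hypothetical stand-ins. It also assumes one reading of the criteria, namely that a claim counts if any of its diagnosis codes is an allergy code, and that a case is a patient who was (hospitalized or had at least two outpatient visits) and was prescribed a listed drug.

```python
from collections import defaultdict

# Hypothetical ICD-10 prefixes and drug names, for illustration only.
ALLERGY_ICD10_PREFIXES = ("J45", "J30", "L20")
ALLERGY_DRUGS = {"nedocromil sodium", "prednisolone", "salbutamol"}


def is_allergy_claim(claim):
    """True if any diagnosis code on the claim starts with an allergy prefix."""
    return any(code.startswith(ALLERGY_ICD10_PREFIXES) for code in claim["dx_codes"])


def select_cases(claims):
    """Return the set of patient IDs that meet the case definition."""
    inpatient = set()
    outpatient_visits = defaultdict(int)
    prescribed = set()
    for claim in claims:
        if not is_allergy_claim(claim):
            continue
        pid = claim["patient_id"]
        if claim["setting"] == "inpatient":
            inpatient.add(pid)
        else:
            outpatient_visits[pid] += 1
        if ALLERGY_DRUGS & set(claim["drugs"]):
            prescribed.add(pid)
    frequent = {pid for pid, n in outpatient_visits.items() if n >= 2}
    # (hospitalized OR at least two outpatient visits) AND prescribed a listed drug
    return (inpatient | frequent) & prescribed


claims = [
    {"patient_id": 1, "dx_codes": ["J45.0"], "setting": "outpatient", "drugs": ["salbutamol"]},
    {"patient_id": 1, "dx_codes": ["I10", "J45.0"], "setting": "outpatient", "drugs": []},
    {"patient_id": 2, "dx_codes": ["J30.1"], "setting": "outpatient", "drugs": ["prednisolone"]},
    {"patient_id": 3, "dx_codes": ["L20.9"], "setting": "inpatient", "drugs": []},
]
cases = select_cases(claims)
```

In this toy input only patient 1 qualifies: patient 2 has a single outpatient visit, and patient 3 was hospitalized but has no matching prescription.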

#### 3.3 Estimation methods

The sources of data for each component of the costs estimated in this study, including direct and indirect costs, are as follows.

#### 3.3.1 Direct costs

Direct costs refer to the amounts of spending directly related to illness and are divided into healthcare and non-healthcare costs.

Direct healthcare costs are the costs of preventing, treating, or managing illnesses through the use of medical institutions and include the costs of outpatient services, hospitalization, and medications (prescriptions). Most existing studies rely on administrative and official statistics to estimate direct healthcare costs. As there is little controversy over this practice, this study also uses the NHI data to estimate the direct healthcare costs of allergic diseases. Depending on who pays them, direct healthcare costs can be further broken down into covered costs paid by the insurer, copayments made by patients, and non-covered costs, which are also paid by patients. The formula used to estimate the direct healthcare costs is provided below.

$$\text{DHC} = \sum_{s} \sum_{y} \left[ E_{isy}(1 + \alpha) + E_{osy}(1 + \beta) \right] \tag{1}$$

where DHC = Direct healthcare costs, s = Sex, y = Age, E = Costs, i = Hospitalized patients, o = Outpatients, α = Non-covered cost ratio (hospitalized patients), β = Non-covered cost ratio (outpatients).
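As a rough illustration of Eq. (1), the sketch below sums inpatient and outpatient costs over sex-age cells and inflates each by its non-covered cost ratio. The cell layout, the field names, and all numbers (including the values of α and β) are made up for the example and are not taken from the study.

```python
def direct_healthcare_cost(cells, alpha, beta):
    """Sum E_i * (1 + alpha) + E_o * (1 + beta) over all sex-age cells."""
    return sum(
        c["inpatient_cost"] * (1 + alpha) + c["outpatient_cost"] * (1 + beta)
        for c in cells
    )


# Hypothetical sex-age cells with claimed costs (arbitrary currency units).
cells = [
    {"sex": "F", "age": "30-39", "inpatient_cost": 1000.0, "outpatient_cost": 2000.0},
    {"sex": "M", "age": "30-39", "inpatient_cost": 500.0, "outpatient_cost": 1500.0},
]

# alpha and beta are illustrative non-covered cost ratios, not study values.
dhc = direct_healthcare_cost(cells, alpha=0.20, beta=0.10)
```

Keeping α and β as separate arguments makes it easy to rerun the estimate with different non-covered cost ratios for inpatients and outpatients.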

The direct non-healthcare costs are the costs of transportation and caregiving incurred by patients in seeking and receiving the services of medical institutions. This study draws upon national official statistics on transportation costs. These data include information on the one-way transportation costs paid by outpatients and hospitalized patients; the final transportation costs were estimated on the assumption that some patients would be accompanied by caregivers. The one-way costs were adjusted using a price index and then used to estimate the total costs of round-trip transportation. As for the cost of caregiving, this study used the average daily cost of hiring a caregiver, as suggested by the caregivers association. Defining the cost of caregiving as the opportunity cost of caregivers' time during patients' hospitalization, this study applied the average daily wage for caregivers as the unit cost of caregiving. This unit cost was then multiplied by the number of hospitalization days. The cost of caregiving was also estimated for outpatient visits based on the assumption that each outpatient visit takes up one-third of the caregiver's daily working hours. The formula used to estimate the direct non-healthcare costs is provided below.

$$\text{NHC} = \sum_{s} \sum_{y} \left[ \left( N_{isy} + N_{osy} \right) \times C_{t} \times 2 \right] + \sum_{s} \sum_{y} \left[ \left( L_{isy} \times C_{c} \right) + \frac{1}{3} \left( N_{osy} \times C_{c} \right) \right] \tag{2}$$

where NHC = Direct non-healthcare costs, s = Sex, y = Age, N = Number of visits, i = Inpatients, o = Outpatients, Ct = Cost of transportation, L = Length of stay, Cc = Cost of caregiving.
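The calculation described above can be sketched as follows. Following the text, round-trip transportation is charged per visit, hospitalization days are multiplied by the daily caregiving cost, and each outpatient visit is charged one-third of that daily cost. The field names and unit costs are hypothetical, and the sketch does not model the additional transportation cost of accompanying caregivers.

```python
def direct_non_healthcare_cost(cells, transport_one_way, caregiver_daily):
    """Transportation (round trip per visit) plus caregiving over all sex-age cells."""
    total = 0.0
    for c in cells:
        visits = c["inpatient_visits"] + c["outpatient_visits"]
        transport = visits * transport_one_way * 2  # one-way cost doubled for the round trip
        caregiving = (
            c["inpatient_days"] * caregiver_daily         # full day per hospitalization day
            + c["outpatient_visits"] * caregiver_daily / 3  # one-third of a day per visit
        )
        total += transport + caregiving
    return total


# Hypothetical cell: 2 admissions totalling 10 days, plus 6 outpatient visits.
cells = [
    {"sex": "F", "age": "30-39", "inpatient_visits": 2,
     "outpatient_visits": 6, "inpatient_days": 10},
]
nhc = direct_non_healthcare_cost(cells, transport_one_way=5.0, caregiver_daily=90.0)
```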

#### 3.3.2 Indirect costs

Indirect costs do not represent actual financial outlays but rather the losses of labor and productivity due to illness. They comprise the working time lost in visiting and using the services of medical institutions, the loss of future income due to the premature death of patients, and the opportunity cost of caregiving. The opportunity costs so incurred include not only working time lost but also leisure time lost. To estimate the indirect costs, this study draws upon the official employment and labor statistics provided by the government, which report average daily and monthly wages, total working hours, and employment rates by sex and age.

The losses of labor (productivity) due to the need for treatment and recovery and losses of future income due to premature death were estimated in the following manner. First, loss of labor can be understood as the opportunity cost of labor incurred by spending time hospitalized or making outpatient visits to medical institutions instead of working. These opportunity costs were thus estimated for the working-age population (ages 15–69). In the case of hospitalized patients, loss of productivity was found by multiplying the daily average wage for each age group by the number of hospitalization days. For outpatients, the daily average wage for each age group was multiplied by the number of outpatient visits made, and the result was divided by 3 (based on the assumption that outpatient visits took up one-third of each patient's daily working hours). The formula used to estimate the losses of productivity is provided below.

$$\text{PL} = \sum_{s} \sum_{y} \left[ \left( N_{isy} + \frac{1}{3} N_{osy} \right) \times W_{sy} \times E_{sy} \right] \tag{3}$$

where PL = Productivity loss, s = Sex, y = Age, N = Number of hospitalization days (inpatients) or visits (outpatients), i = Inpatients, o = Outpatients, W = Average daily wage, E = Employment rate.
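A minimal sketch of the productivity-loss calculation is shown below, assuming one record per sex-age cell. The field names and figures are hypothetical. Cells outside ages 15–69 are skipped, mirroring the working-age restriction described in the text.

```python
WORKING_AGE = range(15, 70)  # ages 15-69, as in the text


def productivity_loss(cells):
    """Sum (inpatient days + outpatient visits / 3) * daily wage * employment rate."""
    total = 0.0
    for c in cells:
        if c["age"] not in WORKING_AGE:
            continue  # outside the working-age population
        lost_days = c["inpatient_days"] + c["outpatient_visits"] / 3
        total += lost_days * c["daily_wage"] * c["employment_rate"]
    return total


# Hypothetical cells; the second is excluded because age 75 is not working age.
cells = [
    {"sex": "F", "age": 40, "inpatient_days": 10, "outpatient_visits": 6,
     "daily_wage": 100.0, "employment_rate": 0.5},
    {"sex": "M", "age": 75, "inpatient_days": 20, "outpatient_visits": 3,
     "daily_wage": 100.0, "employment_rate": 0.2},
]
pl = productivity_loss(cells)
```

Weighting by the employment rate means that time lost by people who would not have been working is not counted as lost production.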

The loss of future income due to the premature death of patients represents the decrease in expected income that individuals could have earned had they lived to their full life expectancy. To estimate this loss, this study relied upon the raw data underlying the government's official statistics on causes of death to identify the number of deaths by sex and age and then applied the death rate to the average monthly wage and number of working days for each age group. The loss of future income was again estimated for the working-age population (ages 15–69) only, applying the employment rate of each age group. In order to convert the estimated loss of future income into a present value, a discount rate was applied, and the sensitivity of the results to the discount rate was checked by applying alternative discount rates. The formula used to estimate the loss of future income due to premature death is shown below.

$$\text{LFI} = \sum_{s} \sum_{y} \sum_{k=1}^{n} \left( N_{sy} \times \frac{Y_{sy}(t+k) \times P_{sy}(t+k)}{(1+r)^{k}} \right) \tag{4}$$

where LFI = Loss of future income, s = Sex, y = Age, k = 1,2, … , n (where "n" represents the difference between the life expectancy and actual average age at death for each age cohort), t = Age at death, r = Discount rate, Nsy = Number of premature deaths associated with allergic diseases by sex and age, Ysy(t + k) = Annual average income at t + k by sex and age, Psy(t + k) = Employment rate at t + k by sex and age.
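The discounting step can be illustrated for a single sex-age cohort as below. This is a sketch under stated assumptions: the income and employment functions, the cohort figures, and the three discount rates used for the sensitivity check are all hypothetical, since the text does not report the rates that were applied.

```python
def loss_of_future_income(deaths, age_at_death, years_lost, income_at, employment_at, r):
    """Present value of income lost by `deaths` people who died at `age_at_death`."""
    total = 0.0
    for k in range(1, years_lost + 1):
        age = age_at_death + k
        # Expected income in year k, discounted back k years at rate r.
        total += deaths * income_at(age) * employment_at(age) / (1 + r) ** k
    return total


def income_at(age):
    return 1000.0  # flat hypothetical annual income


def employment_at(age):
    return 0.5 if 15 <= age <= 69 else 0.0  # no income counted outside ages 15-69


# Sensitivity check over illustrative discount rates (assumed, not from the study).
lfi_by_rate = {
    r: loss_of_future_income(2, 60, 12, income_at, employment_at, r)
    for r in (0.0, 0.03, 0.05)
}
```

A higher discount rate gives a smaller present value, which is why the estimate should be reported alongside the rate used.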
