5. Analysis on the prediction results

Table 8 presents the annotation results by the RF fusion model with the enhancement algorithm, showing a large variation in the prediction accuracy across different activity types. Home, work/school and non-work obligatory activities are better predictable, with the accuracy of 91.3, 79.8 and 78.1%, respectively. Social visit activities show a middle level of predictability of 60.5%. By contrast, leisure activities are only 51.4% recognizable. Overall, prediction accuracy of 76.6% is achieved. Despite the promising results, misclassification exists for each of the activity types, prompting for further examination into the potential reasons for the errors.

(1) Home. Homes are featured with high visit frequencies and spatial-temporal regularities. However, seven homes are misidentified, of which five have lower visit frequencies than 10% on weekdays, i.e. less than 1 in 10 trips on weekdays ending at home. The unusually less visited homes could be due to the fact that the corresponding users spend less time at home and/or they make fewer calls than expected at home. This results in the home visit frequencies less represented by their call records. Alternatively, some of the misclassified locations can be a second home for users who already have a home at different locations. Two in these five users have two labelled homes. While their second homes are occasionally accessed, their main homes are routinely visited and correctly annotated. (2) Work/school. Like homes, work/school locations are also characterized by a high level of routine visits, but these two types differ regarding the time of the visits. While most of the trips to homes are at night and weekends, trips to offices or schools occur during the daytime on weekdays. Of all the work/school


Table 8. Prediction results (%).

The prediction rates of 73.7 and 75.2% were obtained for the transition probability-based and prior probability-based enhancement methods, respectively. Due to the small size of the training set, many locations are labelled as one single known activity of a day, the sequential information is thus not available on these days. With a large dataset, the transition matrix would better represent typical activity and travel behaviour of users. This would lead to the transition probability-based method and the enhancement algorithm as a whole bringing

Table 7. Prediction result comparison between before the enhancement algorithm and after that (%).

Enhancement Before After Difference Before After Difference

Home 91.3 91.3 0 91.3 91.3 0 Work/school 80.9 82.0 1.1 74.2 79.8 5.6 Non-work 37.5 59.3 21.8 53.1 78.1 25.0 Visit 47.4 55.3 7.9 52.6 60.5 7.9 Leisure 45.7 48.6 2.9 37.1 51.4 14.3 Overall accuracy 69.7 74.1 4.4 69.0 76.6 7.6

Table 8 presents the annotation results by the RF fusion model with the enhancement algorithm, showing a large variation in the prediction accuracy across different activity types. Home, work/school and non-work obligatory activities are better predictable, with the accuracy of 91.3, 79.8 and 78.1%, respectively. Social visit activities show a middle level of predictability of 60.5%. By contrast, leisure activities are only 51.4% recognizable. Overall, prediction accuracy of 76.6% is achieved. Despite the promising results, misclassification exists for each of the activity types, prompting for further examination into the potential reasons for the errors. (1) Home. Homes are featured with high visit frequencies and spatial-temporal regularities. However, seven homes are misidentified, of which five have lower visit frequencies than 10% on weekdays, i.e. less than 1 in 10 trips on weekdays ending at home. The unusually less visited homes could be due to the fact that the corresponding users spend less time at home and/or they make fewer calls than expected at home. This results in the home visit frequencies less represented by their call records. Alternatively, some of the misclassified locations can be a second home for users who already have a home at different locations. Two in these five users have two labelled homes. While their second homes are occasionally accessed, their main homes are routinely visited and correctly annotated. (2) Work/school. Like homes, work/school locations are also characterized by a high level of routine visits, but these two types differ regarding the time of the visits. While most of the trips to homes are at night and weekends, trips to offices or schools occur during the daytime on weekdays. Of all the work/school

greater improvement over the current experimental results.

Fusion model MNL RF

108 Smartphones from an Applied Research Perspective

5. Analysis on the prediction results

locations, 10.1% are wrongly predicted as non-work obligatory or social visit activities if they are accessed infrequently during weekdays. All the corresponding users work/study at multiple places, and the misidentified locations are their additional work/school places. Another 10.1% are mistaken as homes, if they have high visit frequencies at weekend. For instance, one of these users has two labelled work locations. They were visited at rates of 32% during weekdays and 42% on Sunday, respectively. While the first one was correctly identified, the second one was wrongly predicted as home. This suggests that the work regime plays an important role in distinguishing work locations from homes. While most people work during weekdays, certain minorities work on different shifts, especially to weekends or nights, generating distinct activity and travel patterns from the main stream of the population. (3) Nonwork obligatory. The activities have low visit frequencies and short duration. The misclassification of the activities can be partially attributed to a combination of heterogeneity within this category. The various detailed types of the activities are likely performed at spatially independent locations and temporally varied preferences. For instance, shopping is mostly done in later time of the day than service or bringing/picking up activities. (4) Social visit. The activities are profiled with a middle level of visit frequencies during weekdays. If the locations are accessed less, they tend to be annotated as leisure or non-work obligatory activities; if more, they are considered as home or work/school places. The limited predictability could be caused by the underlying structure of an individual's social network, in which various degrees of relationship exist, ranging from closed one they visit routinely to the one they just meet occasionally. This generates variations in spatial-temporal features of the locations. (5) Leisure. Leisure activities are conducted in various places and at different time for an individual; they exhibit the lowest level of regularities and thus are the most challengeable to annotate. Apart from the spatial-temporal irregularities, the examination into two falsely predicted leisure locations reveals additional causes for the misclassification. The first one has a visit frequency of 36.3% in both the afternoon and evening on weekdays. It was the second most visited place for the corresponding user who has accessed this place 170 times over 337 days, such that 1 in 2 days he/her was observed there. This location is originally labelled as a restaurant; however, the call records suggest a high probability that he/her may work there instead of eating as a customer. The second location was ranked as the most visited place for the concerned user. He/ she has in total conducted 383 visits over 442 days during both weekdays and weekends as well as at night. Nearly three in 4 days, he/she made calls there. Furthermore, the user has five locations collected in the training dataset, but none of which is labelled as home. This location is documented as sports; however, for this particular user, it is likely that this place is a home rather than a recreation site. While further investigation into the above two typical cases is needed before any definite conclusions are drawn, they nevertheless illustrate that our annotation method based on underlying activity and travel behaviour can effectively predict the activities, which are tailored to each individual. A location may have a single or multiple functions, but people visiting there could have different purposes. The match with geographic information alone is not able to identify this distinction. We shall call the location annotation at the individual level as micro-location-annotation.
