2. Data

The cell phone data consists of the full mobile communication patterns of 80 users over a period of one year, collected by a European phone company for billing and operational purposes. The data records the location and time of each call activity, that is, each time a user initiates or receives a voice call or message. Locations are represented by cell IDs, each of which covers an area ranging from a few hundred square metres in cities to a few thousand square metres in rural areas. The users, their phone numbers and the corresponding cell IDs are all anonymized. Table 1 illustrates typical call records for an individual identified as '10027534' on a single day.


Table 1. Call records of a user.ᵃ

ᵃ The columns denote, respectively, the user, the cell ID, the time and duration (in minutes) of the call, the call type ('voice call' or 'message') and the direction ('incoming', 'outgoing' or 'missed calls').
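
As a minimal sketch of how one record with the fields listed for Table 1 might be represented, the following Python dataclass can be used. The class name, field names and validation are illustrative assumptions and are not part of the original data set. Only the field list and the allowed values of the call type and direction come from the table note.

```python
from dataclasses import dataclass
from datetime import datetime

# Allowed values as listed in the note to Table 1.
CALL_TYPES = {"voice call", "message"}
DIRECTIONS = {"incoming", "outgoing", "missed calls"}


@dataclass(frozen=True)
class CallRecord:
    """One call activity; all names here are hypothetical."""

    user: str            # anonymized user identifier
    cell_id: str         # anonymized cell ID where the activity took place
    start: datetime      # time of the call activity
    duration_min: float  # duration of the call in minutes
    call_type: str       # one of CALL_TYPES
    direction: str       # one of DIRECTIONS

    def __post_init__(self):
        # Reject values outside the categories named in the table note.
        if self.call_type not in CALL_TYPES:
            raise ValueError(f"unknown call type: {self.call_type!r}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction!r}")
        if self.duration_min < 0:
            raise ValueError("duration must be non-negative")
```

Freezing the dataclass keeps records immutable once loaded, which seems a reasonable choice for billing-style logs that are only read during analysis.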

Across all users, 9132 distinct call locations were detected, of which 259 (2.8%) were labelled with the activities conducted there. These labelled locations serve as the ground-truth data for training and validating our models. Activities are divided into five types: 'work/school', 'home', 'social visit', 'leisure' and 'non-work obligatory', accounting for 30, 29, 15, 14 and 12% of the training data, respectively. 'Work/school' represents all work- or school-related activities outdoors, while 'home' covers all time spent at home. 'Social visit' refers to all visiting activities; 'leisure' includes recreational activities outside the home, e.g. sports and eating/drinking; and 'non-work obligatory' consists of activities such as bringing/getting people, shopping and personalized services. If an individual performs activities of multiple types at the same location, the most frequent activity is selected, so that each location is uniquely linked to one activity type for that individual.
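
The rule of assigning the most frequent activity to each location of an individual can be sketched as follows. This is an illustrative sketch and not the implementation used in the study: the function name and the input layout (one `(user, cell_id, activity)` tuple per reported activity) are assumptions. The text does not say how ties are resolved; here the activity encountered first among the tied ones is kept.

```python
from collections import Counter, defaultdict


def label_locations(observations):
    """Map each (user, cell_id) pair to its most frequent activity type.

    `observations` is an iterable of (user, cell_id, activity) tuples,
    a hypothetical input format chosen for this sketch.
    """
    counts = defaultdict(Counter)
    for user, cell_id, activity in observations:
        counts[(user, cell_id)][activity] += 1

    # most_common(1) returns the top activity; among equal counts it
    # keeps the one inserted first (an assumption about tie-breaking).
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}
```

For example, a location at which a user reports 'work/school' twice and 'leisure' once would be labelled 'work/school' for that user, while the same cell may carry a different label for another user.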
