**2. Related works**

Several techniques have been developed for identifying social land-use functions. Traditionally, land use was inferred from human trajectory patterns as reflected in individual travel surveys recorded by respondents [19–21]. However, self-reported diaries suffer from major disadvantages, including a relatively small number of respondents, difficulty in obtaining a representative sample of the city population, and an experimental period that is usually limited to a few days because of high costs. Moreover, because the diaries are self-reported, they are not considered fully reliable.

Sensing technologies, ubiquitous connectivity, and computing power have brought a variety of opportunities for smart cities, and specifically to land-use mapping [22]. Data sources, such as remote-sensing imagery, social media data, taxi trajectories, and mobile phone signals, have been utilized for cheaper and enhanced social land-use mapping research.

Some works have used spectral and textural characteristics. For example, Lu and Weng [23] integrated population density data and remote-sensing systems measuring land surface temperature and spectral reflectance to classify urban lands. Image processing and classification techniques for remote-sensing images were used in numerous studies to capture physical aspects, such as land surface reflectivity and texture of urban space [24–26], or to accomplish urban land-use mapping [9]. However, inferring land use from remote-sensing images tells only part of the story because such images cannot capture functional interactions between city segments or social behavior [27–29].

Social media can be seen as complementary to remote-sensing image methodology, as it is valuable for identifying movement patterns and social dynamics [27, 29, 30]. A varied collection of social media data, such as social media check-ins, GPS trajectories, and points of interest (POI), has been used for monitoring urban residents' land-use dynamics [31]. Liu et al. [32] offered an unsupervised method that extracts patterns of temporal activity variations and spatial interactions between places based on taxi trajectories and discovers the common characteristics of lands of similar social function. Long and Thill [33] combined one week of bus smart card data with a household travel survey to analyze jobs–housing relationships in Beijing. Commuting trips from three typical residential communities to six main business zones were mapped and compared to analyze commuting patterns in Beijing, and then validated against those extracted from the survey. Zhou et al. [34] also used smart card data. They investigated how a rider allocates time in the vicinity of metro stations spatially and temporally to classify space–time activity patterns that may explain inter-personal and intra-personal behavioral variability. Shen and Karimi [35] used check-in-based data and analyzed the interaction between places in the city to infer their urban structure and related socioeconomic patterns. POIs associated with coordinates and a label such as "restaurant," "shopping center," and "theater" have been extensively leveraged for land-use identification [36]. Their biggest virtue is that they carry semantic information. Some methodologies leverage POI datasets to discover regions of similar social function by grouping lands with similar distributions and patterns of POI types [27, 37]. However, the main drawback of social media data is its sparsity in space and time [29].

Social information hidden in GPS records allowed Khoroshevsky and Lerner [38] to discover mobility patterns and predict users' geographic and semantic locations alike, without violating privacy, by using only the user's own data and no semantic data voluntarily shared by the user or by others. By properly selecting an evaluation metric for trajectory clustering and accounting for cluster density, they traded off prediction accuracy against information: more clusters that are smaller and denser show more meaningful locations but are less predictable, and vice versa. Using semantic mobility patterns determined from POIs in people's daily trajectories, Ben Zion and Lerner [39] could identify and predict a person's lifestyle both for a novel trajectory and for a novel user.

As all data sources are limited and capture specific aspects of urban dynamics, a recent movement in the research of land-use identification is to rely on several data sources of different types. Both the works of Liu et al. [31] and Hu et al. [8] combined remote-sensing images and social media data. The work of Yuan et al. [3] integrated POI datasets and datasets of 3 months of GPS trajectories generated by 12,000 taxicabs in Beijing to identify lands of different social functions using an unsupervised clustering algorithm. The work of Tu et al. [29] integrated a mobile phone signals dataset with social media data to infer the social function of land use. They estimated individuals' "home" and "work," and then aggregated the individuals together with social knowledge learned from social media check-in data into a collective social land-use map.

### *Mapping of Social Functions in a Smart City When Considering Sparse Knowledge DOI: http://dx.doi.org/10.5772/intechopen.104901*

Numerous works leverage call detail records (CDR) for capturing spatiotemporal movement patterns and city dynamics [17, 30, 40]. CDRs hold mobile phone signal data collected and stored by telecom operators, mainly for billing purposes [41]. They contain communication properties, such as start time, call duration, and type of communication (call, SMS, internet), as well as the cell tower from which the communication originated. CDRs also include the location at which the communication occurred, calculated by triangulating the signal strengths from surrounding cell towers [4, 41, 42]. Their greatest virtue as a location tool for evaluating human behavior is that they are routinely produced by telecom equipment whenever users make a phone call, send or receive a message, or browse web pages; hence, they are a low-cost and efficient source of location estimates [43]. The respondents in an experiment are unaware of the data collection and are thus not interrupted by it; still, their personal information is not exposed, as the actual user identification is ciphered. CDRs contain an enormous amount of data and cover most populated areas of the world, depicting a variety of users. However, CDRs have two prominent limitations as a source for tracking human activity. First, they are sparse in time because they are generated only when a user engages in cellular communication. Second, they are coarse in space because they record location only at the granularity of a cell tower [30, 44]; CDR-rendered coordinates have an inaccuracy that varies from 50 to 350 m, depending on the density and arrangement of the towers. Another shortcoming is their lack of semantic information [30, 45].
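To make the record structure described above concrete, the following is a minimal sketch of how a CDR might be represented and aggregated into hourly activity counts per cell tower. The class `CdrRecord`, its field names, and the helper `hourly_activity` are hypothetical and chosen for illustration only; actual operator schemas differ and are not specified in the works cited here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CdrRecord:
    """Hypothetical CDR layout; field names are illustrative assumptions."""
    user_hash: str    # ciphered user identifier, never the real identity
    start: datetime   # start time of the communication
    duration_s: int   # call duration in seconds (0 for an SMS)
    kind: str         # "call", "sms", or "internet"
    tower_id: str     # cell tower from which the communication originated


def hourly_activity(records):
    """Count communication events per (tower_id, hour of day)."""
    counts = Counter()
    for record in records:
        counts[(record.tower_id, record.start.hour)] += 1
    return counts


# Toy data: records exist only when a user communicates (temporal sparsity),
# and each one is located only by its tower (spatial coarseness).
records = [
    CdrRecord("u1", datetime(2020, 1, 6, 8, 15), 120, "call", "T1"),
    CdrRecord("u1", datetime(2020, 1, 6, 8, 40), 0, "sms", "T1"),
    CdrRecord("u2", datetime(2020, 1, 6, 19, 5), 60, "call", "T2"),
]
activity = hourly_activity(records)
```

Hours in which no communication took place simply have no entry, which is the temporal sparsity noted above; any land-use feature built on such counts has to tolerate these gaps.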

Although incorporating several data resources is beneficial for achieving a high accuracy rate [9], in this work, we focused on achieving solid land-use identification with a simple and efficient methodology that requires only one data resource and little prior knowledge that can be obtained from domain experts. We wish to extract the most out of the information embodied in CDRs, which can also be integrated with additional resources in future works. Several other works have already used CDRs as their main data resource for land-use identification. Toole et al. [40] utilized them for a supervised land-use classification method with a dataset consisting of CDRs for a period of three weeks in the greater Boston area. They classified urban space into five categories (residential, commercial, industrial, parks, and other) and relied on ground-truth land use obtained from a zoning map. For the classification, they used Breiman's [46] random forest classification algorithm and post-processed the classification results with a neighbor smoothing algorithm. However, even with smoothing, the accuracy in classifying the five land-use classes was relatively low, at 54%. Pei et al. [18] also relied on CDRs and offered a semi-supervised algorithm for classifying the land of Singapore into the same five classes as Toole et al. [42]. They used the fuzzy c-means algorithm [47] and assumed possession of the "real" land-use labels of a small number of area segments: they chose 200 places according to a few criteria aimed at ensuring reliable labeling and labeled them based on Singapore locals and Google Earth. Their results also showed a relatively low detection rate of 58%.

Zinman and Lerner [17] divided space and time into spatiotemporal units and derived a varied collection of features to illuminate the social behavior of the units. Using 62 days of cellular data recorded in nine cities in the Tel Aviv district, they classified the units according to their land use with a leveled hierarchy of semantic categories that includes different levels of detail resolution, achieving accuracies ranging from 84% to 91%.
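The neighbor smoothing step mentioned above for Toole et al. [40] is not detailed here. As an illustration only, one simple form of such post-processing is a majority vote over each grid cell and its adjacent cells. The sketch below assumes a rectangular grid of predicted labels and a 3x3 window; the function name `smooth_labels`, the window size, and the tie-breaking rule are assumptions and not necessarily the procedure used in [40].

```python
from collections import Counter


def smooth_labels(grid):
    """Replace each cell's label with the majority label of its 3x3 window.

    Windows are clipped at the grid border. On a tie that involves the
    cell's own label, the original label is kept.
    """
    rows, cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(window)
            top = max(counts.values())
            if counts[grid[r][c]] == top:
                new_row.append(grid[r][c])  # own label is (co-)majority
            else:
                new_row.append(counts.most_common(1)[0][0])
        smoothed.append(new_row)
    return smoothed


# Toy prediction map: an isolated "com" cell surrounded by "res" cells.
predicted = [
    ["res", "res", "res"],
    ["res", "com", "res"],
    ["res", "res", "res"],
]
smoothed = smooth_labels(predicted)
```

In this toy map the isolated commercial cell is relabeled as residential, which shows both the benefit (removing isolated misclassifications) and the risk (erasing small but genuine land-use patches) of this kind of smoothing.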
