**3. Web Map Server workload**

To deal with this complexity, several cache management algorithms have been created. However, the efficiency of these algorithms usually depends on the service's workload. For this reason, before diving into the details of the cache management policies, a workload characterization of WMS services needs to be presented. Two real-life examples are used for this characterization: trace files from two different tiled web map services, Cartociudad<sup>1</sup> and IDEE-Base<sup>2</sup>, provided by the National Geographic Institute (IGN)<sup>3</sup> of Spain.

Cartociudad is the official cartographic database of Spanish cities and villages, with their street and road networks topologically structured, while IDEE-Base allows viewing the Numeric Cartographic Base 1:25,000 and 1:200,000 of the IGN.

The available trace files were filtered to contain only valid web map requests according to the WMS-C recommendation. Traces from Cartociudad comprise a total of 2,369,555 requests received from 9th December 2009 to 13th May 2010. IDEE-Base logs reflect a total of 16,891,616 requests received between 15th March and 17th June 2010.

<sup>1</sup> http://www.cartociudad.es

<sup>2</sup> http://www.idee.es

<sup>3</sup> http://www.ign.es/ign/main/index.do?locale=en

**Figure 2.** Percentile of requests for the analyzed services.
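The filtering of trace files mentioned above can be illustrated with a minimal sketch. It assumes that a "valid" request is a GetMap request whose size and bounding box match exactly one tile of a fixed grid, which is the core idea of WMS-C. The grid origin, tile size, resolution list and the function name `is_grid_aligned` are hypothetical values chosen for illustration; they are not the configuration of Cartociudad or IDEE-Base.

```python
import math
from urllib.parse import urlparse, parse_qs

# Hypothetical tiling grid (assumption): origin, tile size in pixels,
# and one resolution per level, in map units per pixel.
ORIGIN_X, ORIGIN_Y = -180.0, -90.0
TILE_SIZE = 256
RESOLUTIONS = [0.703125 / 2 ** z for z in range(20)]


def is_grid_aligned(url, tol=1e-6):
    """Return True if a GetMap URL asks for exactly one tile of the assumed grid."""
    params = {k.upper(): v[0] for k, v in parse_qs(urlparse(url).query).items()}
    if params.get("REQUEST", "").lower() != "getmap":
        return False
    try:
        width, height = int(params["WIDTH"]), int(params["HEIGHT"])
        minx, miny, maxx, maxy = (float(v) for v in params["BBOX"].split(","))
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    if width != TILE_SIZE or height != TILE_SIZE:
        return False
    res_x = (maxx - minx) / width
    res_y = (maxy - miny) / height
    # The resolution must be square and belong to the grid's resolution list.
    if not math.isclose(res_x, res_y, rel_tol=tol):
        return False
    if not any(math.isclose(res_x, r, rel_tol=tol) for r in RESOLUTIONS):
        return False
    # The lower-left corner must fall on a tile boundary of the grid.
    span = res_x * TILE_SIZE
    col = (minx - ORIGIN_X) / span
    row = (miny - ORIGIN_Y) / span
    return abs(col - round(col)) < tol and abs(row - round(row)) < tol


base = "http://example.org/wms?SERVICE=WMS&REQUEST=GetMap&LAYERS=base&SRS=EPSG:4326"
aligned = base + "&BBOX=-180,-90,0,90&WIDTH=256&HEIGHT=256&FORMAT=image/png"
shifted = base + "&BBOX=-170,-90,10,90&WIDTH=256&HEIGHT=256&FORMAT=image/png"
print(is_grid_aligned(aligned), is_grid_aligned(shifted))
```

Applying such a predicate to every logged URL would keep only the requests that map to a cacheable tile; a real filter would also need to check the layer, format and CRS against the service configuration.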


Web Map Tile Services for Spatial Data Infrastructures: Management and Optimization 29

**Figure 3.** Distribution of requests along the different resolution levels for Cartociudad service. Axes: resolution level (0 to 19) and number of requests (%).

**Figure 4.** Distribution of requests along the different resolution levels for IDEE-Base service. Axes: resolution level (0 to 19) and number of requests (%).

Note that the performance gain achieved by a tile cache depends on how tile requests are distributed over the tiling space. If requests were uniformly distributed, the cache gain would be proportional to the cache size. Fortunately, tile requests have been found to usually follow a heavy-tailed Pareto distribution, as shown in Figure 2. In our example, tile requests to the Cartociudad map service follow the 20:80 rule: 20% of the tiles receive 80% of the total number of requests. For IDEE-Base this behaviour is even more pronounced, with 10% of the tiles receiving almost 90% of the total requests. Services that show Pareto distributions are well suited for caching, because high cache hit ratios can be achieved by caching a small fraction of the total tiles.
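The concentration of requests described above can be measured directly from a request log. The following is a minimal sketch, assuming the log has already been reduced to one tile identifier per request; the function name and the toy log are hypothetical, and the tile fraction is computed over the distinct tiles that appear in the log rather than over the whole tiling space.

```python
from collections import Counter


def request_share_of_top_tiles(tile_requests, tile_fraction):
    """Return the fraction of all requests received by the most popular tiles.

    tile_requests: iterable of tile identifiers, one entry per request.
    tile_fraction: fraction of the distinct requested tiles to keep (0 < f <= 1).
    """
    counts = sorted(Counter(tile_requests).values(), reverse=True)
    if not counts:
        return 0.0
    top_n = max(1, round(tile_fraction * len(counts)))
    return sum(counts[:top_n]) / sum(counts)


# Hypothetical toy log: tile "a" receives 16 of the 20 requests,
# while tiles "b", "c", "d" and "e" receive one request each.
toy_log = ["a"] * 16 + ["b", "c", "d", "e"]
share = request_share_of_top_tiles(toy_log, 0.2)
print(share)  # 0.8
```

In this toy log, one tile out of five (20%) accounts for 80% of the requests, so caching that single tile would already serve most of the traffic.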

Figure 3 and Figure 4 show the distribution of tile requests over the resolution levels of the tile pyramid for the analyzed services. Both services receive the maximum number of requests at resolution level 4. This peak occurs because level 4 is the default resolution for the initial rendering in the popular clients used with this cartography, as it allows the whole country to be visualized on a single screen. As can be observed, the density of requests (requests/tile) is higher at low resolution levels than at high ones. Because of this, a common practice consists in pregenerating the tiles belonging to the lowest resolution levels and leaving the rest of the tiles to be cached on demand when they are first requested.
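The following sketch illustrates why request density falls as resolution grows and how a tile budget translates into a set of levels to pregenerate. It assumes a quadtree pyramid in which each level holds four times as many tiles as the previous one, starting from a single root tile; real grids may start from a different layout. All function names and request counts are hypothetical.

```python
def tiles_at_level(level, root_tiles=1):
    """Number of tiles at a level, assuming each level quadruples the tile count."""
    return root_tiles * 4 ** level


def request_density(requests_per_level):
    """Map each level to requests per tile, given {level: request_count}."""
    return {lvl: n / tiles_at_level(lvl) for lvl, n in requests_per_level.items()}


def levels_to_seed(max_tiles):
    """Lowest levels (from 0 upwards) whose cumulative tile count fits in max_tiles."""
    levels, total, level = [], 0, 0
    while total + tiles_at_level(level) <= max_tiles:
        total += tiles_at_level(level)
        levels.append(level)
        level += 1
    return levels


# Hypothetical request counts, not taken from the analyzed traces.
density = request_density({0: 10, 2: 32})
seed_levels = levels_to_seed(100)
print(density)      # {0: 10.0, 2: 2.0}
print(seed_levels)  # [0, 1, 2, 3]
```

Because the tile count grows geometrically, the lowest levels can be seeded completely at little storage cost, while seeding the highest levels in full quickly becomes impractical.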

**4. Tile cache implementations**

With the standardization of tiled web map services, multiple tile cache implementations have appeared. Among them, the main existing implementations are TileCache, GeoWebCache and MapProxy. A comparison between these implementations is summarized in Table 1.

As can be seen, TileCache and MapProxy are both implemented in Python (an interpreted language), while GeoWebCache is implemented in Java (a compiled language). These three services implement the WMS-C, TMS and KML service interfaces. GeoWebCache and MapProxy also offer the WMTS service from OGC. In addition, GeoWebCache can recombine and resample tiles to answer arbitrary WMS requests, and can also be used to serve maps to Google Maps and Microsoft Bing Maps.

All these services offer the possibility of storing map image tiles directly in the file system. TileCache and GeoWebCache also support the MBTiles specification<sup>4</sup> for storing tiled map data in a SQLite database for immediate use and for transfer. MapProxy supports Apache CouchDB<sup>5</sup>, a document-oriented database that can be queried and indexed in a MapReduce fashion, as a backend to store tiles. Moreover, TileCache can store map tiles in the cloud through Amazon S3<sup>6</sup> or maintain them in memory using Memcached<sup>7</sup>.

GeoWebCache maintains tile metadata, such as the last access time or the number of times that each tile has been requested. By using this metadata, it supports the LRU and LFU replacement policies. TileCache supports LRU by using the operating system's time of last access.

These services allow a geographic region to be specified for automatically seeding tiles. For example, TileCache can be configured to seed a particular region defined by a rectangular bounding box, or by a circle given by its center and radius. GeoWebCache supports only the former. MapProxy offers three different ways to describe the extent of a seeding or cleanup task: a simple rectangular bounding box, a text file with one or more polygons in WKT format, or polygons from any data source readable with OGR (e.g. Shapefile, PostGIS).

These three services support both metatiling and meta-buffer methods. The meta-buffer adds extra space at the edges of the requested area.

When a request for a tile in an unsupported coordinate reference system (CRS) is received, both GeoWebCache and MapProxy support on-the-fly reprojection from one of the available CRSs to the specified one. The former achieves this using GeoServer, while the latter offers it natively.

**Table 1.** Comparison of features between different open-source tile cache implementations: TileCache, GeoWebCache and MapProxy.

<sup>4</sup> http://mapbox.com/mbtiles-spec/

<sup>5</sup> http://couchdb.apache.org/

<sup>6</sup> http://aws.amazon.com/es/s3/

<sup>7</sup> http://memcached.org/

**5. Cache management algorithms**

Significant improvements can be achieved by using a cache of map tiles, like the ones discussed above. However, adequate cache management policies are needed, especially in local SDIs with a lack of resources. In this section, our contributions to the main cache strategies are presented: cache population (or *seeding*), cache replacement and tile prefetching.

**5.1. Cache population**

Anticipating the content that users will demand can guide server administrators in deciding which tiles to pregenerate and include in their server-side caches of map tiles. With this objective in mind, a predictive model that uses variables known to be of interest to Web map users, such as populated places, major roads, coastlines, and tourist attractions, is presented in [8].

In contrast, we propose a descriptive model based on the mining of the service's past history [7]. Past history can be easily extracted, for example, from server logs. The advantage of this model is that it is able to determine in advance which areas are likely to be requested in the future based exclusively on past accesses, and it is therefore very simple.

In order to experiment with the proposed model, real-world logs from the IDEE-Base nation-wide public web map service have been used. Request logs were divided into two time ranges of the same duration. The first one was used as the source to make predictions and the second one was used to validate those predictions. Due to the difficulty of working with the statistics of individual tiles, the simplified model presented in Section 2 has been used. Concretely, the experiment applied the simplified model to the grid cell defined by resolution level 12.

Figure 5 shows the heatmaps of requests extracted from the web server logs of the IDEE-Base service, propagated to level 12 through the proposed model. These figures demonstrate that some entities such as coastlines, cities and major roads are highly requested. These elements could be used as entities for a predictive model to identify priority objects, as explained in [8]. The figures also show that nearby levels are more closely related than distant ones, although all of them share a certain similarity. This relationship between resolution levels encourages the use of statistics collected at one level to predict the map usage patterns at another level with more detailed resolution. For example, as shown in Figure 5(c) and Figure 5(e), resolution levels 14 and 16 are very
