which estimates the trajectory may produce errors. To correct these errors, the comparison of the overlapping areas helps—like the loop closure method in traditional cave surveying—to minimize the differences. The software uses best-fitting algorithms to automatically localize similar patches of scanned cave parts [23]. The point cloud data—similarly to the data types of a TLS—is a .las file or a zip-compressed .laz file.

**3.3. Combined data types of archive and new surveys**

When a new survey is done with modern instrumentation, the subject is usually a cave where spelunkers worked previously and produced several kinds of archive data. The newly measured and the archive data both provide valuable information for scientists; thus, they should be integrated with each other. The two datasets can be paired along well-defined spatial constraints—like identifiable morphology or artifacts (**Table 1**). For example, if some points of the archive survey are marked permanently in the cave, the installed artifacts can be identified on the LiDAR point cloud as regular-shaped objects. If the markings are too small, more apparent objects can be mounted temporarily on the cave wall where old markings are found (e.g., uniform-sized disks).

In some cases, the structure of the archive dataset (i.e., the column sequence in the data table) may have similar characteristics to the new one, but their reference systems are different. To avoid errors in later phases, the two sets should be checked at overlapping parts before unifying the two databases.

Archive data is not necessarily old data. Point clouds of several TLS surveys are sometimes given to scientists to process the data and extract new information from it, but the surveys may come from different groups who worked with different instruments. It is also possible that the point cloud data (las files) are not accessible, only the 3D model of the cave—derived from the point cloud. Such models can be created in several ways—basically using stochastic methods.

| Archive data | New data | Action |
|---|---|---|
| Map | Survey database | Identify station locations on the archive map |
| Survey database | Survey database | Match the data structure; harmonize the coordinate system (check the validity on overlapping areas) |
| Map | Point cloud | Identify characteristic points on the archive map (based on morphology, or permanent markings on the cave wall) |
| Survey database | Point cloud | Identify the possible locations of the archive stations—based on notes, and/or permanent markings |
| 3D model | Point cloud | Compare the locations of the base stations; check the point cloud processing methods if only the mesh (3D model) is available |
| Documentation | Survey database | Depending on the type of the document: matching descriptions with the new survey, photo localization, section orientation |

**Table 1.** Typical setting of various types of archive and new data types, and the necessary actions to put them into one data system.

**4. The structure of a survey database**

The GIS works effectively if the links between the different subsystems are well defined, and the users understand these definitions well enough to maintain the links. The regulations should be built into the programs as deeply as possible (error handling) and be documented in manuals. Under these conditions, the subsystems can be connected into a data system.


Data transfers occur when the user attempts to work with data that was created or stored in a different program from the one currently in use. The most basic act of data transfer is opening a file with a program. If the file has improper syntax, this simple act may result only in error messages. Obviously, one can prepare for this by saving the survey data in a proper file format, but what is "proper" depends on the software used as the component of the system. Ideally, the whole sequence of the processes is planned prior to the survey, but it should be documented in written form at least during work; otherwise, the process is not reproducible. These documentations contain the list of the system components (*component programs (CP)*), the connections between the components (*input-output (IO) formats*), and the location and the naming system of the files (**Table 2**).
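Such a component list need not stay prose; it can also be kept in machine-readable form next to the data. A minimal sketch in Python (all program and file names are hypothetical) of recording the CPs and their IO formats, with a trivial consistency check:

```python
# Sketch: a machine-readable record of the component programs (CP) and the
# input-output (IO) formats that connect them. All names are hypothetical.
PIPELINE = [
    {"cp": "survey logger",  "input": None,           "output": "shots.csv"},
    {"cp": "loop closure",   "input": "shots.csv",    "output": "stations.csv"},
    {"cp": "QGIS workspace", "input": "stations.csv", "output": "cave_map.qgs"},
]

# Every input must be produced by an earlier step of the documented chain.
produced = set()
for step in PIPELINE:
    if step["input"] is not None and step["input"] not in produced:
        print("undocumented source:", step["input"])
    produced.add(step["output"])
```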

In some cases—at least partially—the component program itself creates the documentation, which can take various forms (e.g., text, rtf, and xml). In a component program equipped with GIS functionality, the user connects the different data types in a workspace, and the connections are saved in a workspace (ws) file. In a general-purpose GIS such as QGIS, an xml-type file is created containing the <datasource> tag (**Figure 6**). If the data-source file is renamed or moved, the data will not be loaded into the workspace unless the user redefines the connection. This ws-file also contains the coordinate system, the querying, and the visualization parameters of the data source. Alternatively, the user can create log files, where one can track the changes made in the different system components—providing more control over the whole process—however, this requires great self-discipline, and such logs are very rarely published.
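The broken-link problem described above can even be detected programmatically. A minimal sketch (assuming an uncompressed, xml-type .qgs project file; the file name is hypothetical) that lists the data sources whose files no longer exist at the recorded path:

```python
# Sketch: check a QGIS project file for broken data-source links.
# Assumes an uncompressed xml .qgs project; the <datasource> elements are
# the ones referenced in Figure 6. Paths and file names are hypothetical.
import os
import xml.etree.ElementTree as ET

def broken_datasources(qgs_path):
    """Return data-source paths stored in the project that no longer exist."""
    project_dir = os.path.dirname(os.path.abspath(qgs_path))
    missing = []
    for node in ET.parse(qgs_path).iter("datasource"):
        source = (node.text or "").strip()
        if not source:
            continue
        # Relative routes are resolved from the workspace file's folder.
        full = source if os.path.isabs(source) else os.path.join(project_dir, source)
        # Some sources carry provider options after a '|' (e.g. layername).
        full = full.split("|")[0]
        if not os.path.exists(full):
            missing.append(source)
    return missing

print(broken_datasources("cave_survey.qgs"))  # hypothetical project file
```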

From the technical aspect, a well-written documentation is usually more valuable than the published result of the survey (i.e., a map or a 3D model of the cave) because it contains the stepwise methods, providing the benefit of reproducibility of the results. But writing a thorough description of the methodology benefits not only third-party readers but also the scientists who actually work on the project, because the documentation itself may form the basis of scientific publications. In the process of reconstructing the results of an archive survey, the logs are also quite valuable.

| Scope of the documentation | Description |
|---|---|
| 1. Database management | Definition of database structures (RDB and xml), queries, connections, table/file names, attribute types, etc. Handling the different conceptual categories in qualitative data types (i.e., different nomenclature in source data). Handling the various measuring systems, units, and calibrations in quantitative data types |
| 2. Processing sequences | Sequence of the actions of the work (from measuring the data to publishing the results), and naming the programs which were involved |
| 3. Automated processes | List of scripts and programs that perform automatic processes, indicating certain actions they involve. Logic and syntax of file and folder naming for input and output files |
| 4. Quality control | Description of the possible errors of each action (both manual and automatic), and the definition of the acceptable error range. Description of the possible validation methods |

**Table 2.** Main contents and scopes of a GIS documentation describing the data structure of the system.

**Figure 6.** Part of the xml-type QGIS workspace file (.qgs) showing the syntax of the data-source definition (the relative folder route from the workspace file and the name of the data-source file are in bold).

#### **4.1. Database management**

Due to the modularity of the GIS, the survey database is neither a homogeneous table (i.e., an Excel worksheet) nor a uniform file, although the data modules may take such common forms.

The original forms of the data modules are differentiated into *raster*, *vector,* and *alphanumeric* attribute types. If the original data is assigned to a workspace file, where the different types of data are linked, a GIS database is created despite the diversity of original forms.

Database management involves the structuring, the maintenance, and the querying of the data through one or more user interfaces (program components) such as a GIS application (**Figure 7**). In a GIS, this is very rarely a linear sequence of tasks but rather an iterative process. The iteration mainly involves the positioning of the data: if new measurements are available, the calculated coordinates of the existing—processed—data may change due to the fitting methods. Also, the quantity of the processed archive data may increase. To avoid discrepancies in the database, the relations of the different data modules should contain links pointing to each other. These links allow one to manage the system without changing the database records manually. This is usually done automatically within the program component that manages the different data types (e.g., closing a loop in Therion will modify the passage geometry of the whole cave map [3]).
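As a minimal illustration of such links (table and column names are hypothetical), a relational sketch in Python using SQLite: observations reference survey stations by key, so re-fitting the network updates the positions without touching the attribute records.

```python
# Sketch: linking data modules by keys so coordinate updates propagate.
# Station geometry lives in one table; observations reference stations by
# id, so re-fitting the network never touches the attribute records.
# (Table and column names are hypothetical.)
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE station (
        id INTEGER PRIMARY KEY,
        name TEXT UNIQUE,
        x REAL, y REAL, z REAL          -- recalculated after loop closure
    );
    CREATE TABLE observation (
        id INTEGER PRIMARY KEY,
        station_id INTEGER REFERENCES station(id),
        kind TEXT,                      -- e.g. 'speleothem', 'sediment'
        note TEXT
    );
""")
db.execute("INSERT INTO station VALUES (1, 'S12', 651234.2, 239876.5, 210.4)")
db.execute("INSERT INTO observation VALUES (1, 1, 'speleothem', 'flowstone wall')")

# After a new survey, only the station coordinates change:
db.execute("UPDATE station SET x = 651233.8, y = 239877.1 WHERE name = 'S12'")

# Queries joining through the link always see the current position.
for row in db.execute("""SELECT o.kind, s.x, s.y FROM observation o
                         JOIN station s ON s.id = o.station_id"""):
    print(row)
```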

The data is physically stored on a hard drive, and it is evident that data access must be ensured throughout the managing procedures. To ensure this, the folder structure is recorded in the workspace file, but it is also important to note that the component programs—installed on the computer—have their own logic in storing the data. For example, the default file-saving path of a GIS program can be the same folder where the program settings were installed, and when the program is uninstalled or reinstalled with a new version, this folder can be overwritten or removed. These folders are the *system folders*, which belong to the component programs, together with those folders where the executables are installed. To avoid loss of data, one should not store or save acquired data or workspace files in system folders.

**Figure 7.** Flowchart of the database management in a GIS. The structuring is done when the acquired data is organized into different types of databases; the management includes the definition of the common spatial context and the links between the databases. The querying tackles spatial, logical, topological, and temporal relations.

#### **4.2. Processing sequences**


The well-documented sequence of actions in the work (from measuring the data to publishing the results), naming the programs which were involved, is like a cookbook: one can achieve the results without it with enough experience, but for those who are not familiar with the whole procedure, the stepwise aid is necessary. The processing sequences are the main components of a technical documentation. Basically, two scenarios can be distinguished:

• The data acquisition is already done, and the GIS is aimed to incorporate the available information into a coherent system.

• The data is yet to be acquired, and the project can identify core components of the GIS, which are planned.
In the first case, the project obviously involves archive data even if the data is not so old, and the main task is to revise the different sources. The processing sequence describes what we modify in the data, and how, to incorporate it in a GIS program. However, the data quality—in this case—cannot truly be modified; it can only be enhanced by filtering out the unmarked records or outliers. Modifications are done in the qualitative and quantitative categorization of the data (measure units, coordinate system, and nomenclature). The definition of qualitative categories may also differ in the data (e.g., what is considered the minimum size of a conduit?).

In the second case, the planning should include the preparation of templates (e.g., tables) and user guides. Moreover, the incoming data should be error-checked prior to passing it to the subsequent phase of the work.
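A minimal sketch of such an error check, run before an incoming file is passed on (the template columns and acceptance ranges here are assumptions for illustration, not a standard):

```python
# Sketch: error-checking an incoming survey table against a template
# before it enters the database. Column names and limits are hypothetical.
import csv

TEMPLATE = ["from_station", "to_station", "distance_m", "azimuth_deg", "clino_deg"]

def check_survey_csv(path):
    problems = []
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames != TEMPLATE:
            problems.append(f"unexpected columns: {reader.fieldnames}")
            return problems
        for i, row in enumerate(reader, start=2):  # line 1 is the header
            try:
                if not 0 <= float(row["azimuth_deg"]) < 360:
                    problems.append(f"line {i}: azimuth out of range")
                if not -90 <= float(row["clino_deg"]) <= 90:
                    problems.append(f"line {i}: clino out of range")
                if float(row["distance_m"]) <= 0:
                    problems.append(f"line {i}: non-positive distance")
            except ValueError:
                problems.append(f"line {i}: non-numeric value")
    return problems

print(check_survey_csv("new_shots.csv") or "OK")  # hypothetical input file
```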

#### **4.3. Automated processes**

In most cases, the automatic processes help the users in time-consuming, repeatedly occurring tasks, such as recalculating the spatial positions of the survey stations after loop closure [3]. Not so long ago, cave maps had to be updated manually after a new survey track was added to the existing system, even if the loop closure was calculated by the software. Fortunately, this task is also automated in some of the cave surveying programs [3]. The automatic methods in cave data processing are developed in three main areas: (1) spatial positioning, (2) cave modeling, and (3) data management.

Determining the spatial position is one of the most challenging tasks even in contemporary surveys. Ideally, the polygon network has to be connected to two distinct surface points (entrances or wells) to fit the lower-order stations in between them, but two-way measuring can also improve the quality of the data. The fitting (calculation of the stations' positions) is done by solving linear algebraic equations and is automated in the surveying program. The logic is the same if we work with LiDAR.
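The sketch below shows the linear-algebra core of this fitting on a toy traverse between two known points; the shot values are invented, and real surveying programs solve the same kind of overdetermined system for the whole network at once, with weights and a vertical component.

```python
# Sketch: fitting intermediate stations between two known surface points by
# least squares, the linear-algebra core of automatic loop closure.
# (Coordinates and shot vectors are made-up illustration values.)
import numpy as np

# Known anchors (e.g., two entrances) and measured shot vectors between
# consecutive stations A -> S1 -> S2 -> B, reduced to planar dx, dy.
A = np.array([0.0, 0.0])
B = np.array([100.0, 40.0])
shots = np.array([[34.0, 12.5], [33.2, 14.1], [33.5, 13.0]])  # sum: (100.7, 39.6)

# Unknowns: coordinates of S1 and S2. Each shot gives one observation
# equation, X_to - X_from = shot. Design matrix per coordinate axis:
D = np.array([
    [ 1.0,  0.0],   # S1 - A  = shot0  (known A moved to the right-hand side)
    [-1.0,  1.0],   # S2 - S1 = shot1
    [ 0.0, -1.0],   # B  - S2 = shot2  (known B moved to the right-hand side)
])
fitted = []
for axis in range(2):
    rhs = shots[:, axis] + np.array([A[axis], 0.0, -B[axis]])
    sol, *_ = np.linalg.lstsq(D, rhs, rcond=None)
    fitted.append(sol)

S1, S2 = np.array(fitted).T
print("S1:", S1, "S2:", S2)  # the misclosure is distributed over the traverse
```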

Modeling of the cave is done for several reasons, but the aim is usually to create an irregular shape in the virtual space. Subsequently, one can calculate its parameters, or simply use it as a visualization of the hard-to-access location. The models, thus, are *parametric* or *realistic* ones. Parametric models can be generated automatically from the survey records by extruding 2D geometrical shapes along the station-target vectors [10, 11]. The visual representation of such a model is schematic compared to a realistic one (**Figure 8**). To produce a parametric model, one does not even have to visualize the model to obtain the results, which are numbers indicating the volume, surface, and rate of void in the incorporating rock.

**Figure 8.** Volumetric model of the Szemlő-hegy Cave (Hungary) with regular-sized control objects (cubes). Such cubes can be used as references to calculate the relative proportion of the macro-porosity (passages) within a certain volume of the incorporating rock using statistical methods [11]. North is parallel with the y axis.
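As a minimal illustration of the parametric idea (with invented survey values), a passage volume can be estimated from the extruded sections without ever drawing the model:

```python
# Sketch: the parametric model reduced to numbers. Each shot is extruded
# as a rectangle-like section estimated from the LRUD readings, and the
# volume falls out without any visualization. (Values are illustrative.)
shots = [
    {"lrud": (1.2, 0.8, 2.0, 1.0), "length": 12.4},  # left, right, up, down (m)
    {"lrud": (0.9, 1.1, 1.5, 0.7), "length": 9.8},
]

def shot_volume(lrud, length):
    left, right, up, down = lrud
    area = (left + right) * (up + down)  # rectangular section estimate
    return area * length

total = sum(shot_volume(s["lrud"], s["length"]) for s in shots)
print(f"estimated passage volume: {total:.1f} m3")
```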

Creating realistic models—however—is a more common aim among cavers. Besides the table and map views, cave surveying programs have long provided 3D visual representations of survey results (helping the cavers to understand the passage structure more easily). In the popular surveying programs, the modeling is also based on the extrusion of a geometrical object along the station-target vector, but to enhance the model resolution, the vertex number of the transversal sections is increased in the surveys. Instead of just 4 (the LRUD method), 6 or 12 equally distributed radial vectors are measured around a station, perpendicular to the station-target vector [7]. The sections are placed along the polyline network, and the more vertices are measured, the better the realistic model will be. The edges of the adjoining shapes are also smoothed automatically using tangential curves and radial basis functions.
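A short sketch of how such a section polygon is built from 12 radial distances (the distance values are invented):

```python
# Sketch: building a section polygon from 12 radial distances measured
# around a station, in the plane perpendicular to the station-target
# vector. More rays give a more faithful section than the 4 LRUD readings.
import math

radii = [1.8, 1.6, 1.2, 0.9, 0.8, 1.0, 1.4, 1.7, 2.0, 2.2, 2.1, 1.9]  # meters
section = [(r * math.cos(2 * math.pi * i / len(radii)),
            r * math.sin(2 * math.pi * i / len(radii)))
           for i, r in enumerate(radii)]
print(section[:3])  # the vertices to be placed along the polyline network
```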

When working with LiDAR data, the automatic methods help fit the point clouds of different stations to each other. The fitting result is described with statistical parameters and, in some cases, with a new attribute of each point showing the quality (*quality map*).
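A minimal sketch of how such a per-point quality attribute can be derived once two scans are aligned, using nearest-neighbour residuals (random stand-in clouds are used here instead of real .las data):

```python
# Sketch: deriving a per-point "quality map" after two scans are aligned.
# For each point of scan B, the distance to its nearest neighbour in scan A
# becomes a quality attribute; summary statistics describe the fit.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
scan_a = rng.uniform(0, 10, size=(5000, 3))
scan_b = scan_a[:2500] + rng.normal(0, 0.02, size=(2500, 3))  # overlapping part

dist, _ = cKDTree(scan_a).query(scan_b)   # nearest-neighbour residuals
print(f"RMS: {np.sqrt(np.mean(dist**2)):.3f} m, "
      f"95th percentile: {np.percentile(dist, 95):.3f} m")
quality = dist  # attach as an extra attribute column of the point cloud
```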

Automatic processes in data management are mainly responsible for data loading and updating. This automatism occurs when the database is located on a server, while the GIS interface is on a client computer. If a working GIS is established and the database connections are defined, SQL scripts can update the client side by regularly querying the database server. The data upload process is also automatized in this case: the data logged in a survey management program (or just in an Excel sheet) is written in a certain file format, which can be data-mined with scripts (primitive programs developed for repeatedly occurring tasks). The script code can work with any type of data (raster, vector, or alphanumeric). It extracts the data from the structured file and uploads it to the server-side database. The data-mining scripts only work well if the files are located in the predefined path/folder; otherwise, the data is not loaded into the main database.
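A minimal sketch of such a data-mining script (folder, table, and column names are hypothetical, and SQLite stands in for the server-side RDBMS): it collects the logged files from the agreed folder and upserts them into the database.

```python
# Sketch: a data-mining upload script of the kind described above. It scans
# a predefined folder for logged survey files and uploads the rows to the
# server-side database. Folder, table, and column names are hypothetical.
import csv
import pathlib
import sqlite3  # stand-in for the server-side RDBMS client

INCOMING = pathlib.Path("incoming_surveys")   # the agreed-upon folder
INCOMING.mkdir(exist_ok=True)

def upload_all(db):
    for path in sorted(INCOMING.glob("*.csv")):  # naming/path convention
        with open(path, newline="") as fh:
            rows = [(r["station"], float(r["x"]), float(r["y"]), float(r["z"]))
                    for r in csv.DictReader(fh)]
        db.executemany(
            "INSERT OR REPLACE INTO station (name, x, y, z) VALUES (?,?,?,?)",
            rows)
        db.commit()

db = sqlite3.connect("cave_gis.db")
db.execute("CREATE TABLE IF NOT EXISTS station (name TEXT PRIMARY KEY, x, y, z)")
upload_all(db)
```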

#### **4.4. Quality control**

One of the most crucial issues of archive data processing is the estimation of the errors that are present in the sources. Errors mostly affect the spatial positioning of the base data; thus, it is important to find ways to compare the existing data to something about which we can surely decide whether it can be trusted. It is also important to know how the errors originally got into the data. The QC of the archive data is usually based on new (control) measurements.

For example, if loop correction was done with survey management software, the geographic position of the passages might have changed drastically. On the printed editions of the map, this was not always fully tracked. The farther back in the past we go, the bigger the chance that we find cave maps with uncorrected parts, edited manually after new survey sequences. In fact, most of the resulting maps of archive surveys inevitably bear such inhomogeneous errors distributed over the whole area.

This creates many possibilities for subsequent misinterpretation, so first of all, we have to obtain a consistent database of the archive survey tracks to calculate loop closures. Though this task is only a matter of digitizing the field notes in most cases, it is quite problematic if the survey database (the records of the measurements) is not available. In the latter case, the polygonal network of the survey has to be reconstructed from the maps, and there is no way to tell what the error of the survey is until conducting a new one. However, it is observed that the estimated error is not less than 1%, and sometimes reaches 5–10%, of the distance from the base station [6] (i.e., a passage 200 m from the base station may be misplaced by anything from 2 to 20 m).

Correcting the geometry of an archive cave map is often a first step toward building the GIS. Regarding the inhomogeneity of the spatial errors over the mapped area, the correcting method must also apply spatially varying functions to modify the misplaced parts of the map. The simplest—and most adequate—method for this is based on the irregular network of triangles among ground control points (GCPs). The GCPs are usually the stations of the archive survey and can—ideally—be identified on both the scanned archive map and the line plot map, which is created from the survey database. The two maps make two differing geometric manifestations of each triangle in the network, although the corner points (stations) are literally the same. The comparison of the triangle pairs—using the Euclidean coordinates as variables—results in a first-order 2D transformation function, which can be applied to all points within the area of the triangle (including the nodes). Usually, the line plot map is accepted as the base, and the scanned paper map is the one to be modified triangle-by-triangle. This method is also referred to as the "rubber sheet" method. The absolute difference between the two positions of the triangle's points can be expressed as an attribute in the database, making it possible to compile 2D "error maps" as quality indicators of the archive map.
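The per-triangle transformation can be written down compactly. The sketch below (with illustrative coordinates) derives the first-order 2D function from one triangle pair and applies it to a point of the scanned map:

```python
# Sketch: the per-triangle "rubber sheet" transform. Three paired stations
# define a first-order (affine) 2D transformation that maps every point
# inside the scanned-map triangle onto the line-plot triangle.
# (Coordinates are illustrative.)
import numpy as np

src = np.array([[10.0, 12.0], [48.0, 15.0], [25.0, 44.0]])  # scanned map
dst = np.array([[11.2, 11.5], [47.1, 16.0], [26.0, 43.2]])  # line plot

# Solve [x y 1] @ M = [x' y'] for the 3x2 coefficient matrix M.
A = np.hstack([src, np.ones((3, 1))])
M = np.linalg.solve(A, dst)

def transform(points):
    pts = np.atleast_2d(points)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

print(transform([20.0, 20.0]))        # any map point inside the triangle
print(np.abs(transform(src) - dst))   # corners map exactly (~0 residual)
```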

After correction, even an archive map can be used to extract enough spatial parameters for 3D cave modeling. The volumetric parameters of the cave model can be calculated from the transversal and longitudinal sections. The error is estimated from the comparison of the archive section and a newly measured one. For this comparison, basically the geometrical parameters (perimeter, area) are used. The experienced rate of difference depends on the resolution of the source material (the larger, the better) and the shape of the passage profile. This implies a very important thing in cave data processing, which is the uniqueness of each cave.
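A minimal sketch of this comparison (the vertex lists are illustrative), using the shoelace formula for the area:

```python
# Sketch: comparing an archive passage section with a re-measured one via
# perimeter and area. Polygons are vertex lists in meters; the area uses
# the shoelace formula. (Vertex values are illustrative.)
def perimeter(poly):
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))

def area(poly):
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))) / 2

archive = [(0, 0), (2.1, 0), (2.3, 1.8), (0.2, 2.0)]
new     = [(0, 0), (2.0, 0), (2.2, 1.9), (0.1, 2.1)]
for f in (perimeter, area):
    a, n = f(archive), f(new)
    print(f"{f.__name__}: archive={a:.2f}, new={n:.2f}, diff={abs(a - n) / n:.1%}")
```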

In the case of new surveys, the QC is largely maintained by the processing programs, which log the accuracy of the TLS measurements and the surface-fitting parameters. However, the user must consciously handle these logs (often presented as a simple message after finishing a work phase) to track and report the confidence of the created model. In this case, the user-created logs and the documentation of the data processing can be the proper form of keeping control over the quality.
