### **3. Log collecting for BitTorrent peers**

The log collection approach that implies a less intrusive activity while still providing a great deal of protocol parameters is the use of logging information from the clients themselves. Each client typically presents status information (dubbed the *status message*), consisting of periodic data such as download speed, upload speed and number of connections, and, if enabled, a set of enhanced pieces of information (dubbed the *verbose message*). The types of messages and their content have been thoroughly described in Section 2. All or most BitTorrent clients provide status messages, but some sort of activation or instrumentation is required to obtain verbose messages.
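As a toy illustration of what collected status information looks like, the snippet below parses a single status line into named fields. The line format is invented for the example – each client formats its status output differently:

```python
import re

# Hypothetical status line layout; real clients each use their own format.
STATUS_RE = re.compile(
    r"(?P<time>\S+ \S+) dl=(?P<down>\d+(?:\.\d+)?) KB/s "
    r"ul=(?P<up>\d+(?:\.\d+)?) KB/s peers=(?P<peers>\d+)"
)

def parse_status(line):
    """Extract download/upload speed and peer count from one status line."""
    m = STATUS_RE.match(line)
    if m is None:
        return None
    return {
        "time": m.group("time"),
        "down_kbps": float(m.group("down")),
        "up_kbps": float(m.group("up")),
        "peers": int(m.group("peers")),
    }

sample = "2009-06-01 12:00:03 dl=512.5 KB/s ul=64.0 KB/s peers=12"
print(parse_status(sample))
```

A parser of this kind returns `None` for lines it does not recognise, so irrelevant output can simply be skipped.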

#### **3.1 Using and updating BitTorrent clients for logging**

Throughout our experiments we have used multiple open-source clients. All of them provided basic status information, while some were updated or altered to provide verbose information as well. Transmission, aria2, Vuze, Tribler, libtorrent-rasterbar and the mainline client have been used to provide status parameters, while Tribler and libtorrent-rasterbar have also been instrumented to provide verbose parameters.

As building the setup, deploying peers, collecting information and subjecting it to dissemination is a lengthy process, it has to be automated (Deaconescu et al., 2009).

Several approaches have been put to use to collect status information, depending on the client implementation:

• **Transmission** has a fully featured CLI and was one of the clients that were very easy to automate. Detailed debugging information regarding connections and chunk transfers can be enabled by setting the TR\_DEBUG\_FD environment variable.

• **aria2** natively provides a CLI and was easy to automate. Logging is also enabled through CLI arguments.

• **hrktorrent** is a lightweight implementation on top of **libtorrent-rasterbar** and provides the necessary interface for automating a BitTorrent transfer, albeit some minor modifications have been necessary.

• **BitTorrent Mainline** provides a CLI and logging can be enabled through minor modifications of the source code.
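The automation described above can be sketched as a small launcher that starts a CLI client and redirects its output to a log file; the command line in the comment is only illustrative, as flags differ per client:

```python
import shlex
import subprocess

def run_client(cmd, log_path):
    """Launch a CLI BitTorrent client, redirecting stdout and stderr
    to a log file for later parsing. Returns the Popen handle so the
    experiment driver can wait for or terminate the client."""
    with open(log_path, "w") as log:
        return subprocess.Popen(
            shlex.split(cmd), stdout=log, stderr=subprocess.STDOUT
        )

# Illustrative only; actual command lines differ per client:
# proc = run_client("transmission-cli example.torrent", "session.log")
```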


In order to examine BitTorrent transfer parameters at the protocol implementation level, we propose a system for storing and analysing logging data output by BitTorrent clients. It currently offers support for hrktorrent/libtorrent<sup>7</sup> and Tribler<sup>8</sup>.

Our study of logging data takes into consideration two open-source BitTorrent applications: Tribler and hrktorrent<sup>9</sup> (based on libtorrent-rasterbar). While the latter needed only minimal changes in order to provide the necessary verbose and status data, Tribler had to be modified significantly.

The process of configuring Tribler for logging output is completely automated using shell scripts and may be reversed. The source code alterations are focused on providing both status and verbose messages as client output information.

*Status message* information provided by Tribler includes transfer completion percentage, download and upload rates. In the modified version, it also outputs current date and time, transfer size, estimated time of arrival (ETA), number of peers, and the name and path of the transferred file.

In order to enable *verbose message* output, we took advantage of the fact that Tribler uses flags that can trigger printing of various implementation details to standard output, among which are the actions related to receiving and sending BitTorrent messages. The files we identified as responsible for protocol data are changed using scripts in order to print the necessary information and to associate it with a timestamp and date. Since most of the protocol exchange data is passed through several levels in Tribler's class hierarchy, attention had to be paid to avoiding duplicate output and reducing file size. In contrast to libtorrent-rasterbar, which, at each transfer, creates a separate session log file for each peer, Tribler stores verbose messages in a single file. This file is passed to the verbose parser, which extracts the relevant parts of the messages and writes them into the database.

Unlike Tribler, hrktorrent's instrumentation did not imply modifying its source code, but rather defining the TORRENT\_LOGGING and TORRENT\_VERBOSE\_LOGGING macros before building (recompiling) libtorrent-rasterbar. Minor updates to hrktorrent's compile options were also needed in order to enable logging output.

<sup>7</sup> http://www.rasterbar.com/products/libtorrent/

<sup>8</sup> http://www.tribler.org/trac

<sup>9</sup> http://50hz.ws/hrktorrent/

Although our system processes and stores all protocol message types, the most important messages for our swarm analysis are those related to changing a peer's state (choke/unchoke) and to requesting/receiving data. Correlations between these messages are at the heart of providing information about the peers' behaviour and the BitTorrent clients' performance.
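As a minimal illustration of the kind of statistics derived from verbose messages, the snippet below counts message types across a few hypothetical log lines (the line format is invented; real clients use their own formats):

```python
from collections import Counter

# Hypothetical verbose-log lines; the message type is the fourth field.
log = [
    "12:00:01 peer 10.0.0.2 unchoke",
    "12:00:02 peer 10.0.0.2 request piece=4",
    "12:00:03 peer 10.0.0.2 piece piece=4",
]

# Tally how many messages of each type were exchanged.
kinds = Counter(line.split()[3] for line in log)
print(kinds)  # e.g. one unchoke, one request, one piece
```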

### **3.2 Storage**


Logging information is typically stored in log files. In libtorrent-rasterbar's case, logging uses a whole folder, with the logging information for each remote peer going to a separate file in that folder. Usually, information is redirected from standard output and standard error to the output file.

As logging information occupies a large portion of disk space in a given experiment, especially verbose messages, files and folders are compressed into archive files. There is generally one log archive for each client session. When the information is to be processed, the logging archives are provided to the data processing component. A log archive contains both status messages and verbose messages.
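The per-session archiving step can be sketched with Python's standard `tarfile` module; the function and path names are our own, for illustration:

```python
import pathlib
import tarfile

def archive_session(log_dir, archive_path):
    """Compress one client session's log folder into a .tar.gz archive,
    keeping the folder name as the top-level entry in the archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(log_dir, arcname=pathlib.Path(log_dir).name)

# archive_session("peer-logs", "session.tar.gz")
```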

Logging information may be stored in archive files for subsequent use, or it may be processed live – that is, parameters are parsed and interpreted as the log files are being generated. When running a live/real-time processing component, compressing logging information may not be required. However, in order to still preserve the original files, some experimenters may choose to retain the log archives.

The usefulness of a live processing component lies primarily in relieving the burden of space consumption, in case archiving is disabled. Most of the logged information is not useful: some peers may not be connected to other peers, and their status information, though provided, consists of parameters that are equal to zero – no connections means 0 KB/s download speed, 0 KB/s upload speed and so on. On one occasion, a log file that had been written for more than three weeks occupied more than 1 GB but yielded just 27 KB of valuable information.
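A live processing component could drop such zero-valued readings before storage; a minimal sketch, assuming status messages have already been parsed into dictionaries with hypothetical field names:

```python
def is_idle(msg):
    """True when a status message carries no useful information
    (no peers connected, zero transfer speeds)."""
    return msg["peers"] == 0 and msg["down_kbps"] == 0 and msg["up_kbps"] == 0

readings = [
    {"peers": 0, "down_kbps": 0.0, "up_kbps": 0.0},   # idle, discarded
    {"peers": 3, "down_kbps": 120.5, "up_kbps": 16.0},  # useful, kept
]
useful = [m for m in readings if not is_idle(m)]
print(len(useful))  # 1
```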

Whether using live parsers or subsequent analysis, parameters are parsed for rapid use. The post-parsing storage is typically a relational database. The advantage of such a storage facility is rapid access for post-processing. When inquiring about given swarm parameters, the user queries the database and rapidly obtains the necessary information. Without it, each inquiry would require a new parsing pass, resulting in large overhead and CPU consumption. Database storage is the final step of the logging and parsing stage. Parameter analysis, interpretation and advising activities are not concerned with logging information; they only query the database.
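As a sketch of this storage step, the snippet below uses a deliberately simplified, hypothetical table (the actual schema is presented in Section 4) and answers a sample inquiry with a single SQL query instead of a re-parsing pass:

```python
import sqlite3

# Simplified, hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE status_messages (
    client_session_id INTEGER, timestamp TEXT,
    download_speed REAL, upload_speed REAL, num_peers INTEGER)""")

rows = [(1, "2009-06-01 12:00:03", 512.5, 64.0, 12),
        (1, "2009-06-01 12:00:08", 498.0, 70.2, 13)]
conn.executemany("INSERT INTO status_messages VALUES (?, ?, ?, ?, ?)", rows)

# A typical inquiry: average download speed for one client session.
avg, = conn.execute(
    "SELECT AVG(download_speed) FROM status_messages WHERE client_session_id = 1"
).fetchone()
print(avg)
```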

#### **3.3 Experiments**

In order to collect information specific to a swarm, one must have access to all clients and to the logging information from those clients. As such, either all clients are accessible to the experimenter, or users must subsequently provide their logging information to the experimenter.

Some remote information may be replaced by information provided by a tracker log file. A tracker logs information regarding the overall swarm view, albeit with quite a large period (typically 30 minutes – 1800 seconds).

An intermediate approach to collecting logging information is a form of aggregation of information on the client side. This information may either be sent to a logging service or stored and subsequently provided to the user. The former approach is taken by the Logging Service within the P2P-Next project.

Typical experiments are those that allow the user full control and provide all information rendered by the clients. Deployment, log activation, log collection/archiving and even parsing are fully automated. One creates a configuration file and runs the experiments. Log archive files typically result from the experiment and, after all have been gathered, may be subjected to analysis.

The inclusion of tracker information was enabled in the UPB Linux Distro Experiments<sup>10</sup>, as described by Bardac et al. (2009). Tracker log files are parsed live and provide overall swarm parameters. Various pieces of information from the tracker log files are provided as graphic images that show the evolution of swarm parameters.

<sup>10</sup> http://torrent.cs.pub.ro/

Tracker logs have also been enabled in certain experiments. These experiments rely on extensive logging information (verbose messages) provided by seeders and on tracker information. The lack of complete access to all clients in the swarm is balanced out by the use of verbose logging on the seeders' side. However, the remote peers' intercommunication is not logged in any way, such that a form of aggregation and collection of the remote peers' intercommunication messages is still required.

#### **3.4 Monitoring and post-processing**

Log processing, as described in Section 4, refers to parsing and interpreting BitTorrent protocol parameters. Data is parsed into an easily accessed database that is provided to the user.

As described above, one may choose to store logging information and enable analysis afterwards. We dub this approach *post-processing*. The other approach is live analysis of the provided parameters, resulting in client and swarm monitoring. The two approaches may, of course, be combined: information is parsed and stored in a database while various parameters are also monitored.

An overview of a typical architecture for data processing is presented in Figure 4. Separate parsers are used for live parsing and classical parsing. Classical parsing results in a database "output", while live parsing results in both a database "output" and the possibility of deploying live client and swarm monitoring.

### **4. Protocol data processing engine**

As client instrumentation provides in-depth information on the client implementation, it generates extensive input for data analysis. Coupled with carefully crafted experiments and message filtering, this allows the detection of weak spots and of improvement possibilities in current implementations. It will thus provide feedback to client and protocol implementations, as well as swarm "tuning" suggestions, which in turn will enable high-performance swarms and rapid content delivery in peer-to-peer systems.

Due to the various types of modules employed (such as parser implementations, storage types and rendering engines), a data processing framework may provide different architectures. A sample view of the infrastructure consists of the following modules:

• **Parsers** – receive log files provided by BitTorrent clients during file transfers. Due to differences between log file formats, there are separate pairs of parsers for each client. Each pair analyses status and verbose messages.

• **Database Access** – a thin layer between the database system and the other modules. Provides support for storing messages, updating and reading them.

• **SQLite Database** – contains a database schema with tables designed for storing protocol message content and peer information.

• **Rendering Engine** – consists of a GUI application that processes the information stored in the database and renders it using plots and other graphical tools.

#### Fig. 1. Logging System Overview

As shown in Figure 1, using parsers specific to each type of logging file, messages are sent as input to the *Database Access* module, which stores them into an SQLite database. In order to analyse peer behaviour, the *Rendering Engine* reads the stored logging data through the *Database Access* module and outputs it to a graphical user interface.

Once all logging and verbose data from a given experiment has been collected, the next step is the analysis phase. The testing infrastructure provides a GUI (*Graphical User Interface*) statistics engine for inspecting peer behaviour.

The GUI is implemented in Python using two libraries: *matplotlib* – for generating graphs – and *TraitsUi* – for handling widgets. It offers several important plotting options for describing peer behaviour and peer interaction during the experiment:

• *download/upload speed* – displays the evolution of the download/upload speed of the peer;

• *acceleration* – shows how fast the download/upload speed of the peer increases/decreases;

• *statistics* – displays the types and amount of verbose messages the peer exchanged with other peers.



The last two options are important as they provide valuable information about the performance of the BitTorrent client and how this performance is influenced by protocol messages exchanged by the client.

Sample GUI screenshots may be observed in Figures 2 and 3:

Fig. 2. Rendering Engine for BitTorrent Parameters: Client Analysis

Fig. 3. Rendering Engine for BitTorrent Parameters: Client Comparison

The *acceleration* option measures how fast a BitTorrent client is able to download data. High acceleration forms a basic requirement in live streaming, as it means starting playback of a torrent file with little delay.

The *statistics* option displays the flow of protocol messages. We are interested in the choke/unchoke messages.

The GUI also offers two modes of operation: *Single Client Mode*, in which the user can follow the behaviour of a single peer during a given experiment, and *Client Comparison Mode*, allowing for comparisons between two peers.
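The *acceleration* plot option boils down to differentiating the sampled speed values; a minimal sketch, assuming (time, speed) status readings:

```python
def acceleration(samples):
    """Approximate download-speed acceleration (KB/s per second) by
    finite differences over consecutive (time_seconds, speed_kbps) readings."""
    acc = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        acc.append((v1 - v0) / (t1 - t0))
    return acc

print(acceleration([(0, 0.0), (5, 250.0), (10, 400.0)]))  # [50.0, 30.0]
```

A rapidly rising curve here is exactly the behaviour desired for live streaming, where playback should start with little delay.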


#### **4.1 Post-processing framework for real-time log analysis**

The post-processing framework is used for storing logging information provided by various BitTorrent clients into a storage area (commonly a database). An architectural view of the framework is shown in Figure 4.

Fig. 4. Post-Processing Framework Architecture

The two main components of the framework are the parser and the storage components. Parsers process log information and extract the measured protocol parameters to be subjected to analysis; storers provide an interface for database or file storage – both for writing and for reading. Storers thus provide an easy-to-access, rapid-to-retrieve and extensible interface to the parameters. Storers are invoked when parsing messages – to store parameters – and when analysing parameters – to retrieve/read/access them.
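The parser/storer split described above can be sketched as follows; the class names and APIs here are illustrative toys, not the framework's actual interfaces:

```python
from abc import ABC, abstractmethod

class LogParser(ABC):
    """Interface implemented by per-client parsers (illustrative only)."""
    @abstractmethod
    def parse(self, line):
        """Return a parsed message, or None for irrelevant lines."""

class MemoryStorer:
    """Toy storer keeping parsed messages in a list; a real storer
    would be backed by a database or file."""
    def __init__(self):
        self.messages = []
    def store(self, message):
        self.messages.append(message)

class SpeedParser(LogParser):
    """Toy parser recognising hypothetical 'dl=<value>' lines."""
    def parse(self, line):
        key, _, value = line.partition("=")
        return {key: float(value)} if key == "dl" else None

storer = MemoryStorer()
parser = SpeedParser()
for line in ["dl=512.5", "noise", "dl=498.0"]:
    msg = parser.parse(line)
    if msg is not None:
        storer.store(msg)
print(len(storer.messages))  # 2
```

Keeping the two interfaces separate is what lets the same parser feed a MySQL storer, an SQLite storer or a file-based one.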

Within the parser component, a LogParser module provides the interface to the actual parser implementations. There are two kinds of parsers: log parsers and real-time log parsers. The former are used for data already collected and subsequently provided by the experimenter. The other approach involves running the parsers at the same time as the client generates logging information. This real-time parsing approach possesses three important advantages: monitoring may be enabled for status messages, less space is wasted as messages are parsed in real time, and processing time is reduced because the parsing time and the storing time overlap. The disadvantage of a real-time parser is a more complex implementation, as it has to keep track of the current position in the log file and continue from that point when new data is available. At the same time, all clients must be able to access the same database, probably located on a single remote system.
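The position-tracking behaviour of a real-time parser can be sketched as a function that returns only the complete lines appended since the last read (a simplified sketch assuming a line-oriented, ASCII log file):

```python
def read_new_lines(path, offset):
    """Return the complete lines appended since `offset`, plus the new
    offset. A real-time parser calls this periodically and resumes
    where it left off; a partially written last line is re-read later."""
    with open(path, "r") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind("\n") + 1  # keep only complete lines
    return data[:end].splitlines(), offset + len(data[:end].encode())

# lines, pos = read_new_lines("peer.log", pos)
```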

The storage component is interfaced by the SwarmAccess module. This module is backed by database-specific implementations. These may be RDBMS systems, such as MySQL or SQLite, or file-based storage. Parameters are stored according to the schema described in Figure 5.

Configuration of the log files and clients to be parsed is found in the *SwarmDescription* file. All data regarding the current running swarm is stored in this file. Client types in the description

BitTorrent Environments 15

Protocol Measurements in BitTorrent Environments 299


file also determine the parser to be used. Selection of the storage module is based on the configuration directives in the *AccessConfig* file. For SQLite storage, this contains the path to the database file; for MySQL storage, it contains the username, password and database name required to open the database connection.
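As a rough sketch of how such configuration-driven storage selection might look — the directive names (`engine`, `path`, `user` etc.) are assumptions, since the chapter does not list the actual *AccessConfig* keys:

```python
import configparser

# Hypothetical AccessConfig contents; the directive names are assumptions.
ACCESS_CONFIG = """
[storage]
engine = sqlite
path = /tmp/bt_params.db
"""

def select_storage(config_text):
    """Pick storage parameters based on AccessConfig-style directives."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    engine = cfg.get("storage", "engine")
    if engine == "sqlite":
        # SQLite storage only needs the path to the database file.
        return engine, {"path": cfg.get("storage", "path")}
    if engine == "mysql":
        # MySQL storage needs full connection credentials.
        return engine, {key: cfg.get("storage", key)
                        for key in ("user", "password", "database")}
    raise ValueError("unknown storage engine: %s" % engine)

engine, params = select_storage(ACCESS_CONFIG)
```

Keeping the engine choice behind a single function like this is what lets the rest of the framework stay unaware of which database backend is in use.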

The user/developer is interfaced by a *Glue Module* that provides all methods deemed necessary. The user calls methods in the Glue Module for actions such as parsing a session archive or a swarm archive, updating a configuration file, or retrieving data that fills a given role.

#### **4.1.1 Parameter storage database**

The database schema as shown in Figure 5 is used for relational database engines such as MySQL or SQLite.


#### Fig. 5. Database Schema

The database schema provides the means to efficiently store and rapidly retrieve BitTorrent protocol parameters from log messages. The database is designed to store parameters about multiple swarms in the swarms table; each swarm is identified by the .torrent file its clients are using.

Information about the peers/clients that are part of the swarm is stored in the client\_sessions table. Each client is identified by its IP address and port number. Additional pieces of information, such as the BitTorrent client in use, enabled features and hardware specifics, are also stored.

Three classes of messages result in three tables: status\_messages, peer\_status\_messages and verbose\_messages. The peer\_status\_messages table stores parameters particular to remote peers connected to the current client, while the status\_messages stores parameters specific to the current client (such as download speed, upload speed and others). Each line in the \*\_messages tables points to an entry in the client\_sessions table, identifying the peer it belongs to – the one that generated the log message.
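A minimal, illustrative subset of such a schema, expressed through Python's `sqlite3` module. The column names are assumptions, since Figure 5 is not reproduced in text, and SQLite spells the auto-increment option `AUTOINCREMENT` rather than MySQL's `AUTO_INCREMENT`:

```python
import sqlite3

# Illustrative subset of the Figure 5 schema; the real column names may differ.
SCHEMA = """
CREATE TABLE swarms (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    torrent_file TEXT NOT NULL          -- a swarm is identified by its .torrent
);
CREATE TABLE client_sessions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    swarm_id    INTEGER NOT NULL REFERENCES swarms(id),
    ip          TEXT NOT NULL,          -- a peer is identified by IP and port
    port        INTEGER NOT NULL,
    client_type TEXT                    -- e.g. Tribler, libtorrent-rasterbar
);
CREATE TABLE status_messages (
    id                INTEGER PRIMARY KEY AUTOINCREMENT,
    client_session_id INTEGER NOT NULL REFERENCES client_sessions(id),
    timestamp         INTEGER,
    download_speed    REAL,
    upload_speed      REAL,
    num_connections   INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# IDs are generated automatically on insert, as with MySQL's AUTO_INCREMENT.
swarm_id = conn.execute(
    "INSERT INTO swarms (torrent_file) VALUES (?)",
    ("experiment.torrent",)).lastrowid
session_id = conn.execute(
    "INSERT INTO client_sessions (swarm_id, ip, port, client_type) "
    "VALUES (?, ?, ?, ?)", (swarm_id, "10.0.0.1", 6881, "Tribler")).lastrowid
```

Every `*_messages` row carries a `client_session_id` foreign key, which is what lets later analysis correlate messages back to the peer that generated them.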

#### **4.1.2 Logfile-ID mapping**



When parsing log files, one has to know the ID of the client session that has generated the log file. In order to automate the process, there needs to be a mapping between the log file (or log archive) and the client session ID.

At the same time, the client session ID needs to exist in the client\_sessions table in the database, together with information such as BitTorrent client type, download speed limitation, operating system, hardware specification etc. This information needs to be supplied by the experimenter in a form that is easy both to create and to parse.

A swarm description file is to be supplied by the experimenter. This file consists of all required swarm and peer information including the name/location of the log file/archive.

We consider the INI format best suited for this, as it is fairly easy to create, edit and update; it was therefore chosen for populating the initial information. The experimenter may easily create an INI swarm description file and provide it to the parser together with the (compressed) log files.
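A hypothetical swarm description file might look as follows; the section and key names are illustrative, as the chapter does not reproduce the exact INI layout:

```ini
; Hypothetical swarm description file -- section and key names are
; assumptions, not the framework's actual format.
[swarm]
torrent_file = experiment.torrent
file_size = 1073741824

[peer-1]
ip = 10.0.0.1
port = 6881
client = libtorrent-rasterbar
download_limit = 512        ; KB/s
os = Linux
log_archive = peer-1-logs.tar.gz

[peer-2]
ip = 10.0.0.2
port = 6881
client = Tribler
download_limit = 256
os = Linux
log_archive = peer-2-logs.tar.gz
```

One section describes the swarm itself; each remaining section describes one client session and names the log archive it produced.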

The swarm description file is then parsed and SQL queries populate the database. One entry goes into the swarms table and one entry per peer in the swarm description file goes into the client\_sessions table. As a result of these insert queries, swarm IDs and client session IDs are created (due to the AUTO\_INCREMENT option). These IDs are essential for the message parsing process and are written down in the logfile-ID mapping file.

The swarm description file parser parses that file and also generates a logfile-ID mapping file. The parser is responsible for three actions:

• parsing the swarm description file
• creating and running SQL insert queries in the swarms and client\_sessions tables
• creating a logfile-ID mapping file consisting of mappings between client session IDs and log files

A logfile-ID mapping file is generated by the swarm description parser and is subsequently used by the message parsers (for both status messages and verbose messages). The mapping file simply maps a client session ID to a log file or a compressed set of log files; a sample file is stored in the repository. The message parser doesn't need to know client session information; it just uses the mapping file and populates entries in the \*\_messages tables.
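The three responsibilities of the swarm description parser can be sketched as follows; the table, section and key names are illustrative, not the framework's actual ones:

```python
import configparser
import sqlite3

# Minimal schema subset so the sketch is self-contained; the real schema is
# the one shown in Figure 5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE swarms (id INTEGER PRIMARY KEY AUTOINCREMENT, torrent_file TEXT);
CREATE TABLE client_sessions (id INTEGER PRIMARY KEY AUTOINCREMENT,
    swarm_id INTEGER, ip TEXT, port INTEGER, client_type TEXT,
    log_archive TEXT);
""")

SWARM_DESCRIPTION = """
[swarm]
torrent_file = experiment.torrent

[peer-1]
ip = 10.0.0.1
port = 6881
client = Tribler
log_archive = peer-1-logs.tar.gz

[peer-2]
ip = 10.0.0.2
port = 6881
client = libtorrent-rasterbar
log_archive = peer-2-logs.tar.gz
"""

def parse_swarm_description(ini_text, conn):
    """Insert the swarm and its client sessions, returning the
    client-session-ID -> log-archive mapping (the logfile-ID mapping)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    swarm_id = conn.execute(
        "INSERT INTO swarms (torrent_file) VALUES (?)",
        (cfg.get("swarm", "torrent_file"),)).lastrowid
    mapping = {}
    for section in cfg.sections():
        if section == "swarm":
            continue
        session_id = conn.execute(
            "INSERT INTO client_sessions (swarm_id, ip, port, client_type, "
            "log_archive) VALUES (?, ?, ?, ?, ?)",
            (swarm_id, cfg.get(section, "ip"), cfg.getint(section, "port"),
             cfg.get(section, "client"), cfg.get(section, "log_archive"))
        ).lastrowid  # ID generated by the auto-increment column
        mapping[session_id] = cfg.get(section, "log_archive")
    return mapping

logfile_id_mapping = parse_swarm_description(SWARM_DESCRIPTION, conn)
```

The returned dictionary is exactly the logfile-ID mapping: it can be written out as an INI file for the message parsers, which then never need to touch the swarm description itself.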

The message parser uses the logfile-ID mapping file and the log file (or compressed set of log files) to populate the \*\_messages tables in the database (status\_messages, peer\_status\_messages, verbose\_messages).
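A minimal message-parser sketch, assuming an invented status-line format — real client log formats differ per client and each would need its own parser:

```python
import re
import sqlite3

# Illustrative status-line format; actual log formats vary per client.
STATUS_RE = re.compile(
    r"(?P<ts>\d+) dl: (?P<dl>[\d.]+) KB/s ul: (?P<ul>[\d.]+) KB/s "
    r"peers: (?P<conn>\d+)")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status_messages (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "client_session_id INTEGER, timestamp INTEGER, download_speed REAL, "
    "upload_speed REAL, num_connections INTEGER)")

def parse_status_log(session_id, lines, conn):
    """Populate status_messages from one client session's log lines."""
    for line in lines:
        match = STATUS_RE.match(line)
        if match is None:
            continue  # not a status line; other parsers may handle it
        conn.execute(
            "INSERT INTO status_messages (client_session_id, timestamp, "
            "download_speed, upload_speed, num_connections) "
            "VALUES (?, ?, ?, ?, ?)",
            (session_id, int(match.group("ts")), float(match.group("dl")),
             float(match.group("ul")), int(match.group("conn"))))

# The logfile-ID mapping tells the parser which session each log belongs to.
logfile_id_mapping = {1: ["1318425600 dl: 120.5 KB/s ul: 30.2 KB/s peers: 12",
                          "some unrelated log line"]}
for session_id, log_lines in logfile_id_mapping.items():
    parse_status_log(session_id, log_lines, conn)
```

Note that the parser receives only session IDs and log lines; all other client metadata stays in client\_sessions, populated earlier by the swarm description parser.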

The workflow of the entire process is highlighted in Figure 6.

There is a separation between the *experimenter* – the one running trials and collecting information and the *parser* – the one interpreting the information.


#### Fig. 6. Workflow of Log Parsing Considering ID Mapping

Trials are run and the experimenter provides a log file or set of log files or archive of log files (the data) and a swarm description file (INI format) consisting of characteristics of clients in the swarm, the file used and the swarm itself (the metadata).

The swarm description file is used to provide an intermediary logfile-ID mapping file, as described above. This file may be provided as a file system entry (typically INI), as in-memory information, or it may augment the existing swarm description file (only the client session ID needs to be added).

The logfile-ID mapping, the swarm description file and the log file(s) are then used by the message parser and the description parser to provide actual BitTorrent parameters to be stored in the database. The parsers would instantiate a specific storage class as required by the users and store the information there.

#### **5. Conclusion**

In order to provide thorough analysis of Peer-to-Peer protocols and applications, realistic trials and careful measurements have been shown to be required. Clients and applications provide the necessary parameters (such as download speed, upload speed, number of connections, protocol message types) that give an insight into the inner workings of clients and swarms.

Protocol analysis, centered around the BitTorrent protocol, relies on collecting protocol parameters such as download speed, upload speed, number of connections, number of messages of a certain type, timestamps, remote peer speed, client types and remote peer IDs. We consider two kinds of messages, dubbed *status messages* and *verbose messages*, that may be extracted from clients and parsed, resulting in the required parameters.

Various approaches to collecting messages are presented, with differences in the method intrusiveness and quantity and quality of data: certain methods may require important updates to existing clients and, as such, access to the source code, while others may only need access to information provided as log files.

Collection, parsing, storage and analysis of logging information is the primary approach employed for protocol parameter measurements. A processing framework has been designed, implemented and deployed to collect and process status and verbose messages, employing multiple parsers and multiple storage solutions. Two types of processing may be used: post-processing, which works on a previously collected archive of logging information, and real-time processing, in which data is monitored as it is parsed.

Protocol parameters are presented to the user through the use of a rendering engine that provides graphical representation of parameter evolution (such as the evolution of download speed or upload speed). The rendering engine makes use of the database results from the processing framework and provides a user friendly interface to the experimenter.

#### **6. References**

Bardac, M., Milescu, G. & Deaconescu, R. (2009). Monitoring a BitTorrent Tracker for Peer-to-Peer System Analysis, *Intelligent Distributed Computing* pp. 203–208. URL: *http://www.springerlink.com/index/r528521241850jnl.pdf*

Das, S. & Kangasharju, J. (2006). Evaluation of Network Impact of Content Distribution Mechanisms, *Proceedings of the 1st International Conference on Scalable Information Systems* pp. 35–es.

Deaconescu, R., Milescu, G., Aurelian, B., Rughinis, R. & Tapus, N. (2009). A Virtualized Infrastructure for Automated BitTorrent Performance Testing and Evaluation, *International Journal on Advances in Systems and Measurements* 2(2&3): 236–247.

Iosup, A., Garbacki, P., Pouwelse, J. & Epema, D. (2006). Correlating Topology and Path Characteristics of Overlay Networks and the Internet, *Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid*, CCGRID '06, IEEE Computer Society, Washington, DC, USA, pp. 10–. URL: *http://portal.acm.org/citation.cfm?id=1134822.1134925*

Locher, T., Moor, P., Schmid, S. & Wattenhofer, R. (2006). Free Riding in BitTorrent is Cheap, *Fifth Workshop on Hot Topics in Networks (HotNets-V)*. URL: *http://www.sigcomm.org/HotNets-V/program.html*

Naicken, S., Livingston, B., Basu, A., Rodhetbhai, S., Wakeman, I. & Chalmers, D. (2007). The state of peer-to-peer simulators and simulations, *SIGCOMM Comput. Commun. Rev.* 37(2): 95–98.

Pouwelse, J. A., Garbacki, P., Epema, D. H. J. & Sips, H. J. (2005). The Bittorrent P2P File-Sharing System: Measurements And Analysis, *4th International Workshop on Peer-to-Peer Systems (IPTPS)*. URL: *http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.3191*

Pouwelse, J. A., Garbacki, P., Wang, J., Bakker, A., Yang, J., Iosup, A., Epema, D. H. J., Reinders, M., van Steen, M. R. & Sips, H. J. (2008). TRIBLER: A Social-based Peer-to-Peer System: Research Articles, *Concurr. Comput. : Pract. Exper.* 20: 127–138. URL: *http://portal.acm.org/citation.cfm?id=1331115.1331119*

Pouwelse, J., Garbacki, P., Epema, D. & Sips, H. (2004). A Measurement Study of the BitTorrent Peer-to-Peer File-Sharing System. URL: *http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.3.4761*

