**2.1.1.2 Data processing**

Data processing is accomplished using a High-Energy Physics (HEP) data grid. The objective of the HEP data grid is to construct a system that manages and processes high-energy physics data and supports the high-energy physics community (Cho, 2007).

For data processing, Taiwan has hosted the only WLCG Tier-1 center and Regional Operation Center in Asia since 2005. ASGC has also been serving as the Asia Pacific Regional Operation Center, maximizing grid service availability and facilitating the extension of e-Science (Lin & Yen, 2009). In Japan, a Tier-2 computing center supporting the A Toroidal LHC Apparatus (ATLAS) experiment has been running at the University of Tokyo, and another Tier-2 center at Hiroshima University supports A Large Ion Collider Experiment (ALICE) (Matsunaga, 2009). At KEK, collaborating institutes operate a grid site as members of the WLCG and aim to use their grid resources for the Belle and Belle II experiments. The Belle II experiment, which will start in 2015, will use distributed computing resources.

We now describe the history of data processing for the CDF experiment. CDF is an experiment at the Tevatron collider at Fermilab, and its Run II phase ran from 2001 to 2011. CDF computing needs include raw data reconstruction, data reduction, event simulation, and user analysis; although these activities differ greatly in the amount of resources they need, they are all naturally parallel. The CDF computing model is based on the concept of a Central Analysis Farm (CAF). The increasing luminosity of the Tevatron caused the computing requirements for data analysis and Monte Carlo production to grow beyond the available dedicated CPU resources, so CDF examined the possibility of using shared computing resources. CDF uses several computing systems, such as the CAF, the Decentralized CDF Analysis Farm (DCAF), and grid systems. The Korea group built a DCAF for the first time, and finally we constructed a CDF grid farm at KISTI using an LCG farm.


In 2001, we built the CAF, a cluster farm inside Fermilab in the United States. The CAF was developed as a portal: a set of daemons accepts requests from users via kerberized socket connections and a legacy protocol, and those requests are then converted into commands for the underlying batch system that does the real work. The CAF is a large farm of computers running Linux, with access to the CDF data handling system and databases, which allows CDF collaborators to run batch analysis jobs. The CAF portal has two special features for job submission: first, jobs can be submitted from anywhere; second, job output can be sent directly to a desktop or stored on a CAF File Transfer Protocol (FTP) server for later retrieval (Jeung et al., 2009).
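To make the portal idea concrete, the sketch below shows how a client might package a job request and send it to a portal daemon, which then translates the request into a batch-system command. This is a minimal illustrative Python sketch only; the message fields, host name, and port are hypothetical and do not reproduce the actual CAF protocol or its Kerberos authentication.

```python
import json
import socket
import subprocess

PORTAL_HOST, PORTAL_PORT = "caf-portal.example.org", 9618  # hypothetical endpoint

def submit_job(executable, output_url, n_sections):
    """Client side: send a job description to the portal daemon."""
    request = {
        "executable": executable,   # user tarball or script to run
        "output_url": output_url,   # desktop or FTP location for the results
        "sections": n_sections,     # number of parallel job sections
    }
    with socket.create_connection((PORTAL_HOST, PORTAL_PORT)) as conn:
        conn.sendall(json.dumps(request).encode() + b"\n")
        return conn.recv(4096).decode()  # e.g. an assigned job identifier

def handle_request(raw_message):
    """Portal side: translate the request into a batch-system submission."""
    job = json.loads(raw_message)
    for section in range(job["sections"]):
        # A real portal would build an FBSNG or Condor submission here.
        subprocess.run(["echo", "submit", job["executable"], str(section)], check=True)
```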

In 2003, we built the DCAF, a cluster farm outside Fermilab, which enabled CDF users around the world to use it in the same way as the CAF at Fermilab. A user could submit a job either to the CAF or to a DCAF. To run on the data stored remotely at Fermilab in the USA, we used the SAM data handling system, and we used the same GUI as the CAF (Jeung et al., 2009).

In 2006, we built CDF grid farms in North America, Europe, and the Asia-Pacific region. The activity patterns of HEP required a change in the HEP computing model from clusters to a grid in order to meet the required hardware resources. Dedicated Linux clusters running the Farm Batch System Next Generation (FBSNG) batch system were used when the CAF was launched in 2002. The CAF portal has since moved from interfacing with an FBSNG-managed pool to a Condor-based, grid-capable implementation, so that users did not need to learn new interfaces (Jeung et al., 2009).

We have now adapted and converted our workflow to the grid. This move to the grid for the CDF experiment reflects a worldwide trend among HEP experiments: since CDF has a large amount of data to analyze, we must take advantage of global innovations and resources. The CAF portal can change the underlying batch system without changing the user interface, and CDF has used several batch systems. The North America CDF Analysis Farm and the Pacific CDF Analysis Farm are based on a Condor-over-Globus model, whereas the European CDF Analysis Farm is based on the LCG (Large Hadron Collider Computing Grid) Workload Management System (WMS) model. Table 1 compares the grid farms used for CDF, and Fig. 4 shows the CDF grid farm scheme (Jeung et al., 2009). Users submit a job after entering the required information about the job into a kerberized client interface. The Condor-over-Globus model builds a virtual private Condor pool out of grid resources; a job containing Condor daemons is also known as a glide-in job, and the advantage of this approach is that all grid infrastructure is hidden behind the glide-ins. The LCG WMS model talks directly to the LCG WMS, also known as the Resource Broker. This model allows us to use grid sites where the Condor-over-Globus model would not work at all, and it is adequate for grid job needs. Since the Condor-based grid farm is more flexible, we applied this method to the Pacific CDF Analysis Farm (Jeung et al., 2009).
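The sketch below illustrates the glide-in (pilot) idea in schematic form: a small pilot job is submitted through the grid interface, and once it starts on a worker node it pulls user jobs from a central queue, so the grid layer stays invisible to the analysis user. This is a simplified Python illustration under assumed names, not the actual Condor glide-in implementation; the queue contents are hypothetical.

```python
import queue
import subprocess

# Head-node side: user jobs wait in a central queue (hypothetical examples).
user_jobs = queue.Queue()
for section in range(3):
    user_jobs.put(["echo", f"running analysis section {section}"])

def pilot(job_queue):
    """Pilot/glide-in side: runs on a grid worker node.

    In the real system this would start Condor daemons that join a
    virtual private pool at the head node; here we simply pull jobs
    from the queue until no work remains.
    """
    while True:
        try:
            command = job_queue.get_nowait()
        except queue.Empty:
            return  # no more work: the pilot exits and frees the slot
        subprocess.run(command, check=True)

# A grid site would launch many pilots; here we run a single one locally.
pilot(user_jobs)
```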

The regional CDF collaboration of the Taiwanese, Korean, and Japanese groups has built a CDF Analysis Farm based on grid farms. We call this federation of grid farms the Pacific CDF Analysis Farm.


Fig. 4. The scheme of the Pacific CDF analysis farm.


| Analysis Farm | Head node | Work node | Grid | Grid middleware (method) | VO (Virtual Organization) |
|---|---|---|---|---|---|
| North America CDF Analysis Farm | Fermilab (USA) | Fermilab (USA), etc. | OSG | Condor over Globus | CDF VO |
| European CDF Analysis Farm | CNAF (Italy) | IN2P3 (France), etc. | LCG | LCG WMS (Workload Management System) | CDF VO |
| Pacific CDF Analysis Farm | AS (Taiwan) | KISTI (Korea), etc. | LCG, OSG | Condor over Globus | CDF VO |

Table 1. Comparison of grid farms for CDF.

The Pacific CDF Analysis Farm is a distributed computing model on the grid. It is based on the Condor glide-in concept, in which Condor daemons are submitted to the grid, effectively creating a virtual private batch pool; submitted jobs and their results are thus integrated and shared across grid sites. For worker nodes, we use both LCG and Open Science Grid (OSG) farms. The head node of the Pacific CDF Analysis Farm is located at Academia Sinica in Taiwan. The farm has become a federation of one LCG farm at KISTI in Korea, one LCG farm at the University of Tsukuba in Japan, and one OSG and two LCG farms in Taiwan.
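As a rough illustration of how such a federation might be described in a site configuration, the snippet below lists the farms named above together with their grid middleware and role. The structure and field names are hypothetical; this is not the actual Pacific CDF Analysis Farm configuration.

```python
# Hypothetical description of the Pacific CDF Analysis Farm federation.
PACIFIC_CAF_SITES = [
    {"site": "Academia Sinica (Taiwan)",      "grid": "LCG/OSG", "role": "head node + work nodes"},
    {"site": "KISTI (Korea)",                 "grid": "LCG",     "role": "work nodes"},
    {"site": "University of Tsukuba (Japan)", "grid": "LCG",     "role": "work nodes"},
]

def sites_by_middleware(middleware):
    """Return the federated sites that expose the given grid middleware."""
    return [s["site"] for s in PACIFIC_CAF_SITES if middleware in s["grid"]]

print(sites_by_middleware("LCG"))
```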

**2.1.1.3 Data analysis using collaborative tools**

Data analysis using collaborative tools allows collaborations around the world to analyze data and publish results in a collaborative environment. We installed and operate an EVO server at KISTI, and we use this environment to study high-energy physics for the CDF and Belle experiments. EVO is the successor of the Virtual Room Videoconferencing System (VRVS); its first release was announced in 2007. The EVO system is written in the Java programming language and provides a client application named "Koala." Koala plays two client roles in order to communicate with two types of servers. The first type is a central server, located at Caltech, that handles videoconferencing sessions: participants can use Koala to enter a session that another participant created or to book a new session. Once a participant is in a session, Koala plays the role of a second type of client that communicates with one of the networked servers handling the flow of media streams. This second type of server, which together form a network, is called "Panda." When Koala connects to a specific Panda, it launches a video tool called "vievo" and an audio tool called "rat," both of which have their origins in the "MBone" project. EVO improves upon VRVS with the following new features: support for the Session Initiation Protocol (SIP), ad-hoc or private meetings, encryption, private audio discussion inside a meeting, and a whiteboard. In 2007, we constructed the EVO system at KISTI, since the Korean HEP community is large enough to have its own EVO Panda servers; the configuration of two servers by the Caltech group enabled the first Korean Panda servers to run. Fig. 5 shows the communications between the KISTI Panda servers and other Panda servers in the EVO network. Since their introduction in 2007, the KISTI Panda servers have served many communities, such as the Korean Belle community and the Korean CDF community.
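The two-stage client behavior described above can be sketched schematically: the client first talks to a session (booking) server, which hands back the address of a media-relay server, and only then does the client open a second connection for the audio and video streams. The Python sketch below is purely illustrative; the class names, methods, and returned fields are hypothetical and do not reflect the actual EVO or Koala implementation.

```python
class SessionServer:
    """Stand-in for the central server that manages videoconference sessions."""
    def __init__(self):
        self.sessions = {}  # session name -> media relay assigned to it

    def book(self, name, relay_address):
        self.sessions[name] = relay_address

    def join(self, name):
        # Tell the client which media-relay ("Panda"-like) server to contact.
        return self.sessions[name]


class MediaRelay:
    """Stand-in for a media-relay server that forwards audio/video streams."""
    def __init__(self, address):
        self.address = address

    def forward(self, stream):
        return f"{self.address} relayed {stream}"


# Client side: first the session step, then the media step.
central = SessionServer()
central.book("belle-weekly-meeting", "relay-1.example.org")

relay_address = central.join("belle-weekly-meeting")    # step 1: session server
relay = MediaRelay(relay_address)
print(relay.forward("video+audio from participant A"))  # step 2: media relay
```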

Fig. 5. Communications between KISTI "Panda" servers and other "Panda" servers in the EVO network.

**2.1.2 New computing-experimental tools²**

² This section is based on the paper titled "The embedment of a metadata system at grid farms at the Belle II experiment" by S. Ahn et al., in the Journal of the Korean Physical Society, Vol. 59, No. 4, pp. 2695-2701 (2011).

For new computing-experimental tools, we have worked on a Belle II data handling system. The Belle II experiment will begin at KEK in 2015. Belle II computing needs include raw data reconstruction, data reduction, event simulation, and user analysis. The Belle II experiment will have a data sample about 50 times greater than that collected by the Belle experiment.

The collider will cause the computing requirements for data analysis and Monte Carlo production to grow beyond the available CPU resources. To meet these challenges, the Belle II experiment will use shared computing resources, as the Large Hadron Collider (LHC) experiments have done, and has adopted a distributed computing model with several computing systems such as grid farms (Kuhr, 2010).

In the Belle experiment (Abashian et al., 2002), we use a metadata scheme that employs a simple "index" file, a mechanism to locate events within a file based on predetermined analysis criteria. The index file simply records the locations of interesting events within a larger data file. All of these data files are stored on a large central server located at the KEK laboratory. For the Belle II experiment, however, this will not be sufficient, because the data will be distributed to grid sites located around the world. We therefore need a new metadata service in order to construct the Belle II data handling system (Kim et al., 2011; Ahn et al., 2010).

We thus have very large disk space requirements and potentially unworkably long analysis times. To meet both requirements, we proposed an event-level metadata system: with good information at the metadata level, we can reduce the CPU time required for analysis and save disk space.
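To illustrate the index-file idea in the simplest possible terms, the sketch below builds an event-level index for a data file and then reads back only the selected events. This is a schematic Python example under assumed file formats and field names; it is not the Belle software or the Belle II metadata service.

```python
import json

def build_index(data_path, index_path, selection):
    """Scan a data file once and record the offsets of events passing `selection`."""
    offsets = []
    with open(data_path, "rb") as data:
        while True:
            offset = data.tell()
            line = data.readline()          # one event per line in this toy format
            if not line:
                break
            event = json.loads(line)
            if selection(event):
                offsets.append(offset)      # the "index" is just event locations
    with open(index_path, "w") as index:
        json.dump(offsets, index)

def read_selected(data_path, index_path):
    """Analysis pass: jump straight to the interesting events via the index."""
    with open(index_path) as index:
        offsets = json.load(index)
    with open(data_path, "rb") as data:
        for offset in offsets:
            data.seek(offset)
            yield json.loads(data.readline())

# Example usage with a hypothetical selection criterion:
# build_index("run.json", "run.index", lambda e: e.get("n_tracks", 0) >= 4)
# for event in read_selected("run.json", "run.index"):
#     ...
```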

Fig. 6. Data handling scenario at the Belle II experiment.


