The KLOE Data Handling System

Introduction

The KLOE Data Handling system (DH) is responsible for moving KLOE events around. Most of the time this means locating a file registered in the KLOE relational database, moving it to a user-accessible disk area and then reading out the events, but the DH system is also capable of accessing any YBOS file or even a memory buffer, both locally and remotely.

The DH system is composed of the following pieces:

A very important role is also played by the KLOE relational database, especially the bookkeeping and system catalog sections.

Why a DH system?

To achieve its goals, KLOE is acquiring huge amounts of data; the estimated needs are around 200 billion (2×10¹¹) events, for a total storage size of about 1 PB. To manage all these data efficiently, a multi-layer scheme is used.

Most of the data in the KLOE environment are produced by the readout of the FEE after the trigger system has identified a possibly interesting event. These data are collected, formatted and written to a raw file on an online disk pool. Some events, like Bhabha events, are easy to identify once they have been formatted and are also very interesting for monitoring and calibration purposes; for these reasons they are duplicated during the data acquisition process and written to separate raw files, which constitute a second source of data. Whatever the source, once a raw file reaches a set maximum size, it is closed and marked read-only, ready-to-be-analyzed and ready-to-be-archived.
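The lifecycle just described can be summarized as a small state model. The following is a minimal Python sketch, not KLOE code: the class, the flag names and the size cap are invented for illustration.

```python
class RawFileSketch:
    """Toy model of a raw file's lifecycle on the online disk pool."""
    MAX_SIZE = 1_000_000_000  # hypothetical maximum size, in bytes

    def __init__(self, name):
        self.name = name
        self.size = 0
        self.read_only = False
        self.ready_to_be_analyzed = False
        self.ready_to_be_archived = False

    def append(self, nbytes):
        # Events can only be appended while the file is still open.
        assert not self.read_only, "file is closed"
        self.size += nbytes
        if self.size >= self.MAX_SIZE:
            self.close()

    def close(self):
        # Closing marks the file read-only and flags it for the
        # reconstruction and archiving processes.
        self.read_only = True
        self.ready_to_be_analyzed = True
        self.ready_to_be_archived = True
```

For mcraw files the model would differ slightly: they are flagged only ready-to-be-analyzed, since they are deleted after reconstruction rather than archived.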

Another source of data is simulated, or Monte Carlo (MC), events. They are produced by MC generators, which simulate the physics and create events containing the same information that would be read out by the FEE (plus some MC-specific data). These events are written to mcraw files on a temporary disk area. When the desired number of events has been created, the file is closed and marked read-only and ready-to-be-analyzed.

Once a raw file is declared ready-to-be-analyzed, it is processed by an offline reconstruction process. Each event contained in the input file is reconstructed and assigned to one or more classes, or streams. For each stream marked as to-be-saved, the event is written to a dedicated datarec file (one file per stream) on an offline disk pool. Once all the events from the input file have been processed, all the related datarec files are closed and marked read-only and ready-to-be-archived. A similar process also takes place when a mcraw file is declared ready-to-be-analyzed, but streaming is normally not performed; that is, all the events are written to a single mc file. Moreover, when the mc file is closed, the mcraw file is deleted from its disk area, since all its data are contained in the new mc file.
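The streaming step can be sketched as follows. This is a minimal illustration under stated assumptions: the classifier function and its signature are invented, and real streaming writes events to datarec files rather than collecting them in memory.

```python
def stream_events(events, classify, saved_streams):
    """Toy model of the streaming step.

    classify(event) -> set of stream codes the event belongs to
    (an event may be assigned to more than one stream).
    Returns {stream: [events]}, one bucket per saved stream,
    standing in for one datarec file per stream.
    """
    out = {s: [] for s in saved_streams}
    for ev in events:
        for stream in classify(ev):
            if stream in saved_streams:
                # Only streams marked as to-be-saved get a datarec file.
                out[stream].append(ev)
    return out
```

For mcraw input the same machinery degenerates to a single stream, i.e. all events end up in one mc file.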

When a file is declared ready-to-be-archived, it is processed by the archiver process that copies it to the tape library. Once archived, the file can be deleted from its disk pool to make space for new files.

When a raw, datarec or mc file is declared read-only, it can be read by an analysis process. However, the disk pool where the file was written may not be (and normally is not) visible to the target machine, and the file could already have been deleted from the disk pool, so the data handling system creates a copy of it on a recall disk pool. Once the copy is created (or an existing copy located), the requesting process can access it. The copy is guaranteed to stay on that disk pool as long as at least one process is accessing it, but may be deleted shortly afterward to make space for a copy of another file.
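The "guaranteed while accessed, evictable afterward" behavior is essentially reference counting. A minimal Python sketch (the class and method names are invented, not the KLOE implementation):

```python
class RecallAreaSketch:
    """Toy model of a recall disk area: a copy is pinned while at
    least one process holds it open, and becomes evictable once the
    last reader closes it."""

    def __init__(self):
        self.refcount = {}  # filename -> number of active readers

    def open(self, name):
        self.refcount[name] = self.refcount.get(name, 0) + 1

    def close(self, name):
        self.refcount[name] -= 1

    def evictable(self, name):
        # A copy with no active readers may be deleted to make space.
        return self.refcount.get(name, 0) == 0
```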

Clearly, all these activities cannot be carried out by hand by the end user; the DH system automates them.

See also the following figure:

KID

KID, or KLOE Integrated Dataflow, is the main interface to the data handling system. Its major advantage is its ease of use; it allows a user to access one or more sources of KLOE events using a simple URI, such as:

The library itself is very modular; the core services are just URI parsers, which redirect calls to plug-ins. The standard library comes complete with several plug-ins, the most used being:

but users can add their own by following a few simple rules.
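The parse-and-dispatch core can be illustrated with a short sketch. The registry, the handler signature and the function names below are assumptions for illustration; they are not the real KID plug-in interface (which is a C library).

```python
from urllib.parse import urlparse

# Toy model of KID's core service: parse the URI scheme and hand
# the call over to whichever plug-in registered for that scheme.
PLUGINS = {}

def register_plugin(scheme, handler):
    """A plug-in registers a handler for one URI scheme."""
    PLUGINS[scheme] = handler

def kid_open(uri):
    """The core does no I/O itself; it only dispatches on the scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in PLUGINS:
        raise ValueError(f"no plug-in registered for scheme '{scheme}'")
    return PLUGINS[scheme](uri)
```

A user-supplied plug-in is then just another entry in the registry, which is what makes the "users can add their own" rule cheap to support.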

For more details, see the dedicated page.

The recall system

Since copies of the KLOE files can be located in several places, not all of them accessible to the users, the recall system is normally contacted to obtain a location for the needed files (i.e. a recall disk area). If a file is already on a recall disk area, only the area index is returned; otherwise the recall system takes care of making a copy on the least-used recall area, using the fastest available source.

The recall daemon

The recall daemon is the heart of the data handling system. All requests regarding file movement, except archiving requests, are managed by this process.

A process needing some files sends the daemon the list of those files, together with a list of acceptable recall areas. If a file is already on one of these areas, the daemon simply notifies the client where to find it. Otherwise, it makes a copy of the file on one of the specified areas, using one of the file management daemons, and then notifies the client.
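The daemon's decision logic can be sketched as below. Everything here is an assumption for illustration: the data structures, the `copy` callback (standing in for a file management daemon) and the pick-the-first-candidate policy (the real daemon picks the least-used area, as described above).

```python
def handle_request(files, candidate_areas, locations, copy):
    """Toy model of one recall-daemon request.

    files:           filenames the client needs
    candidate_areas: recall areas acceptable to the client
    locations:       filename -> set of areas already holding a copy
    copy(f, area):   stand-in for a file management daemon
    Returns {filename: area} to send back to the client.
    """
    answer = {}
    for f in files:
        already_there = locations.get(f, set()) & set(candidate_areas)
        if already_there:
            # The file is on an acceptable area: just tell the client.
            answer[f] = sorted(already_there)[0]
        else:
            # Otherwise copy it to an acceptable area first.
            target = candidate_areas[0]
            copy(f, target)
            locations.setdefault(f, set()).add(target)
            answer[f] = target
    return answer
```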

The programming interfaces

The main API for the recall system is a C library; it implements the whole communication protocol with the recall daemon. On top of it, a Tcl command has also been implemented.

For more details on both the daemon and the APIs, see the dedicated page.

The command line tools

KID, and occasionally the APIs of the recall system, are very useful when writing applications that must access the KLOE data, but a command line interface is also needed, especially for non-programmers.

Currently three command line tools are available:

The syntax is similar for all of the above commands. The user is asked to select the file type (raw, datarec or mc) and to provide a WHERE clause for an SQL query on the logger.xxx_data table (raw_data, datarec_data or mc_data). Moreover, the search can be restricted to files that are already on a disk area.
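How such a command might assemble its query can be sketched as follows. The table layout is taken from the text; the SELECT list and the `on_disk` restriction are hypothetical, invented only to show the idea.

```python
def kls_query(file_type, where, on_disk_only=False):
    """Toy sketch of turning kls-style arguments into an SQL query
    on the logger.<type>_data bookkeeping table."""
    if file_type not in ("raw", "datarec", "mc"):
        raise ValueError(f"unknown file type '{file_type}'")
    sql = f"SELECT * FROM logger.{file_type}_data WHERE {where}"
    if on_disk_only:
        # Hypothetical flag restricting the search to files on disk,
        # mirroring the -d option of kls.
        sql += " AND on_disk = 'Y'"
    return sql
```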

The complete syntax is:

The output of kls is a set of lines, where each line contains a filename and a list of locations where the file resides. Files can be located:

An example of use:

kls datarec "stream_code='bgg' and version=16 and run_nr=22653"
Found 24 files:
 bgg022653N_ALL_f05_1_1_1_16.000    0.2Mb offline recalled: {f0a 14}
 bgg022653N_ALL_f05_1_1_1_16.001    0.2Mb offline recalled: {f0a 14}
 bgg022653N_ALL_f05_1_1_1_16.002    0.2Mb offline recalled: {f0a 14}, {f0b 14}
 bgg022653N_ALL_f05_1_1_1_16.003    0.1Mb on tape
 bgg022653N_ALL_f05_1_1_1_16.004    0.1Mb offline
 bgg022653N_ALL_f05_1_1_1_16.005    0.2Mb on tape
 bgg022653N_ALL_f05_1_1_1_16.006    0.2Mb on tape
 bgg022653N_ALL_f05_1_1_1_16.007    0.2Mb offline
 bgg022653N_ALL_f06_1_1_1_16.000    0.2Mb on tape
 bgg022653N_ALL_f06_1_1_1_16.001    0.2Mb offline
 bgg022653N_ALL_f06_1_1_1_16.002    0.2Mb offline recalled: {f0a 15}
 bgg022653N_ALL_f06_1_1_1_16.003    0.1Mb offline
 bgg022653N_ALL_f06_1_1_1_16.004    0.1Mb offline recalled: {f0a 14}, {f0a 15}
 bgg022653N_ALL_f06_1_1_1_16.005    0.1Mb offline
 bgg022653N_ALL_f06_1_1_1_16.006    0.2Mb on tape
 bgg022653N_ALL_f06_1_1_1_16.007    0.2Mb on tape
 bgg022653N_ALL_f07_1_1_1_16.000    0.2Mb offline
 bgg022653N_ALL_f07_1_1_1_16.001    0.2Mb offline
 bgg022653N_ALL_f07_1_1_1_16.002    0.2Mb on tape
 bgg022653N_ALL_f07_1_1_1_16.003    0.1Mb offline
 bgg022653N_ALL_f07_1_1_1_16.004    0.1Mb offline
 bgg022653N_ALL_f07_1_1_1_16.005    0.1Mb on tape
 bgg022653N_ALL_f07_1_1_1_16.006    0.2Mb on tape
 bgg022653N_ALL_f07_1_1_1_16.007    0.2Mb on tape

Limiting the output to files already on disk:

kls -d datarec "stream_code='bgg' and version=16 and run_nr=22653"
Found 14 files:
 bgg022653N_ALL_f05_1_1_1_16.000    0.2Mb offline recalled: {f0a 14}
 bgg022653N_ALL_f05_1_1_1_16.001    0.2Mb offline recalled: {f0a 14}
 bgg022653N_ALL_f05_1_1_1_16.002    0.2Mb offline recalled: {f0a 14}, {f0b 14}
 bgg022653N_ALL_f05_1_1_1_16.004    0.1Mb offline
 bgg022653N_ALL_f05_1_1_1_16.007    0.2Mb offline
 bgg022653N_ALL_f06_1_1_1_16.001    0.2Mb offline
 bgg022653N_ALL_f06_1_1_1_16.002    0.2Mb offline recalled: {f0a 15}
 bgg022653N_ALL_f06_1_1_1_16.003    0.1Mb offline
 bgg022653N_ALL_f06_1_1_1_16.004    0.1Mb offline recalled: {f0a 14}, {f0a 15}
 bgg022653N_ALL_f06_1_1_1_16.005    0.1Mb offline
 bgg022653N_ALL_f07_1_1_1_16.000    0.2Mb offline
 bgg022653N_ALL_f07_1_1_1_16.001    0.2Mb offline
 bgg022653N_ALL_f07_1_1_1_16.003    0.1Mb offline
 bgg022653N_ALL_f07_1_1_1_16.004    0.1Mb offline

See also the KID page for more examples, especially the dbraw, dbfastraw, dbdatarec, dbfastdatarec, dbmc and dbfastmc protocols, and the dedicated pages.

The archiving daemon

The archiving daemon is in charge of archiving the newly produced files, i.e. of copying them to the tape library. This process is asynchronous with respect to all the other processes; its task is to make sure all the files are archived, while keeping the number of tape mounts as low as possible and maintaining enough free space on the DAQ and offline disk pools at all times.
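One common way to keep tape mounts low is to batch pending files by destination tape, so each tape is mounted once per pass. The sketch below assumes this strategy for illustration; the real daemon's policy is not described here, and `assign_tape` is an invented stand-in for it.

```python
from collections import defaultdict

def plan_archiving(pending, assign_tape):
    """Toy sketch of mount minimization: group the pending files by
    the tape they should land on, so each tape in the plan needs to
    be mounted only once.

    assign_tape(filename) -> tape label (hypothetical policy hook)
    Returns {tape: [files]}.
    """
    batches = defaultdict(list)
    for f in pending:
        batches[assign_tape(f)].append(f)
    return dict(batches)
```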

For more details, see the dedicated page.

The file management daemons

Three types of file management processes are currently in place:

Each of them has its own tasks and does not interact directly with the others. See their dedicated pages for more details.


Send comment to Igor Sfiligoi.