KamLAND Data Processing

Data Collection

The signals from the photomultiplier tubes (PMTs) in the KamLAND detector are read out by the front-end electronics, which process and format the data and are in turn read out by a data-acquisition (DAQ) system. The KamLAND electronics store full waveform data from the PMTs, similar to what one sees on an oscilloscope. The DAQ system performs basic data validation and writes the data to disk in a format called SF-files. The data files are later copied onto tape and shipped to the KamLAND data processing facilities in Japan and in the US. We write approximately 120 GB per day, 365 days a year. Because the amount of data is enormous and we are looking for a few rare events, considerable resources are needed to mine the data and find the interesting events.
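To put the quoted data rate in perspective, a quick back-of-the-envelope calculation (using only the ~120 GB/day figure above) gives the yearly volume:

```python
# Rough annual data volume from the ~120 GB/day rate quoted above.
GB_PER_DAY = 120
DAYS_PER_YEAR = 365

annual_gb = GB_PER_DAY * DAYS_PER_YEAR
annual_tib = annual_gb / 1024  # binary terabytes

print(f"~{annual_gb} GB per year (~{annual_tib:.0f} TiB)")
# → ~43800 GB per year (~43 TiB)
```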

The US data processing facility is located at NERSC in Oakland, CA, a large scientific computing facility funded by the Department of Energy. It houses a large Linux computing cluster (called PDSF) and a High Performance Storage System (HPSS). The data is first copied from the tapes shipped from Japan into the HPSS system; copying the data directly from the experimental site in Japan to HPSS is infeasible, since the network bandwidth is simply too small for the large data volume.

Data Processing

Once the data is in HPSS, we can start processing it on the PDSF cluster. We use software written in C++, based on the ROOT framework, to do the processing (the software is called AKat). First, the raw signals coming from the electronics (which look like oscilloscope traces) are converted into more useful quantities, time (T) and charge (Q), which tell us the timing and size of the pulses detected by the PMTs. This data is stored in an intermediate analysis file that we call a TQ-file. A second pass goes over the TQ-files and does the actual reconstruction of the physical events; this data is stored in so-called RECON files. The main reason for breaking the analysis into a two-step process is to separate the time-consuming generation of the TQ information from the reconstruction, which often changes rapidly during the analysis. Reconstructing events from TQ-files is a fairly quick process.
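The idea of the first pass can be sketched as follows. This is an illustrative toy, not AKat's actual algorithm: it takes a digitized waveform (a list of ADC samples), estimates the baseline, and reports the peak time T and the baseline-subtracted integrated charge Q.

```python
# Toy sketch of the waveform -> (T, Q) step (not the actual AKat code).
# T is the sample index of the pulse peak; Q is the summed charge above
# the estimated baseline.

def extract_tq(samples, baseline=None):
    """Return (T, Q) for one waveform: peak time and integrated charge."""
    if baseline is None:
        # Estimate the baseline from the first few (pre-pulse) samples.
        baseline = sum(samples[:4]) / 4.0
    pulse = [s - baseline for s in samples]
    t = max(range(len(pulse)), key=lambda i: pulse[i])  # peak sample index
    q = sum(p for p in pulse if p > 0)                  # charge above baseline
    return t, q

# A toy waveform: flat baseline of 10 with a pulse around sample 5.
waveform = [10, 10, 10, 10, 12, 30, 22, 14, 10, 10]
print(extract_tq(waveform))
# → (5, 38.0)
```

The real conversion works on calibrated PMT waveforms, but the output per channel is the same kind of (T, Q) pair that gets written to the TQ-files.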

There are essentially two types of information stored in the RECON files: vertex objects contain the position of an event in the balloon volume and the energy seen by the detector, while track objects contain information about muons passing through the detector (about 1 in 100 events in KamLAND is a muon; the muons come from cosmic rays penetrating the mountain). The muons must be stored as additional information because during their passage through the detector they can produce particles (spallation products) that mimic the signal KamLAND is looking for. Events in the detector are therefore vetoed for a certain time after a muon passes through.
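The veto logic described above can be sketched in a few lines. This is a hypothetical illustration: the 2-second window and the event/muon lists below are made-up values, not KamLAND's actual cuts.

```python
# Illustrative sketch of a muon veto: drop any event that falls within a
# fixed time window after a muon.  The 2 s window is a made-up value for
# illustration, not the analysis's actual cut.

VETO_WINDOW = 2.0  # seconds (hypothetical)

def apply_muon_veto(event_times, muon_times, window=VETO_WINDOW):
    """Keep only events that are not within `window` after any muon."""
    kept = []
    for t in event_times:
        if not any(0 <= t - tm < window for tm in muon_times):
            kept.append(t)
    return kept

events = [0.5, 1.0, 3.5, 10.0]  # event times in seconds (toy data)
muons = [0.9, 9.5]              # muon times in seconds (toy data)
print(apply_muon_veto(events, muons))
# → [0.5, 3.5]
```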

The anti-neutrino signal in KamLAND is a delayed coincidence (see Physics Impact) between two different detector events: first, a photon from the positron in the reaction is detected, and about 200µs later a photon from the capture of the neutron is registered. This means that one more step has to occur in the analysis: the prompt and delayed events have to be correlated.

The correlation information is extracted from the RECON files and stored as event 'multiplets' in so-called Coincidence files. These are the files from which we extract the anti-neutrino candidate list by performing appropriate cuts (see Data Analysis).
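A minimal sketch of the coincidence search might look like the following. The 500 µs window is a hypothetical cut chosen to comfortably contain the ~200 µs neutron-capture delay; the actual analysis window and event format are assumptions here.

```python
# Illustrative sketch of the prompt/delayed pairing step.  Each event is
# (time_in_seconds, energy); a delayed event within the coincidence
# window after a prompt event forms a candidate pair.  The 500 us window
# is a hypothetical cut, not the analysis's actual value.

WINDOW = 500e-6  # seconds (hypothetical)

def find_coincidences(events, window=WINDOW):
    """Return (prompt, delayed) event pairs with 0 < dt < window."""
    events = sorted(events)                 # sort by time
    pairs = []
    for i, prompt in enumerate(events):
        for delayed in events[i + 1:]:
            dt = delayed[0] - prompt[0]
            if dt >= window:
                break                       # later events are even further away
            pairs.append((prompt, delayed))
    return pairs

# Toy data: two events 210 us apart (a candidate pair) plus an isolated one.
events = [(0.000000, 3.1), (0.000210, 2.2), (1.500000, 4.0)]
print(find_coincidences(events))
```

Each surviving pair would become one 'multiplet' in the Coincidence files, on which the energy and position cuts are then applied.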

File Sizes

The following table lists the reduction in file size as the information is further extracted. As an example, the total file sizes for run1449 (a 24-hour run on October 2, 2002) are listed:

File Type | File Size

So there is a reduction by a factor of 3000 in file size when going from the data collected by the detector to the coincidence format!
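Combining this factor with the data rate quoted earlier gives a feel for the final data volume:

```python
# Rough arithmetic for the factor-of-3000 reduction: ~120 GB of raw data
# per day shrinks to roughly 40 MB of coincidence-format data per day.
raw_gb_per_day = 120
reduction_factor = 3000

coincidence_mb_per_day = raw_gb_per_day * 1000 / reduction_factor
print(f"~{coincidence_mb_per_day:.0f} MB of coincidence data per day")
# → ~40 MB of coincidence data per day
```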