Instructions and Duties of the Offline Shifter (Oct 2004)
=====================================================================

MONITORING OF DATAREC
=====================

REMINDER: $DR/ = /kloe/soft/off/datarec/

1) Monitoring
-------------

Datarec is monitored every two hours by the script
$DR/kumac/stat_update.csh, which crontab starts as a batch process.
It processes the runs (usually several at a time) for which at least
6 files have been reconstructed in the last 3 hours, and it runs two
programs:

 1) datarec_sum.csh, which retrieves counter information (for example,
    the number of klcrash) from the datarec log files and stores it in
    the file "runstat" in the area /runcond/datarec/newstat.
    For each run, 7 lines are appended to the "runstat" file.

 2) histos_run.kumac, which retrieves kinematical information on the
    run (for example, sqrt(s)) from the histograms produced by datarec
    and stores it in the file "histogram_history" in the same area,
    /runcond/datarec/newstat.
    For each run, 20 lines are appended to the "histogram_history" file.

The files README.runstat and README.history list the variables stored
in runstat and histogram_history, respectively.

The same area (/runcond/datarec/newstat) also contains the following
files:

 1) stat_update.log: the log file of stat_update.csh. Usually several
    runs are processed in one pass, for example:

      Running datarec_summary for run 32500
      Running datarec_summary for run 32501
      Running datarec_summary for run 32502
      Running datarec_summary for run 32503
      Running histos for run 32500
      Running histos for run 32501
      Running histos for run 32502
      Running histos for run 32503

    The "Running datarec_summary ..." lines are written by
    datarec_sum.csh, and the "Running histos ..." lines by
    histos_run.kumac.

 2) stat_update.lock: a temporary file created by stat_update.csh and
    removed when it finishes. While it exists, any subsequent run of
    stat_update.csh does not perform any operation.
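The run selection and the per-run bookkeeping described in this section can be sketched in Python as follows. This is a minimal illustration, not the actual logic of stat_update.csh: the function names runs_to_process and expected_growth are hypothetical, and the input is assumed to be a list of (run number, reconstruction time) pairs, since this document does not say how stat_update.csh obtains that information.

```python
from datetime import datetime, timedelta

# Values taken from the description of stat_update.csh above; the
# function names below are hypothetical and for illustration only.
MIN_FILES = 6                   # at least 6 reconstructed files...
WINDOW = timedelta(hours=3)     # ...in the last 3 hours
RUNSTAT_LINES_PER_RUN = 7       # lines appended to runstat per run
HISTORY_LINES_PER_RUN = 20      # lines appended to histogram_history per run


def runs_to_process(files, now):
    """Return the sorted run numbers that have at least MIN_FILES files
    reconstructed within WINDOW before `now`.

    `files` is assumed to be an iterable of (run_number, reco_time)
    pairs; how stat_update.csh really obtains them is not shown here.
    """
    counts = {}
    for run, reco_time in files:
        if now - WINDOW <= reco_time <= now:
            counts[run] = counts.get(run, 0) + 1
    return sorted(run for run, n in counts.items() if n >= MIN_FILES)


def expected_growth(n_runs):
    """Lines one pass should append to (runstat, histogram_history)."""
    return n_runs * RUNSTAT_LINES_PER_RUN, n_runs * HISTORY_LINES_PER_RUN


if __name__ == "__main__":
    now = datetime(2004, 10, 25, 6, 0)
    files = [(32500, now - timedelta(minutes=10))] * 6
    print(runs_to_process(files, now))   # [32500]
    print(expected_growth(4))            # (28, 80)
```

For example, a pass that processes the four runs shown in the log excerpt should make runstat grow by 28 lines and histogram_history by 80, which is a quick consistency check between the two files.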
DUTIES OF THE OFFLINE SHIFTER
-----------------------------

Log on to fibm (fibm35, fibm12 or fibm11) and go to the directory:

  /runcond/datarec/newstat

Possible indication of problems:

The lock file stat_update.lock is present with a date (and time) well
before the current time. Since stat_update.csh should not take more
than 1 hour to complete, there is usually a problem if the lock file is
more than 1 hour old. Example:

     0 -rw-r--r--   1 kloerec  kloe         0 Oct 25 05:54 stat_update.lock
  2000 -rw-r--r--   1 kloerec  kloe   2046332 Oct 25 06:16 runstat
   208 -rw-r--r--   1 kloerec  kloe    211929 Oct 25 06:16 rejected
  7936 -rw-r--r--   1 kloerec  kloe   8098086 Oct 25 06:36 histogram_history
  1500 -rw-r--r--   1 kloerec  kloe   1532046 Oct 25 13:54 stat_update.log

This must not happen during normal operation; it indicates that the
monitoring is not working properly.

Actions to take:
----------------

1) Find the last fully reconstructed run that has been processed by
   the monitoring program (for example, with "tail -60 runstat"), and
   check that the same run is also present in histogram_history.
   We will call this run run_first
                         =========

   Find the last run that has been reconstructed by datarec: for
   example, run "list_runs" and look for the last run with the number
   XX (e.g. 20) in the "ana" column.
   We will call this run run_last
                         ========

   Verify that run_last > run_first, i.e. that some reconstructed runs
   have not yet been processed by the monitoring.
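The stale-lock check and the run comparison above can be sketched in Python, assuming the lock age is taken from the file's modification time (the time shown by ls -l). The helper names lock_is_stale and needs_recovery are hypothetical; run_first and run_last still have to be found by hand as described above, because the formats of runstat and of the list_runs output are not given here.

```python
import os
import time

LOCK_NAME = "stat_update.lock"
MAX_AGE_S = 3600  # stat_update.csh should finish within 1 hour


def lock_is_stale(directory, now=None, max_age_s=MAX_AGE_S):
    """True if stat_update.lock exists in `directory` and its modification
    time (the time shown by ls -l) is more than max_age_s seconds ago."""
    path = os.path.join(directory, LOCK_NAME)
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return False  # no lock: no pass is running (or it ended normally)
    if now is None:
        now = time.time()
    return now - mtime > max_age_s


def needs_recovery(run_first, run_last):
    """True if datarec has reconstructed runs beyond the last one
    found in both runstat and histogram_history."""
    return run_last > run_first


if __name__ == "__main__":
    # Meant to be run from /runcond/datarec/newstat.
    print("stale lock:", lock_is_stale("."))
```

In the example listing, the lock (05:54) is eight hours older than the last write to stat_update.log (13:54), so this check would report a stale lock at that time.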
2) Log on to fibm35 as kloerec (e.g. with telnet) and:

   2.1) remove the lock file (stat_update.lock);

   2.2) kill the monitoring processes. To find them:

          ps -fu kloerec | grep stat
          ps -fu kloerec | grep cern

        Example:

          kloerec 21732 10320   0 06:01:00   -   0:00 /bin/tcsh -f /kloe/soft/off/datarec/kumac/stat_update.csh
          kloerec 34854 21732  74 06:42:46   - 564:44 /cern/pro/bin/pawX11 -n -b /kloe/soft/off/datarec/kumac/histos_run.kumac

        Then kill them with "kill <PID>" (here, PIDs 21732 and 34854).

   Update the runstat file:

     /kloe/soft/off/datarec/kumac/make_datarec_sum.csh run_first run_last dbv

   Update the histogram_history file:

     /kloe/soft/off/datarec/kumac/make_histos.csh run_first run_last dbv

   where dbv is the database version (currently 20).

NOTE: while the update of the runstat file should take only a few
minutes, the update of histogram_history can take more than one hour.
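The recovery steps can be sketched in Python as two helpers: one that picks the monitoring PIDs out of "ps -fu kloerec" output (assuming the standard ps -f columns UID PID PPID C STIME TTY TIME CMD), and one that builds the two update command lines from the paths given above. Both helper names are hypothetical, and the sketch only prints the commands instead of killing or running anything.

```python
KUMAC = "/kloe/soft/off/datarec/kumac"
# Command-line fragments identifying the monitoring processes.
PATTERNS = ("stat_update.csh", "histos_run.kumac")


def monitoring_pids(ps_output):
    """Return PIDs of monitoring processes in `ps -fu kloerec` output.

    Assumes the usual ps -f layout, where the PID is the second
    whitespace-separated field and the command starts at the eighth.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line, blank line or unexpected format
        cmd = " ".join(fields[7:])
        if any(p in cmd for p in PATTERNS):
            pids.append(int(fields[1]))
    return pids


def recovery_commands(run_first, run_last, dbv=20):
    """Build the runstat and histogram_history update commands."""
    if run_last <= run_first:
        raise ValueError("run_last must be greater than run_first")
    args = [str(run_first), str(run_last), str(dbv)]
    return [
        [f"{KUMAC}/make_datarec_sum.csh", *args],  # runstat: a few minutes
        [f"{KUMAC}/make_histos.csh", *args],       # histogram_history: can exceed 1 hour
    ]


if __name__ == "__main__":
    for cmd in recovery_commands(32500, 32503):
        print(" ".join(cmd))
```

Matching on the script names rather than on "stat" or "cern" alone keeps the grep command itself out of the PID list.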