INSTRUCTIONS TO RUN PHYSMON (OFFLINE) for 2004 data - June 2005
===============================================================

A) HOW TO SUBMIT PHYSMON JOBS:

1) telnet fibm11 as kloerec

2) fibm11:~> submit_physmon run1 run2

3) Check the log file submit_physmon_run1_run2_yyyy-mm-dd-xxxx in the
   home directory; it should contain the job submission list.
   Remember that the log files are in
   /analysis/physmon/2004/logs/physmon_XXX.log
   on the even (fibm12) and odd (fibm11) machines.

B) HOW TO CHECK THE JOB STATUS:

1) fibm11:~> llq -c krec | more

   This command lists the running and pending jobs; select the PID
   number of a running job to perform the next step.
   At present, 24 slots (fibm13 to fibm37) are allocated to run PHYSMON.
   The number of slots may change according to the load required to run
   the different off-line jobs.

2) fibm11:~> llq -l fibm11.xxxxxx.0 | more
   (fibm11.xxxxxx.0 is the PID selected at point 1)

   You get something like:

   =============== Job Step fibm11.lnf.infn.it.259655.0 ===============
   Job Step Id: fibm11.lnf.infn.it.259655.0
   Job Name: physmon_26037
   Step Name: 0
   Structure Version: 9
   Owner: kloerec
   Queue Date: Fri Apr 8 10:36:49 DFT 2005
   Status: Running
   Dispatch Time: Fri Apr 8 11:52:06 DFT 2005
   Completion Date:
   Completion Code:
   User Priority: 50
   user_sysprio: 0
   class_sysprio: 20
   group_sysprio: 0
   System Priority: -5660612
   q_sysprio: -5660612
   Notifications: Complete
   Virtual Image Size: 4 kilobytes
   Checkpoint:
   Restart: yes
   Hold Job Until:
   Cmd: /kloe/soft/off/datarec/script/physmon.pl
   Args: 26037
   Env:
   In: /dev/null
   Out: /analysis/physmon/logs/physmon_26037.log
   Err: /analysis/physmon/logs/physmon_26037.err
   =============== (remaining output omitted) ===============

   Retrieve the actual run number from the last lines above (the Args,
   Out and Err fields). At present, an nrun.log file and an nrun.err file
   are generated for each run. However, not every problem reported in the
   *.log file is also written to the corresponding *.err file; for this
   reason a script must be used to check the *.log files (see below).

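
As an illustration of point B.2, the run number and the log paths can be
read out of the llq -l text automatically. The sketch below is not part of
the PHYSMON tools: the function names parse_llq_step and run_number are
hypothetical, and the code assumes that each field appears as "Key: value"
on its own line, as in the sample listing above (shortened here).

```python
import re


def parse_llq_step(text):
    """Collect 'Key: value' fields from one llq -l job step listing."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '=====' separator lines.
        if not line or line.startswith("="):
            continue
        # Split at the first colon only, so dates like 10:36:49 stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def run_number(fields):
    """Return the run number from Args, falling back to the job name."""
    args = fields.get("Args", "")
    if args.isdigit():
        return int(args)
    # Assumption: job names follow the physmon_<run> pattern seen above.
    match = re.fullmatch(r"physmon_(\d+)", fields.get("Job Name", ""))
    return int(match.group(1)) if match else None


# Shortened copy of the sample listing from point B.2.
SAMPLE = """\
=============== Job Step fibm11.lnf.infn.it.259655.0 ===============
Job Step Id: fibm11.lnf.infn.it.259655.0
Job Name: physmon_26037
Status: Running
Args: 26037
Out: /analysis/physmon/logs/physmon_26037.log
Err: /analysis/physmon/logs/physmon_26037.err
"""

if __name__ == "__main__":
    step = parse_llq_step(SAMPLE)
    print(run_number(step), step["Status"], step["Out"])
```

In practice the text would come from the saved output of llq -l rather than
from the SAMPLE string; the Out field then gives the *.log file to inspect.
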
C) HOW TO CHECK THE PHYSMON *.log FILES:

The script
   /analysys/physmon/2004/check_errors.csh
was created to search for a set of known error messages. At the moment
only 5 types of error message have been found in the logs and are flagged
by the script. Instructions for modifying the script to search for new
error messages can be found inside the script itself.

To check a given run range:

1) fibm11:~> /analysys/physmon/2004/check_errors.csh first_run last_run
   (if you want to check just one run, set last_run = first_run)

   Runs with a known error message, and the error type itself, are listed
   in the output file
   /analysis/physmon/2004/check_err_first_run_last_run_{11,12}.list

2) Check the previous file; only runs with some known problem, and the
   problem itself, are listed in the *.list file.
   Remember to delete it when the diagnostics are over.

3) Repeat points 1) and 2) on an "odd" machine.

D) HOW TO CHECK THE DC RELATED OUTPUT FILES OF PHYSMON

Scripts for scanning the physmon output for the DC are in
   /analysis/physmon/2004/

More specifically, the scripts check_dead.csh and check_eff.csh are used
to check the content of the 0xxxxx_dcdeads.txt and 0xxxxx_dceffic.txt
files produced by physmon (xxxxx is the run number).
Both scripts act in a similar way: first any previously produced files are
cleaned up, then the list of the *.txt files found in the given run range
is written to the files dcdead_file_{11,12}.list or dceff_file_{11,12}.list

To actually make the check:

1) > check_xxxx.csh nrun1 nrun2      (xxxx = dead or eff)
   (nrun1 and nrun2 are the first and last run to be checked;
   if you want to check just one run, set nrun1 = nrun2)

2) Look at the dcdead_chs.list and/or dceff_low.list files.

3) Rename these *.list files if you want to keep them; the original
   dcdead_chs.list and dceff_low.list files are deleted each time you run
   the corresponding *.csh script.
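
To make the file naming in section D concrete, the sketch below lists the
file names one would expect for a run range. It is only an illustration and
not a replacement for check_dead.csh or check_eff.csh: the function name
dc_file_names is hypothetical, and it assumes that "0xxxxx" means a
five-digit run number with a leading zero (six digits in total).

```python
def dc_file_names(first_run, last_run):
    """List the expected (dcdeads, dceffic) file names for a run range."""
    if last_run < first_run:
        raise ValueError("last_run must be >= first_run")
    names = []
    for run in range(first_run, last_run + 1):
        # Assumption: 0xxxxx is the run number zero-padded to six digits.
        prefix = f"{run:06d}"
        names.append((f"{prefix}_dcdeads.txt", f"{prefix}_dceffic.txt"))
    return names


if __name__ == "__main__":
    # As in the instructions, a single run is checked with nrun1 = nrun2.
    for dead, eff in dc_file_names(26037, 26038):
        print(dead, eff)
```
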