Instructions for running MC production: MM 26-Sep-2005
=======================================================

Overview:
=========

The procedure for MC production consists of two parts: the script which
processes an individual run (mcprod.pl) and the script for job submission
(submit_mcprod.pl). Both scripts are in /kloe/soft/off/datarec/script.

submit_mcprod.pl puts all requested jobs, one per physical run number, on the
kmc queue. Load Leveler executes the jobs off the queue with mcprod.pl, which
handles the actual production for an individual run. mcprod.pl does the
following things:

- Decides how many MC runs are needed to simulate the physical run, how to
  break down the raw files for the physical run into the right number of MC
  runs, and how many events to generate for each MC run
- Runs geanfi to generate .mco files, which exist only temporarily and are
  not stored in the tape library
- Runs datarec to add background to and reconstruct the .mco files, resulting
  in .mcr files, which ARE stored in the tape library

mcprod.pl alternately performs generation and reconstruction for as many MC
files as are needed, as described above. Once the complete set of .mcr files
exists, mcprod.pl produces one or more DST types from the set of .mcr files,
as requested.

There is also a script called mcprod_dst.pl which only makes the DST's from
closed .mcr files. The structure of mcprod_dst.pl is similar to that of
mcprod.pl. (There is also a submit_mcprod_dst.pl.) With a few minor
modifications, this script can be used to sum the DST's for the same run and
card type but generated in two different batches.

Important!
==========

Keep in mind that a single process can take many hours to complete (up to
14 hrs for a 200 nb-1 run simulated with the all_phys card at 1:5 scale), and
that the cleanup of a partially completed job is, at the moment, fairly
laborious. The script is very robust, but if the machines go down, the
resulting mess will take some patience to clean up. So:

- Don't kill any running processes
- Stop all submission and take jobs IN WAITING out of the queue several hours
  before any planned shutdown, where "several" depends on the nature of the
  job and ranges from 2-3 hours for eps_ppg at 5:1 scale to about 14 hours
  for all_phys of 2002 data at 1:5 scale.

Queues and job submission:
==========================

In a typical configuration, the kmc queue has 87 slots: 4 on each machine
from fibm23 to fibm33 (old machines), 3 on fibm34 (old machine), and 4 on
each machine from fibm35 to fibm44 (new machines). Fabio Fortugno can change
this allocation, even when jobs are running and/or in the queue.

When a new batch of runs is submitted, job submission takes about half an
hour. The actual submission takes only a few minutes for a large interval of,
say, 1000 runs, but in order to avoid conflicting demands on the DB2
database, the job submitter submits jobs at 30-second intervals whenever
there are free CPU's in the kmc queue. Once all of the CPU's are occupied,
the submitter puts any remaining jobs on the queue all at once (within a
minute or two) and exits.

The job submitter takes three arguments: the name of the MC job (technically,
the name of the .conf file, which in most cases is the card name), and the
first and last physical run numbers to simulate. The DB is checked to make
sure that each run number in the interval has been processed by datarec and
has not yet been simulated with the specified card and with an MC run number
in the range corresponding to the production campaign.
Finer checks are made at the level of the submission of an individual job. It
should be totally safe to submit a physical run number which has already been
simulated--it will just be skipped.

The jobs are executed in the order in which they were placed on the queue.
The model is one-pass; if a job fails, it is not attempted again. Jobs are
executed until the queue is clear. The idea is to issue a cleanup pass at the
end, once all problems that caused crashes have been resolved. This doesn't
require any detailed bookkeeping on the part of the operator, since jobs that
were successfully processed are simply skipped when resubmitted.

Job parameters, temporary files, and log files:
===============================================

Job parameters are furnished in the corresponding .conf file. The .conf file
is in /runcond/mcprod. If you are submitting all_phys jobs, for example, the
first parameter of the submit_mcprod command is the string all_phys, and the
scripts will reference /runcond/mcprod/all_phys.conf

If I have done everything right, there should be no temporary files to worry
about. There is a timestamp file in /runcond/mcprod, but its presence or
absence should not affect the success of a job.

Log files come out in the place specified in the .conf file, typically
/analysis/mcprod/logs. Note that /analysis is physically a different disk for
jobs processed on even- and odd-numbered machines.

There are actually two log files. The traditional .log file captures all
output sent to stdout. There is also an error file, .err, which captures
output sent to stderr. The idea was that only mcprod.pl and the operating
system would write to stderr, so that .err would contain only messages
relevant to crash conditions, for jobs that crash. Since Load Leveler deletes
empty .err files at job end, the idea was that the presence or absence of the
.err file by itself would signal the failure or success of each job. However,
there is an S_I subroutine in A_C which writes a garbage message ("reading
link value : Invalid argument") to stderr, which contaminates the .err files.
A .err file for a successful job will therefore contain only these messages.
Any other output in the .err file indicates a potentially fatal job
condition. In most cases, mcprod.pl will state the error condition clearly in
the .err file.

It is not usually necessary for the offline expert to maintain a detailed
list of the run numbers for which processing was unsuccessful. There are
scripts and database queries to help with this---see M. Moulson for details.
The most important things for the expert to keep track of are:

1) That the jobs are running well in general. The expert should have a rough
   idea of what types of errors come up and how often. The expert should be
   aware as to whether there are runs that will need to be resubmitted for
   processing or for DST-only production. There are scripts for analyzing the
   log files in bulk and reporting the frequency of errors by type (a crude
   example of such a scan is sketched under 'Error recovery' below).
2) That there are no problems with the archiving of files. The most common
   problem is that we run out of allocated cassettes for MC output. mcprod
   will refuse to start if the output areas are full; the result is that the
   queue will empty. Once there is enough space in the output areas, the MC
   production can simply be restarted in the usual way.
3) That there are no "new" errors that crop up.

Error recovery:
===============

In case of an error in generation or reconstruction, the job will simply
stop, and mcprod.pl will write the reason for the stop into the .err file.
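
A quick way to spot jobs whose .err files contain something other than the
harmless "reading link value" message is to scan the log directory. The
fragment below is only an illustrative sketch (it is not one of the official
log-analysis scripts mentioned above, and the path is just the typical log
location). Remember that /analysis is a different physical disk on even- and
odd-numbered machines, so the scan may need to be run on both.

    #!/usr/bin/perl
    # Sketch: list .err files that contain anything other than the known
    # harmless "reading link value : Invalid argument" message.
    use strict;
    use warnings;

    my $logdir = shift || '/analysis/mcprod/logs';
    for my $err (glob "$logdir/*.err") {
        open my $fh, '<', $err or do { warn "cannot open $err: $!\n"; next };
        my @bad = grep { /\S/ && !/reading link value : Invalid argument/ } <$fh>;
        close $fh;
        printf "%s: %d suspicious line(s)\n", $err, scalar(@bad) if @bad;
    }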

Files on /datarec from a previous crashed job should be cleaned up
automatically. So, if a job dies without writing anything to the database, it
can simply be resubmitted. The only thing you may want to do is to delete the
old log files. If this is not done, there will be no impact on the success of
the job, but if the job was submitted on an even-numbered node the first time
and is resubmitted on an odd-numbered node, the log files on the /analysis
disk connected to the even-numbered node will hang around and may confuse
diagnostic attempts in the future.

If one or more files pass the generation stage, the situation is much more
complicated. The tools to gracefully handle recovery when there are entries
in the database have only partially been developed. At the moment, if any
process from a physical run crashes before that entire run has been generated
and reconstructed, the cleanest option is to delete the files processed
successfully and reprocess the entire run. This is not only because the tools
to jump-start a job don't exist yet. There are also details concerning the
consistency of the packing of raw files into MC files and the division of the
disks among odd- and even-numbered machines that will make such tools hard to
develop. In particular, if a run has one or more files generated but no files
reconstructed, there is a way of deleting the corresponding entries in the
DB. If one or more files are reconstructed as well, those files will need to
be annihilated. For now, I would suggest referring such cases to me for
cleanup.

A crashed DST job will not stop mcprod. This is because it is difficult to
distinguish a crashed DST job from a successful DST job which happens to
select no events for a particular DST type. If all files have been generated
and reconstructed, but one or more DST's have not been created, first check
the .log file to make sure that there has actually been a crash. If so, keep
track of the information in the logbook. The mcprod_dst script can be used to
generate DST's for otherwise successful jobs.

Basic instructions for job submission and monitoring:
======================================================

To submit jobs, you must be kloerec. Jobs must be submitted from one of
fibm11-14 in order to use Load Leveler. The system is not configured for use
with the Sun machines. On fibm11 and fibm12, the alias submit_mcprod is
defined and points to

    /kloe/soft/off/datarec/script/submit_mcprod.pl

It is best to submit jobs from the home directory of kloerec on fibm11 or
fibm12. In any case, kloerec must be able to write to the directory from
which you submit without an AFS token, or else the jobs will die before they
even start.

To submit jobs:

    submit_mcprod conf_name run1 (run2)

Various job parameters will be printed, and you will be asked to review them
and confirm the submission. Make sure the card name is correct, in
particular. You need to type 'yes' to continue. This means that the submitter
must start out as an interactive process. Once you type 'yes,' the submitter
forks into the background and tells you the PID of the submitting process. At
this point, you can close the session if you like.

The process in the background writes its output to a logfile in the current
directory with a name such as

    submit_rad04_26501_27137_2004-03-12-1734

where the name includes (in order) the card name, the first and last physical
runs to be simulated, and the submission start time and date in the format
YYYY-MM-DD-hhmm.
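
For example, the logfile name above would have been produced by a submission
along the lines of (the card name and run range are purely illustrative):

    submit_mcprod rad04 26501 27137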

These submitter logfiles are only for your convenience; it is NOT important
to keep them. In fact, you should periodically delete them in order to keep
the home directories of kloerec on fibm11 and fibm12 from filling up with
junk.

For the process that forks into the background, there will be a pause for a
few moments at the beginning while the database is interrogated. If you
tail -f the logfile, you will see the jobs being submitted at 30 sec
intervals. When all CPU's are full, the pace of submission will accelerate
greatly. The submitter will then exit.

To see the list of jobs queued, use the command

    llq -c kmc

This will report every job on the queue with the number and machine from
which it was submitted. It is possible to get more information about a single
job by using the -l flag. See the man page for llq. Of particular utility is
the combination

    llq -l -c kmc | grep mcprod

To STOP jobs, use the command llcancel. Note that this does two things: 1) it
purges the queues of jobs in waiting, and 2) it sends a SIGTERM to all jobs
actually running. Its action is more or less immediate. This is not what you
want (see the paragraph titled "Important!", above). What you usually want is
to dequeue jobs that are in waiting. Unfortunately, llcancel can only be
issued job by job, so you'll have to do a little fussing with llq, grep,
llcancel, and whatever scripting skills you have to issue llcancel only for
jobs in waiting. In the future, we can write scripts to be a little more
selective. In fact, you are invited to do so (a rough sketch is given at the
end of this section).

As an alternative to the above, you can use the X-based interface to
LoadLeveler, xloadl. For example, you can dequeue idle jobs easily by
starting xloadl, and then:

- Click the 'Select' tab
- Click 'By user'
- Click 'Idle jobs for...'
- 'Enter user name: kloerec' will appear. Click 'OK'
- Click the 'Actions' tab
- Click 'Select all'
- Click the 'Actions' tab
- Click 'Cancel'
- Click 'OK'

To see the general status of the machines, without all the verbosity of llq,
you can use:

    llstatus

which will basically only tell you how many jobs remain to be submitted, and
on which machines there are jobs actually running.

To submit DST-only jobs, use:

    submit_mcprod_dst conf_name run1 (run2)

All of the above notes apply equally in this case. There is a much smaller
interval between the submission of consecutive DST jobs, since the MC run
number does not have to be obtained from the database.
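
As mentioned above, there is not yet an official tool for cancelling only the
waiting jobs from the command line. The sketch below illustrates one possible
approach: it parses the output of llq and prints the corresponding llcancel
commands for inspection, rather than executing them. It is not a supported
script, and the column layout of the llq output should be checked by hand
(and the field indices adjusted) before trusting it.

    #!/usr/bin/perl
    # Sketch: print llcancel commands for kloerec jobs of class kmc that are
    # still Idle (status I), leaving running jobs alone.  The ST field is
    # assumed to be the 5th whitespace-separated token of each job line;
    # verify this against `llq -c kmc` on your installation.
    use strict;
    use warnings;

    open my $llq, '-|', 'llq -c kmc' or die "cannot run llq: $!\n";
    while (my $line = <$llq>) {
        my @f = split ' ', $line;
        next unless @f >= 5 && $f[0] =~ /\./;     # skip header/summary lines
        my ($id, $owner, $st) = ($f[0], $f[1], $f[4]);
        print "llcancel $id\n" if $owner eq 'kloerec' && $st eq 'I';
    }
    close $llq;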

Configuration file:
===================

Any line that starts with a whitespace-terminated word containing only the
characters a-z, A-Z, or _ (underscore) is interpreted as the name of a perl
variable to be set for mcprod.pl. All other lines are interpreted as
comments. Currently defined variables (and typical values) are listed below;
a skeleton file collecting the typical values is sketched at the end of this
section.

offline_root /kloe/soft/off/datarec
    The root directory for datarec code.

output_dir /datarec/mc
    The directory for reconstructed output. .mco and .mcr files are
    temporarily placed in a subdirectory underneath this directory named
    mcprod_NNNNN, where NNNNN is the physical run number being simulated.

output_dir_dst /datarec
    The directory where DST's go when they're finished.

temp_dir /runcond/mcprod/temp
    Location of temporary files. This is only used for the Load Leveler .cmd
    files at the moment of job submission.

log_dir /analysis/mcprod/logs
    This is where the logfiles come out. Note the usual caveat about odd/even
    numbered machines.

geanfi_exe_name gbatchfm.exe
    The name of the geanfi executable image, relative to the environment
    variable GEANFI_EXE. The script begins with a setup -e geanfi test, so
    you get the test directory. To change this, change the version in the
    setup command at the start of the script (both mcprod.pl and, for
    consistency, submit_mcprod.pl, although in the latter case only the
    version is checked).

geanfi_min 172
    Minimum version for the geanfi executable. The script will exit if the
    geanfi executable located above does not have this version number
    compiled in (the binary is probed).

datarec_exe_name datarec.exe
    Name of the datarec executable relative to $offline_root/AIX/mc/default,
    where $offline_root is defined above.

dbv_min 17
    Minimum version for the datarec executable. The script will exit if the
    datarec executable located above does not have this DBV number compiled
    in (the binary is probed).

inp_datarec_dbv 17
    This parameter is only used by mcprod_dst. It specifies the minimum
    datarec DBV for the .mcr files from which to make DST's. Note that ALL
    .mcr files must have the same DBV and background type, and that the DBV
    must be greater than inp_datarec_dbv and less than or equal to the DBV of
    the datarec executable in use. Version mixing is not allowed.

bgg_dbv_min 16
    Minimum version for bgg files. Each run being processed must have a set
    of bgg files processed at this DBV level. Note that this is active ONLY
    for bgg background. For lsb background, the lsb files must have the same
    DBV as the bhabha files in the reconstructed data for the run. The choice
    between lsb background and bgg background is simply based on the run
    number under simulation. Runs > 28000 use lsb background; for smaller run
    numbers, bgg background is used.

vlab_min 210
    If the raw file has fewer VLAB's than this, it is not simulated. The VLAB
    cross section is 431 ev/nb-1. If bgg background is being used (see
    above), there are additional considerations. bgg files are made only if
    there are at least 210 VLAB's, so in this case, lowering vlab_min further
    will have no effect. Actually, it's worse than that: since the runs with
    less than 210 VLAB's will have no bgg's, mcprod will tell you that the
    background is not finished for the run and will refuse to process the run
    number.

mcprod_begin_run 0
    If the DB contains an MC run for mccard_code (below) corresponding to the
    physical run number AND this MC run number is higher than
    mcprod_begin_run, the physical run will not be simulated, as this means
    that it has already been processed in the current campaign. Therefore,
    this number should be set to the first MC run number produced in the
    campaign.

LSF 0.2
    Luminosity scale factor for this campaign. The number of events generated
    for each raw file is LSF times the VLAB luminosity for the raw file,
    times a cross section which is encoded directly in mcprod.pl. For the
    all_phid and all_phys cards, the cross section is 3100 ub. For neu_kaon,
    the cross section is 3100 ub * 0.338. For any other card, the cross
    section will have to be added to mcprod.pl.

mccard_code all_phys
    Name of the MC card for this campaign.

mcback_id 2
    This parameter is only used by mcprod_dst.pl, and only for runs < 28000
    (i.e., when bgg background is being used). It specifies the MC background
    code, for identifying .mcr files for DST production. 2 means DBV-16 bgg
    background. For lsb background, mcback_id is the DBV version of the lsb
    file, and this is known by the mcprod_dst.pl script. The parameter is
    assigned in geanfi and is therefore not necessary for mcprod.pl.

mccard_name (not specified at present)
    Name of this campaign, if different from mccard_code.
    This is used to name files, log files, temporary directories, etc. If
    absent, it is taken to be the same as mccard_code.

min_mc_events 1000
    Minimum number of events to generate in a run. The algorithm for packing
    raw files will generally create n MC runs, each with an approximately
    equal number of events, so the minimum will usually only be a problem if
    the total number of events to generate for a run is very small. In any
    case, if one or more MC runs to be generated for the given run number
    have less than min_mc_events, an exception will be raised and the whole
    run will not be processed.

uic_path uic/mcprod
    Directory where the .uic files are found, relative to offline_root. Note
    that mcprod.pl uses its own set of .uic files.

recon_uic_name mcprod_recon_bkg.uic
    Name of the .uic file (in the above directory) to be used for
    reconstruction.

filter_card = Yes
    This variable must be set (i.e., have a defined value) if the production
    card invokes a geanfi output filter; otherwise mcprod will expect geanfi
    to generate as many events as requested. For example, this card is
    present in the filt_3p configuration file. If the variable is not
    present, or if it is set to 'No' (case insensitive), the number of events
    output by geanfi must be equal to the number requested, and the number of
    events reconstructed must be at least 95% of the number generated. If it
    is present and set to any other value, these checks are suspended.

dst_force_val = 1
    Setting this to 1, 2, etc. adds 10000, 20000, etc. to datarec_number,
    which allows multiple, distinct DST campaigns for the same mccard_id.
    This can be used in conjunction with mccard_name and recon_uic_name to
    perform multiple production campaigns using the same mccard_id but
    different reconstruction paths. In that case, the mcr and dst files will
    be distinguished by name, and the dst files additionally by datarec_nr
    range. This is not generally used for production but may be useful for
    reconstruction tests. If the card is not present, the default value is
    zero.

make_dst mk0 m3p mra mkc
    Specifies the DST's to generate for this campaign. The three-letter codes
    correspond to .uic filenames such as dstmk0.uic. Note that a particular
    type of .uic file may make more than one DST (see dstmra.uic, for
    example). The DST's are made in the order listed. It is usually a good
    idea to make the mkc DST's last, as they take the longest to process.
    This way, the .mcr files stay in the recalled area for the other DST
    jobs, which are shorter. If the make_dst variable is not defined, or if
    the list is empty, no DST's will be made.
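
Putting the typical values quoted above together, a .conf file has the
following general shape. This is only an illustrative skeleton (optional
variables such as filter_card, dst_force_val, mccard_name, inp_datarec_dbv,
and mcback_id are omitted), not an actual production file:

    # Skeleton of /runcond/mcprod/all_phys.conf -- illustrative only.
    # Lines not starting with a legal variable name are comments.
    offline_root     /kloe/soft/off/datarec
    output_dir       /datarec/mc
    output_dir_dst   /datarec
    temp_dir         /runcond/mcprod/temp
    log_dir          /analysis/mcprod/logs
    geanfi_exe_name  gbatchfm.exe
    geanfi_min       172
    datarec_exe_name datarec.exe
    dbv_min          17
    bgg_dbv_min      16
    vlab_min         210
    mcprod_begin_run 0
    LSF              0.2
    mccard_code      all_phys
    min_mc_events    1000
    uic_path         uic/mcprod
    recon_uic_name   mcprod_recon_bkg.uic
    make_dst         mk0 m3p mra mkc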

How to install a new card in the production scheme:
====================================================

1. You have to recompile geanfi. The card name needs to be added to the DATA
   statement in the subroutine CARD_NM. The DATA statement initializes a
   character array of length 500. The index of the array is the database
   variable MCCARD_ID, and the value of the array is the corresponding
   database variable MCCARD_CODE. In the source code, the DATA statement is
   formatted into 5 blocks of length 100 each. These blocks correspond to the
   five card groups: 1) phi -> all simulation cards (1-99), 2) neutral kaon
   cards (100-199), 3) charged kaon cards (200-299), 4) radiative phi decay
   cards (300-399), 5) cards for simulation of continuum processes (400-499).
   The group number corresponds to the database variable MCCARD_GROUP.

   Example: ppgphok3. The name is specified as the 13th element in the 5th
   block of the DATA statement, so that it is in the 413th position in the
   array. MCCARD_ID is therefore 413 and MCCARD_GROUP is 5.

   Be sure to decrement the number of empty initializers at the end of the
   block of the DATA statement where you add the card. This preserves the
   alignment of the card codes and groups. Also, pay attention to which
   version of geanfi (test or development) you modify.

   The mcprod.pl script may need to be changed. It is sufficient to modify
   the 'setup geanfi' command at the beginning of the script to switch
   between test and development versions.

2. Write the card to the database:

   a. Prepare the card. Take out specifications such as BEAM, which are
      supplanted by geanfi for automated generation. Use a command such as

          dbonl "select * from descript.mccard_descript where mccard_id=2"

      to examine another card used with mcprod if you have any doubts. Write
      the card file to a location accessible from fibm01 or krunc, such as
      /runcond/datarec/temp

   b. Log on to fibm01 or krunc as daq and:

          packman SQLremote
          cd $SQLREMOTEBIN

      First, create the database entry for the new card:

          ./insert_card mccard_id mccard_group mccard_code

      where the parameters are as described in step 1. Once the database
      entry has been created, you must fill it:

          ./update_card mccard_id filename nr_trigs comment

      Here, filename is the name of the temporary file you wish to load (in
      /runcond/datarec/temp, for example), nr_trigs is the number of triggers
      in the TRIG statement in said file, and comment is a comment.

3. Specify the cross section for the card in the mcprod_cs.pl script. See the
   instructions in the next section of this document.

4. Write a configuration file called mccard_code.conf (where, obviously,
   mccard_code is the name of the card you are installing). See the
   description of the .conf file in the previous section ('Configuration
   file'). Remember to modify the DST type selection.

5. If you need to define any new DST types, make sure to put the dstXXX.uic
   files in the mcprod .uic area. Here, XXX is a (usually) three-letter code
   specifying the DST type. This may be a datarec stream (kpm, ksl) or it may
   be a .uic file specific to a particular type of generation (for example,
   ppg--- dstppg.uic makes only mrc DST's, as opposed to dstrad.uic, which
   makes both mrc and mrn DST's).

How to register a cross section in the mcprod script:
======================================================

Production cross sections are handled by a separate script,

    /kloe/soft/off/datarec/script/mcprod_cs.pl

In order to run a new card, the cross section for that card must be
explicitly added to the script. For energy-dependent cross sections, you must
add two hashes to the script: one with the values of sqrt(s) (called %E, in
perl notation), and one with the corresponding cross section values. For
example, if you have 5 CS measurements for the card named blurf, you would
add the following lines to the script:

    $E{'blurf'}  = "1017 1018 1019 1020 1021";  # In MeV
    $CS{'blurf'} = "100 200 250 225 150";       # In nb

The script will perform a linear interpolation between the points, so in most
cases, 5 measurements are not enough. The values for all_phys and neu_kaon
are actually from the results of the fit to KLOE data, tabulated in 0.1 MeV
intervals. The linear interpolation then works well.

If you can get away with a single reference cross section, you can add a
single line to the script with that cross section:

    $CS{'blurf'} = "600";

The %E hash is not necessary.
At present, a reference value of this sort is used for the eps_ppg card.
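
To make the event-count arithmetic concrete, the fragment below sketches the
kind of linear interpolation described above, together with the number of
events that would be generated for a raw file (LSF times the VLAB luminosity
times the cross section, with the VLAB cross section of 431 ev/nb-1 quoted
earlier). It is purely illustrative: the hash values are the invented 'blurf'
numbers from the example above, and the real code lives in mcprod_cs.pl and
mcprod.pl.

    #!/usr/bin/perl
    # Illustrative sketch only -- not mcprod_cs.pl.
    use strict;
    use warnings;

    my %E  = ( blurf => "1017 1018 1019 1020 1021" );  # sqrt(s) in MeV
    my %CS = ( blurf => "100 200 250 225 150" );       # cross section in nb

    # Linearly interpolate the cross section for $card at energy $e (MeV).
    # Cards with a single reference value have a %CS entry but no %E entry.
    sub cross_section {
        my ($card, $e) = @_;
        return (split ' ', $CS{$card})[0] unless defined $E{$card};
        my @e  = split ' ', $E{$card};
        my @cs = split ' ', $CS{$card};
        for my $i (0 .. $#e - 1) {
            next unless $e >= $e[$i] && $e <= $e[$i + 1];
            my $f = ($e - $e[$i]) / ($e[$i + 1] - $e[$i]);
            return $cs[$i] + $f * ($cs[$i + 1] - $cs[$i]);
        }
        die "sqrt(s) = $e MeV is outside the tabulated range for $card\n";
    }

    # Example: a raw file with 100000 VLAB's (i.e., about 232 nb-1) simulated
    # at LSF = 0.2 and sqrt(s) = 1019.5 MeV (all numbers are made up).
    my $lumi = 100000 / 431.0;                    # nb-1
    my $cs   = cross_section('blurf', 1019.5);    # nb
    my $nev  = int(0.2 * $lumi * $cs + 0.5);
    printf "cs = %.1f nb, events to generate = %d\n", $cs, $nev;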