Large-scale MC production: M. Moulson
Frequently Asked Questions (FAQ)
31-Mar-2004
=============================================================

-------------------------------------------------------------
Q. How do I submit MC jobs using the new mechanism?
-------------------------------------------------------------

A. The offline group is responsible for the production of MC jobs
using the new machinery. If you are the offline expert and need help
(or if you are just interested), see the following web page:

   http://www.lnf.infn.it/kloe/private/mc/mcinst.txt

Individuals cannot submit these jobs, but any requests will be
considered. Forward your request to M. Moulson or C. Bloise and it
will be discussed at the next offline meeting.

Note that the new production mechanism is not well suited to the
production of small jobs (e.g., rare signal samples). The old MC
procedure is still active for this purpose.

-------------------------------------------------------------
Q. How do I interface with the MC output for analysis?
-------------------------------------------------------------

A. One significant feature of the new MC production machinery is the
production of MC DST's. MC DST's are very similar to data DST's. For
the time being, there are 5 MC DST streams:

   stream_id   stream_code   analogous to
   ---------   -----------   ------------
      61           mkc           dkc
      62           mk0           dk0
      63           m3p           d3p
      64           mrn           drn
      65           mrc           drc

There is now a new protocol for accessing MC DST's, called dbmcdst.
It uses the logger.dtr_mcs_data view. The SQL query that you use from
A/C with "input url" can use any of the fields in this view. You can
obtain the list of field names with

   dbonl fields logger.dtr_mcs_data

Here are some examples:

!-- all KSKL MC DST's for 2002 data, any card
input url "dbmcdst:run_nr>23400 and dtr_stream_id=62"
!-- or alternately...
input url "dbmcdst:run_nr>23400 and dtr_stream_code='mk0'"

!-- all neutral rad MC DST's for the stated run range in 2001 data, any card
input url "dbmcdst:run_nr between 19000 and 21000 and dtr_stream_code='mrn'"

!-- same as above but only for files generated with the all_phys card
input url "dbmcdst:run_nr between 19000 and 21000 and dtr_stream_code='mrn' and mc_mccard_id=2"
!-- or alternately...
input url "dbmcdst:run_nr between 19000 and 21000 and dtr_stream_code='mrn' and mc_mccard_code='all_phys'"

!-- all files generated with eps_ppg in run range, any DST type
input url "dbmcdst:mc_mccard_code='eps_ppg' and run_nr between 20010 and 20020"

!-- specified by filename
input url "dbmcdst:dtr_filename='mrc_20235_eps_ppg_17.dst'"

The command kls has also been updated to use the new protocol:

!-- all neutral rad MC DST's for the stated run range in 2001 data, all_phys card
kls mcdst "run_nr between 19000 and 21000 and dtr_stream_code='mrn' and mc_mccard_code='all_phys'"

-------------------------------------------------------------
Q. What modules do I have to run when analyzing MC DST's?
-------------------------------------------------------------

A. Only your analysis code, plus whatever it requires. In many cases,
EMCDBINI and DCDBINI are good to put in your path, but it really
depends on what you're doing in your analysis. There are no modules
that are always required. In particular, the MC DST's don't contain
any data encoded with the SQZ library, so you don't need KBKMDD.
(The .mcr files, however, do require KBKMDD to be read.)

-------------------------------------------------------------
Q. How is the streaming handled for MC DST's?
-------------------------------------------------------------

A. In the old reconstructed MC files (.mcr), the event classification
decision was recorded in ECLS/ECLO, but not enforced. This allowed
studies of event classification efficiency. The same is true for the
new .mcr files.

However, we now have MC DST's, which are streamed. MC DST's are
streamed on the union of the event type in MC truth and the event
classification decision. In other words, an mk0 DST will contain all
events generated as KS/KL events, plus any events reconstructed and
classified as KS/KL events. So, if you can do your analysis from a
single stream in data, you should be able to do your MC analysis from
the corresponding MC DST stream, including event classification
efficiencies. (The TSKT and FILFO decisions are similarly recorded
but not enforced.)
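Pictured as code, the union rule amounts to the following. This is a
purely illustrative sketch: the function and argument names are
hypothetical, not the actual datarec code.

      LOGICAL FUNCTION KEEP0(MCKSKL, ECLKSK)
*     Illustrative union rule for the mk0 stream: keep the event if
*     it was generated as KS/KL in MC truth OR classified as KS/KL
*     by event classification. Names are hypothetical.
      LOGICAL MCKSKL, ECLKSK
      KEEP0 = MCKSKL .OR. ECLKSK
      END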
Note that, exactly as for data, additional stream-specific algorithms
are run for MC events in the DST-making phase. A single .mcr is
written for each MC run. This is then analyzed by four separate
datarec processes to make the DST's. For mkc, the retracking
performed for charged kaons is applied (obviously, only for those
events recognized as K+/K-; events that are K+/K- in MC truth but not
recognized as such are passed along but do not receive the special
treatment). For mk0, the t0 step 1 algorithm is applied for KL tag
events. Radiative events are divided into the charged and neutral
streams during the DST phase. For neutral radiatives, the t0 step 1
algorithm is applied.

-------------------------------------------------------------
Q. Do I have to worry about overlapping events in MC DST's?
-------------------------------------------------------------

A. Yes, if you have to worry about them when analyzing data.
Otherwise, no. For example, if you look at more than one stream when
analyzing data, there may be overlaps. In MC DST's, the part of the
streaming based on MC truth should be completely orthogonal, so there
is really no difference from data in this respect.

-------------------------------------------------------------
Q. Is there any documentation on the MC DST banks?
-------------------------------------------------------------

A. I have a partially written memo describing the formats of all
banks used in DST's: regular DST's, kpm DST's, and MC DST's.
Finishing this memo is a priority for me (although I do have a lot of
priorities these days...). Anyway, it is on the way.

In the meantime, please keep in mind that the routines in the TLS
library that fill the structures used in PROD2NTU have all been
updated to transparently read either full reconstructed files or DST
files, and this is true for both MC and data. In certain cases
(particularly for the trigger), the information in the DST's does not
allow the structures to be completely filled. However, these routines
should return whatever information is available. Note that this means
that 1) PROD2NTU should work out of the box for MC DST's, and 2) the
PROD2NTU subroutines can be used by people doing analysis in their
own code, without any detailed knowledge of the underlying bank
structure.

-------------------------------------------------------------
Q. What improvements in the simulation have been implemented?
-------------------------------------------------------------

A. Too many to list here. Documentation is on the way. In the
meantime, for the full story, consult the extensive list of
presentations and meeting summaries at

   http://www.lnf.infn.it/kloe/private/mc/pres

Of particular note is the short list below of things that vary
run-by-run in the campaign.

-------------------------------------------------------------
Q. What changes run-by-run in the simulation?
-------------------------------------------------------------

A.
-. Inserted background in both the EmC and DC
-. sqrt(s)
-. The production cross section (sometimes; see below)
-. Mean phi momentum (z component)
-. Position of the luminous region
-. Extent of the luminous region in x and z
-. Beam energy spread (in 3 large run groups)
-. Dead and hot DC wires
-. Trigger thresholds (in large run groups)

-------------------------------------------------------------
Q. What doesn't change run-by-run in the simulation that I should
know about?
-------------------------------------------------------------

A. The production cross section, sometimes. For continuum channels
(such as eps_ppg), a reference cross section is used. Note that the
phi BR's change with sqrt(s) in any case: that is part of
M. Antonelli's new phi generator. So, if we're simulating a run with
sqrt(s) = 1018.0 MeV, the MC output contains a larger fraction of
rho-pi events than it does when we're simulating a run with
sqrt(s) = 1019.5 MeV.

-------------------------------------------------------------
Q. How is the number of events to generate determined?
-------------------------------------------------------------

A. The number of events generated when an individual run is simulated
is calculated using the VLAB luminosity for that run and the cross
section for the card that governs the generation. For the cross
section, either a reference value is used (for example, for continuum
processes or other situations where we don't have a good
parameterization of the energy dependence of the cross section
ready), or the cross section is parameterized as a function of
sqrt(s). At the moment, fits to KLOE data are used to obtain the
energy-dependent cross sections for the all_phys and neu_kaon cards.

The actual number of events generated is then obtained by multiplying
by the luminosity scale factor (LSF) for the campaign. For example,
in the all_phys running, we use an LSF of 1:5. The number of MC
events generated for the 430 pb-1 2001-2002 data set (counting good
data only) then corresponds to the number expected for 86 pb-1 of
data. Of course, these events are distributed in run space in the
same way as the events in data are.
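To make the arithmetic concrete, here is a hedged sketch of the
calculation. The effective 620 nb (cross section times the 1:5 LSF
for all_phys) is the value quoted in the background-recycling answer
below; the luminosity value and variable names are just illustrative.

      PROGRAM NGCALC
*     Sketch of the event-count arithmetic described above.
      REAL LUMI, XSEFF
      INTEGER NGEN
*     VLAB luminosity of the run, in nb-1 (1000 nb-1 = 1 pb-1)
      LUMI  = 1000.0
*     sigma(all_phys, sqrt(s)) x (1/5 LSF), in nb (see below)
      XSEFF = 620.0
      NGEN  = NINT(LUMI*XSEFF)
      PRINT *, 'Events to generate for this run:', NGEN
      END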
-------------------------------------------------------------
Q. What is the significance of the MC run number? How does it differ
from the run number simulated?
-------------------------------------------------------------

A. The significance of the MC run number has not changed. The MC run
number (mcrun_nr in most DB tables) is just a progressive index of
the MC runs for each MC card. For technical reasons, it turned out to
be easiest to keep it this way.

-------------------------------------------------------------
Q. Where are the MC and data run numbers in the YBOS file?
-------------------------------------------------------------

A. The MC run number is in the LRID bank. As such, it is loaded into
the jobsta common by A/C. (This means that it is the MC run number
that A/C shows you in the status report when you process an MC DST.)

The run number being simulated is in a new bank called BRID. The BRID
bank is actually a copy of the LRID bank from the inserted background
event, and is present in every event. BRID is identical in format to
LRID. It is not loaded by A/C, however. You can BLOCAT BRID and
obtain the simulated run number at zero offset from inddat in the IW
array.

Note that BRID is only present in EVENT records! This is because in
principle the source of inserted events can change during an MC run
(an MC run could in theory correspond to more than one physical run,
though we're not currently doing things that way). This means that,
if you want the simulated run number for your analysis, you have to
look for it every event, NOT in a Begin Run routine. Of course, if
you use the simulated run number to access HepDB for the boost, etc.,
read the run number from BRID every event, but make the database
calls only when it actually changes.
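A minimal sketch of that per-event access pattern follows. It assumes
the standard YBOS BLOCAT calling sequence and that NDATA = 0 signals
a missing bank; the routine name, the include file, and the HepDB
refresh hook are illustrative.

      SUBROUTINE GETSIM(IRUNS)
*     Read the simulated run number from BRID every event, as
*     described above. Sketch only: the include file name and the
*     bank-not-found convention are assumptions.
      IMPLICIT NONE
#include "bcs.inc"
      INTEGER IRUNS, NDATA, INDDAT, LASTRN
      SAVE LASTRN
      DATA LASTRN /-1/
      CALL BLOCAT(IW, 'BRID', 0, NDATA, INDDAT)
      IF (NDATA .GT. 0) THEN
*        Simulated run number at zero offset from inddat (see above)
         IRUNS = IW(INDDAT)
         IF (IRUNS .NE. LASTRN) THEN
*           Run changed: refresh HepDB quantities (boost, etc.) here
            LASTRN = IRUNS
         END IF
      END IF
      END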
-------------------------------------------------------------
Q. What do we do with the reconstructed MC files?
-------------------------------------------------------------

A. The maximum number of events for a given MC run is determined by
the desire to keep the reconstructed MC file (.mcr file) size below
1 GB, and is established empirically for each card. A given run in
data can therefore require more than one MC run to simulate.

Background events are available for each raw data file taken. When
the generation of a run has to be split up, it is split up into
groups of the raw files of which the run consists. Each group of raw
files is simulated separately, as far as background insertion is
concerned. The exact mechanism is complex, but it ensures that the
background spectra in the sum of the .mcr files for any simulated
physical run faithfully reproduce the background spectra in the data
for that run. The .mcr files are recombined by physical run number
when MC DST's are made.

The reconstructed MC files are tagged by their MC run number and are
archived. If you really need to use them, you can see them with kls
or kid using the dbmc protocol. It will take some SQL gymnastics to
get the right combination to fully reproduce each run and/or to get
the correspondence to physical run numbers.

Note that the .mcr files in principle contain everything necessary
for re-reconstruction. All of the information exists to allow the
background to be removed and the pristine .mco file to be recovered.
This allows us to reconstruct with no background, or with a different
source of background, if need be.

-------------------------------------------------------------
Q. What about the .mco files?
-------------------------------------------------------------

A. They're toast. See above; we don't need 'em. By the way, we never
kept them before, either.

-------------------------------------------------------------
Q. How is the background obtained?
-------------------------------------------------------------

A. This will be documented more extensively in the future. In brief,
the background is obtained from gamma-gamma events. Since these
events are neutral, all DC hits in these events are considered
background. The calorimeter clusters not identified as belonging to
the gamma-gamma topology are considered background. The separation is
imperfect in the calorimeter, so event downscaling techniques are
used to make sure that the energy and polar angle distributions of
the clusters selected as background correspond to the background
only. The timing distribution of background clusters has been shown
to be well reproduced.

The histograms used for the statistical separation of background
clusters from gamma-gamma fragments and additional clusters from
3-gamma events are made in groups of 2 to 5 pb-1, where the group
boundaries are chosen on the basis of the accidental rate measured in
cosmic events as recorded by physmon.

-------------------------------------------------------------
Q. Are background events recycled?
-------------------------------------------------------------

A. Yes. The cross section for an identified gamma-gamma event that
yields background is roughly 40 nb. So, in all_phys running at 1:5
scale (CS = 620 nb), each background event will be used in 15.5
simulated events, on average.

For each raw data file in the data set, we make a background (bgg)
file. An MC run corresponds to a set of raw files in data. To the
extent that the cross section for an extractable bgg event is
constant, if we insert all of the bgg events for the set of raw files
into the MC run (recycling each by the appropriate reuse factor),
then we reproduce the time dependence of the background in the data.

The reuse factor is a non-integer value. The number of times to reuse
each background event is obtained by a remainder series (sketched
below). If the reuse factor is 15.537, slightly more than half of the
background events will be used 16 times and the rest will be used 15
times. Reconstruction of a file stops when either the MC events run
out or the background does. When the reuse factor is calculated
correctly, at most a few MC events are lost because the background
runs out.
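For concreteness, here is one standard way to realize such a
remainder series. It is a sketch only, not the actual insert module;
the caller keeps the accumulator ACC, initialized to zero, across
calls.

      INTEGER FUNCTION NREUSE(F, ACC)
*     Remainder-series sketch: each background event is reused
*     INT(F) or INT(F)+1 times, with the fractional parts
*     accumulated so that the average reuse is exactly F. For
*     F = 15.537, about 53.7% of events are used 16 times and the
*     rest 15 times.
      REAL F, ACC
      NREUSE = INT(F)
      ACC = ACC + (F - INT(F))
      IF (ACC .GE. 1.0) THEN
         NREUSE = NREUSE + 1
         ACC = ACC - 1.0
      END IF
      END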
(Note: an earlier version of the insert module used a pseudorandom
sequence rather than the remainder series. On average, with that
method, slightly fewer events are reconstructed than are generated.
This technique was used for the eps_ppg campaign.)

-------------------------------------------------------------
Q. What if I want to study the background itself?
-------------------------------------------------------------

A. Any distribution that you make that includes measured quantities
from background clusters or hits will obviously show the fluctuations
from the underlying background statistics. This is an unavoidable
consequence of the fact that the cross section for gamma-gamma events
with isolable background is relatively small, which necessitates the
recycling of background events for high-statistics MC production. In
many cases--such as if you're studying events of a particular type
that pass a series of selection cuts--this probably won't matter.

The obvious exception, as pointed out to me by C. Gatti, is if you're
studying the background itself. In this case, you may want to exploit
the fact that when the background is recycled, it is recycled into
contiguous events, at least for now. So, if this is a crucial concern
for you, you can test on the run and/or event number of the inserted
event (in the BRID bank) and fill your histograms and/or Ntuples only
when the inserted event changes (see the sketch below).
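A hedged sketch of that guard follows. The run number sits at zero
offset from inddat, as stated above; that the event number sits at
offset 1 (mirroring LRID) is my ASSUMPTION, so check it before
relying on it.

      LOGICAL FUNCTION NEWBG()
*     Return .TRUE. only when the inserted background event changes,
*     so histograms are filled once per underlying bgg event.
*     ASSUMPTION: event number at offset 1 from inddat in BRID.
      IMPLICIT NONE
#include "bcs.inc"
      INTEGER NDATA, INDDAT, IRUN, IEV, LRUN, LEV
      SAVE LRUN, LEV
      DATA LRUN, LEV /-1, -1/
      NEWBG = .FALSE.
      CALL BLOCAT(IW, 'BRID', 0, NDATA, INDDAT)
      IF (NDATA .LE. 0) RETURN
      IRUN = IW(INDDAT)
      IEV  = IW(INDDAT+1)
      IF (IRUN .NE. LRUN .OR. IEV .NE. LEV) THEN
         NEWBG = .TRUE.
         LRUN = IRUN
         LEV  = IEV
      END IF
      END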
Note that in principle, if we are simulating a frequent process at a
high LSF, then it may take more than one MC run to span a single raw
file in data. In this case, all insertions of the same event are not
guaranteed to be contiguous in the MC output files. We are not
running this way for now, and probably won't be for a while. But it
can't be excluded that we will at some point, so if you have a doubt,
ask me.

-------------------------------------------------------------
Q. How is the background inserted?
-------------------------------------------------------------

A. This will also be documented more extensively in the future.
Roughly speaking, the hits in both the EmC and DC are inserted,
preserving their timing relative to the t0 of the gamma-gamma event
from which they were extracted. This happens before the t0 smearing.
The simulated and background hits are then smeared together using the
standard t0 smearing algorithm. Background hits preceding simulated
hits will clobber the simulated hits, and vice versa. Background hits
with negative times are inserted with their negative times preserved.
In the DC, a background hit with negative time will cause the
simulated hit to be removed from the event.
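In other words, for a single wire or cell the rule reduces to keeping
the earlier hit. A minimal illustrative sketch (not the actual
insertion code):

      REAL FUNCTION THIT(TSIM, TBG)
*     Clobbering rule for one DC wire, per the text above: the
*     earlier of a simulated and a background hit survives, so a
*     background hit at negative time masks the simulated hit.
      REAL TSIM, TBG
      THIT = MIN(TSIM, TBG)
      END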