#error This file is not for compilation
/**
@page train_setup_doc Using the TrainSetup facility

@tableofcontents

@section train_setup_overview Overview

The TrainSetup framework allows users to easily set up an analysis
train which can be executed in all environments supported by ALICE.

The train definition takes the form of a class deriving from the base
class TrainSetup. Specific hooks in the base class allow users to
customize the various aspects of a train. The base class also provides
facilities to easily define parameters of the train, which can be set
by parsing simple command line options or strings. Furthermore, the
basic setup ensures that the analysis becomes a self-contained,
self-documenting unit by storing all relevant files together with the
various kinds of output generated during the analysis job.

The execution environment (local, Proof, Grid) is specified as a simple
URL-like string, with room for environment-specific options. This
scheme allows a user to run the same analysis in various environments
by simply replacing the execution environment URL with another.
Helpers for each type of environment ensure that all needed steps are
taken for successful execution of the analysis, regardless of the
underlying execution environment.

Trains defined using this framework can be executed either in an
interactive AliROOT session or using a stand-alone program.

@section train_setup_usage Usage

Users should define a class that derives from TrainSetup. The class
should implement the member function TrainSetup::CreateTasks to add
needed tasks to the train. The derived class must also override the
member function TrainSetup::ClassName to return the name of the
derived class as a C-string.
@code
// MyTrain.C
class MyTrain : public TrainSetup
{
public:
  MyTrain(const char* name="MyTrain")
    : TrainSetup(name)
  {
    // fOptions.Set("type", "AOD"); // AOD input
    // fOptions.Set("type", "ESD"); // ESD input
    // Define the options of this train
    fOptions.Add("parameter", "VALUE", "Help on parameter", "value");
  }
protected:
  // Add the tasks to the train
  void CreateTasks(AliAnalysisManager* mgr)
  {
    AliAnalysisManager::SetCommonFileName("my_analysis.root");
    fHelper->LoadLibrary("MyAnalysis", true);
    Bool_t   mc    = mgr->GetMCtruthEventHandler() != 0;
    Double_t param = fOptions.AsDouble("parameter");
    gROOT->Macro(Form("AddTaskMyAnalysis.C(%f)",param));
  }
  // Name of this class
  const char* ClassName() const { return "MyTrain"; }
};
@endcode

(Please note that TrainSetup does not inherit from TObject, so one
should _not_ put in a call to the *ClassDef* macro.)

@section train_setup_params Parameters of the setup

Parameters of the user defined class deriving from TrainSetup are best
handled by adding options to the internal member @c fOptions in the
constructor, e.g.,

@code
fOptions.Add("<name>", "<dummy>", "<description>", "<default>");
fOptions.Add("<name>", "<dummy>", "<description>", defaultInt_t);
fOptions.Add("<name>", "<dummy>", "<description>", defaultLong64_t);
fOptions.Add("<name>", "<dummy>", "<description>", defaultDouble_t);
fOptions.Add("<name>", "<description>");
fOptions.Add("<name>", "<description>", defaultBool);
@endcode

The first four forms define a parameter that has a value, while the
last two forms define a flag (or toggle).
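To make the distinction between value options and flags concrete, the
following stand-alone sketch mimics it with a small container. Note
that `OptionSketch` and everything inside it are hypothetical names
chosen for illustration. It is not the class behind @c fOptions, and
its behaviour (text storage, fallback handling) is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for an option container (NOT the real fOptions class).
struct OptionSketch {
  struct Entry {
    std::string help;    // description shown to the user
    std::string value;   // current value, stored as text
    bool        isFlag;  // true for toggles, false for value options
  };
  std::map<std::string, Entry> entries;

  // Value option: name, dummy (placeholder for help output), description, default.
  void Add(const std::string& name, const std::string& dummy,
           const std::string& help, const std::string& def) {
    (void)dummy;  // only relevant when printing help, unused in this sketch
    entries[name] = Entry{help, def, false};
  }
  // Flag: name, description, and initial state.
  void Add(const std::string& name, const std::string& help, bool def = false) {
    entries[name] = Entry{help, def ? "true" : "false", true};
  }
  // Change the value of a known option; returns false for unknown names.
  bool Set(const std::string& name, const std::string& value) {
    std::map<std::string, Entry>::iterator it = entries.find(name);
    if (it == entries.end()) return false;
    it->second.value = value;
    return true;
  }
  // Numeric value of an option, or the fallback if unknown or empty.
  double AsDouble(const std::string& name, double fallback = 0) const {
    std::map<std::string, Entry>::const_iterator it = entries.find(name);
    if (it == entries.end() || it->second.value.empty()) return fallback;
    return std::stod(it->second.value);
  }
  // State of a flag, or the fallback if the name is unknown.
  bool AsBool(const std::string& name, bool fallback = false) const {
    std::map<std::string, Entry>::const_iterator it = entries.find(name);
    if (it == entries.end()) return fallback;
    return it->second.value == "true";
  }
};
```

In this sketch a value option always carries a piece of text that is
converted on request, whereas a flag is either on or off and needs no
value when it is given.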
The values or flags can be retrieved later by doing

@code
Double_t value = fOptions.AsDouble("<name>",<value if not set>);
Int_t    value = fOptions.AsInt("<name>",<value if not set>);
Long64_t value = fOptions.AsLong("<name>",<value if not set>);
Bool_t   value = fOptions.AsBool("<name>",<value if not set>);
TString  value = fOptions.Get("<name>");
Bool_t   value = fOptions.Has("<name>");
@endcode

Parameters defined this way are directly accessible as options to pass
to either runTrain or RunTrain.C.

@section train_setup_exec Execution of the train

A user defined TrainSetup class can then be run like

@code
Root> .x RunTrain.C("<class>", "<name>", "<uri>", "<options>")
@endcode

or using the program @b runTrain

@verbatim
> runTrain --class=<class> --name=<name> --url=<uri> [<options>]
@endverbatim

Here,

<dl>
  <dt>`<class>`</dt>
  <dd> is the name of the user defined class deriving from
    TrainSetup.</dd>
  <dt>`<name>`</dt>
  <dd> is an arbitrary name to give to the train. Note, an @e escaped
    @e name will be generated from this, which replaces all spaces and
    the like with '_' and (optionally) has the date and time
    appended.</dd>
  <dt>`<uri>`</dt>
  <dd> is the job execution URI, which specifies both the execution
    environment and the input data, as well as some options. See more
    below.</dd>
  <dt>`<options>`</dt>
  <dd> is a list of options. For RunTrain.C this is a comma-separated
    list of options of the form `<option>=<value>` for value options
    and `<option>` for flags (booleans). For @c runTrain, the options
    are of the traditional Unix long type: `--<option>=<value>` and
    `--<option>`. The exact list of options for a given train can be
    listed by passing the option @b help.</dd>
</dl>

See also ::RunTrain and ::main.

In both cases, a new sub-directory named after the @e escaped @e name
of the train is created, and various files are copied there, depending
on the mode of execution.

For local analysis, no additional files are copied there, but the
output will be put there.
For PROOF analysis, the needed PAR files are copied there and
expanded. The output of the job may end up in this directory if so
instructed.

For Grid analysis, various JDL and steering scripts are copied to this
directory. Scripts to run the merge/terminate stages and to download
the results are also generated for the user's convenience. The special
generated script <tt>Watch.C</tt> will monitor the progress of the jobs
and automatically execute the needed merging and terminate stages.
Various files needed by the train are copied to the Grid working
directory as a form of documentation.

In all cases, a file named @c ReRun.C (and for @b runTrain: rerun.sh)
is generated in this sub-directory. It contains the settings used for
the train and can easily be used to run jobs again, as well as serve
as a form of documentation.

@section train_setup_url_spec Execution URI

This URI has the form

@verbatim
<protocol>://[[<user>@]<host>]/<input>[?<options>][#<treename>]
@endverbatim

and specifies several things.

<dl>
  <dt>`<protocol>`</dt>
  <dd>One of
    <dl>
      <dt><tt>local</tt></dt>
      <dd>Local analysis on local data executed sequentially on the
        local machine</dd>
      <dt><tt>lite</tt></dt>
      <dd>Proof-Lite analysis on local data executed in parallel on
        the local machine</dd>
      <dt><tt>proof</tt></dt>
      <dd>Proof analysis on cluster data executed in parallel on a
        PROOF cluster</dd>
      <dt><tt>alien</tt></dt>
      <dd>Grid analysis on grid data executed on the Grid</dd>
    </dl>
  </dd>
  <dt>`[[<user>@]<host>]`</dt>
  <dd>Sets the master host for Proof analysis</dd>
  <dt>`<input>`</dt>
  <dd>Input data specification. The exact form depends on the
    protocol used, e.g., for local analysis it can be a single
    directory, while for other environments it could be a data set
    name, and so on.</dd>
  <dt>`<options>`</dt>
  <dd>Protocol-specific options</dd>
  <dt>`<treename>`</dt>
  <dd>If specified, the name of the tree to analyse</dd>
</dl>

@section train_setup_proof_spec PROOF specifics

Local and Grid jobs are in a sense very similar.
That is, the individual Grid jobs are very much like Local jobs, in
that they always produce output files (albeit not after Terminate,
though parameter container files are (re)made).

PROOF jobs are very different. In a PROOF analysis, each slave only
produces in-memory output, which is then sent via net connections
(sockets) to the master. One therefore needs to be very careful about
output object ownership and the like.

Another major difference is that output files are generated within the
PROOF cluster, and are generally not accessible from the outside. For
plain PROOF clusters on a local area network, or for so-called
<i>Lite</i> sessions, this is generally not a problem, since the files
are accessible on the LAN or, for Lite sessions, on the local machine.
However, for large scale analysis farms (AAFs), the workers and masters
are generally on an inaccessible sub-net, and there is no direct access
to the produced files. For normal output files, such as histogram
files, there are provisions for this: the final merged output is sent
back to the client. Special output, such as AODs, is however neither
merged nor sent back to the user by default. There are two ways to deal
with this:

<ol>
  <li> Register the output tree as a data set on the cluster. This is
    useful if you need to process the results again on the
    cluster.</li>
  <li> Send the output to a (possibly custom) XRootd server. This is
    useful if you need to process the output outside of the
    cluster.</li>
</ol>

The first mode is specified by passing the option
<tt>dsname=</tt><i>&lt;name&gt;</i> in the cluster URI. The created
dataset will normally be made in
<tt>/default/</tt><i>&lt;user&gt;</i><tt>/</tt><i>&lt;name&gt;</i>. If
the <tt>=</tt><i>&lt;name&gt;</i> part is left out, the <i>escaped
name</i> of the job will be used.

The second mode is triggered by passing the option
<tt>storage=<i>URI</i></tt> to the train setup.
The <i>URI</i> should be of the form

@verbatim
rootd://<host>[:<port>]/<path>
@endverbatim

where <i>&lt;host&gt;</i> is the name of a machine accessible by the
cluster, <i>&lt;port&gt;</i> is an optional port number (e.g., if
different from 1093), and <i>&lt;path&gt;</i> is an absolute path on
<i>&lt;host&gt;</i>.

The XRootd process should be started (optionally by the user) on
<i>&lt;host&gt;</i> as

@verbatim
xrootd -p <port> <path>
@endverbatim

When running jobs on AAFs, one can use the Grid handler to set up
aspects of the job. To enable the Grid handler, pass the option
<tt>plugin</tt> in the execution URI.

@section train_setup_input Specifying the input

@subsection train_setup_local Local and Lite data input

For both ESD and AOD input for local jobs, one must specify the root of
the sub-tree that holds the data. That is, if, for example, the data
resides in a directory structure like

@verbatim
/some/directory/<run>/<seq>/AliESDs.root
@endverbatim

then one should specify the input location like

@verbatim
local:///some/directory[?pattern=AliESDs.root][#esdTree]
lite:///some/directory[?pattern=AliESDs.root][#esdTree]
@endverbatim

<tt>/some/directory</tt> is then searched recursively for input files
that match the pattern given by the analysis type (ESD:
<tt>AliESDs.root</tt>, AOD: <tt>AliAOD.root</tt>). The found files are
then chained together. If MC input is specified, then the companion
files <tt>galice.root</tt>, <tt>Kinematics.root</tt>, and
<tt>TrackRefs.root</tt> must be found in the same directories as the
<tt>AliESDs.root</tt> files.

@subsection train_setup_proof PROOF input

The input data for a PROOF-based analysis is specified as a data set
name:

@verbatim
proof://[<user>@]<host>/<data-set-name>[?options][#<treename>]
@endverbatim

@subsection train_setup_grid_esd Grid ESD input

Suppose the ESD files are stored on the Grid as

@verbatim
/alice/data/<year>/<period>/<run>/ESDs/pass<no>/<year><run><chunk>.<part>/AliESDs.root
@endverbatim

where `<run>` is zero-padded by typically 3 '0's.
One should specify the input location like

@verbatim
alien:///alice/data/<year>/<period>?pattern=ESDs/pass<no>/*&run=<run>[#<treename>]
@endverbatim

If a particular kind of pass is needed, say <tt>pass<no>_MUON</tt>, one
should modify the <tt>pattern</tt> option accordingly, so that it
matches

@verbatim
/alice/data/<year>/<period>/<run>/ESDs/pass<no>_MUON/* /AliESDs.root
@endverbatim

For simulation output, the files are generally stored like

@verbatim
/alice/sim/<year>/<prod>/<run>/<seq>/AliESDs.root
@endverbatim

where `<run>` is generally @e not zero-padded. One should specify the
input location like

@verbatim
alien:///alice/sim/<year>/<prod>?pattern=*&mc&run=<run>[#<treename>]
@endverbatim

@subsection train_setup_grid_aod Grid AOD input

Suppose your AOD files are placed in directories like

@verbatim
/some/directory/<run>/<seq>/AliAOD.root
@endverbatim

where `<run>` is zero-padded by typically 3 '0's. One should then
specify the input as

@verbatim
alien:///some/directory?pattern=*&run=<run>[#<treename>]
@endverbatim

The AliEn analysis plug-in is then instructed to look for data files
under

@verbatim
/some/directory/<run>/* /AliAOD.root
@endverbatim

for each added run.
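As an illustration of how an execution URI of the form
`<protocol>://[[<user>@]<host>]/<input>[?<options>][#<treename>]`
breaks down into its parts, here is a minimal stand-alone sketch. The
names `UriParts` and `SplitUri` are hypothetical and chosen for this
example only. This is not the parser used by TrainSetup, and details
such as keeping the leading slash of the input are assumptions.

```cpp
#include <map>
#include <string>

// Hypothetical holder for the pieces of an execution URI.
struct UriParts {
  std::string protocol, user, host, input, tree;
  std::map<std::string, std::string> options;  // bare flags map to ""
};

// Split "<protocol>://[[<user>@]<host>]/<input>[?<options>][#<treename>]".
// Returns false if the "://" separator is missing.
inline bool SplitUri(const std::string& uri, UriParts& out) {
  out = UriParts();
  std::string::size_type sep = uri.find("://");
  if (sep == std::string::npos) return false;
  out.protocol = uri.substr(0, sep);
  std::string rest = uri.substr(sep + 3);

  // The tree name follows the last '#'.
  std::string::size_type hash = rest.rfind('#');
  if (hash != std::string::npos) {
    out.tree = rest.substr(hash + 1);
    rest.erase(hash);
  }
  // The options follow the first '?'.
  std::string opts;
  std::string::size_type q = rest.find('?');
  if (q != std::string::npos) {
    opts = rest.substr(q + 1);
    rest.erase(q);
  }
  // User and host come before the first '/'; the input keeps that slash.
  std::string::size_type slash = rest.find('/');
  std::string auth = rest.substr(0, slash);
  if (slash != std::string::npos) out.input = rest.substr(slash);
  std::string::size_type at = auth.find('@');
  if (at != std::string::npos) {
    out.user = auth.substr(0, at);
    out.host = auth.substr(at + 1);
  } else {
    out.host = auth;
  }
  // Options are separated by '&'; each is "key=value" or a bare flag.
  std::string::size_type pos = 0;
  while (pos < opts.size()) {
    std::string::size_type amp = opts.find('&', pos);
    if (amp == std::string::npos) amp = opts.size();
    std::string item = opts.substr(pos, amp - pos);
    std::string::size_type eq = item.find('=');
    if (!item.empty()) {
      if (eq == std::string::npos) out.options[item] = "";
      else out.options[item.substr(0, eq)] = item.substr(eq + 1);
    }
    pos = amp + 1;
  }
  return true;
}
```

For instance, splitting
`alien:///some/directory?pattern=*&run=138190#aodTree` (the run number
is made up) with this sketch gives the protocol `alien`, an empty host,
the input `/some/directory`, the options `pattern=*` and `run=138190`,
and the tree name `aodTree`.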
Suppose the AODs are in

@verbatim
/alice/data/<year>/<period>/<run>/ESDs/pass<no>/AOD<vers>/<seq>/AliAOD.root
@endverbatim

Then the URL should be

@verbatim
alien:///alice/data/<year>/<period>?pattern=ESDs/pass<no>/AOD<vers>/*&run=<run>[#<treename>]
@endverbatim

@section train_setup_other Other features

@subsection train_setup_aux Auxiliary libraries, sources, and files

Auxiliary libraries should be loaded using

@code
Helper::LoadLibrary(const char*)
@endcode

where the argument is the name of the library.

If the train needs additional files, say a script for setting up the
tasks, or some data file, these can be passed on to the PROOF/Grid
workers using the member functions

@code
Helper::LoadAux(const char*)
Helper::LoadSource(const TString&,bool)
@endcode

@subsection train_setup_overload Overriding the behaviour

The base class TrainSetup tries to implement a sensible setup for a
given type of analysis, but sometimes a particular train needs a bit
of tweaking. One can therefore override the following functions

- TrainSetup::CreateInputHandler(UShort_t)
- TrainSetup::CreateMCHandler(UShort_t,bool)
- TrainSetup::CreateOutputHandler(UShort_t)
- TrainSetup::CreatePhysicsSelection(Bool_t,AliAnalysisManager*)
- TrainSetup::CreateCentralitySelection(Bool_t,AliAnalysisManager*)

@section train_setup_scripts Tasks defined in scripts

A task can even be defined in a script, for example a task like

@include MyAnalysis.C

Our train set-up can then use the member function
ParUtilities::MakeScriptPAR to make a PAR file of the script, use that
to make a library that is loaded on the workers, and then generate an
object of our task defined in the script.
@include MyTrain.C

This can allow for fast development and testing of analysis tasks
without having to wait for official tasks and builds of all of AliROOT.

@section train_setup_impl Implementation details

@subsection train_setup_imp_helper Helpers

The specifics of each possible execution environment and input are
handled by sub-classes of the base class Helper. Each of these helpers
defines

- URI options
- Steps to be done before the tasks are added to the train
- How to load libraries, additional scripts and files
- Steps to be done after the setup of tasks
- How to execute the analysis

Currently defined helpers are

- LocalHelper for local jobs
- ProofHelper for jobs running on a PROOF farm
- LiteHelper for jobs running in a PROOF-Lite session
- AAFHelper, a special kind of ProofHelper for jobs running on AAFs
- AAFPluginHelper, like AAFHelper but using the AliEn plug-in
- GridHelper for Grid jobs
*/
//
// EOF
//