The Dataset Stager
==================

Overview
--------

The [Dataset Stager (afdsmgrd)](http://afdsmgrd.googlecode.com/) is
a daemon that coordinates the transfer of data from remote storage
to your local storage.

For each file to transfer, the daemon calls a script. The script can be
customized to support your source and destination protocols.

Staging requests are issued from the ROOT console, where you can also
monitor the progress of your staging. How to request staging and how to
check current transfer progress from ROOT is explained in the [PROOF interface
to AliEn file catalog documentation](TDataSetManagerAliEn.html).

Installation
------------

The Dataset Stager is distributed both in its own repository and as
part of ROOT. The easiest way to compile it is from within the ROOT
source tree.

Installing from ROOT
--------------------

When configuring the ROOT source, enable the Dataset Stager by adding
`--enable-afdsmgrd`. Check that "afdsmgrd" appears in the list of
enabled features.

After running `make` (and, optionally, `make install`) you'll find the
daemon in the same directory as `root.exe`.

The configuration file and init.d startup script will be in
`$ROOTSYS/etc/proof`. The daemon can and **must** run as an unprivileged
user.

Configuration
-------------

The Dataset Stager can share its configuration file with PROOF, as
some directives are the same and unknown directives are simply ignored.

Directives are written one per line. Lines beginning with a pound sign
(`#`) are comments.

> The configuration file is automatically checked at each loop: this
> means you can change the configuration without restarting the daemon or
> stopping your current transfers.

A detailed description of each directive follows.

set *VARIABLE=value*
:   This statement substitutes every occurrence of `$VARIABLE` with
    its *value* in the rest of the configuration file. You can have
    multiple `set` statements.

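The substitution described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the daemon's parser: the function name `expand_set_directives`, the `AFROOT` variable, and the rule that longer variable names are substituted first are all assumptions made for the example.

```python
def expand_set_directives(config_text):
    """Apply `set VARIABLE=value` lines to the lines that follow them.

    Hypothetical model of the documented behaviour, not afdsmgrd code.
    """
    variables = {}
    output = []
    for line in config_text.splitlines():
        # Substitute every variable defined so far. Longest names go first
        # so that a short name never clobbers part of a longer one.
        for name in sorted(variables, key=len, reverse=True):
            line = line.replace("$" + name, variables[name])
        stripped = line.strip()
        if stripped.startswith("set ") and "=" in stripped:
            # Record the variable; the `set` line itself is not emitted.
            name, value = stripped[4:].split("=", 1)
            variables[name.strip()] = value.strip()
            continue
        output.append(line)
    return "\n".join(output)


SAMPLE = """set AFROOT=/opt/aaf
xpd.stagereqrepo $AFROOT/var/proof/datasets
dsmgrd.stagecmd $AFROOT/bin/af-xrddm-verify.sh "$URLTOSTAGE" "$TREENAME"
"""

EXPANDED = expand_set_directives(SAMPLE)
print(EXPANDED)
```

Names that were never defined with `set`, such as `$URLTOSTAGE` in this sample, are left untouched by the sketch.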
xpd.stagereqrepo [dir:]*directory*
:   This directive is shared with PROOF: *directory* is the full path to
    the dataset repository. **Defaults to empty:** without this
    directive the daemon is not operative.

    The `dir:` prefix is optional.

dsmgrd.purgenoopds *true|false*
:   Set it to *true* **(default is false)** to remove a dataset when no file to stage
    is found. If no file to stage is found, but corrupted files exist, the
    dataset is kept to signal failures. Used in combination with `xpd.stagereqrepo`,
    this makes the repository "disposable": only the datasets actually needed for
    signaling the staging status are kept, improving scalability and stability.

dsmgrd.urlregex *regex* *subst*
:   Each source URL present in the datasets is matched against *regex*
    and rewritten according to *subst*. *regex* supports grouping using
    parentheses, and groups can be referenced in order in *subst* using
    the dollar sign followed by a number (`$1` for instance).

    Matching and substitution for multiple URL schemas are supported
    through the additional directives `dsmgrd.urlregex2` up to
    `dsmgrd.urlregex4`, which have the same syntax as this one.

    Example of URL translation via regexp:

    > -   Configuration line:
    >
    >         dsmgrd.urlregex alien://(.*)$ root://xrd.cern.ch/$1
    >
    > -   Source URL:
    >
    >         alien:///alice/data/2012/LHC12b/000178209/ESDs/pass1/12000178209061.17/AliESDs.root
    >
    > -   Resulting URL:
    >
    >         root://xrd.cern.ch//alice/data/2012/LHC12b/000178209/ESDs/pass1/12000178209061.17/AliESDs.root

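The translation in the example above can be reproduced with Python's `re` module. This is a sketch for checking a *regex*/*subst* pair offline: the helper `translate_url` is hypothetical, and Python's regular expression dialect is assumed to be close enough to the daemon's for simple patterns like this one, which may not hold for more elaborate expressions.

```python
import re


def translate_url(url, regex, subst):
    """Rewrite `url` the way a dsmgrd.urlregex line is documented to.

    Hypothetical helper, not part of afdsmgrd.
    """
    # The directive references groups as $1, $2, ...; Python's re.sub
    # expects \g<1>, \g<2>, ... so convert the references first.
    py_subst = re.sub(r"\$(\d+)", r"\\g<\1>", subst)
    return re.sub(regex, py_subst, url)


SOURCE = ("alien:///alice/data/2012/LHC12b/000178209/ESDs/pass1/"
          "12000178209061.17/AliESDs.root")
RESULT = translate_url(SOURCE, r"alien://(.*)$", "root://xrd.cern.ch/$1")
print(RESULT)
```

The double slash after the host name in the result comes from the captured group, which starts with the leading `/` of the AliEn path.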
dsmgrd.sleepsecs *secs*
:   Seconds to sleep between loops. At each loop, the Dataset Stager
    checks the status of the managed transfers. Defaults to **30
    seconds**.

dsmgrd.scandseveryloops *n*
:   Every *n* loops, the dataset repository is checked for new
    staging requests. Defaults to **10**.

dsmgrd.parallelxfrs *n*
:   Number of concurrent transfers. Defaults to **8**.

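Taken together, `dsmgrd.sleepsecs` and `dsmgrd.scandseveryloops` determine how often new staging requests are picked up. The arithmetic can be sketched as follows; the function name is hypothetical, and the result is a lower bound because it assumes the work done inside each loop takes negligible time.

```python
def scan_interval_secs(sleepsecs=30, scandseveryloops=10):
    """Approximate seconds between two scans of the dataset repository.

    Assumes each loop costs only its sleep time (a simplification).
    """
    return sleepsecs * scandseveryloops


# With the documented defaults (30 s sleep, scan every 10 loops).
print(scan_interval_secs())

# With the values used in the sample configuration file (20 s, 30 loops).
print(scan_interval_secs(sleepsecs=20, scandseveryloops=30))
```

Under this simplification the defaults give a scan roughly every 300 seconds, and the sample configuration roughly every 600 seconds.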
dsmgrd.stagecmd *shell\_command*
:   Command to run in order to stage each file. It can be anything you
    want (an executable, a shell script...). If you add `$URLTOSTAGE` and/or
    `$TREENAME` to the *shell\_command*, they are substituted
    respectively with the destination URL and the default ROOT tree name
    in the file (as specified in the dataset staging request from ROOT).

    An example:

        dsmgrd.stagecmd /path/to/afdsmgrd-xrd-stage-verify.sh "$URLTOSTAGE" "$TREENAME"

    The return value of the command is ignored: only its standard output
    is considered, as explained here.

    Defaults to `/bin/false`.

dsmgrd.cmdtimeoutsecs *secs*
:   Timeout on the staging command, expressed in seconds: after this
    timeout, the command is considered failed and it is killed (first
    with `SIGSTOP`, then, if it is unresponsive, with `SIGKILL`).
    Defaults to **0 (no timeout)**.

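The two-step escalation described above can be sketched in Python on a POSIX system. This is not the daemon's code: `run_with_timeout` and its `grace_secs` parameter are hypothetical, and the length of the pause between the first signal and `SIGKILL` is an assumption, since it is not stated here.

```python
import signal
import subprocess


def run_with_timeout(argv, timeout_secs, grace_secs=1.0,
                     first_signal=signal.SIGSTOP):
    """Run argv; on timeout send first_signal, then SIGKILL if needed.

    Hypothetical sketch of the documented escalation (POSIX only).
    """
    proc = subprocess.Popen(argv)
    try:
        # A timeout of 0 means "no timeout", as in the directive.
        proc.wait(timeout=timeout_secs if timeout_secs > 0 else None)
        return "finished"
    except subprocess.TimeoutExpired:
        proc.send_signal(first_signal)
        try:
            # Give the process a short grace period to go away.
            proc.wait(timeout=grace_secs)
        except subprocess.TimeoutExpired:
            proc.kill()  # sends SIGKILL on POSIX
            proc.wait()  # reap the child so no zombie is left behind
        return "timed out"
```

A call such as `run_with_timeout(["/bin/sleep", "60"], timeout_secs=2)` returns `"timed out"` after the timeout and the grace period have elapsed.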
dsmgrd.corruptafterfails *n*
:   Set this to a number above zero to tell the daemon to mark files as
    corrupted after a certain number of either download or verification
    failures. A value of **0 (default)** tells the daemon to retry
    forever.

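The rule can be written as a small predicate. This is a sketch of the documented behaviour only: the function name is hypothetical, and it assumes a file is marked as corrupted as soon as its failure count reaches *n*.

```python
def should_mark_corrupted(failures, corruptafterfails=0):
    """Decide whether a file with `failures` failed attempts is corrupted.

    Hypothetical model: 0 (the default) means "retry forever".
    """
    if corruptafterfails <= 0:
        return False
    # Assumption: the threshold is inclusive.
    return failures >= corruptafterfails


print(should_mark_corrupted(failures=100))                      # default: keep retrying
print(should_mark_corrupted(failures=2, corruptafterfails=3))   # below the threshold
print(should_mark_corrupted(failures=3, corruptafterfails=3))   # threshold reached
```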
Configuring the MonALISA monitoring plugin
------------------------------------------

The Dataset Stager supports generic monitoring plugins. The only plugin
distributed with the stager is the MonALISA monitoring plugin.

dsmgrd.notifyplugin */path/to/libafdsmgrd\_notify\_apmon.so*
:   Set it to the path of the MonALISA plugin shared object. By default,
    the notification plugin is disabled.

dsmgrd.apmonurl *apmon://apmon.cern.ch*
:   This variable tells the ApMon notification plugin how to contact one
    or more MonALISA servers to activate monitoring via ApMon. It
    supports two kinds of URLs:

    -   `http[s]://host/path/configuration_file.conf` (a remote file
        from which to fetch the list of servers)

    -   `apmon://[:password@]monalisahost[:8884]` (a single server to
        contact directly)

    If the variable is not set but the plugin is loaded, MonALISA
    monitoring is inhibited until a valid configuration variable is
    provided.

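The two URL forms can be told apart with Python's standard `urllib.parse`. This is an illustrative sketch, not the plugin's parser: `parse_apmon_url`, the host `monalisa.example.org`, and the password are made up, and treating 8884 as the default port is an assumption drawn from the `[:8884]` shown in the syntax above.

```python
from urllib.parse import urlsplit

# Assumed default, inferred from the optional "[:8884]" in the syntax.
DEFAULT_APMON_PORT = 8884


def parse_apmon_url(url):
    """Classify a dsmgrd.apmonurl value (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme in ("http", "https"):
        # Remote configuration file listing the servers.
        return {"kind": "remote-config", "url": url}
    if parts.scheme == "apmon" and parts.hostname:
        # Single server contacted directly; password and port are optional.
        return {
            "kind": "direct",
            "host": parts.hostname,
            "port": parts.port or DEFAULT_APMON_PORT,
            "password": parts.password,
        }
    raise ValueError("unsupported apmonurl: " + url)


print(parse_apmon_url("apmon://apmon.cern.ch"))
print(parse_apmon_url("apmon://:secret@monalisa.example.org:9000"))
```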
dsmgrd.apmonprefix *MY::CLUSTER::PREFIX*
:   Since MonALISA organizes information in "clusters" and "hosts", here
    you can specify what to use as the cluster prefix for monitoring
    dataset information and daemon status. If this variable is not set,
    MonALISA monitoring is inhibited. Please note that the suffix
    `_datasets` or `_status` is appended for each of the two types of
    monitoring.

A sample configuration file
---------------------------

    xpd.stagereqrepo /opt/aaf/var/proof/datasets
    dsmgrd.purgenoopds true
    dsmgrd.urlregex alien://(.*)$ /storage$1
    dsmgrd.sleepsecs 20
    dsmgrd.scandseveryloops 30
    dsmgrd.parallelxfrs 10
    dsmgrd.stagecmd /opt/aaf/bin/af-xrddm-verify.sh "$URLTOSTAGE" "$TREENAME"
    dsmgrd.cmdtimeoutsecs 3600
    dsmgrd.corruptafterfails 0