ROOT 6.10/00
Reference Guide
tdf001_introduction.py File Reference

Namespaces

 tdf001_introduction
 

Detailed Description

This tutorial illustrates the basic features of the TDataFrame class, a utility for interacting with data stored in TTrees through a functional, chain-like approach.

import ROOT

# A simple helper function to fill a test tree: this makes the example stand-alone.
fill_tree_code = '''
void fill_tree(const char *filename, const char *treeName)
{
    TFile f(filename, "RECREATE");
    TTree t(treeName, treeName);
    double b1;
    int b2;
    t.Branch("b1", &b1);
    t.Branch("b2", &b2);
    for (int i = 0; i < 10; ++i) {
        b1 = i;
        b2 = i * i;
        t.Fill();
    }
    t.Write();
    f.Close();
    return;
}
'''
# We prepare an input tree to run on
fileName = "tdf001_introduction_py.root"
treeName = "myTree"
ROOT.gInterpreter.Declare(fill_tree_code)
ROOT.fill_tree(fileName, treeName)


# We read the tree from the file and create a TDataFrame, a class that
# allows us to interact with the data contained in the tree.
TDF = ROOT.ROOT.Experimental.TDataFrame
d = TDF(treeName, fileName)

# Operations on the dataframe
# We now review some *actions* which can be performed on the data frame.
# All actions except ForEach return a TActionResultPtr<T>. The series of
# operations on the data frame is not executed until one of those pointers
# is accessed.
# But first of all, let us define our cut-flow with two strings.
# Filters can be expressed as strings. The content must be C++ code, and
# the variable names must match the branch names. The code is just-in-time
# compiled.
cutb1 = 'b1 < 5.'
cutb1b2 = 'b2 % 2 && b1 < 4.'
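The comments above state that nothing is executed until a result is accessed. The following plain-Python sketch illustrates that lazy-evaluation idea only; it is not the ROOT API. The classes `LazyFrame` and `LazyResult` are hypothetical, and Python lambdas stand in for the just-in-time compiled C++ strings.

```python
class LazyResult:
    """Hypothetical stand-in for a lazily evaluated action result."""

    def __init__(self, compute):
        self._compute = compute  # function that performs the loop over entries
        self._value = None
        self._done = False

    def GetValue(self):
        # The computation runs only on first access; the value is then cached.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


class LazyFrame:
    """Hypothetical, minimal chain of filters over a list of rows."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def Filter(self, predicate):
        # Filtering only records the predicate; no row is inspected yet.
        return LazyFrame(self._rows, self._predicates + (predicate,))

    def Count(self):
        def loop():
            return sum(1 for r in self._rows
                       if all(p(r) for p in self._predicates))
        return LazyResult(loop)


# Rows equivalent to the test tree: b1 = i, b2 = i * i for i in 0..9.
rows = [{"b1": float(i), "b2": i * i} for i in range(10)]
counter = LazyFrame(rows).Filter(lambda r: r["b1"] < 5.).Count()
print(counter.GetValue())  # the loop over rows happens here; prints 5
```

Separating "booking" (`Filter`, `Count`) from "running" (`GetValue`) is what lets a framework collect several operations before touching the data.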

# `Count` action
# The `Count` action returns the number of entries that passed the
# filters. Since the filters are given as strings, the columns they use
# are deduced automatically from the expressions: the user does not need
# to specify them.
entries1 = d.Filter(cutb1) \
            .Filter(cutb1b2) \
            .Count()

print("%s entries passed all filters" % entries1.GetValue())

entries2 = d.Filter("b1 < 5.").Count()
print("%s entries passed all filters" % entries2.GetValue())
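To make the two counts concrete, the plain-Python sketch below applies the same cuts to rows equivalent to those written by `fill_tree` (`b1 = i`, `b2 = i * i` for `i` in 0..9). It is an illustration, not ROOT code; the C++ condition `b2 % 2` (true when non-zero) is translated by hand as `b2 % 2 != 0`. Under these assumptions the ROOT script should report 2 and 5 entries.

```python
# Rows equivalent to those written by fill_tree: b1 = i (double), b2 = i*i (int).
rows = [{"b1": float(i), "b2": i * i} for i in range(10)]


def cutb1(r):
    # Python translation of the C++ string 'b1 < 5.'
    return r["b1"] < 5.


def cutb1b2(r):
    # Python translation of 'b2 % 2 && b1 < 4.'; in C++ a non-zero int is true.
    return r["b2"] % 2 != 0 and r["b1"] < 4.


entries1 = sum(1 for r in rows if cutb1(r) and cutb1b2(r))
entries2 = sum(1 for r in rows if cutb1(r))
print("%s entries passed all filters" % entries1)  # 2 (i = 1 and i = 3)
print("%s entries passed all filters" % entries2)  # 5 (i = 0..4)
```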

# `Min`, `Max` and `Mean` actions
# These actions retrieve statistical information about the entries
# passing the cuts, if any.
b1b2_cut = d.Filter(cutb1b2)
minVal = b1b2_cut.Min('b1')
maxVal = b1b2_cut.Max('b1')
meanVal = b1b2_cut.Mean('b1')
nonDefmeanVal = b1b2_cut.Mean("b2")
print("The mean is always included between the min and the max: %s <= %s <= %s" % (minVal.GetValue(), meanVal.GetValue(), maxVal.GetValue()))
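As a cross-check of what these statistics amount to, here is a plain-Python sketch (not ROOT code) computing the same quantities over the entries passing `cutb1b2`, assuming the test data from `fill_tree`. Only `b1 = 1` and `b1 = 3` pass, so the expected values follow directly.

```python
rows = [{"b1": float(i), "b2": i * i} for i in range(10)]

# Entries passing 'b2 % 2 && b1 < 4.', translated to Python by hand.
passing = [r for r in rows if r["b2"] % 2 != 0 and r["b1"] < 4.]

b1_values = [r["b1"] for r in passing]
minVal = min(b1_values)
maxVal = max(b1_values)
meanVal = sum(b1_values) / len(b1_values)
nonDefmeanVal = sum(r["b2"] for r in passing) / len(passing)

print("%s <= %s <= %s" % (minVal, meanVal, maxVal))  # 1.0 <= 2.0 <= 3.0
print(nonDefmeanVal)  # 5.0, i.e. (1 + 9) / 2
```

If no entry passed the cuts, this sketch would raise an exception (`min` of an empty list, division by zero); it does not attempt to reproduce how ROOT reports an empty selection.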

# `Histo1D` action
# The `Histo1D` action fills a histogram. It returns a TH1F filled with
# the values of the column that passed the filters. For the most common
# types, the type of the values stored in the column is automatically
# guessed.
hist = d.Filter(cutb1).Histo1D('b1')
print("Filled h %s times, mean: %s" % (hist.GetEntries(), hist.GetMean()))
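The sketch below shows, in plain Python, what filling a one-dimensional histogram involves: locating the bin for each value and updating the statistics. `SimpleHisto` is a hypothetical class with a fixed binning chosen for this example; it is not ROOT's `TH1F`, whose automatic binning and under/overflow conventions may differ.

```python
class SimpleHisto:
    """Hypothetical fixed-binning 1D histogram (not ROOT's TH1F)."""

    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self.counts = [0] * (nbins + 2)  # index 0: underflow, nbins + 1: overflow
        self.entries = 0
        self.sum_x = 0.0

    def Fill(self, x):
        if x < self.lo:
            idx = 0
        elif x >= self.hi:
            idx = self.nbins + 1
        else:
            # Clamp to the last regular bin to guard against rounding.
            raw = int((x - self.lo) * self.nbins / (self.hi - self.lo))
            idx = 1 + min(raw, self.nbins - 1)
        self.counts[idx] += 1
        self.entries += 1
        self.sum_x += x

    def GetEntries(self):
        return self.entries

    def GetMean(self):
        # In this sketch the mean uses every filled value, in range or not.
        return self.sum_x / self.entries if self.entries else 0.0


rows = [{"b1": float(i), "b2": i * i} for i in range(10)]
h = SimpleHisto(10, 0., 10.)
for r in rows:
    if r["b1"] < 5.:  # 'b1 < 5.' as a Python predicate
        h.Fill(r["b1"])
print("Filled h %s times, mean: %s" % (h.GetEntries(), h.GetMean()))
```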

# Express your chain of operations with clarity!
# We are discussing a simple example here, but it is not hard to imagine
# much more complex pipelines of actions acting on data. Those might require
# well-organised code, for example to add filters conditionally or to
# clearly separate filters and actions without writing the entire pipeline
# on one line. This is easily achieved by storing intermediate nodes of the
# chain in variables. We show this by reworking the `Count` example:
cutb1_result = d.Filter(cutb1)
cutb1b2_result = d.Filter(cutb1b2)
cutb1_cutb1b2_result = cutb1_result.Filter(cutb1b2)

# Now we want to count:
evts_cutb1_result = cutb1_result.Count()
evts_cutb1b2_result = cutb1b2_result.Count()
evts_cutb1_cutb1b2_result = cutb1_cutb1b2_result.Count()

print("Events passing cutb1: %s" % evts_cutb1_result.GetValue())
print("Events passing cutb1b2: %s" % evts_cutb1b2_result.GetValue())
print("Events passing both: %s" % evts_cutb1_cutb1b2_result.GetValue())
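Storing intermediate nodes works because each `Filter` call branches off a new node that remembers its parent, so an existing node can be reused. The following plain-Python sketch models that idea with a hypothetical `Node` class; unlike TDataFrame it counts eagerly, and it is only meant to show the graph structure and conditional filtering.

```python
class Node:
    """Hypothetical node of a filter graph: each node links to its parent."""

    def __init__(self, rows, predicate=None, parent=None):
        self.rows = rows
        self.predicate = predicate
        self.parent = parent

    def Filter(self, predicate):
        # Branch off a new node; the current node stays reusable.
        return Node(self.rows, predicate, self)

    def passes(self, row):
        # A row passes if it satisfies this node's predicate and all ancestors'.
        node = self
        while node is not None:
            if node.predicate is not None and not node.predicate(row):
                return False
            node = node.parent
        return True

    def Count(self):
        return sum(1 for r in self.rows if self.passes(r))


def cutb1(r):
    return r["b1"] < 5.


def cutb1b2(r):
    return r["b2"] % 2 != 0 and r["b1"] < 4.


rows = [{"b1": float(i), "b2": i * i} for i in range(10)]
d = Node(rows)
cutb1_result = d.Filter(cutb1)
cutb1b2_result = d.Filter(cutb1b2)
cutb1_cutb1b2_result = cutb1_result.Filter(cutb1b2)

# A filter can also be added conditionally (hypothetical flag).
apply_extra_cut = True
selection = cutb1_result
if apply_extra_cut:
    selection = selection.Filter(lambda r: r["b2"] > 0)

print(cutb1_result.Count(), cutb1b2_result.Count(), cutb1_cutb1b2_result.Count())  # 5 2 2
print(selection.Count())  # 4 (i = 1..4)
```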

# Calculating quantities starting from existing columns
# Often, operations need to be carried out on quantities calculated from
# the ones present in the columns. In this example we create a third column
# whose values are the sum of the *b1* and *b2* ones, entry by entry. Here
# the new quantity is defined via a string containing a C++ expression,
# just-in-time compiled like the filters above.
# It is important to note three aspects at this point:
# - The value is created on the fly only if the entry passed the existing
#   filters.
# - The newly created column behaves like a column stored in the file on disk.
# - The operation creates a new value, without modifying anything. In effect,
#   this is like having a general container at our disposal, able to
#   accommodate any value of any type.
# Let's dive into an example:
entries_sum = d.Define('sum', 'b2 + b1') \
               .Filter('sum > 4.2') \
               .Count()
print(entries_sum.GetValue())
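The first and third points in the comments above can be illustrated in plain Python. In this sketch (not ROOT code) the hypothetical function `define_sum` plays the role of `Define('sum', 'b2 + b1')`: it is evaluated only for entries that passed any upstream cut, and its result is never written back into the input rows. With the test data, `sum = i + i*i` exceeds 4.2 for `i >= 2`, so 8 entries pass.

```python
rows = [{"b1": float(i), "b2": i * i} for i in range(10)]
calls = {"sum": 0}  # counts how often the defined expression is evaluated


def define_sum(row):
    # Hypothetical Python counterpart of Define('sum', 'b2 + b1').
    calls["sum"] += 1
    return row["b2"] + row["b1"]


def run(rows, upstream_cut=None):
    """Count entries with sum > 4.2, computing 'sum' only for entries that
    passed the optional upstream cut. The input rows are never modified."""
    n = 0
    for row in rows:
        if upstream_cut is not None and not upstream_cut(row):
            continue  # 'sum' is never computed for this entry
        s = define_sum(row)  # the value exists only for the current entry
        if s > 4.2:
            n += 1
    return n


entries_sum = run(rows)
print(entries_sum)   # 8
print(calls["sum"])  # 10: no upstream filter, evaluated for every entry

calls["sum"] = 0
print(run(rows, upstream_cut=lambda r: r["b1"] < 5.))  # 3 (i = 2, 3, 4)
print(calls["sum"])  # 5: evaluated only for entries passing b1 < 5.
```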
Date
May 2017
Author
Danilo Piparo

Definition in file tdf001_introduction.py.