This macro provides examples of training and testing TMVA classifiers in categorisation mode.
The input data is a toy-MC sample consisting of four Gaussian-distributed, linearly correlated input variables with category-dependent (eta-dependent) properties.
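For this example, only Fisher and Likelihood are used, each trained both inclusively and per eta category (FisherCat and LikelihoodCat). Run via:
root -l TMVAClassificationCategory.C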
The output file "TMVA.root" can be analysed with dedicated macros (run each with: root -l <macro.C>), which can be conveniently invoked through a GUI that appears at the end of the run of this macro.
Processing /builddir/build/BUILD/root-6.10.00/tutorials/tmva/TMVAClassificationCategory.C...
==> Start TMVAClassificationCategory
--- TMVAClassificationCategory: Accessing /builddir/build/BUILD/root-6.10.00/tutorials/tmva/data/toy_sigbkg_categ_offset.root
DataSetInfo : [dataset] : Added class "Signal"
: Add Tree TreeS of type Signal with 10000 events
DataSetInfo : [dataset] : Added class "Background"
: Add Tree TreeB of type Background with 10000 events
Factory : Booking method: Fisher
:
Factory : Booking method: Likelihood
:
Factory : Booking method: FisherCat
:
: Adding sub-classifier: Fisher::Category_Fisher_1
DataSetInfo : [Category_Fisher_1_dsi] : Added class "Signal"
DataSetInfo : [Category_Fisher_1_dsi] : Added class "Background"
: Adding sub-classifier: Fisher::Category_Fisher_2
DataSetInfo : [Category_Fisher_2_dsi] : Added class "Signal"
DataSetInfo : [Category_Fisher_2_dsi] : Added class "Background"
Factory : Booking method: LikelihoodCat
:
: Adding sub-classifier: Likelihood::Category_Likelihood_1
DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Signal"
DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Background"
: Adding sub-classifier: Likelihood::Category_Likelihood_2
DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Signal"
DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Background"
Factory : Train all methods
DataSetFactory : [dataset] : Number of events in input trees
:
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 5000
: Signal -- testing events : 5000
: Signal -- training and testing events: 10000
: Background -- training events : 5000
: Background -- testing events : 5000
: Background -- training and testing events: 10000
:
DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 +0.371 +0.379 +0.384
: var2: +0.371 +1.000 +0.376 +0.391
: var3: +0.379 +0.376 +1.000 +0.385
: var4: +0.384 +0.391 +0.385 +1.000
: ----------------------------------------
DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 +0.372 +0.378 +0.382
: var2: +0.372 +1.000 +0.382 +0.394
: var3: +0.378 +0.382 +1.000 +0.381
: var4: +0.382 +0.394 +0.381 +1.000
: ----------------------------------------
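The two matrices above list linear (Pearson) correlation coefficients between the input variables, computed separately for signal and background. The following is a minimal C++ sketch of how such a matrix can be computed from unweighted samples. `correlation` and `correlationMatrix` are illustrative names, not TMVA functions. TMVA's own implementation may differ in detail, for example in how it treats event weights.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson linear correlation coefficient of two equally long, unweighted samples.
double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    // Accumulate covariance and variances around the sample means.
    double cxy = 0.0, cxx = 0.0, cyy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        cxy += dx * dy;
        cxx += dx * dx;
        cyy += dy * dy;
    }
    return cxy / std::sqrt(cxx * cyy);
}

// Symmetric correlation matrix, one sample per variable (e.g. var1..var4 of
// the signal events). The diagonal is 1 by definition.
std::vector<std::vector<double>> correlationMatrix(const std::vector<std::vector<double>>& vars) {
    const std::size_t k = vars.size();
    std::vector<std::vector<double>> m(k, std::vector<double>(k, 1.0));
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = i + 1; j < k; ++j)
            m[i][j] = m[j][i] = correlation(vars[i], vars[j]);
    return m;
}
```

Applying this separately to the signal events, the background events, and each category's events gives matrices of the kind printed in the log.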
DataSetFactory : [dataset] :
:
:
Fisher : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: -0.059
: var2: -0.006
: var3: +0.096
: var4: +0.219
: (offset): -0.024
: -----------------------
: Elapsed time for training with 10000 events: 0.0116 sec
Fisher : [dataset] : Evaluation of Fisher on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00376 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_Fisher.weights.xml
: Creating standalone class: dataset/weights/TMVAClassificationCategory_Fisher.class.C
Factory : Training finished
:
:
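The Fisher discriminant is linear in the inputs. Its response is the offset plus a weighted sum of the variables:

y = offset + c1*var1 + c2*var2 + c3*var3 + c4*var4

The sketch below evaluates this response using the inclusive coefficients from the "Results for Fisher coefficients" table above. The struct and function names are hypothetical. Because the log rounds the coefficients to three decimals, responses reproduced this way are only approximate.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Linear Fisher discriminant: y = offset + sum_i coeff[i] * x[i].
struct FisherModel {
    std::array<double, 4> coeff;  // coefficients for var1..var4
    double offset;                // the "(offset)" row of the coefficient table
};

double fisherResponse(const FisherModel& m, const std::array<double, 4>& x) {
    double y = m.offset;
    for (std::size_t i = 0; i < x.size(); ++i) y += m.coeff[i] * x[i];
    return y;
}

// Inclusive Fisher coefficients as printed in the log (rounded to 3 decimals).
const FisherModel kInclusiveFisher{{-0.059, -0.006, +0.096, +0.219}, -0.024};
```

For example, an event with all four variables equal to 1 gets -0.024 - 0.059 - 0.006 + 0.096 + 0.219 = 0.226.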
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 10000 events: 0.0859 sec
Likelihood : [dataset] : Evaluation of Likelihood on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.033 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_Likelihood.weights.xml
: Creating standalone class: dataset/weights/TMVAClassificationCategory_Likelihood.class.C
: TMVA.root:/dataset/Method_Likelihood/Likelihood
Factory : Training finished
:
:
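The Likelihood training above fills reference histograms for each input variable and class, turns them into PDFs, and combines them into a projective likelihood ratio:

y = L_S / (L_S + L_B)

Here L_S (L_B) is the product of the per-variable signal (background) PDFs evaluated at the event. The sketch below implements this ratio using plain normalised histograms as the PDFs. `HistPdf` and `likelihoodRatio` are hypothetical names. TMVA's smoothing and interpolation of the PDFs (e.g. spline or KDE options) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalised histogram used as a 1-D PDF on [lo, hi); out-of-range values are
// clamped into the first/last bin (a simplifying choice for this sketch).
struct HistPdf {
    double lo, hi;
    std::vector<double> density;  // probability density per bin

    HistPdf(const std::vector<double>& sample, double lo_, double hi_, std::size_t nbins)
        : lo(lo_), hi(hi_), density(nbins, 0.0) {
        assert(nbins > 0 && hi_ > lo_ && !sample.empty());
        const double width = (hi - lo) / static_cast<double>(nbins);
        for (double v : sample) density[bin(v)] += 1.0;
        for (double& d : density) d /= static_cast<double>(sample.size()) * width;
    }

    std::size_t bin(double v) const {
        double f = (v - lo) / (hi - lo) * static_cast<double>(density.size());
        f = std::max(0.0, std::min(f, static_cast<double>(density.size() - 1)));
        return static_cast<std::size_t>(f);
    }

    double operator()(double v) const { return density[bin(v)]; }
};

// Projective likelihood ratio y = L_S / (L_S + L_B): L_S (L_B) is the product
// over variables of the signal (background) PDFs at the event's values.
double likelihoodRatio(const std::vector<HistPdf>& sig, const std::vector<HistPdf>& bkg,
                       const std::vector<double>& x) {
    assert(sig.size() == x.size() && bkg.size() == x.size());
    double ls = 1.0, lb = 1.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        ls *= sig[k](x[k]);
        lb *= bkg[k](x[k]);
    }
    // If both PDFs vanish (empty bins), return 0.5, i.e. "no preference".
    return (ls + lb > 0.0) ? ls / (ls + lb) : 0.5;
}
```

Because each variable enters only through its own 1-D PDF, correlations between the inputs are ignored by construction. This is plausibly one reason the per-category variant gains so much here.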
DataSetFactory : [Category_Fisher_1_dsi] : Number of events in input trees
: Dataset[Category_Fisher_1_dsi] : Signal requirement: "abs(eta)<=1.3"
: Dataset[Category_Fisher_1_dsi] : Signal -- number of events passed: 5123 / sum of weights: 5123
: Dataset[Category_Fisher_1_dsi] : Signal -- efficiency : 0.5123
: Dataset[Category_Fisher_1_dsi] : Background requirement: "abs(eta)<=1.3"
: Dataset[Category_Fisher_1_dsi] : Background -- number of events passed: 5134 / sum of weights: 5134
: Dataset[Category_Fisher_1_dsi] : Background -- efficiency : 0.5134
: Dataset[Category_Fisher_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Fisher_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2561
: Signal -- testing events : 2561
: Signal -- training and testing events: 5122
: Dataset[Category_Fisher_1_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5123
: Background -- training events : 2567
: Background -- testing events : 2567
: Background -- training and testing events: 5134
: Dataset[Category_Fisher_1_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5134
:
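The category requirement keeps 5123 of the 10000 signal events (efficiency 0.5123), and the log states that this factor is applied to the requested numbers of events. The arithmetic below makes two assumptions: 5000 training and 5000 testing events are requested per class (the inclusive split shown earlier), and the scaled count is truncated to an integer. Under these assumptions it reproduces the 2561/2561 signal and 2567/2567 background counts, and it explains why the signal total is 5122 rather than 5123. `scaleRequested` is a hypothetical helper. It uses integer arithmetic to avoid floating-point surprises at exact .5 boundaries.

```cpp
#include <cassert>
#include <cmath>

// Effect of a category preselection on the requested training/testing counts
// of one class. Assumptions (not taken from TMVA): the requested count per
// split is the inclusive 5000, and the scaled count is truncated.
struct ScaledSplit {
    double efficiency;  // passed / total, as in the "efficiency" log line
    long nTrain;        // scaled number of training events
    long nTest;         // scaled number of testing events
};

ScaledSplit scaleRequested(long passed, long total, long requestedPerSplit) {
    assert(total > 0 && passed >= 0 && passed <= total);
    // passed * requested / total in integers, truncated toward zero.
    const long scaled = passed * requestedPerSplit / total;
    return {static_cast<double>(passed) / static_cast<double>(total), scaled, scaled};
}
```

The same helper also matches the second category: 4877 passing signal events give 2438 per split, and 4866 background events give 2433.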
DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.023 +0.001 +0.009
: var2: -0.023 +1.000 +0.007 +0.014
: var3: +0.001 +0.007 +1.000 -0.007
: var4: +0.009 +0.014 -0.007 +1.000
: ----------------------------------------
DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.029 -0.015 +0.019
: var2: -0.029 +1.000 +0.005 +0.003
: var3: -0.015 +0.005 +1.000 -0.019
: var4: +0.019 +0.003 -0.019 +1.000
: ----------------------------------------
DataSetFactory : [Category_Fisher_1_dsi] :
:
: Train method: Category_Fisher_1 for Classification
Category_Fisher_1 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +0.096
: var2: +0.135
: var3: +0.237
: var4: +0.382
: (offset): +0.626
: -----------------------
: Elapsed time for training with 5128 events: 0.00565 sec
Category_Fisher_1 : [Category_Fisher_1_dsi] : Evaluation of Category_Fisher_1 on training sample (5128 events)
: Elapsed time for evaluation of 5128 events: 0.00194 sec
: Training finished
DataSetFactory : [Category_Fisher_2_dsi] : Number of events in input trees
: Dataset[Category_Fisher_2_dsi] : Signal requirement: "abs(eta)>1.3"
: Dataset[Category_Fisher_2_dsi] : Signal -- number of events passed: 4877 / sum of weights: 4877
: Dataset[Category_Fisher_2_dsi] : Signal -- efficiency : 0.4877
: Dataset[Category_Fisher_2_dsi] : Background requirement: "abs(eta)>1.3"
: Dataset[Category_Fisher_2_dsi] : Background -- number of events passed: 4866 / sum of weights: 4866
: Dataset[Category_Fisher_2_dsi] : Background -- efficiency : 0.4866
: Dataset[Category_Fisher_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Fisher_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2438
: Signal -- testing events : 2438
: Signal -- training and testing events: 4876
: Dataset[Category_Fisher_2_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4877
: Background -- training events : 2433
: Background -- testing events : 2433
: Background -- training and testing events: 4866
: Dataset[Category_Fisher_2_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4866
:
DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.021 -0.010 +0.011
: var2: -0.021 +1.000 +0.043 +0.001
: var3: -0.010 +0.043 +1.000 -0.003
: var4: +0.011 +0.001 -0.003 +1.000
: ----------------------------------------
DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.026 +0.006 +0.016
: var2: -0.026 +1.000 +0.004 +0.044
: var3: +0.006 +0.004 +1.000 -0.027
: var4: +0.016 +0.044 -0.027 +1.000
: ----------------------------------------
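Comparing the matrices, the inclusive sample shows correlations of about +0.38 between all variable pairs, while within each eta category they are compatible with zero. One mechanism that produces this pattern is a common, category-dependent shift added to otherwise independent variables. The toy generator below is purely hypothetical and is not the generator behind toy_sigbkg_categ_offset.root. It adds +d to every variable for |eta| <= 1.3 and -d otherwise, with eta drawn uniformly from an assumed range of [-2.6, 2.6]. With this setup the inclusive correlation is d^2/(1 + d^2), about 0.39 for d = 0.8, while the correlation within each category vanishes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Pearson correlation of two equally long samples.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= x.size();
    my /= y.size();
    double cxy = 0.0, cxx = 0.0, cyy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        cxy += (x[i] - mx) * (y[i] - my);
        cxx += (x[i] - mx) * (x[i] - mx);
        cyy += (y[i] - my) * (y[i] - my);
    }
    return cxy / std::sqrt(cxx * cyy);
}

// Hypothetical toy sample; two variables are enough to show the effect.
struct ToySample {
    std::vector<double> var1, var2;
    std::vector<int> category;  // 1: |eta| <= 1.3, 2: |eta| > 1.3
};

ToySample makeToy(std::size_t n, double d, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    std::uniform_real_distribution<double> etaDist(-2.6, 2.6);  // assumed eta range
    ToySample t;
    for (std::size_t i = 0; i < n; ++i) {
        const int cat = (std::fabs(etaDist(rng)) <= 1.3) ? 1 : 2;
        const double shift = (cat == 1) ? d : -d;  // common, category-dependent offset
        t.var1.push_back(gauss(rng) + shift);
        t.var2.push_back(gauss(rng) + shift);
        t.category.push_back(cat);
    }
    return t;
}

// Correlation of var1 and var2 within one category (0 selects all events).
double correlationInCategory(const ToySample& t, int cat) {
    std::vector<double> a, b;
    for (std::size_t i = 0; i < t.category.size(); ++i) {
        if (cat == 0 || t.category[i] == cat) {
            a.push_back(t.var1[i]);
            b.push_back(t.var2[i]);
        }
    }
    return pearson(a, b);
}
```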
DataSetFactory : [Category_Fisher_2_dsi] :
:
Category_Fisher_2 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +0.107
: var2: +0.125
: var3: +0.249
: var4: +0.375
: (offset): -0.733
: -----------------------
: Elapsed time for training with 4871 events: 0.00553 sec
Category_Fisher_2 : [Category_Fisher_2_dsi] : Evaluation of Category_Fisher_2 on training sample (4871 events)
: Elapsed time for evaluation of 4871 events: 0.00173 sec
: Training finished
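With both sub-classifiers trained, FisherCat can be read as follows. Each event is routed by its eta value to the sub-classifier whose requirement it passes ("abs(eta)<=1.3" or "abs(eta)>1.3"), and that sub-classifier's response is returned. The following is a minimal sketch under that reading, using the coefficients from the Category_Fisher_1 and Category_Fisher_2 tables above, which the log rounds to three decimals. The names are hypothetical, and this is not the TMVA MethodCategory code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct FisherModel {
    std::array<double, 4> coeff;  // var1..var4
    double offset;
};

// Linear Fisher response y = offset + sum_i coeff[i] * x[i].
double fisherResponse(const FisherModel& m, const std::array<double, 4>& x) {
    double y = m.offset;
    for (std::size_t i = 0; i < x.size(); ++i) y += m.coeff[i] * x[i];
    return y;
}

// Sub-classifier coefficients as printed in the log (rounded to 3 decimals).
const FisherModel kCategory1Fisher{{+0.096, +0.135, +0.237, +0.382}, +0.626};  // abs(eta)<=1.3
const FisherModel kCategory2Fisher{{+0.107, +0.125, +0.249, +0.375}, -0.733};  // abs(eta)>1.3

// Category-mode evaluation: pick the sub-classifier by eta, then return its response.
double fisherCatResponse(double eta, const std::array<double, 4>& x) {
    const FisherModel& m = (std::fabs(eta) <= 1.3) ? kCategory1Fisher : kCategory2Fisher;
    return fisherResponse(m, x);
}
```

The two coefficient sets have similar slopes but very different offsets (+0.626 and -0.733). The split therefore changes mainly where the cut sits in each eta region.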
Category_Fisher_1 : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 2.224e-01
: 2 : var3 : 9.802e-02
: 3 : var2 : 3.679e-02
: 4 : var1 : 1.825e-02
: -------------------------------
Category_Fisher_2 : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 2.177e-01
: 2 : var3 : 1.102e-01
: 3 : var2 : 3.583e-02
: 4 : var1 : 2.281e-02
: -------------------------------
: Elapsed time for training with 10000 events: 0.123 sec
FisherCat : [dataset] : Evaluation of FisherCat on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0118 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_FisherCat.weights.xml
Factory : Training finished
:
:
DataSetFactory : [Category_Likelihood_1_dsi] : Number of events in input trees
: Dataset[Category_Likelihood_1_dsi] : Signal requirement: "abs(eta)<=1.3"
: Dataset[Category_Likelihood_1_dsi] : Signal -- number of events passed: 5123 / sum of weights: 5123
: Dataset[Category_Likelihood_1_dsi] : Signal -- efficiency : 0.5123
: Dataset[Category_Likelihood_1_dsi] : Background requirement: "abs(eta)<=1.3"
: Dataset[Category_Likelihood_1_dsi] : Background -- number of events passed: 5134 / sum of weights: 5134
: Dataset[Category_Likelihood_1_dsi] : Background -- efficiency : 0.5134
: Dataset[Category_Likelihood_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Likelihood_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2561
: Signal -- testing events : 2561
: Signal -- training and testing events: 5122
: Dataset[Category_Likelihood_1_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5123
: Background -- training events : 2567
: Background -- testing events : 2567
: Background -- training and testing events: 5134
: Dataset[Category_Likelihood_1_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5134
:
DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.023 +0.001 +0.009
: var2: -0.023 +1.000 +0.007 +0.014
: var3: +0.001 +0.007 +1.000 -0.007
: var4: +0.009 +0.014 -0.007 +1.000
: ----------------------------------------
DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.029 -0.015 +0.019
: var2: -0.029 +1.000 +0.005 +0.003
: var3: -0.015 +0.005 +1.000 -0.019
: var4: +0.019 +0.003 -0.019 +1.000
: ----------------------------------------
DataSetFactory : [Category_Likelihood_1_dsi] :
:
: Train method: Category_Likelihood_1 for Classification
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 5128 events: 0.0508 sec
Category_Likelihood_1 : [Category_Likelihood_1_dsi] : Evaluation of Category_Likelihood_1 on training sample (5128 events)
: Elapsed time for evaluation of 5128 events: 0.0175 sec
: TMVA.root:/dataset/Method_LikelihoodCat/LikelihoodCat/Method_Likelihood/Category_Likelihood_1
: Training finished
DataSetFactory : [Category_Likelihood_2_dsi] : Number of events in input trees
: Dataset[Category_Likelihood_2_dsi] : Signal requirement: "abs(eta)>1.3"
: Dataset[Category_Likelihood_2_dsi] : Signal -- number of events passed: 4877 / sum of weights: 4877
: Dataset[Category_Likelihood_2_dsi] : Signal -- efficiency : 0.4877
: Dataset[Category_Likelihood_2_dsi] : Background requirement: "abs(eta)>1.3"
: Dataset[Category_Likelihood_2_dsi] : Background -- number of events passed: 4866 / sum of weights: 4866
: Dataset[Category_Likelihood_2_dsi] : Background -- efficiency : 0.4866
: Dataset[Category_Likelihood_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Likelihood_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2438
: Signal -- testing events : 2438
: Signal -- training and testing events: 4876
: Dataset[Category_Likelihood_2_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4877
: Background -- training events : 2433
: Background -- testing events : 2433
: Background -- training and testing events: 4866
: Dataset[Category_Likelihood_2_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4866
:
DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.021 -0.010 +0.011
: var2: -0.021 +1.000 +0.043 +0.001
: var3: -0.010 +0.043 +1.000 -0.003
: var4: +0.011 +0.001 -0.003 +1.000
: ----------------------------------------
DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.026 +0.006 +0.016
: var2: -0.026 +1.000 +0.004 +0.044
: var3: +0.006 +0.004 +1.000 -0.027
: var4: +0.016 +0.044 -0.027 +1.000
: ----------------------------------------
DataSetFactory : [Category_Likelihood_2_dsi] :
:
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 4871 events: 0.049 sec
Category_Likelihood_2 : [Category_Likelihood_2_dsi] : Evaluation of Category_Likelihood_2 on training sample (4871 events)
: Elapsed time for evaluation of 4871 events: 0.0168 sec
: TMVA.root:/dataset/Method_LikelihoodCat/LikelihoodCat/Method_Likelihood/Category_Likelihood_2
: Training finished
Category_Likelihood_1 : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.349e-01
: 2 : var3 : 2.514e-02
: 3 : var1 : 9.827e-03
: 4 : var2 : 2.960e-03
: -----------------------------------
Category_Likelihood_2 : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.653e-01
: 2 : var3 : 8.002e-02
: 3 : var1 : 2.458e-02
: 4 : var2 : -3.762e-04
: -----------------------------------
: Elapsed time for training with 10000 events: 0.664 sec
LikelihoodCat : [dataset] : Evaluation of LikelihoodCat on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0441 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_LikelihoodCat.weights.xml
Factory : Training finished
:
: Ranking input variables (method specific)...
Fisher : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 1.488e-01
: 2 : var3 : 7.387e-02
: 3 : var2 : 2.853e-02
: 4 : var1 : 1.148e-02
: -------------------------------
Likelihood : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.108e-01
: 2 : var3 : 5.494e-02
: 3 : var2 : 3.017e-02
: 4 : var1 : 2.291e-02
: -----------------------------------
: No variable ranking supplied by classifier: FisherCat
: No variable ranking supplied by classifier: LikelihoodCat
Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Recreating sub-classifiers from XML-file
DataSetInfo : [Category_Fisher_1_dsi] : Added class "Signal"
DataSetInfo : [Category_Fisher_1_dsi] : Added class "Background"
DataSetInfo : [Category_Fisher_2_dsi] : Added class "Signal"
DataSetInfo : [Category_Fisher_2_dsi] : Added class "Background"
: Recreating sub-classifiers from XML-file
DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Signal"
DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Background"
DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Signal"
DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Background"
Factory : Test all methods
:
Fisher : [dataset] : Evaluation of Fisher on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0045 sec
:
Likelihood : [dataset] : Evaluation of Likelihood on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.033 sec
:
FisherCat : [dataset] : Evaluation of FisherCat on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0117 sec
:
LikelihoodCat : [dataset] : Evaluation of LikelihoodCat on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0427 sec
Factory : Evaluate all methods
Factory : Evaluate classifier: Fisher
:
Fisher : [dataset] : Loop over test events and fill histograms with classifier response...
:
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
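Each row in the block above describes one input variable. The columns appear to be its mean, its RMS, and its [minimum, maximum] over the events used in the evaluation. The same numbers are repeated for the other three classifiers, which all see the same test events. The following is a minimal sketch of such a summary. It assumes "RMS" denotes the spread about the mean, i.e. the population standard deviation. `VarStats` and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Per-variable summary: mean, RMS about the mean, minimum and maximum.
struct VarStats {
    double mean, rms, min, max;
};

VarStats summarize(const std::vector<double>& x) {
    assert(!x.empty());
    const double n = static_cast<double>(x.size());
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / n;
    // Second pass around the mean (numerically safer than <x^2> - <x>^2).
    double sq = 0.0;
    for (double v : x) sq += (v - mean) * (v - mean);
    const auto mm = std::minmax_element(x.begin(), x.end());
    return {mean, std::sqrt(sq / n), *mm.first, *mm.second};
}
```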
Factory : Evaluate classifier: Likelihood
:
Likelihood : [dataset] : Loop over test events and fill histograms with classifier response...
:
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: FisherCat
:
FisherCat : [dataset] : Loop over test events and fill histograms with classifier response...
:
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
Factory : Evaluate classifier: LikelihoodCat
:
LikelihoodCat : [dataset] : Loop over test events and fill histograms with classifier response...
:
: -----------------------------------------------------------
: var1: -0.028190 1.2905 [ -4.3323 4.5609 ]
: var2: -0.025496 1.3165 [ -4.7537 4.6723 ]
: var3: -0.025183 1.3669 [ -5.2892 4.7007 ]
: var4: 0.12022 1.4790 [ -4.6497 5.1415 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: dataset FisherCat : 0.914
: dataset LikelihoodCat : 0.912
: dataset Fisher : 0.803
: dataset Likelihood : 0.763
: -------------------------------------------------------------------------------------------------------------------
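The ROC-integ column is the area under the ROC curve (background rejection versus signal efficiency). A value of 0.5 means no separation and 1.0 means perfect separation. This area equals the probability that a randomly chosen signal event has a higher response than a randomly chosen background event. That equivalence gives the unbinned estimator sketched below, in which ties count one half. TMVA derives its number from binned response distributions, so values from this sketch can differ slightly. `rocIntegral` is a hypothetical name, and the sketch assumes that larger responses are more signal-like.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Unbinned ROC integral: fraction of (signal, background) pairs in which the
// signal response is larger; ties count one half.
double rocIntegral(const std::vector<double>& sig, std::vector<double> bkg) {
    assert(!sig.empty() && !bkg.empty());
    std::sort(bkg.begin(), bkg.end());
    double wins = 0.0;
    for (double s : sig) {
        // Background responses strictly below s, and those exactly equal to s.
        const auto lo = std::lower_bound(bkg.begin(), bkg.end(), s);
        const auto hi = std::upper_bound(lo, bkg.end(), s);
        wins += static_cast<double>(lo - bkg.begin()) + 0.5 * static_cast<double>(hi - lo);
    }
    return wins / (static_cast<double>(sig.size()) * static_cast<double>(bkg.size()));
}
```

Sorting the background once and using binary search keeps the cost at O((N_S + N_B) log N_B) instead of comparing every pair.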
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: dataset FisherCat : 0.340 (0.352) 0.741 (0.738) 0.920 (0.916)
: dataset LikelihoodCat : 0.332 (0.357) 0.740 (0.739) 0.918 (0.916)
: dataset Fisher : 0.172 (0.173) 0.475 (0.474) 0.729 (0.739)
: dataset Likelihood : 0.187 (0.220) 0.437 (0.441) 0.599 (0.607)
: -------------------------------------------------------------------------------------------------------------------
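Each @B column gives the signal efficiency at the response cut that accepts 1%, 10%, or 30% of the background. The first value comes from the test sample and the value in parentheses from the training sample. A large gap between the two would hint at overtraining. The sketch below computes this quantity from unbinned response values. It assumes that events above the cut are accepted and it ignores ties. `signalEffAtBkgEff` is a hypothetical name. TMVA's own computation works on binned distributions and may interpolate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Signal efficiency at the cut that accepts a fraction effB of the background.
// Events with response strictly above the cut are accepted; ties are ignored.
double signalEffAtBkgEff(const std::vector<double>& sig, std::vector<double> bkg, double effB) {
    assert(!sig.empty() && !bkg.empty() && effB >= 0.0 && effB <= 1.0);
    std::sort(bkg.begin(), bkg.end(), std::greater<double>());  // largest first
    // Number of background events allowed to pass the cut.
    const std::size_t nPass = static_cast<std::size_t>(effB * static_cast<double>(bkg.size()));
    // Cut at the (nPass+1)-th largest background response, so the nPass largest pass.
    const double cut = (nPass < bkg.size()) ? bkg[nPass] : -std::numeric_limits<double>::infinity();
    const auto nSig = std::count_if(sig.begin(), sig.end(), [cut](double y) { return y > cut; });
    return static_cast<double>(nSig) / static_cast<double>(sig.size());
}
```

Evaluating the function once on test-sample responses and once on training-sample responses gives the two numbers that the table compares.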
:
Dataset:dataset : Created tree 'TestTree' with 10000 events
:
Dataset:dataset : Created tree 'TrainTree' with 10000 events
:
Factory : Thank you for using TMVA!
: For citation information, please visit: http:
==> Wrote root file: TMVA.root
==> TMVAClassificationCategory is done!