3. Regression
The main README, with the details of the regression framework, can be found here: link.
3.1. Some general information
Input ROOT file: The input ROOT file should be a flat tree. All branches should be of basic types (int, float, double, etc.); branches should not be saved as vector. This means that if an event has two electrons, the same event number will appear twice when we Scan the ROOT file.
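As a minimal sketch of what "flat" means here, the hypothetical helper below turns per-event electron collections (the vector-style layout the trainer does not accept) into one row per electron, repeating the event number for each. The field names `event` and `electrons` and the helper name are assumptions for illustration only; `eg_rawEnergy` and `eg_gen_energy` are branch names used elsewhere in this section.

```python
from typing import Any, Dict, List


def flatten_events(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one flat row per electron; the event number is repeated for each.

    Each input event is assumed (hypothetically) to look like
    {"event": int, "electrons": [{"eg_rawEnergy": float, ...}, ...]},
    i.e. the vector-style layout that the regression trainer does not accept.
    """
    rows = []
    for evt in events:
        for ele in evt["electrons"]:
            # Scalar values only: copy the event number next to this electron's values.
            row = {"event": evt["event"]}
            row.update(ele)
            rows.append(row)
    return rows


if __name__ == "__main__":
    events = [
        {"event": 1001, "electrons": [
            {"eg_rawEnergy": 45.2, "eg_gen_energy": 48.0},
            {"eg_rawEnergy": 30.1, "eg_gen_energy": 31.5},
        ]},
        {"event": 1002, "electrons": [
            {"eg_rawEnergy": 60.7, "eg_gen_energy": 62.3},
        ]},
    ]
    # Event 1001 has two electrons, so it appears in two rows.
    for row in flatten_events(events):
        print(row)
```

In a real ntuple the same idea applies at the tree-filling stage: fill the tree once per electron rather than once per event.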
Input target variable:
(True Energy) / (Raw Energy)
List of input variables for the training:
nrHitsEB1GeV+nrHitsEE1GeV: Total number of ECAL rechits in EB and EE above the 1 GeV threshold
eg_eta: pseudorapidity
eg_phiWidth: phi-width of the supercluster
eg_r9Frac: fractional R9
eg_nrClus: total number of clusters in the supercluster
eg_clusterMaxDR: maximum distance between the seed and the clusters inside the SC
eg_rawEnergy: raw energy
NOTE: These input features are hard-coded in CMSSW (here). So if the input features are updated, the same change must be propagated to CMSSW for the regression to be evaluated correctly.
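The sketch below shows one way to assemble the training inputs and the target for a single flat-tree row. It assumes the features are taken in the order of the list above; that order and the helper names are assumptions for illustration, and the authoritative order is the one hard-coded in CMSSW.

```python
from typing import Dict, List

# Assumed order: the order of the list above. The real order is whatever is
# hard-coded in CMSSW, and any change here must be mirrored there.
FEATURES = [
    "nrHitsEB1GeV+nrHitsEE1GeV",
    "eg_eta",
    "eg_phiWidth",
    "eg_r9Frac",
    "eg_nrClus",
    "eg_clusterMaxDR",
    "eg_rawEnergy",
]


def feature_vector(row: Dict[str, float]) -> List[float]:
    """Build the input vector for one flat-tree row (one electron)."""
    values = []
    for name in FEATURES:
        if "+" in name:
            # Derived input: the sum of the listed branches (total rechits in EB + EE).
            values.append(sum(row[part] for part in name.split("+")))
        else:
            values.append(row[name])
    return values


def target(true_energy: float, raw_energy: float) -> float:
    """Training target: (True Energy) / (Raw Energy)."""
    return true_energy / raw_energy


if __name__ == "__main__":
    # Toy values for one electron, not taken from a real ntuple.
    row = {"nrHitsEB1GeV": 12, "nrHitsEE1GeV": 3, "eg_eta": 0.85,
           "eg_phiWidth": 0.021, "eg_r9Frac": 0.93, "eg_nrClus": 2,
           "eg_clusterMaxDR": 0.12, "eg_rawEnergy": 45.2}
    print(feature_vector(row))
    print(target(48.0, row["eg_rawEnergy"]))
```

Keeping the feature list in one place makes it easier to diff against the CMSSW side whenever the inputs change.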
Output
invTar: Inverse target, i.e. (Raw Energy) / (True Energy)
To get the corrected response:
invTar * mean
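Written out, and assuming that `mean` is the regression's predicted value of the target (True Energy)/(Raw Energy), so that the corrected energy is mean times the raw energy, the product is the ratio of corrected to true energy:

```latex
\text{invTar} = \frac{E_{\text{raw}}}{E_{\text{true}}}, \qquad
\text{mean} \approx \left\langle \frac{E_{\text{true}}}{E_{\text{raw}}} \right\rangle_{\text{predicted}}, \qquad
E_{\text{corr}} = \text{mean} \times E_{\text{raw}}

\text{invTar} \times \text{mean}
  = \frac{E_{\text{raw}} \cdot \text{mean}}{E_{\text{true}}}
  = \frac{E_{\text{corr}}}{E_{\text{true}}}
```

For example, with illustrative numbers E_raw = 45 GeV, E_true = 50 GeV and a predicted mean of 1.1: invTar = 0.9, E_corr = 49.5 GeV, and invTar * mean = 0.99.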
3.2. Regression setup
cmsrel CMSSW_12_0_1
cd CMSSW_12_0_1/src
cmsenv
git clone -b Run3_2021_Robert_CMSSW12_0_1_FullSelection https://github.com/ram1123/EgRegresTrainerLegacy.git
cd EgRegresTrainerLegacy
gmake RegressionTrainerExe -j 8
gmake RegressionApplierExe -j 8
export PATH=$PATH:./bin/$SCRAM_ARCH #add the binary location to path
python3 scripts/runSCRegTrainings.py --era "Run3"
3.2.1. Regression performance test
To test the performance, compare the distributions of E_reco/E_gen and regInvTar*regMean.
Here, regInvTar*regMean corresponds to E_reco/E_gen after the regression correction. So the distribution of regInvTar*regMean should peak closer to 1 and be narrower.
An example script exists here: Plot_mean.py
Alternatively, this can be checked interactively in ROOT:
export ROOT_INCLUDE_PATH=$ROOT_INCLUDE_PATH:$PWD/include #otherwise will get header not found errors
#root -l -b rootScripts/setupExample.c
root -l rootScripts/setupExample.c
hists = makeHists(regTestTree,{-3.0,-2.5,-2.,-1.6,-1.566,-1.4442,-1.1,-0.7,0.,0.7,1.1,1.4442,1.566,1.6,2.,2.5,3.0},150,0,1.5,{"regInvTar*regMean:eg_eta","eg_rawEnergy/eg_gen_energy:eg_eta","eg_energy/eg_gen_energy:eg_eta"},"eg_energy>0 && eg_sigmaIEtaIEta>0 && eg_sigmaIPhiIPhi>0 && eg_gen_pt>20 && eg_gen_pt<60")
compareRes({hists[0],"Corr Energy"},{hists[1],"raw energy"},{hists[2],"corr energy (old)"}, 6)
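To put numbers on "closer to 1 and narrower", one common choice (not necessarily what Plot_mean.py does) is to compare the median and the effective width sigma_eff, taken here as half the smallest interval containing 68.3% of the entries. The sketch assumes the per-electron values of eg_rawEnergy/eg_gen_energy and regInvTar*regMean have already been read into Python lists (for example from regTestTree); the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def sigma_eff(values: Sequence[float], fraction: float = 0.683) -> float:
    """Half-width of the smallest interval containing `fraction` of the values."""
    data = sorted(values)
    n = len(data)
    # Number of points the window must contain (at least 2 so a width exists).
    k = max(2, math.ceil(fraction * n))
    if k > n:
        raise ValueError("need more entries to compute sigma_eff")
    # Slide a window of k consecutive sorted points and keep the narrowest one.
    width = min(data[i + k - 1] - data[i] for i in range(n - k + 1))
    return width / 2.0


def summarize(name: str, values: Sequence[float]) -> None:
    """Print the median and sigma_eff of one response distribution."""
    print(f"{name}: median = {statistics.median(values):.4f}, "
          f"sigma_eff = {sigma_eff(values):.4f}")


if __name__ == "__main__":
    # Toy numbers only, standing in for values read from regTestTree.
    raw_response = [0.88, 0.90, 0.93, 0.95, 0.96, 0.97, 0.99, 1.02]
    corr_response = [0.97, 0.98, 0.99, 1.00, 1.00, 1.01, 1.02, 1.03]
    summarize("eg_rawEnergy/eg_gen_energy", raw_response)
    summarize("regInvTar*regMean", corr_response)
```

sigma_eff is used instead of the RMS because it is less sensitive to the low-side tail typical of energy response distributions; doing this per eta bin mirrors the binning passed to makeHists.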
3.3. Verify new regression by re-running hlt
3.3.1. Upload new regression to GT
Four .db files need to be created: two for the correction and two for the uncertainty, one each for EB and EE. The correction should be taken from the ideal training and the uncertainty from the real training.
They are:
Correction for EB: taken from Run3HLT_IdealIC_IdealTraining_stdVar_stdCuts_EB_ntrees1500_results.root
Label: pfscecal_EBCorrection_online
Tag: pfscecal_EBCorrection_online_Run3_120X_v1
Uncertainty for EB: taken from Run3HLT_RealIC_RealTraining_stdVar_stdCuts_EB_ntrees1500_results.root
Label: pfscecal_EBUncertainty_online
Tag: pfscecal_EBUncertainty_online_Run3_120X_v1
Correction for EE: taken from Run3HLT_IdealIC_IdealTraining_stdVar_stdCuts_EE_ntrees1500_results.root
Label: pfscecal_EECorrection_online
Tag: pfscecal_EECorrection_online_Run3_120X_v1
Uncertainty for EE: taken from Run3HLT_RealIC_RealTraining_stdVar_stdCuts_EE_ntrees1500_results.root
Label: pfscecal_EEUncertainty_online
Tag: pfscecal_EEUncertainty_online_Run3_120X_v1
3.3.2. Get DBTool Setup
cmsrel CMSSW_10_2_13
cd CMSSW_10_2_13/src
cmsenv
git cms-init
git clone git@github.com:cms-egamma/EgammaDBTools.git RecoEgamma/EgammaDBTools
scram b -j 16
Commands to run:
# EB Correction
cmsRun RecoEgamma/EgammaDBTools/test/gbrForestDBWriter.py gbrFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/results/resultSC_UpdatedEtaDef_16March/Run3HLT_IdealIC_IdealTraining_stdVar_stdCuts_EB_ntrees1500_results.root fileLabel=EBCorrection dbTag=pfscecal_EBCorrection_online_Run3_120X_v1 dbLabel=pfscecal_EBCorrection_online dbFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/dbFiles_BugFixed_ReOrderVars_16March/pfscecal_EBCorrection_online_Run3_120X_v1
# EE Correction
cmsRun RecoEgamma/EgammaDBTools/test/gbrForestDBWriter.py gbrFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/results/resultSC_UpdatedEtaDef_16March/Run3HLT_IdealIC_IdealTraining_stdVar_stdCuts_EE_ntrees1500_results.root fileLabel=EECorrection dbTag=pfscecal_EECorrection_online_Run3_120X_v1 dbLabel=pfscecal_EECorrection_online dbFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/dbFiles_BugFixed_ReOrderVars_16March/pfscecal_EECorrection_online_Run3_120X_v1
# EB Uncertainty
cmsRun RecoEgamma/EgammaDBTools/test/gbrForestDBWriter.py gbrFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/results/resultSC_UpdatedEtaDef_16March/Run3HLT_RealIC_RealTraining_stdVar_stdCuts_EB_ntrees1500_results.root fileLabel=EBUncertainty dbTag=pfscecal_EBUncertainty_online_Run3_120X_v1 dbLabel=pfscecal_EBUncertainty_online dbFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/dbFiles_BugFixed_ReOrderVars_16March/pfscecal_EBUncertainty_online_Run3_120X_v1
# EE Uncertainty
cmsRun RecoEgamma/EgammaDBTools/test/gbrForestDBWriter.py gbrFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/results/resultSC_UpdatedEtaDef_16March/Run3HLT_RealIC_RealTraining_stdVar_stdCuts_EE_ntrees1500_results.root fileLabel=EEUncertainty dbTag=pfscecal_EEUncertainty_online_Run3_120X_v1 dbLabel=pfscecal_EEUncertainty_online dbFilename=/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/MainNtuples_v2/dbFiles_BugFixed_ReOrderVars_16March/pfscecal_EEUncertainty_online_Run3_120X_v1
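The four commands above follow one pattern: correction payloads come from the IdealIC/IdealTraining results, uncertainty payloads from the RealIC/RealTraining results, the label is pfscecal_<EB|EE><Correction|Uncertainty>_online and the tag is the label plus _Run3_120X_v1. A minimal Python sketch that rebuilds them as a dry run (it only prints the commands) is below; RESULTS_DIR and DB_DIR are the directories used above and should be replaced with your own, and the helper names are hypothetical.

```python
import shlex
from typing import List

# Directories used in the commands above; replace with your own locations.
RESULTS_DIR = ("/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/"
               "MainNtuples_v2/results/resultSC_UpdatedEtaDef_16March")
DB_DIR = ("/eos/user/r/rasharma/post_doc_ihep/EGamma/HLT/regression/"
          "MainNtuples_v2/dbFiles_BugFixed_ReOrderVars_16March")
WRITER = "RecoEgamma/EgammaDBTools/test/gbrForestDBWriter.py"
TAG_SUFFIX = "_Run3_120X_v1"

# Correction comes from the ideal training, uncertainty from the real training.
TRAINING = {"Correction": "IdealIC_IdealTraining", "Uncertainty": "RealIC_RealTraining"}


def writer_command(region: str, kind: str) -> List[str]:
    """Return the cmsRun argument list for one (region, kind) payload."""
    label = f"pfscecal_{region}{kind}_online"
    tag = label + TAG_SUFFIX
    results = (f"{RESULTS_DIR}/Run3HLT_{TRAINING[kind]}_stdVar_stdCuts_"
               f"{region}_ntrees1500_results.root")
    return [
        "cmsRun", WRITER,
        f"gbrFilename={results}",
        f"fileLabel={region}{kind}",
        f"dbTag={tag}",
        f"dbLabel={label}",
        f"dbFilename={DB_DIR}/{tag}",
    ]


if __name__ == "__main__":
    # Same order as above: EB/EE correction, then EB/EE uncertainty.
    for kind in ("Correction", "Uncertainty"):
        for region in ("EB", "EE"):
            # Dry run: print a shell-ready command instead of executing it.
            print(shlex.join(writer_command(region, kind)))
```

To execute instead of print, each list could be passed to subprocess.run(cmd, check=True) from inside the CMSSW_10_2_13 environment set up above.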
The latest versions of the .db files are available here.
3.4. Re-Run HLT with updated training information
Re-run the HLT step with the real-IC configuration file: append the following lines to it (repeating the cms.PSet block four times, once for each of the four .db files above, with the corresponding names) and run it:
process.GlobalTag.toGet.extend( cms.VPSet(
    cms.PSet(
        record = cms.string('GBRDWrapperRcd'),
        tag = cms.string('pfscecal_EBCorrection_online_Run3_120X_v1'),
        label = cms.untracked.string('pfscecal_EBCorrection_online'),
        connect = cms.string('sqlite_file:pfscecal_EBCorrection_online_Run3_120X_v1.db')
    )
  )
)
Link to the updated real-IC HLT config file: hlt_real_WithDBFile.py#L3456-L3491
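Rather than pasting the cms.PSet block four times, the entries can be generated in a loop. This is a sketch of a config fragment to append to the HLT configuration; it assumes `process` and its GlobalTag are already defined there and that the four .db files sit in the working directory under the tag names used above.

```python
import FWCore.ParameterSet.Config as cms

# One entry per payload: label pfscecal_<EB|EE><Correction|Uncertainty>_online,
# tag = label + "_Run3_120X_v1", read from a local sqlite file named after the tag.
entries = []
for region in ("EB", "EE"):
    for kind in ("Correction", "Uncertainty"):
        label = "pfscecal_%s%s_online" % (region, kind)
        tag = label + "_Run3_120X_v1"
        entries.append(cms.PSet(
            record = cms.string("GBRDWrapperRcd"),
            tag = cms.string(tag),
            label = cms.untracked.string(label),
            connect = cms.string("sqlite_file:%s.db" % tag),
        ))

# Same call as the hand-written block, with all four entries at once.
process.GlobalTag.toGet.extend(cms.VPSet(*entries))
```

Generating the names from one pattern keeps the tag, label and sqlite file name consistent with the gbrForestDBWriter commands.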