Advanced Methods for Data Analysis

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2024/2025



Office hours: Thursday, 10:00 - 11:00; room G-0-10.

Exams:
--------------------

     29.01.2025, A-1-03, 11.00-13.00
     25.02.2025, A-1-03, 9.00-11.00

Lectures
---------
Recommended books/articles for reading:

=> G. Cowan, "Statistical Data Analysis"
      recorded lectures at CERN Summer School 2023: link
=> F. James, "Statistical Methods in Experimental Physics"
=> J. A. Rice, "Mathematical Statistics and Data Analysis"
=> I. Narsky, F. C. Porter, "Statistical Analysis Techniques in Particle Physics"

=> K. Cranmer, "Practical statistics for LHC"

Recent conferences and workshops:

=> PHYSTAT (2024): Statistics Meets Machine Learning
=> PHYSTAT (2024): Unfolding
=> PHYSTAT (2024): Simulation Based Inference in Fundamental Physics
=> PHYSTAT (2022): Anomalies


Date

Lecture slides
Additional material


Statistics and Data Analysis: Basic
Statistical methods for LHC (advanced)
10.10.2024
Introduction,   StatAnal-lecture-1 Efficiency uncertainties
17.10.2024

StatAnal-lecture-2
24.10.2024

StatAnal-lecture-3 EW measurements at LEP2
31.10.2024
Canceled (Rector's hours)

7.11.2024
StatAnal-lecture-4


Statistics and Data Analysis: Advanced
14.11.2024

LHCStatAnal-lecture-1


https://arxiv.org/pdf/1007.1727.pdf
https://arxiv.org/pdf/1609.04150.pdf
https://arxiv.org/pdf/1807.05996.pdf
https://arxiv.org/pdf/2101.06944.pdf
HistFactory
Pyhf
https://arxiv.org/pdf/2109.04981.pdf
21.11.2024
LHCStatAnal-lecture-2
LHCStatAnal-lecture-3



Statistics and Data Analysis:  Questions for exam



Multivariate Techniques and Machine Learning

28.11.2024

Unfolding-lecture            

https://arxiv.org/pdf/1910.14654.pdf
F. Spano-Proc14-02/P52.pdf
https://arxiv.org/pdf/1611.01927.pdf

5.12.2024
MVandML-lecture-1          
https://arxiv.org/pdf/1506.02169.pdf
P.Bhat, Multivariate_Analysis_Methods_in_Particle_Physics
Understanding Deep Learning
https://arxiv.org/pdf/1806.11484.pdf
ML4Jets 2024 workshop
https://atlas.cern/Updates/Feature/Machine-Learning
12.12.2024
MVandML-lecture-1a,   MVandML-lecture-1b,   MVandML-lecture-1c
postponed and moved online; starts 9.01.2025 at 18:00
https://iopscience.iop.org/article/10.1088/1748-0221/11/01/P01019/pdf
https://arxiv.org/pdf/1812.09722.pdf
ATL-PHYS-PUB-2019-033.pdf
ATL-PHYS-PUB-2020-018.pdf
GNN in ATLAS flavour tagging
19.12.2024
MVandML-lecture-2



MV and ML: Questions for exam



Physics Modeling, Simulation and Monte Carlo Methods
9.01.2025
PhysModel-lecture
https://www.coursera.org/learn/modeling-simulation-natural-processes
16.01.2025
MCandSimulation-lecture
23.01.2025
MCandSimulation-lecture


Phys. Model and MC methods: Questions for exam (??)

    

Assignments:

-------------------------

Date
Topic

ROOT/C++ or PyROOT
Datasets/Tutorials

Python + Anaconda
Datasets/Tutorials

Statistics and Data Analysis






10.10.2024 Introduction-labs

StatAnal_labs-lecture-1.txt






Scripts in K. Cranmer, Statistics and Data Science


17.10.2024 StatAnal_labs-lecture-2.txt





24.10.2024 StatAnal_labs-lecture-3.txt







31.10.2024
Canceled (Rector's hours)






 7.11.2024 StatAnal_labs-lecture-4.txt





 


 





14.11-5.12.2024








StatAnal-project:
select one from suggested topics or propose your own.



1) Modeling tools:
RooFit, RooStats and HistFactory

 LHCStatAnal-labs-4-Root
follow exercises there

2) Interval Estimation and Hypotheses Testing
by T. Dorigo,
IN2P3 School on Statistics, 2018

slides
follow exercises there


3) Higgs signal at LHC
by I. van Vulpen, Terascale Statistics School, DESY 2018  slides and exercises
DesyCode2018.tgz
follow exercises there



1) Follow examples in:
Asia-Europe Pacific School 2022
lectures+hands-on by N. Berger

Practical Statistics-1
Practical-Statistics-2

Practical-Statistics-3




2) Try out the pyhf tool on the exercises proposed for RooFit/RooStats

pyhf: Python-based fitting, limit setting and interval estimation
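
As a warm-up before opening the tool itself, the three tasks named above (fitting, interval estimation, significance) can be sketched for a single-bin counting experiment in plain Python. This is only an illustrative sketch and does not use the pyhf API: it assumes one Poisson bin with a known signal yield s and a known background b (no nuisance parameters), and the function names q_mu, bisect, fit_interval and asimov_z are hypothetical helpers introduced here. The significance formula is the Asimov expression from arXiv:1007.1727, listed among the lecture materials.

```python
import math

def q_mu(mu, n, s, b):
    """-2 ln(likelihood ratio) for one Poisson bin, n > 0 observed events.

    Expected yield is nu = mu*s + b; the unconstrained best fit has nu = n.
    """
    nu = mu * s + b
    return 2.0 * (nu - n + n * math.log(n / nu))

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def fit_interval(n, s, b):
    """Best-fit signal strength and the interval where q_mu rises by 1."""
    mu_hat = (n - b) / s                      # maximum-likelihood estimate
    g = lambda mu: q_mu(mu, n, s, b) - 1.0
    mu_min = -b / s + 1e-9                    # keep the expected yield positive
    lower = bisect(g, mu_min, mu_hat)
    hi = mu_hat + 1.0
    while g(hi) < 0:                          # widen until q_mu exceeds 1
        hi += 1.0
    upper = bisect(g, mu_hat, hi)
    return mu_hat, lower, upper

def asimov_z(s, b):
    """Median expected discovery significance (asymptotic Asimov formula)."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

if __name__ == "__main__":
    mu_hat, lower, upper = fit_interval(n=120, s=10.0, b=100.0)
    print(f"mu_hat = {mu_hat:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
    print(f"expected Z for s=10, b=100: {asimov_z(10.0, 100.0):.3f}")
```

Comparing these numbers with the output of pyhf or RooStats for the same one-bin model is a reasonable first cross-check; with systematic uncertainties added, the simple closed forms above no longer apply.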




Multivariate techniques
and Machine Learning






 




   12.12.2024 - 9.01.2025

    
    
      

         
 MVandML-project:
select one from suggested topics or propose your own.



1) BDTs and TMVA
by Y. Coadou,
IN2P3 School on Statistics, 2018
slides
Apply.C  Train.C
dataSchachbrett.root
follow exercises there

2) Unfolding
RooUnfold
slides
follow exercises there

3) Analysis of ATLAS open data
with MV or ML methods

4) Analysis with MVA methods
by N. Chanon (ETH Zurich, 2012)



1) Analysis of ATLAS open data
BDT example for H->4l
infofile.py

2) Unfolding with Gaussian processes:
https://arxiv.org/pdf/1811.01242.pdf
https://github.com/adambozson/gp-unfold


3) Follow hands-on examples/course

https://jduarte.physics.ucsd.edu/phys139_239/README.html


Physics Modeling, Simulation
and Monte Carlo Methods






 16.01.2025 - 30.01.2025

       PhysModelandMC-project:









Aachen Online Statistics School 2023
https://indico.desy.de/event/37562/timetable/

Statistical Analysis in HEP Physics

N. Berger, Foundation of Statistics, Lectures at CERN Summer School 2019
link1, link2, link3

Statistics and Data Science

K. Cranmer, Course at NYU Physics,  Fall 2020, link

Machine learning applications in HEP physics:

B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"

M. Stoye, "ML applications in CMS"

ML techniques in HEP,  Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

A. Castaneda, "ML and Big data tools at HEP", LHCP conference, Puebla, Mexico, 2019

The last part of the course will be based on materials from Coursera:


Dr. Mine Çetinkaya-Rundel, "Data Analysis and Statistical Inference"
C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link

Related interesting material from Coursera
R. D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"
B. Chopard et al., "Simulation and modeling of natural processes"

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/


Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A


Last modified: 7 October 2024

Elzbieta Richter-Was

