List of topics for examination-based assessment.
================================================

Statistical data analyses
=========================

1. The binomial, Poisson and Gaussian distributions are particularly important for statistical analyses in physics. Could you explain the relations between them? Write down the formulas for their probability distributions and the point estimates: mean and standard deviation.

2. When combining 1D measurements, one often assumes that they are distributed according to Gaussian distributions. This leads to the very commonly used expressions for the weighted average and the weighted error on the average. Write down and explain the expressions used. How does this change when correlations between the measurements are also included? You can be qualitative in the answer. (See the first sketch after this section.)

3. You have measured the distribution of a certain observable and have a theoretical model for that distribution. Explain how you would test whether the predictions are correct by performing a chi2 consistency test. Write down a few formulas and make some illustrative drawings. Would the chi2 method be correct in the case of low-statistics distributions? (See the second sketch after this section.)

4. In some cases it is more appropriate to perform likelihood consistency tests based on Poisson statistics. Could you explain what this means? Write down a few formulas and make some illustrative drawings.

5. Measurements carry statistical errors and systematic uncertainties. Explain how one includes systematic uncertainties when performing consistency tests (with theoretical model predictions) or parameter fitting.

6. Statistical analysis of a physics measurement requires first defining a statistical model, which then allows statistical and systematic errors to be treated properly. Describe the statistical model for a simple counting measurement, with the observable being the number of events.

7. Statistical analysis of a physics measurement requires first defining a statistical model, which then allows statistical and systematic errors to be treated properly. Describe the statistical model for a binned shape analysis, with the observables being n_i, i=1,...,N_bins.

8. Statistical analysis of a physics measurement requires first defining a statistical model, which then allows statistical and systematic errors to be treated properly. Describe the statistical model for an unbinned shape analysis, with the observables being m_i, i=1,...,N_events.

9. Maximum Likelihood Estimation is nowadays the standard approach for inferring the values of physical parameters (e.g. couplings, masses) from measured distributions or event counts. Could you explain this concept in the case of a binned shape analysis? (See the third sketch after this section.)

10. To claim a discovery or an exclusion based on a specific measurement, one first defines its statistical model, formulates the hypotheses H0 and H1, and then uses a likelihood ratio to decide between them. Could you explain the above procedure and compare/contrast the discovery and exclusion (limit-setting) cases?

11. To estimate the expected results (limits) of a given measurement, one often generates special MC datasets (pseudo-experiments, toys). Sometimes, however, it is enough to generate the so-called Asimov dataset. Could you explain the difference and how these datasets are used?

12. The likelihood typically includes parameters of interest (POIs) and nuisance parameters (NPs). Give examples illustrating what they can be. What about systematic uncertainties? How can they be incorporated into the likelihood? What does "profiling systematic uncertainties" mean?
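For topic 2, a minimal numerical sketch of the Gaussian combination, with invented measurement values. The uncorrelated case uses weights w_i = 1/sigma_i^2; the correlated case generalises this with the inverse covariance matrix (the BLUE combination):

```python
import numpy as np

# Hypothetical measurements of the same quantity with Gaussian errors
x = np.array([10.2, 9.8, 10.5])       # measured values
sigma = np.array([0.3, 0.4, 0.5])     # standard deviations

# Uncorrelated case: weights w_i = 1 / sigma_i^2
w = 1.0 / sigma**2
x_avg = np.sum(w * x) / np.sum(w)     # weighted average
sigma_avg = 1.0 / np.sqrt(np.sum(w))  # weighted error on the average
print(f"uncorrelated: {x_avg:.3f} +- {sigma_avg:.3f}")

# Correlated case: use the full covariance matrix V,
#   x_avg = (1^T V^-1 x) / (1^T V^-1 1),  sigma_avg^2 = 1 / (1^T V^-1 1)
rho = 0.5                             # assumed correlation between meas. 1 and 2
V = np.diag(sigma**2)
V[0, 1] = V[1, 0] = rho * sigma[0] * sigma[1]
Vinv = np.linalg.inv(V)
ones = np.ones_like(x)
norm = ones @ Vinv @ ones
x_avg_c = (ones @ Vinv @ x) / norm
sigma_avg_c = 1.0 / np.sqrt(norm)
print(f"correlated:   {x_avg_c:.3f} +- {sigma_avg_c:.3f}")
```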
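For topic 3, a sketch of the chi2 consistency test on made-up binned data: chi2 = sum_i (n_i - nu_i)^2 / sigma_i^2 is compared against a chi2 distribution with N_bins degrees of freedom (reduced by the number of fitted parameters, if any):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical binned measurement and model prediction
n = np.array([12, 25, 31, 22, 9])              # observed counts per bin
nu = np.array([10.0, 24.0, 33.0, 20.0, 11.0])  # model prediction per bin
sigma = np.sqrt(nu)                            # Gaussian approx. of Poisson errors

chi2_val = np.sum((n - nu)**2 / sigma**2)
ndof = len(n)                                  # no parameters fitted here
p_value = chi2.sf(chi2_val, ndof)              # probability of chi2 >= observed
print(f"chi2/ndof = {chi2_val:.2f}/{ndof}, p-value = {p_value:.3f}")
# Caveat: for low-statistics bins the Gaussian approximation breaks down,
# and a Poisson likelihood test (next sketch) is more appropriate.
```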
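For topics 4 and 9, a minimal maximum-likelihood fit of a signal strength mu in a binned shape analysis, assuming Poisson-distributed counts n_i with expectation nu_i(mu) = mu*s_i + b_i; all template and data numbers are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical signal and background templates and observed counts
s = np.array([5.0, 12.0, 8.0, 3.0])     # expected signal per bin
b = np.array([20.0, 18.0, 15.0, 12.0])  # expected background per bin
n = np.array([28, 33, 21, 14])          # observed counts

def nll(mu):
    """Poisson negative log-likelihood, constant terms dropped:
       -ln L = sum_i [ nu_i - n_i * ln(nu_i) ],  nu_i = mu*s_i + b_i"""
    nu = mu * s + b
    return np.sum(nu - n * np.log(nu))

res = minimize_scalar(nll, bounds=(0.0, 5.0), method="bounded")
print(f"best-fit signal strength mu = {res.x:.3f}")
```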
Multi-variate analyses
======================

13. Briefly explain the purpose of unfolding procedures. Say more about the commonly used approaches.

14. Rectangular cuts and the Fisher discriminant: explain the methods and how one can optimise their performance.

15. Decision trees: How does one define the classification of the final leaves? How does one measure the quality of the predictions: error and accuracy? How is the performance measured?

16. The idea of ensemble classifiers and boosting: Could you explain the concept of weighted weak classifiers and weighted data? Could you write down the formula for the final model prediction? (See the first sketch at the end of this list.)

Data Science and Machine Learning
=================================

17. Draw the flow-chart diagram for making predictions with regression as the ML algorithm. Briefly explain each box in the flow-chart: ML model, ML algorithm, quality metric, feature extraction.

18. How do we assess the performance of ML algorithms? Explain what the "training error", "validation error", "generalization error" and "test error" are. What does "cross-validation" mean? Draw an illustrative plot of how these errors typically behave as a function of regression model complexity. What does "over-fitting" mean? How can we mitigate it by adding an extra term to the cost function? (See the second sketch at the end of this list.)

19. Explain the model of the logistic regression classifier. Write down the formula for the linear score and the logistic link function. We measure the performance of the classifier using the "classification error", "classification accuracy" and "confusion matrix". Could you explain what these mean? What is the "majority class" problem? (See the third sketch at the end of this list.)

20. Explain the probabilistic approach to clustering. The soft assignment can be optimised with the MLE approach (maximum likelihood estimation). Can you explain what this means and write down some formulas? (See the last sketch at the end of this list.)
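For topic 16, a toy illustration of the boosted-ensemble prediction F(x) = sign(sum_k alpha_k h_k(x)). The stumps and weights here are set by hand to stand in for the output of a boosting algorithm such as AdaBoost, where alpha_k = (1/2) ln((1 - err_k)/err_k) and the training data are re-weighted after each round:

```python
import numpy as np

# Toy 1D inputs and three hand-made weak classifiers (decision stumps)
x = np.array([0.1, 0.4, 0.6, 0.9])

stumps = [lambda x: np.where(x > 0.3, 1, -1),
          lambda x: np.where(x > 0.5, 1, -1),
          lambda x: np.where(x > 0.7, 1, -1)]
alpha = np.array([0.8, 0.5, 0.3])   # hypothetical classifier weights

# Final model prediction: weighted majority vote of the weak classifiers
score = sum(a * h(x) for a, h in zip(alpha, stumps))
prediction = np.sign(score)
print(prediction)
```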
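For topic 18, the over-fitting mitigation asked about at the end: adding an L2 penalty to the least-squares cost, J(w) = sum_n (y_n - w^T x_n)^2 + lambda*|w|^2, which has a closed-form (ridge) solution; the data and lambda are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                     # hypothetical feature matrix
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + rng.normal(scale=0.1, size=50)  # noisy targets

lam = 1.0  # regularisation strength; in practice tuned on the validation set
# Minimising J(w) gives w = (X^T X + lambda I)^-1 X^T y
w_fit = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w_fit)
```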
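For topic 19, a minimal logistic-regression prediction and the associated performance counts, with invented weights and data: s = w.x + b is the linear score and p = 1/(1 + exp(-s)) the logistic link:

```python
import numpy as np

# Hypothetical fitted weights and a few labelled examples
w, b = np.array([1.5, -0.7]), 0.2
X = np.array([[0.5, 1.0], [2.0, 0.3], [-1.0, 0.8], [1.2, -0.5]])
y_true = np.array([0, 1, 0, 1])

s = X @ w + b                    # linear score
p = 1.0 / (1.0 + np.exp(-s))     # logistic link -> P(y=1 | x)
y_pred = (p >= 0.5).astype(int)  # classify at the 0.5 threshold

# Confusion matrix: rows = true class, columns = predicted class
cm = np.zeros((2, 2), dtype=int)
for t, q in zip(y_true, y_pred):
    cm[t, q] += 1
accuracy = np.trace(cm) / cm.sum()  # classification accuracy
error = 1.0 - accuracy              # classification error
print(cm, accuracy, error)
```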
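For topic 20, the soft assignment in a two-component 1D Gaussian mixture; the parameters are fixed by hand here, whereas an MLE fit would iterate them (e.g. with the EM algorithm) to maximise the likelihood:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1D data and two Gaussian components
x = np.array([0.2, 0.9, 3.1, 2.8, 0.5])
pi = np.array([0.5, 0.5])   # mixture weights
mu = np.array([0.0, 3.0])   # component means
sig = np.array([1.0, 1.0])  # component standard deviations

# Soft assignment (responsibility) of component k for point x_n:
#   r_nk = pi_k N(x_n | mu_k, sig_k) / sum_j pi_j N(x_n | mu_j, sig_j)
dens = pi * norm.pdf(x[:, None], mu, sig)   # shape (N_points, N_components)
r = dens / dens.sum(axis=1, keepdims=True)
print(r)
```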