List of topics for examination-based assessment.
================================================

Statistical data analyses
=========================

1. The binomial, Poisson and Gaussian distributions are particularly important for statistical analyses in physics. Could you explain the relations between them? Write down the formulas for their probability distributions and the point estimates: mean and standard deviation.

2. When combining 1D measurements, one often assumes that they are distributed according to Gaussian distributions. This leads to the very commonly used expressions for the weighted average and the weighted error on the average. Write down and explain the expressions used. How does this change when correlations between the measurements are also included? You can be qualitative in the answer. (See the first sketch after this section.)

3. You have measured the distribution of a certain observable and have a theoretical model for that distribution. Explain how you would test whether the predictions are correct by performing a chi2 consistency test. Write down a few formulas and make some illustrative drawings. Would the chi2 method be correct in the case of low-statistics distributions? (See the second sketch after this section.)

4. In some cases it is more appropriate to perform likelihood consistency tests based on Poisson statistics. Could you explain what this means? Write down a few formulas and make some illustrative drawings.

5. Measurements carry statistical errors and systematic uncertainties. Explain how one includes systematic uncertainties when performing consistency tests (with theoretical model predictions) or parameter fitting.

6. Statistical analysis of a physics measurement requires first defining a statistical model, which then allows statistical and systematic errors to be treated properly. Describe the statistical model for a simple counting measurement, with the observable being the number of events.

7. Statistical analysis of a physics measurement requires first defining a statistical model, which then allows statistical and systematic errors to be treated properly. Describe the statistical model for a binned shape analysis, with the observables being n_i, i=1,...,N_bins.

8. Statistical analysis of a physics measurement requires first defining a statistical model, which then allows statistical and systematic errors to be treated properly. Describe the statistical model for an unbinned shape analysis, with the observables being m_i, i=1,...,N_events.

9. Maximum Likelihood Estimation is nowadays the standard approach for inferring the values of physical parameters (e.g. couplings, masses) from measured distributions or event counts. Could you explain this concept in the case of a binned shape analysis? (See the third sketch after this section.)

10. To claim a discovery or an exclusion based on a specific measurement, one first defines its statistical model, formulates the hypotheses H0 and H1, and then uses a likelihood ratio to decide between them. Could you explain the above procedure and compare/contrast the discovery and exclusion (limit-setting) cases?

11. To estimate the expected results (limits) of a given measurement, one often generates special MC datasets (pseudo-experiments, toys). Sometimes, however, it is enough to generate the so-called Asimov dataset. Could you explain the difference and how these datasets are used?

12. The likelihood typically includes parameters of interest (POIs) and nuisance parameters (NPs). Give examples illustrating what they can be. What about systematic uncertainties? How can they be incorporated into the likelihood? What does "profiling systematic uncertainties" mean?
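For topic 2, a minimal numerical sketch of the Gaussian combination, with invented measurement values. The uncorrelated case uses weights w_i = 1/sigma_i^2; the correlated case generalises this with the inverse covariance matrix (the BLUE combination):

```python
import numpy as np

# Hypothetical measurements of the same quantity with Gaussian errors
x = np.array([10.2, 9.8, 10.5])       # measured values
sigma = np.array([0.3, 0.4, 0.5])     # standard deviations

# Uncorrelated case: weights w_i = 1 / sigma_i^2
w = 1.0 / sigma**2
x_avg = np.sum(w * x) / np.sum(w)     # weighted average
sigma_avg = 1.0 / np.sqrt(np.sum(w))  # weighted error on the average
print(f"uncorrelated: {x_avg:.3f} +- {sigma_avg:.3f}")

# Correlated case: use the full covariance matrix V,
#   x_avg = (1^T V^-1 x) / (1^T V^-1 1),  sigma_avg^2 = 1 / (1^T V^-1 1)
rho = 0.5                             # assumed correlation between meas. 1 and 2
V = np.diag(sigma**2)
V[0, 1] = V[1, 0] = rho * sigma[0] * sigma[1]
Vinv = np.linalg.inv(V)
ones = np.ones_like(x)
norm = ones @ Vinv @ ones
x_avg_c = (ones @ Vinv @ x) / norm
sigma_avg_c = 1.0 / np.sqrt(norm)
print(f"correlated:   {x_avg_c:.3f} +- {sigma_avg_c:.3f}")
```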
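For topic 3, a sketch of the chi2 consistency test on made-up binned data: chi2 = sum_i (n_i - nu_i)^2 / sigma_i^2 is compared against a chi2 distribution with N_bins degrees of freedom (reduced by the number of fitted parameters, if any):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical binned measurement and model prediction
n = np.array([12, 25, 31, 22, 9])              # observed counts per bin
nu = np.array([10.0, 24.0, 33.0, 20.0, 11.0])  # model prediction per bin
sigma = np.sqrt(nu)                            # Gaussian approx. of Poisson errors

chi2_val = np.sum((n - nu)**2 / sigma**2)
ndof = len(n)                                  # no parameters fitted here
p_value = chi2.sf(chi2_val, ndof)              # probability of chi2 >= observed
print(f"chi2/ndof = {chi2_val:.2f}/{ndof}, p-value = {p_value:.3f}")
# Caveat: for low-statistics bins the Gaussian approximation breaks down,
# and a Poisson likelihood test (next sketch) is more appropriate.
```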
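For topics 4 and 9, a minimal maximum-likelihood fit of a signal strength mu in a binned shape analysis, assuming Poisson-distributed counts n_i with expectation nu_i(mu) = mu*s_i + b_i; all template and data numbers are invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical signal and background templates and observed counts
s = np.array([5.0, 12.0, 8.0, 3.0])     # expected signal per bin
b = np.array([20.0, 18.0, 15.0, 12.0])  # expected background per bin
n = np.array([28, 33, 21, 14])          # observed counts

def nll(mu):
    """Poisson negative log-likelihood, constant terms dropped:
       -ln L = sum_i [ nu_i - n_i * ln(nu_i) ],  nu_i = mu*s_i + b_i"""
    nu = mu * s + b
    return np.sum(nu - n * np.log(nu))

res = minimize_scalar(nll, bounds=(0.0, 5.0), method="bounded")
print(f"best-fit signal strength mu = {res.x:.3f}")
```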
Multi-variate analyses
======================

13. Briefly explain the purpose of unfolding procedures. Say more about the commonly used approaches.

14. Rectangular cuts and the Fisher discriminant: explain the methods and how one can optimise their performance.

15. Decision trees: How does one define the classification of the final leaves? How does one measure the quality of the predictions: error and accuracy? How is the performance measured?

16. The idea of ensemble classifiers and boosting: Could you explain the concept of weighted weak classifiers and weighted data? Could you write down the formula for the final model prediction? (See the first sketch at the end of this list.)

Data Science and Machine Learning
=================================

17. Draw the flow-chart diagram for making predictions with regression as the ML algorithm. Briefly explain each box in the flow-chart: ML model, ML algorithm, quality metric, feature extraction.

18. How do we assess the performance of ML algorithms? Explain what the "training error", "validation error", "generalization error" and "test error" are. What does "cross-validation" mean? Draw an illustrative plot of how these errors typically behave as a function of regression model complexity. What does "over-fitting" mean? How can we mitigate it by adding an extra term to the cost function? (See the second sketch at the end of this list.)

19. Explain the model of the logistic regression classifier. Write down the formula for the linear score and the logistic link function. We measure the performance of the classifier using the "classification error", "classification accuracy" and "confusion matrix". Could you explain what these mean? What is the "majority class" problem? (See the third sketch at the end of this list.)

20. Explain the probabilistic approach to clustering. The soft assignment can be optimised with the MLE approach (maximum likelihood estimation). Can you explain what this means and write down some formulas? (See the last sketch at the end of this list.)
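For topic 16, a toy illustration of the boosted-ensemble prediction F(x) = sign(sum_k alpha_k h_k(x)). The stumps and weights here are set by hand to stand in for the output of a boosting algorithm such as AdaBoost, where alpha_k = (1/2) ln((1 - err_k)/err_k) and the training data are re-weighted after each round:

```python
import numpy as np

# Toy 1D inputs and three hand-made weak classifiers (decision stumps)
x = np.array([0.1, 0.4, 0.6, 0.9])

stumps = [lambda x: np.where(x > 0.3, 1, -1),
          lambda x: np.where(x > 0.5, 1, -1),
          lambda x: np.where(x > 0.7, 1, -1)]
alpha = np.array([0.8, 0.5, 0.3])   # hypothetical classifier weights

# Final model prediction: weighted majority vote of the weak classifiers
score = sum(a * h(x) for a, h in zip(alpha, stumps))
prediction = np.sign(score)
print(prediction)
```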
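For topic 18, the over-fitting mitigation asked about at the end: adding an L2 penalty to the least-squares cost, J(w) = sum_n (y_n - w^T x_n)^2 + lambda*|w|^2, which has a closed-form (ridge) solution; the data and lambda are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                     # hypothetical feature matrix
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + rng.normal(scale=0.1, size=50)  # noisy targets

lam = 1.0  # regularisation strength; in practice tuned on the validation set
# Minimising J(w) gives w = (X^T X + lambda I)^-1 X^T y
w_fit = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w_fit)
```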
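For topic 19, a minimal logistic-regression prediction and the associated performance counts, with invented weights and data: s = w.x + b is the linear score and p = 1/(1 + exp(-s)) the logistic link:

```python
import numpy as np

# Hypothetical fitted weights and a few labelled examples
w, b = np.array([1.5, -0.7]), 0.2
X = np.array([[0.5, 1.0], [2.0, 0.3], [-1.0, 0.8], [1.2, -0.5]])
y_true = np.array([0, 1, 0, 1])

s = X @ w + b                    # linear score
p = 1.0 / (1.0 + np.exp(-s))     # logistic link -> P(y=1 | x)
y_pred = (p >= 0.5).astype(int)  # classify at the 0.5 threshold

# Confusion matrix: rows = true class, columns = predicted class
cm = np.zeros((2, 2), dtype=int)
for t, q in zip(y_true, y_pred):
    cm[t, q] += 1
accuracy = np.trace(cm) / cm.sum()  # classification accuracy
error = 1.0 - accuracy              # classification error
print(cm, accuracy, error)
```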
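For topic 20, the soft assignment in a two-component 1D Gaussian mixture; the parameters are fixed by hand here, whereas an MLE fit would iterate them (e.g. with the EM algorithm) to maximise the likelihood:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1D data and two Gaussian components
x = np.array([0.2, 0.9, 3.1, 2.8, 0.5])
pi = np.array([0.5, 0.5])   # mixture weights
mu = np.array([0.0, 3.0])   # component means
sig = np.array([1.0, 1.0])  # component standard deviations

# Soft assignment (responsibility) of component k for point x_n:
#   r_nk = pi_k N(x_n | mu_k, sig_k) / sum_j pi_j N(x_n | mu_j, sig_j)
dens = pi * norm.pdf(x[:, None], mu, sig)   # shape (N_points, N_components)
r = dens / dens.sum(axis=1, keepdims=True)
print(r)
```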