Classification
==================

1. Explain the logistic regression classifier model. Write down the formulas for the linear score and the logistic link function. How is the model extended to multi-class classification problems?
2. We measure the performance of a classifier using "classification error", "classification accuracy", and the "confusion matrix". Explain what each of these means. What is the "class majority" problem?
3. Write the formula for the quality metric of a logistic classifier: the likelihood function. The best classifier is found using maximum likelihood estimation (MLE) and gradient ascent. Write down and explain the final update formula of that algorithm. How do we choose the step size?
4. Classification with decision trees. How is the class predicted at each leaf defined? How is the quality of the predictions measured (error and accuracy)? Explain a simple greedy algorithm for finding the best decision tree. How is its performance measured?
5. Greedy decision tree learning: what are the steps for building a tree? What are the stopping conditions for splitting in the decision tree model? What is the sign of overfitting in decision trees, and how is this effect mitigated (early stopping or pruning)? Explain what each of these means.
6. What are the strategies for handling missing data in decision trees?
7. The idea of ensemble classifiers and boosting. Explain the concepts of weighted weak classifiers and weighted data. Write down the formula for the final model's predictions.
8. The AdaBoost algorithm: formulas and learning process.
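
As a study aid for questions 1 and 3, the sketch below shows one possible way to write the linear score, the logistic link, the log-likelihood, and a plain gradient ascent loop. The notation is an assumption and not taken from this outline: labels are in {+1, -1}, each feature vector `h` starts with a constant 1 for the intercept, and the names `sigmoid`, `score`, `log_likelihood`, and `gradient_ascent` are purely illustrative. The toy data set is made up for the example, and the fixed step size is a simplification.

```python
import math


def sigmoid(s):
    """Logistic link: maps a linear score s to P(y = +1 | x, w)."""
    # Two branches keep math.exp from overflowing for large |s|.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def score(w, h):
    """Linear score w^T h(x) for one feature vector h (h[0] = 1 is the intercept)."""
    return sum(wj * hj for wj, hj in zip(w, h))


def log_likelihood(w, features, labels):
    """Sum over the data of ln P(y_i | x_i, w), with labels in {+1, -1}."""
    total = 0.0
    for h, y in zip(features, labels):
        p = sigmoid(score(w, h))
        total += math.log(p if y == +1 else 1.0 - p)
    return total


def gradient_ascent(features, labels, step_size=0.1, n_iters=200):
    """Maximise the log-likelihood with a fixed step size (an assumption)."""
    w = [0.0] * len(features[0])
    for _ in range(n_iters):
        # dl/dw_j = sum_i h_j(x_i) * (1[y_i = +1] - P(y = +1 | x_i, w))
        grad = [0.0] * len(w)
        for h, y in zip(features, labels):
            err = (1.0 if y == +1 else 0.0) - sigmoid(score(w, h))
            for j, hj in enumerate(h):
                grad[j] += hj * err
        # Ascent step: move the weights in the direction of the gradient.
        w = [wj + step_size * gj for wj, gj in zip(w, grad)]
    return w


# Made-up, non-separable toy data: h(x) = [1, x].
toy_features = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0],
                [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
toy_labels = [-1, -1, +1, -1, +1, +1]

w_hat = gradient_ascent(toy_features, toy_labels)
```

Calling `log_likelihood([0.0, 0.0], toy_features, toy_labels)` and `log_likelihood(w_hat, toy_features, toy_labels)` lets you compare the starting point with the fitted weights. Trying a few values of `step_size` and watching whether the log-likelihood increases steadily or oscillates is one practical way to approach the step size question.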