Clustering & Retrieval
======================

1. Explain the TF-IDF representation of documents. Which metrics are most commonly used to search for the k nearest-neighbor (k-NN) documents?
2. What are KD-trees? How do you build and query a KD-tree? What is the complexity of a query, and how does it compare with the complexity of other queries: 1-NN, k-NN?
3. Explain the LSH (locality-sensitive hashing) method. Is it competitive with the KD-tree method?
4. Describe the steps of the k-means clustering algorithm. How do we measure its quality? Can you comment on its convergence?
5. Explain the probabilistic approach to clustering. The soft assignment can be optimized with the MLE (maximum likelihood estimation) approach. Can you explain what this means and give some formulas?
6. Explain the "bag-of-words" model for clustering documents.
7. The LDA method (latent Dirichlet allocation). Can you explain the concept?
8. Hierarchical clustering. Explain the algorithm and illustrate it with a dendrogram.
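
As a starting point for question 4, here is a minimal sketch of the k-means steps (Lloyd's algorithm) in plain Python. The function name `kmeans`, the toy `points`, and the choice to initialise centroids with the first `k` points are assumptions made for illustration; practical implementations usually use random or k-means++ initialisation. The returned `inertia` is the within-cluster sum of squared distances, one common quality measure.

```python
import math


def kmeans(points, k, max_iters=100):
    """Cluster a list of equal-length tuples with Lloyd's algorithm."""
    # Assumption: deterministic initialisation from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = None

    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Converged once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    # Quality: within-cluster sum of squared distances.
    inertia = sum(
        math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments)
    )
    return centroids, assignments, inertia


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels, inertia = kmeans(points, k=2)
print(labels, centroids, inertia)
```

Neither step can increase the inertia, and there are only finitely many possible assignments, so the loop terminates. It may stop at a local optimum, which is why the algorithm is usually run several times from different initialisations.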