Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2017/2018



Office hours: Tuesday, 14:00 - 15:00; room 2-D-11.

 
Lectures & Assignments:
-----------------------------------------------------
based on the course

C. Guestrin and E. Fox, "Machine Learning Specialisation"

31.10.2017
Introduction
Regression-Primer
Primer:
assignments

snippet
data: kc_house_data.csv.zip
info on kc_house dataset

7.11.2017
Regression-Advanced
Regression-Recap
Advanced:
simple-regression-assignment.ipynb.zip
multiple-regression-assignment-2.ipynb.zip
polynomial-regression-assignment.ipynb.zip
ridge-regression-assignment-1.ipynb.zip
Overfitting_Demo_Ridge_Lasso.ipynb.zip
lasso-assignment-1.ipynb.zip
lasso-assignment-2.ipynb.zip
local-regression-assignment.ipynb.zip
 
data: kc_house_data.csv.zip

info on kc_house dataset

14.11.2017
Classification-Primer
Primer:
assignments
AnalyzingProductSentiment.ipynb.gz

data: amazon_baby.csv.zip

21.11.2017
Classification-Advanced
Classification-Recap
Classification-Bonus
Advanced:
linear-classifier-assignment.ipynb.zip
linear-classifier-learning-assignment.ipynb.zip
decision-tree-assignment-1.ipynb.zip
decision-tree-assignment-2.ipynb.zip
decision-tree-practical-assignment.zip

boosting-assignment-1.ipynb.zip
boosting-assignment-2.ipynb.zip

data: amazon_baby.csv.zip
dictionary: important_words.json.zip
data: lendig-club-data.cvs.zip
info on LendingClub dataset

28.11.2017
Clustering-Primer
Primer:
assignment:
0_nearest-neighbors-features-and-metrics.ipynb.zip


data: people_wiki.cvs.zip
example: DocumentRetrieval.ipynb.gz

5.12.2017
Clustering-Advanced
Clustering-Recap
Clustering-Bonus
Advanced:
1_nearest-neighbors-lsh-implementation.ipynb.zip

2_kmeans-with-text-data.ipynb.zip
3_em-for-gmm.ipynb.zip
4_em-with-text-data.ipynb.zip
5_lda.ipynb.zip
6_hierarchical_clustering.ipynb.zip


data & helpers
people_wiki.cvs.zip
people_wiki_map_index_to_word.json.zip
people_wiki_word_count.npz.zip
people_wiki_tf_idf.npz.zip
kmeans-arrays.npz.zip
images.sf.zip
chosen_images.png
em_utilities.py.zip
topic_models.zip

12.12.2017
Lecture & Assignments by Piotr Bialas

19.12.2017
Recommender-Primer
DeepLearning-Primer
Primer:
SongRecommender.zip


Advanced:
DeepLearningForClassification.zip


Lectures are based on the materials from Coursera: 


M. Çetinkaya-Rundel, "Data Analysis and Statistical Inference"
D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman,
"Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"

M. Stoye,
"ML applications in CMS"

ML techniques in HEP, Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

Link to lectures given in 2014
http://th-www.if.uj.edu.pl/~erichter/dydaktyka/Dydaktyka2014/AiSAD-2014/index.html 




Last modified: 21 November 2017

Elzbieta Richter-Was

