Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2018/2019



Office hours: Tuesday, 14:00 - 15:00; room 2-D-11.

 
Project presentations (optional): 30.01.2019, 9.00 am, room A-2-01.

Lectures & Assignments:
-----------------------------------------------------

List of topics for the written test, to be held on 4 December (during the last lecture), 9.00-10.00, room A-1-08.


2.10.2018
Introduction
Data exploration
Lab: HowToStart
assignment-1
data: kc_house_data.csv.zip
info on kc_house dataset

9.10.2018
Regression-Primer
assignment-2

16.10.2018
Regression-Advanced-I
assignment-3
numpy tutorial
matrix algebra tutorial

23.10.2018
Regression-Advanced-II
project-regression
data: kc_house_data.csv.zip

30.10.2018
Classification-Primer
assignment-4-explained
assignment-4-graphlab
data: amazon_baby.csv.zip
scikit-learn: LogisticRegression

6.11.2018
Classification-Advanced-I
project-classification
data: lending-club-data.csv.zip

13.11.2018
Classification-Advanced-II
Clustering&Retrieval-Primer

20.11.2018
Clustering&Retrieval-Advanced-I
assignment-5-graphlab
data: people_wiki.cvs.zip

27.11.2018
Clustering&Retrieval-Advanced-II
project-clustering
data: people_wiki.cvs.zip

4.12.2018
Clustering&Retrieval-Advanced-III
Examination-based assessment: 9.00-10.00, written part


Lectures are based on the materials from Coursera: 


Dr. Mine Çetinkaya-Rundel, "Data Analysis and Statistical Inference"
C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link

Related interesting material from Coursera
R. D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP,  Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

https://archive.ics.uci.edu/ml/datasets.html

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

Link to lectures given in 2017
http://th-www.if.uj.edu.pl/~erichter/dydaktyka/Dydaktyka2017/AiSAD-2017/index.html 


Link to lectures given in 2014
http://th-www.if.uj.edu.pl/~erichter/dydaktyka/Dydaktyka2014/AiSAD-2014/index.html 
 

Last modified: 1 October 2018

Elzbieta Richter-Was

