Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2019/2020



Office hours: Tuesday, 13:00-14:00; room 2-D-11.


Exam (theory part, written): 31.01.2020, 14:00-16:00, A-1-13
List of topics: link


Lectures & Assignments:

-----------------------------------------------------


Date
Lecture slides
Python scripts with assignments
Datasets
Tutorials

  8.10.2019
Introduction
Data exploration

assignment-0-python
assignment-0-numpy
assignment-0-numpy-matplotlib
assignment-0-pandas

assignment-1

kc_house_data.csv.zip
info on kc_house

HowToStart

DS_cheatsheet_numpy.pdf
DS_cheatsheet_matplotlib.pdf
DS_cheatsheet_jupyter_notebook.pdf
DS_cheatsheet_pandas.pdf
15.10.2019
Regression-Primer

Regression-Advanced-I
assignment-2


S. Raschka, python-machine-learning-book


rashka_ch04.ipynb
rashka_ch10.ipynb
22.10.2019
Regression-Advanced-II
assignment-3


numpy tutorial
matrix algebra tutorial
29.10.2019
Regression-Advanced-III

Classification-Primer




  5.11.2019
Classification-Advanced-I
assignment-4
data:amazon_baby.csv.zip
scikit-learn: LogisticRegression
12.11.2019
Classification-Advanced-II


19.11.2019
Classification-Advanced-III

Clustering-Primer


assignment-5


data:people_wiki.cvs.zip

26.11.2019
Clustering&Retrieval-Advanced-I




  3.12.2019
Clustering&Retrieval-Advanced-II
It's time for your personal Data Science project!
Select one from the list below:




Proj_NYcrime.txt
Proj_PolandClimateChange.txt
Proj_TrollTweets.txt
Proj_VoteCast.txt
Proj_Classification.txt
Proj_Clustering.txt
Proj_Regression.txt
lending-club-data.csv.zip
http://snap.stanford.edu/data/amazon/
http://mlr.cs.umass.edu/ml/datasets.html
https://data.world/

10.12.2019
Clustering&Retrieval-Advanced-III



  7.01.2020
Modeling, simulation, Monte Carlo methods
assignment-6


14.01.2020
Lecture cancelled
assignment-7

21.01.2020
Statistical Inference


OpenIntro Statistics
Z. Fan, lectures at Stanford University
28.01.2020
Artificial Neural Network
Recommending-System-Primer





Lectures are based on the materials from Coursera: 


Dr. Mine Çetinkaya-Rundel, "Data Analysis and Statistical Inference"
C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link


Related interesting material from Coursera
D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP,  Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

Link to lectures given in 2017
http://th-www.if.uj.edu.pl/~erichter/dydaktyka/Dydaktyka2017/AiSAD-2017/index.html 


Link to lectures given in 2014
http://th-www.if.uj.edu.pl/~erichter/dydaktyka/Dydaktyka2014/AiSAD-2014/index.html 
 

Last modified: 1 October 2019

Elzbieta Richter-Was

