Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2021/2022


Exam (written), winter session:
                 1.02.2022, 9.00-11.00, via MS Teams
    resit:
               25.02.2022, 9.00-11.00, via MS Teams

List of topics

 

 Lectures & Assignments:

-----------------------------------------------------


Week
Lecture slides
Lab slides
Python scripts with assignments
Datasets
Tutorials

  13.10.2021
Introduction
Data exploration

Introduction_lab
assignment-0-python
assignment-0-numpy
assignment-0-numpy-matplotlib
assignment-0-pandas

assignment-1

kc_house_data.csv.zip
info on kc_house




Optional datasets:
1) https://www.mldata.io/dataset-details/school_grades/
2) https://insights.stackoverflow.com/survey
3) https://stat.gov.pl/
4) https://www.kaggle.com/nasa/asteroid-impacts
5) https://www.kaggle.com/vik2012kvs/walmart-dataretail-analysis
6) https://www.kaggle.com/dronio/SolarEnergy
7) https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records
8) https://www.kaggle.com/cesartrevisan/scikit-learn-and-gridsearchcv
9) https://www.kaggle.com/osmi/mental-health-in-tech-survey
HowToStartAnaconda

link to GoogleColab

DS_cheatsheet_numpy.pdf
DS_cheatsheet_matplotlib.pdf
DS_cheatsheet_jupyter_notebook.pdf
DS_cheatsheet_pandas.pdf

Fundamentals  of Analysis

Box plot
20.10.2021 Regression-Primer

Regression-Advanced-I

assignment-2


S. Raschka
python-machine-learning-book


rashka_ch04.ipynb
rashka_ch10.ipynb
27.10.2021 Regression-Advanced-II
(no class, recorded lecture
 available in PEGAZ)


assignment-3


numpy tutorial
matrix algebra tutorial
 3.11.2021
Regression-Advanced-III





10.11.2021
lecture by  prof. P. Bialas




17.11.2021
lecture by  prof. P. Bialas



24.11.2021
lecture by  prof. P. Bialas



1.12.2021
lecture by  prof. P. Bialas



8.12.2021
lecture by  prof. P. Bialas



 15.12.2021   
Classification-Primer

Classification-Advanced-I

assignment-4
data:amazon_baby.csv.zip
scikit-learn: LogisticRegression
 22.12.2021  Rector's hours (no classes)




 5.01.2022
Classification-Advanced-II
(online, by order of the Rector)




12.01.2022 Clustering&Retrieval-Primer

Clustering&Retrieval-Advanced-I


assignment-5
data:people_wiki.csv.zip
 19.01.2022
Clustering&Retrieval-Advanced-II



 26.01.2022 Clustering&Retrieval-Advanced-III






Lectures are based on the materials from Coursera: 



C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link


Related interesting material from Coursera
R. D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP,  Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

 

Last modified: 5 December 2021

Elzbieta Richter-Was

