Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2020/2021



Office hours: Tuesday, 13:00 - 14:00; office G-0-10.
COVID-19: please use MS Teams or email where applicable.
--------------------------------------------------------------------------------


Exam (theory part, written):    Monday, 1.02.2021, 9:00-11:00,
                                                    details to be announced later

                                                    Monday, 22.02.2021, 9:00-11:00

List of topics

Lectures & Assignments:

-----------------------------------------------------


Week
Lecture slides
Lab slides
Python scripts with assignments
Datasets
Tutorials

 
13.10.2020
Introduction
Data exploration

Introduction_lab
assignment-0-python
assignment-0-numpy
assignment-0-numpy-matplotlib
assignment-0-pandas

assignment-1

kc_house_data.csv.zip
info on kc_house




Optional datasets for exploration analysis:
1) https://www.mldata.io/dataset-details/school_grades/
2) https://insights.stackoverflow.com/survey
3) https://stat.gov.pl/
4) https://www.kaggle.com/nasa/asteroid-impacts
5) https://www.kaggle.com/vik2012kvs/walmart-dataretail-analysis
6) https://www.kaggle.com/dronio/SolarEnergy
7) https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records
8) https://www.kaggle.com/cesartrevisan/scikit-learn-and-gridsearchcv
9) https://www.kaggle.com/osmi/mental-health-in-tech-survey
HowToStartAnaconda

Link to Google Colab

DS_cheatsheet_numpy.pdf
DS_cheatsheet_matplotlib.pdf
DS_cheatsheet_jupyter_notebook.pdf
DS_cheatsheet_pandas.pdf
20.10.2020
Regression-Primer
Regression-Advanced-I

assignment-2


S. Raschka,
python-machine-learning-book


rashka_ch04.ipynb
rashka_ch10.ipynb
27.10.2020
Regression-Advanced-II
assignment-3


numpy tutorial
matrix algebra tutorial
3.11.2020
Regression-Advanced-III





10.11.2020   
Classification-Primer

Classification-Advanced-I

assignment-4
data: amazon_baby.csv.zip
scikit-learn: LogisticRegression

17.11.2020
Classification-Advanced-II



24.11.2020
Classification-Advanced-III




1.12.2020
Clustering&Retrieval-Primer
Clustering&Retrieval-Advanced-I


assignment-5
data: people_wiki.csv.zip

8.12.2020
Clustering&Retrieval-Advanced-II



15.12.2020
Clustering&Retrieval-Advanced-III


It's time to start developing your
personal Data Science project!
Select one from the list below or define your own:
Proj_NYcrime.txt
Proj_PolandClimateChange.txt
Proj_TrollTweets.txt
Proj_VoteCast.txt
Proj_Classification.txt
Proj_Clustering.txt
Proj_Regression.txt
lending-club-data.csv.zip
http://snap.stanford.edu/data/amazon/
http://mlr.cs.umass.edu/ml/datasets.html
https://data.world/
More reading:
LDA explained
22.12.2020
Clustering&Retrieval-Advanced-IV
Recommender Systems




 
12.01.2021
Modeling, simulation, Monte Carlo methods

assignment-6


19.01.2021
Statistical Inference
assignment-7
OpenIntro Statistics
Z. Fan at Stanford University
26.01.2021
Multivariate Analysis and Artificial Neural Networks



DNN and CNN
https://www.youtube.com/watch?v=u4alGiomYP4
RNN
https://www.youtube.com/watch?v=fTUwdXUFfI8

The lectures are based on the following Coursera materials:


Dr. Mine Çetinkaya-Rundel, "Data Analysis and Statistical Inference"
C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link


Related material from Coursera:
R. D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP workshop, Berkeley Laboratory, 11-13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

Link to lectures given in 2017
http://th-www.if.uj.edu.pl/~erichter/dydaktyka/Dydaktyka2017/AiSAD-2017/index.html 


Link to lectures given in 2014
http://th-www.if.uj.edu.pl/~erichter/dydaktyka/Dydaktyka2014/AiSAD-2014/index.html 
 

Last modified: 7 October 2020

Elzbieta Richter-Was

