Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2022/2023

Exams:
          30 January 2023, 9.00-11.00, A-1-08

          20 February 2023, 9.00-11.00, A-2-07

Topics for assessment



 Lectures & Assignments:

-----------------------------------------------------


Week
Lecture slides
Lab slides
Python scripts with assignments
Datasets
Tutorials
5.10.2022  Dean's hours (no classes)





  12.10.2022
Introduction
Data exploration

Introduction_lab
assignment-0-python
assignment-0-numpy
assignment-0-numpy-matplotlib
assignment-0-pandas

assignment-1

kc_house_data.csv.zip
info on kc_house




Optional datasets:
1) https://www.mldata.io/dataset-details/school_grades/
2) https://insights.stackoverflow.com/survey
3) https://stat.gov.pl/
4) https://www.kaggle.com/nasa/asteroid-impacts
5) https://www.kaggle.com/vik2012kvs/walmart-dataretail-analysis
6) https://www.kaggle.com/dronio/SolarEnergy
7) https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records
8) https://www.kaggle.com/cesartrevisan/scikit-learn-and-gridsearchcv
9) https://www.kaggle.com/osmi/mental-health-in-tech-survey

lending-club-data.csv.zip
http://snap.stanford.edu/data/amazon/
http://mlr.cs.umass.edu/ml/datasets.html
https://data.world/
HowToStartAnaconda

link to GoogleColab

DS_cheatsheet_numpy.pdf
DS_cheatsheet_matplotlib.pdf
DS_cheatsheet_jupyter_notebook.pdf
DS_cheatsheet_pandas.pdf

Fundamentals  of Analysis

Box plot
19.10.2022 Regression-Primer

Regression-Advanced-I

assignment-2


S. Raschka
python-machine-learning-book


rashka_ch04.ipynb
rashka_ch10.ipynb


ElasticNet Regression
26.10.2022 Regression-Advanced-II


assignment-3


numpy tutorial
matrix algebra tutorial
2.11.2022
Regression-Advanced-III
(online via MS Teams,
by order of the Rector)






 9.11.2022 Classification-Primer

Classification-Advanced-I

assignment-4 data:amazon_baby.csv.zip scikit-learn: LogisticRegression
16.11.2022 (cancelled, special leave)




23.11.2022 Classification-Advanced-II



30.11.2022 (cancelled, business trip)




   7.12.2022 Clustering&Retrieval-Primer

Clustering&Retrieval-Advanced-I

assignment-5 data:people_wiki.cvs.zip more reading:
LDA explained
LDA paper
14.12.2022 Clustering&Retrieval-Advanced-II



21.12.2022 Clustering&Retrieval-Advanced-III (online via MS Teams,
by order of the Rector)





4.01.2023 Recommending-System
(online via MS Teams,
by order of the Rector)





 11.01.2023 Modeling, simulation, Monte Carlo methods.

assignment-6

18.01.2023 Statistical Inference

assignment-7

25.01.2023 Multivariate Analyses and Artificial Neural Network











Lectures are based on the materials from Coursera: 



C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link


Related interesting material from Coursera
R. D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP, Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

 

Last modified: 30 October 2022

Elzbieta Richter-Was

