Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2023/2024


 Exams:
29.01.2024, 9:15 - 10:45, room A-1-13
23.02.2024, 9:15 - 10:45, room A-2-07

 List of questions for exam


  Lectures & Assignments:
-----------------------------------------------------


Week | Lecture slides | Lab slides | Python scripts with assignments | Datasets | Tutorials

  4.10.2023
Introduction
Data exploration

Introduction_lab
assignment-1
kc_house_data.csv.zip
info on kc_house




Example datasets for optional assignments:
1) https://www.mldata.io/dataset-details/school_grades/
2) https://insights.stackoverflow.com/survey
3) https://stat.gov.pl/
4) https://www.kaggle.com/nasa/asteroid-impacts
5) https://www.kaggle.com/vik2012kvs/walmart-dataretail-analysis
6) https://www.kaggle.com/dronio/SolarEnergy
7) https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records
8) https://www.kaggle.com/cesartrevisan/scikit-learn-and-gridsearchcv
9) https://www.kaggle.com/osmi/mental-health-in-tech-survey

lending-club-data.csv.zip
http://snap.stanford.edu/data/amazon/
http://mlr.cs.umass.edu/ml/datasets.html
https://data.world/
HowToStartAnaconda

Link to GoogleColab

DS_cheatsheet_numpy.pdf
DS_cheatsheet_matplotlib.pdf
DS_cheatsheet_jupyter_notebook.pdf
DS_cheatsheet_pandas.pdf

Fundamentals  of Analysis

Box plot
11.10.2023 Regression-Primer

Regression-Advanced-I

assignment-2


S. Raschka
python-machine-learning-book


rashka_ch04.ipynb
rashka_ch10.ipynb


ElasticNet Regression
18.10.2023 Regression-Advanced-II


assignment-3


numpy tutorial
matrix algebra tutorial
25.10.2023
Regression-Advanced-III






8.11.2023 Classification-Primer

Classification-Advanced-I

assignment-4, data: amazon_baby.csv.zip, scikit-learn: LogisticRegression
15.11.2023 Cancelled (business trip)




22.11.2023 Classification-Advanced-II



   29.11.2023 Clustering&Retrieval-Primer

Clustering&Retrieval-Advanced-I

assignment-5 data:people_wiki.cvs.zip
6.12.2023 Clustering&Retrieval-Advanced-II



13.12.2023 Cancelled (business trip)


more reading:
LDA explained
LDA paper
20.12.2023 Recommender-System
DeepLearning
(online, business trip)



Recovering Eigenfaces
Matrix Factorisation
VisualizingAndConvolutionalNN
 10.01.2024 Modeling, simulation, Monte Carlo methods.

assignment-6
Computational Modeling
Computational Thinking and Data Science
17.01.2024 Statistical Inference

assignment-7

24.01.2024 Multivariate Analyses and Artificial Neural Network



Understanding Deep Learning







Recommended books (.pdf available for download):
https://hastie.su.domains/ElemStatLearn/
https://hastie.su.domains/ISLP/ISLP_website.pdf
https://probml.github.io/pml-book/
https://www.deeplearningbook.org/

Lectures are based on the materials from Coursera: 



C. Guestrin and E. Fox, "Machine Learning Specialisation"
        Foundation: link
        Regression: link
        Classification: link
        Clustering and Retrieval: link


Related interesting material from Coursera
D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP,  Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

 

Last modified: 30 September 2023

Elzbieta Richter-Was

