Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2024/2025



 Exams:
------------
    29.01.2025, A-1-03, 9:00-11:00
    25.02.2025, A-1-03, 9:00-11:00



Lectures & Assignments:
-----------------------------------------------------


Week
Lecture slides
Lab slides
Python scripts with "primer" assignments
Datasets for "primer" and "advanced" assignments.
Tutorials and additional materials

  2.10.2024
Introduction

Data exploration




Questions for exam

Data exploration - primer
Assignment_1

kc_house_data.csv.zip
info on kc_house




Example datasets for "advanced" assignments:
1) https://www.mldata.io/dataset-details/school_grades/
2) https://insights.stackoverflow.com/survey
3) https://stat.gov.pl/
4) https://www.kaggle.com/nasa/asteroid-impacts
5) https://www.kaggle.com/vik2012kvs/walmart-dataretail-analysis
6) https://www.kaggle.com/dronio/SolarEnergy
7) https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records
8) https://www.kaggle.com/cesartrevisan/scikit-learn-and-gridsearchcv
9) https://www.kaggle.com/osmi/mental-health-in-tech-survey

lending-club-data.csv.zip
http://snap.stanford.edu/data/amazon/
http://mlr.cs.umass.edu/ml/datasets.html
https://data.world/
HowToStartAnaconda

Link to GoogleColab

DS_cheatsheet_numpy.pdf
DS_cheatsheet_matplotlib.pdf
DS_cheatsheet_jupyter_notebook.pdf
DS_cheatsheet_pandas.pdf

Exploratory Data Analysis

Fundamentals  of Analysis

Box plot
9.10.2024


9.10.2024
16.10.2024
23.10.2024
Regression-Primer

Regression-Advanced


Questions for exam

Regression - primer
Assignment_2

kc_house_data.csv.zip
info on kc_house
S. Raschka
python-machine-learning-book


rashka_ch04.ipynb
rashka_ch10.ipynb
numpy tutorial.ipynb

matrix algebra tutorial

ElasticNet Regression
30.10.2024

30.10.2024
6.11.2024
Classification-Primer

Classification-Advanced


Questions for exam


Classification - primer
Assignment_3
data: amazon_baby.csv.zip
scikit-learn: LogisticRegression
13.11.2024 Canceled (business trip)




   20.11.2024

20.11.2024
27.11.2024

Clustering&Retrieval-Primer

Clustering&Retrieval-Advanced


Questions for exam


Clustering - primer
Assignment_4
data: people_wiki.csv.zip
more reading:
LDA explained
LDA paper
4.12.2024 Recommender-System


Questions for exam


Example
https://www.geeksforgeeks.org/recommendation-system-in-python/


Recovering Eigenfaces
Matrix Factorisation

https://developers.google.com/machine-learning/recommendation/

11.12.2024 Deep Learning
Multivariate Analyses and Artificial Neural Network
Postponed and moved online; starts 8.01.2025 at 18:00



VisualizingAndConvolutionalNN

Understanding Deep Learning

Machine Learning in Physics
  18.12.2024 Modeling, simulation, Monte Carlo methods

Monte Carlo - primer
Assignment_5

Computational Modeling
Computational Thinking and Data Science

Introduction to Computational Modeling
(lectures by Ilya Nemenman)


8.01.2025 Statistical Inference

Questions for exam


Stat. inference - primer
Assignment_6


15.01.2025
22.01.2025
Elements of Statistical Learning




https://hastie.su.domains/ElemStatLearn/


Recommended books (.pdf available for download):
https://hastie.su.domains/ElemStatLearn/
https://hastie.su.domains/ISLP/ISLP_website.pdf
https://probml.github.io/pml-book/
https://www.deeplearningbook.org/

Lectures are based on the materials from Coursera: 


C. Guestrin and E. Fox, "Machine Learning Specialisation"

Related interesting material from Coursera
R. D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP,  Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

 

Last modified: 19 October 2024

Elzbieta Richter-Was

