Introduction to Data Science (Wprowadzenie do analityki danych)

Faculty of Physics, Astronomy and Applied Computer Science,

Jagiellonian University in Kraków


Academic year 2025/2026

Office hours:
    Tuesday,   10.00-11.00, room G-0-10
    Wednesday,  9.00-10.00, room G-0-10


Exams:
    First sitting:  29.01.2026, 14.30-16.30, room A-1-03
    Second sitting: 25.02.2026,  9.00-11.00, room A-1-03




Lectures & Assignments:
-----------------------------------------------------



Week
Lecture slides
Lab slides
Python scripts with "primer" assignments
Datasets for "primer" and "advanced" assignments.
Tutorials and additional materials

  9.10.2025
Introduction

Data exploration




Questions for exam

Data exploration - primer
Assignment_DataExploration

kc_house_data.csv.zip
info on kc_house



Example datasets for "advanced" assignments:
1) https://insights.stackoverflow.com/survey
2) https://stat.gov.pl/
3) https://www.kaggle.com/nasa/asteroid-impacts
4) https://www.kaggle.com/dronio/SolarEnergy
5) https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records
6) https://www.kaggle.com/cesartrevisan/scikit-learn-and-gridsearchcv
7) https://www.kaggle.com/osmi/mental-health-in-tech-survey
8) https://zenodo.org/records/4603412
9) https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume

lending-club-data.csv.zip
http://snap.stanford.edu/data/amazon/
http://mlr.cs.umass.edu/ml/datasets.html
https://data.world/
HowToStartAnaconda

Link to GoogleColab

DS_cheatsheet_numpy.pdf
DS_cheatsheet_matplotlib.pdf
DS_cheatsheet_jupyter_notebook.pdf
DS_cheatsheet_pandas.pdf

Exploratory Data Analysis

Fundamentals  of Analysis

Box plot

Introduction to Modern Statistics
https://openintro-ims.netlify.app/
16.10.2025

23.10.2025

30.10.2025

Regression-Primer

Regression-Advanced

Regression-StatInference


Questions for exam

Regression - primer
Assignment_Regression

kc_house_data.csv.zip
info on kc_house
S. Raschka
python-machine-learning-book


rashka_ch04.ipynb
rashka_ch10.ipynb
numpy tutorial.ipynb

matrix algebra tutorial

ElasticNet Regression
6.11.2025

13.11.2025
20.11.2025

Classification-Primer

Classification-Advanced


Questions for exam


Classification - primer
Assignment_Classification
data: amazon_baby.csv.zip
scikit-learn: LogisticRegression
 20.11.2025


27.11.2025
4.12.2025
Clustering&Retrieval-Primer

Clustering&Retrieval-Advanced


Questions for exam


Clustering - primer
Assignment_Clustering
data: people_wiki.cvs.zip
more reading:
LDA explained
LDA paper

18.12.2025

Recommender-System


Questions for exam


Example
https://www.geeksforgeeks.org/recommendation-system-in-python/


Recovering Eigenfaces
Matrix Factorisation

https://developers.google.com/machine-learning/recommendation/

8.01.2026
15.01.2026

Statistical Inference

Questions for exam



Stat. inference - primer
Assignment_StatInference


https://openintro-ims.netlify.app/
  22.01.2026


Deep Learning
Multivariate Analyses and Artificial Neural Network


Modeling, simulation, Monte Carlo methods




VisualizingAndConvolutionalNN

Understanding Deep Learning

Machine Learning in Physics
Computational Modeling

Computational Thinking and Data Science

Introduction to Computational Modeling
(lectures by Ilya Nemenman)



Recommended books (.pdf available for download):
https://hastie.su.domains/ElemStatLearn/
https://hastie.su.domains/ISLP/ISLP_website.pdf
https://probml.github.io/pml-book/
https://www.deeplearningbook.org/

Lectures are based on the materials from Coursera: 


C. Guestrin and E. Fox, "Machine Learning Specialisation"

Related interesting material from Coursera
D. Peng, J. Leek and B. Caffo, "Exploratory Data Analysis"
J. Leskovec, A. Rajaraman and J. Ullman, "Mining Massive Datasets"
B. Caffo, R. D. Peng and J. Leek, "Regression Models"

Data Science applications in physics:
B. Nachman, "Advanced Machine Learning for Classification, Regression, and Generation in Jet Physics"
M. Stoye, "ML applications in CMS"
ML techniques in HEP,  Workshop, Berkeley Laboratory, 11 - 13 December 2018
https://indico.physics.lbl.gov/indico/event/546/

Collection of datasets

http://mlr.cs.umass.edu/ml/datasets.html
http://faculty.marshall.usc.edu/gareth-james/ISL/data.html
http://snap.stanford.edu/data/amazon/

Useful links:
https://turi.com/download/install-graphlab-create-aws-coursera.html
https://turi.com/download/academic.html
https://github.com/turi-code/SFrame

Clustering
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

Boosting
https://turi.com/learn/userguide/supervised-learning/boosted_trees_classifier.html
https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
-----------------
[1] https://class.coursera.org/statistics
[2] http://www.openintro.org/stat/textbook.php
[3] https://class.coursera.org/exdata-006
[4] https://class.coursera.org/mmds
[5] http://www.mmds.org/
[6] http://www.cs.cmu.edu/~awm/tutorials.html

Additional materials
http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf
http://statweb.stanford.edu/~tibs/ElemStatLearn/
http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
http://www.youtube.com/watch?v=wQhVWUcXM0A

 

Last modified: 1 October 2025

Elzbieta Richter-Was

