Machine Learning
PIBWI19
pi
2
V
2
U
5
6
no
English
Written exam
KI575
Computer Science and Communication Systems
6
optional course
KIB-MLRN
Computer Science and Communication Systems
6
optional course
PIBWI19
Applied Informatics
6
optional course
PIB-MLRN
Applied Informatics
6
optional course
60 class hours (= 45 clock hours) over a 15-week period. The total student study time is 150 hours (equivalent to 5 ECTS credits). There are therefore 105 hours available for class preparation, follow-up work, and exam preparation.
PIB115
Fundamentals of Informatics
PIB120
Programming 1
PIB125
Mathematics 1
PIB215
Mathematics 2
PIB315
Mathematics 3
PIB330
Databases
Prof. Dr. Klaus Berberich
kbe
Prof. Dr. Klaus Berberich
kbe
After successfully completing this module, students will know about fundamental supervised and unsupervised methods from machine learning. This includes methods for regression, classification, and clustering. Students will understand how these methods work and know how to use existing implementations (e.g., in libraries such as scikit-learn). Given a practical problem setting, they will be able to choose a suitable method, apply it to the dataset at hand, and assess the quality of the determined model. In addition, students will be aware of typical data-quality issues and know how to resolve them.
Machine learning plays an increasingly important role, with applications ranging from recognizing handwritten digits and filtering out unwanted spam e-mails to ranking results in modern search engines. After successfully completing this module, students will know about fundamental supervised and unsupervised methods of machine learning. We will look into how these methods are defined formally, including the mathematics behind them. Moreover, we will apply all methods to concrete datasets to solve practical problems. To do so, we will rely on existing libraries (e.g., scikit-learn) that provide efficient implementations of the methods. This course will be accompanied by theoretical exercises and project assignments. The exercises will help students deepen their understanding of the methods, while the project assignments will encourage students to solve practical problems by applying their knowledge to real-world datasets.
1. Introduction
- What is Machine Learning?
- Applications
- Libraries
- Literature
2. Working with data
- Typical data formats (e.g., CSV, spreadsheets, databases)
- Data quality issues (e.g., outliers, duplicates)
- Scales of measurement (i.e., nominal, ordinal, numerical)
- Data pre-processing (in Python and using UNIX command line tools)
3. Regression
- Ordinary least squares
- Multiple linear regression
- Non-linear regression
- Evaluation
4. Classification
- Logistic regression
- k-nearest neighbors
- Naive Bayes
- Decision trees
- Neural networks
- Evaluation
5. Clustering
- k-means and k-medoids
- Hierarchical agglomerative/divisive clustering
- Evaluation
6. Outlook
- Ongoing research
- Competitions (e.g., Kaggle and KDD Cup)
- Other resources (e.g., KDnuggets)
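As a rough illustration of the fit-then-evaluate workflow listed under "3. Regression", the following minimal sketch fits a simple linear model by ordinary least squares and scores it with the coefficient of determination (R²). It is not course material: the function names `fit_ols` and `r_squared` and the toy data are invented for this example, and it uses only the Python standard library so that it is self-contained, whereas the course itself relies on existing libraries such as scikit-learn.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration; handles a single feature only.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x and the x/y cross-deviation sum.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data, invented for this example (roughly y = 2x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_ols(xs, ys)
score = r_squared(xs, ys, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={score:.4f}")
```

Scoring the model on the same data it was fitted to, as done here for brevity, tends to overstate its quality; a held-out test set or cross-validation gives a more honest estimate.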
P. Harrington: Machine Learning in Action, Manning, 2012
G. James, D. Witten, T. Hastie, R. Tibshirani: An Introduction to Statistical Learning - with Applications in R, Springer, 2015
A. C. Müller and S. Guido: Introduction to Machine Learning with Python, O'Reilly, 2017
M. J. Zaki and W. Meira Jr.: Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, 2014
SS 2023
SS 2022
SS 2021
SS 2020
SS 2019
SS 2018
SS 2017
Mon May 29 19:45:08 CEST 2023