Module code: PIB-IRET |
|
2V+2PA (2 hours lecture + 2 hours project work, 4 hours per week) |
ECTS credits: 5 |
Semester: 5 |
Mandatory course: no |
Language of instruction:
English |
Assessment:
Written exam/Project
[updated 26.02.2018]
|
DFIW-IRET (P610-0540) Computer Science and Web Engineering, Bachelor, ASPO 01.10.2019, semester 3, mandatory course, informatics specific
KI584 (P221-0080, P610-0253) Computer Science and Communication Systems, Bachelor, ASPO 01.10.2014, semester 5, optional course, informatics specific
KIB-IRET Computer Science and Communication Systems, Bachelor, ASPO 01.10.2021, semester 5, optional course, technical
KIB-IRET Computer Science and Communication Systems, Bachelor, ASPO 01.10.2022, semester 5, optional course, technical
PIBWI29 (P221-0080) Applied Informatics, Bachelor, ASPO 01.10.2011, semester 5, optional course, informatics specific
PIB-IRET (P221-0080) Applied Informatics, Bachelor, ASPO 01.10.2022, semester 5, optional course, informatics specific
Suitable for exchange students (learning agreement)
|
60 class hours (= 45 clock hours) over a 15-week period. The total student study time is 150 hours (equivalent to 5 ECTS credits). There are therefore 105 hours available for class preparation and follow-up work and exam preparation.
|
Recommended prerequisites (modules):
None.
|
Recommended as prerequisite for:
|
Module coordinator:
Prof. Dr. Klaus Berberich |
Lecturer: Prof. Dr. Klaus Berberich
[updated 10.11.2016]
|
Learning outcomes:
After successfully completing this course, students will have learned basic information retrieval methods. This includes retrieval models (e.g., Vector Space Model), link analysis (e.g., PageRank), and effectiveness measures (e.g., Precision/Recall and MAP). They will be able to apply/implement the above methods in practice. In addition, students will be aware of easily accessible information retrieval systems (e.g., Apache Lucene/Solr).
[updated 26.02.2018]
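As an illustration of the effectiveness measures named in the learning outcomes, the following is a minimal Python sketch of precision, recall, and average precision (the per-query value that MAP averages over a set of queries). The function names and the toy document IDs are assumptions made for this example and are not taken from the course material.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_recall(ranking: Sequence[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of a result list against the relevant documents."""
    hits = sum(1 for doc in ranking if doc in relevant)
    precision = hits / len(ranking) if ranking else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a relevant document appears.

    Relevant documents that are never retrieved contribute zero.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over several (ranking, relevant set) pairs."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy query: four results returned, three documents are relevant.
ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
precision, recall = precision_recall(ranking, relevant)
ap = average_precision(ranking, relevant)
```

In this toy case two of the four results are relevant, so precision is 2/4 and recall is 2/3; average precision is (1/1 + 2/3) / 3, since the relevant document that is never retrieved counts as zero.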
|
Module content:
Information Retrieval is pervasive and its applications range from finding contacts or e-mails on your smartphone to web-search engines that index billions of web pages. This course covers the most important information retrieval methods. We will look into how these methods are defined formally, including the mathematics behind them, but also see how they can be implemented efficiently in practice. As part of the project work, we will implement a small search engine from scratch.
1. Introduction
- History
- Applications
- Course overview
2. Natural language
- Documents and terms
- Stopwords and stemming/lemmatization
- Synonyms, polysemes, compounds
3. Retrieval models
- Boolean retrieval
- Vector space model with TF.IDF term weighting
- Language models
4. Indexing methods
- Inverted index
- Compression (d-Gaps, variable-byte encoding)
- Index pruning
5. Query processing
- Holistic methods (DAAT, TAAT)
- Top-k methods (NRA, WAND)
6. Evaluation
- Cranfield Paradigm
- Benchmark initiatives (TREC, CLEF, NTCIR)
- Traditional effectiveness measures (precision, recall, MAP)
- Non-traditional effectiveness measures (nDCG, ERR)
7. Web retrieval
- Crawling
- Near-duplicate detection
- Link analysis (PageRank, HITS)
- Web spam
8. Information retrieval systems
- Indri
- Apache Lucene/Solr
- ElasticSearch
[updated 26.02.2018]
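To give a flavor of the index compression topics in the outline, the following minimal Python sketch combines d-gaps with variable-byte encoding for a single postings list. It assumes the byte convention described in the Manning et al. textbook listed in the reading (7 payload bits per byte, high bit set on the last byte of each number); the convention used in the lectures may differ, and all function names are illustrative.

```python
from typing import List


def to_gaps(postings: List[int]) -> List[int]:
    """Turn a sorted list of document IDs into d-gaps (differences to the previous ID)."""
    gaps = []
    previous = 0
    for doc_id in postings:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps by taking running sums."""
    postings = []
    total = 0
    for gap in gaps:
        total += gap
        postings.append(total)
    return postings


def vb_encode_number(n: int) -> bytes:
    """Encode one non-negative integer in 7-bit chunks, most significant chunk first."""
    chunks = [n % 128]
    n //= 128
    while n > 0:
        chunks.insert(0, n % 128)
        n //= 128
    chunks[-1] += 128  # assumed convention: high bit marks the final byte
    return bytes(chunks)


def vb_encode(numbers: List[int]) -> bytes:
    """Concatenate the variable-byte codes of all numbers."""
    return b"".join(vb_encode_number(n) for n in numbers)


def vb_decode(data: bytes) -> List[int]:
    """Decode a variable-byte stream back into integers."""
    numbers = []
    n = 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers


# Toy postings list; the document IDs are chosen only for illustration.
postings = [824, 829, 215406]
encoded = vb_encode(to_gaps(postings))
decoded = from_gaps(vb_decode(encoded))
```

Because gaps between neighboring document IDs tend to be much smaller than the IDs themselves, most gaps fit into one or two bytes; here the three postings occupy six bytes in total.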
|
Recommended or required reading:
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008. (available online at: http://nlp.stanford.edu/IR-book/)
Reginald Ferber: Information Retrieval: Suchmodelle und Data-Mining Verfahren für Textsammlungen und das Web, dpunkt, 2003. (available online at: http://information-retrieval.de/irb/ir.html)
Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack: Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
[updated 26.02.2018]
|
Module offered in:
WS 2022/23,
WS 2021/22,
WS 2020/21,
WS 2019/20
|