htw saar


Information Retrieval

Module name (EN): Information Retrieval
Degree programme: Computer Science and Communication Systems, Bachelor, ASPO 01.10.2017
Module code: KIB-IRET
Hours per semester week / Teaching method: 2V+2PA (4 hours per week: 2 lecture, 2 project work)
ECTS credits: 5
Semester: 5
Mandatory course: no
Language of instruction:
German
Assessment:
Written exam/Project
Curricular relevance:
KI584 Computer Science and Communication Systems, Bachelor, ASPO 01.10.2014, semester 5, optional course, informatics specific
KIB-IRET Computer Science and Communication Systems, Bachelor, ASPO 01.10.2017, semester 5, optional course, technical
PIBWI29 Applied Informatics, Bachelor, ASPO 01.10.2011, semester 5, optional course, informatics specific
PIB-IRET Applied Informatics, Bachelor, ASPO 01.10.2017, semester 5, optional course, informatics specific

Suitable for exchange students (learning agreement)
Workload:
60 class hours (= 45 clock hours) over a 15-week period.
The total student study time is 150 hours (equivalent to 5 ECTS credits).
There are therefore 105 hours available for class preparation and follow-up work and exam preparation.
Recommended prerequisites (modules):
None.
Recommended as prerequisite for:
Module coordinator:
Prof. Dr. Klaus Berberich
Lecturer: Prof. Dr. Klaus Berberich

[updated 10.11.2016]
Learning outcomes:
After successfully completing this course, students will have learned basic
information retrieval methods. These include retrieval models (e.g., Vector
Space Model), link analysis (e.g., PageRank), and effectiveness measures
(e.g., Precision/Recall and MAP). They will be able to apply and implement
these methods in practice. In addition, students will be aware of easily
accessible information retrieval systems (e.g., Apache Lucene/Solr).


[updated 26.02.2018]
Module content:
Information Retrieval is pervasive and its applications range from
finding contacts or e-mails on your smartphone to web-search engines
that index billions of web pages. This course covers the most
important information retrieval methods. We will look into how
these methods are defined formally, including the mathematics behind
them, but also see how they can be implemented efficiently in
practice. As part of the project work, we will implement a small
search engine from scratch.
 
1. Introduction
- History
- Applications
- Course overview
 
2. Natural language
- Documents and terms
- Stopwords and stemming/lemmatization
- Synonyms, polysemes, compounds
 
3. Retrieval models
- Boolean retrieval
- Vector space model with TF.IDF term weighting
- Language models
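
As an illustration of the vector space model with TF.IDF term weighting, the following is a minimal sketch, not the course's reference implementation. It assumes raw term frequency multiplied by log(N/df) as the weight and cosine similarity for ranking; other weighting variants are common. The function names (build_tfidf, cosine, search) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """docs: list of token lists. Returns (document vectors, idf table)."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query_tokens, vectors, idf):
    """Return (score, document index) pairs, best match first."""
    # Query terms unknown to the collection are ignored.
    query = {
        term: tf * idf[term]
        for term, tf in Counter(query_tokens).items()
        if term in idf
    }
    scores = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


docs = [
    ["apache", "lucene", "search"],
    ["web", "search", "engine"],
    ["page", "rank", "web"],
]
vectors, idf = build_tfidf(docs)
ranking = search(["lucene"], vectors, idf)
```

Here only the first document contains the query term, so it is the only one with a non-zero score.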
 
4. Indexing methods
- Inverted index
- Compression (d-Gaps, variable-byte encoding)
- Index pruning
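
The two compression techniques named above can be sketched as follows. This is an illustrative sketch under stated assumptions: posting lists are sorted lists of positive document IDs, and the variable-byte convention used here sets the high bit on the last byte of each number (the opposite convention also exists). All function names are hypothetical.

```python
def to_dgaps(doc_ids):
    """Turn a sorted list of document IDs into gaps between neighbours."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_dgaps(gaps):
    """Inverse of to_dgaps: a running sum restores the document IDs."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """Encode non-negative integers using 7 payload bits per byte."""
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()        # most significant 7-bit group first
        chunk[-1] |= 0x80      # high bit marks the final byte of a number
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, current = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((current << 7) | (byte & 0x7F))
            current = 0
        else:
            current = (current << 7) | byte
    return numbers


postings = [3, 7, 130, 131, 20000]
gaps = to_dgaps(postings)          # [3, 4, 123, 1, 19869]
encoded = vbyte_encode(gaps)       # 7 bytes in total
restored = from_dgaps(vbyte_decode(encoded))
```

Small gaps fit into a single byte each, which is why d-gaps are usually computed before variable-byte encoding is applied.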
 
5. Query processing
- Holistic methods (DAAT, TAAT)
- Top-k methods (NRA, WAND)
 
6. Evaluation
- Cranfield Paradigm
- Benchmark initiatives (TREC, CLEF, NTCIR)
- Traditional effectiveness measures (precision, recall, MAP)
- Non-traditional effectiveness measures (nDCG, ERR)
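
The traditional effectiveness measures listed above can be sketched in a few lines. This sketch assumes binary relevance judgments and a ranked result list per query; the function names and the sample data are hypothetical.

```python
def precision_recall(ranking, relevant):
    """Set-based precision and recall of a result list."""
    hits = sum(1 for doc in ranking if doc in relevant)
    precision = hits / len(ranking) if ranking else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank holding a relevant result.

    Relevant documents that are never retrieved contribute zero.
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
precision, recall = precision_recall(ranking, relevant)   # 0.5 and 2/3
ap = average_precision(ranking, relevant)                 # (1/1 + 2/3) / 3
```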
 
7. Web retrieval
- Crawling
- Near-duplicate detection
- Link analysis (PageRank, HITS)
- Web spam
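
As an illustration of link analysis, PageRank can be approximated by power iteration. This is a minimal sketch that assumes a damping factor of 0.85, a fixed number of iterations, and uniform redistribution of the rank held by pages without outgoing links; the function name pagerank and the sample graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Every page passes its rank on in equal shares to its link targets.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
```

In this small graph, page c is linked from both a and b and therefore ends up with a higher score than b, which only receives half of a's rank.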
 
8. Information retrieval systems
- Indri
- Apache Lucene/Solr
- Elasticsearch
 


[updated 26.02.2018]
Recommended or required reading:
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008.
(available online at: http://nlp.stanford.edu/IR-book/)
 
Reginald Ferber: Information Retrieval: Suchmodelle und Data-Mining Verfahren für Textsammlungen und das Web, dpunkt, 2003.
(available online at: http://information-retrieval.de/irb/ir.html)
 
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack: Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.


[updated 26.02.2018]
Module offered in:
WS 2020/21 (probably), WS 2019/20
[Sun Jul  5 16:16:24 CEST 2020, CKEY=kir, BKEY=ki2, CID=KIB-IRET, LANGUAGE=en, DATE=05.07.2020]