Students know the basic methods of Information Retrieval. These include retrieval models (e.g., Vector Space Model and Binary Independence Model), link analysis (e.g., PageRank), and effectiveness measures (e.g., Precision/Recall and MAP). They can apply and implement these methods in practice. In addition, students are familiar with readily available information retrieval systems (e.g., Apache Lucene/Solr).
Information Retrieval is pervasive: its applications range from finding contacts or e-mails on a smartphone to web search engines that index billions of web pages. This course covers the most important methods from Information Retrieval. We will look at how these methods are defined formally, including the mathematics behind them, and at how they can be implemented efficiently in practice. As part of the project work, we will implement a small search engine from scratch.

1. Introduction
   - History
   - Applications
   - Overview of the Course
2. Natural Language
   - Documents and Terms
   - Stopwords and Stemming/Lemmatization
   - Synonyms, Polysemes, Compounds
3. Retrieval Models
   - Boolean Retrieval
   - Vector Space Model with TF.IDF Term Weighting
   - Language Models
4. Indexing Methods
   - Inverted Index
   - Compression (d-Gaps, Variable-Byte Encoding)
   - Index Pruning
5. Query Processing
   - Holistic Methods (DAAT, TAAT)
   - Top-k Methods (NRA, WAND)
6. Evaluation
   - Cranfield Paradigm
   - Benchmark Initiatives (TREC, CLEF, NTCIR)
   - Traditional Effectiveness Measures (Precision, Recall, MAP)
   - Non-Traditional Effectiveness Measures (nDCG, ERR)
7. Web Retrieval
   - Crawling
   - Near-Duplicate Detection
   - Link Analysis (PageRank, HITS)
   - Web Spam
8. Information Retrieval Systems
   - Indri
   - Terrier
   - Anserini
   - Apache Lucene/Solr
   - Elasticsearch
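To give a flavor of the index compression topic (d-gaps and variable-byte encoding), here is a minimal sketch in Python. It is an illustration under stated assumptions, not the course's reference implementation: the function names are hypothetical, the document IDs are made-up example values, and the byte layout (7 payload bits per byte, high bit set on the last byte of each number) is one common convention among several.

```python
from typing import Iterable, List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Turn a sorted postings list into d-gaps (differences between neighbors)."""
    gaps, previous = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps: Iterable[int]) -> List[int]:
    """Invert to_gaps by accumulating the gaps into absolute document IDs."""
    doc_ids, current = [], 0
    for gap in gaps:
        current += gap
        doc_ids.append(current)
    return doc_ids


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Variable-byte encode non-negative integers.

    Assumed convention: 7 payload bits per byte, most significant group
    first, and the high bit marks the final byte of each number.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte encoding needs non-negative integers")
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()      # most significant 7-bit group first
        chunk[-1] |= 0x80    # flag the terminating byte
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:      # terminating byte: emit the number
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:                # continuation byte: keep accumulating
            n = (n << 7) | byte
    return numbers


postings = [824, 829, 215406]            # illustrative document IDs
encoded = vbyte_encode(to_gaps(postings))
decoded = from_gaps(vbyte_decode(encoded))
```

Small gaps fit into a single byte, which is why postings lists are stored as d-gaps before encoding: the three IDs above take six bytes in this sketch, compared with twelve as fixed 32-bit integers.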
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack: Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
Reginald Ferber: Information Retrieval: Suchmodelle und Data-Mining Verfahren für Textsammlungen und das Web, dpunkt, 2003. (available online at: http://information-retrieval.de/irb/ir.html)
W. Bruce Croft, Donald Metzler, Trevor Strohman: Search Engines: Information Retrieval in Practice, Pearson, 2009. (available online at: https://ciir.cs.umass.edu/irbook/)
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008. (available online at: http://nlp.stanford.edu/IR-book/)
