| # | Date | Topics and Readings | Objectives | Quizzes & Assignments |
|---|------|---------------------|------------|-----------------------|
| 1. | 08/28/2009 | 1. Introduction [Slides \| Notes]<br>- Introduction to the course<br>- Set up your machine with the necessary tools<br>- Overview of some UNIX commands and utilities<br>- Structured data access from a MySQL database<br>Reading: UNIX Primer | - Become familiar with the basic terminology of a search system environment.<br>- Practice accessing structured data. | Quiz-1 Due on: 08/28/2009<br>Assignment-1 Due on: 09/01/2009 |
| 2. | 09/03/2009 | 2. IR with MySQL and Text Files [Slides \| Notes]<br>- Structured data access and display in a webpage<br>- Unstructured data access from (1) MySQL tables and (2) text files<br>Reading: Getting Started with MySQL<br>Reading: HTML Tutorial, HTML Forms and Input<br>Reading: PHP introduction, installation, syntax, variables | - Practice accessing and processing structured data using MySQL.<br>- Demonstrate how textual data can be accessed from MySQL as well as from flat files. | Assignment-2 Due on: TBD |
| 3. | 09/10/2009 | 3. Learning to index [Slides \| Notes]<br>- Effective indexing of text documents<br>- Basic understanding of information retrieval models<br>- Working with the Lemur Toolkit<br>Reading: The Anatomy of a Large-Scale Hypertextual Web Search Engine<br>Reading: Overview of Lemur | - Describe a general model of information retrieval.<br>- Explain the importance of indexing, stemming, and stopword removal.<br>- Demonstrate how these processes are executed in a typical search environment.<br>- Configure the Lemur Toolkit and related tools.<br>- Use Lemur to index a set of documents. | Assignment-3 Due on: TBD |
| 4. | 09/17/2009 | 4. Query processing and retrieval [Slides \| Notes]<br>- Representing a query "by hand" and then using Lemur<br>- Retrieving documents<br>Reading: Google Basic Search Guidelines | - Process a text query to match it against an indexed collection.<br>- Retrieve a set of relevant documents matching the query using the vector space model. | Assignment-4 Due on: TBD |
| 5. | 09/24/2009 | 5. Retrieval models-1 [Slides \| Notes]<br>- Vector space, Boolean, and language models<br>Reading: Boolean retrieval [PDF] by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze<br>Reading: A language modeling approach to information retrieval by Jay Ponte and W. Bruce Croft | - Demonstrate the use of various retrieval models.<br>- Describe the pros and cons of these models. | Assignment-5 Due on: TBD |
| 6. | 10/01/2009 | 6. Retrieval models-2 [Slides \| Notes]<br>- Probabilistic models<br>- Relevance models<br>Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 1 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson<br>Reading: A probabilistic model of information retrieval: development and comparative experiments. Part 2 [PDF] by K. Sparck Jones, S. Walker and S. E. Robertson<br>Reading: Relevance based language models by Victor Lavrenko and W. Bruce Croft | - Demonstrate the use of language and probabilistic models for retrieval and ranking.<br>- Use (pseudo-)relevance feedback in the retrieval process. | Assignment-6 Due on: TBD |
| 7. | 10/08/2009 | 7. Structured query processing [Slides \| Notes]<br>- Query term weighting<br>- Query term suggestions<br>Reading: Helping people find what they don't know by Nicholas J. Belkin<br>Reading: Using terminological feedback for web search refinement: a log-based study by Peter Anick<br>8. Evaluation-1 [Slides \| Notes]<br>- Recall and precision measures in IR<br>- TREC evaluation<br>Reading: Evaluation of Evaluation in Information Retrieval [PDF] by Tefko Saracevic | - Employ a method of providing relevance feedback in a retrieval setup.<br>- Demonstrate how the system can provide term suggestions for a query.<br>- Demonstrate ways to evaluate retrieval performance.<br>- Employ TREC measures to evaluate and report the retrieval effectiveness of an IR system. | Assignment-7 Due on: TBD |
| 8. | 10/15/2009 | 9. Evaluation-2 [Slides \| Notes]<br>- GMAP and bpref measures<br>- Mean reciprocal rank and other rank-based measures<br>- Comparing ranked lists | - Demonstrate ways to evaluate retrieval performance with measures other than standard recall and precision.<br>- Employ TREC measures to evaluate, compare, and report the retrieval effectiveness of IR systems. | Assignment-8 Due on: TBD |
| 9. | 10/29/2009 | 10. UI for search [Slides \| Notes]<br>- Basic UI for search services<br>- Dynamic UI for search services with AJAX<br>Reading: AJAX Tutorial | - Develop a functional and user-friendly UI for search.<br>- Add dynamic interaction components to a search UI. | -- |
| 10. | 11/05/2009 | 11. Web crawling [Slides \| Notes]<br>- Web crawling with the "wget" and "Heritrix" crawlers<br>- Building a custom crawler<br>Reading: Focused crawling: a new approach to topic-specific Web resource discovery by Soumen Chakrabarti, Martin van den Berg, and Byron Dom<br>Reading: Random Web Crawls [PDF] by Toufik Bennouas and Fabien de Montgolfier | - Collect documents from the Web using crawlers.<br>- Use a REST-based service.<br>- Demonstrate how an XML document can be parsed. | Assignment-9 Due on: TBD |
| 11. | 11/12/2009 | 12. IR on Web 2.0 [Slides \| Notes]<br>- Using REST-based services<br>- Parsing XML documents<br>Reading: XML Tutorial | - Get acquainted with some issues that are outside the scope of this course but are still related and important. | -- |
| 12. | 11/19/2009 | 13. Information organization [Slides \| Notes]<br>- Organizing information using (1) term clouds and (2) clustering | - Prepare a collection visualization interface using term clouds.<br>- Demonstrate how documents can be clustered based on their contents. | Assignment-10 Due on: TBD |
| 13. | 12/03/2009 | 14. Wrap-up [Slides] | - Get acquainted with some issues that are outside the scope of this course but are still related and important. | -- |