INLS 490-154W: Information Retrieval Systems Design and Implementation

Fall 2009. Web-based Course


Project Guidelines
As part of this course, you are required to complete a project demonstrating several of the concepts covered in class. Before starting the project itself, you will submit a proposal (details below); once the instructor approves it, you can begin work. The project is due on TBD. Early submissions are encouraged. The submission should take the form of (1) source files, (2) a working online site, and (3) a report. The project will be tested with the Firefox browser. Remember, the project counts for 25% of the grade for this class.

Project proposal (Due: TBD)
You need to submit a brief article (2-4 pages) proposing the project that you want to do. The article should include:
  • Problem description
  • Your approach/design
  • What is unique about your approach? In other words, if you're trying to sell this, why would anyone bother investing in this instead of using an off-the-shelf product?
  • Mockup and/or a brief description of what the outcome will be.
Final project (Due: TBD)
You may pick your own application/domain. You may even build on an existing project, but you must specify how much of it is already done. Your final project MUST have the following components.
  • A text collection (unstructured data) of a "reasonable" size (preferably collected by a crawl).
  • A user interface with proper navigational tools so that even a naive user can utilize it with ease.
  • A search facility that allows one to perform full-text search in the collection.
  • A way to visualize the information (rank list, clusters, tag-cloud, etc.).
  • A mechanism to log all the user interactions.
  • At least one feature that is not commonly found in most "traditional" IR systems.
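As an illustration of the logging requirement above, here is a minimal sketch that appends each user interaction to a file as one JSON record per line. It is written in Python only to keep the example short; the same idea carries over to PHP. The function names, the field names, and the log file location are all hypothetical, and a real project may log different details.

```python
import json
import time


def log_interaction(log_path, session_id, action, detail):
    """Append one user interaction to a JSON-lines log file (hypothetical format)."""
    record = {
        "timestamp": time.time(),  # seconds since the epoch
        "session": session_id,     # hypothetical session identifier
        "action": action,          # e.g. "query", "click", "save"
        "detail": detail,          # free-form data about the action
    }
    # Append mode keeps earlier entries; one JSON object per line.
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record


def read_log(log_path):
    """Read the log back as a list of dictionaries, oldest entry first."""
    with open(log_path, encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]
```

Calling log_interaction once per query, click, or saved result would give a chronological record that read_log can load later for analysis or evaluation.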
For text processing (mostly search-related), you are required to use Lemur. For structured databases, MySQL is recommended. For building UI, PHP is recommended.
You may optionally have
  • Advanced search.
  • Dynamic UI to enhance user experience.
  • Interactive session support.
  • Meshing with a live Web application.
  • Clever use of CSS and JavaScript for validation and site configuration.
  • Evaluation of processes/results with appropriate measures.
Finally, this should be the kind of work that you feel comfortable (and proud) demonstrating and listing in your portfolio. At the least, your project will be showcased (at your discretion) on the course website.
Your final submission should include all the source files (including the parameter files), a link to the working online site, and a brief article documenting the project. The document should include:
  • Introduction - what this project is about and what purpose it serves. (1/2 to 1 page)
  • Design details (may include figures). Explain your decisions behind certain design choices. (1-2 pages)
  • Usage scenario (may include screenshots). (1-3 pages)
  • Known issues and future work. (1-2 pages)
  • License.
Grade rubric
  • Proposal (15 points)
  • Source and parameter files (10 points)
  • Working site
    • Proper use of tools (10 points)
    • Ease of use of the interface (10 points)
    • Functionality (does it do what the proposal says it does?) (10 points)
    • Features (10 points)
    • Explanation of processes and results (10 points)
  • Final report (25 points)
Project ideas
  • An interactive IR system that suggests related terms to the user based on the initial query. Based on feedback from the user (the terms that the user selects), the original query can be modified and run again.
  • A search system that supports weighted queries, where the user can specify the relative importance of each query term.
  • An interactive IR system that shows related queries to the user based on the pre-stored or learned queries from the past.
  • A semi-structured information retrieval system, which crawls webpages from a narrow domain, identifies some of the facets (title of the page, author, etc.) and lets one search in those fields or the whole webpage.
  • A search system for a focused crawl. For instance, crawl the blogosphere, collect blog postings on a specific topic, and present them to the user with a search interface.
  • An IR system that crawls webpages from a narrow domain, creates and presents clusters of them, along with a search interface.
  • A faceted browsing and searching system that allows one to browse through information content (text documents) based on various facets (category, author, date, etc.) as well as do full-text search.
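To make the weighted-query idea above concrete, here is a minimal sketch in Python. It is not how Lemur scores documents; it simply multiplies each query term's user-supplied weight by the number of times the term occurs in a document and sums the results. The function names, the whitespace tokenizer, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def weighted_score(query_weights, document_text):
    """Sum of (user-assigned weight x term frequency) over the query terms."""
    counts = Counter(tokenize(document_text))
    return sum(weight * counts[term] for term, weight in query_weights.items())


def rank(query_weights, documents):
    """Return (doc_id, score) pairs with a positive score, best first."""
    scored = [
        (weighted_score(query_weights, text), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties are broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored if score > 0]
```

For example, with the weights {"knitting": 2.0, "blog": 0.5}, a document that mentions "knitting" twice and "blog" once scores 4.5 and ranks above one that mentions each term once (2.5). A real system would also account for document length and term rarity.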

Student Projects - Spring 2009
A Domain-Specific Search System with Term Clouds by Day Alaba
This project provides tools for exploring a collection of news stories. The site offers a keyword search box that returns a ranked list of results, and a term cloud that reveals important keywords to a novice user. It also displays feedback terms to help a user change the focus of a query.
Health Info Rover by Annie Chen
Health Info Rover is designed to help users find information on fibromyalgia, Chronic Fatigue Syndrome (CFS), and other related health conditions. Aside from searching by keyword, users can limit searches to various types of sites (government and professional organizations, consumer health information services, fibromyalgia- and CFS-specific organizations, and community/support groups) or search in the subject area indices: Nutrition, and Complementary and Alternative Medicine (CAM).
Craft Blog Search by Lee Harrison
This project allows users to search craft-related blogs using a very basic search interface. In addition, once the search results are retrieved, the user can decide to save various results to a user-specific list for later review. The user can review the list of saved links and continue with a few different actions: go to one of the websites, remove a link, or return to the search results.
RSS Feed Search Engine by Lina Huang
The working product of this project is a sample RSS search engine, with nearly 30,000 RSS feed documents crawled mainly from WordPress.com and LiveJournal.com. Within the working site, users can run full-text searches to retrieve RSS feeds that interest them, and can view or subscribe to individual feeds.
Course Search Engine (CSE) by Jimmy Nguyen
This application helps a student create a customized search tool specifically for their courses. It allows a student to quickly find, for example, an exam study guide without having to sort through other documents.

Selected Student Projects - Fall 2008
Video Game Review Database Search by Ben Pennell
Lemur as a Web Application by Philip Fulcher
Relevance Feedback with Query Term Suggestions by Kyle Richardson

| Chirag Shah | Last update: August 23, 2009 |