| Assignment-4: Query processing and retrieval |
Assigned on: 02/05/2009, Due on: 02/10/2009
|
| 1. Build an index using the Krovetz stemmer and stop-word removal, taking the first 10 countries' descriptions from the CIA World Factbook as documents. Use the following three queries to generate retrieval sets using the TFIDF method: (1) columbus island, (2) british colony, (3) independence. (6 points)
|
| 2. Using the above index, find the TFIDF value of each of the following terms in every document in which it occurs: communist, combat, independence, establish. (Hint: use dumpindex to extract statistics about a term.) (4 points) |
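
As a rough illustration of question 2, the sketch below turns per-term statistics into TFIDF values, assuming the textbook definition (raw term count times log(N / df)). The function name `tfidf` and the sample counts are hypothetical and are not taken from the actual index. The toolkit's own TFIDF retrieval method may normalize term frequency differently, so its scores need not match these.

```python
import math


def tfidf(term_count, doc_freq, num_docs):
    """Textbook TF-IDF: raw term count times log(N / df).

    term_count: occurrences of the term in one document
    doc_freq:   number of documents containing the term
    num_docs:   total number of documents in the index
    """
    if term_count == 0 or doc_freq == 0:
        return 0.0
    return term_count * math.log(num_docs / doc_freq)


# Hypothetical statistics for one term (illustrative only):
# document name -> number of times the term occurs in it.
num_docs = 10
term_counts = {"doc_a": 3, "doc_b": 1}

# The document frequency is the number of documents containing the term.
doc_freq = len(term_counts)

scores = {doc: tfidf(count, doc_freq, num_docs)
          for doc, count in term_counts.items()}

for doc, score in sorted(scores.items()):
    print(f"{doc}: {score:.4f}")
```

The per-document counts and the document frequency are the statistics to read off the index for each term. With this definition, a term that occurs in all 10 documents gets a weight of zero, because log(10 / 10) = 0.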