Information Storage and Retrieval: February 2011

Friday, February 25, 2011

Unit 7: Relevance Feedback and Query Expansion

Relevance feedback (RF) is a method in which users of an IR system refine their queries by marking returned results as relevant or irrelevant. The system then uses this information to reformulate the query and return more accurate results. RF helps when the same concept can be referred to using different words, and it helps the system distinguish between different meanings of a word.
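The reformulation step can be sketched with Rocchio-style reweighting, which is one common way to do it and not necessarily the method covered in the unit. In this minimal sketch the function name `rocchio`, the dict-based term vectors, and the alpha/beta/gamma values are all assumptions chosen for illustration.

```python
def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query vector (term -> weight).

    query, and each document in relevant / irrelevant, is a dict
    mapping a term to its weight. The default alpha/beta/gamma
    values are illustrative assumptions, not tuned settings.
    """
    terms = set(query)
    for doc in relevant + irrelevant:
        terms.update(doc)

    new_query = {}
    for t in terms:
        # Average weight of the term in the marked-relevant documents.
        pos = sum(d.get(t, 0.0) for d in relevant) / len(relevant) if relevant else 0.0
        # Average weight of the term in the marked-irrelevant documents.
        neg = sum(d.get(t, 0.0) for d in irrelevant) / len(irrelevant) if irrelevant else 0.0
        w = alpha * query.get(t, 0.0) + beta * pos - gamma * neg
        if w > 0:  # drop terms whose weight is zero or negative
            new_query[t] = w
    return new_query


# Hypothetical example: the user searched for "jaguar" and meant the animal.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "cat": 1.0, "jungle": 1.0}]
irrelevant = [{"jaguar": 1.0, "car": 1.0}]
expanded = rocchio(query, relevant, irrelevant)
```

In this toy case the reformulated query picks up "cat" and "jungle" from the documents marked relevant and leaves out "car", which is how feedback can steer the system toward the intended word meaning.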

RF is ineffective when misspellings occur, when the collection is cross-language, or when the collection vocabulary and user vocabulary are mismatched. RF also puts more demand on the user.

Muddiest Point: 2/21/11

When evaluating precision, what are the factors in determining whether a document is relevant or irrelevant to a specific query?

Thursday, February 17, 2011

Unit 6: Evaluation

A well-performing IR system strikes a good balance between precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved).

These two qualities are the most important when evaluating the performance of an IR system.
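Both measures can be computed directly from the set of retrieved documents and the set of relevant documents. The sketch below assumes relevance judgments are already available as a list of document IDs; the function name `precision_recall` and the IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: 4 documents retrieved, 3 documents judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

Here two of the four retrieved documents are relevant, so precision is 0.5, and two of the three relevant documents were retrieved, so recall is 2/3.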

Muddiest Point: 2/14/11

Since languages have a defined syntax, it would seem logical that a language model that can interpret grammar would have higher accuracy than the unigram model. Why does the unigram model perform better than, or just as accurately as, n-gram or grammar-based models?

Thursday, February 10, 2011

Muddiest Point: 2/7/11

Could an IR system employ a combination of both the Boolean search model and a best-match model?

Friday, February 4, 2011

Unit 4: Matching Models, Ranked Boolean and Vector Space

Boolean searches are powerful tools in information retrieval that can provide improved results for the end user.

The task for the creator of the IR system is to have the system perform the Boolean search as efficiently as possible. Without ranking, it is simple enough to implement Boolean search with a basic algorithm, but accuracy may be sacrificed. Users of the AND operator will get focused results, but users of the OR operator may get many irrelevant results.
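A basic unranked Boolean search can be sketched with an inverted index that maps each term to the set of documents containing it. This is a toy illustration rather than an efficient implementation; the helper names and the three sample documents are assumptions.

```python
def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def boolean_and(index, a, b):
    """Documents containing both terms (set intersection)."""
    return index.get(a, set()) & index.get(b, set())


def boolean_or(index, a, b):
    """Documents containing either term (set union)."""
    return index.get(a, set()) | index.get(b, set())


# Hypothetical three-document collection.
docs = {
    1: "information retrieval systems",
    2: "database systems",
    3: "information theory",
}
index = build_index(docs)
and_hits = boolean_and(index, "information", "systems")
or_hits = boolean_or(index, "information", "systems")
```

The AND query matches only document 1, while the OR query matches all three documents with no indication of which is the best match.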

This is where ranking helps accuracy. A document that uses a term frequently is often no more important than a document that uses the term a single time. This problem can be addressed by using the vector space model and giving a weight to each term.
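One way to weight terms in the vector space model is tf-idf with cosine similarity. The sketch below assumes log-scaled term frequency and a plain log(N/df) inverse document frequency, which is only one of several weighting schemes; the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def weight(tokens, df, n):
    """tf-idf vector: (1 + log tf) * log(N / df) for terms known to the collection."""
    tf = Counter(tokens)
    return {t: (1 + math.log(c)) * math.log(n / df[t]) for t, c in tf.items() if t in df}


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return (document indexes sorted best match first, score per document)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents each term appears in.
    df = Counter(t for toks in tokenized for t in set(toks))
    q_vec = weight(query.lower().split(), df, n)
    scores = [cosine(q_vec, weight(toks, df, n)) for toks in tokenized]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical collection: document 1 repeats "gamma" three times.
docs = ["gamma codes compress postings", "gamma gamma gamma rays", "boolean retrieval"]
order, scores = rank("gamma codes", docs)
```

In this example the document that repeats "gamma" three times still ranks below the document that mentions both query terms once, because log scaling dampens raw frequency and the rarer term "codes" carries more weight.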

Thursday, February 3, 2011

Muddiest Point: 1/31/11

How do gamma codes help with compression in a situation where the largest encoded value is not known ahead of time?