A StopList is a list of words that, for reasons of volume or of ["Precision"] and ["Recall"], are not included in the index and hence are not searchable, e.g. "and", "or", "not".

There are two ways to filter stoplist words from an input token stream:

  1. Examine the lexical analyzer output and remove any stopwords.
  2. Remove stopwords as part of the lexical analysis itself. This is one of the more efficient ways to implement a StopList.
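
The two approaches above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the stoplist contents come from the examples on this page, and the names `STOPLIST`, `TOKEN_RE`, `tokenize`, `filter_after` and `tokenize_without_stopwords` are hypothetical, as is the simplistic rule that a token is any run of lowercase letters or digits.

```python
import re

# Hypothetical stoplist, taken from the examples on this page.
STOPLIST = frozenset({"and", "or", "not"})

# Assumed token rule: a run of letters or digits (input is lowercased first).
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Plain lexical analyzer: yield every token, stopwords included."""
    for match in TOKEN_RE.finditer(text.lower()):
        yield match.group()


def filter_after(tokens, stoplist=STOPLIST):
    """Approach 1: examine the analyzer output and remove stopwords."""
    return [token for token in tokens if token not in stoplist]


def tokenize_without_stopwords(text, stoplist=STOPLIST):
    """Approach 2: drop stopwords inside the lexical analyzer itself."""
    for match in TOKEN_RE.finditer(text.lower()):
        token = match.group()
        if token not in stoplist:
            yield token


if __name__ == "__main__":
    sample = "Cats and dogs, not birds or fish"
    print(filter_after(tokenize(sample)))
    print(list(tokenize_without_stopwords(sample)))
```

Both functions produce the same tokens for the same input. The second one never hands a stopword to the next stage, so it needs no separate filtering pass over the token stream.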

Back to ComputerTerms, InformationRetrieval

StopWords (last edited 2004-04-08 16:24:35 by yakko)