Differences between revisions 1 and 5 (spanning 4 versions)
Revision 1 as of 2004-04-08 01:34:07
Size: 256
Editor: yakko
Comment:
Revision 5 as of 2004-04-08 16:10:57
Size: 525
Editor: yakko
Comment:

A list of words that, for reasons of index volume or of ["Precision"] and ["Recall"], are excluded from the index and hence are not searchable. Examples are "and", "or" and "not".
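A minimal Python sketch of what "excluded from the index" means in practice. It assumes a hypothetical `STOPLIST` containing only the three example words and a simple regex tokenizer. The function names `build_index` and `search` are illustrative and do not come from any particular IR library.

```python
import re
from collections import defaultdict

# Hypothetical stoplist: just the three examples from this page.
STOPLIST = frozenset({"and", "or", "not"})


def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            if term not in STOPLIST:  # stopwords are never added to the index
                index[term].add(doc_id)
    return index


def search(index, term):
    """Return the ids of documents containing term (always empty for a stopword)."""
    return set(index.get(term.lower(), ()))
```

With `docs = {1: "Precision and recall", 2: "recall or not"}`, `search(build_index(docs), "recall")` gives `{1, 2}`, while searching for `"and"` gives an empty set even though document 1 contains that word.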

There are two ways to filter stoplist words from an input token stream:

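  1. Examine the lexical analyzer's output and remove any stopwords.
  2. Remove stopwords as part of the lexical analysis itself. This is the more efficient of the two ways to implement a StopList, since each word is checked as it is recognized rather than in a second pass over the token stream.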
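The two approaches can be sketched in Python, assuming a toy regex-based lexical analyzer and a hypothetical three-word `STOPLIST`. The sketch only shows *where* the stoplist check happens. It does not reproduce the performance gain of a real lexer, where the stoplist can be folded into the scanner itself.

```python
import re

# Hypothetical stoplist and word pattern, chosen for illustration only.
STOPLIST = frozenset({"and", "or", "not"})
WORD = re.compile(r"[A-Za-z0-9]+")


def tokenize(text):
    """Toy lexical analyzer: yields every word, lowercased."""
    for match in WORD.finditer(text):
        yield match.group().lower()


# Approach 1: run the lexical analyzer, then filter its output as a separate step.
def filter_after(text):
    return [tok for tok in tokenize(text) if tok not in STOPLIST]


# Approach 2: check the stoplist inside the lexical analyzer,
# so stopwords are never emitted as tokens at all.
def tokenize_without_stopwords(text):
    for match in WORD.finditer(text):
        word = match.group().lower()
        if word not in STOPLIST:
            yield word
```

Both produce `["precision", "recall", "volume"]` for the input `"Precision and recall, not volume"`. The difference is that approach 2 never hands a stopword to the rest of the pipeline.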

Back to ComputerTerms, InformationRetrieval

StopWords (last edited 2004-04-08 16:24:35 by yakko)