Differences between revisions 2 and 3
Revision 2 as of 2004-04-08 15:17:33
Size: 583
Editor: yakko
Comment:
Revision 3 as of 2004-04-08 15:19:52
Size: 582
Editor: yakko
Comment:

Back to ComputerTerms, InformationRetrieval

Signature files typically use superimposed coding:

  1. Each document is divided into logical blocks containing D distinct words (StopWords are usually removed before we make the block)

  2. Each word is hashed into a binary "word signature" that is F bits long with m bits set to 1.

  3. The word signatures are OR'd together to form the block signature.

  4. The block signatures are concatenated together to form the document signature.
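
The four steps above can be sketched in Python. This is only an illustration under stated assumptions, not a reference implementation: the values of F, M and D, the tiny stop word list, the use of SHA-256 as the hash, and all function names are made up for the example. Words are split on whitespace only.

```python
import hashlib

F = 64  # signature length in bits (assumed value)
M = 3   # bits set to 1 per word signature (assumed value)
D = 4   # distinct words per logical block (assumed value)
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}  # illustrative only


def word_signature(word, f=F, m=M):
    """Step 2: hash a word to an f-bit integer with exactly m bits set."""
    sig = 0
    counter = 0
    # Rehash with a counter until m distinct bit positions have been chosen.
    while bin(sig).count("1") < m:
        digest = hashlib.sha256(f"{word}:{counter}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % f)
        counter += 1
    return sig


def block_signature(words):
    """Step 3: OR the word signatures together."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig


def block_signatures(text, d=D):
    """Step 1: drop stop words and cut the text into blocks of d distinct words."""
    sigs = []
    block = set()
    for w in text.lower().split():
        if w in STOP_WORDS:
            continue
        block.add(w)
        if len(block) == d:
            sigs.append(block_signature(block))
            block = set()
    if block:  # last, possibly shorter, block
        sigs.append(block_signature(block))
    return sigs


def document_signature(text, f=F):
    """Step 4: concatenate the block signatures as one bit string."""
    return "".join(format(s, f"0{f}b") for s in block_signatures(text))


def block_may_contain(block_sig, word):
    """A block can only contain the word if every bit of its signature is set."""
    ws = word_signature(word)
    return block_sig & ws == ws
```

A lookup would compute the query word's signature and test it against each block signature with `block_may_contain`. Because different words can set overlapping bits, a match means the word may be in the block, not that it is, so matching blocks still have to be checked against the text.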


SignatureFile (last edited 2006-02-19 20:50:24 by yakko)