Revision 1 as of 2004-04-08 15:15:03
Back to ComputerTerms, InformationRetrieval
Signature files typically use superimposed coding:
- Each document is divided into logical blocks, each containing D distinct words. (Stop words are usually removed before the blocks are formed.)
- Each word yields a binary "word signature" via a hash function: the signature is F bits long, with m bits set to 1.
- The word signatures within a block are OR'd together to form the block signature.
- The block signatures are concatenated to form the document signature.
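
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the function names (`word_signature`, `block_signature`, `document_signature`, `to_bit_string`) are made up for this example, and the choice of SHA-256 with a counter to pick the m bit positions is an assumption, since the page only says "some kind of hash code".

```python
import hashlib


def word_signature(word, F, m):
    """Return an F-bit integer with exactly m bits set, derived from `word`.

    Assumption: bit positions come from SHA-256 of "word:counter";
    any hash that spreads positions evenly would serve.
    """
    if not 0 < m <= F:
        raise ValueError("need 0 < m <= F")
    sig = 0
    bits_set = 0
    counter = 0
    while bits_set < m:
        digest = hashlib.sha256(f"{word}:{counter}".encode("utf-8")).digest()
        pos = int.from_bytes(digest[:8], "big") % F
        if not (sig >> pos) & 1:  # skip positions that are already set
            sig |= 1 << pos
            bits_set += 1
        counter += 1
    return sig


def block_signature(words, F, m):
    """OR the word signatures of one block together."""
    sig = 0
    for w in words:
        sig |= word_signature(w, F, m)
    return sig


def document_signature(words, D, F, m, stop_words=frozenset()):
    """Split `words` into blocks of D distinct non-stop words and
    return the list of block signatures, in document order."""
    blocks = []
    current = []
    seen = set()
    for w in words:
        w = w.lower()
        if w in stop_words or w in seen:
            continue
        seen.add(w)
        current.append(w)
        if len(current) == D:
            blocks.append(block_signature(current, F, m))
            current, seen = [], set()
    if current:  # last block may hold fewer than D words
        blocks.append(block_signature(current, F, m))
    return blocks


def to_bit_string(blocks, F):
    """Concatenate the block signatures into one bit string."""
    return "".join(format(b, f"0{F}b") for b in blocks)


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog".split()
    blocks = document_signature(text, D=3, F=16, m=3, stop_words={"the"})
    print(to_bit_string(blocks, F=16))
```

Because a block signature is the OR of its word signatures, every bit set in a word's signature is also set in the signature of the block that contains that word.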