Back to ComputerTerms, InformationRetrieval
Signature files typically use superimposed coding:
1. Each document is divided into logical blocks, each containing D distinct words (StopWords are usually removed before the blocks are formed).
2. Each word is hashed to a binary "word signature" that is F bits long with m bits set to 1.
3. The word signatures in a block are OR'd together to form the block signature.
4. The block signatures are concatenated to form the document signature.
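
The steps above can be sketched in Python. This is only an illustration under stated assumptions: the page does not name a hash function, so SHA-256 stands in for "some kind of hash code"; the values of `F`, `m`, and `D`, the tiny `STOP_WORDS` set, and all function names are hypothetical. The document signature is kept as a list of block signatures rather than one concatenated bit string.

```python
import hashlib

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "is", "are", "to", "and"}


def word_signature(word, F=64, m=3):
    """Return an F-bit integer with exactly m bits set (requires m <= F)."""
    sig, i = 0, 0
    while bin(sig).count("1") < m:
        # Assumption: SHA-256 with a counter prefix picks the bit positions.
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % F)
        i += 1
    return sig


def block_signature(words, F=64, m=3):
    """OR the word signatures of one block together."""
    sig = 0
    for word in words:
        sig |= word_signature(word, F, m)
    return sig


def document_signature(text, D=4, F=64, m=3):
    """Split text into blocks of D distinct non-stop words; return one signature per block."""
    blocks, current = [], set()
    for word in text.lower().split():  # naive whitespace tokenization
        if word in STOP_WORDS:
            continue
        current.add(word)
        if len(current) == D:
            blocks.append(block_signature(current, F, m))
            current = set()
    if current:  # last, possibly shorter block
        blocks.append(block_signature(current, F, m))
    return blocks


def may_contain(doc_sig, word, F=64, m=3):
    """True if some block has every bit of the word's signature set."""
    ws = word_signature(word, F, m)
    return any(block & ws == ws for block in doc_sig)
```

A query word is hashed the same way and compared against each block signature. A match means the word may be present: because signatures are superimposed, unrelated words can set the same bits, so a hit still has to be checked against the actual text. A miss means the word is definitely absent from that block.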