Back to ComputerTerms, InformationRetrieval
Cluster analysis is a statistical technique used to generate a category structure. The groups that are formed should have a high degree of association between members of the same group and a low degree of association between members of different groups.
Similarity Measures:
$$S_{(D_i, D_j)} = \frac{2C}{A + B}$$
where C is the number of terms that Di and Dj have in common, and A and B are the numbers of terms in Di and Dj, respectively.
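The measure above is commonly known as the Dice coefficient. A minimal sketch of it follows, assuming each document is represented as a collection of terms; the function name `dice` and the handling of two empty documents are illustrative choices, not part of the original text.

```python
def dice(terms_i, terms_j):
    """Similarity S = 2C / (A + B) between two documents given as term collections."""
    a, b = set(terms_i), set(terms_j)
    if not a and not b:
        # Assumption: two empty documents are treated as having similarity 0.
        return 0.0
    c = len(a & b)  # C: number of terms the two documents share
    return 2 * c / (len(a) + len(b))  # A + B: sizes of the two term sets


d1 = ["cluster", "analysis", "document"]
d2 = ["cluster", "document", "retrieval", "index"]
print(dice(d1, d2))  # C = 2, A = 3, B = 4
```

The value ranges from 0 (no shared terms) to 1 (identical term sets).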
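The similarity matrix holds the similarity measure for every pair of documents x and y. Because the measure is symmetric, S(x, y) = S(y, x), only the lower triangle needs to be stored: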
$$
\begin{pmatrix}
S_{21} \\
S_{31} & S_{32} \\
\vdots \\
S_{N1} & S_{N2} & \cdots & S_{N(N-1)}
\end{pmatrix}
$$
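As a rough illustration, the lower-triangular layout can be built as a list of rows in which row i holds the similarities between document i and every earlier document. The names `dice` and `similarity_matrix` are hypothetical, and the coefficient used is the 2C / (A + B) measure defined earlier on this page.

```python
def dice(terms_i, terms_j):
    """Similarity S = 2C / (A + B) between two term collections."""
    a, b = set(terms_i), set(terms_j)
    if not a and not b:
        return 0.0  # assumption: empty documents have similarity 0
    return 2 * len(a & b) / (len(a) + len(b))


def similarity_matrix(docs):
    """Return rows where rows[i][j] = S(docs[i], docs[j]) for j < i."""
    return [[dice(docs[i], docs[j]) for j in range(i)] for i in range(len(docs))]


docs = [["a", "b"], ["a", "c"], ["x", "y"]]
for row in similarity_matrix(docs):
    print(row)
```

For N documents this stores N(N - 1)/2 values rather than N squared; the first row is empty because the first document has no predecessor.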
Clustering Methods
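Nonhierarchical methods (partitioning methods) divide the data set of N objects into M clusters, where no overlap is allowed. Finding the optimal grouping would mean examining every candidate grouping of N objects into M sets, a count given here as Combinations(N|M) = N!/((N-M)!M!) (strictly, the number of partitions into M non-empty clusters is a Stirling number of the second kind). Exhaustive search is therefore infeasible for large sets, so heuristics are used instead. Computational requirements are then usually O(NM).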
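One simple heuristic with O(NM) cost is a single pass that compares each object against M cluster representatives. The sketch below is only an illustration of that idea, not a method named by this page: it assumes M is at least 1, that the first M documents serve as representatives, and that similarity is the 2C / (A + B) measure; `dice` and `partition` are hypothetical names.

```python
def dice(terms_i, terms_j):
    """Similarity S = 2C / (A + B) between two term collections."""
    a, b = set(terms_i), set(terms_j)
    if not a and not b:
        return 0.0  # assumption: empty documents have similarity 0
    return 2 * len(a & b) / (len(a) + len(b))


def partition(docs, m):
    """Assign every document index to exactly one of m clusters (no overlap)."""
    seeds = docs[:m]  # assumption: the first m documents act as representatives
    clusters = [[] for _ in seeds]
    for idx, doc in enumerate(docs):
        # M comparisons per document, so N * M comparisons overall.
        best = max(range(len(seeds)), key=lambda k: dice(doc, seeds[k]))
        clusters[best].append(idx)
    return clusters


docs = [["a", "b"], ["x", "y"], ["a", "c"], ["y", "z"]]
print(partition(docs, 2))
```

The result depends on which documents are chosen as representatives, which is the usual price of replacing exhaustive search with a heuristic.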
Hierarchical methods produce a nested data set in which pairs of items or clusters are successively linked until every item in the data set is connected. The agglomerative method performs N - 1 pairwise joins, beginning from an unclustered data set.
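The agglomerative procedure can be sketched as follows. The page does not say how similarity between two clusters is measured, so this example assumes single linkage (the similarity of the closest pair of members) together with the 2C / (A + B) measure; `dice` and `agglomerate` are hypothetical names.

```python
def dice(terms_i, terms_j):
    """Similarity S = 2C / (A + B) between two term collections."""
    a, b = set(terms_i), set(terms_j)
    if not a and not b:
        return 0.0  # assumption: empty documents have similarity 0
    return 2 * len(a & b) / (len(a) + len(b))


def agglomerate(docs):
    """Join the two most similar clusters repeatedly; return the list of joins."""
    clusters = [[i] for i in range(len(docs))]  # start: every item on its own
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p):
                # Single linkage (assumption): best similarity across members.
                s = max(dice(docs[i], docs[j])
                        for i in clusters[p] for j in clusters[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        s, p, q = best
        merges.append((clusters[q], clusters[p], s))
        merged = clusters[q] + clusters[p]
        del clusters[p]  # p > q, so index q is still valid afterwards
        clusters[q] = merged
    return merges


docs = [["a", "b"], ["a", "b", "c"], ["x", "y"], ["x", "z"]]
for left, right, score in agglomerate(docs):
    print(left, right, score)
```

Each pass through the loop reduces the number of clusters by one, so N documents always yield N - 1 joins; the recorded joins form the nested structure.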