MinHash
In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder , and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
Wikipage redirect
Andrei BroderBag-of-words modelBloom filterComputational genomicsCount–min sketchDimensionality reductionFeature hashingFungal genomeIntersection (set theory)Jaccard indexJubatusLevenshtein distanceList of data structuresList of statistics articlesMachine learning in bioinformaticsMetabolic gene clusterMichael MitzenmacherMin-wise independenceMinhashN-gramNearest neighbor searchOutline of machine learningQuotient filterRecord valueRolling hashSalesforceSimHashTabulation hashingUniversal hashingW-shingling
Link from a Wikipage to another Wikipage
primaryTopic
MinHash
In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder , and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
has abstract
In computer science and data m ...... larity of their sets of words.
@en
Wikipage page ID
30,632,997
page length (characters) of wiki page
Wikipage revision ID
1,024,796,074
Link from a Wikipage to another Wikipage
authorlink
Andrei Broder
@en
first
Andrei
@en
last
Broder
@en
wikiPageUsesTemplate
hypernym
type
comment
In computer science and data m ...... larity of their sets of words.
@en
label
MinHash
@en