Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
