Choosing non-redundant representative subsets of protein sequence data sets using submodular optimization.
