The (black) art of runtime evaluation: Are we comparing algorithms or implementations?
