2012-04-04

Quick search algorithm and strstr

Quick Search algorithm(Qs) is a string search algorithm. Qs is a simplification of Boyer-Moore algorithm and faster than it.

cf. http://www-igm.univ-mlv.fr/~lecroq/string/node19.html

So I measured the speed of Qs on Core 2 Duo and found it is faster than strstr for a key string with 4~6 bytes or longer length.

But latest Intel CPU such as Sandy Bridge(Core i7-2600K) or Xeon 5650 has SSE4.2 (string and test processing instructions). So I compared strstr_sse42 with Qs then found strstr_sse42 is 2~4 times faster than Qs for a key string with at least 32-bytes length.

The test code is https://github.com/herumi/opti/blob/master/str_util_test.cpp.

My implementation of strstr is https://github.com/herumi/opti/blob/master/str_util.hpp. It requires Xbyak.

The main code is


/*
  strstr(a, key);
  input key
  use save_a, save_key, c
*/
  movdqu(xm0, ptr [key]); // xm0 = *key
L(".lp");
  pcmpistri(xmm0, ptr [a], 12); // 12(1100b) = [equal ordered:unsigned:byte]
  if (isSandyBridge) {
    lea(a, ptr [a + 16]);
    ja(".lp"); // if (CF == 0 and ZF = 0) goto .lp
  } else {
    jbe(".headCmp"); // if (CF == 1 or ZF == 1) goto .headCmp
    add(a, 16);
    jmp(".lp");
L(".headCmp");
  }
  jnc(".notFound");
  ...

It is slower to use lea with pcmpistri on Xeon 5650 but a little faster on Sandy Bridge, so I check the type of CPU and generate better code according to each CPU.

I make a slide for detail, so please see http://www.slideshare.net/herumi/x86opti3.

No comments: