Update from ANSI to Unicode

A topic by WojTec · started on 1 Dec 2013 · last post on 2 Dec 2013
Update from ANSI to Unicode

Original function in C++:

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
   // 'm' and 'r' are mixing constants generated offline.
   // They're not really 'magic', they just happen to work well.

   const unsigned int m = 0x5bd1e995;
   const int r = 24;

   // Initialize the hash to a 'random' value

   unsigned int h = seed ^ len;

   // Mix 4 bytes at a time into the hash

   const unsigned char * data = (const unsigned char *)key;

   while(len >= 4)
   {
      unsigned int k = *(unsigned int *)data;

      k *= m;
      k ^= k >> r;
      k *= m;

      h *= m;
      h ^= k;

      data += 4;
      len -= 4;
   }

   // Handle the last few bytes of the input array
   // (the cases fall through on purpose)

   switch(len)
   {
   case 3: h ^= data[2] << 16;
   case 2: h ^= data[1] << 8;
   case 1: h ^= data[0];
           h *= m;
   };

   // Do a few final mixes of the hash to ensure the last few
   // bytes are well-incorporated.

   h ^= h >> 13;
   h *= m;
   h ^= h >> 15;

   return h;
}

Translation to Delphi (ANSI version):

function Murmur2(const S: AnsiString; const Seed: Cardinal = $9747b28c): Cardinal;
const
  // 'm' and 'r' are mixing constants generated offline.
  // They're not really 'magic', they just happen to work well.
  m = $5bd1e995;
  r = 24;
var
  hash: LongWord;
  len: LongWord;
  k: LongWord;
  data: Integer;
begin
  len := Length(S);

  // The default seed, $9747b28c, is from the original C library

  // Initialize the hash to a 'random' value
  hash := seed xor len;

  // Mix 4 bytes at a time into the hash
  data := 1;

  while(len >= 4) do
  begin
      k := PLongWord(@S[data])^;

      k := k*m;
      k := k xor (k shr r);
      k := k*m;

      hash := hash*m;
      hash := hash xor k;

      data := data+4;
      len := len-4;
  end;

  { Handle the last few bytes of the input
          S: ... $69 $18 $2f
  }
  Assert(len <= 3);
  if len = 3 then
      hash := hash xor (LongWord(s[data+2]) shl 16);
  if len >= 2 then
      hash := hash xor (LongWord(s[data+1]) shl 8);
  if len >= 1 then
  begin
      hash := hash xor (LongWord(s[data]));
      hash := hash * m;
  end;

  // Do a few final mixes of the hash to ensure the last few
  // bytes are well-incorporated.
  hash := hash xor (hash shr 13);
  hash := hash * m;
  hash := hash xor (hash shr 15);

  Result := hash;
end;

I don't like AnsiString, so I'm trying to change it to string:

function Murmur2(const AValue: string; const Seed: Cardinal = $9747b28c): Cardinal;

The result is different from the ANSI version. I think the problem is here:

k := PLongWord(@AValue[data])^;

How can I fix it?

Also, is this line valid?

data := 1;
Re: Update from ANSI to Unicode

The input data seems not to be a string but a byte array. I would use an array of byte (TBytes type) to avoid the danger of string encoding conversion related bugs.
Michael Justin
Re: Update from ANSI to Unicode

If I use bytes as input, how do I use it for strings and other data?
Re: Update from ANSI to Unicode

Simply convert the strings into a byte array.

Just keep in mind that AnsiString has 1 byte per Char and UnicodeString has 2 bytes per Char.
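
The size difference is enough to explain the differing results: the hash mixes raw bytes, so the same text stored with 1-byte and with 2-byte characters is simply different input. A minimal sketch of this in C++, based on the original function quoted at the top of the thread. The helper names `HashNarrow` and `HashWide` are made up for illustration, `char16_t` only stands in for Delphi's 2-byte Char, and the 4-byte read uses `memcpy` instead of the pointer cast so it does not depend on buffer alignment:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// MurmurHash2 as quoted in the thread; the tail is written as an if-chain
// (like the Delphi translation) instead of a fall-through switch.
std::uint32_t MurmurHash2(const void* key, int len, std::uint32_t seed)
{
    const std::uint32_t m = 0x5bd1e995;
    const int r = 24;

    std::uint32_t h = seed ^ static_cast<std::uint32_t>(len);
    const unsigned char* data = static_cast<const unsigned char*>(key);

    // Mix 4 bytes at a time into the hash.
    while (len >= 4)
    {
        std::uint32_t k;
        std::memcpy(&k, data, 4);   // native-endian 4-byte read

        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    // Handle the last 0..3 bytes.
    if (len == 3) h ^= static_cast<std::uint32_t>(data[2]) << 16;
    if (len >= 2) h ^= static_cast<std::uint32_t>(data[1]) << 8;
    if (len >= 1) { h ^= data[0]; h *= m; }

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}

const std::uint32_t kSeed = 0x9747b28c;   // default seed from the thread

// Hypothetical helper: hashes 1 byte per character (the AnsiString case).
std::uint32_t HashNarrow(const std::string& s)
{
    return MurmurHash2(s.data(), static_cast<int>(s.size()), kSeed);
}

// Hypothetical helper: hashes 2 bytes per character (the UnicodeString case).
// The length passed on is the BYTE count, not the character count.
std::uint32_t HashWide(const std::u16string& s)
{
    return MurmurHash2(s.data(),
                       static_cast<int>(s.size() * sizeof(char16_t)),
                       kSeed);
}
```

With this, `HashNarrow("abc")` processes 3 bytes and `HashWide(u"abc")` processes 6, so the two values are not expected to match. The same applies to a Delphi version that merely swaps AnsiString for string: it would also need the length in bytes, and it would still hash a different byte sequence than the ANSI version.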
Re: Update from ANSI to Unicode

OK, maybe it's a good idea, but let's get back to the problem: ANSI --> Unicode?
Re: Update from ANSI to Unicode

The Delphi Unicode string has code page information stored in its metadata. If your input data is meant to be just raw binary data, without caring about encodings and code pages, you will not want this string type.

The RawByteString is a string type which does not carry encoding information, which can be used for binary data. But watch out and take care of compiler warnings about implicit string type conversions.

TBytes would be the appropriate data type; RawByteString is merely easier to use as an AnsiString replacement.
Michael Justin
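
The byte-array approach can be sketched as follows, again in C++ to stay close to the original function. Everything named here is an assumption made for illustration: `Bytes` plays the role of Delphi's TBytes, `BytesOf` is a hypothetical conversion helper, and the array-based `Murmur2` signature is not taken from the thread. The idea is that the caller picks the byte representation explicitly, once, and the hash itself only ever sees bytes:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;   // stand-in for Delphi's TBytes

// MurmurHash2 over a byte array. Each 4-byte block is assembled
// little-endian by hand, which matches the pointer read in the thread's
// code on x86 and keeps the result independent of the host byte order.
std::uint32_t Murmur2(const Bytes& data, std::uint32_t seed = 0x9747b28c)
{
    const std::uint32_t m = 0x5bd1e995;
    const int r = 24;

    std::size_t len = data.size();
    std::size_t i = 0;
    std::uint32_t h = seed ^ static_cast<std::uint32_t>(len);

    while (len >= 4)
    {
        std::uint32_t k = static_cast<std::uint32_t>(data[i])
                        | (static_cast<std::uint32_t>(data[i + 1]) << 8)
                        | (static_cast<std::uint32_t>(data[i + 2]) << 16)
                        | (static_cast<std::uint32_t>(data[i + 3]) << 24);
        k *= m;
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        i += 4;
        len -= 4;
    }

    // Handle the last 0..3 bytes.
    if (len == 3) h ^= static_cast<std::uint32_t>(data[i + 2]) << 16;
    if (len >= 2) h ^= static_cast<std::uint32_t>(data[i + 1]) << 8;
    if (len >= 1) { h ^= data[i]; h *= m; }

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}

// Single-byte text: the bytes are taken as they are.
Bytes BytesOf(const std::string& s)
{
    return Bytes(s.begin(), s.end());
}

// UTF-16 text: each code unit becomes two bytes, low byte first (UTF-16LE).
Bytes BytesOf(const std::u16string& s)
{
    Bytes out;
    out.reserve(s.size() * 2);
    for (char16_t c : s)
    {
        out.push_back(static_cast<std::uint8_t>(c & 0xFF));
        out.push_back(static_cast<std::uint8_t>(c >> 8));
    }
    return out;
}
```

A call then reads `Murmur2(BytesOf(text))`, and non-text data can be hashed by filling a `Bytes` value directly. In Delphi the matching step would be an explicit conversion to TBytes, for example through one of the TEncoding classes, so that the choice of encoding is visible at the call site instead of being hidden in an implicit string conversion.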
