Kas Ob.

Registered since: 3 Sep 2023
379 posts
 
#16

Re: Floyd-Steinberg Dithering

20 Oct 2023, 12:39
Hi all,
Please bear with me a little. I really don't mean it as an offense toward anyone when I call the Delphi compiler names. It is just that I know it: I have earned, and still earn, my bread from optimizing old and new code, and I know the compiler intimately enough to call it a fucking centuries-old brick.

I will explain just this piece of code, and you can expand on it if you want. But first and foremost, please at least consider that your knowledge of the code generated by Delphi may be wrong (outdated or naively trusting), or that the generated code may be unoptimized (inefficient), and we will go from there to a proof by contradiction. How about that? It is my best method, as a mathematician by education.

So many agreed that div 16 is always shr 4, or, to be more precise, should be sar 4. I agree, and I know that the compiler wrongly does it for unsigned integers, but that is not the case here, as I was talking about the specific case at hand. Maybe it is my mistake for not writing an essay for each line I wrote.
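To make the div 16 versus shr 4 point concrete, here is a minimal sketch, not taken from the project in this thread. The names DivBy16 and ShrBy4 are hypothetical helpers introduced only for illustration. It relies on two language rules: div truncates toward zero, and shr on a signed integer is a logical shift, so the two only agree for non-negative operands.

```pascal
program DivVsShr;
{$APPTYPE CONSOLE}

// Hypothetical helper: signed division, truncates toward zero.
function DivBy16(Value: Integer): Integer;
begin
  Result := Value div 16;
end;

// Hypothetical helper: logical shift right, zero bits are shifted in
// from the left even when Value is negative.
function ShrBy4(Value: Integer): Integer;
begin
  Result := Value shr 4;
end;

begin
  // Non-negative operand: both forms give the same result.
  Writeln(DivBy16(35), ' ', ShrBy4(35));
  // Negative operand: div gives -2, shr gives a large positive number.
  Writeln(DivBy16(-35), ' ', ShrBy4(-35));
end.
```

This is why a compiler cannot blindly turn a signed div 16 into a single shift, and why swapping div for shr by hand is only safe when the operand is known to be non-negative.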

So here is a proof that the compiler doesn't use sar 4 or shr 4 for div 16: just look at the x64 version of it!

Try this function in the optimized version above:
Code:
PROCEDURE SetPixel(XOffset,YOffset,Factor:NativeInt);
var AP:TPBGR;
begin
   // XOffset = horizontal offset in pixels
   // YOffset = vertical offset in bytes
   AP:=P;
   Inc(AP,XOffset);
   Inc(NativeInt(AP),YOffset);
   with AP^, Delta do begin
      Blue:=EnsureRange(Blue+B*Factor shr 4,0,255);
      Green:=EnsureRange(Green+G*Factor shr 4,0,255);
      Red:=EnsureRange(Red+R*Factor shr 4,0,255);
   end;
end;
My measurements show that it is faster by 18% on Win32 and 29% on Win64! Please try it: it is not slower by 200%, and these values are positive, so there is no problem here.
Also, if you look at the generated assembly code, this is it:
[Attached screenshot: 2023-10-20-12_59_07-untitled-paint.png]

Also try this:
Code:
  procedure SetPixel(XOffset, YOffset, Factor: NativeInt);
  var
    AP: TPBGR;
    v: NativeInt;
  begin
    AP := P;
    Inc(AP, XOffset);
    Inc(NativeInt(AP), YOffset);
    with AP^, Delta do
    begin
      v := Blue + B * Factor shr 4;
      if v < 0 then
        Blue := 0
      else if v > 255 then
        Blue := 255
      else
        Blue := v;

      v := Green + G * Factor shr 4;
      if v < 0 then
        Green := 0
      else if v > 255 then
        Green := 255
      else
        Green := v;

      v := Red + R * Factor shr 4;
      if v < 0 then
        Red := 0
      else if v > 255 then
        Red := 255
      else
        Red := v;
    end;
  end;
The above function makes the whole process around twice as fast on both platforms.
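The three hand-written clamps above repeat the same pattern, so they could be folded into one small routine. The following is only a sketch: ClampToByte is a hypothetical name that does not exist in the attached project, and whether the compiler actually inlines it and keeps the measured speedup is an assumption that would have to be checked in the generated code.

```pascal
program ClampSketch;
{$APPTYPE CONSOLE}

// Hypothetical helper: clamps a signed value into 0..255 using plain
// comparisons, the same logic as the hand-written branches above.
function ClampToByte(Value: NativeInt): Byte; inline;
begin
  if Value < 0 then
    Result := 0
  else if Value > 255 then
    Result := 255
  else
    Result := Byte(Value);
end;

begin
  Writeln(ClampToByte(-5));   // below the range
  Writeln(ClampToByte(128));  // inside the range
  Writeln(ClampToByte(300));  // above the range
end.
```

Inside SetPixel this would read, for example, Blue := ClampToByte(Blue + B * Factor shr 4); and the same for Green and Red.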

And again, I am not saying that I know it all. I make mistakes, but not in this case. I would love to be proven wrong, but with factual code done right, not with your assumptions based on something you didn't see for sure.

My test is attached here as FloydSteinberg.zip, and I hope it works, unlike the project attached above, which is empty.

As for "with": the compiler might fail to generate nice assembly and fall back to shuffling the data and accessing it continuously on the stack, introducing unneeded memory accesses; this also happens with complex loops. With "with", it will in many cases resolve the pointer and reuse it from a register. Alas, there seems to be no gain in the function mentioned above, but once the function has a few more local variables it will go into 90s Turbo Pascal mode, especially on x64 platforms. I am not in the mood to sit down and tweak such a case for you now, but the effect is there.
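For anyone who wants to compare the two forms in the CPU view, here is a small self-contained sketch. TBGR and PBGR are hypothetical stand-ins for the pixel types used in this thread, and both procedures are made up for illustration. They compute the same result; what differs, if anything, is how the compiler handles the pointer, and that has to be checked per compiler version and platform.

```pascal
program WithSketch;
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for the pixel record used in the thread.
  TBGR = packed record
    Blue, Green, Red: Byte;
  end;
  PBGR = ^TBGR;

// Variant 1: every field access spells out the dereference.
procedure InvertExplicit(P: PBGR);
begin
  P^.Blue := 255 - P^.Blue;
  P^.Green := 255 - P^.Green;
  P^.Red := 255 - P^.Red;
end;

// Variant 2: "with" names the record once for the whole block.
procedure InvertWith(P: PBGR);
begin
  with P^ do
  begin
    Blue := 255 - Blue;
    Green := 255 - Green;
    Red := 255 - Red;
  end;
end;

var
  A, B: TBGR;
begin
  A.Blue := 10; A.Green := 20; A.Red := 30;
  B := A;
  InvertExplicit(@A);
  InvertWith(@B);
  // Both lines print the same three values.
  Writeln(A.Blue, ' ', A.Green, ' ', A.Red);
  Writeln(B.Blue, ' ', B.Green, ' ', B.Red);
end.
```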

PS: re-reading this before posting, I sound stupid and offended, and I am sorry for that. I don't mean to offend anyone and never meant to; I just had a very bad experience on a neighboring forum and am trying not to be triggered by personal sentences.
Kas

Edited by Kas Ob. (20 Oct 2023 at 13:19)