Would an OpenCL/CUDA solution be much faster?
I don't really have any experience with OpenCL, but I have used CUDA a few times, and I think it could be 1000x to 10000x faster than Delphi code for calculating such vk'.
While CUDA is limited to NVIDIA hardware, OpenCL also supports AMD and Intel GPUs. OpenCL's overhead is minimal, and its wide compatibility, portability, and highly optimized code should make choosing between the two easier.