Wednesday, December 21, 2022

Vector synchronicity

While looking around for performance enhancements, which computers are supposed to keep delivering, I stumbled onto the latest .NET upgrades for vectors.

Since around 2007 we've been getting warnings that Moore's Law is over for raw chip speed and that we should be turning our thoughts toward parallel processing instead. That async/await stuff you see in Node.js is a cousin of this, though it is about not blocking while you wait rather than about doing arithmetic in parallel. Lately there's been a .NET upgrade for us VS2022 enjoyers; here is the rundown. There we learn about Single Instruction, Multiple Data (SIMD).

Vectors have been around a while; in my Newton simulators I'd used 2015's Vector3 for, er, vector arithmetic. This was the System.Numerics one, because I didn't trust that the free Unity engine wouldn't be yanked from me.

I mean, vectors of very small amounts could always be mocked up; just do bitwise arithmetic with masking (bro). That is, if you wanted "for i from 1 to 16 do this to each(i)" you do "for i from 1 to 4 do this to a vector of 4 each(i)". I don't know if they'd said "four lanes" then, but they do now. With classic 8-bit bytes of 256 values each, that's not so bad for (say) colour. In practice, you run out of size; a 16-bit processor like the 286 got you two lanes of these bytes at a time. Hence why the colours were so cartoony up to the middle 1990s. 32-bit would at least get you that colour processing, to the limit of your CRT monitor, maybe even to tetrachromacy. But we're still not getting the parallel floating-point maths we wanted - even up to 64-bit.
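For the curious, the masking trick can be sketched like so. This is a minimal illustration, assuming four 8-bit lanes packed into one 32-bit uint; the names Swar and AddBytes are made up for this post and are not part of any library.

```csharp
using System;

public static class Swar
{
    // Adds four unsigned bytes packed into each uint, lane by lane.
    // Each lane wraps modulo 256 and never carries into its neighbour.
    public static uint AddBytes(uint a, uint b)
    {
        const uint Low7 = 0x7F7F7F7Fu; // low seven bits of every lane
        const uint High = 0x80808080u; // top bit of every lane

        // Add the low seven bits; any carry lands in bit 7 of the same lane.
        uint sum = (a & Low7) + (b & Low7);

        // Fold the original top bits back in with XOR, which cannot carry.
        return sum ^ ((a ^ b) & High);
    }
}
```

One ordinary 32-bit add does the work of four byte adds here; the masks are only there to stop a carry leaking from one lane into the next.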

What I didn't know, and should have, is that a modern CPU core carries wide vector registers, so one instruction can work on four (say) floats at once. You are no longer stuck with the bit-depth limitations of the ordinary registers. Vectors can be processed in the CPU now! Another fine example: the Mandelbrot set, which requires independent processing of each pixel in your window. Another example would likely be Newton upon each interaction in an n-body system. Cuts time to a quarter...

...if, that is, you have those four lanes on hand. .NET can tell you at run time how many lanes will be on hand (Vector<T>.Count) and whether the hardware is actually backing them (Vector.IsHardwareAccelerated); version 7 claims to widen what you can do with them across hardware. Even if you are not a noob (I am a noob), you may still be fumbling with the binary arithmetic and maybe even running "unsafe" C#. For best results with unsafety please use Rust.
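As a rough sketch of what this looks like with System.Numerics, here is the position update from a Newton-style simulator (position += dt * velocity) done a vector's worth at a time. Vector<float>.Count and Vector.IsHardwareAccelerated are real .NET members; the Integrator class and its Advance method are hypothetical names for illustration, and the flat float arrays are an assumption about how the data is laid out.

```csharp
using System;
using System.Numerics;

public static class Integrator
{
    // position[i] += dt * velocity[i], several elements per step.
    public static void Advance(float dt, float[] velocity, float[] position)
    {
        if (velocity.Length != position.Length)
            throw new ArgumentException("arrays must have the same length");

        int width = Vector<float>.Count;     // lanes available at run time
        var vdt = new Vector<float>(dt);     // dt copied into every lane
        int i = 0;

        // Main loop: one vector multiply and add covers 'width' elements.
        for (; i <= position.Length - width; i += width)
        {
            var v = new Vector<float>(velocity, i);
            var p = new Vector<float>(position, i);
            (p + vdt * v).CopyTo(position, i);
        }

        // Scalar tail for whatever does not fill a whole vector.
        for (; i < position.Length; i++)
            position[i] += dt * velocity[i];
    }
}
```

Printing Vector<float>.Count and Vector.IsHardwareAccelerated on your own machine shows how many lanes you really get; the speed-up tops out at that count, and only when each element's result is independent of the others.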

It's a bit like that Newton-Raphson unrolling we'd seen back in Quake III's rsqrt - and in fact should start as an unrolling. Except: Newton-Raphson is iterative (even if only iterated once), so each step has to wait on the one before. Where the work is not iterative you can do your calcs in parallel, and should.
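For reference, a C# rendering of that rsqrt trick might look like the following. It is a sketch of the widely circulated Quake III routine, with the well-known magic constant and a single Newton-Raphson step; FastMath, InvSqrt and InvSqrtAll are names invented here, and positive finite inputs are assumed.

```csharp
using System;

public static class FastMath
{
    // Approximates 1/sqrt(x) for positive, finite x.
    public static float InvSqrt(float x)
    {
        // Reinterpret the float's bits as an int to get a cheap first guess.
        int bits = BitConverter.SingleToInt32Bits(x);
        bits = 0x5f3759df - (bits >> 1);
        float y = BitConverter.Int32BitsToSingle(bits);

        // One Newton-Raphson step; a second step would need this result first.
        return y * (1.5f - 0.5f * x * y * y);
    }

    // Each element is independent of the rest, so this loop is the part
    // that could be spread across lanes.
    public static float[] InvSqrtAll(float[] xs)
    {
        var result = new float[xs.Length];
        for (int i = 0; i < xs.Length; i++)
            result[i] = InvSqrt(xs[i]);
        return result;
    }
}
```

The refinement inside InvSqrt is serial, since each step feeds the next, while the loop over the array is where the parallelism lives.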
