In researching my recent blog entry on raytracing, I found a sweet elegance that I always look for in computer architectures. The algorithms are pretty straightforward, albeit pretty compute intensive, so the barrier to entry into this area seems low enough that I can work on it when I get a chance. It also looks to benefit immensely from parallel processing, another interest of mine, and will get me into that area as well. That, and the demos I saw showed some wicked shadow affects that really added to the realism of scenes, so it'll be cool to show off as a CDT demo as well (as opposed to the spinning polygons I use as an SDL/OpenGL demo right now that you may have seen at ESC).
My first step was to build a vector class that does math with 3D vectors, a critical component of all graphics programming. The sample I was looking at used regular C++ floating point arithmetic with a vector composed of a float for each of the three axis.
class vector {
public:
vector(float _x, float _y, float _z)
: x(_x), y(_y), z(_z) { }
void operator +=(const vector & v) {
x += v.x; y += v.y; z += v.z;
}
private:
float x, y, z;
};
Pretty basic. But this is the first example of an algorithm that can benefit from parallelism. Since I have a fairly new laptop, I wondered if I could leverage SSE, Streaming SIMD Extensions to implement this. I also wondered how well gcc and the MinGW variant I'm using handles SSE. So I gave it a try.
class vector {
public:
vector(float _x, float _y, float _z) {
float array[4] __attribute__((aligned(16)))
= { x, y, z, 1 };
xyz = _mm_load_ps(array);
}
void operator +=(const vector & v) {
xyz += v.xyz;
}
private:
__m128 xyz;
}
The constructor is a bit more complicated. And with most things dealing with SSE, 16 byte alignment is critical for good performance. And looking at the generated assembly, I was pleased to see that gcc, after making sure I put the -msse2 option on the compile, worked hard at keeping the instances of __m128 aligned like that. The performance tests I ran with addition showed an O.K improvement in performance, especially as the number of math operations grew. But when I tried multiplication instead of addition, the performance gains were astronomical. Well worth the extra typing.
Now that I've got that under my belt, I can't wait to actually draw something...


