Zeta Chess

Zeta v099o

Zeta v099o is released as source and as Linux/Windows x86-64 binaries. It still uses HCE (handcrafted evaluation); the next step would be to implement NNUE neural networks.

GitHub:

https://github.com/smatovic/Zeta/releases

Alternative downloads:

https://zeta-chess.app26.de/downloads/

Please consult the README file or the --help option before running the engine.

From the release notes:

Zeta (099o) alpha

  * switch to 64-bit Kogge-Stone, generalized and vectorized, move gen again
  * removed ABDADA parallel search
  * activated RMO parallel search
  * switch to MIT license

  * Zeta 099o on Intel HD 530, 1 worker, ~30 Knps
  * Zeta 099o on Intel HD 530, 24 workers, ~550 Knps
  * Zeta 099o on Nvidia GeForce GTX 750, 1 worker, ~50 Knps
  * Zeta 099o on Nvidia GeForce GTX 750, 16 workers, ~850 Knps
  * Zeta 099o on AMD Radeon HD 8570, 1 worker, ~22 Knps
  * Zeta 099o on AMD Radeon HD 8570, 48 workers, ~900 Knps

-- Srdja Matovic 28 May 2022
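
For readers who have not met Kogge-Stone move generation, here is a minimal scalar sketch of the idea in Python. It is not Zeta's OpenCL code: the function names, the masks, and the bit mapping (a1 = bit 0, h1 = bit 7, a8 = bit 56) are assumptions made for illustration, and only two of the eight ray directions are shown. A slider's attack set is built by a parallel-prefix fill through empty squares in three doubling steps, with no lookup tables.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF      # keep Python ints within 64 bits
NOT_A_FILE = 0xFEFEFEFEFEFEFEFE  # blocks wrap-around when shifting east

def north_attacks(sliders, empty):
    """Kogge-Stone occluded fill to the north, then one final step."""
    gen = sliders & MASK64   # generator: squares reached so far
    pro = empty & MASK64     # propagator: squares a ray may pass through
    gen |= pro & (gen << 8)
    pro &= pro << 8
    gen |= pro & (gen << 16)
    pro &= pro << 16
    gen |= pro & (gen << 32)
    # The final shift adds the first blocked square (a possible capture).
    return (gen << 8) & MASK64

def east_attacks(sliders, empty):
    """Same fill to the east; the A-file mask stops rank wrap-around."""
    gen = sliders & MASK64
    pro = empty & NOT_A_FILE
    gen |= pro & (gen << 1)
    pro &= pro << 1
    gen |= pro & (gen << 2)
    pro &= pro << 2
    gen |= pro & (gen << 4)
    return (gen << 1) & NOT_A_FILE
```

The attack set includes the first blocker on each ray; a caller would still have to mask out its own pieces.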

Maschina...

Hmm, I thought I was done with collecting GPU architectures and could move on to cloud computing in 2019 for further development, but it did not work out. Meanwhile you have to go through the sales department of all the big cloud players before you get access to their GPUs, and mostly they offer only Nvidia server brands to rent, no AMD, no Intel. Hence I set up a lil Maschina again for GPGPU development: an Intel i5-6500 (Skylake 14nm with AVX2 from 2015) with HD 530 graphics and an Nvidia GeForce GTX 750 (Maxwell arch). I have yet to purchase some AMD GCN, maybe a used Radeon HD 7750 or even HD 8570. These are all outdated and low-end GPUs, but meanwhile I have enough benchmarks to inter/extrapolate Zeta's performance across different models and architectures. I just need some entry-level hardware to test some ideas; it should then scale up on newer and bigger models...

Zeta v099 - FP32 - dummy benchmarks

I implemented some dummy 10x12 mailbox floating-point move generation with 32 parallel gpu-threads and got only a 2x speedup for an unoptimized approach compared to Bitboards with 64 gpu-threads. Too lil, it does not pay off for me to explore that branch any further.
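
As a reminder of what a 10x12 mailbox looks like, here is a minimal Python sketch. It only illustrates the board layout and the off-board sentinel; it is not the dummy benchmark above, it does not use floats or gpu-threads, and all names (`OFF`, `make_board`, `knight_targets`) are hypothetical. The 8x8 board is embedded in a 120-square array so that any knight jump from a real square lands either on the board or on a border square.

```python
OFF = -1    # sentinel for border squares outside the 8x8 board
EMPTY = 0   # an empty on-board square

# Knight jumps expressed as index offsets in a 10-wide array.
KNIGHT_OFFSETS = (-21, -19, -12, -8, 8, 12, 19, 21)

def make_board():
    """Return an empty 10x12 mailbox; a1 is index 21, h8 is index 98."""
    board = [OFF] * 120
    for rank in range(8):
        for file in range(8):
            board[21 + rank * 10 + file] = EMPTY
    return board

def knight_targets(board, sq):
    """List the on-board squares a knight on `sq` can jump to.

    Occupancy by own pieces is ignored in this sketch.
    """
    return [sq + d for d in KNIGHT_OFFSETS if board[sq + d] != OFF]
```

For example, `knight_targets(make_board(), 21)` lists the two jumps available from a1, with no explicit rank or file bounds checks.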

So I am stuck at ~100 Knps per worker with Zeta v099, with up to 320 workers on current gpu architectures.

Zeta - Trick 17

I will give the Zeta v099 approach one more try and apply programmer's trick 17: develop on an old and outdated architecture. If my approach runs on the Nvidia 8800 GT, then it will also run on newer architectures with more beef.

With pen n paper I get an x2 speedup for switching from a square-wise worker with 64 gpu-threads to a piece-wise worker with 32 gpu-threads, and a further x2 speedup for switching from 64-bit integer Bitboards to 32-bit floats for the board representation and move generation. I still could not figure out a vectorized 0x88 board representation for uchar4 8-bit move generation.
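
For context, the scalar 0x88 scheme fits a square into one byte (index = 16 * rank + file) and detects off-board targets with a single AND. The Python sketch below shows only that scalar trick, with hypothetical names. It does not attempt the vectorized uchar4 form, which is exactly the open problem mentioned above.

```python
# Knight jumps as index offsets on a 16-wide board.
KNIGHT_OFFSETS_0X88 = (-33, -31, -18, -14, 14, 18, 31, 33)

def on_board(sq):
    """True if the 8-bit index `sq` is a real square.

    Valid squares have bit 3 (file overflow) and bit 7 (rank overflow)
    clear. Masking with 0xFF mimics 8-bit unsigned wrap-around.
    """
    return ((sq & 0xFF) & 0x88) == 0

def knight_targets_0x88(sq):
    """List the on-board knight targets from `sq`, ignoring occupancy."""
    return [sq + d for d in KNIGHT_OFFSETS_0X88 if on_board(sq + d)]
```

With a1 = 0x00 and h8 = 0x77, `knight_targets_0x88(0x00)` keeps only the two jumps that stay on the board.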

If I applied trick 17 more strictly, I would have to run the 'one-thread-one-board' approach on the 8800 GT, with some thousands of independent gpu-threads, and with up to 164K threads on newer architectures like the AMD Fury X. But since NNUE and the upcoming NNOM, this approach does not fit anymore: one single thread does not have enough beef to compute the new neural networks alone, so meanwhile I have to couple multiple threads together to work in parallel on the same node for evaluation.
