Zeta Chess

Zeta with NNUE on GPU?

I think it is possible to add the new neural network technique 'NNUE' to Zeta for upcoming GPU architectures like Nvidia Lovelace, Intel Xe and AMD RDNA3 which probably will all have support of INT8, 8 bit integer, math with higher throughput and maybe some 10 to 20 MB L3 cache per SIMD unit for the network weights file.

With INT8 optimized datatypes and instructions, one could build an vectorized 8 bit 0x88 move generator which operates over the 8 directions as vector and with 32 parallel gpu threads of one SIMD unit handles all pieces at once. Maybe reaching 1 to 2 million nodes per second per SIMD unit in an Zeta engine like framework.

With 32 SIMD gpu threads performing 32xFP32 or 32x2xFP16 operations per clock the NNUE inference performance could be 2 to 4 times faster than with current NNUE on CPUs with AVX2 (roughly estimated), considering a switch from integer to float weights.

Volta/Turing/Ampere have currently 16 cores per FPU SIMD and support doubled throughput for FP16 operations, I guess Nvidia will move with Lovelace back again to an 32 core per SIMD design with unified INT/FP16 cores. RDNA has 32 cores per SIMD, also with doubled throughput for FP16. Intel seems to use SIMD8 with 8 FP cores for its Xe GPU (with support for higher throughput for lower precision), maybe Intel will also add some kind of SIMD32, to couple 4 EUs to one compute unit.


  • up to 2 Mnps per SIMD unit possible
  • up to 4x faster inference for NNUE possible
  • up to 160 parallel workers (SIMD units) on current highend-gpus

again, just some rough numbers estimated, big grain of salt and alike...

if the above all holds, then you get a hell of NNUE monster on highend-gpus.

Zeta v099 already has an simple AB framework implemented with ABDADA or as option RMO Lazy SMP parallel search across SIMD units, hence the main part would then be to implement all those funny search extensions and tricks Stockfish does in an iterative way in Zeta for GPU - full time job ;)


I wrote 10 to 20 MB L3 cache per SIMD unit, assuming the whole net should fit in cache, doubt that this is common practice with NNUE on CPU, maybe the first layer with most of the weights resists in RAM for the incremental updates, and the further layers only get cached? Dunno.

2021-04-12 Followup:

  • I mixed up NNUE first layer INT16 and further INT8 weights, so the possible 4x inference speedup holds only if we assume 8-bit vector packed math on gpu.
  • I was not able to implement an efficient 8-bit 0x88 vector-based board representation on pen n paper, hence no 8-bit speedup for move gen in sight.

  • Even if I keep the current v099 bitboard design a switch to 32 gpu-threads piece-wise worker may pay off with certain architecture improvements of AMD's RDNA and increasing gpu clocks in mind.

Neural Networks on GPU

Currently there is much going on with neural networks for chess. With GiraffeAlphaZero, and its open source adaptation LC0 (Leela Chess Zero), it was shown that, with enough horse power, artificial neural networks are competitive in computer chess.

Currently LC0 uses an MCTS, Monte-Carlo Tree Search, approach with GPU as neural network accelerator for position evaluation.

My own experiments showed that AlphaBeta search is superior to MCTS, but current GPU architectures suffer from host-device latency, so you have to couple tasks to batches to be executed in one run on the GPU, not that conform with the serial nature of AlphaBeta.

With upcoming GPGPU architectures (or ANN accelerators) with less latency there might be AlphaBeta ANN engines possible...

Google's AlphaGo Deepmind and Chess Giraffe

It was in the news, Google's AlphaGo won against the European Champion Fan Hui in the game of GO...another frontier is fallen to computer domination.

The question if such an attempt with deep neural networks works also for chess was answered by Matthew Lai in his Master Thesis with his chess engine Giraffe, which reached the level of an FIDE International Master (about 2400 Elo), an astounding achievement considering only 4 month of work....

...so, when are we going to see AlphaChess Mr. Lai? :-)


Giraffe: Using Deep Reinforcement Learning to Play Chess by Matthew Lai, 2015

Mastering the Game of Go with Deep Neural Networks and Tree Search by Google Deepmind, 2016

Learning to Play the Game of Chess by Sebastian Thrun, 1995

NeuroChess by Sebastian Thrun on CPW

Home - Top