I got everything I need on paper:

  • Board Representation: Quad-Bitboards
  • Move Generation: vectorized Kogge-Stone
  • Search Algorithm: RMO - parallel AlphaBeta
  • Selective Search: NNOM++*
  • Evaluation: SF 13 NNUE**
  • Parallel Layers: direction-wise, square-wise, NN-wise**, AB-worker-wise, PV-Splitting-device-wise

now to find some time to implement this blueprint, first on CPU+AVX2 then on GPU.

*NNOM++ - Move Ordering Neural Networks: use SF 13 NNUE for selective search in non-QS search.

** still need to figure how to use 64 gpu threads of a worker during NNUE inference.