I have everything I need on paper:
- Board Representation: Quad-Bitboards
- Move Generation: vectorized Kogge-Stone
- Search Algorithm: RMO - parallel AlphaBeta
- Selective Search: NNOM++*
- Evaluation: SF 13 NNUE**
- Parallel Layers: direction-wise, square-wise, NN-wise**, AB-worker-wise, PV-Splitting-device-wise
Now to find some time to implement this blueprint, first on CPU+AVX2, then on GPU.
*NNOM++ - Move Ordering Neural Networks: use SF 13 NNUE for selective search everywhere outside quiescence search (QS).
** Still need to figure out how to use the 64 GPU threads of a worker during NNUE inference.
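
For reference, the Kogge-Stone part could look roughly like the sketch below on a single 64-bit lane, before any vectorization. This is only an illustration under assumptions, not the final design: it assumes a little-endian rank-file square mapping (a1 = bit 0, h8 = bit 63), and all names (`Dir`, `shiftBy`, `occludedFill`, `rayAttacks`, `slidingAttacks`) are made up for the example. Each direction is one occluded fill in three doubling steps, and the per-direction loop is where the direction-wise parallel layer would plug in.

```cpp
#include <cassert>
#include <cstdint>

using U64 = std::uint64_t;

// Assumption: a1 = bit 0, h1 = bit 7, a8 = bit 56, h8 = bit 63.
constexpr U64 ALL   = ~0ULL;
constexpr U64 NOT_A = 0xFEFEFEFEFEFEFEFEULL;  // every square except the a-file
constexpr U64 NOT_H = 0x7F7F7F7F7F7F7F7FULL;  // every square except the h-file

// One ray direction: signed bit shift plus the mask that stops file wrap-around.
struct Dir { int shift; U64 wrap; };

const Dir ROOK_DIRS[4]   = {{8, ALL}, {-8, ALL}, {1, NOT_A}, {-1, NOT_H}};
const Dir BISHOP_DIRS[4] = {{9, NOT_A}, {7, NOT_H}, {-7, NOT_A}, {-9, NOT_H}};

// Positive shift moves towards h8, negative towards a1.
inline U64 shiftBy(U64 b, int s) { return s > 0 ? b << s : b >> -s; }

// Kogge-Stone occluded fill: spread the generator set through the
// propagator set (empty squares) in steps of 1, 2 and 4 squares.
inline U64 occludedFill(U64 gen, U64 pro, const Dir& d) {
    pro &= d.wrap;
    gen |= pro & shiftBy(gen, d.shift);
    pro &= shiftBy(pro, d.shift);
    gen |= pro & shiftBy(gen, 2 * d.shift);
    pro &= shiftBy(pro, 2 * d.shift);
    gen |= pro & shiftBy(gen, 4 * d.shift);
    return gen;
}

// Attacks along one ray: shift the fill one more step, so the first
// blocker is included and the slider's own square is excluded.
inline U64 rayAttacks(U64 sliders, U64 empty, const Dir& d) {
    return shiftBy(occludedFill(sliders, empty, d), d.shift) & d.wrap;
}

// Union over four directions; `occupied` must include the sliders themselves.
inline U64 slidingAttacks(U64 sliders, U64 occupied, const Dir (&dirs)[4]) {
    U64 attacks = 0;
    for (const Dir& d : dirs) attacks |= rayAttacks(sliders, ~occupied, d);
    return attacks;
}

// Small self-check: a rook on a1 with a blocker on d1.
inline void rookOnA1Example() {
    const U64 a1 = 1ULL, d1 = 1ULL << 3;
    // a2..a8 plus b1, c1 and the blocker on d1.
    assert(slidingAttacks(a1, a1 | d1, ROOK_DIRS) == 0x010101010101010EULL);
}
```

In an AVX2 version the four directions of a piece type could presumably sit in the four 64-bit lanes of one register, with per-lane shift counts via `_mm256_sllv_epi64` / `_mm256_srlv_epi64`; that mapping is my assumption and not something worked out above.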