Zeta - v098 revisited
To make it short, Zeta v097 and v098 make excessive use of the slower Global Memory; therefore the computed nodes per second scale linearly with the memory bandwidth, not with the number of cores or their clock rates.
Here are some alternative algorithms to plain MiniMax AlphaBeta search...
* edit on 2015-03-30 *
* updated on 2022-03-24 *
Here is an overview of what happened before:
Zeta (099o)
* switch back to the generalized and vectorized 64-bit Kogge-Stone move generator
*
* Zeta 099o on Intel HD 530, 1 worker, ~30 Knps
* Zeta 099o on Intel HD 530, 24 workers, ~550 Knps
* Zeta 099o on Nvidia GeForce GTX 750, 1 worker, ~50 Knps
* Zeta 099o on Nvidia GeForce GTX 750, 16 workers, ~850 Knps
* Zeta 099o on AMD Radeon HD 8570, 1 worker, ~22 Knps
* Zeta 099o on AMD Radeon HD 8570, 48 workers, ~900 Knps
-- Srdja Matovic 28 May 2022
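The Kogge-Stone move generator is a parallel-prefix flood fill over 64-bit bitboards. The code below is a rough scalar sketch of the classic occluded fills for the four rook directions. It is not Zeta's generalized, vectorized OpenCL code: the function names and the a1 = bit 0 square mapping are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed square mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56. */
#define NOT_A_FILE UINT64_C(0xfefefefefefefefe)
#define NOT_H_FILE UINT64_C(0x7f7f7f7f7f7f7f7f)

/* Kogge-Stone occluded fills: 'gen' holds the sliding pieces, 'pro' the
   empty squares the fill may propagate through. Three doubling steps
   (1, 2 and 4 squares) reach up to 7 squares along one direction. */
static uint64_t north_occl(uint64_t gen, uint64_t pro) {
    gen |= pro & (gen << 8);
    pro &= pro << 8;
    gen |= pro & (gen << 16);
    pro &= pro << 16;
    gen |= pro & (gen << 32);
    return gen;
}

static uint64_t south_occl(uint64_t gen, uint64_t pro) {
    gen |= pro & (gen >> 8);
    pro &= pro >> 8;
    gen |= pro & (gen >> 16);
    pro &= pro >> 16;
    gen |= pro & (gen >> 32);
    return gen;
}

/* Horizontal fills mask the propagator so the fill cannot wrap from
   one board edge onto the opposite file of the neighbouring rank. */
static uint64_t east_occl(uint64_t gen, uint64_t pro) {
    pro &= NOT_A_FILE;
    gen |= pro & (gen << 1);
    pro &= pro << 1;
    gen |= pro & (gen << 2);
    pro &= pro << 2;
    gen |= pro & (gen << 4);
    return gen;
}

static uint64_t west_occl(uint64_t gen, uint64_t pro) {
    pro &= NOT_H_FILE;
    gen |= pro & (gen >> 1);
    pro &= pro >> 1;
    gen |= pro & (gen >> 2);
    pro &= pro >> 2;
    gen |= pro & (gen >> 4);
    return gen;
}

/* Rook attacks: shift each occluded fill one more step so that the first
   blocker (own or enemy piece) is included as an attacked square. */
uint64_t rook_attacks(uint64_t rooks, uint64_t occupied) {
    uint64_t empty = ~occupied;
    return (north_occl(rooks, empty) << 8)
         | (south_occl(rooks, empty) >> 8)
         | ((east_occl(rooks, empty) << 1) & NOT_A_FILE)
         | ((west_occl(rooks, empty) >> 1) & NOT_H_FILE);
}
```

Take a lone rook on a1 (`rooks = occupied = 1`). Here `rook_attacks` returns `0x01010101010101fe`: the rest of the a-file plus b1 to h1. Each direction needs only three shift-and-mask steps and no branches or lookup tables. That kind of work suits wide SIMD hardware.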
Zeta (099n)
* removed ABDADA parallel search
* activated RMO parallel search
* switch to MIT license
-- Srdja Matovic Sep 2021
Zeta (099m)
* patch for ABDADA parallel search
* disabled RMO parallel search
* removed max device memory limitation
* mods in time control
* cleanups
*
* Zeta 099m on Nvidia V100, 160 workers, ~ 13.5 Mnps
* Zeta 099m on Nvidia V100, 1 worker, ~ 85 Knps
-- Srdja Matovic 13 Jul 2019
Zeta (099l)
* patch for parallel search scaling
* max device memory increased from 1 GB to 16 GB
-- Srdja Matovic Jun 2019
Zeta (099h to 099k)
* fixes and cleanups
* switch from Lazy SMP to ABDADA parallel search
* added IID - Internal Iterative Deepening
* one .cl file for all gpu generations with inlined optimizations
*
* Zeta 099k on AMD Radeon R9 Fury X, 256 workers, ~ 7.6 Mnps
* Zeta 099k on Nvidia GeForce GTX 750, 16 workers, ~ 800 Knps
* Zeta 099k on AMD Radeon HD 7750, 32 workers, ~ 700 Knps
* Zeta 099k on Nvidia GeForce 8800 GT, 14 workers, ~ 110 Knps
-- Srdja Matovic 2018
Zeta (099b to 099g)
* switch from Kogge-Stone based move generation to Dumb7Fill
* added atomic features for different gpu generations
-- Srdja Matovic 2017
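Dumb7Fill, which replaced Kogge-Stone in these versions, floods a ray one square at a time instead of in doubling steps. The C sketch below computes bishop attacks this way. The helper names, the signed-shift convention and the a1 = bit 0 mapping are assumptions for illustration, not Zeta's OpenCL code.

```c
#include <stdint.h>

/* Assumed square mapping: a1 = bit 0, h1 = bit 7, a8 = bit 56. */
#define NOT_A_FILE UINT64_C(0xfefefefefefefefe)
#define NOT_H_FILE UINT64_C(0x7f7f7f7f7f7f7f7f)

/* Move every set bit one square: a positive 'shift' shifts left (towards h8),
   a negative one shifts right (towards a1); 'wrap_mask' clears squares
   that wrapped around the board edge. */
static uint64_t shift_one(uint64_t bb, int shift, uint64_t wrap_mask) {
    return (shift > 0 ? bb << shift : bb >> -shift) & wrap_mask;
}

/* Dumb7Fill along one ray: six single-step floods through empty squares,
   then one final step so the first blocker is included as attacked. */
static uint64_t dumb7_ray(uint64_t sliders, uint64_t empty,
                          int shift, uint64_t wrap_mask) {
    uint64_t flood = sliders;
    for (int i = 0; i < 6; i++) {
        sliders = shift_one(sliders, shift, wrap_mask) & empty;
        flood |= sliders;
    }
    return shift_one(flood, shift, wrap_mask);
}

/* Bishop attacks as the union of the four diagonal rays. */
uint64_t bishop_attacks(uint64_t bishops, uint64_t occupied) {
    uint64_t empty = ~occupied;
    return dumb7_ray(bishops, empty,  9, NOT_A_FILE)   /* north-east */
         | dumb7_ray(bishops, empty,  7, NOT_H_FILE)   /* north-west */
         | dumb7_ray(bishops, empty, -7, NOT_A_FILE)   /* south-east */
         | dumb7_ray(bishops, empty, -9, NOT_H_FILE);  /* south-west */
}
```

Take a lone bishop on a1. Here `bishop_attacks(1, 1)` returns the long diagonal b2 to h8, `0x8040201008040200`. Compared with the Kogge-Stone fill, each step is a single shift, mask and AND, but more steps are needed per ray. This sketch does not show which variant is faster on a given GPU.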
Zeta (099a)
* switch from BestFirstMiniMax-Search to parallel AlphaBeta (Lazy SMP)
* ported all (except IID) search techniques from Zeta Dva v0305 to OpenCL
* ported the evaluation function of Zeta Dva v0305 to OpenCL
* vectorized and generalized 64-bit Kogge-Stone move generator
* 64 gpu-threads are now coupled to one worker, performing move generation,
move picking and evaluation, square-wise, in parallel on the same node
* portability over performance, should run on the very first GPUs with
OpenCL 1.x support (>= 2008)
-- Srdja Matovic 2017
Zeta (098d to 098g)
* mostly cleanup and fixes
* restored simple heuristics from the Zeta Dva engine (~2000 Elo on CCRL)
* protocol fixes
* fixed autoconfig for AMD gpus
* switched to Kogge-Stone based move generator
* switched to rotate-left-based Zobrist hashes
* switched to a move picker
* switched to GPL >= 2
*
* Zeta 098e on Nvidia GeForce GTX 580, ca. 6 Mnps, est. 1800 Elo on CCRL
* Zeta 098e on AMD Radeon HD 7750, ca. 1 Mnps
* Zeta 098e on AMD Phenom X4, ca. 1 Mnps
* Zeta 098e on Nvidia GeForce 8800 GT, ca. 500 Knps
-- Srdja Matovic 2016
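The changelog does not spell out what "rotate-left-based Zobrist hashes" means. One plausible reading is that each piece-square key is derived by rotating a single per-piece key left by the square index. That shrinks the key table from 12 × 64 entries to 12. The C sketch below follows this assumption. The function names, the piece numbering 0..11 and the use of the SplitMix64 finaliser to produce base keys are all hypothetical, not Zeta's code.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits; n == 0 is handled separately
   because shifting a 64-bit value by 64 is undefined in C. */
static uint64_t rotl64(uint64_t x, unsigned n) {
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Hypothetical base keys: one per piece kind (0..11, e.g. 6 types x 2
   colours), derived here with the SplitMix64 finaliser instead of a table. */
static uint64_t base_key(int piece) {
    uint64_t x = (uint64_t)piece + UINT64_C(0x9e3779b97f4a7c15);
    x = (x ^ (x >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94d049bb133111eb);
    return x ^ (x >> 31);
}

/* Piece-square key: the piece's base key rotated left by the square index. */
uint64_t piece_square_key(int piece, int square) {
    return rotl64(base_key(piece), (unsigned)square);
}

/* Full hash of a board given as 64 entries, -1 marking an empty square. */
uint64_t compute_hash(const int board[64]) {
    uint64_t hash = 0;
    for (int sq = 0; sq < 64; sq++)
        if (board[sq] >= 0)
            hash ^= piece_square_key(board[sq], sq);
    return hash;
}

/* Incremental update for a quiet move: XOR the piece out of 'from'
   and into 'to'. */
uint64_t hash_quiet_move(uint64_t hash, int piece, int from, int to) {
    return hash ^ piece_square_key(piece, from) ^ piece_square_key(piece, to);
}
```

XOR is its own inverse, so `hash_quiet_move` gives the same value as rehashing the whole board after the move. A complete position hash would also need to cover side to move, castling rights and en passant, which this sketch leaves out. Rotated keys are not independent random numbers, so collision behaviour may differ from a classic 768-entry key table.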
Zeta (098a to 098c)
* improved heuristics, partly ported from the Stockfish chess engine
* AutoConfig for OpenCL devices
* parameter tuning
* Zeta 098c on Nvidia GeForce GTX 480, ca. 5 Mnps, est. 2000 Elo on CCRL
* Zeta 098c on AMD Radeon R9 290, ca. 3.2 Mnps
-- Srdja Matovic Aug 2013
Zeta (097a to 097z)
* implementation of a BestFirstMiniMax-Search algorithm with UCT parameters
for parallelization
* Zeta 097x on Nvidia GeForce GTX 480, ca. 5 Mnps, est. 1800 Elo on CCRL
* Zeta 097x on AMD Radeon HD 7750, ca. 800 Knps
-- Srdja Matovic Jan 2013
Zeta (0930 to 0960)
* tested LIFO-stack based load balancing for AlphaBeta search on one compute
unit of the GPU
* tested Monte Carlo Tree Search without UCT across multiple compute units of
the GPU
* tested 'Nagging' and 'Spam' parallelization, the multi-window approach,
for AlphaBeta search on one compute unit of the GPU
* tested 'RBFMS', Randomized BestFirstMiniMax-Search, a parallel version of
BestFirstMiniMax, across multiple compute units of the GPU
* failed to implement YBWC parallel AlphaBeta
* failed to implement Conspiracy Numbers Search
-- Srdja Matovic 2012
Zeta (0915 to 0918)
* 64-bit Magic Bitboard move generator running
* AlphaBeta search algorithm with 'SPPS'-parallelization approach plays chess,
running 128 gpu-threads on one compute unit of the GPU
-- Srdja Matovic 2011
Zeta (0900 to 0910)
* tested 32-bit 0x88 and 64-bit Magic Bitboard move generator
* ported heuristics, the evaluation function, from CPU engine 'Zeta Dva'
(~2000 Elo on CCRL) to OpenCL
-- Srdja Matovic 2010
* updated on 2022-05-28 *
Zeta and Zeta Dva support only some basic Xboard protocol commands, and some users have reported problems with the configuration and interface of the latest Zeta versions. So I will publish the source code again once these parts are more user-friendly and have been tested with Windows chess GUIs like WinBoard or Arena.