  1. SIMT architecture of GPUs
    GPUs consist of tens to hundreds of SIMD or Vector Units that process multiple threads, grouped into Warps or Wavefronts, in SIMT fashion: all threads of a Warp or Wavefront execute the same instruction in lockstep, so divergent branches are serialized (see the first sketch below this list).
  2. Memory architecture of GPUs
    To hide the latency of Global Memory (VRAM), GPUs keep many Warps or Wavefronts in flight and prefer to do computation in fast Local or Private Memory. But these are fixed per-compute-unit resources, so the more Work-Items and Work-Groups you run to hide latency, the less Local and Private Memory each of them has available (see the second sketch below this list).
  3. Thousands of threads on GPUs
    MiniMax search with Alpha-Beta pruning performs best serially, not in parallel: the cutoffs that make it efficient rely on bounds obtained from branches searched earlier, so spreading a node's branches over thousands of independent threads searches many subtrees a serial search would simply prune (see the third sketch below this list).
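
Sketch for point 1, a minimal CUDA kernel (the kernel name and data layout are my own illustration, not taken from the text above): within one Warp a data-dependent branch splits the lanes, and the two paths run one after the other with the inactive lanes masked out.

    __global__ void simt_divergence(const int *in, int *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
        if (i >= n)
            return;

        // All 32 threads of a warp share one program counter, so when the
        // predicate differs within the warp, the "odd" lanes and the "even"
        // lanes execute their branches serially, each with the others masked.
        if (in[i] & 1)
            out[i] = in[i] * 2;
        else
            out[i] = in[i] + 1;
    }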
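
Sketch for point 2, a hedged CUDA example of the occupancy trade-off (tile size, kernel name and the reduction itself are illustrative assumptions): the __shared__ array, Local Memory in OpenCL terms, comes out of a fixed per-multiprocessor budget, so the more of it each Work-Group claims, the fewer Warps stay resident to hide Global Memory latency.

    #define TILE 256   // assumed Work-Group size; raising it lowers occupancy

    // Launch with blockDim.x == TILE (a power of two).
    __global__ void tile_sum(const float *in, float *out, int n)
    {
        __shared__ float tile[TILE];                 // Local Memory, shared per block
        int gid = blockIdx.x * blockDim.x + threadIdx.x;
        int lid = threadIdx.x;

        tile[lid] = (gid < n) ? in[gid] : 0.0f;      // stage a tile from Global Memory
        __syncthreads();

        // Tree reduction inside the block, working only on shared memory.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (lid < stride)
                tile[lid] += tile[lid + stride];
            __syncthreads();
        }
        if (lid == 0)
            out[blockIdx.x] = tile[0];               // one partial sum per block
    }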
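
Sketch for point 3, plain host-side C++ (it compiles with nvcc as well) over a made-up toy tree, with leaf scores and tree shape invented for illustration: the bound alpha is produced by children searched earlier and is exactly what prunes the later ones, which is why a node's branches cannot be handed to thousands of independent threads without losing cutoffs.

    #include <algorithm>
    #include <cstdio>

    static const int LEAF[8] = { 3, 5, 6, 9, 1, 2, 0, -1 };   // toy leaf scores

    // Negamax with Alpha-Beta over a complete binary tree of the given depth.
    static int alphabeta(int node, int depth, int alpha, int beta)
    {
        if (depth == 0)
            return LEAF[node];                        // leaf: return its score
        for (int child = 0; child < 2; ++child) {
            int score = -alphabeta(node * 2 + child, depth - 1, -beta, -alpha);
            alpha = std::max(alpha, score);           // bound from an earlier child...
            if (alpha >= beta)
                break;                                // ...cuts off the later ones
        }
        return alpha;
    }

    int main()
    {
        std::printf("best score: %d\n", alphabeta(0, 3, -1000, 1000));
        return 0;
    }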


* edit on 2015-03-30 *