Evaluation Improvements

This is the second post in a series explaining the internals of the Zurichess chess engine. Machine learning is not my area of expertise, so let me know about any blatant mistakes. Comments are welcome.

There have been many experiments in building an evaluation function with a deeper network (e.g., Giraffe, DeepChess), and I also want Zurichess to improve its network.

The current model was explained here. After some experiments, I found that the following network gives a lower loss:

$$y = \mathrm{CReLU}(x \cdot W_i) \cdot W_o$$

where $x$ is the input feature vector of shape $[1, n]$, $W_i$ is the input weight matrix of shape $[n, k]$, CReLU is a Concatenated Rectified Linear Unit, which is similar to a ReLU but keeps both the positive and the negative part of its input ($\mathrm{CReLU}(z) = [\mathrm{ReLU}(z), \mathrm{ReLU}(-z)]$, which doubles the width from $k$ to $2 \cdot k$), and $W_o$ is the output layer weight matrix of shape $[2 \cdot k, 1]$. The final result has shape $[1, 1]$.
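As a rough illustration, the forward pass $\mathrm{CReLU}(x \cdot W_i) \cdot W_o$ can be sketched in plain Python. This is not Zurichess code: the sizes $n = 3$, $k = 2$ and all weights are made up, and, like the formula, the sketch has no bias terms.

```python
def matvec(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def crelu(z):
    """Concatenated ReLU: [ReLU(z), ReLU(-z)], doubling the width."""
    return [max(v, 0.0) for v in z] + [max(-v, 0.0) for v in z]

def evaluate(x, w_i, w_o):
    """x: n features, w_i: n x k matrix, w_o: 2k x 1 matrix -> scalar."""
    hidden = crelu(matvec(x, w_i))  # shape [1, 2k]
    return matvec(hidden, w_o)[0]   # shape [1, 1], returned as a scalar

# Toy example with made-up weights (n = 3 features, k = 2 hidden units).
x = [1.0, 0.0, 2.0]
w_i = [[1.5, -1.0],
       [2.0, 0.0],
       [-0.5, 0.25]]
w_o = [[1.0], [0.5], [-1.0], [0.25]]
print(evaluate(x, w_i, w_o))  # 0.625
```

Here $x \cdot W_i = [0.5, -0.5]$, so the first unit contributes through its positive half and the second through its negative half; a plain ReLU would have discarded the second unit entirely.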

One change to the input features is that they now include the total number of pieces on the board, so the network can infer the game phase by itself.
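A minimal sketch of the extra input, assuming the position is given as a FEN string and that the count includes pawns and kings. The post does not say exactly how Zurichess counts pieces or where this feature sits in the input vector, so `count_pieces` and `add_piece_count` are hypothetical helpers.

```python
def count_pieces(fen):
    """Count all pieces of both colours (pawns and kings included)
    in the piece-placement field of a FEN string."""
    placement = fen.split()[0]  # first FEN field, e.g. "rnbqkbnr/pppppppp/..."
    return sum(1 for c in placement if c.isalpha())

def add_piece_count(features, fen):
    """Return a copy of `features` with the total piece count appended."""
    return features + [float(count_pieces(fen))]

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
endgame = "8/8/4k3/8/8/4K3/4P3/8 w - - 0 1"
print(count_pieces(start), count_pieces(endgame))  # 32 3
```

The count falls as the game moves from the opening towards the endgame, which is the signal the network can use to infer the phase.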

I chose $k = 4$, which gives twice as many weights as the previous network. How good is the new network?
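The weight count follows from the shapes: $W_i$ has $n \cdot k$ entries and $W_o$ has $2 \cdot k$. The comparison below additionally assumes that the previous model was a tapered linear evaluation with separate midgame and endgame weights ($2 \cdot n$ weights); that assumption is not stated in this post, and the feature count $n = 1000$ is hypothetical.

```python
def crelu_net_weights(n, k):
    """Weights of CReLU(x . W_i) . W_o: W_i is n x k, W_o is 2k x 1, no biases."""
    return n * k + 2 * k

def tapered_linear_weights(n):
    """ASSUMPTION: previous model = tapered linear eval with separate
    midgame and endgame weight vectors, i.e. 2n weights."""
    return 2 * n

n = 1000  # hypothetical feature count, not the real Zurichess value
new = crelu_net_weights(n, 4)
print(new, new / tapered_linear_weights(n))  # 4008 2.004
```

For large $n$ the ratio tends to $k / 2 = 2$, which would match the "twice as many weights" figure if the assumption about the old model holds.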

| Loss       | Old network | New network |
|------------|-------------|-------------|
| Train      | 0.05731946  | 0.05587117  |
| Validation | 0.05754193  | 0.05608533  |
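In relative terms, both losses drop by roughly 2.5%, and the gap between validation and training loss does not grow, so by this measure at least the extra weights do not seem to overfit. A small sketch of that arithmetic, using only the numbers from the table:

```python
LOSSES = {  # (old network, new network), copied from the table
    "train": (0.05731946, 0.05587117),
    "validation": (0.05754193, 0.05608533),
}

def relative_improvement(old, new):
    """Loss reduction as a percentage of the old loss."""
    return 100 * (old - new) / old

for name, (old, new) in LOSSES.items():
    print(f"{name}: {relative_improvement(old, new):.2f}% lower loss")
# train: 2.53% lower loss
# validation: 2.53% lower loss

# Gap between validation and train loss for each network.
old_gap = LOSSES["validation"][0] - LOSSES["train"][0]
new_gap = LOSSES["validation"][1] - LOSSES["train"][1]
print(f"gap old {old_gap:.8f}, new {new_gap:.8f}")
```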

Similarly, on a set of 200K quiet positions, the new network's evaluation improved over the old network on 41,726 positions and regressed on 33,756 positions.
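The post does not say how "improved" and "regressed" were decided. One plausible criterion, assumed here, is whether the new evaluation is closer to the training target than the old one. `compare_evals`, the use of game results as targets, and the toy numbers are all hypothetical.

```python
def compare_evals(targets, old_preds, new_preds):
    """Count positions where the new prediction is closer to (improved) or
    farther from (regressed) the target than the old one; ties count as neither."""
    improved = regressed = 0
    for target, old, new in zip(targets, old_preds, new_preds):
        old_err, new_err = abs(old - target), abs(new - target)
        if new_err < old_err:
            improved += 1
        elif new_err > old_err:
            regressed += 1
    return improved, regressed

# Toy data; targets assumed to be game results (1 = win, 0.5 = draw, 0 = loss).
targets = [0.5, 1.0, 0.0, 0.5]
old_preds = [0.4, 0.7, 0.2, 0.5]
new_preds = [0.45, 0.9, 0.3, 0.5]
print(compare_evals(targets, old_preds, new_preds))  # (2, 1)
```

Under such a criterion, positions where both errors are equal fall in neither bucket, which is why the two counts need not add up to the size of the set.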

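Unfortunately, in actual games the new evaluation function performs worse (the match line lists wins, losses and draws for the new version, then the score and the number of games):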

Score of ./zurichess vs ./master: 180 - 238 - 291 [0.459] 709
Elo difference: -28.49 +/- 19.65
SPRT: llr -1.89, lbound -1.87, ubound 3.34 - H0 was accepted
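The match output looks like the usual cutechess-cli format, and its first two numbers can be reproduced by hand: the score counts a win as 1 and a draw as 1/2, and the Elo difference follows from the standard logistic formula $-400 \log_{10}(1/s - 1)$. The SPRT function only restates the stopping rule (accept H0 once the log-likelihood ratio reaches the lower bound, H1 once it reaches the upper bound); it does not recompute the LLR or the error bar.

```python
import math

def score(wins, losses, draws):
    """Fraction of points scored: a win is 1, a draw is 1/2."""
    return (wins + draws / 2) / (wins + losses + draws)

def elo_difference(s):
    """Logistic Elo difference for an expected score s, 0 < s < 1."""
    return -400 * math.log10(1 / s - 1)

def sprt_decision(llr, lbound, ubound):
    """Stopping rule of a sequential probability ratio test."""
    if llr <= lbound:
        return "H0 accepted"
    if llr >= ubound:
        return "H1 accepted"
    return "continue"

s = score(180, 238, 291)
print(f"{s:.3f} {elo_difference(s):.2f}")  # 0.459 -28.49
print(sprt_decision(-1.89, -1.87, 3.34))   # H0 accepted
```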


There are several possible reasons why the new network plays worse:

1. The training set is not good enough; a lower loss often does not translate into an Elo gain.
2. The new network is a bit slower to compute, so the search gets shallower. Since search can correct many evaluation mistakes, the lost depth may cost more than the more accurate evaluation gains.
3. The search is heavily tuned for the old evaluation function (e.g., futility pruning weights are harder to compute now).
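Point 3 can be illustrated with a toy comparison, under the assumption that the old evaluation was linear in its features (this post does not spell that out). Futility pruning needs a bound on how much a move, such as capturing a piece, can change the evaluation. With a linear evaluation that change is a fixed weight; with a CReLU network it depends on the rest of the position and can even change sign. The two features and all weights below are invented for the example.

```python
# Hypothetical two-feature example: feature 0 counts some piece type,
# so removing that piece means decrementing x[0].

def linear_eval(x, w):
    """ASSUMED old-style linear evaluation: dot product x . w."""
    return sum(xi * wi for xi, wi in zip(x, w))

def crelu_eval(x, w_i, w_o):
    """CReLU network: w_i is an n x k matrix, w_o a flat list of 2k weights."""
    k = len(w_i[0])
    z = [sum(x[i] * w_i[i][j] for i in range(len(x))) for j in range(k)]
    h = [max(v, 0.0) for v in z] + [max(-v, 0.0) for v in z]
    return sum(hv * wv for hv, wv in zip(h, w_o))

def delta_removing(eval_fn, x, j):
    """Change in evaluation when feature j is decreased by one."""
    removed = list(x)
    removed[j] -= 1
    return eval_fn(removed) - eval_fn(x)

w = [3.0, 1.0]          # linear weights
w_i = [[1.0], [-1.0]]   # n = 2, k = 1
w_o = [2.0, 0.5]        # 2k = 2 output weights

for x in ([1, 0], [1, 3]):
    lin = delta_removing(lambda v: linear_eval(v, w), x, 0)
    net = delta_removing(lambda v: crelu_eval(v, w_i, w_o), x, 0)
    print(x, lin, net)
# [1, 0] -3.0 -2.0
# [1, 3] -3.0 0.5
```

Removing the same piece always costs 3.0 under the linear model, so a single margin works everywhere, while the network's change ranges from -2.0 to +0.5 depending on the other feature.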