Slonik learns chess from self-play

Need for speed

Recently I wrote about my chess engine Slonik. It plays chess autonomously, without human guidance, and it is probably the best-playing Python chess engine in existence. However, it is still not very strong by today’s standards. A rough estimate places it somewhere around 1500 ELO, which is “chess club player” strength. Its greatest limitation is its speed. Most chess engines today are written in C++, a language optimized for speed. From the starting chess position, just generating all the legal positions five half-moves deep (4.8M positions) takes Slonik 58 seconds, or about 82K positions per second. I am currently working on a C++ port of Slonik. Under the same conditions, C++ Slonik runs the same computation in a tenth of a second, at 4.6M positions per second. That’s over 500 times faster! The primary motivation for this improvement was to make bigger strides in Slonik’s self-learning experiments.

Artificial Intelligence

One of the other major goals mentioned in my original post introducing Slonik was to have her learn from her own games. The original version of Slonik, like nearly every chess program in existence today, used a hand-coded position evaluation function: given a chess position, it outputs a number indicating which side it thinks is better and by how much. Chess engines today differentiate themselves by implementation, performance, and design, but more importantly by how well their authors have taught them chess through the evaluation function. Improvements to this chess knowledge are the primary way an engine gets stronger from version to version. These improvements require an incredible amount of human effort and, more importantly, are limited by the imagination of the engine author.

In September 2015, a new chess engine called Giraffe came out, which learned to play from its own games. It is written in C++, like most chess engines, but unlike the others it does not have a hand-written position evaluation function. Instead, it uses a neural network. Starting from a partially initialized state (it was first fed labelled data teaching it only the approximate values of the pieces), it played games against itself for a week and reached about 2400 ELO strength. An engine that learns autonomously should be able to surpass the others. Giraffe was held back by performance (neural nets take longer to evaluate than the optimized linear functions of other top engines).

One of my goals since my last post was to implement these ideas in Slonik. Inspired by Giraffe and eager to learn, I read Sutton and Barto’s Reinforcement Learning: An Introduction, even had some correspondence with Dr. Sutton (nice guy!), and then got to work putting these ideas into practice in Slonik. To try to improve upon Giraffe’s ideas, I’ve experimented with various potential improvements, like Double Learning (two neural networks learning side by side, helping each other improve and reducing bias) and prioritized sweeps (revisiting and giving priority to positions that were difficult, while adjusting for the resulting artificial change in the distribution of positions seen by the engine). Many of these ideas did not pan out, but some did, and some may still work when revisited (combined with other changes made since). I have more ideas that I have yet to try. Besides the architectural differences I have tried, one change that made a big improvement to Slonik’s self-learning was using Huber loss rather than the L1 loss used in Giraffe. Both loss functions are attempts to deal with the very frequent outliers in the self-updates that arise during the learning process. The other major improvement was in the reinforcement learning algorithm itself. There is an interesting little story associated with it…

Figure: Slonik’s STS score vs. improvement iterations.
Slonik improving in chess playing strength over time through self-play. STS (Strategic Test Suite) score is on the y axis. Starting from random initialization at approximately 1500 STS score (2-ply look-ahead search per position). Slonik is still playing and learning; this is a snapshot as of the writing of this post.
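To make the Huber vs. L1 difference concrete, here is a minimal sketch of both losses and their gradients with respect to the error (target minus prediction). The threshold delta = 1.0 is an arbitrary assumption; the post does not say what Slonik uses. Near zero the Huber gradient shrinks with the error, like squared loss, while for large errors it is capped at ±delta, like L1, which is what makes it tolerant of outliers.

```javascript
// L1 loss: absolute error. Its gradient always has magnitude 1,
// so a tiny error pulls on the weights as hard as a large one.
function l1Loss(error) {
  return Math.abs(error);
}

function l1Grad(error) {
  return Math.sign(error);
}

// Huber loss with threshold delta (delta = 1.0 is an assumed default):
// quadratic for |error| <= delta, linear beyond it.
function huberLoss(error, delta = 1.0) {
  const a = Math.abs(error);
  return a <= delta ? 0.5 * error * error : delta * (a - 0.5 * delta);
}

function huberGrad(error, delta = 1.0) {
  // Equal to the error near zero, clipped to +/-delta for outliers.
  return Math.max(-delta, Math.min(delta, error));
}

// A small error, a moderate error, and an outlier.
for (const e of [0.2, 1.0, 25.0]) {
  console.log(e, l1Loss(e), huberLoss(e), l1Grad(e), huberGrad(e));
}
```

The two pieces of the Huber loss meet with matching value and slope at |error| = delta, so the gradient never jumps the way the L1 gradient does at zero.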

Forbidden algorithms

Giraffe used an algorithm called TD-Leaf for self-learning, which assumes that the evaluation of each successive position (more accurately, the evaluation of the leaf node of the principal variation searched from the root position of the next move) is more accurate than the corresponding evaluation one move earlier. The “TD”, or temporal difference, part of TD-Leaf assumes that, in the backward view, the moves immediately before a change in evaluation deserve more of the blame (or credit) for it than moves further back. At some point after the recent AlphaGo success, I was looking into the work of David Silver, lead architect of AlphaGo, and discovered that in 2008 he had been involved in the development of a revolutionary (but probably overlooked) self-learning chess program called Meep, by Joel Veness. Meep was the first program to reach a high level of play in chess (2300s ELO) entirely through self-play. Unlike Giraffe, Meep did not use a neural network but a linear model, since linear models are faster to evaluate and are known to be sufficient for top-level play in chess. Furthermore, Meep did not use TD-Leaf, but a more efficient learning algorithm they called TreeStrap. Instead of updating toward the evaluation of the next move, TreeStrap uses each position’s own search tree: the evaluations of the positions in that tree are updated toward the values the search computed for them.
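Here is a toy sketch of a TreeStrap(minimax)-style update, based on my reading of Veness, Silver, Uther and Blair’s “Bootstrapping from Game Tree Search”; it is not Meep’s or Slonik’s code. It assumes a linear evaluation h(s) = w · φ(s) and a hand-built two-ply tree, and the features, weights, and learning rate are all made up. Every interior node of the search tree is nudged toward its own minimax value, whereas TD-Leaf would produce a single update toward the evaluation found at the next move. The paper also describes an alpha-beta variant that works with bounds, since alpha-beta does not compute exact values for most nodes; that refinement is left out here.

```javascript
// Linear evaluation h(s) = w . phi(s), where phi(s) is a feature vector.
function evaluate(weights, features) {
  let sum = 0;
  for (let i = 0; i < weights.length; i++) sum += weights[i] * features[i];
  return sum;
}

// Minimax search value of `node`, evaluating leaves with the current weights.
// Every interior node is recorded in `interior` as [node, searchValue].
function minimax(node, weights, maximizing, interior) {
  if (node.children.length === 0) return evaluate(weights, node.features);
  const values = node.children.map(child =>
    minimax(child, weights, !maximizing, interior));
  const value = maximizing ? Math.max(...values) : Math.min(...values);
  interior.push([node, value]);
  return value;
}

// One TreeStrap(minimax)-style step: nudge h(s) toward the search value V(s)
// for every interior node s of the tree, not only for the root.
function treeStrapUpdate(root, weights, learningRate) {
  const interior = [];
  minimax(root, weights, true, interior);
  const gradient = new Array(weights.length).fill(0);
  for (const [node, target] of interior) {
    const error = target - evaluate(weights, node.features);
    node.features.forEach((f, i) => { gradient[i] += error * f; });
  }
  return weights.map((w, i) => w + learningRate * gradient[i]);
}

// Toy two-ply tree: the root is the maximizing side's move, its children
// are the minimizing side's replies. Feature vectors are made up.
const leaf = features => ({ features, children: [] });
const root = {
  features: [1, 0],
  children: [
    { features: [0, 1], children: [leaf([1, 1]), leaf([2, 0])] },
    { features: [1, 1], children: [leaf([0, 2]), leaf([3, 1])] },
  ],
};
const weights0 = [0.5, 0.5];
const updated = treeStrapUpdate(root, weights0, 0.1);
console.log(updated); // approximately [0.55, 0.55]
```

Note that the search values are computed once with the old weights and held fixed as targets during the update, which is what makes this a bootstrapping step.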

I had some success with self-learning in Slonik using TD-Leaf, but when I read the TreeStrap paper, I was in awe. If this algorithm is as good as it sounds (and it sounds very good), then Giraffe should have been using it instead. I went ahead and replaced TD-Leaf with TreeStrap in Slonik, and the learning speed skyrocketed. Why didn’t Giraffe use TreeStrap? Did Matthew Lai, the author of Giraffe, not know about it? Since this is my first major C++ project, I sometimes peruse the source code of Stockfish and Giraffe for implementation tips and ideas. About 3 weeks ago, I noticed an eyebrow-raising comment in the Giraffe source code. There was an unused field on a class, with a comment that said “during TreeStrap we have to record write to the ttable”. Wait… so Matthew Lai did know about TreeStrap?

Then I realized what probably transpired. In a forum post on TalkChess, Matthew Lai announced that he had been hired by Google DeepMind thanks to his awesome success with Giraffe (his impressive background helped too, no doubt; have a look at his resume), and that because of his work there, he can’t continue to improve his open-source engine Giraffe. Presumably he learns many techniques at his day job at Google, and the divide between open knowledge and trade secrets is muddy. Matthew was hired there in December 2015. A little investigating in the Giraffe source repository shows that the TreeStrap comment was added in that same month. I can picture it now: Matthew being interviewed by David Silver, with David telling him about his work on Meep a whole 7 years earlier! It appears to me that Matthew started implementing TreeStrap, then learned he couldn’t use those ideas, and a forgotten remnant remained in the source. Well, it’s all speculation, but if that’s what happened, maybe Matthew can use TreeStrap in Giraffe now that someone else is using it openly. The TreeStrap paper is, after all, linked from David Silver’s personal page. However, that probably won’t happen, as Matthew Lai wrote that he has other ideas and pet projects he’d rather spend his time on, and besides, there may be other ideas in the private Google hive mind that perform even better. I suspect this idea of learning from its own search was used in AlphaGo’s UCT routine, but I’m patiently waiting for Google to release the details of the techniques used in the recent success against world champion Ke Jie. In the meantime, I continue to work on Slonik (currently, primarily the C++ port). I have many ideas I want to try, including what I hope will be an improvement to the TreeStrap algorithm!

My own chess engine

I’ve written a chess engine named Slonik. It implements the Universal Chess Interface (UCI), so you can download any popular chess interface, like Scid vs. PC or ChessBase, to analyze with or play against Slonik.

I’ve written this engine from scratch, and chose to write it in Python so that I can iterate quickly. That makes the engine slower, but maybe one day I will port it to C++. However, I am happy with its playing strength, all things considered. The details of the engine are on the github page, but to summarize:

  • Alpha-beta minimax, quiescence search
  • Bitboard piece/board representation
  • Various search heuristics, such as the history heuristic, extensions, reductions, etc.
  • Hand-coded evaluation function
  • Transposition hash table
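For readers new to alpha-beta, here is a textbook sketch over a toy game tree, not Slonik’s actual search: there is no move generation, quiescence search, or transposition table, and leaves are plain numbers scored from the maximizing side’s point of view. The savings come from the cutoffs: once a line is proven worse than an alternative already found, its remaining siblings are skipped.

```javascript
// Alpha-beta minimax over a toy tree. A node is either a number (a leaf's
// static evaluation, from the maximizing side's point of view) or an array
// of child nodes. The two players alternate levels.
function alphaBeta(node, alpha, beta, maximizing) {
  if (typeof node === "number") return node;
  if (maximizing) {
    let best = -Infinity;
    for (const child of node) {
      best = Math.max(best, alphaBeta(child, alpha, beta, false));
      alpha = Math.max(alpha, best);
      // Beta cutoff: the minimizer already has a better option elsewhere.
      if (alpha >= beta) break;
    }
    return best;
  }
  let best = Infinity;
  for (const child of node) {
    best = Math.min(best, alphaBeta(child, alpha, beta, true));
    beta = Math.min(beta, best);
    // Alpha cutoff: the maximizer already has a better option elsewhere.
    if (alpha >= beta) break;
  }
  return best;
}

// The third reply list is cut off after its first leaf (1), because the
// maximizer can already guarantee 6 via the second move.
console.log(alphaBeta([[3, 5], [6, 9], [1, 2]], -Infinity, Infinity, true)); // 6
```

Called with the full window (-Infinity, Infinity), the root value is identical to plain minimax; only the amount of work changes.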

I plan to return to working on this engine’s AI — specifically to use deep learning and reinforcement learning techniques rather than the current hand-coded evaluation function.

Deriving the Y-Combinator

The Y Combinator is one of the most confusing pieces of code I’ve ever seen.

const Y = (f) =>
  ( x => f(v => x(x)(v)) )(
    x => f(v => x(x)(v))
  );

Used as such:

const factorial = Y(function (fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  };
});
factorial(5);
  //=> 120

Of course, that’s cheating. Intended use is like this:

((f) =>
  ( x => f(v => x(x)(v)) )(
    x => f(v => x(x)(v))
  )
)(function (fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  };
})(5);
  //=> 120

because it’s harder to understand that way. But really, it’s because the goal of the Y Combinator is to do recursion without naming any functions. It has no practical use, but it has theoretical importance, and it appears to be a good way to sink your time if you’re looking for a good puzzle. I asked myself: could I derive it if I hadn’t seen this solution before?

We’re trying to achieve the equivalent of this, without naming the function, of course:

function fac (n) {
  return (n == 0 ? 1 : n * fac(n - 1));
}

A first pass:

(function (fac, n) {
  return (n == 0 ? 1 : n * fac(fac, n - 1));
}(function (fac, n) {
  return (n == 0 ? 1 : n * fac(fac, n - 1));
}, 5));
  //=> 120

That accomplishes it, but that’s silly. The second pass is much better:

(function (fac) {
  return function (n) {
    return fac(fac)(n);
  };
}(function(fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(fac)(n - 1));
  };
}))(5);
  //=> 120

That works, so I guess that qualifies as a solution, but it has a big flaw that the canonical Y Combinator doesn’t: we have to call fac(fac)(n - 1) in the exposed function instead of just fac(n - 1). Bad… very bad… But wait! That function taking n is the same as the one in the Y Combinator. Or is it? I got stuck here for a while, and once I realized I’d started talking to myself, I decided to try a different approach.

Time to cheat a little bit, with a named function. Too much lambda here, hard to understand.

(function (wrapper) {
  function injector() {
    return wrapper(n => injector()(n));
  }
  return injector();
}(function(fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  };
}))(5);
  //=> 120

This idea took a bit of a leap, but it turns out this is exactly what the Y Combinator is doing, just in a saner, more readable way. I named the function injector because it injects a function into the fac parameter of the wrapper function. An alternative name might have been recur or runner, because the function it injects does those things. This makes things clearer. But I can’t use named functions, so let’s translate this back to crazy.

Now I can use the idea from the 2nd pass of passing the function to itself:

(function (wrapper) {
  return (function (injector) {
    return wrapper(n => injector(injector)(n));
  }(function (injector) {
    return wrapper(n => injector(injector)(n));
  }));
}(function(fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  };
}))(5);
  //=> 120

Yikes. injector(injector) looks crazy, but that’s just what we have to do when we use the pass-the-function-to-itself trick. After all, it is a function passing itself to itself, so if you want to use it, you have to have it call itself. Makes perfect sense, how did I not see it before (/sarcasm).

Anyway, time to reap the benefits. This is exactly the Y Combinator! To finish it off, and for absolute correctness, we have to rewrite it to make it as obscure as possible, with less descriptive names.

var result = (f =>
 (x => f(n => x(x)(n)))
 (x => f(n => x(x)(n))))
(fac => n => (n == 0 ? 1 : n * fac(n - 1)))(5);
  //=> 120

There, perfect.

Javascript: generators for async

I stumbled across this article by James Long (@jlongster) on using generators for async as an alternative to async/await. His Q.async example, using Kris Kowal’s (@kriskowal) Q library, caught my eye and I decided to try implementing the async function without peeking at Q’s code. It took me a little while to come up with this solution, but the result is pleasingly simple!

async accepts a generator function and returns a function that returns a promise. Each yield expression in the generator evaluates to the fulfilled value of whatever was yielded. This allows writing asynchronous code in a synchronous style, without using async/await. Here’s a simple example of how it’s used:

Notice that you can yield promises or plain values, it doesn’t matter. Also, this supports using try/catch with asynchronous calls. Pretty cool!
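One way to build such a runner is sketched below; it is an illustration, not the author’s implementation or Q’s. The name runAsync is hypothetical (chosen to avoid confusion with the async keyword). Each yielded value is passed through Promise.resolve, so plain values and promises are handled alike, and a rejection is thrown back into the generator with gen.throw, which is what makes try/catch work. Like Q.async, this version uses an explicit try/catch around the generator calls, because errors thrown inside a .then callback would otherwise reject the wrong promise.

```javascript
// runAsync is a hypothetical name. It turns a generator function into a
// function that returns a Promise for the generator's return value.
function runAsync(genFn) {
  return function (...args) {
    const gen = genFn.apply(this, args);
    return new Promise((resolve, reject) => {
      // Resume the generator with a value ("next") or an error ("throw").
      function step(method, arg) {
        let result;
        try {
          result = gen[method](arg);
        } catch (err) {
          // The generator threw (or rethrew) without catching: reject.
          reject(err);
          return;
        }
        if (result.done) {
          resolve(result.value);
          return;
        }
        // Promise.resolve lets the generator yield promises or plain values.
        Promise.resolve(result.value).then(
          value => step("next", value),
          err => step("throw", err)
        );
      }
      step("next", undefined);
    });
  };
}

// Usage: yield a promise, a plain value, and a rejected promise.
const delay = (ms, value) =>
  new Promise(resolve => setTimeout(() => resolve(value), ms));

const add = runAsync(function* (a) {
  const b = yield delay(10, 2); // resumes with 2 after 10 ms
  const c = yield 3;            // plain values work too
  try {
    yield Promise.reject(new Error("boom"));
  } catch (e) {
    // The rejection arrives here as an ordinary exception.
    return a + b + c;
  }
});

add(1).then(sum => console.log(sum)); // logs 6
```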

For comparison, here is the Q.async implementation. After writing this up I also found the solution here (essentially the same as Q.async). It differs from mine in that I’m not using an explicit try/catch to turn a synchronous error into an asynchronous one; that happens implicitly because the code is wrapped in a Promise, so I think the resulting behavior is the same.

Firefox add-on released

I recently released my first Firefox add-on. I’m always copy-pasting words from the browser into my terminal to look them up with the dict client. It’s simply the best dictionary tool I’ve ever used, because it looks up words in many dictionaries at once. You can find out more about DICT here and here.

The add-on looks up words by calling the dict client on your machine in a sub-process. Double-click a word on any web page, and the extension will spawn a process to call dict and then display the output in a popup. As an added feature, the words you look up are automatically added to a list for later review.
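For readers curious how shelling out to dict works, here is a minimal Node.js sketch. It is not the add-on’s code (the add-on runs inside Firefox, not Node), and the names lookUp, lookUpAndRemember, and reviewList are hypothetical. execFile passes the word as a single argument without going through a shell, so a double-clicked word cannot inject shell syntax.

```javascript
const { execFile } = require("child_process");

// Hypothetical helper: run `dict <word>` in a sub-process and resolve with
// its standard output. `command` is a parameter only so it can be swapped.
function lookUp(word, command = "dict") {
  return new Promise((resolve, reject) => {
    execFile(command, [word], (err, stdout) => {
      // dict exits with a non-zero status when nothing matches, which
      // execFile reports as an error.
      if (err) reject(err);
      else resolve(stdout);
    });
  });
}

// Hypothetical review list: remember each looked-up word once, in order.
const reviewList = [];

function lookUpAndRemember(word, command) {
  if (!reviewList.includes(word)) reviewList.push(word);
  return lookUp(word, command);
}

// Usage (requires the dict client to be installed):
// lookUpAndRemember("serendipity").then(console.log, console.error);
```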

Screenshot of dict-extension in use

This add-on does not actually implement the DICT protocol, nor does it call any DICT servers on its own. It delegates that entirely to the dict client on your machine. As an alternative to my add-on, this extension is quite good and does implement the DICT protocol, if that is what you are looking for.

However, I suspect the above-mentioned add-on, which does implement a DICT client, may stop working relatively soon, because like many useful add-ons, it uses require('chrome'), and Mozilla is doing away with the add-on SDK and many of its low-level APIs. A lot of developers are understandably upset about that. I was going to implement the DICT protocol in my add-on as well, but given Mozilla’s plans, I decided to stay away, as there’s currently no way to do it without using chrome and the low-level APIs.

I think my add-on will only work on Linux, and maybe on Mac (though I have only tested it on my own machine), since you must have the dict client installed for it to work.

You can install the add-on from the Firefox add-on listing page. The code is also hosted on github.

WordPress backup script

In my previous post I showed my WordPress update script. However, it’s not safe to update without first backing everything up in case something goes wrong. This is a script that I adapted from this post. It backs up both files and the database.


echo "In $0"

# Use the timestamp passed in by the caller, or the current time otherwise
if [ $# -gt 0 ]; then
    NOW=$1
else
    NOW=$(date +"%Y-%m-%d-%H%M")
fi



# WWW_TRANSFORM='s,^home/public/blog,www,'
# DB_TRANSFORM='s,^home/private/backups,database,'

# tar -cvf $BACKUP_DIR/$FILE --transform $WWW_TRANSFORM $WWW_DIR

mysqldump --host=$DB_HOST -u$DB_USER -p$DB_PASS $DB_NAME > $BACKUP_DIR/$DB_FILE

# tar --append --file=$BACKUP_DIR/$FILE --transform $DB_TRANSFORM $BACKUP_DIR/$DB_FILE

You may have noticed that there is a commented-out version of the tar transform variable and command. My host has a version of tar (bsdtar 2.8.5) that doesn’t have the --transform option, but it does have an alternative -s option that does more or less the same thing. The idea is that the backup will have a directory structure like www/file.php rather than /home/public/blog/file.php.
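To illustrate what the transform does to member names, here is a small JavaScript sketch; transformMemberName is a hypothetical helper, not part of tar. The pattern in the script begins with home/ rather than /home/, which suggests it is matched against member names after tar has dropped the leading slash (GNU tar strips it by default), so the sketch mimics that.

```javascript
// Hypothetical helper mimicking the transform 's,^home/public/blog,www,'
// applied to a file's archive member name.
function transformMemberName(path) {
  // tar stores absolute paths without the leading "/", so the pattern
  // is anchored at "home/", not "/home/".
  const member = path.replace(/^\/+/, "");
  // Same substitution as the sed-style expression: replace the prefix only.
  return member.replace(/^home\/public\/blog/, "www");
}

console.log(transformMemberName("/home/public/blog/wp-config.php"));
// prints www/wp-config.php
console.log(transformMemberName("/home/private/backups/db.sql"));
// prints home/private/backups/db.sql (the www pattern does not match)
```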

mysqldump has many options you can pass it, which you may want to look into. However, the --opt option is on by default and does what I want; it is probably good enough for most sites. The catch with --opt is that it locks the tables during the export, which affects the permissions your backup user needs. What backup user? Well, since you are storing the DB user and password in plain text in your script, you should not use your administrator user. It’s best to create a backup user with the minimal permissions necessary to do the backup. Ideally that would be just SELECT privileges, but with --opt, LOCK TABLES privileges are required too. Here’s how you’d set that user up:

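MySQL> CREATE USER backup IDENTIFIED BY 'randompassword';
MySQL> GRANT SELECT, LOCK TABLES ON your_database.* TO backup;  -- your_database is a placeholder for your WordPress database name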

I call the above script from a cron job on my local computer:


# Exit if any command fails
set -e
# Don't allow use of uninitialized variables
set -u

# Set up some variables
NOW=$(date +"%Y-%m-%d-%H%M")

# Create the log directory first, so tee has somewhere to write.
mkdir -p $LOG_DIR

# Redirect standard output and error output to a log file.
exec > >(tee -a "${LOG_DIR}/${LOG_FILE}")
exec 2> >(tee -a "${LOG_DIR}/${LOG_FILE}" >&2)

# The cool part: Run my local backup script on the remote web server.
ssh maksle 'bash -s' < ~/bin/ $NOW

# Sync the remote server's backups directory with the backups directory on my local machine. After all, what good are backups if your web server is down and you can't access them?
rsync -havz --stats maksle:/home/private/backups/ $BACKUP_DIR

Of course, the remote server can fill up with backups, so I have another script that removes all but the 5 most recent backups (about 5 days’ worth, with daily backups). I keep as many as I want, as far back as I want, on my local machine.


set -e
set -u

# Error out if a command in a pipe fails
set -o pipefail

# Usage example:
# /home/private/backups 5


# Change to the backups directory (/home/private/backups in the Usage example)
cd "$1"

# This would be 5 if called as in the Usage example
declare -i allow=$2
# This gets the number of files in the directory, which we assume are all backup tgz files
declare -i num=$(ls | wc -l)

if [ $num -gt $allow ]; then
    # Remove all but the latest $allow files
    (ls -t | head -n $allow; ls) | sort | uniq -u | sed -e 's,.*,"&",g' | xargs rm -f
fi

The above command works by first printing the latest 5 files and then all the files, so the latest 5 files get printed twice. This allows uniq -u to filter out the latest 5, and the rest of the files get sent to their slaughter. The intermediate sed -e 's,.*,"&",g' wraps each filename in quotes so that xargs handles filenames containing spaces (better still, avoid spaces in filenames).
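The same keep-the-newest-N logic, spelled out in JavaScript to show why uniq -u works. backupsToDelete and the { name, mtime } file objects are hypothetical; the shell version gets both from ls.

```javascript
// Hypothetical helper: given backup files as { name, mtime } objects,
// return the names to delete so that only the `allow` newest remain.
function backupsToDelete(files, allow) {
  // Newest first, like `ls -t`.
  const newestFirst = [...files].sort((a, b) => b.mtime - a.mtime);
  // The names to keep, like `head -n $allow`.
  const keep = newestFirst.slice(0, allow).map(f => f.name);

  // Like `(keep; all) | sort | uniq -u`: count how often each name is
  // printed. Kept names appear twice; every other name appears once.
  const counts = new Map();
  for (const name of [...keep, ...files.map(f => f.name)]) {
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count === 1)
    .map(([name]) => name)
    .sort();
}

// Usage: seven daily backups, keep the newest five.
const example = [1, 2, 3, 4, 5, 6, 7].map(day => ({
  name: `backup-${day}.tgz`,
  mtime: day,
}));
console.log(backupsToDelete(example, 5));
// prints [ 'backup-1.tgz', 'backup-2.tgz' ]
```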

Of course, I call this script via a local cron job as well.



exec > >(tee -a "${LOG_DIR}/${LOG_FILE}")
exec 2> >(tee -a "${LOG_DIR}/${LOG_FILE}" >&2)

ssh maksle 'bash -s' < ~/bin/ "/home/private/backups" 5

I hope that will help someone out!

WordPress update script

WordPress offers one-click updates, but the file permissions required for that convenience are a security risk. For it to work, all files essentially have to belong to the web server’s group (usually web, apache, or nobody), with group write permission on all of them. Doing so trades security for convenience. Eventually there will be a vulnerability in the WordPress code, and with writable PHP files everywhere, attackers will make short work of it.

WordPress provides manual updating instructions, and even gives a few code snippets here and there, but nothing in them really requires human intervention. This little script updates WordPress to the latest version. It should live in a directory on the web server that is not accessible from the web, which is /home/private/update-wp in my case.


set -u
set -e

# Cleanup from a previous call
rm -f latest.tar.gz
rm -rf wordpress
rm -rf backuptemp

# Get the latest, unzip it, and untar it
wget https://wordpress.org/latest.tar.gz
tar -xzvf latest.tar.gz

# The location of your wordpress install

# Copy these just in case
mkdir backuptemp
cp $blog/wp-config.php $blog/.htaccess backuptemp

# These are the files to be deleted as mentioned in the WordPress Manual Update link
rm $blog/wp*.php
rm $blog/license.txt $blog/readme.html $blog/xmlrpc.php
rm -rf $blog/wp-admin $blog/wp-includes

# Copy the files to overwrite what we have
# It will leave files alone that are in $blog/wp-content but not in the latest bundle which is what we want
rsync -avz wordpress/ "${blog}/"
cp backuptemp/wp-config.php backuptemp/.htaccess $blog

echo "DONE"

If something goes wrong you have your daily backups to save you (because you are backing things up, aren’t you?). I will write another post shortly showing my WordPress files and database backup script.

Tagedit.el for nxml-mode

If you use EMACS and have used Lisp, you may have heard of paredit and smartparens. They allow you to operate on the Abstract Syntax Tree directly, which can require a bit of a mind shift to get used to. As the saying goes: “If you think paredit is not for you then you need to become the kind of person that paredit is for.”

Check out this segment of a talk in which Magnar Sveen, one of my biggest EMACS inspirations, discusses paredit. Here is Magnar showing off his use of paredit.

If you have used or heard of paredit, then you may also have heard of tagedit. It basically brings some of paredit’s features to HTML editing. I’ve been using it for a while, and it’s both a pleasure to use and a huge time saver.

For a while it has bothered me that I can’t use those awesome features when working on XML. I saw no reason why I should get to enjoy them in html-mode but not in nxml-mode. nXML is the standard mode for XML in EMACS, and I use it heavily at work for editing XSLT files.

This past weekend I wrote tagedit-nxml.el, a small package that makes tagedit compatible with nxml-mode. The “problem” was that tagedit was made with html-mode in mind, which derives from sgml-mode and uses sgml-mode functions to traverse the document. nxml-mode, however, is not derived from sgml-mode but from text-mode, and traversing the document just doesn’t work the same way. Luckily, most of the functions I needed to modify were exposed by tagedit.el for overriding. When I showed the package to Magnar, the author of tagedit, he quickly added the function overrides I needed to avoid having to use defadvice (for functions like forward-list and backward-sexp). I can’t wait to start using it at work. This was a lot of fun, and I learned a lot of awesome elisp features.

XSLT dependency viewer for EMACS

I’ve written an XSLT dependency viewer for EMACS. It’s very similar to the package found here. However, that library is for XSLT 2.0, while I have to use XSLT 1.0 at work.

The files are parsed in EMACS Lisp, following their imports/includes, to generate a dot diagram. That is then piped into the Graphviz dot program and opened in your favorite PDF viewer. Graphviz is like LaTeX, but for generating graphs of all kinds. Check out this Graphviz dot guide to get an idea of what it is capable of. Pretty powerful stuff.
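A rough sketch, in JavaScript rather than the package’s Emacs Lisp, of the middle step of that pipeline: turning an already-parsed map of imports/includes into DOT text. The input format, the quote helper, and the graph name xslt are made up for illustration. The output could then be rendered with something like dot -Tpdf -o deps.pdf.

```javascript
// Quote a DOT identifier: wrap in double quotes and escape embedded quotes.
const quote = s => '"' + s.replace(/"/g, '\\"') + '"';

// Hypothetical input format: each stylesheet mapped to the files it
// pulls in through xsl:import or xsl:include.
function toDot(dependencies) {
  const lines = ["digraph xslt {"];
  for (const [file, deps] of Object.entries(dependencies)) {
    for (const dep of deps) {
      lines.push(`  ${quote(file)} -> ${quote(dep)};`);
    }
  }
  lines.push("}");
  return lines.join("\n");
}

const dot = toDot({
  "main.xsl": ["common.xsl", "layout.xsl"],
  "layout.xsl": ["common.xsl"],
});
console.log(dot);
```

Each edge reads “file imports or includes dependency”, so a stylesheet shared by several others shows up as a node with several incoming arrows.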

First Pull Request

I have just made my first pull request on github.

My contribution was to Magnar Sveen’s awesome expand-region project. The fix was for nxml-mode. Inside an XML attribute, expand-region was expanding to the value including its outer quotes before first expanding to just the text inside the quotes. It was also not properly expanding to the whole attribute when the attribute has a namespace prefix. This fix amends both.

Magnar messaged me that expand-region is headed for the emacs core. Awesome! All contributors need to sign the Free Software Foundation copyright papers. See for reasons. I went ahead and emailed and signed away my copyright on this piece of code.

I’m pretty excited to see this go through, because not everyone’s first pull request ever incidentally also makes it into a major FSF project, let alone into EMACS core!

musings on programming, chess, martial arts, and other interests