# Slonik learns chess from self-play

## Need for speed

Recently I wrote about my chess engine Slonik. It is able to play chess autonomously, without human guidance, and she is probably the best playing Python chess engine in existence. However, it is still not very strong by today’s standard. A rough estimate places it somewhere around 1500 ELO, which is “chess club player” strength. It’s greatest limitation is it’s speed. Most chess engines today are written in C++, a language optimized for speed. From the starting chess position, just generating all the legal positions possible five half-moves deep (4.8M positions), Slonik takes 58 seconds, so about 82K positions per second. Currently I am working on a C++ port of Slonik. Under the same conditions, C++ Slonik runs the same computation in a tenth of a second, at 4.6M positions per second. That’s over 500 times faster! The primary driver for this improvement was to make bigger strides in Slonik’s self learning experiments.

## Artificial Intelligence

One of the other major goals mentioned in my original post which introduced Slonik to the world, was to have her learn from her own games. The original version of Slonik, like nearly every chess program in existence today, ran a hand-coded chess position evaluation function. Given a chess position, it outputs a number, indicating which side it thinks is better and by how much. Chess engines today differentiate themselves by implementation/performance/design differences, but more importantly, by how well they’ve taught their engine chess in the position evaluation function. Improvements to the chess knowledge, from version to version of the chess engine, are the primary means by which the chess engine improves. These improvements require an incredible amount of human effort, and more importantly are limited by the imagination of the engine author.

In September 2015, a new chess engine came out, called Giraffe, which learned to play for it’s own games. It is written in C++, like most chess engines, but unlike the others, it does not have a hand-written position evaluation function. Instead, it used a neural network and after starting from a somewhat initialized state (wherein it was fed labelled data teaching it only the approximate value of the pieces), it played games against itself for a week and reached about 2400 ELO strength. An engine that learns autonomously should be able to supersede the others. Giraffe was held back by performance (neural nets take longer to call than the optimized linear functions of other top engines).

One of my goals since my last post was to implement these ideas in Slonik. Inspired by Giraffe and eager to learn, I read Sutton and Barto’s Reinforcement Learning: An Introduction and even had some correspondence with Dr. Sutton (nice guy!), and then got to work on putting these ideas into practice on Slonik. To try to improve upon Giraffe’s ideas, I’ve experimented with various potential improvements, like Double Learning (two neural networks learning side by side, helping each other to improve and reducing bias) and prioritized sweeps (revisiting and giving priority to positions that were difficult, whilst adjusting for the artificial modification of the distribution of positions seen by the engine). Many of the ideas did not pan out, but some did, and some may still work yet when revisited (combined with other changes since). I have more ideas that I have yet to try. Besides the differences in architectures that I have tried, one major difference in Slonik that made a big improvement in self-learning was through the use of Huber loss, rather than the L1 loss used in Giraffe. These loss functions are an attempt to deal with the very frequent occurrence of outliers in the self-udpates arising during the learning process. The other major improvement was in the Reinforcement Learning algorithm itself. There is an interesting little story associated…

## Forbidden algorithms

Giraffe used an algorithm called TD-Leaf for self learning, which assumes that the evaluation of the successive positions (more accurately, the evaluation of the leaf node of the principal variation starting from the root position of the next move) is better than the corresponding evaluation one move prior. The “TD”, or temporal difference, part of TD-leaf makes the assumption that, in the backwards view, moves immediately prior are more to blame (or credit) for the change in evaluation of the target node, than moves still further prior. At some point after the recent AlphaGo success, I was looking into the work of David Silver, lead architect of AlphaGo, and discovered that he had been involved, in 2008, in the development of a revolutionary (but probably overlooked) self-learning chess program called Meep by Joel Veness, and was the first to achieve a high level of play (2300s ELO) in chess, entirely through self-play. Unlike Giraffe, Meep did not use a neural network, but a linear model, since linear models are faster to execute and are known to be sufficient for top-level play in chess. Furthermore, Meep did not use TD-Leaf, but a more efficient learning algorithm, which they called TreeStrap. TreeStrap differs from TD-Leaf by using each position’s search tree for updates, rather than the single evaluation of the next move.

I had some success with self-learning in Slonik using TD-Leaf, but when I read the TreeStrap paper, I was in awe. If this algorithm is as good as it sounds (and it sounds very good), then Giraffe should have been using this algorithm instead. I went ahead and replaced TD-leaf with TreeStrap in Slonik and the learning speed sky-rocketed. Why didn’t Giraffe use TreeStrap? Did Matthew Lai, the author of Giraffe, not know about it? Since this is my first major C++ project, I sometimes peruse the source code of Stockfish and Giraffe for implementation tips and ideas. About 3 weeks ago, I noticed an eyebrow raising comment in the Giraffe source code. There was an unused field on a class, with a comment that said “during TreeStrap we have to record write to the ttable”. Wait… so Matthew Lai did know about TreeStrap?

Then I realized what probably transpired. In a forum post on talkchess, Matthew Lai announced that he was hired by Google DeepMind, due to his awesome success on Giraffe (and his impressive background helped, no doubt — have a look at his resume), and that due to his work there, he can’t continue to improve his open-source engine Giraffe. He learns many techniques during his day job at Google and the divide between open knowledge and trade secrets is muddy. Matthew was hired there in December 2015. A little bit of investigating in the Giraffe source repository shows that the TreeStrap comment was added in that same month. I can see it in my head now — Matthew being interviewed by David Silver where David tells him about his work on Meep a whole 7 years earlier! It appears to me that Matthew started implementing TreeStrap, then learned he couldn’t use those ideas, and a forgotten remnant remained in the source. Well, it’s all speculation, but if that’s what happened, maybe Matthew can use TreeStrap in Giraffe now that someone else is using it openly. The white paper on TreeStrap is, after all, linked from David Silver’s personal page. However, probably that won’t happen, as Matthew Lai wrote he has other ideas and pet projects he’d rather spend his time on, and besides, there may be other ideas in the private Google hive mind that are even more performant. I suspect that this idea of learning from it’s own search was probably used in AlphaGo’s UCT routine, but I’m patiently waiting for Google to release the details of the techniques used in the recent success against world champion Ke Jie. In the meantime, I continue to work on Slonik (primarily the C++ port, currently). I have many ideas I want to try, including what I hope will be an improvement to the TreeStrap algorithm!

# Deriving the Y-Combinator

The Y Combiantor is one of the most confusing pieces of code I’ve ever seen.

const Y = (f) =>
( x => f(v => x(x)(v)) )(
x => f(v => x(x)(v))
);


Used as such:

const factorial = Y(function (fac) {
return function (n) {
return (n == 0 ? 1 : n * fac(n - 1));
}
});

factorial(5)
//=> 120


Of course, that’s cheating. Intended use is like this:

((f) =>
( x => f(v => x(x)(v)) )(
x => f(v => x(x)(v))
)(function (fac) {
return function (n) {
return (n == 0 ? 1 : n * fac(n - 1));
}
}))(5);
//=> 120


because it’s harder to understand that way. But really, it’s because the goal of the Y Combinator is to do recursion without naming any functions. It has no practical use, but it has theoretical importance, and it’s appears to be a good way to sink your time if you’re looking for a good puzzle. I thought to myself, could I derive it, if I hadn’t seen this solution before?

We’re trying to achieve the equivalent of this, without naming the function, of course:

function fac (n) {
return (n == 0 ? 1 : n * fac(n - 1));
}


A first pass:

(function (fac, n) {
return (n == 0 ? 1 : n * fac(fac, n - 1));
}(function (fac, n) {
return (n == 0 ? 1 : n * fac(fac, n - 1));
}, 5));
//=> 120


That accomplishes it, but that’s silly. The second pass is much better:

(function (fac) {
return function (n) {
return fac(fac)(n);
}
}(function(fac) {
return function (n) {
return (n == 0 ? 1 : n * fac(fac)(n - 1));
}
}))(5);
//=> 120


That works, so I guess that qualifies as a solution, but it has a big flaw that the canonical Y Combinator doesn’t – we have to call fac(fac)(n-1) in the exposed function instead of just fac(n-1). Bad.. very bad… But wait! That function taking n is the same as the one in the Y Combinator. Or is it? I got stuck here for a while, and once I realized I’d started talking to myself, I decided to try a different approach.

Time to cheat a little bit, with a named function. Too much lambda here, hard to understand.

(function (wrapper) {
function injector() {
return wrapper(n => injector()(n));
}
return injector();
}(function(fac) {
return function (n) {
return (n == 0 ? 1 : n * fac(n - 1));
}
}))(5);
//=> 120


This idea took a bit of a leap, but it turns out this is exactly what the Y Combinator is doing, in a more sane, readable way of course. I named the function injector, because it is injecting a function into the fac parameter of the wrapper function. An alternative name might have been recur or runner, because the function it’s injecting does those things. This makes things clearer. But I can’t use named functions. Let’s translate this back to crazy.

Now I can use the idea from the 2nd pass of passing the function to itself:

(function (wrapper) {
return
(function (injector) {
return wrapper(n => injector(injector)(n));
}(function (injector) {
return wrapper(n => injector(injector)(n));
}));
}(function(fac) {
return function (n) {
return (n == 0 ? 1 : n * fac(n - 1));
}
}))(5);
//=> 120


Yikes. injector(injector) looks crazy, but that’s just what we have to do when we use the pass-function-to-self-trick. After all, it is a function passing itself to itself, so if you want to use it, you have to have it call itself. Makes perfect sense, how did I not see it before (/sarcasm).

Anyway, time to reap the benefits. This is exactly the Y Combinator! To finish it off, and for absolute correctness, we have to rewrite it to make it as obscure as possible, with less descriptive names.

var result = (f =>
(x => f(n => x(x)(n)))
(x => f(n => x(x)(n))))
(fac => n => (n == 0 ? 1 : n * fac(n - 1)))(5);
//=> 120


There, perfect.

# Javascript: generators for async

I stumbled across this article by James Long (@jlongster) on using generators for async as an alternative to async/await. His Q.async example, using Kris Kowal’s (@kriskowal) Q library, caught my eye and I decided to try implementing the async function without peeking at Q’s code. It took me a little while to come up with this solution, but the result is pleasingly simple!

async accepts a generator and returns a function that returns a promise. The yielded expressions in the generator are the fulfilled values. This allows running asynchronous code in synchronous style, without using async/await. Here’s a simple example of how it’s used:

Notice that you can yield promises or plain values, it doesn’t matter. Also, this supports using try/catch with asynchronous calls. Pretty cool!

For comparison, here is the Q.async implementation. After writing this up I also found the solution here (essentially same as Q.async). It differs from mine in that I’m not using an explicit try/catch to turn the synchronous error into an asynchronous one, but it happens implicitly due to being wrapped in a Promise, so I think the result/behavior is the same.

I recently released my first Firefox add-on. I’m always copy-pasting words from the browser into my terminal to look them up with dict client. It’s simply the best dictionary tool I’ve ever used, because it looks up words from many dictionaries at once. You can find out more about DICT here and here.

It looks up words by calling dict client on your machine in a sub-process. It can also automatically save words you look up into a list to review later. Double click a word on any web page, and the extension will spawn a process to call dict and then display the output in a popup. As an added feature, the words you look up are automatically added to a list for later review.

This addon does not actually implement the DICT protocol, nor call any DICT servers on it’s own. It delegates that entirely to the dict client on your machine. As an alternative to my add-on, this extension is quite good and does implement the DICT protocol, if that is what you are looking for.

However, I suspect the above mentioned add-on, which does implement a DICT client, may stop working at some point relatively soon, because like many useful add-ons, it uses require('chrome'), and Mozilla is doing away with the add-on SDK and many of it’s low level APIs. A lot of developers are understandably upset about that. I was going to implement it as well, but due to these plans by Mozilla, I decided to stay away, as there’s currently no way to do it without using chrome and the low-level APIs.

I think that my add-on will only work on Linux, and maybe on Mac (though I have only tested on my machine), as you must have the dict client installed for it to work.

You can install the add-on from the Firefox add-on listing page. The code is also hosted on github.

# Tagedit.el for nxml-mode

If you use EMACS and have used lisp, you may have heard of paredit and smartparens. They allow you to operate on the Abstract Syntax Tree directly which can require a bit of a mind shift to get used to. This has been said: “If you think paredit is not for you then you need to become the kind of person that paredit is for.”

Check out this segment of a talk with Magnar Sveen, one of my biggest EMACS inspirations, discuss paredit. Here is Magnar showing off his use of paredit.

If you have used or heard of paredit, then you may have also heard about tagedit. It’s basically bringing some paredit features to html editing. I’ve been using it for a while and it’s both a pleasure to use and a huge time saver.

For a while it has been bothering me that I can’t use those awesome features when working on XML. I felt there is just no reason why I should get to enjoy that in html-mode but not in nxml-mode. nXML is the standard mode for xml in EMACS. I use it heavily at work for editing XSLT files.

This past weekend I wrote tagedit-nxml.el, a small package that makes tagedit compatible with nxml-mode. The “problem” was that tagedit was made with html-mode in mind, which derives from sgml-mode and uses sgml-mode functions to traverse the document. nxml-mode, however, is not derived from sgml-mode, but from text-mode, and traversing the document just doesn’t work the same way. Luckily, most of the functions I needed to modify were made available by tagedit.el to override. After showing the package to Magnar, the author of tagedit, he quickly provided function overrides that I needed to avoid having to use defadvice (functions like forward-list and backward-sexp). I can’t wait to start using it at work. This was a lot of fun and I learnd a lot of awesome elisp features.

# XSLT dependency viewer for EMACS

I’ve written an XSLT dependency viewer for EMACS. It’s very similar to the package found here http://www.thoughtcrime.us/software/xslt-dependency-graph/. However, that library is for XLST 2.0 while I have to use XSLT 1.0 at work.

The parsing of the files to traverse the import/includes is done in EMACS lisp, which generates a dot diagram. That is then piped into the graphviz dot data visualization program and opened in your favorite PDF viewer. Graphviz is like LaTeX but for generating graphs of all kinds. Check out this graphviz dot guide that will give you an idea what it is capable of. Pretty powerful stuff.

# First Pull Request

I have just made my first pull request on github. https://github.com/magnars/expand-region.el/pull/148

My contribution was to Magnar Sveen’s awesome expand-region project. The fix was for nxml-mode. Expand region inside an xml attribute was including the outer quotes first before first expanding to just the inner quotes. It was also not properly expanding to the attribute when there are namespaces in the attribute. This fix amends that.

Magnar messaged me that expand-region is headed for the emacs core. Awesome! All contributors need to sign the Free Software Foundation copyright papers. See https://gnu.org/licenses/why-assign for reasons. I went ahead and emailed assign@gnu.org and signed away my copyright on this piece of code.

I’m pretty excited to see this go through, because not everyone’s first pull request ever incidentally also makes it into a major FSF project, let alone into EMACS core!

# etags-update-mode

Just a few days ago I wrote my first EMACS minor-mode, called etags-update-mode. It updates your TAGS file on save. It’s heavily inspired by another package/minor mode with the same name by Matt Keller.

In order to update the tags for a file on save, Matt’s etags-update-mode calls a perl file to delete any previous tags for a specific file in a TAGS file before it appends the new definitions in the file. Also, with that package the minor mode is defined as a global minor mode.

I wanted the functionality that the package provided, but I didn’t want it to be a global minor mode (the only global minor mode that I’ve used that I’m aware of and that I like having everywhere is YaSnippet). I also didn’t see why there should be a reliance on perl. I wanted to do it all in elisp.

So I wrote a much simpler version of etags-update-mode that is a regular minor mode and does all it’s work in EMACS. I’ll be updating it as I continue to use it.

# EMACS etags

EMACS has an etags.el package that supports use of etags, the EMACS version of ctags. It tags your source code so you can jump directly to the source for a function, variable, or other symbol. I’ve been using it heavily with C++ and C# (though for C++, I’ve supplanted it with GNU Global, and there is an EMACS package for that too, ggtags).

I wanted the same functionality for xslt, which I use heavily at work. Luckily exuberant-ctags and etags both provide support for extending support to other languages, by supplying regular expressions.

I put the following regular expressions in ~/.ctags:

--langdef=xslt
--langmap=xslt:.xsl
--regex-xslt=/<xsl:template name="([^"]*)"/1/
--regex-xslt=/<xsl:template match="[^"]*"[ \t\n]+mode="([^"]*)"/1/
--regex-xslt=/<xsl:variable name="([^"]+)"/1/


… and generate the TAGS file

ctags -e -o TAGS *.xsl

I can now jump to the definition of any variable or template in my xsl files!

# Learning LaTeX and the result compared to Word

LaTeX is a high quality typesetting system. It is open source and more important it is Free Software. This past weekend I decided to learn some LaTeX as it has been an interest in the back of my mind for some time. At first I was hesitant because LaTeX was made for Mathematical and Scientific papers, which I don’t write. The impetus for Donald Knuth, the author of the underlying language (TeX), was that there weren’t good tools for the documentation and display of Mathematical formulas. However, my concern was misguided. LaTeX can make any document look beautiful, and it can be used for any kind of article, book, or even a resume. What sets it apart from WYSIWYG editors, like Microsoft Word, is the sheer typographical quality of the resulting documents you can produce. LaTeX algorithms under the hood calculate everything from page line height to word and letter spacing. They can be adjusted as you like and many packages exist that make the process easier.

My mother has produced a book recently using Microsoft Word, and it was no easy task. The index-making progress is difficult, headers inexplicably stop showing up correctly, page numbers stop respecting section boundaries, and blank pages pop up everywhere in the PDF result. Furthermore, the WYSIWYG nature of Word encourages you to manually edit spacing issues with the wrong tools, and if you are picking up from where someone else left off, then good luck to you reformatting everything. Even after reformatting, if text is changed and pages shift, you have to redo your work. I wanted to convince myself and my mother that the book can be produced with LaTeX in one or two nights, and look better than its Word counterpart. I was able to do it in just one afternoon.

First I found that LaTeX supports a book document type. You declare it in the first line of your document. However, after adding more pieces to the puzzle, I learned about the Koma-Script package which provides a drop-in replacement for the book (and article & report) class, packed with some additional goodies. There is also the memoir class, which was an interesting alternative. I loaded its book class with the scrbook class.

\documentclass[12pt,letterpaper]{scrbook}

I found that I did not even need to install anything as it was already installed in the LaTeX package bundled with Ubuntu. I haven’t tried it yet, but I’ve read that MiKTeX is a good package to get started with LaTeX on Windows and it makes it easy to use this and other useful packages. I plan on getting my mother to try MiKTeX once I show her my LaTeX version of the book (which undeniably looks better than the Word version). However, for this proof of concept, I didn’t use any editors specially designed for LaTeX, as I of course was working in EMACS. EMACS has a LaTeX mode with useful key bindings and syntax highlighting. I immediately got started copying all the chapter titles like this:

\chapter[Optional short name for the TOC]{My Very Long Chapter Name Here}

I did not have to wrap pagraphs in any tags as you simply skip a line to indicate that it is a new paragraph.

This book had many quotations and blockquotes, and many of them were formatted improprly in Word. Word doesn’t make that easy. I didn’t have to worry about any of that, as in LaTeX I am only semantically tagging them, not styling them. Styling comes later, when you’re done tagging, though I found that even the default styling was impressive. Here’s what the markup looks like:

\begin{quote}
All that is gold does not glitter,
Not all those who wander are lost;
The old that is strong does not wither,
Deep roots are not reached by the frost.

From the ashes a fire shall be woken,
A light from the shadows shall spring;
Renewed shall be blade that was broken,
The crownless again shall be king.
\end{quote}


Koma-Script’s scrbook gives useful variations on subsections, like addsec* and minisec, for example. The * is a modifier that keeps it from appearing in the Table of Contents (TOC).

\minisec*{My mini subsection name}
Blah Blah


Creating the index was refreshingly sane. I simply went into the points of interset in the text and dropped \index{key} tags and I was done. Once I did that, text can be added or removed and pages can shift, but we have no additional work to do as it’s all recalculated for you. All pages with the same key get pointed to in the index under the same entry. Sequential page ranges get smartly hyphenated. Footnotes were just as easy. For this book, footnotes were not used, but instead, endnotes. I googled for endnotes and found that there was a package for it already. Once again, I did not even have to download it, as it was already included in my LaTeX package. I wanted the endnote numerbers to reset every chapter, as it is in the Word version, and there’s a package for that too.

\usepackage{endnotes,chngcntr}
\counterwithin*{endnote}{chapter}  % Reset endnote numbering every new chapter

This is a brief overview of some of the tags that I used that I hope highlight how easy this was to do. Then I generated the document directly to PDF. Without even thinking about styling yet, the document that was produced was a typograhically stunning work. With a couple of easy tweaks, I purposely made it look closer to the Word document for comparison purposes, to highlight the superiority of the type produced by LaTeX. Unfortunately I can’t produce the “final” proof of concept here, as it is an entire book and I don’t hold rights to it. It would not be entirely fair for me to omit that there is a learning curve with LaTeX, of course. However, I hope that this helps anyone just starting or curious about learning LaTeX.