All posts by maksle

Preserving Color in Style Transfer

Image style transfer is extremely fun and I had to just return to it one more time. I noticed that some people have started adding color transfer, as a recent paper came out on Preserving Color in Neural Artistic Style Transfer .

Throughout this post I’m modifying the following photo I took back in 2011 in Ireland.

A street in Ireland (2011 vacation)

Take this Starry Night Van Gogh rendition.

It is… blue.

But by preserving color, the result is much nicer. The color is transferred from the content image to the style image before the stylized version is generated. The mean and covariance across the RGB channels is updated in the style image to match the content image.

The other method mentioned in the paper is by luminance transfer. I will probably implement that as well and add it here — the math is very similar, and simpler in fact, as covariance matrix is not needed as there is only one channel being changed.

Sometimes color transfer isn’t needed, but when it is, it’s pretty handy. It’s remarkable that this can be done by simply changing some statistics in the image.

Here are a couple of cool ones from today — they did not require/use color transfer but I had to include them because they are pretty cool!


The code for this implementation is on github.

As part of the Deep Learning For Coders part 2 course, we implemented the original GAN and DCGAN. The original GAN implementation uses a simple multi-layer perceptrons for the generator and discriminator, and it does not work very well. DCGAN uses a convolutional architecture and does better. However, the result in the course notebook did not look impressive despite many thousands of epochs of training. It’s possible to get it to a better state as I have succeeded in doing here. There is room for more improvement still. I suspect the instructor did not try too hard to show that though, because the newer WGAN, introduced a bit later, is better and easier to train. However, this post is about implementing the DCGAN.

It took me a while to get it to work well. Despite trying many things, what seemed to finally do it was switching from Adam optimizer to RMSPROP, and weight clipping. These things, together with Dropout and Batch Normalization, stabilized training. I did not do comprehensive experiments, but 5×5 convolutions seemed to perform better on this particular dataset.

Here you can see how the GAN learns to generate increasingly better novel handwritten digits.

output over epochs

The outputs look realistic, but they can be improved. Future improvements and code are on github.

Generating Art with Computers

In the world of deep learning, there’s never a dull moment. The breadth of interesting applications seems unbounded. It’s been applied to reach super-human performance in many areas like game playing and object recognition, and integral to exciting new technologies like self driving cars. Increasingly we find that the knowledge embodied in a trained neural network can be transferred to seemingly unrelated areas. Take for example the combination of two disparate ideas: object recognition and Picasso paintings. A net built for the sole purpose of recognizing objects in a photo has been found to be useful, completely unaltered, for the task of rendering an image in the style of any painting. This is the technology behind the scenes of apps like Prisma, but it is not hard to do yourself, and I have to say that implementing this was a lot of fun.

original image

These images were created, as described in Gatys et. al, by passing a random noise image through the VGG-16 convolutional network (with imagenet weights and top layers removed) and updating the pixels of the input image directly with gradient descent. An error function is devised, which quantifies how poorly the initial seed (random noise at first) balances between appearing like the original image, and at the same time in the style of the chosen painting. The derivative of the loss with respect to the RGB values of the noise image is calculated. The pixels are adjusted in their respective directions, and the process repeats. Amazingly, this works – the image looks more and more like a stylized version of the original image with each iteration of this optimization. This is due to the unreasonable effectiveness of gradient descent, convolutional neural networks, and well designed loss functions.

The error function combines content loss E_c and style loss E_s. This balances between preserving high level features of the original image, and with the textures of the painting. The errors compare the values of chosen convolutional layers when the input is the original image, generated image, or painting.

The content error is just the squared euclidian distance between corresponding filter activations under the stylized and original images.
\displaystyle E_c^l=\frac{1}{2}\sum _{i,j} {\left\|F_{i_j}^l-P_{i_j}^l\right\|^2} where F_{ij}^l is the activation of the i^{th} filter at position j in layer l.

The novelty that makes all this work is the style error. It can be done with a statistic on the channels of the convolutions in the higher layers of the network. This was originally designed to capture texture information in a texture synthesis algorithm. The style error of a layer is the mean squared error between the gramians, \displaystyle E_s^l=\frac{1}{4 N^2 M^2} \sum _{ij} {\left(G_{i_j}^l-A_{i_j}^l\right)^2} , where G_{ij}=\langle v_i,v_j \rangle is the inner product of the vectorized feature maps i and j in filter layer l of the style image, and A is the corresponding gramian when the input is noise. The inner product (gramian) shows the correlation between each pair of channels, and this captures texture information. The same thing in Keras code:

def gram_matrix(v):
    # In tensorflow, dim order is x,y,channel
    # Make each row a channel
    chans = K.permute_dimensions(x, (2, 0, 1))
    # vectorize the feature maps
    features = K.batch_flatten(chans)
    # gramian is just an inner product
    return, K.transpose(features)) 
             / x.get_shape().num_elements()
def style_loss(vi, vj):
    # mean squared error
    return mse(gram_matrix(vi), gram_matrix(vj))

The total error is \alpha E_c + \beta E_s with the weights on each error as hyperparameters. The results are quite astounding. Playing with weights \alpha and \beta, and with the weights on the contribution of each layer l towards the style loss allows for a large range of interesting results. This picture was created by decreasing the weight of the content loss.

One major drawback of this method can be seen in the creation above. The style loss must be tempered, or else it obstructs features of the image, like the face. The third set of images, in the style of “Woman in a Hat with Pompoms and a Printed Shirt” is another example. So portraits in general are not the best target. However, landscapes and the like come out amazing. The reason for this is that the style loss function does not take into account anything about the style image except the texture of the image. There are alternative statistics that have been tried. One successful result that I have not yet tried comes from the study of Markov Random Fields, used classically for image synthesis. The idea is to calculate the loss between patches of the filters, rather than the whole filter at once, where the loss of each patch of the generated image is calculated against the most similar patch (by cross correlation) for the painting.

Another drawback is that each generated image/painting combo must be calculated separately. This has been addressed by Johnson et. al and others by training a neural network which can turn an input image into a representation which, when passed through a loss network (such as VGG-16 as above), generates a stylized image of a particular style. The benefit is the speed – once trained, generating a stylized version of an input image is hundreds of orders of magnitude faster. The drawback is that the transformation network takes much longer to train and is only able to output images of the specific style it was trained on. However, this is the type of solution that can scale, for example to video. I have replicated exactly the network architecture as described in Johnson et. al. Here’s an example result:

I had trouble getting rid of artifacts showing up in some input images, like the blotch of white on the right of the stylized image. However, I did not train the network for very long, primarily to avoid large AWS GPU server bills, so the results are not that great. I’ll probably come back to this soon, as I am building my own GPU server! There are so many ideas to explore with this.

Slonik learns chess from self-play

Need for speed

Recently I wrote about my chess engine Slonik. It is able to play chess autonomously, without human guidance, and she is probably the best playing Python chess engine in existence. However, it is still not very strong by today’s standard. A rough estimate places it somewhere around 1500 ELO, which is “chess club player” strength. It’s greatest limitation is it’s speed. Most chess engines today are written in C++, a language optimized for speed. From the starting chess position, just generating all the legal positions possible five half-moves deep (4.8M positions), Slonik takes 58 seconds, so about 82K positions per second. Currently I am working on a C++ port of Slonik. Under the same conditions, C++ Slonik runs the same computation in a tenth of a second, at 4.6M positions per second. That’s over 500 times faster! The primary driver for this improvement was to make bigger strides in Slonik’s self learning experiments.

Artificial Intelligence

One of the other major goals mentioned in my original post which introduced Slonik to the world, was to have her learn from her own games. The original version of Slonik, like nearly every chess program in existence today, ran a hand-coded chess position evaluation function. Given a chess position, it outputs a number, indicating which side it thinks is better and by how much. Chess engines today differentiate themselves by implementation/performance/design differences, but more importantly, by how well they’ve taught their engine chess in the position evaluation function. Improvements to the chess knowledge, from version to version of the chess engine, are the primary means by which the chess engine improves. These improvements require an incredible amount of human effort, and more importantly are limited by the imagination of the engine author.

In September 2015, a new chess engine came out, called Giraffe, which learned to play for it’s own games. It is written in C++, like most chess engines, but unlike the others, it does not have a hand-written position evaluation function. Instead, it used a neural network and after starting from a somewhat initialized state (wherein it was fed labelled data teaching it only the approximate value of the pieces), it played games against itself for a week and reached about 2400 ELO strength. An engine that learns autonomously should be able to supersede the others. Giraffe was held back by performance (neural nets take longer to call than the optimized linear functions of other top engines).

One of my goals since my last post was to implement these ideas in Slonik. Inspired by Giraffe and eager to learn, I read Sutton and Barto’s Reinforcement Learning: An Introduction and even had some correspondence with Dr. Sutton (nice guy!), and then got to work on putting these ideas into practice on Slonik. To try to improve upon Giraffe’s ideas, I’ve experimented with various potential improvements, like Double Learning (two neural networks learning side by side, helping each other to improve and reducing bias) and prioritized sweeps (revisiting and giving priority to positions that were difficult, whilst adjusting for the artificial modification of the distribution of positions seen by the engine). Many of the ideas did not pan out, but some did, and some may still work yet when revisited (combined with other changes since). I have more ideas that I have yet to try. Besides the differences in architectures that I have tried, one major difference in Slonik that made a big improvement in self-learning was through the use of Huber loss, rather than the L1 loss used in Giraffe. These loss functions are an attempt to deal with the very frequent occurrence of outliers in the self-udpates arising during the learning process. The other major improvement was in the Reinforcement Learning algorithm itself. There is an interesting little story associated…

Slonik improvement iterations vs score graph
Slonik improving in chess playing strength over time through self-play. STS score on y axis. Starting from random initialization at approximately 1500 STS score (2 ply look-ahead search per position). Slonik is still playing and learning, this is a current snapshot as of the writing of this post.

Forbidden algorithms

Giraffe used an algorithm called TD-Leaf for self learning, which assumes that the evaluation of the successive positions (more accurately, the evaluation of the leaf node of the principal variation starting from the root position of the next move) is better than the corresponding evaluation one move prior. The “TD”, or temporal difference, part of TD-leaf makes the assumption that, in the backwards view, moves immediately prior are more to blame (or credit) for the change in evaluation of the target node, than moves still further prior. At some point after the recent AlphaGo success, I was looking into the work of David Silver, lead architect of AlphaGo, and discovered that he had been involved, in 2008, in the development of a revolutionary (but probably overlooked) self-learning chess program called Meep by Joel Veness, and was the first to achieve a high level of play (2300s ELO) in chess, entirely through self-play. Unlike Giraffe, Meep did not use a neural network, but a linear model, since linear models are faster to execute and are known to be sufficient for top-level play in chess. Furthermore, Meep did not use TD-Leaf, but a more efficient learning algorithm, which they called TreeStrap. TreeStrap differs from TD-Leaf by using each position’s search tree for updates, rather than the single evaluation of the next move.

I had some success with self-learning in Slonik using TD-Leaf, but when I read the TreeStrap paper, I was in awe. If this algorithm is as good as it sounds (and it sounds very good), then Giraffe should have been using this algorithm instead. I went ahead and replaced TD-leaf with TreeStrap in Slonik and the learning speed sky-rocketed. Why didn’t Giraffe use TreeStrap? Did Matthew Lai, the author of Giraffe, not know about it? Since this is my first major C++ project, I sometimes peruse the source code of Stockfish and Giraffe for implementation tips and ideas. About 3 weeks ago, I noticed an eyebrow raising comment in the Giraffe source code. There was an unused field on a class, with a comment that said “during TreeStrap we have to record write to the ttable”. Wait… so Matthew Lai did know about TreeStrap?

Then I realized what probably transpired. In a forum post on talkchess, Matthew Lai announced that he was hired by Google DeepMind, due to his awesome success on Giraffe (and his impressive background helped, no doubt — have a look at his resume), and that due to his work there, he can’t continue to improve his open-source engine Giraffe. He learns many techniques during his day job at Google and the divide between open knowledge and trade secrets is muddy. Matthew was hired there in December 2015. A little bit of investigating in the Giraffe source repository shows that the TreeStrap comment was added in that same month. I can see it in my head now — Matthew being interviewed by David Silver where David tells him about his work on Meep a whole 7 years earlier! It appears to me that Matthew started implementing TreeStrap, then learned he couldn’t use those ideas, and a forgotten remnant remained in the source. Well, it’s all speculation, but if that’s what happened, maybe Matthew can use TreeStrap in Giraffe now that someone else is using it openly. The white paper on TreeStrap is, after all, linked from David Silver’s personal page. However, probably that won’t happen, as Matthew Lai wrote he has other ideas and pet projects he’d rather spend his time on, and besides, there may be other ideas in the private Google hive mind that are even more performant. I suspect that this idea of learning from it’s own search was probably used in AlphaGo’s UCT routine, but I’m patiently waiting for Google to release the details of the techniques used in the recent success against world champion Ke Jie. In the meantime, I continue to work on Slonik (primarily the C++ port, currently). I have many ideas I want to try, including what I hope will be an improvement to the TreeStrap algorithm!

My own chess engine

I’ve written a chess engine named Slonik. It implements the Universal Chess Interface (UCI), so you can download any popular chess interface, like Scid vs Pc. or Chessbase, to analyze with or play against Slonik.

I’ve written this engine from scratch, and chose to write it in Python, so that I can iterate quickly. That makes the engine slower, but maybe one day I will port it to C++. However, I am happy with it’s playing strength, all considering. The details of the engine are on the github page, but to summarize:

  • Alpha-beta minimax, quiescence search
  • Bitboard piece/board representation
  • Various search heuristics, such as the history heuristic, extensions, reductions, etc.
  • Hand-coded evaluation function
  • Transposition hash table

I plan to return to working on this engine’s AI — specifically to use deep learning and reinforcement learning techniques rather than the current hand-coded evaluation function.

Deriving the Y-Combinator

The Y Combiantor is one of the most confusing pieces of code I’ve ever seen.

const Y = (f) =>
  ( x => f(v => x(x)(v)) )(
    x => f(v => x(x)(v))

Used as such:

const factorial = Y(function (fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  //=> 120

Of course, that’s cheating. Intended use is like this:

((f) =>
  ( x => f(v => x(x)(v)) )(
    x => f(v => x(x)(v))
  )(function (fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  //=> 120

because it’s harder to understand that way. But really, it’s because the goal of the Y Combinator is to do recursion without naming any functions. It has no practical use, but it has theoretical importance, and it’s appears to be a good way to sink your time if you’re looking for a good puzzle. I thought to myself, could I derive it, if I hadn’t seen this solution before?

We’re trying to achieve the equivalent of this, without naming the function, of course:

function fac (n) {
  return (n == 0 ? 1 : n * fac(n - 1));

A first pass:

(function (fac, n) {
  return (n == 0 ? 1 : n * fac(fac, n - 1));
}(function (fac, n) {
  return (n == 0 ? 1 : n * fac(fac, n - 1));
}, 5));
  //=> 120

That accomplishes it, but that’s silly. The second pass is much better:

(function (fac) {
  return function (n) {
    return fac(fac)(n);
}(function(fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(fac)(n - 1));
  //=> 120

That works, so I guess that qualifies as a solution, but it has a big flaw that the canonical Y Combinator doesn’t – we have to call fac(fac)(n-1) in the exposed function instead of just fac(n-1). Bad.. very bad… But wait! That function taking n is the same as the one in the Y Combinator. Or is it? I got stuck here for a while, and once I realized I’d started talking to myself, I decided to try a different approach.

Time to cheat a little bit, with a named function. Too much lambda here, hard to understand.

(function (wrapper) {
  function injector() {
    return wrapper(n => injector()(n));
  return injector();
}(function(fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  //=> 120

This idea took a bit of a leap, but it turns out this is exactly what the Y Combinator is doing, in a more sane, readable way of course. I named the function injector, because it is injecting a function into the fac parameter of the wrapper function. An alternative name might have been recur or runner, because the function it’s injecting does those things. This makes things clearer. But I can’t use named functions. Let’s translate this back to crazy.

Now I can use the idea from the 2nd pass of passing the function to itself:

(function (wrapper) {
  (function (injector) {
    return wrapper(n => injector(injector)(n));
  }(function (injector) {
    return wrapper(n => injector(injector)(n));
}(function(fac) {
  return function (n) {
    return (n == 0 ? 1 : n * fac(n - 1));
  //=> 120

Yikes. injector(injector) looks crazy, but that’s just what we have to do when we use the pass-function-to-self-trick. After all, it is a function passing itself to itself, so if you want to use it, you have to have it call itself. Makes perfect sense, how did I not see it before (/sarcasm).

Anyway, time to reap the benefits. This is exactly the Y Combinator! To finish it off, and for absolute correctness, we have to rewrite it to make it as obscure as possible, with less descriptive names.

var result = (f =>
 (x => f(n => x(x)(n)))
 (x => f(n => x(x)(n))))
(fac => n => (n == 0 ? 1 : n * fac(n - 1)))(5);
  //=> 120

There, perfect.

Javascript: generators for async

I stumbled across this article by James Long (@jlongster) on using generators for async as an alternative to async/await. His Q.async example, using Kris Kowal’s (@kriskowal) Q library, caught my eye and I decided to try implementing the async function without peeking at Q’s code. It took me a little while to come up with this solution, but the result is pleasingly simple!

async accepts a generator and returns a function that returns a promise. The yielded expressions in the generator are the fulfilled values. This allows running asynchronous code in synchronous style, without using async/await. Here’s a simple example of how it’s used:

Notice that you can yield promises or plain values, it doesn’t matter. Also, this supports using try/catch with asynchronous calls. Pretty cool!

For comparison, here is the Q.async implementation. After writing this up I also found the solution here (essentially same as Q.async). It differs from mine in that I’m not using an explicit try/catch to turn the synchronous error into an asynchronous one, but it happens implicitly due to being wrapped in a Promise, so I think the result/behavior is the same.

Firefox add-on released

I recently released my first Firefox add-on. I’m always copy-pasting words from the browser into my terminal to look them up with dict client. It’s simply the best dictionary tool I’ve ever used, because it looks up words from many dictionaries at once. You can find out more about DICT here and here.

It looks up words by calling dict client on your machine in a sub-process. It can also automatically save words you look up into a list to review later. Double click a word on any web page, and the extension will spawn a process to call dict and then display the output in a popup. As an added feature, the words you look up are automatically added to a list for later review.

Screenshot of dict-extension in use

This addon does not actually implement the DICT protocol, nor call any DICT servers on it’s own. It delegates that entirely to the dict client on your machine. As an alternative to my add-on, this extension is quite good and does implement the DICT protocol, if that is what you are looking for.

However, I suspect the above mentioned add-on, which does implement a DICT client, may stop working at some point relatively soon, because like many useful add-ons, it uses require('chrome'), and Mozilla is doing away with the add-on SDK and many of it’s low level APIs. A lot of developers are understandably upset about that. I was going to implement it as well, but due to these plans by Mozilla, I decided to stay away, as there’s currently no way to do it without using chrome and the low-level APIs.

I think that my add-on will only work on Linux, and maybe on Mac (though I have only tested on my machine), as you must have the dict client installed for it to work.

You can install the add-on from the Firefox add-on listing page. The code is also hosted on github.

WordPress backup script

In my previous post I showed my WordPress update script. However, it’s not safe to update without first backing everything up in case something goes wrong. This is a script that I adapted from this post. It backs up both files and the database.


echo "In $0"

if [ $# -gt 0 ]; then
    NOW=$(date +"%Y-%m-%d-%H%M")



# WWW_TRANSFORM='s,^home/public/blog,www,'
# DB_TRANSFORM='s,^home/private/backups,database,'

# tar -cvf $BACKUP_DIR/$FILE --transform $WWW_TRANSFORM $WWW_DIR

mysqldump --host=$DB_HOST -u$DB_USER -p$DB_PASS $DB_NAME > $BACKUP_DIR/$DB_FILE

# tar --append --file=$BACKUP_DIR/$FILE --transform $DB_TRANSFORM $BACKUP_DIR/$DB_FILE

You may have noticed that there is a commented out version of the tar transform variable and command. My host has a version of tar (bsdtar 2.8.5) that doesn’t have the --transform option, but does have an alternative -s option that does more or less the same thing. The idea is that the backup will have directory stucture backup/file.php rather than /home/public/blog/file.php for example.

mysqldump has many options you can pass it, which you may want to look into. However, the option --opt is a default, and does what I want. It is probably good enough for most sites. The problem with --opt is that it requires locking the table during the export, which also has implications on permissions required for your backup user. What backup user? Well, since you are storing the DB user and password in plain text in your script, you should not use your administrator user. It’s best to create a backup user with minimal permissions necessary to do the backup. Ideally that would be just SELECT privileges, but with the mentioned --opt option, LOCK TABLES privileges are required too. Here’s how you’d set that user up:

MySQL> CREATE USER backup IDENTIFIED BY 'randompassword';

I call the above script from a cron job on my local computer:


# Exit if any command fails
set -e 
# Don't allow use of unintialized variables
set -u 

# Set up some variables
NOW=$(date +"%Y-%m-%d-%H%M")

# Redirect standard output and error output to a log file.
exec > >(tee -a "${LOG_DIR}/${LOG_FILE}")
exec 2> >(tee -a "${LOG_DIR}/${LOG_FILE}" >&2)

mkdir -p $LOG_DIR

# The cool part: Run my local on the remote web server.
ssh maksle 'bash -s' < ~/bin/ $NOW

# Sync the remote server backup logs with the backups directory on my local machine. After all, what good are backups if your webserver is down and you can't access them?
rsync -havz --stats maksle:/home/private/backups/ $BACKUP_DIR

Of course, the remote server can get filled up with backups, so I have another script that removes any backups more than 5 days old. I continue to have as many as far back as I want on my local machine.


set -e
set -u

# Error out if a command in a pipe fails
set -o pipefail

# Usage example:
# /home/private/backups 5


# This would be 5 if called as in the Usage example 
declare -i allow=$2
# This gets the number of files in the directory, which we assume are all backup tgz files
declare -i num=$(ls | wc -l)

if [ $num -gt $allow ]; then
    # Remove all but latest files
    (ls -t | head -n $allow; ls) | sort | uniq -u | sed -e 's,.*,"&",g' | xargs rm -f

The above command works by first printing the latest 5 files, and then all the files. This way the latest 5 files get printed twice. This allows uniq -u to filter out the latest 5, and the rest of the files get sent to their slaughter. The intermediate sed -e 's,.*,"&",g' makes it work when there are spaces in the filenames by wrapping the filenames in quotes (avoid spaces in filenames).

Of course, I call this script via a local cron job as well.



exec > >(tee -a "${LOG_DIR}/${LOG_FILE}")
exec 2> >(tee -a "${LOG_DIR}/${LOG_FILE}" >&2)

ssh maksle 'bash -s' < ~/bin/ "/home/private/backups" 5

I hope that will help someone out!

Wordupress update script

WordPress offers the one-click update, but the file permissions required for that convenience are a security risk. For it to work, it essentially requires setting all files to the server group (usually web or apache or nobody user) and giving all those files group write permissions. Doing so trades security for convenience. Eventually there will be a security vector in the WordPress code, and with writeable PHP files everywhere, hackers will make short work of it.

WordPress provides manual updating instructions, and even gives a few code snippets here and there, but there’s really nothing there that should require human intervention. This little script updates WordPress to the latest version. The location of this script should be in a location on the web server not accessible to the web, which is /home/private/update-wp in my case.


set -u
set -e

# Cleanup from a previous call
rm -f latest.tar.gz
rm -rf wordpress
rm -rf backuptemp

# Get the latest, unzip it, and untar it
tar -xzvf latest.tar.gz

# The location of your wordpress install

# Copy these just in case
mkdir backuptemp
cp $blog/wp-config.php $blog/.htaccess backuptemp

# These are the files to be deleted as mentioned in the WordPress Manual Update link
rm $blog/wp*.php
rm $blog/license.txt $blog/readme.html $blog/xmlrpc.php
rm -rf $blog/wp-admin $blog/wp-includes

# Copy the files to overwrite what we have
# It will leave files alone that are in $blog/wp-content but not in the latest bundle which is what we want
rsync -avz wordpress/ "${blog}/"
cp backuptemp/wp-config.php backuptemp/.htaccess $blog

echo "DONE"

If something goes wrong you have your daily backups to save you (because you are backing things up, aren’t you?). I will write another post shortly showing my WordPress files and database backup script.