Tagedit.el for nxml-mode

If you use EMACS and have used lisp, you may have heard of paredit and smartparens. They let you operate on the abstract syntax tree directly, which can require a bit of a mind shift to get used to. As the saying goes: “If you think paredit is not for you then you need to become the kind of person that paredit is for.”

Check out this segment of a talk in which Magnar Sveen, one of my biggest EMACS inspirations, discusses paredit. Here is Magnar showing off his use of paredit.

If you have used or heard of paredit, then you may have also heard about tagedit. It basically brings some of paredit’s features to HTML editing. I’ve been using it for a while and it’s both a pleasure to use and a huge time saver.

For a while it had been bothering me that I couldn’t use those awesome features when working on XML. I felt there was just no reason why I should get to enjoy them in html-mode but not in nxml-mode. nxml-mode is the standard mode for XML in EMACS, and I use it heavily at work for editing XSLT files.

This past weekend I wrote tagedit-nxml.el, a small package that makes tagedit compatible with nxml-mode. The “problem” was that tagedit was made with html-mode in mind, which derives from sgml-mode and uses sgml-mode functions to traverse the document. nxml-mode, however, is not derived from sgml-mode but from text-mode, and traversing the document just doesn’t work the same way. Luckily, most of the functions I needed to modify are exposed by tagedit.el as overridable. After I showed the package to Magnar, the author of tagedit, he quickly added the override points I needed so I could avoid resorting to defadvice (for functions like forward-list and backward-sexp). I can’t wait to start using it at work. This was a lot of fun and I learned a lot of awesome elisp features.
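If you want to try it, setup is just a hook. Here is a minimal sketch, assuming tagedit is installed and tagedit-nxml.el is on your load-path and provides a tagedit-nxml feature:

;; Minimal setup sketch: load tagedit and the nxml compatibility layer,
;; then enable tagedit in every nxml-mode buffer.
(require 'tagedit)
(require 'tagedit-nxml)
(add-hook 'nxml-mode-hook
          (lambda ()
            (tagedit-mode 1)))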

XSLT dependency viewer for EMACS

I’ve written an XSLT dependency viewer for EMACS. It’s very similar to the package found here: http://www.thoughtcrime.us/software/xslt-dependency-graph/. However, that library is for XSLT 2.0, while I have to use XSLT 1.0 at work.

The parsing of the files to traverse the imports/includes is done in EMACS lisp, which generates a dot diagram. That is then piped into Graphviz’s dot visualization program and the result is opened in your favorite PDF viewer. Graphviz is like LaTeX, but for generating graphs of all kinds. Check out this Graphviz dot guide, which will give you an idea of what it is capable of. Pretty powerful stuff.
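The final step boils down to something like the following. This is a simplified sketch rather than the package’s actual code, and the function name is made up:

;; Simplified sketch: run Graphviz `dot' on a generated .dot file and
;; open the resulting PDF with the system's default viewer.
(defun my-xslt-deps-view (dot-file)
  "Render DOT-FILE to PDF with Graphviz and open it."
  (interactive "fDot file: ")
  (let ((pdf-file (concat (file-name-sans-extension dot-file) ".pdf")))
    (call-process "dot" nil nil nil "-Tpdf" dot-file "-o" pdf-file)
    ;; xdg-open is Linux-specific; adjust for your platform.
    (start-process "xslt-deps-view" nil "xdg-open" pdf-file)))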

First Pull Request

I have just made my first pull request on GitHub: https://github.com/magnars/expand-region.el/pull/148

My contribution was to Magnar Sveen’s awesome expand-region project, and the fix was for nxml-mode. Expanding the region inside an XML attribute was including the outer quotes before first expanding to just the inner contents. It was also not properly expanding to the attribute when there were namespaces in it. This fix addresses both.

Magnar messaged me that expand-region is headed for the EMACS core. Awesome! All contributors need to sign the Free Software Foundation copyright papers; see https://gnu.org/licenses/why-assign for the reasons. I went ahead, emailed assign@gnu.org, and signed away my copyright on this piece of code.

I’m pretty excited to see this go through, because it’s not every day that someone’s first-ever pull request makes it into a major FSF project, let alone EMACS core!

etags-update-mode

Just a few days ago I wrote my first EMACS minor mode, called etags-update-mode. It updates your TAGS file on save. It’s heavily inspired by another package/minor mode of the same name by Matt Keller.

In order to update the tags for a file on save, Matt’s etags-update-mode calls a Perl script that deletes any previous tags for that file from the TAGS file before appending the file’s new definitions. Also, in that package the minor mode is defined as a global minor mode.

I wanted the functionality that the package provided, but I didn’t want it to be a global minor mode (the only global minor mode I use and like having everywhere is YASnippet). I also didn’t see why there should be a dependence on Perl; I wanted to do it all in elisp.

So I wrote a much simpler version of etags-update-mode that is a regular (buffer-local) minor mode and does all its work in EMACS. I’ll be updating it as I continue to use it.
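The core idea fits in a few lines. The following is only an illustrative sketch, not the actual package: it appends the saved file’s tags with etags -a and skips the stale-entry cleanup that the real mode does in elisp.

;; Illustrative sketch of a buffer-local minor mode that refreshes tags
;; on save.  Not the real etags-update-mode: it only appends the saved
;; file's tags and does not remove stale entries first.
(define-minor-mode my-etags-update-mode
  "Refresh the visited TAGS file after each save (sketch)."
  :lighter " eTAGS"
  (if my-etags-update-mode
      (add-hook 'after-save-hook #'my-etags-update--refresh nil t)
    (remove-hook 'after-save-hook #'my-etags-update--refresh t)))

(defun my-etags-update--refresh ()
  "Append tags for the current file to the TAGS file this buffer uses."
  (when (and tags-file-name (buffer-file-name))
    (call-process "etags" nil nil nil
                  "-a" "-o" (expand-file-name tags-file-name)
                  (buffer-file-name))))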

EMACS etags

EMACS has an etags.el package that supports the use of etags, the EMACS version of ctags. It tags your source code so you can jump directly to the definition of a function, variable, or other symbol. I’ve been using it heavily with C++ and C# (though for C++ I’ve since replaced it with GNU Global, and there is an EMACS package for that too, ggtags).

I wanted the same functionality for XSLT, which I use heavily at work. Luckily, Exuberant Ctags and etags both let you extend support to other languages by supplying regular expressions.

I put the following regular expressions in ~/.ctags:

--langdef=xslt
--langmap=xslt:.xsl
--regex-xslt=/<xsl:template name="([^"]*)"/\1/
--regex-xslt=/<xsl:template match="[^"]*"[ \t\n]+mode="([^"]*)"/\1/
--regex-xslt=/<xsl:variable name="([^"]+)"/\1/

… and generated the TAGS file:

ctags -e -o TAGS *.xsl

I can now jump to the definition of any variable or template in my xsl files!
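To have EMACS pick the table up without prompting (instead of running M-x visit-tags-table by hand), a line like this in your init file is enough; the path is just a placeholder for your own project:

;; Tell EMACS where the generated TAGS file lives so that M-. (find-tag,
;; or xref-find-definitions in newer versions) works right away.
;; The path below is a placeholder; point it at your own project.
(setq tags-table-list '("~/projects/my-xslt-project/TAGS"))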

Learning LaTeX and the result compared to Word

LaTeX is a high-quality typesetting system. It is open source and, more importantly, it is Free Software. This past weekend I decided to learn some LaTeX, as it had been an interest in the back of my mind for some time. At first I was hesitant because LaTeX was made for mathematical and scientific papers, which I don’t write. The impetus for Donald Knuth, the author of the underlying language (TeX), was that there weren’t good tools for typesetting and displaying mathematical formulas. However, my concern was misguided. LaTeX can make any document look beautiful, and it can be used for any kind of article, book, or even a resume. What sets it apart from WYSIWYG editors like Microsoft Word is the sheer typographical quality of the documents you can produce. Under the hood, LaTeX’s algorithms calculate everything from line height to word and letter spacing. They can be adjusted as you like, and many packages exist that make the process easier.

My mother recently produced a book using Microsoft Word, and it was no easy task. The index-making process is difficult, headers inexplicably stop showing up correctly, page numbers stop respecting section boundaries, and blank pages pop up everywhere in the resulting PDF. Furthermore, the WYSIWYG nature of Word encourages you to fix spacing issues manually with the wrong tools, and if you are picking up from where someone else left off, then good luck reformatting everything. Even after reformatting, if text is changed and pages shift, you have to redo your work. I wanted to convince myself and my mother that the book could be produced with LaTeX in one or two nights and look better than its Word counterpart. I was able to do it in just one afternoon.

First, I found that LaTeX has a book document class, which you declare in the first line of your document. However, after adding more pieces to the puzzle, I learned about KOMA-Script, which provides drop-in replacements for the book (as well as article and report) classes, packed with some additional goodies. There is also the memoir class, which looked like an interesting alternative. In the end I used KOMA-Script’s replacement for book, the scrbook class:

\documentclass[12pt,letterpaper]{scrbook}

I found that I did not even need to install anything, as it was already included in the LaTeX distribution bundled with Ubuntu. I haven’t tried it yet, but I’ve read that MiKTeX is a good distribution for getting started with LaTeX on Windows, and that it makes it easy to use this and other useful packages. I plan on getting my mother to try MiKTeX once I show her my LaTeX version of the book (which undeniably looks better than the Word version). However, for this proof of concept I didn’t use any editors specially designed for LaTeX, as I of course was working in EMACS. EMACS has a LaTeX mode with useful key bindings and syntax highlighting. I immediately got started copying over all the chapter titles like this:

\chapter[Optional short name for the TOC]{My Very Long Chapter Name Here}

I did not have to wrap paragraphs in any tags; you simply skip a line to start a new paragraph.

This book had many quotations and blockquotes, and many of them were formatted improperly in Word; Word doesn’t make that easy. I didn’t have to worry about any of that, as in LaTeX I am only tagging them semantically, not styling them. Styling comes later, when you’re done tagging, though I found that even the default styling was impressive. Here’s what the markup looks like:

\begin{quote}
All that is gold does not glitter,
Not all those who wander are lost;
The old that is strong does not wither,
Deep roots are not reached by the frost.

From the ashes a fire shall be woken,
A light from the shadows shall spring;
Renewed shall be blade that was broken,
The crownless again shall be king.
\end{quote}

KOMA-Script’s scrbook also gives you useful variations on sectioning, such as \addsec and \minisec. The starred form \addsec* keeps the heading out of the Table of Contents (TOC), while \minisec produces a small unnumbered heading that never appears there.

\minisec{My mini subsection name}
Blah Blah

Creating the index was refreshingly sane. I simply went to the points of interest in the text, dropped in \index{key} tags, and I was done. Once that is in place, text can be added or removed and pages can shift, but there is no additional work to do, as it is all recalculated for you. All pages with the same key get listed in the index under the same entry, and sequential pages are smartly collapsed into ranges. Footnotes were just as easy, though this book used endnotes rather than footnotes. I googled for endnotes and found that there was already a package for them. Once again, I did not even have to download it, as it was already included in my LaTeX distribution. I wanted the endnote numbers to reset every chapter, as they do in the Word version, and there’s a package for that too.

\usepackage{endnotes,chngcntr}
\counterwithin*{endnote}{chapter}  % Reset endnote numbering every new chapter
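For reference, the index side needs only the standard makeidx setup. Here is a minimal, compilable sketch (run makeindex between LaTeX passes; build tools like latexmk handle that automatically):

\documentclass{scrbook}
\usepackage{makeidx}
\makeindex                 % collect \index entries into the .idx file

\begin{document}
\chapter{An Example Chapter}
A sentence mentioning the topic.\index{topic}

\printindex                % the index is typeset here, at the end
\end{document}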

This is a brief overview of some of the tags I used, which I hope highlights how easy this was to do. Then I generated the document directly to PDF. Without even thinking about styling yet, the document that came out was typographically stunning. With a couple of easy tweaks, I purposely made it look closer to the Word document, for comparison purposes, to highlight the superiority of the type produced by LaTeX. Unfortunately I can’t publish the “final” proof of concept here, as it is an entire book and I don’t hold the rights to it. It would not be entirely fair for me to omit that there is a learning curve with LaTeX, of course. However, I hope this helps anyone just starting out or curious about learning LaTeX.

First try at Data Munging

I’ve been taking the Udacity course Exploratory Data Analysis and decided that I wanted to try my hand at a real data set that I cared about. I ran into several obstacles that are probably common, and I hope this will help someone else.

The data I cared about was in SQL Server so first I got the data out:

bcp "select .. from .. where .." queryout data.dat -c -t"||||" -S server -U user -P pass

I chose “||||” as my delimiter because I was fairly sure that no value contained four consecutive pipe characters. It’s much easier to search for a good delimiter once the data is in a text file. Once the data was out, I searched through data.dat, found that there were no asterisks in the entire file, and replaced all “||||” with “*” as my delimiter, since R’s read functions want a single-character separator:

sed -i 's/||||/*/g' data.dat

I tried to load this into R with mydata <- read.csv("data.dat", sep="*") but ran into a problem:

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
line 2 appears to contain embedded nulls

I eventually realized that anything that was either NULL or an empty string in the SQL Server database came out as 0x00, a binary null character. EMACS displays the binary null as ^@. I replaced these binary marks with ‘NA’ in EMACS with M-x replace-string RET ^@ RET NA RET (you can type the ^@ in the minibuffer with C-q C-@). As a side note, you can position the cursor on any character you want to know about and do M-x describe-char; it will tell you a lot of information about it. Another way to replace the character, if you haven’t experienced the life- and file-altering wonders of EMACS, is:

sed -i 's/\x0/NA/g' data.dat

Now I tried read.csv again and it seemed to work without errors, but I noticed that the number of ‘observations’ R thought were in the file (dim(mydata)) was not the same as the number of lines in the file, so I knew something was wrong. To see the number of lines in a file, you can run wc -l data.dat in the terminal.
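The same sanity check can be done from within R (illustrative):

# Compare the line count on disk with the rows R actually parsed;
# readLines counts lines much like wc -l does.
length(readLines("data.dat"))
nrow(mydata)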

It took me quite some time to figure it out. The following finally worked correctly:

mydata <- read.table("data.dat", na.strings=c("", "NA"), sep="*", comment.char="", quote="")

?read.csv reveals that it actually calls read.table internally and makes some assumptions for you. One of those assumptions is sep=",", but we specified that ourselves. The ones that got me were comment.char and quote. read.csv assumes that comment.char is "", which disables commenting altogether; that is good (for my data), but read.table sets it to "#". Additionally, read.csv sets quote="\"" by default. Initially, after switching from read.csv to read.table, I started getting these types of errors:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 9237 did not have 8 elements

I checked the line it complained about, but it had 8 elements. I know that sometimes errors happen earlier than where the error message indicates. As a sanity check, I wrote this quick little ditty in Python to check the element count on each line:

#!/usr/bin/env python

# Print the (1-based) numbers of any lines that don't split into 8 fields.
badlines = []

with open('data.dat', 'r') as orders:
    for linenum, line in enumerate(orders, start=1):
        fields = line.split('*')
        if len(fields) != 8:
            badlines.append(linenum)

print(badlines)

However, this came back with an empty list, so I knew there was something else going on. Once I took a closer look at the documentation and set quote="", disabling quoting altogether, I finally had no errors and the correct number of observations.

Also, while in the help page for read.table/read.csv, I found that na.strings was helpful to tell R to interpret blank fields as NA. By setting na.strings=c("", "NA"), we're telling R to interpret both "" and "NA" as NA.

There's more data manipulation I may need to do but for now I can finally start looking at the data.

My Mistakes

TL;DR:

  • Being afraid to lose
  • Playing “hope chess”
  • Spending incredible amounts of time “studying” but not mentally exerting myself
  • Fixating on openings and trying to stupidly memorize chess
  • Relying too heavily on computer analysis
  • Not paying attention to the meta-learning process

I learned chess at around age 11 or 12 and soon after joined the Polgar Chess Club, which was in Queens, NY at the time. I was immediately one of the best kids there and regularly won almost all the scholastic tournaments. I became afraid to lose, and whenever the position got tough, even against players I’d be expected to lose to, I would often start shaking. I wish someone had taught me to invest in loss, a concept typically learned in the martial arts but applicable to any art and any kind of learning. I remember sometimes seeing Fabiano Caruana come to the club and play when he was only about 6 years old. I didn’t know back then that he would one day be one of the top 5 highest rated players in the world. It is not surprising to me now, as I recall that I never saw him cry or shiver at the chessboard when he was losing. Somehow, at a very young age, he had the right mindset.

Throughout my chess development I also often played “hope chess.” In the linked article, Dan Heisman breaks down what I call “hope chess” into more specific subcategories (one of which he calls “hope chess,” so hopefully this is not confusing), but I am using the term more generally. Sometimes I would sit and concentrate at the board for a long time, but it is hard to call what I was doing “thinking”; rather, it was just worrying and hoping. That is, either hoping that my opponent would make a mistake and fall for a trap, or hoping that the move I played would be good even though I hadn’t thought hard about my opponent’s replies.

As a kid I also recall that I had (in my opinion) the work ethic of a champion. I had incredible discipline and would spend hours going through chess positions in my books. Despite this effort, I never achieved the type of success I thought I should have. I was at some point in the top 100 list for my age (though nowhere near the top of that list), but soon after hitting my first real slump, I lost interest in chess entirely (due to depression) and stopped playing for some years.

Some time later, I had a resurgence and put in what felt like a “last-ditch effort.” I began to “study” chess again, not realizing that I was about to repeat my past mistakes. I recalled that my childhood friend Lev Milman, who is now an International Master, had once made significant improvements studying on his own. When I asked him what his method was, he said that he would “Fritz everything.” Fritz is a computer chess engine of at least Grandmaster strength. I took his advice a bit too literally: I played many blitz games online and “analyzed” all of them with whatever the top engines were at the time. I built up an impressively sized database of all my so-called analyses, covering a tremendous breadth of opening and middle game positions that I had run into, or was likely to run into, in my games.

I developed some good techniques for using chess engines to aid in the analysis and understanding of a position, and even a sophisticated sense of which engines to use for which positions. Some of those skills are useful. In the end, however, I had no noticeable progress to show for my efforts, and I once again dropped chess almost entirely for a few more years. I can see now that I had tried too hard (once again) to use my work ethic to improve, and bypassed the kind of real mental effort that real learning requires. Chess is a complicated game that can’t be memorized and brute-forced. This is obvious from an objective standpoint, but I see many people falling into this same trap of relying on sheer effort and will. It is from Tim Ferriss that I learned of the Pareto Principle: for most events, roughly 80% of the effects come from 20% of the causes. Sometimes it is more like 99% and 1%.

I’ve made many mistakes in my learning process, and some of the worst ones are psychological. The biggest mistake is not particular to chess so much as to learning in general. When you’re trying to improve at anything, it’s important to be critical of the process itself: everything from the plan you set out for yourself to the actual mental processes you go through in executing that plan. Just as important is to be keenly aware of your emotions. Sometimes, as it was with me, it is just when you are making a breakthrough that you feel like giving up, and it is when you feel most confident that you make your biggest mistakes.

I’ve outlined here some of the mistakes I’ve made and the utter failures that have led me to quit chess twice now. In future posts I want to outline and dive into their opposites—positive changes that I’ve made which are contributing to my improvement.

Basic Mistakes To Avoid: Improve Your Chess Significantly

I posted the following answer on Quora quite a while ago. It was my first post on Quora (one of my only posts there so far), written just to see how the site works, but it turned out to be popular, so I figured: what better first real post here than a test-proven success?

Here’s that post:

  1. An amateur’s chess game is most improved by avoiding cheap tactics. Amateurs often spend too much time on ‘strategy,’ trying to look at the positional pros and cons of hypothetical situations 10 moves ahead, only to be forked by a pawn on the next move.
  2. Amateurs make the mistake of spending too much time studying openings. It is possible to become a 2200 player on almost no opening knowledge, by improving your tactics.
  3. A common mistake, not just for amateurs, is to avoid thinking about certain moves as if considering them meant playing them. For example, a person will often avoid considering a Rook sacrifice for a pawn because their brain immediately attaches a negative feeling to such a move (fear of losing material). Being aware of that lets you give yourself permission to contemplate said brilliant Rook sacrifice.
  4. Having said all this, when no tactics are in sight, some general guidelines are available to guide your play. They are to
  • control the center (not by occupying those squares, but by having pieces attack them. For example, a knight on f3 controls the center squares d4 and e5, while a knight jumping to e5 releases control of those squares).
  • avoid having any hanging/unprotected pieces, or fortify those pieces.
  • avoid letting your opponent occupy any advanced posts for too long.
  • double check your move for any tactical errors before it is played.

Looking back at this post, I would replace the word “tactics” in point #2 with “chess vision,” because tactics, in the sense the word is usually defined, are only one (albeit important) piece of the puzzle that must be developed when improving chess vision. I want to go into this in more detail in future posts, because that difference is pivotal to understanding how to study chess. Also, the list here is rather simplistic and misses many other important parts of the thought process, but I’ll be sure to get into that in future posts!
