Monday 9 November 2015

Packer, Vagrant, CentOS, VirtualBox, Docker and so on

At work we're using Docker to easily package our applications into predictable, repeatable bunches, the recipes for which can also easily be pushed into Git, complete with diffs, code reviews, pull requests etc. This is pretty cool and Docker is pretty cool.

Looking deeper into how Docker works I noticed this post called "Docker image insecurity". I don't know if the situation is still the same as described there, but it was (and maybe is) pretty bad. However, it seems that in a big company setting, one would not want to rely on public Docker images anyway; creating your own private Docker registry, containing only images properly vetted and verified by your company's security team along with each of their Dockerfiles, should still be alright.

Out of interest I put on my techops hat (for the first time in a long time) and started looking at how one might arrive at such a secure Docker image, starting from scratch. As it happens I didn't quite get to the Docker part. Rather, I figured out how to use Packer to download and verify a Linux ISO image, which can then be automatically installed into a virtual machine and used as a base image. This answers a different but potentially related question: given such a known-good complete VM image, smaller Docker images could then be partitioned off on a "list of files" basis, for example using one of the many Docker image creation scripts, or just rolling your own. This would enable the entire chain of software to be specified in the config files stored in your internal repository, easily verified by your security team and easily improved and tweaked by developers and techops.
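
The "verify" part of that step boils down to hashing the downloaded file and comparing the result against a known-good digest. Here is a minimal sketch of that idea in Java, assuming SHA-256 as the checksum type. The class and method names (ChecksumSketch, sha256Hex, matches) are made up for illustration; this is not how Packer itself implements the check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of Packer.
final class ChecksumSketch {

    // Streams the input through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // True if the file's digest equals the expected hex string (case-insensitive).
    static boolean matches(Path file, String expectedHex) {
        try (InputStream in = Files.newInputStream(file)) {
            return sha256Hex(in).equalsIgnoreCase(expectedHex.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The expected digest would come from a trusted source (for example the distribution's signed checksum file) and live in the same repository as the rest of the build configuration, so that it gets reviewed like everything else.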

I think the main reason to use tools like Packer and Docker is that they enable easy automation of otherwise tedious and error-prone installation and creation of base systems, and that they also make this process easy enough to verify and secure, given proper support for checksums etc (which hopefully exists in Docker by now). This should make all our lives easier.

Tuesday 8 September 2015

Recurrent neural networks

Just got back from one of the best meetups I've been to, featuring Andrej Karpathy going through his Recurrent Neural Networks tutorial in detail. Basically, recurrent neural networks are powerful enough that even with a very low-level model, such as a character-based one, an RNN can still learn words, spelling, upper- and lowercase letters, punctuation, line lengths etc stupendously well - and even LaTeX or C code.

Another good presentation was Semantic Image Segmentation, a different application for recurrent neural networks using a more complicated model.

An interesting takeaway is that when specifying neural network models one desirable feature is differentiability of each transform, which enables the use of straightforward stochastic gradient descent for model fitting. And apparently, even though SGD is a fairly simple technique and thus somewhat too rough in some respects, there exist fairly simple improvements such as AdaGrad that make it much better. It seems that RNNs in general are capable of being very expressive while also being relatively easy to implement using e.g. AdaGrad. Good stuff all around.
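
To make the AdaGrad idea a bit more concrete: compared to plain gradient descent, it keeps a running sum of squared gradients per parameter and divides each step by the square root of that sum, so every parameter gets its own shrinking step size. The following is only a rough sketch under simplifying assumptions. It uses the exact gradient of a toy quadratic instead of stochastic mini-batch gradients, and the names (AdaGradSketch, minimize) are mine, not from the talk.

```java
import java.util.function.UnaryOperator;

// Toy AdaGrad sketch; names and the example function are assumptions for illustration.
final class AdaGradSketch {

    // Runs a fixed number of AdaGrad steps from 'start' and returns the final parameters.
    static double[] minimize(UnaryOperator<double[]> gradient, double[] start,
                             double learningRate, int steps) {
        double[] w = start.clone();
        double[] sumSquares = new double[w.length]; // per-parameter sum of squared gradients
        final double epsilon = 1e-8;                // avoids division by zero
        for (int t = 0; t < steps; t++) {
            double[] g = gradient.apply(w);
            for (int i = 0; i < w.length; i++) {
                sumSquares[i] += g[i] * g[i];
                // Plain gradient descent would be: w[i] -= learningRate * g[i];
                w[i] -= learningRate * g[i] / (Math.sqrt(sumSquares[i]) + epsilon);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // f(w) = (w0 - 3)^2 + 10 * (w1 + 2)^2, minimized at (3, -2).
        double[] result = minimize(
                w -> new double[] { 2 * (w[0] - 3), 20 * (w[1] + 2) },
                new double[] { 0.0, 0.0 }, 1.0, 500);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

Note how the two coordinates have very differently scaled gradients, yet the same learning rate works for both, because each step is normalized by that coordinate's own gradient history.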

Edited to add: Karpathy's slides.
Edited to add: Romera's slides.

Saturday 15 August 2015

Jaynes: Probability Theory & Gödel's incompleteness theorem

I've recently been dabbling in statistics and probability, and it was only a matter of time before my attention was drawn to the book Probability Theory: The Logic of Science by E. T. Jaynes. In it, Jaynes proposes to start from the smallest possible set of common-sense axioms and proceed to derive more or less the entire theory of probability, demonstrating how desirable properties such as consistency and paradox-free reasoning can thus be achieved for the whole system.

I decided to buy this book, and at 50 pages in (of about 700 pages total) I can already say it will be worth the full price. Here's a quote to show what I mean (chapter 2.6.2, pp. 45-46):

To understand [Gödel's incompleteness theorem], the essential point is the principle of elementary logic that a contradiction A and not-A implies all propositions, true and false. -- Then let A = {A1, A2,..., An} be the system of axioms underlying a mathematical theory and T any proposition, or theorem, deducible from them:

A => T.

Now, whatever T may assert, the fact that T can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, T could certainly be deduced from them!

This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. --
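
The "principle of elementary logic" the quote leans on can be stated very compactly. As a small sketch in Lean (the theorem name explosion is my own choice), a contradiction A and not-A yields any proposition T whatsoever:

```lean
-- From a contradiction (A and not-A), any proposition T follows.
theorem explosion (A T : Prop) (h : A ∧ ¬A) : T :=
  absurd h.1 h.2
```

This is why deducing some T from the axioms says nothing about their consistency: an inconsistent set of axioms would let you deduce that same T too.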

Recommended.

Monday 13 July 2015

Silly Scala tricks, part 1

I've recently been working (very slowly, but still) on a hobby project that I'm writing with Scala. In the first steps there's some basic object-oriented modeling to be done, which has served as a good introduction to/reminder of how Scala works in that respect and how things are best arranged in it. It's also been interesting to compare and contrast this to Java.

The project is a simple game. There are two pets fighting each other. A pet has six base skills, in three slots of two skills each. Before the game, each player chooses a skill for each slot to use in that game. So you'll choose between skills 1A and 1B for the first slot, skills 2A and 2B for the second slot etc. Most skills will simply deal damage to the other pet; some have cooldowns, damage-over-time effects, healing effects and so on.

I decided to model the game structure as a tree, where a Game has two Pets who each have three Skills, with both the Pets and their Skills linking to each other for convenience. (Not sure if this is the best way to do it, but it'll be good enough.)

So let's see how to model this stuff in Scala. For brevity I'll list just the class definitions, with methods omitted. Start with skills, where the most common type of skill is one that damages the other pet:

abstract class Skill(val pet: Pet, val cooldown: Int = 0)
abstract class DamageOther(val family: Family, val baseDamage: Int, pet: Pet, cooldown: Int = 0) extends Skill(pet, cooldown)
case class Zap(p: Pet) extends DamageOther(Mechanical, 20, p)

Pretty simple and straightforward. Basically, fields can be declared right there in the class "headline" (val means roughly the same as final in Java), and default parameter values are supported to reduce hassle.

Now let's see how the same would look in Java:

public abstract class Skill {

        public final Pet pet;
        public final int cooldown;

        public Skill(Pet pet) {
                this(pet, 0);
        }

        public Skill(Pet pet, int cooldown) {
                this.pet = pet;
                this.cooldown = cooldown;
        }
}
public abstract class DamageOther extends Skill {

        public final Family family;
        public final int baseDamage;

        public DamageOther(Family family, int baseDamage, Pet pet) {
                super(pet);
                this.family = family;
                this.baseDamage = baseDamage;
        }

        public DamageOther(Family family, int baseDamage, Pet pet, int cooldown) {
                super(pet, cooldown);
                this.family = family;
                this.baseDamage = baseDamage;
        }
}
public class Zap extends DamageOther {

        public Zap(Pet pet) {
                super(Family.MECHANICAL, 20, pet);
        }
}

Ugh, right? The code is not horrible as such, but it's pretty clunky. The worst offender to me is Zap; with Java, there's just no way to compactly define actual individual things like a Skill in a way that would make you want to list 20 of them in the same data file. This kind of easy "in-program data definition" is just inelegant in Java.

How about the pets themselves? Here we want to do two things: define individual pets which have certain base skills and attributes; and then for a game, pick one of these and select just the skills to use this time. Let's see this in Java first:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public abstract class Pet {

        public final String name;
        public final Family family;
        public final int baseHealth;
        public final int baseAttack;
        public final int baseSpeed;
        public final List<Skill> baseSkills;
        public final SkillChoice sc1;
        public final SkillChoice sc2;
        public final SkillChoice sc3;
        public final List<Skill> skills;

        /**
        * @param baseSkills In the order S1A, S2A, S3A, S1B, S2B, S3B
        */
        public Pet(String name, Family family, int baseHealth, int baseAttack, int baseSpeed, List<Skill> baseSkills, SkillChoice sc1, SkillChoice sc2, SkillChoice sc3) {
                if(baseSkills == null || baseSkills.size() != 6) {
                        throw new IllegalArgumentException("baseSkills must be non-null and contain exactly 6 things");
                }
                this.name = name;
                this.family = family;
                this.baseHealth = baseHealth;
                this.baseAttack = baseAttack;
                this.baseSpeed = baseSpeed;
                this.baseSkills = Collections.unmodifiableList(baseSkills);
                this.sc1 = sc1;
                this.sc2 = sc2;
                this.sc3 = sc3;
                final List<Skill> s = new ArrayList<>();
                s.add(this.baseSkills.get(this.sc1 == SkillChoice.SC1 ? 0 : 3));
                s.add(this.baseSkills.get(this.sc2 == SkillChoice.SC1 ? 1 : 4));
                s.add(this.baseSkills.get(this.sc3 == SkillChoice.SC1 ? 2 : 5));
                this.skills = Collections.unmodifiableList(s);
        }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + baseAttack;
        result = prime * result + baseHealth;
        result = prime * result
                + ((baseSkills == null) ? 0 : baseSkills.hashCode());
        result = prime * result + baseSpeed;
        result = prime * result + ((family == null) ? 0 : family.hashCode());
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + ((sc1 == null) ? 0 : sc1.hashCode());
        result = prime * result + ((sc2 == null) ? 0 : sc2.hashCode());
        result = prime * result + ((sc3 == null) ? 0 : sc3.hashCode());
        result = prime * result + ((skills == null) ? 0 : skills.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Pet other = (Pet) obj;
        if (baseAttack != other.baseAttack)
            return false;
        if (baseHealth != other.baseHealth)
            return false;
        if (baseSkills == null) {
            if (other.baseSkills != null)
                return false;
        } else if (!baseSkills.equals(other.baseSkills))
            return false;
        if (baseSpeed != other.baseSpeed)
            return false;
        if (family != other.family)
            return false;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        if (sc1 != other.sc1)
            return false;
        if (sc2 != other.sc2)
            return false;
        if (sc3 != other.sc3)
            return false;
        if (skills == null) {
            if (other.skills != null)
                return false;
        } else if (!skills.equals(other.skills))
            return false;
        return true;
    }
}

Plenty of boilerplate, as always, but it's understandable enough.

Now in case you haven't noticed, I like things being immutable when they don't need to be mutable - for instance, the base skills and chosen skills for the pets just don't need to change over the course of the game. So for the actual pets, what I'd really like to do is something like the following:

public class LilXT extends Pet {

        public LilXT(SkillChoice c1, SkillChoice c2, SkillChoice c3) {
                super("Lil' XT", Family.MECHANICAL, 1546, 322, 228,
                        listOf(new Zap(this) // error: Cannot refer to 'this' nor 'super' while explicitly invoking a constructor 
                                // , other skills...
                                ),
                                c1, c2, c3);
        }

        private static List<Skill> listOf(final Skill... skills) {
            final List<Skill> l = new ArrayList<>();
            for(Skill s: skills) {
                l.add(s);
            }
            return l;
        }
}

But of course that cannot work, since we can't both refer to this and also be constructing it at the same time. So we're forced to do a two-part construction instead, where we first set everything else, then create the skills and link them up with this, then set this.skills. Meh. This, again, is not the end of the world - it works, but it is a bit clunky. (What happens if someone calls setSkills() a second time? You'll have to remember to check for that, which adds more boilerplate.)
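
For reference, here is roughly what that two-part construction could look like. This is a stripped-down sketch with simplified stand-in classes (a Skill with just a name and a Pet with just a name and skills), not the full classes from above. The guard in setSkills() is the extra boilerplate I was complaining about.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for illustration; not the full Pet/Skill classes from the post.
final class TwoPhaseSketch {

    static class Skill {
        final String name;
        final Pet pet; // link back to the owning pet
        Skill(String name, Pet pet) { this.name = name; this.pet = pet; }
    }

    static class Pet {
        final String name;
        private List<Skill> skills; // cannot be final: it is set after construction

        Pet(String name) { this.name = name; }

        // Second phase of construction; guarded so that it can only run once.
        final void setSkills(List<Skill> skills) {
            if (this.skills != null) {
                throw new IllegalStateException("skills already set");
            }
            this.skills = Collections.unmodifiableList(new ArrayList<>(skills));
        }

        List<Skill> getSkills() {
            if (skills == null) {
                throw new IllegalStateException("skills not set yet");
            }
            return skills;
        }
    }

    static class LilXT extends Pet {
        LilXT() {
            super("Lil' XT");
            // 'this' may be used once the super(...) call has returned.
            List<Skill> s = new ArrayList<>();
            s.add(new Skill("Zap", this));
            s.add(new Skill("Repair", this));
            setSkills(s);
        }
    }
}
```

It works, but the skills field is no longer final, and both accessors need a state check that the compiler cannot help with.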

Can we do better? Actually, with Scala, we kinda can. I'm not sure if the following is the best or most sane way of doing things, but I found it pretty cool.

In Scala you can override not just methods, but values. And I love that. So I figured I could leave an abstract Pet's base skills undefined (an abstract val, which is effectively null until a subclass initializes it), and override that in the actual implementing subclasses. This way each actual pet is very clean to construct:

abstract class Pet(val name: String, val family: Family, val baseHealth: Int, val baseAttack: Int, val baseSpeed: Int, val sc1: SkillChoice, val sc2: SkillChoice, val sc3: SkillChoice) {
  val baseSkills: List[Skill]
  lazy val skills: List[Skill] = {
    val s = baseSkills
    List(
      (s(0), s(3), sc1),
      (s(1), s(4), sc2),
      (s(2), s(5), sc3)
    ) map { case (a,b,c) => if(c == C1) a else b }
  }
}
case class LilXT(s1: SkillChoice, s2: SkillChoice, s3: SkillChoice) extends Pet("Lil' XT", Mechanical, 1546, 322, 228, s1, s2, s3) {
  override val baseSkills = List(Zap(this), Repair(this), XE321Boombot(this), Thrash(this), Heartbroken(this), TympanicTantrum(this))
}

So what's going on here? To clarify, let's follow what happens when a new LilXT is constructed:
  1. I've decided I want to use a Lil' XT as my pet. So, to construct a Lil' XT instance, I decide which of the two skills I want for each slot and pass those to the constructor, as in LilXT(1,2,1).

  2. The constructor for LilXT calls the Pet superclass constructor, with the hardcoded arguments "Lil' XT" (name), Mechanical (family), and the appropriate stats; and with the three SkillChoices I just gave to LilXT. The Pet is constructed with those arguments.

  3. Now the Pet's baseSkills are null at this point, so trying to figure out the SkillChoice stuff directly at construction time would cause a null pointer exception. This is the same problem as before, where we have a chicken-and-egg dependency in the constructor.

    So here in Scala, what I did is I made skills a lazy val; this means it's not resolved immediately, but only when needed. So the fancy map computation thing isn't actually executed yet, it's just "remembered".

  4. The rest of the stuff in the LilXT class definition is run. This will override the baseSkills value with the default base skills that a LilXT has, which are constructed at this time, with the pointer back to this.

  5. The skills variable of the LilXT object is now ready to be accessed; the first time it's accessed, the defining code is run, the skills variable is populated based on the baseSkills and the skillChoices, and things work.

This is pretty alright. Everything is a val, all the lists are immutable, and stuff works. The definitions of the actual concrete things are concise and clear, and overall there's much less pointless busywork code than in Java.
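
For Java-minded readers, the lazy val behaves roughly like a getter that computes its value on first access and caches it. Below is a sketch of that idea with simplified stand-ins. Since Java cannot override fields, baseSkills becomes an abstract method, and Choice, Skill and the other names are assumptions for illustration. This version is single-threaded only, whereas Scala's own lazy val initialization is synchronized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for illustration; not a translation of the full Scala classes.
final class LazySkillsSketch {

    enum Choice { A, B }

    static final class Skill {
        final String name;
        final Pet pet; // link back to the owning pet
        Skill(String name, Pet pet) { this.name = name; this.pet = pet; }
    }

    static abstract class Pet {
        private final Choice[] choices;
        private List<Skill> skills; // cache; stays null until skills() is first called

        Pet(Choice c1, Choice c2, Choice c3) {
            this.choices = new Choice[] { c1, c2, c3 };
            // Calling baseSkills() here would return null for LilXT below:
            // the subclass field initializers have not run yet.
        }

        // Expected order: 1A, 2A, 3A, 1B, 2B, 3B
        abstract List<Skill> baseSkills();

        // Computed on first access, then remembered (roughly what a lazy val does).
        List<Skill> skills() {
            if (skills == null) {
                List<Skill> base = baseSkills();
                List<Skill> chosen = new ArrayList<>();
                for (int slot = 0; slot < 3; slot++) {
                    chosen.add(base.get(choices[slot] == Choice.A ? slot : slot + 3));
                }
                skills = Collections.unmodifiableList(chosen);
            }
            return skills;
        }
    }

    static final class LilXT extends Pet {
        // Initialized after the Pet constructor has finished, so 'this' is available.
        private final List<Skill> base = Arrays.asList(
                new Skill("Zap", this), new Skill("Repair", this), new Skill("XE321Boombot", this),
                new Skill("Thrash", this), new Skill("Heartbroken", this), new Skill("TympanicTantrum", this));

        LilXT(Choice c1, Choice c2, Choice c3) { super(c1, c2, c3); }

        @Override
        List<Skill> baseSkills() { return base; }
    }
}
```

It does the job, but compare the amount of ceremony here to the single lazy val line in the Scala version.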

I like Scala.