Tuesday, 8 September 2015

Recurrent neural networks

Just got back from one of the best meetups I've been to, featuring Andrej Karpathy going through his Recurrent Neural Networks tutorial in detail. Basically, recurrent neural networks are powerful enough that even with a very low-level model, such as a character-based one, an RNN can still learn words, spelling, upper- and lowercase letters, punctuation, line lengths, etc. stupendously well, and even LaTeX or C code.
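
To make the character-level idea a bit more concrete, here is a minimal sketch in Python with NumPy of a vanilla RNN consuming one character at a time and producing a distribution over the next character. This is my own illustration, not Karpathy's code; all the names, sizes and the tiny vocabulary are made up for the example.

    import numpy as np

    # Toy character-level RNN forward pass: one-hot characters in,
    # next-character probabilities out.
    vocab = sorted(set("hello world"))              # tiny character vocabulary
    char_to_ix = {ch: i for i, ch in enumerate(vocab)}
    vocab_size, hidden_size = len(vocab), 16

    rng = np.random.default_rng(0)
    Wxh = rng.normal(scale=0.01, size=(hidden_size, vocab_size))   # input -> hidden
    Whh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))  # hidden -> hidden
    Why = rng.normal(scale=0.01, size=(vocab_size, hidden_size))   # hidden -> output
    bh, by = np.zeros(hidden_size), np.zeros(vocab_size)

    def step(h, ch):
        """Consume one character, return new hidden state and next-char probabilities."""
        x = np.zeros(vocab_size)
        x[char_to_ix[ch]] = 1.0                  # one-hot encode the character
        h = np.tanh(Wxh @ x + Whh @ h + bh)      # recurrent state update
        y = Why @ h + by                         # unnormalized scores over next characters
        p = np.exp(y - y.max()); p /= p.sum()    # softmax
        return h, p

    h = np.zeros(hidden_size)
    for ch in "hello":
        h, p = step(h, ch)
    print("most likely next char:", vocab[int(p.argmax())])

With untrained weights the prediction is essentially random; the point is only that the hidden state carries context from one character to the next, which is what lets a trained model pick up spelling, punctuation and so on.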

Another good presentation was on Semantic Image Segmentation, a different application of recurrent neural networks using a more complicated model.

An interesting takeaway is that when specifying neural network models, one desirable feature is differentiability of each transform, which enables the use of straightforward stochastic gradient descent for model fitting. And apparently, even though SGD is a fairly simple technique and therefore somewhat crude in some respects, there are fairly simple improvements such as AdaGrad that make it work much better. It seems that RNNs in general can be very expressive while also being relatively easy to train using e.g. AdaGrad. Good stuff all around.
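
For the curious, here is roughly what the AdaGrad idea looks like, sketched in Python with NumPy on a toy quadratic objective. The names and constants are my own illustration: each parameter accumulates the history of its squared gradients, and its effective step size shrinks accordingly.

    import numpy as np

    # AdaGrad update rule on a toy problem: per-parameter learning rates,
    # scaled down by the running sum of squared gradients.
    learning_rate, eps = 0.1, 1e-8
    theta = np.array([1.0, -2.0])        # parameters being fitted
    grad_sq_sum = np.zeros_like(theta)   # running sum of squared gradients

    def loss_grad(theta):
        # Example objective: a simple quadratic bowl, so the gradient is 2 * theta.
        return 2.0 * theta

    for _ in range(100):
        g = loss_grad(theta)
        grad_sq_sum += g * g                                        # accumulate g^2 per parameter
        theta -= learning_rate * g / (np.sqrt(grad_sq_sum) + eps)   # per-parameter step

    print(theta)   # moves toward the minimum at [0, 0]

The appeal is that parameters with consistently large gradients get automatically damped while rarely-updated ones keep taking reasonably sized steps, which is a small change on top of plain SGD.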

Edited to add: Karpathy's slides.
Edited to add: Romera's slides.