Natural Language Processing: The Age of Transformers

baalimago · on Aug 30, 2019

>Next we shall take a moment to remember the fallen heroes, without whom we would not be where we are today. I am, of course, referring to the RNNs - Recurrent Neural Networks, a concept that became almost synonymous with NLP in the deep learning field.

XLNet (https://arxiv.org/abs/1906.08237) is in essence a recurrent neural network: it uses a transformer (which is itself built from neural networks) that recurrently carries context from one segment of text to the next. But it is true that the gated RNNs, such as AWD-LSTM and GRU, are losing ground to the superior transformer architectures.

That's my only complaint, though; excellent theoretical introduction.

Although, if anyone wants to actually implement a transformer, beware that you will want a GPU with 8+ GB of memory available, or be prepared to use cloud computing (Google Colab is free, for now). Training neural networks is still quite hardware-dependent.

karambahh · on Aug 30, 2019

Scaleway (where the author of this post works, as I do) is a cloud service provider with a pretty interesting GPU instance: a 16 GB NVIDIA Tesla P100 at 1€ per hour.

bitL · on Aug 30, 2019

RNNs are still useful for genuinely time-dependent sequences such as activity detection or self-driving car steering, though even those are being enhanced with attention. The use of RNNs in NLP was more of a necessity, as there were no other deep learning models capable of delivering results on the arguably sequential nature of language (let's say that is a quite imperfect assumption). Because attention allows viewing the whole input at once, it's easier for a non-linear optimizer to set meaningful weights without resorting to recurrence, though that comes at a massive memory cost (i.e. forget about using a 2080 Ti for NLP).
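To make the "whole input at once" point and its memory cost concrete, here is a minimal pure-Python sketch of single-head self-attention. It assumes identity query/key/value projections to stay short (real layers learn these), and the helper `attention_matrix_bytes` is a hypothetical name introduced only for this illustration. The thing to notice is that the weight matrix is n x n, so its size grows quadratically with sequence length.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption: identity Q/K/V projections, so queries, keys and values
    are all just the input vectors. Real transformer layers learn them.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    # scores[i][j]: how strongly position i attends to position j.
    # Every position sees every other position, hence an n x n matrix.
    scores = [[sum(a * b for a, b in zip(xi, xj)) / scale for xj in x]
              for xi in x]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all input vectors.
    out = [[sum(w * xj[c] for w, xj in zip(row, x)) for c in range(d)]
           for row in weights]
    return out, weights

def attention_matrix_bytes(seq_len, bytes_per_value=4):
    """Size of one n x n attention matrix (hypothetical helper)."""
    return seq_len * seq_len * bytes_per_value

random.seed(0)
x = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]  # 6 tokens, dim 8
out, weights = self_attention(x)
print(len(weights), "x", len(weights[0]), "attention weights")

for n in (512, 1024, 2048):
    print(n, "tokens:", attention_matrix_bytes(n), "bytes per matrix")
```

Doubling the sequence length quadruples the size of each attention matrix, and a real model keeps one such matrix per head, per layer and per example in the batch during training, which is roughly where the memory pressure comes from.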

abhgh · on Aug 30, 2019

I was going to mention XLNet before I saw your comment.

Also, a recent piece of interesting work [1] shows that with the right control parameters, you could still use gated RNNs, like LSTMs, for pretty good language modeling.

[1] http://www.abigailsee.com/2019/08/13/what-makes-a-good-conve...

sgt101 · on Aug 30, 2019

If anyone wants to use these tools practically, I urge you to have a good look at this paper: https://www.aclweb.org/anthology/P19-1439/

My takeaway: pretraining achieves excellent results on paper, but robust application is hard. There is still quite a way to go down this road for fault-intolerant users and applications.

Jack000 · on Aug 30, 2019

I'm not an expert, but after playing with some pre-trained transformers I think they are mostly good at the exact thing they're trained for. For example, GPT-2 is great for text generation, but if you try to use it for, say, translation, it will tend to add imagined details not in the source text. Similarly, BERT is great at sequence and token-level classification but quite bad at text generation.

bra-ket · on Aug 30, 2019

Previous discussion on “Attention is all you need”: https://news.ycombinator.com/item?id=15938082

Recent work by Jakob and team: https://ai.google/research/people/author37567/

Image transformer is particularly interesting.

cjauvin · on Aug 30, 2019

Also, recent discussion about a very good intro article about Transformers: https://news.ycombinator.com/item?id=20773992

kuu · on Aug 30, 2019

I was about to share it too, really GREAT intro :)

macando · on Aug 30, 2019

"This spring, the Elon-Musk-founded AI research lab OpenAI made a splash with an AI system that generates text. It can write convincing fake reviews, fake news articles, and even poetry."

You can check it out for yourself. https://talktotransformer.com/ Pretty awesome.

3pt14159 · on Aug 30, 2019

For the lazy:

<human>The bridge wasn't quite right. There were small sparks coming off the incomplete sections of material hanging just to the right of the traffic lights. Gary wasn't worried, however. He walked up carefully.</human><computer> "This is it, guys!"

A huge explosion rocked the bridge, and a fireball engulfed the entire bridge. "Shit! Shit!"

It seemed as though the fire was just blowing out, and the bridges were back in place. Gary was a little concerned, but it wasn't a problem at the time. At least it was under control.

He walked forward, peering over the top. "This is terrible. I'm not leaving the village tonight! How about a break for some water and a drink?"

When he looked down, the large chunk he had just taken off of the bridge was gone. As soon as he realized it was gone, he turned back to the village to check to see what had happened. Not sure what to do, he continued down the hill, heading toward the village at all times. The entire town was completely surrounded, and everyone was either inside or had</computer>

Not bad, but it's strange that it autogenerated text with multiple spaces at the end of sentences. Also, it is far more dramatic than I would have guessed.

noobiemcfoob · on Aug 30, 2019

It's not just "not bad." It's scary. It makes enough coherent sense from sentence to sentence that I doubt my mom (missing a few marbles) would notice it doesn't make a whole lot of sense. Combine this with some official-looking logos and a request for money for a fine.

Determining an actual official government or other institutional request is going to get much harder.

sean2 · on Aug 31, 2019

I don't see the danger here though; how is this scarier than what a couple of guys in Nigeria could concoct to fool your mom? Any English speaker can still put together a much more coherent and official-sounding institutional request.

ReDeiPirati · on Aug 30, 2019

Two researchers from Brown University were able to reproduce the full, never-released GPT-2 model a couple of days ago. They wrote an amazing blog post and released the weights!! Here's the blog with all the details: https://medium.com/@vanya_cohen/opengpt-2-we-replicated-gpt-...

For those of you who don't know what GPT-2 is, here's the simplest & fastest way with a bit of humor: https://blog.floydhub.com/gpt2/

p1esk · on Aug 30, 2019

They spent $500k doing it (in research credits for compute). Not sure if this is the best way to spend so much money, considering they made no novel contribution to the NLP field, and OpenAI would most likely have released it soon anyway.

julien_c · on Aug 30, 2019

Also https://transformer.huggingface.co

Disclaimer: built it.

p1esk · on Aug 30, 2019

What was most challenging about building it?

hint23 · on Aug 30, 2019

You can also look at http://textsynth.org .

mark_l_watson · on Aug 30, 2019

Good explanation of transformers, and the history leading up to them. I look forward to the next installment covering BERT.

As someone who spent a lot of time trying to manually code up solutions to anaphora resolution (pronoun coreference), BERT seemed like a small miracle to me. As a side comment: I love that getting training data for BERT is so cheap. Take any text source, randomly remove words, and the target output is predicting the words that were removed.
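The "randomly remove words" recipe can be sketched in a few lines. This is a simplified illustration rather than BERT's actual preprocessing: the function name `make_masked_example`, the whitespace tokenization and the 30% masking rate used in the demo are all assumptions made for the example. As far as I know, the real procedure works on WordPiece tokens, selects about 15% of them, and sometimes substitutes a random token or leaves the original in place instead of always using a mask symbol.

```python
import random

MASK = "[MASK]"

def make_masked_example(tokens, mask_prob=0.15, rng=random):
    """Turn raw tokens into (masked input, targets) for self-supervised training.

    Hypothetical helper: each token is independently replaced by MASK with
    probability mask_prob. The targets map each masked position back to the
    original token the model should predict.
    """
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets

rng = random.Random(0)  # seeded so the demo is repeatable
tokens = "the cat sat on the mat because it was tired".split()
masked, targets = make_masked_example(tokens, mask_prob=0.3, rng=rng)
print(masked)   # model input, with some words hidden
print(targets)  # position -> word the model must recover
```

No human labeling is involved: the text itself supplies both the input and the answer, which is why any text source works as training data.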

ArtWomb · on Aug 30, 2019

Conversational AI is much closer than we think. Neural sequence-to-sequence models are successful in specific domains. But in the context of chit-chat dialogue systems, the responses lack humanity, undoubtedly because the models don't comprehend our world. Transfer learning alleviates some of that awkwardness.

If anyone's interested in running further experiments on their own, there is now a unified Python framework for dialogue models ;)

https://parl.ai/about/

abhishek0318 · on Aug 30, 2019

Really? Most of the chatbots in production are rule based systems.

blurbleblurble · on Sept 2, 2019

The transformer can also be used at different levels of abstraction, e.g. to do interesting stuff with knowledge graphs. I think the transformer architecture is about to make things very, very interesting.

https://arxiv.org/pdf/1904.02342.pdf

p1esk · on Aug 30, 2019

We don’t even know how many breakthroughs it will take to reach the conversational abilities of a 3-year-old. And I’m talking major breakthroughs, on the level of LSTM or Transformer, probably trained on much more data.

fellahst · on Aug 30, 2019

Great blog!