Word Embeddings

Word embeddings were a breakthrough finding in the field of NLP. They allow huge amounts of rich, contextual word-relationship information to be stored in a condensed and efficient manner. Since their genesis with Word2Vec, there have been a number of advances in the field, arriving at BERT today.

Word2Vec-style embeddings are trained with one of two architectures: Skip-Gram (SG) or Continuous Bag of Words (CBOW). The difference between them is illustrated below:

(Figure: the CBOW and Skip-Gram architectures)

CBOW's output is a word prediction based on its surrounding words. The order of the context words does not matter.

SG's output is a set of predicted surrounding (context) words based on the input word.

CBOW is faster to train, while SG is slower but does a better job for infrequent words.

In all cases, cosine similarity between vectors is used to calculate how similar one word is to another.
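As a minimal illustration, cosine similarity can be computed directly with NumPy; the 3-dimensional vectors below are made-up toy values, not real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "word vectors" purely for illustration
king = np.array([0.8, 0.3, 0.1])
queen = np.array([0.75, 0.35, 0.2])
cat = np.array([0.1, 0.9, 0.4])

print(cosine_similarity(king, queen))  # high: the vectors point in a similar direction
print(cosine_similarity(king, cat))    # lower: the vectors are less aligned
```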

Word2Vec was the first popularized neural word embedding method. It takes advantage of only local contexts and is a predictive model. It comes in both CBOW and SG forms.
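A minimal sketch of training Word2Vec with gensim (assuming gensim 4.x; the tiny corpus is just for illustration). The `sg` flag switches between the CBOW and SG forms described above:

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

# sg=0 -> CBOW, sg=1 -> Skip-Gram
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

print(model.wv["cat"])                      # the 50-dimensional vector for "cat"
print(model.wv.most_similar("cat", topn=3)) # nearest words by cosine similarity
```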

fastText builds on Word2Vec by learning the vector representation of each word and of the character n-grams within each word. The word vector and its sub-word vectors are then averaged into one vector at each training step. This allows it to infer the meaning of words not seen in the training vocabulary by breaking them down into sub-words. FastText vectors have been shown to be more accurate than Word2Vec vectors by a number of different measures, but take a lot longer to train.
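A rough sketch of the same idea with gensim's FastText implementation (again assuming gensim 4.x); the point of interest is that an unseen word still receives a vector built from its character n-grams:

```python
from gensim.models import FastText

sentences = [
    ["natural", "language", "processing", "is", "fun"],
    ["word", "embeddings", "capture", "word", "meaning"],
]

# min_n / max_n control the lengths of the character n-grams (sub-words)
model = FastText(sentences, vector_size=50, window=3, min_count=1,
                 min_n=3, max_n=5, epochs=100)

# "processor" never appears in the corpus, but FastText composes a vector for it
# from the n-grams it shares with seen words such as "processing"
print(model.wv["processor"])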

Global Vectors for Word Representation

GloVe, unlike Word2Vec, is not a predictive model. It leverages the same intuition behind the co-occurrence matrix used for distributional embeddings, but uses neural methods to decompose the co-occurrence matrix into more expressive and dense word vectors. It hasn't been proven to outperform Word2Vec, but it is faster to train. Both should be experimented with in any case.
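Pre-trained GloVe vectors can be loaded and queried directly; a minimal sketch using gensim's downloader (the model name below is one of the standard pre-trained sets it ships, downloaded on first use):

```python
import gensim.downloader as api

# Pre-trained 100-dimensional GloVe vectors trained on Wikipedia + Gigaword
glove = api.load("glove-wiki-gigaword-100")

print(glove.most_similar("frog", topn=5))  # nearest neighbours by cosine similarity
print(glove.similarity("ice", "steam"))    # cosine similarity between two words
```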

Embeddings from Language Models

Bidirectional Encoder Representations from Transformers
