CS224n learning part2
- How do we represent the meaning of a word?
Definition: meaning (Webster's dictionary)
- the idea that is represented by a word, phrase, etc.
- and so on.
The most common linguistic way of thinking about meaning:
- signifier <====> signified (idea or thing) = denotation
- Main idea of word2vec
Predict between every word and its context words.
Two algorithms
- Skip-grams (SG)
  - Predict context words given the target word (position independent)
- Continuous Bag of Words (CBOW)
  - Predict the target word from a bag-of-words context
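A minimal sketch of how the two algorithms carve training examples out of the same text, assuming a window of radius m around each position. The function names `skipgram_pairs` and `cbow_examples` and the toy sentence are made up for illustration and are not taken from the word2vec code.

```python
def skipgram_pairs(tokens, m):
    """Skip-gram: one (center, context) pair per context word in the window."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-m, m + 1):
            if j == 0:
                continue  # the center word is not its own context
            pos = t + j
            if 0 <= pos < len(tokens):
                pairs.append((center, tokens[pos]))
    return pairs


def cbow_examples(tokens, m):
    """CBOW: one (context bag, target) example per position."""
    examples = []
    for t, target in enumerate(tokens):
        lo, hi = max(0, t - m), min(len(tokens), t + m + 1)
        context = tokens[lo:t] + tokens[t + 1:hi]
        examples.append((context, target))
    return examples


tokens = "the quick brown fox".split()
sg = skipgram_pairs(tokens, 1)
cb = cbow_examples(tokens, 1)
print(sg)
print(cb)
```

Skip-gram treats each context word as a separate prediction, which is why the position inside the window does not matter; CBOW collapses the whole window into one unordered bag.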
Two (moderately efficient) training methods
- Hierarchical softmax
- Negative sampling
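The notes only name negative sampling, so the following is a rough sketch under assumptions: it uses the commonly cited objective -log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c), where v_c is the center word vector, u_o the true outside word vector, and u_k the vectors of K sampled "negative" words. The helper names and the toy 2-dimensional vectors are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_loss(v_c, u_o, negatives):
    """Loss for one (center, outside) pair plus K sampled negative words."""
    # Push the true outside word towards the center word ...
    loss = -math.log(sigmoid(dot(u_o, v_c)))
    # ... and push each sampled negative word away from it.
    for u_k in negatives:
        loss -= math.log(sigmoid(-dot(u_k, v_c)))
    return loss


# Toy vectors, invented for this example.
v_c = [1.0, 0.0]
negatives = [[-1.0, 0.5], [0.0, 1.0]]
loss_good = neg_sampling_loss(v_c, [2.0, 0.0], negatives)   # u_o aligned with v_c
loss_bad = neg_sampling_loss(v_c, [-2.0, 0.0], negatives)   # u_o opposed to v_c
print(loss_good, loss_bad)
```

The point of the trick is cost: each update touches only K + 1 output vectors instead of summing over the whole vocabulary.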
Details of Word2Vec
Predict surrounding words in a window of radius m of every word.
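For p(w_{t+j} | w_t), the simple first formulation is
$$p(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V} \exp(u_w^\top v_c)}$$
where o is the outside (or output) word index, c is the center word index, v_c and u_o are the "center" and "outside" vectors of indices c and o, and V is the vocabulary size.
Every word can have two vectors (one would also work, but two is simpler).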
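A small sketch of this formulation, assuming the usual softmax over inner products: p(o|c) is exp(u_o·v_c) normalised over every outside vector u_w in the vocabulary. The matrices `U` (outside vectors) and `V` (center vectors) and their toy values are assumptions made for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_prob(o, c, U, V):
    """p(o | c): softmax of u_w . v_c over all words w, evaluated at w = o."""
    scores = [dot(u_w, V[c]) for u_w in U]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return exps[o] / sum(exps)


# Toy vocabulary of 3 words with 2-dimensional vectors (invented values).
U = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # "outside" vectors u_w
V = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # "center" vectors v_w
probs = [softmax_prob(o, 0, U, V) for o in range(len(U))]
print(probs)
```

Each word index appears once in `U` and once in `V`, which is the two-vectors-per-word setup described above.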
Sentence embedding
Compute sentence similarity using the inner product.
S1: Mexico wishes to guarantee citizen's safety.
Use as features for sentence classification, e.g. semantic sentiment analysis.
From Bag-of-words to Complex models
- Bag-of-words (BoW)
  v("natural language processing") = 1/3 (v("natural") + v("language") + v("processing"))
- Recurrent neural networks, recursive neural networks, convolutional neural networks…
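The bag-of-words average and the inner-product similarity from these notes can be sketched together as follows. The word vectors are toy values invented for the example (real ones would come from a trained model), and `average_embedding` and `inner_product` are hypothetical helper names.

```python
def average_embedding(words, vectors):
    """Bag-of-words sentence embedding: the mean of the word vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(vectors[w]):
            total[i] += x
    return [x / len(words) for x in total]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy 3-dimensional word vectors (assumed values, for illustration only).
vectors = {
    "natural": [1.0, 0.0, 2.0],
    "language": [0.0, 3.0, 1.0],
    "processing": [2.0, 0.0, 0.0],
}

phrase = average_embedding(["natural", "language", "processing"], vectors)
other = average_embedding(["language", "processing"], vectors)
similarity = inner_product(phrase, other)
print(phrase, other, similarity)
```

Averaging ignores word order, which is the limitation the more complex models in the list (recurrent, recursive, convolutional networks) are meant to address.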