CS224n Learning, Part 2

  1. How do we represent the meaning of a word?
    Definition: meaning (Webster dictionary)
  • the idea that is represented by a word, phrase, etc.
  • and so on.

The commonest linguistic way of thinking of meaning:

  • signifier <====> signified (idea or thing) = denotation
  1. Main idea of word2vec
    Predict between every word and its context words.
    Two algorithms (a skip-gram pair-generation sketch follows this list):
  • Skip-grams (SG)
    • Predict context words given target (position independent)
  • Continuous Bag of Words (CBOW)
    • Predict target word from bag of words context
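
To make the skip-gram setup concrete, here is a minimal sketch of extracting (center, context) training pairs; the function name, window radius m, and toy corpus are my own illustration, not code from the course:

```python
# A minimal sketch, not the course's reference implementation:
# for every center word, emit one (center, context) pair per
# neighbor within a radius-m window.
def skipgram_pairs(tokens, m=2):
    """Yield (center, context) pairs for each neighbor within radius m."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - m), min(len(tokens), i + m + 1)):
            if j != i:  # the center word is not its own context
                yield center, tokens[j]

corpus = "natural language processing is fun".split()
print(list(skipgram_pairs(corpus, m=1)))
# [('natural', 'language'), ('language', 'natural'), ('language', 'processing'), ...]
```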

Two (moderately efficient) training methods (a negative-sampling sketch follows this list):

  • Hierarchical softmax
  • Negative sampling
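
A minimal numpy sketch of the per-pair negative-sampling objective, assuming one true outside vector u_o and k sampled negative vectors (the names are mine, not from the notes; the lecture's version also specifies how negatives are sampled):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_c, u_o, U_neg):
    """Negative-sampling loss for one (center, outside) pair.

    v_c: center vector, shape (d,); u_o: true outside vector, shape (d,);
    U_neg: k sampled negative outside vectors, shape (k, d).
    """
    pos = np.log(sigmoid(u_o @ v_c))             # reward the observed pair
    neg = np.log(sigmoid(-(U_neg @ v_c))).sum()  # penalize the k sampled pairs
    return -(pos + neg)                          # minimize negative log-likelihood

rng = np.random.default_rng(0)
d, k = 8, 5
print(neg_sampling_loss(rng.normal(size=d), rng.normal(size=d),
                        rng.normal(size=(k, d))))
```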

Skip-gram Prediction

Details of Word2Vec
Predict surrounding words in a window of radius m around every word.
For $p(w_{t+j} \mid w_t)$, the simple first formulation is the softmax

$$
p(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V} \exp(u_w^\top v_c)}
$$

where $o$ is the outside (or output) word index, $c$ is the center word index, and $v_c$ and $u_o$ are the "center" and "outside" vectors of indices $c$ and $o$. Every word can have two vectors (one would also work, but two makes things simpler).
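
A minimal numpy sketch of this softmax, assuming a matrix U holding one "outside" vector per vocabulary word (the names are my own; the lecture derives this on paper rather than in code):

```python
import numpy as np

def p_outside_given_center(U, v_c, o):
    """Softmax p(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c).

    U: matrix of "outside" vectors, shape (V, d);
    v_c: "center" vector of the center word, shape (d,);
    o: index of the outside word.
    """
    scores = U @ v_c       # dot product of v_c with every u_w
    scores -= scores.max() # shift by the max for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

rng = np.random.default_rng(0)
U = rng.normal(size=(10, 4))  # toy vocabulary of 10 words, d = 4
v_c = rng.normal(size=4)
print(p_outside_given_center(U, v_c, o=3))
```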

Sentence embedding
Compute sentence similarity using the inner product of their sentence vectors (a sketch follows the examples below).

S1: Mexico wishes to guarantee citizen's safety.
S2: Mexico wishes to avoid more violence.
Score: 4/5

S1: Iranians Vote in Presidential Election
S2: Keita Wins Mali Presidential Election
Score: 0.4/5
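
A minimal sketch of the scoring step, assuming the sentence vectors have already been computed (e.g. by the bag-of-words averaging shown in the next section); how the normalized inner product is mapped onto the 0-5 scale above is not covered in these notes:

```python
import numpy as np

def sentence_similarity(s1, s2):
    """Cosine-normalized inner product between two sentence vectors."""
    return (s1 @ s2) / (np.linalg.norm(s1) * np.linalg.norm(s2))

s1 = np.array([0.2, 0.4, 0.1])  # hypothetical sentence vectors; real ones
s2 = np.array([0.3, 0.3, 0.0])  # come from averaging trained word vectors
print(sentence_similarity(s1, s2))
```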

Use sentence embeddings as features for sentence classification, e.g. semantic sentiment analysis.

From Bag-of-Words to Complex Models

  • Bag-of-words (BoW): average the word vectors (a sketch follows this list)
    v("natural language processing") = 1/3 (v("natural") + v("language") + v("processing"))
  • Recurrent neural networks, recursive neural networks, convolutional neural networks, …
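
A minimal sketch of the BoW averaging above, with hypothetical 2-dimensional embeddings standing in for trained word2vec vectors:

```python
import numpy as np

# Hypothetical toy embeddings; real ones come from a trained word2vec model.
emb = {
    "natural":    np.array([0.1, 0.3]),
    "language":   np.array([0.2, 0.1]),
    "processing": np.array([0.0, 0.4]),
}

def bow_sentence_vector(sentence, emb):
    """Bag-of-words sentence embedding: the average of the word vectors."""
    return np.mean([emb[w] for w in sentence.split()], axis=0)

print(bow_sentence_vector("natural language processing", emb))
# equals 1/3 * (v("natural") + v("language") + v("processing"))
```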