analogy_task.Rd
Solve the term analogy task for given embeddings and pairs of analogies.
analogy_task(term_vectors, n = 1, terms)
Argument | Description |
---|---|
term_vectors | A matrix of embeddings of the terms |
n | The number of nearest neighbors considered when checking an analogy (default: 1) |
terms | A list of two character vectors: the first elements and the second elements of the pairs |
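As a rough illustration of the argument shapes described above, the following base R sketch builds toy inputs. The terms, the dimension of 3, and the random values are invented for illustration. That the rows of `term_vectors` are named by term is an assumption, inferred only from the `rownames(term_vectors)` call that appears in the example output on this page.

```r
# Toy inputs only: the terms and the random values are invented for illustration.
set.seed(1)
toy_terms <- c("man", "woman", "father", "mother")

# Assumed layout: one row per term, with the terms stored as row names.
term_vectors <- matrix(rnorm(length(toy_terms) * 3),
                       nrow = length(toy_terms),
                       dimnames = list(toy_terms, NULL))

# A list of two character vectors: the first and the second elements of the
# pairs, here (man, woman) and (father, mother).
terms <- list(c("man", "father"), c("woman", "mother"))
```

The two vectors in `terms` are read position by position, so they need to have the same length.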
A list of:
The accuracy of the fulfilled analogies
A data frame of all the analogy questions and their results
This function implements the word analogy task introduced by Mikolov et al. (2013)
to validate the quality of word embeddings. Given pairs of terms that stand in
the same relation (e.g. man - woman), a list of questions is formed
by taking all ordered two-element subsets of the pairs. If (term1, term2) and (term3, term4)
are the embeddings of two selected pairs, we expect term1 - term2 to be close to term3 - term4.
Hence, the vector e = term1 - term2 + term4 is computed and compared with term3.
The analogy is fulfilled if term3 is among the n closest terms to e.
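The procedure described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `analogy_fulfilled` is hypothetical, the toy embeddings are constructed so that the analogy holds exactly, and the use of cosine similarity over all rows (without excluding the question terms) is an assumption, since the closeness measure is not specified here.

```r
# Toy embeddings built so that man - woman equals father - mother exactly.
term_vectors <- rbind(man = c(1, 0), woman = c(0, 1),
                      father = c(2, 1), mother = c(1, 2))
terms <- list(c("man", "father"), c("woman", "mother"))

# Hypothetical helper (not part of the package): checks one analogy question
# for the pairs pair_a = (term1, term2) and pair_b = (term3, term4).
analogy_fulfilled <- function(vectors, pair_a, pair_b, n = 1) {
  # e = term1 - term2 + term4, to be compared with term3
  e <- vectors[pair_a[1], ] - vectors[pair_a[2], ] + vectors[pair_b[2], ]
  # Assumption: closeness is measured by cosine similarity against every row.
  sims <- as.vector(vectors %*% e) /
    (sqrt(rowSums(vectors^2)) * sqrt(sum(e^2)))
  names(sims) <- rownames(vectors)
  closest <- names(sort(sims, decreasing = TRUE))[seq_len(n)]
  pair_b[1] %in% closest
}

# All ordered selections of two distinct pairs.
idx <- seq_along(terms[[1]])
questions <- expand.grid(a = idx, b = idx)
questions <- questions[questions$a != questions$b, ]

# Evaluate every question with n = 1.
questions$fulfilled <- mapply(function(a, b) {
  analogy_fulfilled(term_vectors,
                    c(terms[[1]][a], terms[[2]][a]),
                    c(terms[[1]][b], terms[[2]][b]),
                    n = 1)
}, questions$a, questions$b)

# Share of fulfilled analogies.
accuracy <- mean(questions$fulfilled)
```

With two pairs there are two ordered questions. Raising `n` loosens the criterion, because the expected term only has to appear among the `n` nearest rows rather than be the single nearest one.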
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
#> Error in .subset2(public_bind_env, "initialize")(...): unused arguments (word_vectors_size = 5, vocabulary = list(c("fever", "man", "mother", "cough", "heart", "patient", "thyroid", "eye", "rhinitis", "woman", "father"), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7)))
terms <- terms_pairs_test[[1]]
analogy_task(term_vectors, 1, terms)
#> Error in rownames(term_vectors): object 'term_vectors' not found
analogy_task(term_vectors, 3, terms)
#> Error in rownames(term_vectors): object 'term_vectors' not found
analogy_task(term_vectors, 5, terms)
#> Error in rownames(term_vectors): object 'term_vectors' not found