Solve the term analogy task for the given embeddings and pairs of analogous terms.

analogy_task(term_vectors, n = 1, terms)

Arguments

term_vectors

A matrix of term embeddings, with the terms as row names

n

The number of nearest neighbors considered when checking an analogy (default: 1)

terms

A list of two character vectors: the first elements and the second elements of the pairs, respectively

Value

A list of:

accuracy

The accuracy, i.e. the share of analogy questions that were fulfilled

questions

A data frame of all analogy questions and their results

Details

This function implements the word analogy task introduced by Mikolov et al. (2013) to validate the quality of word embeddings. Given pairs of terms that stand in the same relation (e.g. man - woman), a list of questions is formed by taking all ordered combinations of two distinct pairs. If (term1, term2) and (term3, term4) are the embeddings of the two selected pairs, we expect term1 - term2 to be close to term3 - term4, or equivalently term3 to be close to term1 - term2 + term4. The vector e = term1 - term2 + term4 is therefore computed and compared with term3. The analogy is fulfilled if term3 is among the n terms closest to e.
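The procedure described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy embeddings are made up, the helper analogy_fulfilled() is hypothetical, and closeness is measured with cosine similarity over all terms, which is an assumption (the text above does not state the distance measure or whether the question terms are excluded from the candidates).

```r
# Toy embeddings: rows are terms, columns are embedding dimensions (made-up values).
toy_vectors <- rbind(
  man    = c(1, 0, 0),
  woman  = c(1, 1, 0),
  father = c(0, 0, 1),
  mother = c(0, 1, 1),
  fever  = c(-1, 0, 0)
)

# Hypothetical helper: is `term3` among the n terms closest to
# e = term1 - term2 + term4? Closeness is cosine similarity (an assumption).
analogy_fulfilled <- function(vectors, term1, term2, term3, term4, n = 1) {
  e <- vectors[term1, ] - vectors[term2, ] + vectors[term4, ]
  similarity <- as.vector(vectors %*% e) /
    (sqrt(rowSums(vectors^2)) * sqrt(sum(e^2)))
  names(similarity) <- rownames(vectors)
  closest <- names(sort(similarity, decreasing = TRUE))[seq_len(n)]
  term3 %in% closest
}

# Single question: man - woman + mother should land near father.
fulfilled <- analogy_fulfilled(toy_vectors, "man", "woman", "father", "mother")

# Pairs in the same layout as the `terms` argument: first elements, second elements.
toy_terms <- list(c("man", "father"), c("woman", "mother"))

# All ordered combinations of two distinct pairs (i = question pair, j = answer pair).
pair_index <- seq_along(toy_terms[[1]])
questions <- expand.grid(i = pair_index, j = pair_index)
questions <- questions[questions$i != questions$j, ]

# Evaluate every question and average the outcomes to obtain the accuracy.
questions$fulfilled <- mapply(
  function(i, j) {
    analogy_fulfilled(
      toy_vectors,
      toy_terms[[1]][i], toy_terms[[2]][i],
      toy_terms[[1]][j], toy_terms[[2]][j]
    )
  },
  questions$i, questions$j
)
accuracy <- mean(questions$fulfilled)
```

With k pairs this enumeration yields k * (k - 1) questions, so the two toy pairs give two questions. Increasing n relaxes the test, since term3 then only has to appear among the n nearest terms rather than be the single nearest one.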

References

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Examples

term_vectors <- embed_terms(examinations, embedding_size = 5L, term_count_min = 1L)
#> Error in .subset2(public_bind_env, "initialize")(...): unused arguments (word_vectors_size = 5, vocabulary = list(c("fever", "man", "mother", "cough", "heart", "patient", "thyroid", "eye", "rhinitis", "woman", "father"), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7)))
terms <- terms_pairs_test[[1]]
analogy_task(term_vectors, 1, terms)
#> Error in rownames(term_vectors): object 'term_vectors' not found
analogy_task(term_vectors, 3, terms)
#> Error in rownames(term_vectors): object 'term_vectors' not found
analogy_task(term_vectors, 5, terms)
#> Error in rownames(term_vectors): object 'term_vectors' not found