An example dataset of medical term pairs (analogies and synonyms), used by the functions analogy_task and synonym_task to evaluate the quality of term embeddings.

terms_pairs_test

Format

A list with elements:

spec

Terms with relation: specialty -- body part (e.g. "cardiologist" -- "heart"); a list of two character vectors of length 2

person

Terms with relation: man -- woman; a list of two character vectors of length 2

synonym

Synonym terms; a list of two character vectors of length 1
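
The shape described above can be illustrated with a small stand-in list. This is only a sketch: the object name example_pairs and every term other than "cardiologist" and "heart" are made up for illustration, and the layout assumes that, within each element, the first vector holds the left-hand terms and the second vector holds the matching right-hand terms. The packaged data may be organised differently.

```r
# Hypothetical stand-in mirroring the documented shape; the terms are
# illustrative and are NOT the contents of terms_pairs_test.
# Assumption: in each element, the first vector holds the left-hand terms
# and the second vector holds the matching right-hand terms.
example_pairs <- list(
  spec    = list(c("cardiologist", "neurologist"), c("heart", "brain")),
  person  = list(c("man", "boy"), c("woman", "girl")),
  synonym = list("physician", "doctor")
)

# Number of terms in each vector, per relation
shape <- lapply(example_pairs, lengths)

# Compact overview of the nested structure
str(example_pairs)
```

With the package attached, calling str(terms_pairs_test) in the same way shows the actual structure and terms.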

Details

The quality of embeddings of real data was assessed using 7 types of analogies, described in Dobrakowski et al. (2019).

References

Dobrakowski, A., A. Mykowiecka, M. Marciniak, W. Jaworski, and P. Biecek (2019). Interpretable Segmentation of Medical Free-Text Records Based on Word Embeddings. arXiv preprint arXiv:1907.04152.