A dataset of example terms of the kind that linguists could extract from medical free-text records. The procedure for extracting medical terms from real medical data is described in Dobrakowski et al. (2019). Each term is assigned to at least one category.

terms_categories

Format

A data frame with 19 rows and 2 variables:

term

A medical term

category

A category of the term
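
Because a term may belong to more than one category, a two-column layout like this is usually read as one row per (term, category) pair. The sketch below illustrates that reading with a hypothetical stand-in data frame, example_terms; its rows are invented for illustration and are not taken from terms_categories, and the one-row-per-pair layout is an assumption rather than something stated above.

```r
# Hypothetical stand-in with the same two columns as terms_categories.
# The values are illustrative only and do not come from the dataset.
example_terms <- data.frame(
  term = c("fever", "cough", "cough", "ibuprofen"),
  category = c("symptom", "symptom", "respiratory", "medication"),
  stringsAsFactors = FALSE
)

# Assumed layout: one row per (term, category) pair, so a term that has
# two categories occupies two rows.
categories_by_term <- split(example_terms$category, example_terms$term)

# Number of rows (terms) recorded under each category.
terms_per_category <- table(example_terms$category)

# Terms assigned to more than one category.
multi_category_terms <- names(categories_by_term)[lengths(categories_by_term) > 1]
print(multi_category_terms)
```

If the real dataset follows the same layout, the same calls should work with terms_categories in place of example_terms.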

References

Dobrakowski, A., A. Mykowiecka, M. Marciniak, W. Jaworski, and P. Biecek. 2019. Interpretable Segmentation of Medical Free-Text Records Based on Word Embeddings. arXiv preprint arXiv:1907.04152.