Visualize embeddings of visits for doctors of a specified specialty. The embeddings are projected onto a 2D plane with t-SNE or PCA (first two principal components) and plotted with ggplot. If t-SNE cannot be computed (for example, because the perplexity is too large for the number of samples), the first two principal components from PCA are plotted instead. Points representing visits can be colored by doctor ID, by the first letter of the ICD-10 code, or by cluster number. The plot can optionally be saved to a given PDF file.
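The fallback from t-SNE to PCA described above can be sketched as follows. This is a minimal illustration, not the package's own code: `project_2d` is a hypothetical helper, and it assumes t-SNE is computed with `Rtsne::Rtsne` (if the Rtsne package is not installed, the sketch goes straight to PCA).

```r
# Hypothetical helper, not part of the package: project embeddings to 2D,
# trying t-SNE first and falling back to PCA if t-SNE fails.
project_2d <- function(vectors, method = "tsne", perplexity = 30) {
  coords <- NULL
  if (method == "tsne" && requireNamespace("Rtsne", quietly = TRUE)) {
    # Rtsne stops with an error when the perplexity is too large
    # for the number of samples; treat any error as "no t-SNE result".
    coords <- tryCatch(
      Rtsne::Rtsne(vectors, dims = 2, perplexity = perplexity,
                   check_duplicates = FALSE)$Y,
      error = function(e) NULL
    )
  }
  if (is.null(coords)) {
    # Fallback (or method = "pca"): first two principal components.
    coords <- prcomp(vectors)$x[, 1:2]
  }
  colnames(coords) <- c("x", "y")
  coords
}

set.seed(1)
small <- matrix(rnorm(50), nrow = 10)  # 10 samples, 5 dimensions
coords <- project_2d(small, perplexity = 30)  # too few samples for t-SNE
dim(coords)
```

With only 10 samples and a perplexity of 30, the t-SNE call fails and the PCA coordinates are returned instead.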

visualize_visit_embeddings(visits_vectors, visit_table, method = "tsne",
  color_by, spec = NULL, clusters, n_doctors = 5L, n_codes = 9L,
  save = FALSE, path_to_save)

Arguments

visit_table

A data frame with columns:

visit_id

icd10

(Optional; required only if color_by == "icd10") ICD-10 code of the visit

doctor_id

(Optional; required only if color_by == "doctor") Doctor's ID

specialties

Specialties of doctors, separated by ", "

method

Method used to generate the plot: "tsne" (default) or "pca"

color_by

A string, one of {"doctor", "icd10", "cluster"}

spec

The specialty to be plotted (optional, required if color_by != "cluster")

clusters

The output of the function cluster_visits (optional, required if color_by == "cluster")

n_doctors

The number of doctors to be colored; the remaining doctors are drawn in grey (used only if color_by == "doctor"), default: 5

n_codes

The number of codes to be colored; the remaining codes are drawn in grey (used only if color_by == "icd10"), default: 9

save

A logical indicating whether the plot should be saved to a file, default: FALSE

path_to_save

A string with the path to the target PDF file (optional)

visits_vectors

A matrix of embeddings of visits
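To make the roles of spec, n_doctors and n_codes more concrete, here is a rough sketch of how visits might be filtered by specialty and labeled for coloring. The helpers `has_specialty` and `top_n_labels` are hypothetical, the toy data frame is made up for illustration, and selecting the most frequent doctors or codes is an assumption; the package may choose which ones to color differently.

```r
# Hypothetical helpers, not part of the package.

# TRUE for rows whose ", "-separated specialties include `spec`.
has_specialty <- function(specialties, spec) {
  vapply(strsplit(specialties, ", ", fixed = TRUE),
         function(s) spec %in% s, logical(1))
}

# Keep the n most frequent values as labels; everything else becomes "other"
# (the group that would be drawn in grey).
top_n_labels <- function(values, n) {
  counts <- sort(table(values), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  ifelse(values %in% keep, as.character(values), "other")
}

# Toy visit table with the columns described above.
visits <- data.frame(
  visit_id = 1:6,
  icd10 = c("J00", "J01", "E03", "J00", "H10", "E04"),
  doctor_id = c("d1", "d1", "d2", "d2", "d3", "d1"),
  specialties = c("internist", "internist", "internist, endocrinologist",
                  "internist, endocrinologist", "ophthalmologist", "internist"),
  stringsAsFactors = FALSE
)

internist <- visits[has_specialty(visits$specialties, "internist"), ]
# Color by doctor: label the single most frequent doctor, grey out the rest.
internist$doctor_label <- top_n_labels(internist$doctor_id, n = 1)
# Color by ICD-10: use only the first letter of the code.
internist$code_label <- top_n_labels(substr(internist$icd10, 1, 1), n = 1)
```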

Value

A generated plot of embeddings.
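Assuming the returned plot is a ggplot object (the description says it is plotted with ggplot, but the exact class is not stated here), saving it to a PDF could look roughly like the sketch below. The data, colors and file name are made up for illustration.

```r
library(ggplot2)

# Toy 2D coordinates with a label per point; "other" stands for the
# doctors or codes that are not highlighted.
coords <- data.frame(
  x = c(0.1, 0.5, 0.9),
  y = c(0.3, 0.2, 0.8),
  label = c("d1", "d2", "other")
)

p <- ggplot(coords, aes(x = x, y = y, color = label)) +
  geom_point() +
  scale_color_manual(values = c(d1 = "red", d2 = "blue", other = "grey70"))

# ggsave picks the PDF device from the file extension.
out <- file.path(tempdir(), "visits.pdf")
ggsave(out, plot = p, width = 6, height = 4)
```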

Examples

inter_term_vectors <- embed_terms(interviews, embedding_size = 10L, term_count_min = 1L)
#> Error in .subset2(public_bind_env, "initialize")(...): unused arguments (word_vectors_size = 10, vocabulary = list(c("fever", "rhinitis", "cough", "eye", "thyroid"), c(3, 3, 4, 4, 6), c(3, 3, 4, 4, 6)))
exam_term_vectors <- embed_terms(examinations, embedding_size = 10L, term_count_min = 1L)
#> Error in .subset2(public_bind_env, "initialize")(...): unused arguments (word_vectors_size = 10, vocabulary = list(c("fever", "man", "mother", "cough", "heart", "patient", "thyroid", "eye", "rhinitis", "woman", "father"), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7), c(2, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7)))
visits_vectors <- embed_list_visits(interviews, examinations, inter_term_vectors, exam_term_vectors)
#> Error in ncol(term_vectors): object 'inter_term_vectors' not found
visualize_visit_embeddings(visits_vectors, visits, "tsne", color_by = "doctor", spec = "internist", n_doctors = 10)
#> Error in visualize_visit_embeddings(visits_vectors, visits, "tsne", color_by = "doctor", spec = "internist", n_doctors = 10): object 'visits_vectors' not found
visualize_visit_embeddings(visits_vectors, visits, "pca", color_by = "doctor", spec = "internist", n_doctors = 9)
#> Error in visualize_visit_embeddings(visits_vectors, visits, "pca", color_by = "doctor", spec = "internist", n_doctors = 9): object 'visits_vectors' not found
visualize_visit_embeddings(visits_vectors, visits, color_by = "icd10", spec = "internist", n_codes = 4)
#> Error in visualize_visit_embeddings(visits_vectors, visits, color_by = "icd10", spec = "internist", n_codes = 4): object 'visits_vectors' not found
clusters <- cluster_visits(visits_vectors, visits, spec = "internist", cluster_number = 2)
#> Error in rownames(visits_vectors): object 'visits_vectors' not found
visualize_visit_embeddings(visits_vectors, visits, color_by = "cluster", clusters = clusters)
#> Error in visualize_visit_embeddings(visits_vectors, visits, color_by = "cluster", clusters = clusters): object 'visits_vectors' not found