This vignette shows how to use the tidycharts package. It contains examples of different chart types and tips for proper data visualization.

Prerequisites

This package and vignette are intended for a user who:

has basic programming skills in R
is familiar with R data structures, such as vector and data.frame

How to download and install the package?


# install from CRAN
# install.packages('tidycharts')

library(tidycharts)

Bar charts

Data

Bar charts should be used to show structure at a single point in time. A typical use case is to visualize a company's profit broken down by department. The data could be structured as follows:

data_barchart <- data.frame(
  dep = c('Services', 'Production', 'Marketing', 'Purchasing'),
  profit = c(17, 15, 2, -3),
  operational = c(9, 7, 1.5, -0.4),
  property = c(4, 4, 0.5, -0.6),
  bonus = c(4, 4, 0, -2),
  prev_year = c(10, 16, 4, -1),
  plan = c(11, 13, 2, -2.5)
)

In the example data, operational, property and bonus are components of profit and sum up to it.
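Before drawing a stacked chart, it can help to confirm that the component columns really add up to profit. The following is a minimal base R sketch of such a check. The names parts and components_match are illustrative and not part of tidycharts.

```r
# Same example data as above, reduced to the columns needed for the check
data_barchart <- data.frame(
  dep = c('Services', 'Production', 'Marketing', 'Purchasing'),
  profit = c(17, 15, 2, -3),
  operational = c(9, 7, 1.5, -0.4),
  property = c(4, 4, 0.5, -0.6),
  bonus = c(4, 4, 0, -2)
)

# Row-wise sum of the three components (illustrative name)
parts <- rowSums(data_barchart[c('operational', 'property', 'bonus')])

# all.equal() tolerates floating-point rounding; unname() drops the row names
components_match <- isTRUE(all.equal(unname(parts), data_barchart$profit))
components_match
```

If the check fails, a stacked chart of the components would not represent the total shown in the basic chart.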

Basic

Creating a bar chart is simple: we use the bar_chart function. After the function is called, the chart is printed automatically. It can also be assigned to a variable, in which case it is stored as a one-element character vector with SVG content.

bar_chart(data = data_barchart, cat = 'dep', series = 'profit')

A plot should have an informative title. We can add one with the add_title function, chaining the commands with the pipe operator (%>%).

bar_chart(data = data_barchart, cat = 'dep', series = 'profit') %>% 
  add_title('The company XYZ', 'Profit', 'in mEUR', 'by departments, 2020')

We can show the structure of each value by specifying a different series argument. It can be a vector of column names. In that case, a stacked bar chart is generated.

bar_chart(data = data_barchart, cat = 'dep', 
          series = c('operational', 'property', 'bonus'),
          series_labels = c('op', 'prop', 'bon')) %>%  
  add_title('The company XYZ', 'Profit', 'in mEUR',
            'by departments and profit type, 2020')

Normalized

A normalized bar chart should be used to show the proportions of the parts within each category. A typical use is to visualize the percentage structure of profit across the departments of a company.

bar_chart_normalized(data = data_barchart, cat = 'dep',
                     series = c('operational', 'property', 'bonus'),
                     series_labels = c('op', 'prop', 'bon')) %>% 
  add_title('The company XYZ', 'Profit', 'in mEUR', 'normalized in department')
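The proportions that a normalized chart conveys can be sketched in base R by dividing each component by its row total. This is only an illustration of the idea. Whether bar_chart_normalized computes the shares in exactly this way is an assumption, and rows that mix positive and negative components would need extra care.

```r
# Component columns from the example data, one row per department
parts <- data.frame(
  operational = c(9, 7, 1.5, -0.4),
  property = c(4, 4, 0.5, -0.6),
  bonus = c(4, 4, 0, -2),
  row.names = c('Services', 'Production', 'Marketing', 'Purchasing')
)

# margin = 1 divides every row by its own total, so each department sums to 1
shares <- prop.table(as.matrix(parts), margin = 1)

# Shares in percent, rounded for display
round(100 * shares, 1)
```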

Referenced

Reference values (indices) mark a benchmark on the plot. In the following example, an index line shows the best result from the previous year (PY).

bar_chart_reference(data = data_barchart,
                    cat = 'dep', series = 'profit', 
                    ref_val = 10, ref_label = 'PY best result')  %>% 
  add_title('The company XYZ', 'Profit', 'in mEUR', 
            'with reference value of 10 mEUR')

Grouped

To visualize 2 or 3 series of data that do not sum up to a meaningful total, a grouped bar chart should be used. The first series is drawn as bars in the foreground, the second as bars in the background, and the third as triangles. The style of the bars and triangles indicates the type of data, the so-called scenarios.

The most typical use case of this chart is to visualize the profit of different departments in a company in comparison to budget and previous year data.


# column names in the styles data frame do not matter,
# only the order of the columns does
styles <- data.frame(style_foreground = rep('actual', 4),
                     style_background = rep('previous',4),
                     style_markers = rep('plan', 4))

bar_chart_grouped(data = data_barchart,
                  cat = 'dep',
                  foreground = 'profit',
                  background = 'prev_year',
                  markers = 'plan',
                  series_labels = c('PY', 'AC', 'PL'),
                  styles = styles)  %>% 
  add_title('The company XYZ', 'Profit', 'in mEUR', 
            'compared to different scenarios')
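The styles data frame above hard-codes four rows, which must match the number of categories. A small sketch, assuming the same style values as above, sizes it from the data instead. The variable n is illustrative.

```r
# Columns of the example data used by the grouped chart
data_barchart <- data.frame(
  dep = c('Services', 'Production', 'Marketing', 'Purchasing'),
  profit = c(17, 15, 2, -3),
  prev_year = c(10, 16, 4, -1),
  plan = c(11, 13, 2, -2.5)
)

# One style entry per category, derived from the data rather than hard-coded
n <- nrow(data_barchart)
styles <- data.frame(
  style_foreground = rep('actual', n),
  style_background = rep('previous', n),
  style_markers = rep('plan', n)
)
styles
```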

Variances

Show variance in data using relative or absolute variance plots. Define the baseline and the real values. The axis on a variance plot is styled according to the scenario of the baseline data. A relative variance plot shows the difference in percent, while an absolute variance plot shows it in base units.

Example usage: visualize the difference between two scenarios, broken down by department.

bar_chart_absolute_variance(data = data_barchart,
                            cat = 'dep', 
                            baseline = 'plan',
                            real = 'profit',
                            y_title = 'Plan vs. actual',
                            y_style = 'plan') %>% 
  add_title('The company XYZ', 'Profit variance', 'in mEUR', 
            'between plan and actual')

bar_chart_relative_variance(data = data_barchart,
                            cat = 'dep', 
                            baseline = 'plan',
                            real = 'profit',
                            y_title = 'Plan vs. actual',
                            y_style = 'plan') %>% 
  add_title('The company XYZ', 'Profit variance', 'in %', 
            'between plan and actual (plan=100%)')
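The two kinds of variance can be sketched in base R as follows. This uses one common definition and is not necessarily the exact formula used inside tidycharts. In particular, taking the absolute value of the baseline in the denominator is an assumption made here so that the sign stays meaningful when the plan is negative.

```r
# Baseline (plan) and real (profit) values from the example data
plan <- c(Services = 11, Production = 13, Marketing = 2, Purchasing = -2.5)
profit <- c(Services = 17, Production = 15, Marketing = 2, Purchasing = -3)

# Absolute variance: difference in base units (mEUR)
abs_variance <- profit - plan

# Relative variance: difference as a percentage of the baseline
# abs() is an assumption that keeps a shortfall negative for a negative plan
rel_variance <- 100 * (profit - plan) / abs(plan)

round(rel_variance, 1)
```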

Scatter plots

For demonstration, we will use the mtcars dataset, which is built into R.

scatter_data <- mtcars[c('hp','qsec','cyl', 'wt')]

Scatter plots, also known as point plots, are used to visualize multidimensional relationships between variables. Therefore, they are used extensively in exploratory data analysis.

Scatter

In a scatter plot, 2 numerical dimensions are visualized by the position of a point on the Cartesian plane.

scatter_plot(scatter_data, x = 'hp', y = 'qsec', 
             legend_title = '', 
             x_names = c('Horsepower', 'in hp'),
             y_names = c('1/4 mile time', 'in s')) %>% 
  add_title('The mtcars dataset', '', '', '')

Optionally, a categorical dimension can be added in the form of point color.

scatter_plot(scatter_data, x = 'hp',
             y = 'qsec', 
             cat = 'cyl', 
             legend_title = 'No. cylinders', 
             x_names = c('Horsepower', 'in hp'),
             y_names = c('1/4 mile time', 'in s')) %>% 
  add_title('The mtcars dataset', '', '', '')

Bubble

Bubble plots can visualize the same dimensions as scatter plots, and a third numeric dimension can be added in the form of point size. However, there is a tradeoff between the dimensionality and size of your data and the readability of the generated plots, so be careful when using bubble charts.

scatter_plot(scatter_data, x = 'hp',
             y = 'qsec', 
             cat = 'cyl', 
             bubble_value = 'wt',
             legend_title = 'No. cylinders',
             x_names = c('Horsepower', 'in hp'),
             y_names = c('1/4 mile time', 'in s')) %>% 
  add_title('The mtcars dataset', '', '', '')

Column charts

Charts with vertical columns are intended to visualize time series data. Note that the column width depends on the x-axis interval: the longer the interval, the wider the column. The general guideline for this kind of chart is to plot up to 24 columns. If your data has more than 24 time points, see the line chart section.

Data

Here is what an example data frame for a column chart could look like:

data_time_series <- data.frame(
  time = month.abb,
  Poland = 2 + 0.5 * sin(1:12),
  Germany = 3 + sin(3:14),
  Slovakia = 2 + 2 * cos(1:12)
)

The time column consists of the three-letter abbreviations of the English month names, and the other columns consist of artificial data, which could represent, for example, sales in different countries.

Basic

Use a basic column chart to make a simple visualization of a time series. Pass the interval parameter to change the width of the columns.

A typical task for this kind of plot could be the following: show sales from different countries over the months.

column_chart(data_time_series, x = 'time', 
             series = c('Poland', 'Germany', 'Slovakia'), interval = 'months') %>% 
  add_title('The company XYZ', 'Profit', 'in mEUR', 'by country, 2020')

Waterfall

Waterfall charts can be used to visualize contribution. We need to transform the data a little before passing it to the plotting function.

Example usage: visualize the contribution of monthly sales to yearly sales.

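library(dplyr)  # provides group_by(), summarise(), arrange() and mutate()

# total monthly sales across countries, ordered by month, then accumulated
df_summarized <- data_time_series %>%
  group_by(time) %>%
  summarise(Sales = sum(Poland, Germany, Slovakia)) %>%
  arrange(match(time, month.abb)) %>%
  mutate(Sales = round(cumsum(Sales), 2))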

column_chart_waterfall(df_summarized, x = 'time', series = 'Sales') %>% 
  add_title('The company XYZ', 'Profit', 'in mEUR', 'cumulative, 2020')
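For readers who prefer to avoid an extra dependency, the same transformation can be sketched in base R. This relies on the example data already having exactly one row per month in calendar order. The names monthly_total and df_summarized_base are illustrative.

```r
# Example time series data, as defined in the Data section
data_time_series <- data.frame(
  time = month.abb,
  Poland = 2 + 0.5 * sin(1:12),
  Germany = 3 + sin(3:14),
  Slovakia = 2 + 2 * cos(1:12)
)

# Total sales per month across the three countries
monthly_total <- rowSums(data_time_series[c('Poland', 'Germany', 'Slovakia')])

# Running total over the year, rounded to 2 decimal places
df_summarized_base <- data.frame(
  time = data_time_series$time,
  Sales = round(cumsum(monthly_total), 2)
)
head(df_summarized_base, 3)
```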

Other column charts

Other types of column charts are available, e.g. column_chart_grouped or column_chart_normalized. The same data visualization rules apply to them as to bar charts. Feel free to explore them, and see the reference page if you need help.

Line charts

Line charts, like column charts, should be used to show time series data. Some line plots, however, require a more complicated data structure.

Basic

The basic line plot uses lines with markers to show the data. A typical use is to visualize several data series that do not sum up to a meaningful total, for example the market value of different companies over the years.

set.seed(123)
data_companies <- data.frame(
  time = 2010:2020,
  Alpha.inc = 25 + round(1:11 + rnorm(11)),
  Beta.inc = round(40 + rnorm(11, sd = 2)),
  Gamma.inc = round(50 + rnorm(11, sd = 5))
)

line_chart_markers(data_companies, x = 'time',
          series = c('Alpha.inc', 'Beta.inc', 'Gamma.inc'),
          series_labels = c('Alpha.inc', 'Beta.inc', 'Gamma.inc'),
          interval = 'years') %>% 
  add_title('Some companies', 'Market value', 'in mEUR', '2010...2020')

Dense

Use a dense line plot to visualize up to 6 time series with more than one point per category on the x-axis. More advanced users are encouraged to use the line_chart_dense_custom function, which lets them choose the points to be highlighted with value labels.

The most typical example is data with daily granularity spanning many months, such as mean daily temperature over the course of 16 months.


data_dense <- data.frame(
  dates = seq.Date(as.Date('2019-01-01'), as.Date('2019-12-31'), by = 1),
  Warsaw = 7 + 9 * sin((365:1 - 60)/ 365 * 2 * pi) + rnorm(365, sd = 2),
  London = 8 + 5 * sin((365:1 - 55)/ 365 * 2 * pi) + rnorm(365, sd = 1.2)
)

line_chart_dense(data_dense, 'dates', c('Warsaw', 'London')) %>%
  add_title('Temperature in European Cities', 'Daily mean', 'in deg. C', 'In 2019')

Columns vs lines

You may wonder which type of plot to choose: a line chart or a column chart. The answer depends on the data. If you want to visualize only one series, both line and column charts are appropriate. The differences become more pronounced as the number of series increases. If the sum of the series has a meaningful interpretation, use stacked columns or, optionally, stacked lines. If not, use a line plot.
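These guidelines, together with the limit of 24 columns from the column chart section, can be summarized as a small decision helper. The function suggest_time_chart is hypothetical and not part of tidycharts. It is only a sketch of the rules of thumb described in this vignette.

```r
# Hypothetical helper encoding the vignette's rules of thumb (not a tidycharts function)
suggest_time_chart <- function(n_points, n_series, sum_is_meaningful = FALSE) {
  # More than 24 time points: columns become too narrow, prefer lines
  if (n_points > 24) return('line chart')
  # A single series reads well either way
  if (n_series == 1) return('line chart or column chart')
  # Several series whose sum has a meaning: stack them
  if (sum_is_meaningful) return('stacked column chart')
  # Several series that do not add up to anything meaningful
  'line chart'
}

suggest_time_chart(n_points = 12, n_series = 3, sum_is_meaningful = TRUE)
suggest_time_chart(n_points = 365, n_series = 2)
```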
