This post demonstrates use of the crosstalk extension to coordinate filtering across multiple DT datatable
widgets. The process is embedded in a standard R Markdown document, and is an example of server(Shiny)-less dashboarding/reporting.
Objective
I often need to create reports or dashboards that present information across multiple tables from a database. Frequently, this deliverable is best displayed as a filterable view of the tables themselves in a web browser. This can happen when a primary key in one table (e.g. subject ID) appears in multiple rows of another table (e.g. event based lines of therapy), and I don’t want to rectangle the data into a single flat file because it would complicate or mask necessary information.
A seemingly obvious solution here is Shiny, and that’s certainly true when you have large data / are querying in real time against a database. However, if your data are small to mid-sized (MB, not GB), you can achieve the above goal in a single html page from R Markdown, without any Shiny server overhead. The following outlines a workflow I’ve been using to make this possible; something I wanted to document is how to use crosstalk when you’re calling multiple DT datatable
widgets in the same filter_select()
call.
Simulate data
Let’s start by simulating some data that looks similar to what we would get from a clinical trial or other healthcare-related database. There will be three tables: demographics, treatments, and follow-up. The code to do this is a bit long, and ancillary to the main message of this post, so it’s hidden below (but feel free to toggle and browse it if you wish!).
Toggle simulation code
library(tidyverse)
library(lubridate)
### Function to simulate follow up tables
sim_data_fup <- function(
n = 100,
dt_sim_start = "2000-01-01",
dt_sim_end = "2019-12-31",
seed = 12345
) {
set.seed(seed)
dt_sim_start <- as_date(dt_sim_start)
dt_sim_end <- as_date(dt_sim_end)
# generate patient IDs
ids <- sample(111111:999999, n, replace = FALSE)
# simulate the data table
data_fup <- tibble(
id = ids,
ind_event = sample(0:1, n, replace = TRUE, prob = c(.8, .2)),
dt_first_contact = sample(dt_sim_start:dt_sim_end, n) %>% as_date(),
) %>%
mutate(
dt_last_contact =
dt_first_contact + sample(1:3000, n, replace = TRUE),
) %>%
mutate(
dt_last_contact =
if_else(dt_last_contact > dt_sim_end, dt_sim_end, dt_last_contact)
)
data_fup
}
### Function to simulate demographics tables
### Accepts a vector of IDs (usually identified from the follow up table)
sim_data_demo <- function(
ids,
age_range = 18:100,
race_opts = c("Black", "White")
) {
n = length(ids)
data_demo <- tibble(
id = ids,
age = sample(age_range, n, replace = TRUE),
sex = rep("F", n),
race = sample(race_opts, n, replace = TRUE, prob = c(.2, .8)),
ind_her2 = sample(0:1, n, replace = TRUE, prob = c(.2, .8))
)
data_demo
}
### Function to simulate treatment tables
### Accepts a follow-up table for IDs and first/last contact dates
sim_data_trt <- function(
data_fup,
n_trt_opts = 4, #the maximum number of treatments any patient can have
treatment_choices #a character vector of therapy options
) {
n <- data_fup %>% n_distinct("id")
n_trt_opts <- 1:n_trt_opts
# initialize patient_trt table; we'll bind rows at the end of each loop iteration
patient_trt <- tibble()
for(i in 1:n) {
n_trt <- sample(n_trt_opts, size = 1)
ith_patient <- data_fup %>% slice(i)
ith_patient_trt <- ith_patient %>%
select(id) %>%
slice(rep(1, times = n_trt)) %>% # elongate the table with a row for each treatment
mutate(
trt = sample(treatment_choices, size = n_trt, replace = TRUE),
dt_trt_start = sample(
seq(ith_patient$dt_first_contact, ith_patient$dt_last_contact, by="days") %>%
sort(),
size = n_trt, replace = FALSE
)
) %>%
arrange(dt_trt_start) %>%
mutate(
dt_trt_end = lead(dt_trt_start) - 1,
dt_trt_end = case_when(
row_number() == n() ~
sample(seq(last(dt_trt_start), ith_patient$dt_last_contact, by="days"), 1),
TRUE ~ dt_trt_end
)
)
patient_trt <- patient_trt %>% bind_rows(ith_patient_trt)
}
patient_trt
}
# simulate follow-up data
data_fup <- sim_data_fup(n = 500)
# simulate demographic data
data_demo <- sim_data_demo(ids = data_fup$id)
# simulate treatment data
data_trt <-
sim_data_trt(
data_fup, n_trt_opts = 5,
treatment_choices = c(
"TRASTUZUMAB",
"PACLITAXEL/CAPECITABINE",
"PERTUZUMAB",
"PACLITAXEL",
"HERCEPTIN",
"HERCEPTIN/PACLITAXEL"
)
)
Ultimately, we end up with 3 tables like the following, which we would like the end user to filter interactively by subject ID.
data_fup
## # A tibble: 500 x 4
## id ind_event dt_first_contact dt_last_contact
## <int> <int> <date> <date>
## 1 842910 1 2005-12-04 2006-12-01
## 2 734307 0 2013-08-17 2019-12-31
## 3 831334 0 2019-08-01 2019-12-31
## 4 579848 0 2018-09-01 2019-02-13
## 5 438865 0 2015-04-18 2017-12-13
## 6 469101 1 2011-04-29 2014-09-17
## 7 989456 0 2004-10-11 2009-05-22
## 8 173478 0 2008-07-07 2010-11-21
## 9 591277 0 2005-05-16 2005-09-18
## 10 485156 1 2002-01-13 2006-04-02
## # … with 490 more rows
data_demo
## # A tibble: 500 x 5
## id age sex race ind_her2
## <int> <int> <chr> <chr> <int>
## 1 842910 62 F White 1
## 2 734307 30 F White 1
## 3 831334 99 F White 1
## 4 579848 48 F White 1
## 5 438865 36 F White 1
## 6 469101 55 F Black 1
## 7 989456 53 F White 1
## 8 173478 49 F White 1
## 9 591277 67 F White 0
## 10 485156 65 F White 1
## # … with 490 more rows
data_trt
## # A tibble: 1,487 x 4
## id trt dt_trt_start dt_trt_end
## <int> <chr> <date> <date>
## 1 842910 HERCEPTIN/PACLITAXEL 2005-12-13 2006-06-23
## 2 842910 PACLITAXEL/CAPECITABINE 2006-06-24 2006-07-09
## 3 842910 PACLITAXEL/CAPECITABINE 2006-07-10 2006-10-23
## 4 842910 PERTUZUMAB 2006-10-24 2006-11-27
## 5 734307 PACLITAXEL 2013-10-02 2015-11-08
## 6 831334 PERTUZUMAB 2019-12-08 2019-12-15
## 7 579848 PERTUZUMAB 2018-11-24 2018-12-28
## 8 438865 PACLITAXEL/CAPECITABINE 2015-08-01 2016-03-17
## 9 438865 PERTUZUMAB 2016-03-18 2016-04-15
## 10 438865 PACLITAXEL 2016-04-16 2016-08-12
## # … with 1,477 more rows
Filter the tables with crosstalk
Since DT
is a crosstalk
compatible widget, it really only takes a few lines to get the desired outcome. The general strategy is to send your data frame into a new SharedData
object with a line like SharedData$new(my_data_frame)
. Often, a single SharedData
object is sufficient for crosstalk
to coordinate multiple widgets, such as on the crosstalk
examples page. Sometimes though, you might want crosstalk
to operate across several data frames with the same key, meaning you need multiple calls to SharedData$new()
. The trick here is to use the key
and group
arguments in SharedData$new()
. In this case, key = ~id
and we assign the name group = "shared_obj"
. Next, we pass any of the SharedData
objects to filter_select()
which recognizes that the object belongs to a group. Something that was a “gotcha” for me: the group
argument in filter_select()
is not the same as group
in SharedData
, rather, it’s the values that populate the select box (in this case, id
).
All of this comes together as follows:
library(DT)
library(crosstalk)
# set up a group of SharedData objects
shared_demo <- SharedData$new(data_demo, key = ~id, group = "shared_obj")
shared_trt <- SharedData$new(data_trt, key = ~id, group = "shared_obj")
shared_fup <- SharedData$new(data_fup, key = ~id, group = "shared_obj")
# send the group to the filtering selection box
filter_select(
id = "id-selector", label = "ID",
sharedData = shared_demo, group = ~id
)
Demographics
datatable(shared_demo)
Treatment
datatable(shared_trt)
Follow-up
datatable(shared_fup)
Summary
Note that this approach can work particularly well with rmdformats or other R Markdown theming. For example, here is the above workflow embedded in rmdformats::robobook
which can be found in this repo with corresponding interactive webpage. Enjoy!