An example of a data index for navigation through metadata and preview images
Author
G.Fraga Gonzalez
Published
December 16, 2024
This is a demo of a data index that lets you navigate a table of metadata and thumbnail images. The idea is to make your data easier to find and share. Visit our Tutorials page for more details on the workflow for producing such a data index. In short, the index needs: (1) a source table of structured metadata; (2) the picture files, if you want to display thumbnails; (3) some R code that makes the table interactive with the package DT; and (4) Quarto to output the table as an HTML page.
More information about this example
This demo uses a dummy dataset, i.e., the source metadata table and images have no meaning. The example demonstrates three frequent operations that can be useful for creating a data index:
Reading filenames and extracting information into columns from filename parts
Joining two metadata tables according to a key variable with shared values between them
Adding some HTML code to a column value (in this case filenames) to render the images (the same approach works for rendering hyperlinks).
Use the filter boxes to select which data are displayed. Use select Columns to choose which columns are displayed, or click and drag a variable name to change the column order. You can select rows (one or multiple) by clicking on them or by using the select button. Click on a thumbnail to open the full-size image. Click the button Copy to copy the filtered or selected rows to the clipboard, and click the button csv to save them in that format.
The following steps will vary depending on the index content. Steps 1 to 4 depend on the metadata tables used and on whether we want to render links, images, etc. Step 5 depends on which of the many available options we choose for displaying the table and filters. For this example we did the following:
1. Read metadata table
We first read a source metadata table with information about subjects and experiments (image recording).
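As a rough sketch of this step, the snippet below builds a small stand-in table in memory instead of reading a spreadsheet, then formats every column whose name contains "time" as hours and minutes. The column names (`subjID`, `start_time`) and the values are hypothetical. With a real file you would replace the `data.frame()` call with your own import, for example `readxl::read_excel()`.

```r
# Stand-in for a metadata table (hypothetical columns and values)
tbl <- data.frame(
  subjID = c("S01", "S02"),
  start_time = as.POSIXct(c("2024-01-01 09:30:00", "2024-01-01 14:05:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Find the columns whose name contains "time"
cols_with_time <- grep("time", names(tbl), value = TRUE)

# Keep only hours and minutes in those columns
for (col in cols_with_time) {
  tbl[[col]] <- format(tbl[[col]], format = "%H:%M")
}
```

After formatting, the time columns are plain character strings, which is convenient for display but no longer suitable for date arithmetic.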
2. Make a table with filenames
We read the names of the files with the images we want to showcase. Each image filename starts with the subject identifier, separated from the rest of the name by an underscore '_'. We use this to create a new variable containing the subject identifier of each image file.
Code
    files <- dir('Images')                    # find all files in our images folder
    fname <- files[grepl('\\.jpg$', files)]   # take only .jpg files (escaped dot, anchored at the end)
    tbl_files <- as.data.frame(fname)         # make table with filenames
    tbl_files$subject <- sapply(strsplit(fname, '_'), '[[', 1)  # take 1st filename part
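The same idea can be tried without an images folder. The sketch below uses a hypothetical vector of filenames that follow a `<subject>_<rest>.jpg` pattern; the names are made up for illustration. It keeps only the .jpg files and takes the part before the first underscore as the subject identifier.

```r
# Hypothetical filenames following "<subject>_<rest>.jpg"
files <- c("S01_scan_a.jpg", "S02_scan_b.jpg", "notes.txt", "S10_scan_c.JPG")

# Keep only .jpg files; the dot is escaped and the match ignores case
fname <- files[grepl("\\.jpg$", files, ignore.case = TRUE)]

# Take the part before the first underscore as the subject identifier
subject <- vapply(strsplit(fname, "_", fixed = TRUE), `[[`, character(1), 1)

tbl_files <- data.frame(fname = fname, subject = subject, stringsAsFactors = FALSE)
```

`vapply()` is used here instead of `sapply()` because it checks that each extracted part is a single string, which makes malformed filenames fail early.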
3. Join the two tables
Now we link the image filenames to the source metadata table.
Code
    tbl_joined <- full_join(x = tbl, y = tbl_files,
                            by = join_by("subjID" == "subject"), keep = FALSE)
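To see what a full join does to unmatched rows, here is a small sketch with made-up tables. It uses base R `merge()` rather than `dplyr::full_join()`, so it runs without extra packages; `all = TRUE` keeps rows from both tables, which is the behaviour of a full join. All values are hypothetical.

```r
# Hypothetical metadata: three subjects
tbl <- data.frame(
  subjID = c("S01", "S02", "S03"),
  Sex = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical image files: two for S01, one for a subject missing from tbl
tbl_files <- data.frame(
  fname = c("S01_a.jpg", "S01_b.jpg", "S04_a.jpg"),
  subject = c("S01", "S01", "S04"),
  stringsAsFactors = FALSE
)

# Full join on subjID == subject; unmatched rows are kept and filled with NA
joined <- merge(tbl, tbl_files, by.x = "subjID", by.y = "subject", all = TRUE)
```

Subjects without an image get `NA` in `fname`, and images without metadata get `NA` in the metadata columns, so it is worth checking for `NA` before building any HTML from these columns.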
4. Add HTML code to render the images
Since we want to display images, we add some HTML that includes the address of the image files (the repository). The URL goes in href so that clicking the thumbnail opens the image. The path relative to this script goes in src so that the table renders the image thumbnail. We use file.path() to build the paths so that we do not have to type the operating system separator '\' or '/' by hand (except within the URL itself).
Code
    # define image paths
    pic_folder <- 'https://gitlab.uzh.ch/crsuzh/afford_website/-/tree/master/ORD_index/Images'
    pic_fullpath <- file.path(pic_folder, tbl_joined$fname)
    pic_relpath <- file.path('..', 'Images', tbl_joined$fname)

    # add paths and HTML code
    tbl_joined$pic <- paste0('<a href=\'', pic_fullpath, '\' target=\'_blank\'>',
                             '<img src=\'', pic_relpath, '\' height=\'70\'></a>')

    # add location
    tbl_joined$location <- paste0('<a href=\'', pic_folder, '\' target=\'_blank\'>',
                                  'Gitlab folder >>', '</a>')

    # move to first columns
    tbl_joined <- tbl_joined %>% relocate("pic", .before = 1)
    tbl_joined <- tbl_joined %>% relocate("fname", .before = 2)
    tbl_joined <- tbl_joined %>% relocate("location", .before = 3)
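One thing to watch for after a full join is rows without a filename, which would otherwise produce a link ending in "NA". The helper below is a minimal sketch of one way to handle that; `make_thumb()` and the example URL are hypothetical and not part of the original workflow.

```r
# Build a clickable thumbnail; returns "" where there is no file name.
# make_thumb() is a hypothetical helper written for this illustration.
make_thumb <- function(fname, url_folder, rel_folder, height = 70) {
  html <- sprintf(
    "<a href='%s' target='_blank'><img src='%s' height='%d'></a>",
    file.path(url_folder, fname),   # full address, opened on click
    file.path(rel_folder, fname),   # relative path, used for the thumbnail
    as.integer(height)
  )
  html[is.na(fname)] <- ""          # no image, no HTML
  html
}

thumbs <- make_thumb(
  fname = c("S01_a.jpg", NA),
  url_folder = "https://example.org/repo/Images",  # hypothetical URL
  rel_folder = file.path("..", "Images")
)
```

The resulting strings are only rendered as images if the table is drawn with HTML escaping turned off, as done with `escape = FALSE` in the `datatable()` call.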
Note on relative paths and Gitlab pages for advanced users >>
The img src usually holds the path to the images relative to the current path, e.g., "./Images" if they are in a subfolder of the current working directory (in R you can find it with getwd()). The path "../../Images" indicates that they are in a folder named 'Images' two folders above the current folder. This is related to how the continuous integration (CI/CD) pipeline is set up. There are many other possible configurations. Attention: your local relative paths may not work the same way if you use CI/CD and Gitlab pages. More information about CI/CD is in the Tutorial section.
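A quick way to check a relative path locally is to test whether it resolves from the folder where the script runs. The sketch below sets up a throwaway folder layout in a temporary directory; the folder names (`index_demo`, `ORD_index`, `Images`) are an assumed layout for illustration, and a deployed site may be laid out differently.

```r
# Assumed layout, created in a temporary directory:
#   index_demo/Images/S01_a.jpg
#   index_demo/ORD_index/        <- where the script is assumed to run
root <- file.path(tempdir(), "index_demo")
dir.create(file.path(root, "Images"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "ORD_index"), showWarnings = FALSE)
file.create(file.path(root, "Images", "S01_a.jpg"))

# Work from the script folder and test a path that goes one folder up
old_wd <- setwd(file.path(root, "ORD_index"))
rel_one_up <- file.path("..", "Images", "S01_a.jpg")
found_one_up <- file.exists(rel_one_up)
setwd(old_wd)  # restore the previous working directory
```

A check like this only tells you about your local folder structure. The rendered page resolves `src` relative to the published HTML file, so the path still has to be verified on the deployed site.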
5. Render the HTML table with DT package
Finally, we make the table interactive and add some custom filter boxes. We need two packages (they were installed in advance with install.packages(), and we load them with the command library()). The main package is DT, which renders the table as an interactive HTML widget. The R package crosstalk lets the filter boxes change which rows we see in the table. Its function bscols() lays out the filters in a column next to the table.
Code
    library(crosstalk)  # for filter boxes
    library(DT)         # for the HTML table

    # first we need an object that will be shared by filter panels and datatable
    shared_joined <- SharedData$new(tbl_joined, key = ~subjID, group = "shared_obj")

    # to make two columns: one with filter panels and one with the table
    bscols(
      widths = c(2, 10), device = c("xs", "sm", "md", "lg"),
      # filter panels. Other formats are sliders and checkboxes https://rstudio.github.io/crosstalk/using.html
      list(
        filter_select(id = "subjID", label = "subject", sharedData = shared_joined, group = ~subjID),
        filter_checkbox("Sex", "Sex", shared_joined, ~Sex, inline = FALSE)
      ),
      # table
      datatable(
        shared_joined,
        # filter = "top",
        escape = FALSE,
        rownames = FALSE,
        width = "100%",
        class = 'cell-border hover',
        extensions = c('Buttons', 'Select', 'ColReorder', 'Scroller', 'FixedHeader', 'KeyTable'),
        selection = 'none',
        options = list(
          fixedHeader = TRUE,
          pageLength = 20,
          paging = FALSE,
          dom = 'Bftrip',
          # buttons = c('colvis', 'selectAll', 'selectNone', 'copy', 'csv'),
          buttons = list(
            list(extend = "colvis", text = "select Columns", background = 'yellow'),
            'selectAll', 'selectNone', 'copy', 'csv'
          ),
          select = list(style = 'os', items = 'row'),
          scrollX = TRUE,
          scrollY = "800px",
          scrollCollapse = TRUE,
          autoWidth = TRUE,
          colReorder = TRUE,
          columnDefs = list(
            list(keys = TRUE, search = list(regex = TRUE), targets = 0)
          )
        )
      )
    )
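The call above combines many options at once. A stripped-down sketch of the same pattern, with a made-up three-row table, one filter and two export buttons, might look like the following. `build_index()` is a hypothetical wrapper written for this illustration; it returns `NULL` when DT or crosstalk is not installed.

```r
# Made-up table for illustration
demo_tbl <- data.frame(
  subjID = c("S01", "S02", "S03"),
  Sex = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: one filter box next to an interactive table
build_index <- function(tbl) {
  # Both packages are needed; return NULL if either is missing
  if (!requireNamespace("DT", quietly = TRUE) ||
      !requireNamespace("crosstalk", quietly = TRUE)) {
    return(NULL)
  }
  # Shared object that links the filter and the table
  shared <- crosstalk::SharedData$new(tbl, key = ~subjID)
  crosstalk::bscols(
    widths = c(3, 9),
    crosstalk::filter_checkbox("sex_filter", "Sex", shared, ~Sex),
    DT::datatable(
      shared,
      rownames = FALSE,
      extensions = "Buttons",
      options = list(dom = "Bfrtip", buttons = c("copy", "csv"))
    )
  )
}

index_widget <- build_index(demo_tbl)
```

In a Quarto chunk, writing `index_widget` on its own line displays the result. Starting from a small version like this and adding extensions one at a time makes it easier to see which option is responsible for which behaviour.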
---
title: "Data Index"
subtitle: "An example of a data index for navigation through metadata and preview images"
author: "G.Fraga Gonzalez"
affiliation: "Center for Reproducible Science, UZH"
date: last-modified
format:
  html:
    code-overflow: scroll
    code-tools: true
    code-copy: true
    code-fold: true
    page-layout: full
---

This is a demo of a data index that lets you navigate a table of metadata and thumbnail images. The idea is to make your data easier to find and share. Visit our [Tutorials page](https://crsuzh.pages.uzh.ch/AFFORD_website/tutorials/) for more details on the workflow for producing such a data index. In short, the index needs: (1) a source table of structured metadata; (2) the picture files, if you want to display thumbnails; (3) some R code that makes the table interactive with the package [DT](https://rstudio.github.io/DT/); and (4) Quarto to output the table as an HTML page.

::: {.callout-note collapse=true}
#### More information about this example

This demo uses a dummy dataset, i.e., the source metadata table and images have no meaning. The example demonstrates three frequent operations that can be useful for creating a data index:

- Reading filenames and extracting information into columns from filename parts
- Joining two metadata tables according to a key variable with shared values between them
- Adding some HTML code to a column value (in this case filenames) to render the images (the same approach works for rendering hyperlinks).
:::

```{r readtable}
#| include: false
library(dplyr)
tbl <- readxl::read_excel('DummyData1_20241234_subjects.xlsx', na = c("", "N/A"))

# Minor adjustment of time format
colsWithTime <- colnames(tbl)[grep('time', colnames(tbl))]  # find variables with 'time' in their name
tbl <- tbl %>% mutate(across(all_of(colsWithTime), ~ format(., format = "%H:%M")))
```

```{r filenames}
#| include: false
files <- dir('Images')                    # find all files in our images folder
fname <- files[grepl('\\.jpg$', files)]   # take only .jpg files (escaped dot, anchored at the end)
tbl_files <- as.data.frame(fname)         # make table with filenames
tbl_files$subject <- sapply(strsplit(fname, '_'), '[[', 1)  # take 1st filename part
```

```{r jointables}
#| include: false
tbl_joined <- full_join(x = tbl, y = tbl_files,
                        by = join_by("subjID" == "subject"), keep = FALSE)
```

```{r addhtml}
#| include: false
# define image paths
pic_folder <- 'https://gitlab.uzh.ch/crsuzh/afford_website/-/tree/master/ORD_index/Images'
pic_fullpath <- file.path(pic_folder, tbl_joined$fname)
pic_relpath <- file.path('..', 'Images', tbl_joined$fname)

# add paths and HTML code
tbl_joined$pic <- paste0('<a href=\'', pic_fullpath, '\' target=\'_blank\'>',
                         '<img src=\'', pic_relpath, '\' height=\'70\'></a>')

# add location
tbl_joined$location <- paste0('<a href=\'', pic_folder, '\' target=\'_blank\'>',
                              'Gitlab folder >>', '</a>')

# move to first columns
tbl_joined <- tbl_joined %>% relocate("pic", .before = 1)
tbl_joined <- tbl_joined %>% relocate("fname", .before = 2)
tbl_joined <- tbl_joined %>% relocate("location", .before = 3)
```

```{r rendertable}
#| include: false
library(crosstalk)  # for filter boxes
library(DT)         # for the HTML table

# first we need an object that will be shared by filter panels and datatable
shared_joined <- SharedData$new(tbl_joined, key = ~subjID, group = "shared_obj")

# to make two columns: one with filter panels and one with the table
bscols(
  widths = c(2, 10), device = c("xs", "sm", "md", "lg"),
  # filter panels. Other formats are sliders and checkboxes https://rstudio.github.io/crosstalk/using.html
  list(
    filter_select(id = "subjID", label = "subject", sharedData = shared_joined, group = ~subjID),
    filter_checkbox("Sex", "Sex", shared_joined, ~Sex, inline = FALSE)
  ),
  # table
  datatable(
    shared_joined,
    # filter = "top",
    escape = FALSE,
    rownames = FALSE,
    width = "100%",
    class = 'cell-border hover',
    extensions = c('Buttons', 'Select', 'ColReorder', 'Scroller', 'FixedHeader', 'KeyTable'),
    selection = 'none',
    options = list(
      fixedHeader = TRUE,
      pageLength = 20,
      paging = FALSE,
      dom = 'Bftrip',
      # buttons = c('colvis', 'selectAll', 'selectNone', 'copy', 'csv'),
      buttons = list(
        list(extend = "colvis", text = "select Columns", background = 'yellow'),
        'selectAll', 'selectNone', 'copy', 'csv'
      ),
      select = list(style = 'os', items = 'row'),
      scrollX = TRUE,
      scrollY = "800px",
      scrollCollapse = TRUE,
      autoWidth = TRUE,
      colReorder = TRUE,
      columnDefs = list(
        list(keys = TRUE, search = list(regex = TRUE), targets = 0)
      )
    )
  )
)
```

::: {.panel-tabset}

# Navigate data {.unnumbered}

<!-- ::: {.callout-tip style="font-size: 4px" collapse=false} -->
::: {.callout-tip collapse=false}
# Instructions

Use the **filter boxes** to select which data are displayed. Use **select Columns** to choose which columns are displayed, or *click and drag* a variable name to change the column order. You can **select rows** (one or multiple) by clicking on them or by using the select button. **Click** on a thumbnail to open the full-size image. Click the button **Copy** to copy the filtered or selected rows to the clipboard, and click the button **csv** to save them in that format.
:::

```{r }
<<rendertable>>
```

# How-to

The following steps will vary depending on the index content. Steps 1 to 4 depend on the metadata tables used and on whether we want to render links, images, etc. Step 5 depends on which of the many available options we choose for displaying the table and filters. For this example we did the following:

### 1. Read metadata table

We first read a *source metadata table* with information about subjects and experiments (image recording).

```{r }
#| code-fold: show
<<readtable>>
```

### 2. Make a table with filenames

We read the names of the *files* with the images we want to showcase. Each image filename starts with the subject identifier, separated from the rest of the name by an underscore '_'. We use this to create a new variable containing the subject identifier of each image file.

```{r }
#| code-fold: show
<<filenames>>
```

### 3. Join the two tables

Now we link the image filenames to the source metadata table.

```{r }
#| code-fold: show
<<jointables>>
```

### 4. Add HTML code to render the images

Since we want to display images, we add some HTML that includes the address of the image files (the repository). The URL goes in *href* so that clicking the thumbnail opens the image. The path relative to this script goes in *src* so that the table renders the image thumbnail. We use `file.path()` to build the paths so that we do not have to type the operating system separator '\\' or '/' by hand (except within the URL itself).

```{r }
#| code-fold: show
<<addhtml>>
```

::: {.callout-warning collapse=true}
#### Note on relative paths and Gitlab pages for advanced users >>

The *img src* usually holds the path to the images relative to the current path, e.g., "./Images" if they are in a subfolder of the current working directory (in R you can find it with `getwd()`). The path "../../Images" indicates that they are in a folder named 'Images' two folders above the current folder. This is related to how the continuous integration (CI/CD) pipeline is set up. There are many other possible configurations. Attention: your local relative paths may not work the same way if you use CI/CD and Gitlab pages. More information about CI/CD is in the Tutorial section.
:::

### 5. Render the HTML table with DT package

Finally, we make the table interactive and add some custom filter boxes. We need two [packages](https://r-pkgs.org/) (they were installed in advance with `install.packages()`, and we load them with the command `library()`). The main package is [DT](https://rstudio.github.io/DT/), which renders the table as an interactive HTML widget. The R package [crosstalk](https://rstudio.github.io/crosstalk/) lets the filter boxes change which rows we see in the table. Its function `bscols()` lays out the filters in a column next to the table.

```{r }
#| code-fold: show
<<rendertable>>
```

:::