Table Operations

R templates to do frequent table operations

Quarto
R
Markdown
YAML
Published

December 16, 2024

Template 1 - Combine two Excels by a shared key variable

This template is based on the following use case. We have one table with subject information with one row per subject and a table with scan lists which has many more rows as subjects have multiple scans. Both tables have a variable identifying the subjects. The subject IDs are common to both tables. We want to combine them and have the list of scans with the subject information in the same table, this means the rows with the subject information will appear multiple times, one for each scan of a given subject. For this we use the function dplyr::full_join() (Note: the libraries loaded with library() are additional R packages that need to be installed with install.packages() before first use)

Note. The code in this template is deliberately explicit for the sake of clarity, but it could be made more compact.

Always check the resulting output table!
library(openxlsx)
library(dplyr)

# User inputs
# -------------

dirinput =                  # folder with files
diroutput =                 # output folder to save the files 

# Files
file_subjects =             # insert filename of table with subject info 
sheet_subjects  =           # insert name of sheet in table

file_scanlist =             # filename of table with scanlist
sheet_scanlist  =           # insert name of sheet in table

# missing values definition 
labels_missing_subjects =c("","N/A") # labels used to specify missing value, first value is always: ""
labels_missing_scanlist =c("","N/A") 


# Define columns of interest (key variables: column cell values should be identical across tables)
sID_subjects =              # name of column identifying subject in subject file
sID_scans  =                # name of column identifying subject in scan list file 
   

# Read tables 
# ---------------

tbl_subj <- openxlsx::read.xlsx(xlsxFile = file.path(dirinput,file_subjects),
                               sheet= sheet_subjects, 
                               na.strings =labels_missing_subj)
                               
print(paste0('The table: has ' , nrow(tbl_subj), ' rows and ',ncol(tbl_subj), ' columns'))

# 

tbl_scans <- openxlsx::read.xlsx(xlsxFile = file.path(dirinput,file_scanlist),
                                sheet= sheet_scanlist, 
                                na.strings=labels_missing_scanlist)

print(paste0('The table : has ' , nrow(tbl_scans), ' rows and ',ncol(tbl_scans), ' columns'))


# Join tables 
# ---------------

tbl_joined  <- full_join(x=tbl,
                         y = tbl_files,
                         by=join_by(sID_scans==sID_subjects),
                         keep=FALSE)

print(paste0('The new table has ' , nrow(tbl_joined), ' rows and ',ncol(tbl_joined), ' columns'))

# Save (specify your own output filename)
#----- 
 openxlsx::write.xlsx(tbl_joined, file = file.path(diroutput,"table_merged.xlsx") 
Back to top