Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY.



  • Recently I received a lot of files containing columns in which every value is missing. I had to modify the ‘data_dictionary’ function to cope with these, so I have released version 1.1
  • Windows Version is Downloadable Here
  • Linux and Mac users can just run this R code

R Code:

library(devtools)
install_github("ivanhanigan/disentangle")

  • In my ‘disentangle’ package of miscellaneous tools, the data_dictionary function is designed to produce descriptive summary statistics in a style similar to the FREQUENCIES command in SPSS
  • I chose to emulate the SPSS output because I think it is a pretty decent summary statistics table, and there are loads of users who have already been introduced to this style of output
  • I wrote it because I did not get exactly what I wanted from the reporttools or stargazer packages, which have similar summary statistics functions
  • I chose to call it data_dictionary because I think the alternatives (like ‘frequencise’ or ‘code_book’) are not as immediately intuitive about what you get
  • The function returns a data.frame with the variable name, a simplified type (character, number, date or missing) and, for each variable, its attributes with their value, count and percent
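
The simplified type mentioned in the last bullet can be illustrated with a short base R sketch. The helper `simple_type()` is hypothetical and written only for this note; it is not the code inside ‘disentangle’, and labelling an all-NA column as "missing" is an assumption based on the output table further down.

```r
# Hypothetical helper, not part of disentangle: collapse R classes
# into the simplified types used in the summary table.
simple_type <- function(x) {
  if (all(is.na(x))) {
    "missing"        # assumption: all-NA columns get their own label
  } else if (inherits(x, c("Date", "POSIXt"))) {
    "date"
  } else if (is.numeric(x)) {
    "number"
  } else {
    "character"      # strings, factors and logicals fall through to here
  }
}

# Made-up example data, one column per simplified type
df <- data.frame(
  status  = c("married", "single", NA),
  cases   = c(5, 9, 17),
  when    = as.Date(c("2014-06-07", "2014-06-15", "2014-06-18")),
  nothing = NA,
  stringsAsFactors = FALSE
)

# One simplified type per column, returned as a named character vector
types <- vapply(df, simple_type, character(1))
print(types)
```

Checking for all-missing values first is a deliberate choice in this sketch: a column of nothing but NA is logical in R, so it would otherwise be reported as character.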


# functions
library(devtools)
install_github("ivanhanigan/disentangle")
library(disentangle)
library(xtable)

# load
fpath <- system.file("extdata/civst_gend_sector.csv", package = "disentangle")
civst_gend_sector <- read.csv(fpath)
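# add a date variable scattered around today (sd = 10 days)
civst_gend_sector$datevar <- as.Date(round(rnorm(nrow(civst_gend_sector), as.numeric(Sys.Date()), 10)), origin = "1970-01-01")
# add a variable in which every value is missing
civst_gend_sector$missing_blahblah_variable <- NA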

# check

data_dict(civst_gend_sector, "civil_status")
data_dict(civst_gend_sector, "datevar")
data_dict(civst_gend_sector, "number_of_cases")
data_dict(civst_gend_sector, "missing_blahblah_variable")

dataDictionary <- data_dictionary(civst_gend_sector,
                                  show_levels = -1)

print(xtable(dataDictionary), type = "html", include.rownames = FALSE)

Variable                   Type       Attributes        Value       Count  Percent
civil_status               character  divorced/widowed              6      33.33
                                      married                       6      33.33
                                      single                        6      33.33
gender                     character  female                        9      50.00
                                      male                          9      50.00
activity_sector            character  primary                       6      33.33
                                      secondary                     6      33.33
                                      tertiary                      6      33.33
number_of_cases            number     Min.              0
                                      1st Qu.           5
                                      Median            9
                                      Mean              15.17
                                      3rd Qu.           17
                                      Max.              50
datevar                    date       Min.              2014-05-26
                                      1st Qu.           2014-06-07
                                      Median            2014-06-15
                                      Mean              2014-06-14
                                      3rd Qu.           2014-06-18
                                      Max.              2014-07-05
missing_blahblah_variable  missing    NA's                          18     100.00
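
The Count and Percent columns for a categorical variable can be reproduced with base R. This is only a sketch, using a made-up vector with the same level counts as civil_status in the table above; `freq_table()` is a hypothetical name and not a ‘disentangle’ function.

```r
# Hypothetical helper: counts and percentages per level, NA included.
freq_table <- function(x) {
  counts <- table(x, useNA = "ifany")
  data.frame(
    Attributes = names(counts),
    Count      = as.vector(counts),
    Percent    = round(100 * as.vector(counts) / length(x), 2),
    stringsAsFactors = FALSE
  )
}

# Made-up data: three levels, six rows each
civil_status <- rep(c("divorced/widowed", "married", "single"), each = 6)
ft <- freq_table(civil_status)
print(ft)

# An all-missing column gives a single NA row covering every record
ft_missing <- freq_table(rep(NA, 18))
print(ft_missing)
```

Passing useNA = "ifany" to table() is what keeps the all-missing column from producing an empty result, which is the situation that version 1.1 was released to handle.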

