Disentangle Things by Ivan Hanigan

Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY.

data-dictionary-function-needed-upgrade-to-my-misc-package

Recently I received a lot of files containing columns in which every value is missing. I had to modify the ‘data_dictionary’ function to cope with these, so I have released version 1.1.
The Windows version is downloadable Here.
Linux and Mac users can just run this R code:

R Code:

library(devtools)
# install from GitHub; current devtools expects the "username/repo" form
install_github("ivanhanigan/disentangle")
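For context, columns that are entirely missing can be spotted in base R before any summarising is attempted. This is only a minimal sketch on a made-up data.frame (`dat` and its column names are hypothetical and are not part of the package):

```r
# Hypothetical example data: column 'b' contains only missing values
dat <- data.frame(a = c(1, 2, NA), b = NA, c = c("x", NA, "z"),
                  stringsAsFactors = FALSE)

# TRUE for every column in which all values are NA
all_missing <- vapply(dat, function(x) all(is.na(x)), logical(1))
all_missing

# names of the entirely missing columns
names(dat)[all_missing]
```

Columns with only some missing values (like `a` and `c` here) are not flagged, only those with nothing but NA.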

In my ‘disentangle’ package of miscellaneous tools, the data_dictionary function is designed to produce descriptive summary statistics in a form familiar from the FREQUENCIES command in SPSS.
I chose to emulate the SPSS output because I think it is a pretty decent summary statistics table, and there are loads of users who have already been introduced to this style of output.
I wrote it because I had not got exactly what I wanted from the reporttools or stargazer packages, which have similar summary stats functions.
I chose to call it data_dictionary because I think the alternatives (like ‘frequencise’ or ‘code_book’) are not as immediately intuitive about what you get.
The function returns a data.frame with the variable name, a simplified type (character, number, date or missing), the attributes (category levels or summary statistics) and their value, count and percent.
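The idea of a "simplified type" can be sketched in base R as follows. This is an illustration under my own assumptions, not the code inside the package: `simple_type` and the example data.frame `dat` are hypothetical names, and the rule of checking for all-missing first is a guess at one reasonable ordering.

```r
# Illustrative helper (hypothetical, not the function shipped in 'disentangle'):
# map a vector to one of the simplified types: missing, date, number, character
simple_type <- function(x) {
  if (all(is.na(x))) {
    "missing"
  } else if (inherits(x, "Date") || inherits(x, "POSIXt")) {
    "date"
  } else if (is.numeric(x)) {
    "number"
  } else {
    "character"
  }
}

# Hypothetical example data
dat <- data.frame(
  status = c("single", "married", "single"),
  cases  = c(0, 5, 9),
  when   = as.Date(c("2014-06-01", "2014-06-15", "2014-07-01")),
  empty  = NA,
  stringsAsFactors = FALSE
)

# one row per variable with its simplified type
types <- vapply(dat, simple_type, character(1))
data.frame(Variable = names(dat), Type = types, row.names = NULL)
```

Testing for an entirely missing column first means that such a column never reaches the numeric or character branches, which is where summary functions tend to fail.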

Code:data-dictionary-function-needed-upgrade-to-my-misc-package

# functions
library(devtools)
# current devtools expects the "username/repo" form
install_github("ivanhanigan/disentangle")
library(disentangle)
library(xtable)

# load
fpath <- system.file("extdata/civst_gend_sector.csv", package = "disentangle")
fpath
civst_gend_sector <- read.csv(fpath)
# add a date variable (random dates around today) and an entirely missing variable
civst_gend_sector$datevar <- as.Date(round(rnorm(nrow(civst_gend_sector), Sys.Date(), 10)), origin = "1970-01-01")
civst_gend_sector$missing_blahblah_variable <- NA

# check
str(civst_gend_sector)

# do
# one variable at a time
data_dict(civst_gend_sector, "civil_status")
data_dict(civst_gend_sector, "datevar")
data_dict(civst_gend_sector, "number_of_cases")
data_dict(civst_gend_sector, "missing_blahblah_variable")

# the whole data.frame
dataDictionary <- data_dictionary(civst_gend_sector,
                                  show_levels = -1)

# render as an HTML table
print(xtable(dataDictionary), type = "html", include.rownames = FALSE)

Variable                   Type       Attributes        Value       Count  Percent
civil_status               character  divorced/widowed              6      33.33
                                      married                       6      33.33
                                      single                        6      33.33
gender                     character  female                        9      50.00
                                      male                          9      50.00
activity_sector            character  primary                       6      33.33
                                      secondary                     6      33.33
                                      tertiary                      6      33.33
number_of_cases            number     Min.              0
                                      1st Qu.           5
                                      Median            9
                                      Mean              15.17
                                      3rd Qu.           17
                                      Max.              50
datevar                    date       Min.              2014-05-26
                                      1st Qu.           2014-06-07
                                      Median            2014-06-15
                                      Mean              2014-06-14
                                      3rd Qu.           2014-06-18
                                      Max.              2014-07-05
missing_blahblah_variable  missing    NA's                          18     100.00
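
The Count and Percent columns for a categorical variable can be reproduced by hand in base R. This is a rough sketch and not the package's own code: the `civil_status` vector below is constructed to match the counts reported in the table (6 of each level) and is not read from the package's CSV file, and `freq` is a name I made up for the example.

```r
# Values constructed to match the counts in the table above (6 of each level);
# this is not read from the package's CSV file
civil_status <- rep(c("divorced/widowed", "married", "single"), each = 6)

# count each level, keeping NA as its own category if any are present
counts <- table(civil_status, useNA = "ifany")

freq <- data.frame(
  Attributes = names(counts),
  Count      = as.vector(counts),
  Percent    = round(100 * as.vector(counts) / length(civil_status), 2)
)
freq
```

Each level has 6 of the 18 rows, so each percentage is 100 * 6 / 18, which rounds to 33.33 as in the table.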