- Recently I received a lot of files containing columns in which every value is missing. I had to modify the ‘data_dictionary’ function to cope with these, so I have released version 1.1
- The Windows version is downloadable here
- Linux and Mac users can just run this R code:

```r
# install (or update) disentangle from GitHub
library(devtools)
install_github("ivanhanigan/disentangle")
```
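An all-missing column is easy to spot in base R. The helper below, `all_missing_columns`, is hypothetical: it is not part of disentangle and not necessarily how version 1.1 handles the problem. It is a minimal sketch that assumes the input is an ordinary data.frame:

```r
# Hypothetical helper (not part of disentangle): return the names of the
# columns in a data.frame whose values are all NA
all_missing_columns <- function(df) {
  # one TRUE/FALSE per column: is every value in this column NA?
  flags <- vapply(df, function(col) all(is.na(col)), logical(1))
  names(flags)[flags]
}

example_df <- data.frame(
  id = 1:3,
  score = c(2.5, NA, 4.0),
  blank = NA            # recycled to length 3, so the column is entirely NA
)
all_missing_columns(example_df)   # returns "blank"
```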
- In my ‘disentangle’ package of miscellaneous tools, the data_dictionary function produces descriptive summary statistics in a layout similar to the output of the SPSS FREQUENCIES command
- I chose to emulate the SPSS output because I think it is a pretty decent summary statistics table, and there are loads of users who have already been introduced to this style of output
- I wrote it because I could not get exactly what I wanted from the summary statistics functions in the reporttools or stargazer packages
- I chose to call it data_dictionary because I think the alternatives (like ‘frequencies’ or ‘code_book’) do not make it as immediately obvious what you get
- The function returns a data.frame with the variable name, a simplified type (character, number or date, plus missing for columns that are entirely NA), and columns for attributes, values, counts and percentages
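The mapping to a simplified type might look something like the sketch below. `simple_type` is a hypothetical name, and its rules (all-NA columns become missing, Date and POSIXt columns become date, numeric columns become number, everything else character) are assumptions inferred from the example output, not taken from the package source:

```r
# Hypothetical sketch (not disentangle's implementation): map an R column
# to one of the simplified types used in the data dictionary output
simple_type <- function(x) {
  if (all(is.na(x))) {
    "missing"                                # every value is NA
  } else if (inherits(x, c("Date", "POSIXt"))) {
    "date"                                   # checked before numeric on purpose
  } else if (is.numeric(x)) {
    "number"
  } else {
    "character"                              # character, factor, logical, ...
  }
}

toy_df <- data.frame(
  name = c("a", "b"),
  n = c(1, 2),
  when = as.Date(c("2014-06-01", "2014-06-02")),
  empty = NA
)
vapply(toy_df, simple_type, character(1))
# returns c(name = "character", n = "number", when = "date", empty = "missing")
```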
```r
# functions
library(devtools)
install_github("ivanhanigan/disentangle")
library(disentangle)
library(xtable)

# load the example data shipped with the package
fpath <- system.file("extdata/civst_gend_sector.csv", package = "disentangle")
fpath
civst_gend_sector <- read.csv(fpath)

# add a random date variable (normally distributed around today, sd = 10 days)
civst_gend_sector$datevar <- as.Date(
  round(rnorm(nrow(civst_gend_sector), as.numeric(Sys.Date()), 10)),
  origin = "1970-01-01"
)
# add a column in which every value is missing
civst_gend_sector$missing_blahblah_variable <- NA

# check
str(civst_gend_sector)

# do: summarise single variables
data_dict(civst_gend_sector, "civil_status")
data_dict(civst_gend_sector, "datevar")
data_dict(civst_gend_sector, "number_of_cases")
data_dict(civst_gend_sector, "missing_blahblah_variable")

# summarise every variable and print the result as an HTML table
dataDictionary <- data_dictionary(civst_gend_sector, show_levels = -1)
print(xtable(dataDictionary), type = "html", include.rownames = FALSE)
```
| Variable | Type | Attributes | Value | Count | Percent |
|---|---|---|---|---|---|
| civil_status | character | | divorced/widowed | 6 | 33.33 |
| | | | married | 6 | 33.33 |
| | | | single | 6 | 33.33 |
| gender | character | | female | 9 | 50.00 |
| | | | male | 9 | 50.00 |
| activity_sector | character | | primary | 6 | 33.33 |
| | | | secondary | 6 | 33.33 |
| | | | tertiary | 6 | 33.33 |
| number_of_cases | number | Min. | 0 | | |
| | | 1st Qu. | 5 | | |
| | | Median | 9 | | |
| | | Mean | 15.17 | | |
| | | 3rd Qu. | 17 | | |
| | | Max. | 50 | | |
| datevar | date | Min. | 2014-05-26 | | |
| | | 1st Qu. | 2014-06-07 | | |
| | | Median | 2014-06-15 | | |
| | | Mean | 2014-06-14 | | |
| | | 3rd Qu. | 2014-06-18 | | |
| | | Max. | 2014-07-05 | | |
| missing_blahblah_variable | missing | | NA's | 18 | 100.00 |
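To make the two kinds of rows in this table concrete, here is a minimal base-R sketch of how they could be built: counts and percentages for categorical variables, and the statistics from `summary()` (whose labels Min., 1st Qu., Median, Mean, 3rd Qu. and Max. match the number rows) for numeric ones. The function names `freq_rows` and `summary_rows` are hypothetical; this illustrates the layout under those assumptions and is not the disentangle source:

```r
# Hypothetical sketch (not disentangle's code): rows in the spirit of the
# data dictionary output, using only base R

# Categorical variables: SPSS FREQUENCIES-style counts and percentages
freq_rows <- function(x) {
  counts <- table(x, useNA = "ifany")      # adds an NA row only if NAs occur
  n <- as.vector(counts)
  data.frame(
    Value = names(counts),
    Count = n,
    Percent = round(100 * n / length(x), 2),
    stringsAsFactors = FALSE
  )
}

# Numeric variables: summary() gives Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary_rows <- function(x) {
  stats <- summary(x)
  data.frame(
    Attributes = names(stats),
    Value = as.vector(stats),
    stringsAsFactors = FALSE
  )
}

# 18 cases split evenly over three levels: 6 cases (33.33%) each
status <- rep(c("divorced/widowed", "married", "single"), each = 6)
freq_rows(status)
summary_rows(c(0, 5, 9, 17, 50))
```

With 18 cases split evenly over three levels, `freq_rows` gives a count of 6 and 33.33 percent for each level, the same pattern as the civil_status rows.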