Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licenced with CC-BY. Find out more here.


colours-to-use-for-maps

Purpose

Selected subset of all colour palettes

  • This is the agreed set of colour palettes to be used for maps and figures
  • They are all 1) colour-blind safe, 2) computer-screen safe and 3) printer safe
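RColorBrewer also ships a lookup table, brewer.pal.info, which in recent versions of the package includes a colorblind column holding ColorBrewer's own colour-blind-safe flag. The sketch below pulls out the rows for the selected palettes so the claim above can be checked against it. As far as I know the qualitative Set1 and Set3 are not flagged as colour-blind safe, so they are the ones to look at closely.

```r
library(RColorBrewer)

# The palettes selected for maps and figures in this post
selected <- c('YlGn', 'RdPu', 'PuRd', 'BrBG', 'RdBu', 'RdYlBu', 'Set3', 'Set1')

# brewer.pal.info has one row per palette, with columns
# maxcolors, category and (in recent package versions) colorblind
info <- brewer.pal.info[selected, ]
print(info)

# Selected palettes that ColorBrewer does not flag as colour-blind safe
not_cb_safe <- rownames(info)[!info$colorblind]
print(not_cb_safe)
```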
### Load the package or install if not present
if (!require("RColorBrewer")) {
install.packages("RColorBrewer")
library(RColorBrewer)
}
## Loading required package: RColorBrewer
par(mfrow = c(3,3))
for(col_i in c('YlGn','RdPu', 'PuRd', 'BrBG', 'RdBu', 'RdYlBu', 'Set3', 'Set1')){
  display.brewer.pal(n = 5, name = col_i)
}

  • A machine-readable way to specify a colour is as a hexadecimal triplet.
  • Here is how the selected RColorBrewer palettes, including RdYlBu, are actually stored:
for(col_i in c('YlGn','RdPu', 'PuRd', 'BrBG', 'RdBu', 'RdYlBu', 'Set3', 'Set1')){
  print(col_i)
  print(brewer.pal(n = 5, name = col_i))
}
## [1] "YlGn"
## [1] "#FFFFCC" "#C2E699" "#78C679" "#31A354" "#006837"
## [1] "RdPu"
## [1] "#FEEBE2" "#FBB4B9" "#F768A1" "#C51B8A" "#7A0177"
## [1] "PuRd"
## [1] "#F1EEF6" "#D7B5D8" "#DF65B0" "#DD1C77" "#980043"
## [1] "BrBG"
## [1] "#A6611A" "#DFC27D" "#F5F5F5" "#80CDC1" "#018571"
## [1] "RdBu"
## [1] "#CA0020" "#F4A582" "#F7F7F7" "#92C5DE" "#0571B0"
## [1] "RdYlBu"
## [1] "#D7191C" "#FDAE61" "#FFFFBF" "#ABD9E9" "#2C7BB6"
## [1] "Set3"
## [1] "#8DD3C7" "#FFFFB3" "#BEBADA" "#FB8072" "#80B1D3"
## [1] "Set1"
## [1] "#E41A1C" "#377EB8" "#4DAF4A" "#984EA3" "#FF7F00"
# or for more levels
brewer.pal(n = 10, name = "RdYlBu")
##  [1] "#A50026" "#D73027" "#F46D43" "#FDAE61" "#FEE090" "#E0F3F8" "#ABD9E9"
##  [8] "#74ADD1" "#4575B4" "#313695"
The leading # is just there by convention. Parse the hexadecimal string like so: #rrggbb, where rr, gg and bb refer to colour intensity in the red, green and blue channels, respectively. Each is specified as a two-digit base 16 number, which is the meaning of "hexadecimal" (or "hex" for short). Base 16 digits run from 0 to 9 and then A to F, with A to F standing for 10 to 15 in base 10, so each two-digit pair ranges from 00 (0) to FF (255); for example, D7 is 13 × 16 + 7 = 215.
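To see the rr, gg, bb decoding in practice, here is a small sketch using base R only: col2rgb() from grDevices parses a colour string into its channel intensities, strtoi() does the base 16 conversion by hand, and rgb() goes back the other way. The example colour is the first entry of brewer.pal(5, "RdYlBu") printed above.

```r
# Example colour: first entry of the 5-class RdYlBu palette
hex <- "#D7191C"

# col2rgb() parses the string; returns a 3 x 1 matrix (red, green, blue)
rgb_base <- col2rgb(hex)
print(rgb_base)

# The same by hand: split off the rr, gg and bb pairs and read each as base 16
pairs <- substring(hex, c(2, 4, 6), c(3, 5, 7))  # "D7" "19" "1C"
rgb_manual <- strtoi(pairs, base = 16L)          # 215 25 28
print(rgb_manual)

# And back again: rgb() on the 0-255 scale rebuilds the hex string
hex_again <- rgb(rgb_manual[1], rgb_manual[2], rgb_manual[3], maxColorValue = 255)
print(hex_again)
```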

All colour palettes

### Show all the colour schemes available
par(cex = .6)
display.brewer.all()

### Set the display to a 2 by 2 grid
par(mfrow=c(2,2))


### Generate random data matrix
rand.data <- replicate(8,rnorm(100,100,sd=1.5))

### Draw a box plot, with each box coloured by the 'Set3' palette
boxplot(rand.data,col=brewer.pal(8,"Set3"))

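### Draw plot of counts coloured by the 'Set3' palette
br.range <- seq(min(rand.data), max(rand.data), length.out = 10)
# Histogram counts per bin (9 bins) for each of the 8 columns
results <- sapply(1:ncol(rand.data), function(x) hist(rand.data[, x], plot = FALSE, breaks = br.range)$counts)
# Use the bin midpoints as x so the axis shows data values rather than bin index
mids <- (head(br.range, -1) + tail(br.range, -1)) / 2
plot(x = range(mids), y = range(results), type = "n", xlab = "Value", ylab = "Counts")
cols <- brewer.pal(8, "Set3")
invisible(lapply(1:ncol(results), function(x) lines(mids, results[, x], col = cols[x], lwd = 3)))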

### Draw a bar chart
table.data <- table(round(rand.data))
cols <- colorRampPalette(brewer.pal(8,"Dark2"))(length(table.data))
barplot(table.data,col=cols)
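brewer.pal() only returns up to a palette's maximum number of colours (the maxcolors column of brewer.pal.info, 11 for RdYlBu); as far as I know, asking for more gives a warning and the maximum instead. The bar chart above gets round this with colorRampPalette(), which interpolates between the palette's colours. A minimal sketch of the same idea for a diverging palette, with 15 as an arbitrary example number of classes:

```r
library(RColorBrewer)

n_needed <- 15        # arbitrary example: more classes than RdYlBu offers
pal_name <- "RdYlBu"
max_n <- brewer.pal.info[pal_name, "maxcolors"]  # largest n brewer.pal() allows

# Take the full palette, then interpolate it out to n_needed colours
ramp <- colorRampPalette(brewer.pal(max_n, pal_name))
cols15 <- ramp(n_needed)
print(cols15)

# Quick visual check of the interpolated ramp
barplot(rep(1, n_needed), col = cols15, border = NA, axes = FALSE)
```

The interpolated colours keep the palette's start and end points, but the in-between shades are blends rather than ColorBrewer-designed steps, so neighbouring classes can become hard to tell apart.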

Other reference material

Posted in  exploratory data analysis


Australian Postal Areas In Geographical Vs Projected Coordinates


# name: poa06-area-lambert
setwd("~/projects/POA_centroids/POA2006_centroids")
library(swishdbtools)
ch <- connect2postgres2("delphe")

fout_geo=dbGetQuery(ch,
'select poa_2006,
  st_area(st_transform(the_geom, 3112))/1000000 as Geoscience_Australia_Lambert_area_km2,
st_x(st_centroid(st_transform(the_geom,3112))) as geocentx,
st_y(st_centroid(st_transform(the_geom,3112))) as geocenty
from abs_poa.auspoa06')
str(fout_geo)
sum(fout_geo$geoscience_australia_lambert_area_km2)
write.table(fout_geo,'data_derived/auspoa06_geocentroids_lambert_20160624.csv',
            row.names=F, sep=',')

plot(fout_geo[,3:4])
head(fout_geo)
nrow(fout_geo)
## [1] 2507
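The query above measures areas after projecting to EPSG:3112 (GDA94 / Geoscience Australia Lambert). As far as I know that is a Lambert conformal conic projection, which preserves shape rather than area, so its areas carry some scale distortion. A rough way to gauge the size of that effect without the database is to compare a made-up 1 x 1 degree square near Canberra in geographic coordinates (EPSG:4283, GDA94), in the Lambert projection, and in the equal-area GDA94 / Australian Albers (EPSG:3577). This sketch uses the sf package, which the original post does not use, and assumes it is installed with a working PROJ database.

```r
library(sf)

# Hypothetical test polygon: a 1 x 1 degree square near Canberra, GDA94 lon/lat
ring <- rbind(c(149, -36), c(150, -36), c(150, -35), c(149, -35), c(149, -36))
sq <- st_sfc(st_polygon(list(ring)), crs = 4283)

# st_area() on geographic coordinates returns an area on the sphere or
# ellipsoid (which one depends on sf's s2 setting); convert m^2 to km^2
area_geo <- as.numeric(st_area(sq)) / 1e6

# Areas after projecting, as in the SQL above
area_lambert <- as.numeric(st_area(st_transform(sq, 3112))) / 1e6
area_albers  <- as.numeric(st_area(st_transform(sq, 3577))) / 1e6

areas_km2 <- c(geographic = area_geo, lambert_3112 = area_lambert,
               albers_3577 = area_albers)
print(round(areas_km2, 1))

# Percentage difference from the geographic area
print(round(100 * (areas_km2 / area_geo - 1), 2))
```

Near the 36 degree south standard parallel the Lambert distortion should be small; it grows towards the middle of the continent, which is where an equal-area projection is the safer choice for area sums.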

Posted in  spatial


spatial-lag-and-timeseries-model-with-nmmaps-UPDATE

Posted in  spatial dependence


Judging the evidence part 2

I previously reported on a lecture slide deck called 'Judging the Evidence' by Adrian Sleigh for the course PUBH7001 Introduction to Epidemiology, April 30, 2001: http://ivanhanigan.github.com/2015/11/judging-the-evidence-using-a-literature-review-database

I have also now extracted several slides into a template outline for reviewing epidemiological and other research.

Adrian Sleigh's Protocol

Object of Study, Hypotheses or Research Questions
  1. Purpose of Study: Objectives of study; why was it done?
  2. Reference Population:
    • To whom do authors generalize results?
    • To whom should the findings be generalized?
Sampling

From the Reference Pop (target population) ->

Source Pop -> Eligible population

The source population may be defined directly, as a matter of defining its membership criteria; or the definition may be indirect, as the catchment population of a defined way of identifying cases of the illness. The catchment population is, at any given time, the totality of those in the ‘were-would’ state of: were the illness now to occur, it would be ‘caught’ by that case identification scheme. Source: Miettinen OS, 2007, http://www.teachepi.org/documents/courses/fundamentals/Pai_Lecture6_Selection%20bias.pdf

Sample Pop:
  1. Refusals, Dropouts
  2. Participants -> Study Pop
Design of study
  1. Study setting: Where and when was the study done? What were the circumstances? Ethics?
  2. Type of study: Experimental vs natural, descriptive vs analytical (trial, cohort, case-control, prevalence, ecological, case-report, etc). If case-control or cohort, was the timing of data collection retrospective or prospective?
  3. Subjects: Who (number, age, sex, etc.)? How were they selected?
  4. Comparison groups: What control group or standard of comparison? How appropriate?
  5. Study size: Was the sample size adequate to give you confidence in the finding of "no association"?
  6. Bias and Confounding
    • a) Selection bias: Were groups comparable for subjects who entered and stayed in the study? Was selection influenced by the exposure (case-control) or the effect (cohort) under study? Drop-outs?
    • b) Confounding: Control of potential confounding variables in design of the study - matching or subject restriction?
Observations
  1. Procedure: How are the variables in the study defined and measured, i.e. how were the data collected?
  2. Definition of terms: Are definitions of diagnostic criteria, measurements and outcome unambiguous? Could they be reproduced?
  3. Bias and Confounding
    • a) Observation bias: Were study groups comparable for measurements or mode of observation? Mis-classification in determining exposure or disease categories? Differential between groups, or 'random'?
    • b) Confounding: Information recorded on variables that could confound the association under study (to permit adjustment in the analysis)?

THANKS Prof Sleigh!

Posted in  bibliometrics and literature reviewing


R base graphics are fine except barplot

I agree with Jeff Leek that once you have spent time learning base graphics in R there is less incentive to learn ggplot2: http://simplystatistics.org/2016/02/11/why-i-dont-use-ggplot2/

However, I have always hated the way barplot works. Here is an example:

# strip.white = TRUE drops the padding spaces around each value
qc <- read.csv(textConnection("id,  OnlinePaper, Q, freq, totals,       prop
1,      Online,         ,1768,   9950, 0.17768844
2,      Online,      No ,4022,   9950, 0.40422111
3,      Online,     Yes ,4160,   9950, 0.41809045
4,       Paper,         , 256,   3355, 0.07630402
5,       Paper,      No , 979,   3355, 0.29180328
6,       Paper,     Yes ,2120,   3355, 0.63189270"), strip.white = TRUE)

# Reshape to a matrix: one row per OnlinePaper group, one column per answer to Q
qc1 <- xtabs(prop ~ OnlinePaper + Q, data = qc)
qc1
barplot(qc1, beside = TRUE, legend.text = rownames(qc1), ylim = c(0, 1))

/images/barplot_base.png

library(ggplot2)
ggplot(data = qc, aes(x = Q, y = prop, fill = OnlinePaper)) +
    geom_bar(stat = "identity", position = position_dodge())

/images/barplot_gg.png

Going to extremes

I should say, though, that I have found barplot can produce highly customised graphs that serve a specific purpose, such as the one below (I have de-identified the content as this is unpublished research).

/images/barplot-gonuts.png

This made heavy use of the following approach:

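# original by Joseph Guillaume 2009
# Bar plot of the matrix aggAllData (one bar per column, rows stacked in
# shades of grey) with the column names written at 45 degrees under the bars.
# Extra arguments (e.g. width) are passed on to barplot().
SideBySideBarPlot2 <- function(aggAllData, ...) {
  par(mar = c(8, 7, 4, 2))  # large bottom margin to fit the rotated labels
  bp <- barplot(aggAllData,
                horiz = FALSE,
                col = gray.colors(nrow(aggAllData)),
                las = 1, axisnames = FALSE, ...)  # suppress the default labels
  labels <- names(as.data.frame(aggAllData))
  # write labels at the bottom edge of the plot region; xpd = TRUE lets them
  # spill into the margin
  text(bp, par('usr')[3], labels = labels, srt = 45,
       adj = c(1.1, 1.1), xpd = TRUE, cex = .9)
  return(bp)
}
# called with width = xvar (proportions) to set the bar widths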
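The comment above refers to passing proportions as bar widths through the ... argument. A standalone sketch of the same rotated-label approach with variable bar widths follows; the matrix m and the widths props are made-up illustrative values, not the unpublished data behind the figure.

```r
# Hypothetical data: 3 outcome categories (rows) for 4 groups (columns)
m <- matrix(c(5, 3, 2,  4, 4, 2,  1, 6, 3,  2, 2, 6), nrow = 3,
            dimnames = list(c("Low", "Mid", "High"),
                            c("Group A", "Group B", "Group C", "Group D")))
# Hypothetical share of the sample in each group, used as bar widths
props <- c(0.4, 0.3, 0.2, 0.1)

par(mar = c(8, 7, 4, 2))  # room for the rotated labels
bp <- barplot(m, width = props, col = gray.colors(nrow(m)),
              las = 1, axisnames = FALSE)
# bp holds the bar midpoints, so the labels follow the variable widths
text(bp, par("usr")[3], labels = colnames(m), srt = 45,
     adj = c(1.1, 1.1), xpd = TRUE, cex = .9)
```

Because barplot() returns the midpoints it actually used, the labels stay centred under each bar however the widths are set, which is what makes the text() step robust to width.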

Posted in  exploratory data analysis