Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed with CC-BY. Find out more here.


Australian Postal Areas In Geographical Vs Projected Coordinates


# name: poa06-area-lambert
setwd("~/projects/POA_centroids/POA2006_centroids")
library(swishdbtools)
ch <- connect2postgres2("delphe")

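# EPSG:3112 is GDA94 / Geoscience Australia Lambert, a projected CRS in metres,
# so st_area returns square metres; dividing by 1e6 gives km2.
fout_geo <- dbGetQuery(ch,
'select poa_2006,
  st_area(st_transform(the_geom, 3112))/1000000 as geoscience_australia_lambert_area_km2,
  st_x(st_centroid(st_transform(the_geom, 3112))) as geocentx,
  st_y(st_centroid(st_transform(the_geom, 3112))) as geocenty
from abs_poa.auspoa06')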
str(fout_geo)
sum(fout_geo$geoscience_australia_lambert_area_km2)
write.table(fout_geo, 'data_derived/auspoa06_geocentroids_lambert_20160624.csv',
            row.names = FALSE, sep = ',')

plot(fout_geo[,3:4])
head(fout_geo)
nrow(fout_geo)
# 2507
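The reason for transforming to a projected CRS before calling st_area can be illustrated with a rough base R sketch. It is not part of the original analysis: it assumes a spherical Earth with a radius of 6371 km, and the helper names deg_length_km and cell_area_km2 and the approximate city latitudes are made up for this illustration. It shows that a one degree by one degree cell covers a different ground area at different Australian latitudes, so areas computed directly in degrees are not comparable between postal areas.

```r
# Sketch only: assumes a spherical Earth of radius 6371 km.
deg_length_km <- function(lat_deg, radius_km = 6371) {
  one_deg_km <- 2 * pi * radius_km / 360   # length of 1 degree of latitude
  c(lon_km = one_deg_km * cos(lat_deg * pi / 180),  # 1 degree of longitude shrinks with latitude
    lat_km = one_deg_km)
}

# Approximate ground area of a 1 x 1 degree cell centred at a given latitude
cell_area_km2 <- function(lat_deg) {
  d <- deg_length_km(lat_deg)
  unname(d["lon_km"] * d["lat_km"])
}

# Approximate latitudes, for illustration only
lats <- c(Darwin = -12.5, Canberra = -35.3, Hobart = -42.9)
round(sapply(lats, cell_area_km2))
```

The cell near Darwin comes out noticeably larger than the one near Hobart, which is why the query above works in the Lambert projection (metres) rather than in geographic coordinates.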

Posted in  spatial


spatial-lag-and-timeseries-model-with-nmmaps-UPDATE

Posted in  spatial dependence


Judging the evidence part 2

I previously reported on a lecture slide deck called ‘Judging the Evidence’ by Adrian Sleigh for the course PUBH7001 Introduction to Epidemiology (April 30, 2001): http://ivanhanigan.github.com/2015/11/judging-the-evidence-using-a-literature-review-database

I have also now extracted several slides into a template outline for reviewing epidemiological and other research.

Adrian Sleigh’s Protocol

Object of Study, Hypotheses or Research Questions
  1. Purpose of Study: Objectives of study; why was it done?
  2. Reference Population:
    • To whom do authors generalize results?
    • To whom should the findings be generalized?
Sampling

From the Reference Pop (target population) ->

Source Pop -> Eligible population

“The source population may be defined directly, as a matter of defining its membership criteria; or the definition may be indirect, as the catchment population of a defined way of identifying cases of the illness. The catchment population is, at any given time, the totality of those in the ‘were-would’ state of: were the illness now to occur, it would be ‘caught’ by that case identification scheme.”
Source: Miettinen OS, 2007 http://www.teachepi.org/documents/courses/fundamentals/Pai_Lecture6_Selection%20bias.pdf

Sample Pop:
  1. Refusals, Dropouts
  2. Participants -> Study Pop
Design of study
  1. Study setting: Where and when was the study done? What were the circumstances? Ethics?
  2. Type of study: Experimental vs natural, descriptive vs analytical (trial, cohort, case-control, prevalence, ecological, case-report, etc). If case-control or cohort, was the timing of data collection retrospective or prospective?
  3. Subjects: Who (number, age, sex, etc.)? How were they selected?
  4. Comparison groups: What control group or standard of comparison? How appropriate?
  5. Study size: Was the sample size adequate to give you confidence in the finding of “no association”?
  6. Bias and Confounding
    • a) Selection bias: Were groups comparable for subjects who entered and stayed in study? Selection influenced by exposure (case-control) or effect (cohort) under study? Drop-outs?
    • b) Confounding: Control of potential confounding variables in design of the study - matching or subject restriction?
Observations
  1. Procedure: How are the variables in the study defined and measured, i.e. how were data collected?
  2. Definition of terms: Are definitions of diagnostic criteria, measurements and outcome unambiguous? Could they be reproduced?
  3. Bias and Confounding
    • a) Observation bias: Were study groups comparable for measurements or mode of observation? Mis-classification in determining exposure or disease categories? Differential between groups, or ‘random’?
    • b) Confounding: Information recorded on variables that could confound the association under study (to permit adjustment in the analysis)?

THANKS Prof Sleigh!

Posted in  bibliometrics and literature reviewing


R base graphics are fine except barplot

I concur with Jeff Leek that once you have spent time learning base graphics in R there is less incentive to learn ggplot2 http://simplystatistics.org/2016/02/11/why-i-dont-use-ggplot2/

However, I have always hated the way barplot works. Here is an example:

qc <- read.csv(textConnection("id,  OnlinePaper, Q, freq, totals,       prop
1,      Online,         ,1768,   9950, 0.17768844
2,      Online,      No ,4022,   9950, 0.40422111
3,      Online,     Yes ,4160,   9950, 0.41809045
4,       Paper,         , 256,   3355, 0.07630402
5,       Paper,      No , 979,   3355, 0.29180328
6,       Paper,     Yes ,2120,   3355, 0.63189270"), strip.white = TRUE)

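library(reshape)  # provides cast()
# reshape to one row per OnlinePaper group and one column per answer in Q
qc1 <- cast(qc, OnlinePaper ~ Q, value = "prop")
qc1
barplot(as.matrix(qc1), beside = TRUE, legend.text = qc1[,1], ylim = c(0,1))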

/images/barplot_base.png

library(ggplot2)
# ggplot2 works on the long-format data directly, no reshaping needed
ggplot(data=qc, aes(x=Q, y=prop, fill=OnlinePaper)) +
    geom_bar(stat="identity", position=position_dodge())

/images/barplot_gg.png
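For comparison, here is a minimal base-only sketch of the same grouped barplot that avoids the reshape step by building a plain numeric matrix with tapply. This is an illustration, not the code that produced the figures above. It assumes the same six proportions as the qc data, typed in directly so the block runs on its own, and it relabels the blank answer category as "(blank)" purely for readability.

```r
# Same proportions as the qc data above, typed in so the sketch is self-contained.
# Assumption: the blank answer category is relabelled "(blank)" for readability.
qc <- data.frame(
  OnlinePaper = rep(c("Online", "Paper"), each = 3),
  Q           = rep(c("(blank)", "No", "Yes"), times = 2),
  prop        = c(0.17768844, 0.40422111, 0.41809045,
                  0.07630402, 0.29180328, 0.63189270)
)

# tapply returns a plain numeric matrix: rows = OnlinePaper, columns = Q
m <- with(qc, tapply(prop, list(OnlinePaper, Q), sum))

# With beside = TRUE each column (answer) becomes a group of bars, one bar per row
barplot(m, beside = TRUE, legend.text = rownames(m), ylim = c(0, 1))
```

The point of the sketch is that barplot wants a wide matrix whose rows become the bars within each group, whereas ggplot2 takes the long table as it is.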

Going to extremes

I should say, though, that I have found barplot can produce very customised graphs that serve a specific purpose, such as the one below (I have de-identified the content as this is unpublished research).

/images/barplot-gonuts.png

This made heavy use of the following approach:

# original by Joseph Guillaume 2009

SideBySideBarPlot2 <- function(aggAllData, ...) {
  par(mar=c(8,7,4,2))
  bp<-barplot(aggAllData,
              horiz=FALSE,
              col=gray.colors(nrow(aggAllData)),
              las=1, axisnames = FALSE, ...)
  labels <- names(as.data.frame(aggAllData))
  text(bp, par('usr')[3], labels = labels, srt = 45, 
       adj = c(1.1,1.1), xpd = TRUE, cex=.9)
  return(bp)
}
# with width = xvar (proportions)

Posted in  exploratory data analysis


r-syntax-highlights-for-my-jekyll-powered-blog.md

Syntax Highlights

Until today I had no idea how to make code look pretty in my blog posts, which go to GitHub after first being rendered locally so that I can get the categories and tags.

Because GitHub disables any plugins when it processes your blog, I took Charlie Park’s advice. http://charliepark.org/jekyll-with-plugins/

This blog post solved it for me http://tuxette.nathalievilla.org/?p=1574

The trick is to add the line highlighter: pygments to _config.yml and then wrap the code in highlight tags:

% highlight r % # with curly braces
data("iris")
plot(iris$Sepal.Length ~ iris$Sepal.Width)
dat <- rnorm(1000,1,2)
% endhighlight % # with curly braces

Will render as:

data("iris")
plot(iris$Sepal.Length ~ iris$Sepal.Width)
dat <- rnorm(1000,1,2)

However, I also pushed this to another site that I do build with gh-pages, and GitHub sent me an email complaining:

You are attempting to use the 'pygments' highlighter, 
which is currently unsupported on GitHub Pages. 
Your site will use 'rouge' for highlighting instead. 
To suppress this warning, change the 'highlighter' value to 
'rouge' in your '_config.yml'. 

So there.

Posted in