Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licensed under CC-BY. Find out more here.

ONS-SCD.png

r-eml-pgis-setting-up-the-software-stack

r-eml

  • This is the R package for authoring EML (Ecological Metadata Language)

pgis

  • This is the PostGIS spatial database extension for PostgreSQL

System Description

  • Ubuntu 12.04 LTS
  • Mac Pro tower

EML is a useful beast

setting-up-the-software-stack

  • First I needed devtools, which required the Linux system dependencies for RCurl

Code

# note: libcurl and libcurl-devel are RPM-style names and do not exist on Ubuntu
sudo apt-get install libcurl4-gnutls-dev

  • Now trying to install EML

Code

require(devtools)
install_github("EML", "ropensci")

  • then the XML package's system dependencies

Code

sudo apt-get install libcurl4-openssl-dev libxml2-dev
sudo apt-get install r-cran-xml

  • rJava can be difficult. I was really dreading this, but eventually Stack Overflow came to the rescue!

Code

sudo apt-get install openjdk-7-*
sudo apt-get install r-cran-rjava
update-java-alternatives -l
sudo update-java-alternatives -s java-1.7.0-openjdk-amd64
sudo R CMD javareconf
# now launch R and reinstall rJava from source against the reconfigured Java
R
install.packages("rJava")
library(rJava)
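
To check that R is now talking to the intended JVM, a quick test (a minimal sketch; .jinit() starts the JVM from the R session):

Code:

library(rJava)
.jinit()  # initialise the JVM
# ask Java for its version; should report 1.7.x
.jcall("java/lang/System", "S", "getProperty", "java.version")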

  • the install then complained about missing ‘RHTMLForms’ and ‘RWordXML’

Code:

install_github("RHTMLForms", "omegahat")
install.packages("RWordXML", repos="http://www.omegahat.org/R", type="source")

  • on we go

Code

install.packages(c("knitr", "rfigshare", "testthat", "RCurl", "dataone", "rrdf"))

RGDAL

  • Spatial toolbox

Code

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install libgdal1-dev libproj-dev
gdal-config --version
sudo R
install.packages("rgdal")

PostgreSQL 9.3

Code:

sudo apt-get update
sudo apt-get -y install python-software-properties
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main" >> /etc/apt/sources.list.d/postgresql.list'
sudo apt-get update
sudo apt-get install postgresql-9.3 pgadmin3
sudo -u postgres psql
create role user_name login password 'passwd';
show hba_file ;
# /etc/postgresql/9.3/main/pg_hba.conf
\q
sudo nano /etc/postgresql/9.3/main/pg_hba.conf
# add entries for the client addresses/users that are allowed to connect
sudo nano /etc/postgresql/9.3/main/postgresql.conf
# set listen_addresses = '*'
sudo service postgresql restart
sudo ufw allow from my.ip.address.0/24 to any port 5432
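
Once the server is listening, remote access can be tested from R, for example with RPostgreSQL (a minimal sketch; the host address is a placeholder for your server's IP):

Code:

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
# user_name/passwd as created for the role above; host is hypothetical
con <- dbConnect(drv, dbname = "postgres", host = "my.server.ip",
                 user = "user_name", password = "passwd")
dbGetQuery(con, "SELECT version();")
dbDisconnect(con)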

PostGIS 2.1

Code:

sudo apt-get update
sudo apt-get install postgresql-9.3-postgis-2.1 -f
sudo su
su - postgres
createdb postgis_ltern
psql -d postgis_ltern -U postgres  
CREATE EXTENSION postgis;  
CREATE EXTENSION postgis_topology;  
CREATE ROLE public_group;
CREATE ROLE ivan_hanigan LOGIN PASSWORD 'password';
GRANT public_group TO ivan_hanigan;
GRANT USAGE ON SCHEMA public TO public_group;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO public_group;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO public_group;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO public_group;
GRANT SELECT ON TABLE geometry_columns TO public_group;
GRANT SELECT ON TABLE spatial_ref_sys TO public_group;
GRANT SELECT ON TABLE geography_columns TO public_group;
GRANT SELECT ON TABLE raster_columns TO public_group;
GRANT SELECT ON TABLE raster_overviews TO public_group;
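
To confirm the extensions are live, query the PostGIS version from R (a minimal sketch reusing RPostgreSQL as above):

Code:

library(RPostgreSQL)
con <- dbConnect(dbDriver("PostgreSQL"), dbname = "postgis_ltern",
                 user = "ivan_hanigan", password = "password")
dbGetQuery(con, "SELECT postgis_full_version();")
dbDisconnect(con)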

For transforming AGD 66 to GDA94

  • A special transformation grid file needs to be added to the PROJ.4 support files in order to reproject between the Australian datums AGD66 and GDA94.
  • Thanks to Joe Guillaume and Francis Markham for providing this solution.

Code: transformations grid for Australian projections

cd /usr/share/proj
wget http://www.icsm.gov.au/gda/gdatm/national66.zip  
unzip national66.zip
mv "A66 National (13.09.01).gsb" aust_national_agd66_13.09.01.gsb
su - postgres 
psql -d mydb

UPDATE spatial_ref_sys SET
proj4text='+proj=longlat +ellps=aust_SA +nadgrids=aust_national_agd66_13.09.01.gsb +wktext'
where srid=4202;
\q
exit
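
With the grid in /usr/share/proj, R's rgdal (which also uses PROJ.4) can apply the same datum shift; a hedged sketch with made-up coordinates:

Code:

library(sp)
library(rgdal)
# AGD66 with the national transformation grid installed above
agd66 <- CRS("+proj=longlat +ellps=aust_SA +nadgrids=aust_national_agd66_13.09.01.gsb +wktext")
pt <- SpatialPoints(cbind(149.1, -35.3), proj4string = agd66)
# EPSG:4283 is GDA94; the coordinates should shift by roughly 200 m
spTransform(pt, CRS("+init=epsg:4283"))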

make a spatial dataset

Code

require(devtools)
install_github("swishdbtools", "swish-climate-impact-assessment")
require(swishdbtools)
require(ggmap)
 
ch  <- connect2postgres2("postgis_ltern")
 
location_names <- c("linnaeus way acton canberra act", "biology place acton canberra act")
locations <- geocode(location_names)
locations <- cbind(location_names, locations)
 
dbWriteTable(ch, "anu_gisforum_locations", locations, row.names = F)
sql <- points2geom("public", "anu_gisforum_locations", col_lat = "lat", col_long = "lon")
cat(sql)
dbSendQuery(ch,
            sql
            )
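
A quick read-back to confirm the geometry was created; note the geom column name here is hypothetical, so check the SQL printed by cat(sql) for the actual name:

Code:

dbGetQuery(ch,
           "SELECT location_names, ST_AsText(geom)
            FROM anu_gisforum_locations;")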

This can now be shown in Quantum GIS

images/gisforum_locations.png

TODO:

  • add EML for this using R (a rough sketch below)
  • finesse with Morpho (requires further Java shenanigans)
  • publish something to KNB using R dataone package
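
For the first TODO item, a minimal sketch of authoring EML in R; this assumes the present-day interface of the ropensci EML package (list-based construction with write_eml and eml_validate), which differs from the API at the time of this post:

Code:

library(EML)
me <- list(individualName = list(givenName = "Ivan", surName = "Hanigan"))
doc <- list(dataset = list(
  title = "ANU GIS forum example locations",
  creator = me,
  contact = me
))
write_eml(doc, "anu_gisforum_locations.xml")
eml_validate("anu_gisforum_locations.xml")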

Posted in  disentangle


data-dictionary-function-needed-further-upgrades-to-my-misc-package

  • I’ve been enjoying summarising datasets with my data_dictionary function
  • There was a bug when a date variable had any missing values.
  • I have modified the function to cope with these, so have released version 1.2 (a quick check is sketched below)
  • The Windows version is downloadable here
  • Linux and Mac users can just run this R code

R Code:

require(devtools)
install_github("disentangle", "ivanhanigan")

Posted in  disentangle


data-dictionary-function-needed-upgrade-to-my-misc-package

  • Recently I got a lot of files with columns that are all missing. I had to modify the ‘data_dictionary’ function to cope with these, so have released version 1.1
  • The Windows version is downloadable here
  • Linux and Mac users can just run this R code

R Code:

require(devtools)
install_github("disentangle", "ivanhanigan")

  • In my ‘disentangle’ package of miscellaneous tools, the data_dictionary function is designed to produce descriptive summary statistics in a style familiar from the FREQUENCIES command in SPSS
  • I chose to emulate the SPSS output because I think it is a pretty decent summary statistics table, and loads of users have already been introduced to this style of output
  • I wrote it because I had not got exactly what I wanted from the reporttools or stargazer packages, which offer similar summary stats functions
  • I chose to call it data_dictionary because I think the alternatives (like ‘frequencise’ or ‘code_book’) are not as immediately intuitive about what you get
  • the function returns a data.frame with the variable name, a simplified type (character, number or date) and the summary attributes, values, counts and percentages

Code:

# functions
require(devtools)
install_github("disentangle", "ivanhanigan")
require(disentangle)
require(xtable)

# load
fpath <- system.file("extdata/civst_gend_sector.csv", package = "disentangle")
fpath
civst_gend_sector <- read.csv(fpath)
civst_gend_sector$datevar <- as.Date(round(rnorm(nrow(civst_gend_sector), Sys.Date(),10)), origin = "1970-01-01")
civst_gend_sector$missing_blahblah_variable  <- NA

# check
str(civst_gend_sector)

#do
data_dict(civst_gend_sector, "civil_status")
data_dict(civst_gend_sector, "datevar")
data_dict(civst_gend_sector, "number_of_cases")
data_dict(civst_gend_sector, "missing_blahblah_variable")

dataDictionary <- data_dictionary(civst_gend_sector,
                                  show_levels = -1)

print(xtable(dataDictionary), type = "html", include.rownames = F)

Variable                   Type       Attributes  Value             Count  Percent
civil_status               character              divorced/widowed  6      33.33
                                                  married           6      33.33
                                                  single            6      33.33
gender                     character              female            9      50.00
                                                  male              9      50.00
activity_sector            character              primary           6      33.33
                                                  secondary         6      33.33
                                                  tertiary          6      33.33
number_of_cases            number     Min.        0
                                      1st Qu.     5
                                      Median      9
                                      Mean        15.17
                                      3rd Qu.     17
                                      Max.        50
datevar                    date       Min.        2014-05-26
                                      1st Qu.     2014-06-07
                                      Median      2014-06-15
                                      Mean        2014-06-14
                                      3rd Qu.     2014-06-18
                                      Max.        2014-07-05
missing_blahblah_variable  missing    NA's                          18     100.00

Posted in  disentangle


reflecting-on-aekos-data-portal-test

  • In my last post on the AEKOS Data Portal I reported my test of the Australian Terrestrial Ecosystem Research Network (TERN) AEKOS Data Portal system implemented by the University of Adelaide’s Eco-informatics Facility.
  • I’ve reflected a bit on this process and I want to jot down some notes

Database restore scripts are complicated and inhibit access

  • I’d reiterate my conclusion that I really like how this model emphasises the use of databases for data management (and postgres or mysql are great options)
  • but for a less technically savvy person, or for a casual browser just looking for some quick data to play with, this is not easy enough
  • for a description of alternative approaches see http://flowingdata.com/2014/06/10/how-to-make-government-data-sites-better/, a good set of requests for simple and efficient access to data via government portals in the USA
  • I spoke with Squid and Matt, the developers, and they described the enhancement they are working on now to provide a simple flat-file download, which will be great

Linking Exploratory Data Analyses back to metadata and source data documentation

  • In my normal mode of operating I spend more time reviewing the metadata and source data documentation before exploring data (much like the workflow described at http://simplystatistics.org/2014/06/13/what-i-do-when-i-get-a-new-data-set-as-told-through-tweets/, especially see Step 1: Learn about the elephant)
  • but this time I just wanted to get some data out and so went full steam ahead like a bull at a gate
  • Once I had my map visualisation done I decided to go back to the documentation to read up a bit more on why these locations were studied and other contextual metadata
  • But when I looked in the zip I downloaded, I investigated citation.pdf and tabledefinitions.pdf, but these don’t really have much in them to link back to the source of the data and the documentation I need
  • the citation just links back to http://www.portal.aekos.org.au/
  • so, hunting around the portal again, I searched for the survey name “Department of Environment, Water and Natural Resources, South Australia - Dalhousie Survey (Scientific Expedition Group) Survey”
  • and got 10 hits, chose number 1
  • skim-read the overview and scope; the link out from the overview/abstract seems the most useful way to get an overview
  • moved on to methods, and I really like the way the segments collapse and expand as you read them, but didn’t like how I had to keep navigating back and forward with the browser (like between Study Location Selection Method and Study Location Visit Method; each time I went back I had to scroll to the bottom of the page to get to the list of sections)
  • Finally I got to explore the observation diagram and methods diagram which I have not seen before. These seem like a great way to show the information and let people browse through the concepts etc…
  • But I think I still prefer an old school document with a table of contents and some kind of index.

aekos_obs_diagram.png

Posted in  Data Documentation


my-feed-filesize-is-larger-than-512k-need-to-reduce-its-size-in-order-for-feedburner-to-process-it

  • I am not sure but I suspect this post caused my feedburner xml to become too big and stop the updates to my feedly account
  • Found this out by logging in to http://feedburner.google.com and trying to re-sync my feed (under troubleshootize)
  • I may have tipped the size over the edge with this post
  • the issue is probably due to me starting to use knitr (see this post) to produce the HTML reports, which encodes the images inline, whereas I used to use the old-school orgmode approach of referencing the png in the images directory
  • looking at the feed/index.xml I saw that this includes every post I ever wrote
  • I fixed it by going back to the instructions I used to set up the feed
  • and adding limit:10 to the loop in the feed template, i.e. changing ‘{% for post in site.posts %}’ to ‘{% for post in site.posts limit:10 %}’

Posted in  Data Documentation