AEKOS Test 1
Introduction
- This is a report of the first attempt to play with the new Australian Terrestrial Ecosystem Research Network's data portal 'AEKOS'
- I have to admit at the outset I am not that interested in searching for actual data, just interested in how AEKOS exposes the data.
- This is a Reproducible Research Report, written using Emacs orgmode to construct a knitr document and creates a HTML output. I've made the code available on github here
Methods
Download from portal
- went to the website and tried a few queries until I found something with data, added to cart and proceeded to checkout
- NB I don't even remember what search criteria I put together to get these data. The first couple I tried (focused on Drought) did not find any data. I think this was about some species or other.
- got an email with link to download the data 
- downloaded zip to my ~/data/aekos_tests directory 
- unzipped and now have some SQL setup files and some pdfs 
- I can now see that the appropriate citation is 
State of South Australia (Department of Environment, Water and Natural Resources, South Australia) (2012)
Dalhousie Survey (Scientific Expedition Group) Survey, South Australian Biological Survey Program, Version 1 /2014. <em>State of South Australia
(Department of Environment, Water and Natural Resources, South Australia)</em>, Adelaide, South Australia. Obtained via ÆKOS Data Portal (<a
href="http://www.portal.aekos.org.au/" target="_blank" >http://www.portal.aekos.org.au/</a>) at TERN Eco-informatics, The University of Adelaide.
Accessed [<em>dd mmm yyyy</em>, e.g. 01 Apr 2010].
Load to postgres database
- I run postgres so I didn't need to install it but it is good that there are instructions here if you do need to.
- I went straight to the linux step 8 (page 6) “Create the database schema from the downloaded schema”
psql -U postgres -d aekos -h localhost -p 5432 -f _DatabaseImport-PostgreSQLSchemaUpdater.sql  
- then on step 10 “Establish the relationships between the tables:
psql -U postgres -d aekos -h localhost -p 5432 -f relationships_PostgreSQL.sql
- Now I have a database…. so what do I do now?
Exploratory Data Analysis
Load R packages
- Load some useful tools I made for exploring data with R and PostgreSQL
if (!require(swishdbtools)) {
    require(devtools)
    install_github("swishdbtools", "swish-climate-impact-assessment")
} else {
    require(swishdbtools)
}
if (!require(disentangle)) {
    require(devtools)
    install_github("disentangle", "ivanhanigan")
} else {
    require(disentangle)
}
if (!require(gisviz)) {
    require(devtools)
    install_github("gisviz", "ivanhanigan")
} else {
    require(gisviz)
}
List Tables
- The first thing I usually do is look at the list of tables available
require(swishdbtools)
require(xtable)
ch <- connect2postgres2("aekos_tests")
lst <- pgListTables(ch, "public")
nrow(lst)
[1] 95
# there are a lot.  here is the head
tbl <- xtable(head(lst))
print(tbl, type = "html", include.rownames = F)
| relname | nspname | 
|---|---|
| address | public | 
| aerialimagecoordinate | public | 
| aerialphotograph | public | 
| ageclass | public | 
| agevalue | public | 
| areavalue | public | 
Variable Distributions
- choose a table and check it out
lst <- pgListTables(ch, "public")
tbl <- sample(lst[, 1], 1)
print(tbl)
[1] "vct_text_addressline”
# First time I tried this I retreived [1] 'crownseparationratiovalue' from
# the _TableDefinitions.pdf I can see this is defined as
### 'Average distance between the edges of the crowns divided by the average
### width of the crowns'
# with only two cols, boring, try EXTRACTSPECIESPRESENCE
tbl <- tolower("EXTRACTSPECIESPRESENCE")
dat <- dbGetQuery(ch, sprintf("select *\n  from %s\n  ", tbl))
# str(dat) what is the code for speciesname?
Identify the relational tables
- I was not using the DbVisualizer tool before so I had to install it
- this lets me easily see which tables are related to this table/variable.names
- I think pgAdmin > Reports > Dependents report could have been used to figure out what the related tables/variable.names were, but it would have been a bit harder.
Summarise the variable distributions
# from this I can see I need to get speciesname from speciesconcept
dat <- dbGetQuery(ch, "select *\nfrom extractspeciespresence tb1\n left join speciesconcept tb2\n on tb1.speciesname = tb2._pk_id")
# str(dat) dat
dd <- data_dictionary(dat, show_levels = 6)
print(xtable(dd), type = "html", include.rownames = F)
| Variable | Type | Attributes | Value | Count | Percent | 
|---|---|---|---|---|---|
| _pk_id | character | EXTRACTSPECIESPRESENCE_T1399266119935 | 1 | 0.74 | |
| EXTRACTSPECIESPRESENCE_T1399266119945 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119952 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119959 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119966 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119973 | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| abundanceunits | character | MEASUREMENTUNIT_T1399259767120 | 135 | 100 | |
| aekoslink | character | aekos.org.au/collection/sa.gov.au/bdbsa_veg/survey_161/19536 | 11 | 8.15 | |
| aekos.org.au/collection/sa.gov.au/bdbsa_veg/survey_161/19537 | 12 | 8.89 | |||
| aekos.org.au/collection/sa.gov.au/bdbsa_veg/survey_161/19538 | 12 | 8.89 | |||
| aekos.org.au/collection/sa.gov.au/bdbsa_veg/survey_161/19541 | 14 | 10.37 | |||
| aekos.org.au/collection/sa.gov.au/bdbsa_veg/survey_161/19543 | 22 | 16.3 | |||
| aekos.org.au/collection/sa.gov.au/bdbsa_veg/survey_161/19544 | 8 | 5.93 | |||
| more than 6 levels. list truncated. | |||||
| location | character | SPATIALPOINT_1400572564352 | 1 | 0.74 | |
| SPATIALPOINT_1400572564353 | 1 | 0.74 | |||
| SPATIALPOINT_1400572564354 | 1 | 0.74 | |||
| SPATIALPOINT_1400572564355 | 1 | 0.74 | |||
| SPATIALPOINT_1400572564356 | 1 | 0.74 | |||
| SPATIALPOINT_1400572564357 | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| locationreliability | character | LOCATIONRELIABILITY_T1399266119936 | 1 | 0.74 | |
| LOCATIONRELIABILITY_T1399266119946 | 1 | 0.74 | |||
| LOCATIONRELIABILITY_T1399266119953 | 1 | 0.74 | |||
| LOCATIONRELIABILITY_T1399266119960 | 1 | 0.74 | |||
| LOCATIONRELIABILITY_T1399266119967 | 1 | 0.74 | |||
| LOCATIONRELIABILITY_T1399266119974 | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| lowerabundance | number | Min. | 0 | ||
| 1st Qu. | 0 | ||||
| Median | 0 | ||||
| Mean | 0 | ||||
| 3rd Qu. | 0 | ||||
| Max. | 0 | ||||
| speciesname | character | SPECIESCONCEPT_T1399262855831 | 1 | 0.74 | |
| SPECIESCONCEPT_T1399262856463 | 4 | 2.96 | |||
| SPECIESCONCEPT_T1399262856593 | 2 | 1.48 | |||
| SPECIESCONCEPT_T1399262857040 | 1 | 0.74 | |||
| SPECIESCONCEPT_T1399262857772 | 5 | 3.7 | |||
| SPECIESCONCEPT_T1399262857780 | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| timeend | date | Min. | 2003-07-20 | ||
| 1st Qu. | 2003-07-21 | ||||
| Median | 2003-07-21 | ||||
| Mean | 2003-07-21 | ||||
| 3rd Qu. | 2003-07-22 | ||||
| Max. | 2003-07-22 | ||||
| timestart | date | Min. | 2003-07-20 | ||
| 1st Qu. | 2003-07-21 | ||||
| Median | 2003-07-21 | ||||
| Mean | 2003-07-21 | ||||
| 3rd Qu. | 2003-07-22 | ||||
| Max. | 2003-07-22 | ||||
| upperabundance | number | Min. | 0 | ||
| 1st Qu. | 0 | ||||
| Median | 0 | ||||
| Mean | 0 | ||||
| 3rd Qu. | 0 | ||||
| Max. | 0 | ||||
| _pk_id | character | EXTRACTSPECIESPRESENCE_T1399266119935 | 1 | 0.74 | |
| EXTRACTSPECIESPRESENCE_T1399266119945 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119952 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119959 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119966 | 1 | 0.74 | |||
| EXTRACTSPECIESPRESENCE_T1399266119973 | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| authorship | character | A.Cunn. ex Benth. | 2 | 1.48 | |
| (Benth.) Paul G.Wilson | 2 | 1.48 | |||
| Blakely & Jacobs | 1 | 0.74 | |||
| (Boiss.) D.C.Hassall | 2 | 1.48 | |||
| DC. | 8 | 5.93 | |||
| Domin | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| genuspart | character | Acacia | 11 | 8.15 | |
| Aristida | 1 | 0.74 | |||
| Atriplex | 6 | 4.44 | |||
| Chamaesyce | 2 | 1.48 | |||
| Cucumis | 1 | 0.74 | |||
| Cyperus | 3 | 2.22 | |||
| more than 6 levels. list truncated. | |||||
| hybridtype | character | SPECIESCONCEPTHYBRIDTYPE_T1399259772337 | 135 | 100 | |
| indexnames | character | SPECIESCONCEPT_T1399262855831 | 1 | 0.74 | |
| SPECIESCONCEPT_T1399262856463 | 4 | 2.96 | |||
| SPECIESCONCEPT_T1399262856593 | 2 | 1.48 | |||
| SPECIESCONCEPT_T1399262857040 | 1 | 0.74 | |||
| SPECIESCONCEPT_T1399262857772 | 5 | 3.7 | |||
| SPECIESCONCEPT_T1399262857780 | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| infraspecificepithet | character | arida | 1 | 0.74 | |
| dielsii | 2 | 1.48 | |||
| glabra | 8 | 5.93 | |||
| holathera | 1 | 0.74 | |||
| microcephala | 1 | 0.74 | |||
| nummularia | 4 | 2.96 | |||
| more than 6 levels. list truncated. | |||||
| namewithauthorship | character | Acacia ligulata A.Cunn. ex Benth. | 2 | 1.48 | |
| Acacia salicina Lindl. | 8 | 5.93 | |||
| Acacia victoriae subsp. arida | 1 | 0.74 | |||
| Aristida holathera Domin var. holathera | 1 | 0.74 | |||
| Atriplex holocarpa F.Muell. | 2 | 1.48 | |||
| Atriplex nummularia Lindl. subsp. nummularia | 4 | 2.96 | |||
| more than 6 levels. list truncated. | |||||
| namewithoutauthorship | character | Acacia ligulata | 2 | 1.48 | |
| Acacia salicina | 8 | 5.93 | |||
| Acacia victoriae subsp. arida | 1 | 0.74 | |||
| Aristida holathera var. holathera | 1 | 0.74 | |||
| Atriplex holocarpa | 2 | 1.48 | |||
| Atriplex nummularia subsp. nummularia | 4 | 2.96 | |||
| more than 6 levels. list truncated. | |||||
| rank | character | TAXONOMICRANK_T1399259772301 | 118 | 87.41 | |
| TAXONOMICRANK_T1399259772302 | 6 | 4.44 | |||
| TAXONOMICRANK_T1399259772304 | 11 | 8.15 | |||
| speciesconcepttype | character | SPECIESCONCEPTTYPE_T1399259772313 | 102 | 75.56 | |
| SPECIESCONCEPTTYPE_T1399259772322 | 16 | 11.85 | |||
| SPECIESCONCEPTTYPE_T1399259772325 | 8 | 5.93 | |||
| SPECIESCONCEPTTYPE_T1399259772327 | 9 | 6.67 | |||
| specificepithet | character | appressa | 2 | 1.48 | |
| argenteus | 1 | 0.74 | |||
| australis | 7 | 5.19 | |||
| billardierei | 8 | 5.93 | |||
| constricta | 1 | 0.74 | |||
| coolabah | 1 | 0.74 | |||
| more than 6 levels. list truncated. | |||||
| vernacularname | character | SPECIESCONCEPT_T1399262856463 | 4 | 2.96 | |
| SPECIESCONCEPT_T1399262856593 | 2 | 1.48 | |||
| SPECIESCONCEPT_T1399262857040 | 1 | 0.74 | |||
| SPECIESCONCEPT_T1399262857772 | 5 | 3.7 | |||
| SPECIESCONCEPT_T1399262858298 | 1 | 0.74 | |||
| SPECIESCONCEPT_T1399262858442 | 2 | 1.48 | |||
| more than 6 levels. list truncated. | 
- it is weird that the lowerabundance and upperabundance are both zero, must be because these are species presence data
- but it would be better for this to be NA
Spatial Data
- I can also see from the DbVisualizer graph that the spatialpoint table has the x and y coordinates for the location variable
- I then also had to find the table with the coordinatereferencesystem (crs)
- The code to make the plot is next and the plot of the spatial data is below.
#### name:spatial####
require(swishdbtools)
require(gisviz)
require(sqldf)
ch <- connect2postgres2("aekos_tests")
lst <- pgListTables(ch, "public")
#lst 
tbl <- lst[grep("spatial", lst[,1]),]
pts  <- dbGetQuery(ch, paste("select * from ", tbl))
# str(pts)
# t(pts[1,])
  dat <- dbGetQuery(ch,
  "select *
  )
pts <- dbGetQuery(ch,
"select tb1._pk_id as id, namewithoutauthorship, x, y, tb4.name
 from extractspeciespresence tb1
 join
 speciesconcept tb2
 on tb1.speciesname = tb2._pk_id
 join 
 spatialpoint tb3
 on tb1.location = tb3._pk_id
 join 
 coordinatereferencesystem tb4
 on tb3.crs = tb4._pk_id
")
# str(pts)
# GDA94
epsg <- make_EPSG()
# names(epsg)
crs <- epsg[which(epsg$code == 4283),'prj4']
# there are multiple species per site, so summarise first
pts <- sqldf("select x, y, count(*)
from pts
group by x, y
", drv = 'SQLite')
pts <- SpatialPointsDataFrame(pts[,c('x','y')], pts, proj4string = CRS(crs))
#str(pts)
writeOGR(pts, "spatialpoint.shp", "spatialpoint", driver= "ESRI Shapefile")
png("plot.png")
plotMyMap(pts, xl = c(110,155), yl = c(-40,-10))
title("dalhousie survey")
dev.off()
Zoom in on the point locations
- using the number of species names at each location as a count
Conclusions
- I really like how this model emphasises the use of databases for data management (and postgres or mysql are great options)
- I think the instructions for the load process are good but with typos makes it difficult
- for a casual browser this is not so much fun

 
            



