Disentangle Things by Ivan Hanigan

Welcome to my Open Notebook

This is an Open Notebook with Selected Content - Delayed. All content is licenced with CC-BY. Find out more Here.

librarians-and-python

I stumbled on these posts about python and IPython Notebook by “Data Scientist Training for Librarians”:

I;ve been wanting to learn more python. I don’t think it’ll be ready for statistical modelling for a while, but I a want to be ready when it is. You can get my ipython notebook file for this here: olive.ipynb, but first run the following R code snippet to get the replication dataset ‘olive.csv’ into your home directory.

R Code: get the olive oil dataset

install.packages("pgmm")
require(pgmm)
data(olive)
names(olive) <- tolower(names(olive))
str(olive)
write.csv(olive, "olive.csv", row.names = F)
# actualy home might be easier for ipython to access
write.csv(olive, "~/olive.csv", row.names = F)

OK now reproduce the example

I quite like the histograms from the second example. Here is the raw code extracted from the ipynb

Code:

%pylab inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.colors as colors
 
acidlist=['palmitic', 'palmitoleic', 'stearic', 'oleic', 'linoleic', 'linolenic', 'arachidic', 'eicosenoic']
dfsub=df[acidlist].apply(lambda x: x/100.0)
dfsub.head()
 
rkeys=[1,2,3]
rvals=['South','Sardinia','North']
rmap={e[0]:e[1] for e in zip(rkeys,rvals)}
rmap
 
fig, axes=plt.subplots(figsize=(10,20), nrows=len(acidlist), ncols=1) # sets up the framework for the graphs. Acidlist is defined elsewhere, and is a list of the acids we’re interested in.
i=0 # Sets a counter to 0
 
for ax in axes.flatten(): # Starts a loop to go through our plot and render each row
 
    acid=acidlist[i]
    seriesacid=df[acid] # creates seriesacid and sets it to df[acid], a list of the percent composition of the acid in the current iteration that’s in each olive oil.
 
    minmax=[seriesacid.min(), seriesacid.max()] # the minimum and maximum values plotted will be the minimum and maximum percentages that we find in the data
 
    for k,g in df.groupby('region'):  # starts a loop in the loop to plot the values by region
        style = {'histtype':'stepfilled', 'alpha':0.5, 'label':rmap[k], 'ax':ax}
        g[acid].hist(**style)
 
        ax.set_xlim(minmax)
 
        ax.set_title(acid)
 
        ax.grid(False)
 
        #construct legend
        ax.legend()
    i=i+1 # increments the counter, to move the loop on to the next acid.
 
    fig.tight_layout()

Conclusions

I am not sure how to do the transparency but the rest of it would make more sense to me with R
Will try to reproduce in R for head-to-head shoot out.
I also plan to look into the new shinyAce browser based R editor for similar work
I’m still really enjoying Emacs Orgmode for this kind of functionality though and thoroughly recommend KJ Healy’s starter kit
(I’ve installed and config thison Ubuntu, Windoze and Mac, but LaTeX and exporting R code can be tricky)