Wednesday 6 July 2016

Free data and data-collector's remorse

If you've ever designed your own field study in ecology or environmental science, there's a very good chance that you experienced what I call "data-collector's remorse" at the end of the field season. Much in the same way that "buyer's remorse" is a feeling of regret at having bought something for which your appreciation is flagging, data-collector's remorse is essentially regretting having collected too much, or too little, data during those carefully planned and budgeted 8 weeks you spent in the field.

For me, data-collector's remorse tends to manifest in one of two ways: (1) I start to run some preliminary analyses on the data, find out that I have too many variables, and end up having to cut some out (meaning I wasted my time collecting them), or (2) during the course of follow-up research, I discover a half-dozen new papers that all describe a much better way of measuring a variable, or a much better set of variables to collect data on, than what I just finished doing. Obviously, these two ways are not equally damaging to a project: you can always save excess data for other work (or use it as a demo for a class you're teaching), but we seldom have a chance to go back and collect a new dataset to get the things we wish we had collected in the first place.

Sound familiar?

Thankfully, we live in a world that is awash with data; there are literally hundreds of websites, papers, and databases with almost everything you need to fill in the blanks without leaving your office.

For example, during my M.Sc. field work in northern B.C., I measured the basal and breast-height diameters of tree trunks, thinking that it would be all I needed. Later on, I had a great idea to build a model of canopy tree influence on understory vegetation using falling-object formulae from Newtonian physics, but this model needed the height of the trees to work.

Knowing that foresters often only collect the diameter at breast height (DBH) of trees when doing their timber-cruising, I figured there must be something published on modelling the height of lodgepole pine trees using basal diameter or DBH. Within about 15 minutes on the Web of Science, I had located a paper that did just that, and was able to use the equations from that paper to get what I think were pretty accurate estimates of tree height (I tested it on a few local trees in the Prince George area to make sure).

In a more recent example, I was lamenting not having collected rainfall data for some study sites on a project where I used microclimate to predict the cover of epixylic bryophytes and lichens on rotting logs in the forest. What I did have was relative humidity (RH) data, and access to Environment Canada's online repository of historic weather data, on which I found a weather station that was only 5 km away from my study sites. After downloading the weather data, and synching it up with the timing and periodicity of my own RH data, I realized that the weather station recorded rainfall whenever my data loggers were registering RH of 96% or greater. So even though I still lacked information on total rainfall amounts for my specific location, I was pretty confident that I could predict when it was and wasn't raining over my study sites.
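For anyone who wants to try the same kind of check, here is a minimal Python sketch of the idea, not the exact procedure I used. It assumes both datasets can be reduced to (timestamp, value) pairs and that matching them by the hour is good enough. The function names and the sample readings are invented for illustration; only the 96% threshold comes from the example above.

```python
from datetime import datetime

RH_THRESHOLD = 96.0  # percent RH; the cut-off described in the post


def to_hour(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def align_hourly(rh_readings, station_rain):
    """Pair each logger RH reading with the station rainfall for the same hour.

    Both arguments are lists of (datetime, value) tuples. Hours that are
    missing from the station record are skipped.
    """
    rain_by_hour = {to_hour(ts): mm for ts, mm in station_rain}
    pairs = []
    for ts, rh in rh_readings:
        hour = to_hour(ts)
        if hour in rain_by_hour:
            pairs.append((hour, rh, rain_by_hour[hour]))
    return pairs


def agreement(pairs, threshold=RH_THRESHOLD):
    """Fraction of paired hours where 'RH >= threshold' matches 'rain > 0'."""
    if not pairs:
        return None
    hits = sum((rh >= threshold) == (mm > 0) for _, rh, mm in pairs)
    return hits / len(pairs)


# Invented sample data, for illustration only (not real measurements).
rh_readings = [
    (datetime(2016, 6, 1, 10, 5), 97.0),
    (datetime(2016, 6, 1, 11, 5), 80.0),
    (datetime(2016, 6, 1, 12, 5), 96.0),
    (datetime(2016, 6, 1, 13, 5), 90.0),  # no station record for this hour
]
station_rain = [
    (datetime(2016, 6, 1, 10, 0), 1.2),
    (datetime(2016, 6, 1, 11, 0), 0.0),
    (datetime(2016, 6, 1, 12, 0), 0.4),
    (datetime(2016, 6, 1, 14, 0), 0.0),
]

pairs = align_hourly(rh_readings, station_rain)
score = agreement(pairs)
```

If the agreement score is close to 1 over a long enough record, the RH threshold is probably a reasonable stand-in for "raining / not raining" at the site, though it still says nothing about rainfall amounts.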

The take-home message is this: if you find yourself experiencing data-collector's remorse at the end of the field season, stay calm and do what scientists do best - RESEARCH!!! Consult the literature, consult your peers about where they get different types of data, and consult good old-fashioned Google. And for the sake of all us scientists out there for whom only hindsight is 20/20, please share your data online once you've published.

Here are some of my favourite sources for online data - please feel free to share yours with me :)

Climate / Weather data
http://climate.weather.gc.ca/
http://cfcg.forestry.ubc.ca/projects/climate-data/climatebcwna/
http://collaboration.cmc.ec.gc.ca/science/rpn/modcom/eole/wind-atlas-selection.html

Mapping / GIS data
http://www.nrcan.gc.ca/earth-sciences/geography/topographic-information/free-data-geogratis/11042
http://geobc.gov.bc.ca/
http://www.snb.ca/geonb1/

Species occurrence data
http://bryophyteportal.org/portal/
http://lichenportal.org/portal/
http://collections.nature.ca/en/Search
http://www.birdscanada.org/birdmon/?lang=EN 

Pollution data
http://globalnews.ca/news/622513/open-data-alberta-oil-spills-1975-2013/
https://www.ec.gc.ca/inrp-npri/default.asp?lang=En&n=B85A1846-1

Statistics Canada (mostly social & economic data)
http://www.statcan.gc.ca/eng/start