In mid-September I was approached by Carrie Saxifrage, who wanted to pick my brains. Specifically, she wondered whether I might provide feedback on the idea of enlisting volunteers to “ground-truth” the existing forest inventory for Cortes Community Forest (CCF). Not knowing anything at all about Cortes Island, I said I’d be happy to look into it.
This proposal is the result of my looking.
There’s a wealth of information, including digital maps, available through the Cortes Community Forest General Partnership (CCFGP) website here. The *.pdf files are georeferenced, which means they can be imported into a GIS (geographic information system) such as ArcMap or QGIS. The maps are great… but not as useful as they might be… because a user doesn’t have access to the underlying data, and thus can’t fully explore what the maps mean.
After more homework… I learned that the raw shapefiles (with attached data) are freely available here, courtesy of the B.C. Ministry of Forests. An important detail concerns the map projection used… they use “NAD 83”… and if you explore this further on your own, remember that map projection matters in all things GIS!
OK, so now I’ve “reprojected” the maps (so they’ll work equally nicely on my Android phone, your iPhone, or Google Earth). More importantly, I’m now able to re-create the existing maps by “querying” or “expressing” the data in different ways.
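For the curious, here’s what that reprojection step boils down to, sketched in Python with the pyproj library (my choice of tool here — ArcMap or QGIS does the same job through menus). EPSG:3005 is BC Albers, the NAD 83-based projection B.C. ministries commonly ship data in; EPSG:4326 is the plain latitude/longitude system phones and Google Earth expect. The easting/northing pair is invented for illustration.

```python
# A minimal sketch of "reprojecting" one coordinate from BC Albers
# (EPSG:3005, NAD 83-based) into latitude/longitude (EPSG:4326).
# The easting/northing values are made up, not a real map feature.
from pyproj import Transformer

to_wgs84 = Transformer.from_crs("EPSG:3005", "EPSG:4326", always_xy=True)

easting, northing = 1_000_000, 500_000  # hypothetical point in coastal B.C.
lon, lat = to_wgs84.transform(easting, northing)
print(f"lon={lon:.4f}, lat={lat:.4f}")
```

A real workflow reprojects whole shapefiles the same way, one geometry at a time, under the hood.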
At left, for example, polygons have been color-coded by “leading” species, e.g., a forest dominated by Western Hemlock is “HW”, while one dominated by Douglas Fir is “FD”.
I’ve also removed all the “water” polygons, and everything else that’s not on Cortes Island (Figure 2). I’m interested in the community forest – but it’s always nice to step back and maintain a broader perspective.
At right (Figure 3) I’ve clipped all the polygons by using only those that fall within the boundaries of the community forest, and I’ve also expressed the data as “projected age-class” (as was done in Figure 1).
My color-coding differs from that provided in Figure 1 (I used a simple color-ramp). Thus, in my version recent cutblocks are white, and older forests are shown as increasingly darker shades of green.
Okay. So it’s taken me a few days to re-create what I was able to download, but didn’t fully appreciate, after Carrie first asked me to “look into things”.
Welcome to the arcane world of GIS mapping of ecological data.
Here, as they say, is where things get interesting.
What’s the problem?
On the surface of it, you’re in good shape. You already have a “Vegetation Resource Inventory” (VRI) together with things like the “Sensitive Ecosystem Inventory” and the “Forest Tenure” polygons. However, the question is: how good are the data?
As a first effort, I picked the westernmost block of the community forest (highlighted). Within this block I selected a smaller sub-sample area. Think of this as a “pilot study” area.
It’s about 64 ha (158 acres) in size. This small area represents about 1.7% of the 3876 ha (9577 acres) contained within the community forest. In spatial terms, this is truly a “tiny” sample.
If you expand this map, you’ll find that it contains n=12 discrete vegetation polygons. If you query the underlying data, you’ll find that it contains n=3 unique “leading tree species” (it’s mostly Douglas fir) and n=7 unique “age-classes”.
The raw data contain fields for everything from “canopy-closure” to “stem-density” to “canopy-height”. Specifically, there are n=186 fields in the existing *.dbf file, most of which are Greek to me but which a professional forester could set right in five minutes. But that misses the point. With so many attributes, GIS datasets can grow large… and complicated… very quickly. Sometimes it’s easy to miss the obvious.
Because I knew nothing…I started with that. Specifically, I converted my “pilot-study box” into “keyhole” (*.kml) format, booted up Google Earth, and went flying around at low altitude in search of interesting “discrepancies” between the existing VRI polygons and what I could see.
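Incidentally, the “keyhole” step is less exotic than it sounds: a *.kml file is just XML. Here’s a stdlib-only Python sketch that writes a bounding box Google Earth can open. The corner coordinates below are invented, not the real pilot-study box.

```python
# Write a simple bounding box as a "keyhole" (.kml) polygon.
# Coordinates are hypothetical (lon/lat, WGS 84), not the real pilot box.
west, south, east, north = -125.05, 50.10, -125.00, 50.13

coords = " ".join(
    f"{lon},{lat},0"
    for lon, lat in [(west, south), (east, south), (east, north),
                     (west, north), (west, south)]  # repeat first point to close the ring
)

kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Pilot study box</name>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates>{coords}</coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark>
</kml>"""

with open("pilot_box.kml", "w") as f:
    f.write(kml)
```

Double-click the resulting file and Google Earth flies you straight to the box.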
Did I find any? Yes. Of course. There will be discrepancies in any dataset, especially when you remember that the data come from a set that encompasses the entirety of the forests of the Province of British Columbia. Which is a pretty big area.
It’s hardly surprising that when I focus on a single polygon, and zoom in far enough, some obvious questions emerge:
“hey those tree crowns are really big, and these ones are visibly smaller…can those really be all 61-80 year-old Douglas firs?”
The question becomes: is that important?
Well it might be. If you’re interested in not overharvesting your forest base. Or in finding that solitary Great Blue Heron nest.
Remote sensing versus boots-on-the-ground
Remote sensing has progressed by leaps and bounds over the last several decades. Measuring individual tree-crown diameters from B&W aerial photos with a micrometer (as I needed to do for my first scientific publication in 1986) has given way to things like LIDAR, which allows you to actually see inside that individual tree-crown.
Indeed LIDAR has become the gold standard for forest resource management, but it comes with a hefty price tag, both for data collection…and analysis. I have no direct experience with LIDAR, but given the size of the area I suspect the costs would exceed $100,000.
There are cheaper, albeit less precise, methods, dating all the way back to those first timber-cruising days of pioneer B.C.
At the inexpensive end of the spectrum, one could enlist a group of volunteers to walk transects and measure trees, perhaps using something like point-centered quarter (PCQ) methods. Indeed, this is what Carrie initially approached me with. It would be inexpensive in dollar terms only.
In terms of labour and time, the effort needed would be huge. I’m not a forester, but as I indicated to Carrie, were I to survey forest-dwelling birds in coastal B.C. forests (something I’ve done), I’d want to keep my point-counts 100 metres apart to ensure statistical independence.
It’s possible that one could get away with sparser sampling if forests are relatively homogeneous… but as a first approximation I think my attempt is useful for two reasons.
First, sampling at this density puts only 2 points within the polygon containing the “big”, “medium” and “small” tree-crowns. So sampling at that density would have missed one of those groups. And, looking at the map again, it might have missed two of the three.
Second, and more importantly, Figure 8 shows what such a sampling density would look like when superimposed on the entirety of the community forest. There are n=2788 sampling points. How many volunteer-hours are we talking about?
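The arithmetic behind that number is worth making explicit: a square grid with 100-metre spacing works out to exactly one point per hectare, so a forest of roughly 3800 ha implies several thousand candidate points before any are discarded for falling outside the block boundaries (hence the n=2788 that survive the clip in Figure 8). A back-of-envelope sketch, in which the per-point effort figure is purely my assumption:

```python
# Back-of-envelope: sampling points implied by a 100 m grid over ~3800 ha,
# and the volunteer effort that follows. The hours-per-point value is a
# hypothetical placeholder, not a measurement.
spacing_m = 100
forest_ha = 3800                          # rough area of the community forest
points_per_ha = (100 / spacing_m) ** 2    # 100 m grid = 1 point per hectare
grid_points = forest_ha * points_per_ha

hours_per_point = 1.0                     # assumed: walking in + measuring
print(f"~{grid_points:.0f} points, ~{grid_points * hours_per_point:.0f} volunteer-hours")
```

Even halving the assumed time per point leaves a volunteer effort measured in person-months.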
Perceptive readers might surmise that I’m looking for the proverbial “happy medium” in all this – and they would be quite correct.
Maximizing resolution, manpower & utility
The venerable Landsat series has been around since 1972, but has always been limited by the coarse resolution of its sensors (each pixel in a standard Landsat image covers 30 m × 30 m on the ground).
Things have improved since then.
There’s a constellation of satellites orbiting Earth as you read this. Many are military (forget it – I already asked). Some are commercial, and, well, it’s their business.
Thus, Airbus will sell you a single SPOT image at 1.5 m resolution… for several thousand dollars. You can do even better. The sensors aboard the Pléiades Neo spacecraft provide a ground-level resolution of 30 centimetres.
The B.C. Government has access to the latest, highest-resolution ECW imagery as well… and they’ll also sell you a single high resolution image…for several thousand dollars. Yup. Which they bought using taxpayer dollars.
And Google Earth? Well, as with all things Google, they’re a business too. Which is why you’re not supposed to download their very nice imagery directly.
But it’s pretty awesome. I especially like Figure 9. Yup, that’s me. Changing the oil on my Isuzu Rodeo. The spare tire’s been swung open, and yes, that’s the shadow of my head above the open hood. Google gets their images from various commercial satellites, and the resolution is mind-numbingly good (although it varies from place to place and time to time). Hey. They even got my best side…
Alas, even though there is a way to import Google Earth imagery into a GIS, and therefore make use of its very high (<2 m) resolution, you’d be limited in other respects. Specifically, because Google Earth delivers only “natural color” imagery, the options for analysis are limited.
Even venerable Landsat, with its much coarser resolution (the normal 30 m can be “sharpened” using the 15 m panchromatic band), provides more options.
Specifically, by combining various wavelengths, one can derive statistical spectral “signatures” for various things – including different types of forests – and even individual tree species within a forest stand. The scientific literature on the subject is huge. This and this provide good introductions to the subject if you’re new to GIS and want to start with the basics.
More recently the European Space Agency has made its Sentinel-2 imagery freely available… these satellites sample at 10 m resolution… and this is a real game-changer.
Although not as nice as Google Earth, Sentinel-2 provides a big improvement in resolution, and offers an equivalent-to-Landsat variety of band-wavelengths to sample from. It’s also free.
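To make “combining various wavelengths” less abstract: the simplest piece of band math is NDVI, the normalized difference of Sentinel-2’s red (B4) and near-infrared (B8) bands, both at 10 m. Healthy canopy reflects strongly in the near-infrared, so NDVI separates vegetation from bare ground at a glance. The pixel values below are invented; real ones come from the downloaded imagery.

```python
# NDVI (Normalized Difference Vegetation Index) from Sentinel-2's
# red (B4) and near-infrared (B8) bands. The 2x2 "images" here are
# invented reflectance values for illustration only.
import numpy as np

red = np.array([[0.05, 0.30], [0.04, 0.25]])   # B4 reflectance
nir = np.array([[0.45, 0.32], [0.50, 0.27]])   # B8 reflectance

ndvi = (nir - red) / (nir + red)
print(np.round(ndvi, 2))
# Dense green canopy pushes NDVI toward 1; bare ground sits near 0.
```

Real classification work combines many such indices and raw bands, but the pixel-by-pixel logic is the same.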
Most importantly, given the reams of scientific literature devoted to classifying forest types, it’s now possible to do what would have been impossible even a few years ago.
In a nutshell, I’m proposing to blend the old with the new. Specifically, I propose to:
- conduct a modern forest classification analysis using Sentinel-2 imagery over the entire Cortes Community Forest.
- provide a statistically-robust sampling design that will allow volunteers to ground-truth the new, and the existing, vegetation classification (VRI).
- provide training and data such that this project ultimately becomes your project.
Without boring you with jargon, what I propose to do is:
- download a series of recent Sentinel-2 images. I’m not looking for changes in forest cover (i.e., forest harvesting) over time, so I’ll use images from the past year or so. I’ll want more than one image, because the physiology of trees changes with the seasons. Comparing images from January to June would identify deciduous trees, to use an obvious example. Cloud-cover will limit my choices. But as Sentinel-2 images Cortes Island with a revisit time of about 2.6 days, there’ll be lots of images to choose from. This is the sample.
- I won’t be comparing images by eye (even to look for deciduous trees, this would be quite insane). Instead, I’ll perform what is known as a “supervised forest cover classification” (here’s an example, but be forewarned…it’s not light reading). In lay terms, what this means is that you’re comparing, pixel by pixel, the “spectral signature” (i.e., the color) recorded by the sensors aboard the satellite.
- By using different wavelengths (sensor “bands”) and images taken across seasons, it will be possible to distinguish among tree species, even in coastal coniferous forests where everything looks “green” to the naked eye. The literature would seem to suggest that this would work with something like 75% accuracy. Here’s a nice review paper on the subject (again…it’s not light reading).
- By building a series of “classification rules” from the sample of images, I will use the “semi-automatic classification plugin” implemented in QGIS to basically say “ok, build me a new set of vegetation polygons based on these decision-rules”.
- This is where the volunteers come in.
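In spirit, the “supervised” part of the classification works like the toy sketch below: you hand the algorithm mean spectral signatures from training areas of known cover, and every pixel is assigned the class whose signature it sits closest to in spectral space. This “minimum distance” rule is one of the algorithms the semi-automatic classification plugin offers; the band values and class names here are invented for illustration.

```python
# Toy "minimum distance to class means" classifier: assign each pixel
# to the training class whose mean spectral signature is nearest.
# Signatures (e.g., bands B4, B8, B11) and classes are invented.
import numpy as np

signatures = {
    "conifer":   np.array([0.04, 0.30, 0.12]),
    "deciduous": np.array([0.06, 0.45, 0.20]),
    "cutblock":  np.array([0.15, 0.25, 0.30]),
}

def classify(pixel):
    # pick the class minimizing Euclidean distance in spectral space
    return min(signatures, key=lambda c: np.linalg.norm(pixel - signatures[c]))

print(classify(np.array([0.05, 0.33, 0.13])))  # prints "conifer"
```

The real plugin does exactly this kind of comparison, pixel by pixel, across millions of pixels and many more bands.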
In addition to building a new VRI that can be statistically and spatially compared with the one already available from the Ministry of Forests, I propose to develop, based on the new polygons, a statistically-defensible sampling strategy for ground-truthing them.
Apart from the obvious “team-building” benefits of having “boots on the ground”, it only makes sense to do things in a manner that produces meaningful information. It makes no sense to merely gather data if they’re not gathered in a systematic way.
Budget and timeline
Not having done this before, I find this frankly difficult to assess. There are no materials costs, as the images and software are open-source. I have adequate computer hardware, and more statistical and graphics software than I remember how to use. The key question is “how much time will this take?”
Given how long it’s taken me to develop this proposal, and what I needed to teach myself before starting to type it, I’m thinking that what I’m proposing to do will take the equivalent of two months – at professional wages. So I’m tempted to suggest that we’re talking about something like $8,500 to start.
Much depends on how well species such as Western Red Cedar or Western Hemlock can be resolved by seasonally-dependent physiological responses to different wavelengths. Frankly I just don’t know, and the largest chunk of time will be spent reading the science.
Another factor relates to the question of what happens when data start flowing in from the field. It’s one thing to pick a “pilot-study” area and randomly assign sampling points within polygons. That’s the easy part. The harder questions become:
- who enters/manages the data?
- are you recording in the field with a GIS app (smartphone) or a paper notebook?
- how are you analysing the data? (Systat likes *.wks files, ArcMap and QGIS like *.dbf files, and Excel is the common denominator)
- at what point do you decide that the existing VRI is adequate or inadequate (or the new one is)? The key point here is to establish a priori decision points. In science, 95% confidence is the traditional yardstick for establishing “significance”. But if you found a consistent 3% difference in two estimates of wood volume from a large area, well…that would represent a lot of wood.
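To make “a priori decision points” concrete, here is one possible rule, sketched with invented numbers: declare the existing VRI inadequate only when the 95% confidence interval for the mean field-minus-VRI difference lies entirely beyond a practical threshold that the CCFGP sets in advance. Both the data and the threshold below are hypothetical.

```python
# One candidate a priori decision rule: flag the inventory only if the
# 95% CI for the mean (field minus VRI) difference clears a practical
# threshold. The differences and threshold are invented; 1.96 is the
# normal-approximation z value (with only n=8, a t value of ~2.36
# would be stricter).
from statistics import mean, stdev
from math import sqrt

diffs = [2.1, -0.4, 3.0, 1.2, 0.8, 2.5, -0.2, 1.9]  # m3/ha, hypothetical
n = len(diffs)
m, se = mean(diffs), stdev(diffs) / sqrt(n)
lo, hi = m - 1.96 * se, m + 1.96 * se

practical_threshold = 1.0  # m3/ha that "matters" - a value the CCFGP must set
inadequate = lo > practical_threshold or hi < -practical_threshold
print(f"mean diff {m:.2f} m3/ha, 95% CI ({lo:.2f}, {hi:.2f}), flag={inadequate}")
```

With these made-up numbers the difference is statistically “significant” (the interval excludes zero) yet would not trip the flag, which is exactly the distinction between statistical and practical significance that the 3%-of-a-large-area example points at.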
Ok, that should be enough to start…does this sound interesting to you?