-One 10x10km grid cell from the NREL/Perez Direct Normal Irradiance grid
-90m Landscan Raster detailing population for greater Denver area from Oak Ridge National Lab
The Analysis Intent:
-Summarize population within grid cell, first using ESRI and then PostGIS and compare. Easy, no?
1. Load Some Data
First, loading the raster into PostGIS. This is pretty straight forward with one exception, tile size. What tile size do you use? I would love to see theoretical or empirical analysis detailing the optimal tile size per given cell size and extent. Anyone? Until then, I'll guess.
-C : applies contraints, -M : vacuum analyze, -Y : use copy rather than insert, -t : tile size, -s : SRID, -U : username, -d : database
raster2pgsql -s 4326 -C -M -Y landscanden -t 100x100 ls_den | psql -U postgres -d postgis2
2. Get a Base Estimate
Get a base estimate using ESRI. Using Zonal Statistics I get the following:
Cell Count: 14,400
3. How Much Can We Explain?
If my results are different between ESRI and PostGIS, how much could be explained? Assume one program summarizes all border cells while the other does not. This will give us a percent difference we can easily explain. So, if I have a 10x10km grid cell as the zone, then I could fit 111 90m cells on each border for a total of 444 border cells. Square 111.9 to get 12,321 90m cells that could fit inside the 10x10km grid cell. Divide the border cells by the total 444/12,321*100 = 3.6%. We can explain 3.6% of the difference (if there is any) by just the border cells.
4. First Attempts at Raster Analysis in PostGISI thought I was thinking logically when I wrote this...should have just looked at the documentation first!
--Create a vectorized result of the geometric intersection between my raster layer and vector polygon
CREATE TABLE sum_pop AS
,(gv).val AS population
,(gv).geom AS geom
,ST_Intersection(b.rast, a.geom) gv
FROM perez_grid a, ls_den b
WHERE ST_Intersects(b.rast, a.geom) ) foo;
Spatial results looked good. Just needed to summarize
SELECT SUM(population), COUNT(*) AS cell_count FROM sum_pop;
Cell Count: 9,811
Population Percent Difference: ~7%
Well...that doesn't look right...how do I explain the additional 3.4%? First things first, what is the deal with the count? Well, that one is pretty easy, when the raster is vectorized, it groups cells of like values. So count doesn't really matter.
5. A Hunch or Swing in the Dark?Ok, not sure why I decided to try this, but I loaded in the exact same raster dataset except for this time I used 200x200 tile size. Shouldn't make a difference, right? WRONG!!! This still vexes me, if anyone can explain I would be grateful.
After reading the documentation, I have a plan of attack. I'll follow their example and see what I get.
CREATE TABLE sum_pop2 AS
feat AS (SELECT gid, geom FROM perez_grid AS b ),
(SELECT gid, (stats).*
SELECT gid, ST_SummaryStats(ST_Clip(rast,
1,geom)) AS stats
INNER JOIN feat
ON ST_Intersects(feat.geom,rast) ) AS foo )
SELECT gid, SUM(count) AS cell_count
,SUM(sum) AS population
WHERE count > 0
GROUP BY gid
ORDER BY gid;
Cell Count: 14,400
Population Percent Difference: 0%
SUCCESS, however, I'm not sure why examples must be so verbose! It is a beautiful query don't get me wrong...
So whats difference between this amd my first attempt?
The WITH clause simply allows you to join queries, you could right this as a query with a subquery and get the same results. Using ST_Clip(), the raster is never converted to vector, the vector is used to clip the raster than a simple SummaryStats() is performed. That's the bulk of it. The GROUP BY clause aggregates all the rids (aka tiles).
The BIG question is, why does the first attempt produce the wrong answer? The intersection produces geomval pairs for only those cells that intersect, so why the difference? I would have guessed that it would have resulted in more population because it grabs even the slightest sliver that intersects (its vector on vector) while ESRI grabs only cells whose centroid falls within the intersect, but PostGIS resulted in less!
8. A Simple Alternative to the Correct Method
For those that find the above query verbose and hard to understand, I'll provide an alternative query that produces the same result in similar return time.
CREATE TABLE sum_pop3 AS
SELECT gid, SUM((ST_SummaryStats(ST_Clip(rast,1,geom))).sum)
FROM perez_grid, ls_den
GROUP BY gid