The user could adjust their ROI polygon as needed, so we would then need to update the points within the polygon.
Demo
In this post I'm going to go through the demo I created to do this.
The demo has two parts:
Flask + PyMongo
There is a flask app with two endpoints:
- the root url for the page
- an endpoint to fetch a point-in-polygon query
I am using pymongo to connect to MongoDB.
D3 Visualization
- The visualization is a 2-D grid of points.
- On each iteration a random polgyon is used to query the database for the points that lie within it.
Note: I am not going to go into detail on the D3 visualization itself in this post, but I have linked a few other posts that are relevant, and the code for this is also available!
First, a bit of background.
Point in Polygon (PIP)
We are confronted with the Point in polygon problem, and there are a couple of algorithms used to solve it.
Luckily these are implemented in various libraries, so we will not need to worry about that in this post!
500,000 points
Back to the problem at hand.
We were dealing with up to 500,000 points on a canvas.
This would have the potential to get pretty inefficient if we were to loop through an array of all of these points and check each one to see if it is contained in the polygon.
...and we would need to do this each time the user adjusted the polygon, looping through every single point to check.
Because while our eyes can tell us which points are close to the polygon, we have to find other ways to communicate that to a computer.
This way we don't have to check every single point to see if it is contained in the polygon.
We can just check the ones that are reasonably close.
2-D indexes in MongoDB
For that we can use a 2-D index in MongodB to index the point coordinates.
There are two parts to this:
- recursively divide the space into quadrants and generate a geohash based on this - it's pretty interesting and you can read more about it here.
- use a B-tree data structure to index the geohashes.
MongoDB GIS
Instead of worrying about any of the computational geometry or space partitioning from above, we can use a database with geospatial capabilities, such as MongoDB or Postgres + Postgis.
We were already using MongoDB in our project, and one nice thing is that the spatial functionality we need is available right out of the box, without installing any other dependencies.
Everything was pretty easy to setup.
from pymongo import MongoClient, GEO2D
GEO2D is the 2-D index for the points.
Connect to MongoDB
I'm not going to go into the details of installing MongoDB in this post. I added this demo to a local database that was already setup.
client = MongoClient("mongodb://127.0.0.1:27017/admin")
client = client.get_database("admin")
Make sure you have MongoDB running locally, and you should be able to connect like above.
Now create a collection called demo.
Next we add the 2-D index on a field pts, which will ultimately hold the array of points.
db.coords.create_index([("pt", GEO2D)], min=0, max=grid_width)
The min and max are parameters that you can set, which otherwise default to -180 and 180, non-inclusive.
I was trying to create points for a grid of 200 x 200, and got an error when the grid_width was 200, which it is in the sample code.
The coords part is sort of like a table - I didn't have to define a schema ahead of time because of the nature of MongoDB.
Now we can generate some points and insert them into the database.
points = generate_points(grid_width,grid_height)
db.coords.insert_many(points)
You can find the generate_points function in the sample code.
A document in coords has the structure:
Notice the pt from where we created the GEO2D index.
If coords is a table, pt is like a column if you're used to relational databases, and the index is on this column.
I'm not a MongoDB expert, so if there is a better analogy, please let us know in the comments!
I've added the id field here to use as a unique data key in the D3 data join.
Query the points using $geoWithin
Now we are ready to make some queries.
Instead of a typical database query where you might have something like select all documents where name="Christina", where the input is Christina, the input in this case is a polygon.
A polygon is just an array of points.
polygon_points = [[72,150], [74,84], [87,62], [169,54], [178,170]]
The query looks like this:
query = {"pt":
{"$geoWithin":
{"$polygon": polygon_points}
}
}
And we execute the query - this query is technically fetching all documents in coords where the pt point is within the polygon.
result = db.coords.find(query, {"_id": 0})
The {"_id": 0}
part is just because I wanted to remove the ObjectId from the results.
So the results would just be an array of {"pt": [x,y], "id": 123}
objects.
And that's it!
This was pretty straightforward to put together.
My main concern was whether I would be able to use this functionality with regular, euclidean geometry instead of geospatial data such as latitude and longitude coordinates.
You can read more in the docs about geospatial queries in MongoDB.
Also there is a nice tutorial on finding restaurants with geospatial queries that is pretty interesting.
Thanks for reading, and let me know if you have used MongoDB or Postres + Postgis in any of your projects!