Point in Polygon search with MongoDB

point_polygon_yn.png

Computational geometry can be a daunting subject, but sometimes it's just needed to solve certain problems.

For example, in one recent project:

  • we had a large number of data points on a canvas
  • the user could draw a region of interest - a polygon - on the canvas
  • and we would then display only the points within the ROI

The user could adjust their ROI polygon as needed, so we would then need to update the points within the polygon.

Demo

In this post I'm going to go through the demo I created to do this.

The demo has two parts:

Flask + PyMongo

There is a flask app with two endpoints:

  • the root url for the page
  • an endpoint to fetch a point-in-polygon query

I am using pymongo to connect to MongoDB.

D3 Visualization
  • The visualization is a 2-D grid of points.
  • On each iteration a random polgyon is used to query the database for the points that lie within it.

Note: I am not going to go into detail on the D3 visualization itself in this post, but I have linked a few other posts that are relevant, and the code for this is also available!


First, a bit of background.

Point in Polygon (PIP)

We are confronted with the Point in polygon problem, and there are a couple of algorithms used to solve it.

Luckily these are implemented in various libraries, so we will not need to worry about that in this post!

500,000 points

Back to the problem at hand.

We were dealing with up to 500,000 points on a canvas.

This would have the potential to get pretty inefficient if we were to loop through an array of all of these points and check each one to see if it is contained in the polygon.

...and we would need to do this each time the user adjusted the polygon, looping through every single point to check.

Because while our eyes can tell us which points are close to the polygon, we have to find other ways to communicate that to a computer.

near_far_points.png

Having an idea of whether a point is near or far from the polygon annotation in question could allow us to avoid checking it at all.

But how do you communicate nearness and farness to the computer through code?

Space partitioning

Enter space partitioning.

Instead of looping through an array of points, it would be better to store the points in a different data structure, such as a quad tree.

D3 quad tree example

This way we don't have to check every single point to see if it is contained in the polygon.

We can just check the ones that are reasonably close.

2-D indexes in MongoDB

For that we can use a 2-D index in MongodB to index the point coordinates.

There are two parts to this:

  • recursively divide the space into quadrants and generate a geohash based on this - it's pretty interesting and you can read more about it here.
  • use a B-tree data structure to index the geohashes.

MongoDB GIS

Instead of worrying about any of the computational geometry or space partitioning from above, we can use a database with geospatial capabilities, such as MongoDB or Postgres + Postgis.

We were already using MongoDB in our project, and one nice thing is that the spatial functionality we need is available right out of the box, without installing any other dependencies.

Everything was pretty easy to setup.

from pymongo import MongoClient, GEO2D

GEO2D is the 2-D index for the points.

Connect to MongoDB

I'm not going to go into the details of installing MongoDB in this post. I added this demo to a local database that was already setup.

client = MongoClient("mongodb://127.0.0.1:27017/admin")
client = client.get_database("admin")

Make sure you have MongoDB running locally, and you should be able to connect like above.

Now create a collection called demo.

db = client["demo"]

Next we add the 2-D index on a field pts, which will ultimately hold the array of points.

db.coords.create_index([("pt", GEO2D)], min=0, max=grid_width)

The min and max are parameters that you can set, which otherwise default to -180 and 180, non-inclusive.

I was trying to create points for a grid of 200 x 200, and got an error when the grid_width was 200, which it is in the sample code.

The coords part is sort of like a table - I didn't have to define a schema ahead of time because of the nature of MongoDB.

Now we can generate some points and insert them into the database.

points = generate_points(grid_width,grid_height)
db.coords.insert_many(points)

You can find the generate_points function in the sample code.

A document in coords has the structure:

{"pt": [x,y], "id": 123}

Notice the pt from where we created the GEO2D index.

If coords is a table, pt is like a column if you're used to relational databases, and the index is on this column.

I'm not a MongoDB expert, so if there is a better analogy, please let us know in the comments!

I've added the id field here to use as a unique data key in the D3 data join.

Query the points using $geoWithin

Now we are ready to make some queries.

Instead of a typical database query where you might have something like select all documents where name="Christina", where the input is Christina, the input in this case is a polygon.

A polygon is just an array of points.

polygon_points = [[72,150], [74,84], [87,62], [169,54], [178,170]]

The query looks like this:

query = {"pt": 
            {"$geoWithin": 
                {"$polygon": polygon_points}
            }
        }

And we execute the query - this query is technically fetching all documents in coords where the pt point is within the polygon.

result = db.coords.find(query, {"_id": 0})

The {"_id": 0} part is just because I wanted to remove the ObjectId from the results.

So the results would just be an array of {"pt": [x,y], "id": 123} objects.

And that's it!

This was pretty straightforward to put together.

My main concern was whether I would be able to use this functionality with regular, euclidean geometry instead of geospatial data such as latitude and longitude coordinates.

You can read more in the docs about geospatial queries in MongoDB.

Also there is a nice tutorial on finding restaurants with geospatial queries that is pretty interesting.

Thanks for reading, and let me know if you have used MongoDB or Postres + Postgis in any of your projects!

Tagged In
blog comments powered by Disqus

Recent Posts

abstract_tree.png
Solving the Lowest Common Ancestor Problem in Python
May 9, 2023

Finding the Lowest Common Ancestor of a pair of nodes in a tree can be helpful in a variety of problems in areas such as information retrieval, where it is used with suffix trees for string matching. Read on for the basics of this in Python.

Read More
rectangles_cover.png
How to write a custom fragment shader in GLSL and use it with three.js
April 16, 2023

This blog post walks through the process of writing a fragment shader in GLSL, and using it within the three.js library for working with WebGL. We will render a visually appealing grid of rotating rectangles that can be used as a website background.

Read More
streaming data
Streaming data with Flask and Fetch + the Streams API
April 10, 2023

Streaming can be a great way to transfer and process large amounts of data. It can help save space and/or time, if the data uses a lot of memory, or if you want to start processing or visualizing the data as it comes in.

Read More
Get the latest posts as soon as they come out!