Point in Polygon search with MongoDB

point_polygon_yn.png

Computational geometry can be a daunting subject, but sometimes it's just needed to solve certain problems.

For example, in one recent project:

  • we had a large number of data points on a canvas
  • the user could draw a region of interest - a polygon - on the canvas
  • and we would then display only the points within the ROI

The user could adjust their ROI polygon as needed, so we would then need to update the points within the polygon.

Demo

In this post I'm going to go through the demo I created to do this.

The demo has two parts:

Flask + PyMongo

There is a flask app with two endpoints:

  • the root url for the page
  • an endpoint to fetch a point-in-polygon query

I am using pymongo to connect to MongoDB.

D3 Visualization
  • The visualization is a 2-D grid of points.
  • On each iteration a random polgyon is used to query the database for the points that lie within it.

Note: I am not going to go into detail on the D3 visualization itself in this post, but I have linked a few other posts that are relevant, and the code for this is also available!


First, a bit of background.

Point in Polygon (PIP)

We are confronted with the Point in polygon problem, and there are a couple of algorithms used to solve it.

Luckily these are implemented in various libraries, so we will not need to worry about that in this post!

500,000 points

Back to the problem at hand.

We were dealing with up to 500,000 points on a canvas.

This would have the potential to get pretty inefficient if we were to loop through an array of all of these points and check each one to see if it is contained in the polygon.

...and we would need to do this each time the user adjusted the polygon, looping through every single point to check.

Because while our eyes can tell us which points are close to the polygon, we have to find other ways to communicate that to a computer.

near_far_points.png

Having an idea of whether a point is near or far from the polygon annotation in question could allow us to avoid checking it at all.

But how do you communicate nearness and farness to the computer through code?

Space partitioning

Enter space partitioning.

Instead of looping through an array of points, it would be better to store the points in a different data structure, such as a quad tree.

D3 quad tree example

This way we don't have to check every single point to see if it is contained in the polygon.

We can just check the ones that are reasonably close.

2-D indexes in MongoDB

For that we can use a 2-D index in MongodB to index the point coordinates.

There are two parts to this:

  • recursively divide the space into quadrants and generate a geohash based on this - it's pretty interesting and you can read more about it here.
  • use a B-tree data structure to index the geohashes.

MongoDB GIS

Instead of worrying about any of the computational geometry or space partitioning from above, we can use a database with geospatial capabilities, such as MongoDB or Postgres + Postgis.

We were already using MongoDB in our project, and one nice thing is that the spatial functionality we need is available right out of the box, without installing any other dependencies.

Everything was pretty easy to setup.

from pymongo import MongoClient, GEO2D

GEO2D is the 2-D index for the points.

Connect to MongoDB

I'm not going to go into the details of installing MongoDB in this post. I added this demo to a local database that was already setup.

client = MongoClient("mongodb://127.0.0.1:27017/admin")
client = client.get_database("admin")

Make sure you have MongoDB running locally, and you should be able to connect like above.

Now create a collection called demo.

db = client["demo"]

Next we add the 2-D index on a field pts, which will ultimately hold the array of points.

db.coords.create_index([("pt", GEO2D)], min=0, max=grid_width)

The min and max are parameters that you can set, which otherwise default to -180 and 180, non-inclusive.

I was trying to create points for a grid of 200 x 200, and got an error when the grid_width was 200, which it is in the sample code.

The coords part is sort of like a table - I didn't have to define a schema ahead of time because of the nature of MongoDB.

Now we can generate some points and insert them into the database.

points = generate_points(grid_width,grid_height)
db.coords.insert_many(points)

You can find the generate_points function in the sample code.

A document in coords has the structure:

{"pt": [x,y], "id": 123}

Notice the pt from where we created the GEO2D index.

If coords is a table, pt is like a column if you're used to relational databases, and the index is on this column.

I'm not a MongoDB expert, so if there is a better analogy, please let us know in the comments!

I've added the id field here to use as a unique data key in the D3 data join.

Query the points using $geoWithin

Now we are ready to make some queries.

Instead of a typical database query where you might have something like select all documents where name="Christina", where the input is Christina, the input in this case is a polygon.

A polygon is just an array of points.

polygon_points = [[72,150], [74,84], [87,62], [169,54], [178,170]]

The query looks like this:

query = {"pt": 
            {"$geoWithin": 
                {"$polygon": polygon_points}
            }
        }

And we execute the query - this query is technically fetching all documents in coords where the pt point is within the polygon.

result = db.coords.find(query, {"_id": 0})

The {"_id": 0} part is just because I wanted to remove the ObjectId from the results.

So the results would just be an array of {"pt": [x,y], "id": 123} objects.

And that's it!

This was pretty straightforward to put together.

My main concern was whether I would be able to use this functionality with regular, euclidean geometry instead of geospatial data such as latitude and longitude coordinates.

You can read more in the docs about geospatial queries in MongoDB.

Also there is a nice tutorial on finding restaurants with geospatial queries that is pretty interesting.

Thanks for reading, and let me know if you have used MongoDB or Postres + Postgis in any of your projects!

Tagged In
blog comments powered by Disqus

Recent Posts

point_polygon_yn.png
Point in Polygon search with MongoDB
Nov. 12, 2022

In a recent project, we had a large number of points on a canvas, where a user could draw a region of interest to see only the points within that area. Here is a demo of how to do that using MongoDB with a geospatial 2D-index. Visualized using D3.

Read More
main_graphic.jpg
Image Similarity with Python Part II: Nearest Neighbor Search
Feb. 18, 2022

This is Part II of my post on image similarity in Python with perceptual hashing. In this post, we will use Spotify's Annoy library to perform nearest neighbors search on a collection of images to find similar images to a query image.

Read More
kruskal animation shot
Kruskal's Algorithm Animation + Maze Generation
Feb. 7, 2022

Kruskal's algorithm finds a minimum spanning tree in an undirected, connected and weighted graph. We will use a union-find algorithm to do this, and generate a random maze from a grid of points.

Read More
Get the latest posts as soon as they come out!