This is Part II of my post on image similarity in Python with perceptual hashing. In this post, we will use Spotify's Annoy library to perform nearest neighbors search on a collection of images to find similar images to a query image.

Read MoreIf you've used the **bucket fill tool** in a paint or photo editing program like Photoshop or Gimp, then you already have first-hand experience with the flood fill algorithm.

I'm going to demonstrate it with this image of a sweatshirt that keeps coming up in my targeted ads.

Find the code for this post here.

The algorithm works on a multi-dimensional array, such as a **2-D matrix of pixels** that make up an image.

- It takes a
**starting point**in the array. - Then it finds all of the other
**adjacent nodes**that are connected to it based on some measure of**similarity**.

With the bucket tool, you select an area of an image and then click somewhere within the selection to fill it in.

The starting point is where you clicked.

After you click, the algorithm moves out in all directions from that point, and traverses a path of of pixels that have a **similar color to the starting point**.

When it runs into a pixel that is **not** similarly colored to the starting pixel, it **stops pursuing that path**.

The similarly colored pixels will be updated or **filled in** with whatever color you chose.

Flood fill is also used in games like **Minesweeper** to determine which squares to clear from the board when you click on one.

Now pretend like we've opened the image in your favorite photo editing software, and made a selection - the black dashed ellipse is our "selection" area.

Then click in the middle of the cat, which is black, so the algorithm traversed the neighboring areas until it ran into pixels that were not a similar color.

The bucket tool would either keep going until it ran into different colored pixels, or the bounds of the selection area.

You can see that the orange parts of the sweatshirt were not changed, and the gray flecks of fur were also not changed.

If you're designing the algorithm, it's up to you to decide on what **similar** colors means, which we will get into later.

There are **three inputs** to the flood fill algorithm.

- The starting point.
- The pixel color of the starting point.
- The color to update the connected pixels to.

From the word **traverse** you may have gathered that this is a **graph** problem.

- The graph is the 2-D matrix of pixels.
- Each pixel is a cell in the matrix, and a
**node**in the graph. - There are
**edges**between nodes that are connected to the starting point.

In a matrix, each cell has four neighbors on the **top**, **bottom**, **left** and **right**.

You will need to decide if you also want to traverse **diagonal neighbors** of each pixel.

Since this is a graph problem, you can use any of the graph traversal classics like **breadth-first search** or **depth-first search** to solve it.

Here I'm using depth-first search, and including the diagonal neighbors.

The starting point is marked with an **X**.

You can pretend that we are using a photo editing program and clicked on this spot in the image.

This can be done iteratively or with recursion.

View the full code here.

def flood_recursive(matrix): width = len(matrix) height = len(matrix[0]) def fill(x,y,start_color,color_to_update): #if the square is not the same color as the starting point if matrix[x][y] != start_color: return #if the square is not the new color elif matrix[x][y] == color_to_update: return else: #update the color of the current square to the replacement color matrix[x][y] = color_to_update neighbors = [(x-1,y),(x+1,y),(x-1,y-1),(x+1,y+1),(x-1,y+1),(x+1,y-1),(x,y-1),(x,y+1)] for n in neighbors: if 0 <= n[0] <= width-1 and 0 <= n[1] <= height-1: fill(n[0],n[1],start_color,color_to_update) #pick a random starting point start_x = random.randint(0,width-1) start_y = random.randint(0,height-1) start_color = matrix[start_x][start_y] fill(start_x,start_y,start_color,9) return matrix

This is the same code used to create the animation, and the colors correspond to numbers, so `color_to_update`

is initialized to **9**, which is mapped to the gray color that the connected pixels are updated to.

With depth-first search, the algorithm explores **one path completely** until reaching a pixel with a different color.

Then it comes back to the starting point and pursues the next neighbor's path.

Notice that only the dark blue pixels that are **connected** are changed. The other dark blue pixels that are not adjacent to the starting point are left alone.

With this example the colors are all **uniform**, so we just have to compare pixels to see if they have an equal value.

But with a real image like the sweatshirt, it is not so simple.

If you look closely at the orange sweatshirt in the image earlier, you can see that there are **many different shades of orange** that make up even just the sweatshirt part of the image.

There are lighter and darker shades of orange due to lighting, shadows, creases, etc.

Two pixels right next to each other might be slightly different shades of orange, but look the same to our human eyes.

So we need to be more flexible and instead check to see if the colors are *similar*.

We are working with **RGB** colors.

- An RGB color has a
**red**,**green**and**blue**parameter, which together combine to create all other colors. - Each parameter has a value between
**0 and 255**, which defines the**intensity**of that color parameter.

An example would be `rgb(242, 103, 51)`

, which is one of the shades of orange found in the sweatshirt.

So colors are on a spectrum, and instead of checking for an exact match, we could compare a new pixel color to the starting point, and decide if they are **close enough** of a match.

One way to compare two colors would be to **subtract** the red, green and blue values of the colors **separately** and see if **all** of them are within a certain threshold.

def is_similar(start_color, new_color): threshold = 40 red,green,blue,alpha = start_color new_red,new_green,new_blue,new_alpha = new_color if abs(red-new_red) <= threshold and abs(green-new_green) <= threshold and abs(blue-new_blue) <= threshold: return True return False

In this function I've set a threshold of 40, so if the **absolute value of the difference** between each of the red, green and blue values in the two colors is greater than or equal to 40, then they are **similar enough**.

Side note: the image is actually an **RGBA** image, where the **A** represents an **alpha channel** for opacity, but we are only concerned with the red, green and blue values here.

Here you can see the difference between thresholds.

- Top left = threshold
**10** - Top right = threshold
**40** - Bottom left = threshold
**100** - Bottom right = threshold
**200**

There is a **green star** on each image that shows where the **starting point** is.

The green arrow on the bottom right points it out as well, in case it's not easy to see!

It makes sense because with a large threshold like 200, we are defining almost **all** other RGB colors as similar.

Another use for this algorithm is in interview questions that ask about **islands**.

If they want you to count all the islands in a matrix or something like that, which seems to be a popular interview question.

You can use the flood fill algorithm to determine the nodes that belong to each island.

As the saying goes, don't reinvent the wheel!

There are implementations of the flood fill algorithm that have been tested and are probably a better choice than rolling your own solution.

**About the sweatshirt image:** I took a screenshot of it awhile ago, and haven't seen it recently so I am not sure exactly where it came from.

It seems to be a common shirt that can be found on a lot of sites.

Let me know if you have questions or comments! Comment below or find me on Twitter @LVNGD.

This is Part II of my post on image similarity in Python with perceptual hashing. In this post, we will use Spotify's Annoy library to perform nearest neighbors search on a collection of images to find similar images to a query image.

Read MoreKruskal's algorithm finds a minimum spanning tree in an undirected, connected and weighted graph. We will use a union-find algorithm to do this, and generate a random maze from a grid of points.

Read MoreJust in time for Valentine's day, create a puckering lips animation in D3 from an SVG path, using interpolations and .attrTween(). We will go through the steps from generating points from an SVG path, to interpolating lines in D3 to animate them.

Read More