In object detection, one seeks to develop an algorithm that identifies a specific object in an image. Here, we'll see how to build a very simple object detector (based on color) using OpenCV. More sophisticated object detection algorithms can identify multiple objects in a single image; for example, one can train an object detection model to identify various types of fruit. Later, we'll also see that our object detector is far from perfect. Nevertheless, the aim of this notebook is not to build a world-class object detector but to introduce the reader to basic computer vision and image processing.

Let's start by loading some useful libraries.

# A popular python library useful for working with arrays
import numpy as np  

# opencv library
import cv2

# For image visualization
import matplotlib.pyplot as plt 


#Plots are displayed below the code cell 
%matplotlib inline 

Let's load our image and inspect its dimensions. An image is essentially an array of size height × width × color channels.

fruits = cv2.imread('apple_banana.png')  # cv2.imread loads an image from a file (channels in BGR order)
fruits.shape
(1216, 752, 3)

The shape is (height, width, channels), so our image is 1216 pixels tall and 752 pixels wide, with 3 color channels. OpenCV loads images in BGR channel order, so next we'll convert our image to the RGB color space. RGB is an additive color model in which other colors are obtained as combinations of red, green, and blue light. Each of the red, green, and blue levels is encoded as a number from 0 to 255, with 0 denoting no light and 255 denoting maximum light. To obtain an array with values ranging from 0 to 1, we'll divide by 255.

fruits = cv2.cvtColor(fruits, cv2.COLOR_BGR2RGB)  # cvtColor converts an image from one color space to another
fruits = fruits / 255.0
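
To make the two steps above concrete, here is a minimal sketch on a hypothetical 1×2 image (the array `bgr` is made up for illustration). It assumes that, for a 3-channel image, converting BGR to RGB has the same effect as reversing the order of the last axis.

```python
import numpy as np

# Hypothetical 1x2 image in BGR order, as cv2.imread would return it:
# one pure-blue pixel and one pure-red pixel, stored as 8-bit integers.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels (BGR -> RGB).
rgb = bgr[:, :, ::-1]

# Dividing by 255 rescales the 0..255 integers to floats in 0..1.
rgb_scaled = rgb / 255.0

print(rgb_scaled.shape)   # (height, width, channels)
print(rgb_scaled[0, 0])   # the blue pixel, now in RGB order
print(rgb_scaled[0, 1])   # the red pixel, now in RGB order
```

After the conversion, the blue pixel has its 1 in the last channel and the red pixel has its 1 in the first channel, which is the layout matplotlib expects.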

Finally, let's plot our image.

plt.imshow(fruits)
<matplotlib.image.AxesImage at 0x7f55dae437b8>

We can see that our image contains one red apple and one yellow banana. Next, we will build a very basic object detector that pinpoints the apple and the banana in our image based on their colors. There are far better algorithms for this task, but that's for some other time.

We start by creating two new images with the same dimensions as our original image. We fill the first one with red, to detect the apple, and the second one with yellow, to detect the banana.

apple_red = np.zeros(np.shape(fruits))
banana_yellow = np.zeros(np.shape(fruits))

apple_red[:,:,0] = 1   # set red channel to 1 - index 0 corresponds to red channel 
banana_yellow[:,:,0:2] = 1 # yellow = red + green: set the red and green channels (indices 0 and 1) to 1

fig, (ax1, ax2) = plt.subplots(1,2)
ax1.imshow(apple_red)
ax2.imshow(banana_yellow)
<matplotlib.image.AxesImage at 0x7f55da945908>
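
Since every pixel of a reference image holds the same color, the same images can also be described by a single RGB triple. The sketch below checks this on a hypothetical 4×5 shape standing in for the real image; the names `red_rgb`, `yellow_rgb`, `red_alt`, and `yellow_alt` are illustrative only.

```python
import numpy as np

shape = (4, 5, 3)  # hypothetical small image shape (height, width, channels)

# Reference images built channel by channel, as in the cell above.
apple_red = np.zeros(shape)
apple_red[:, :, 0] = 1
banana_yellow = np.zeros(shape)
banana_yellow[:, :, 0:2] = 1

# The same images built by broadcasting one RGB triple over every pixel.
red_rgb = np.array([1.0, 0.0, 0.0])      # pure red
yellow_rgb = np.array([1.0, 1.0, 0.0])   # red + green = yellow
red_alt = np.broadcast_to(red_rgb, shape)
yellow_alt = np.broadcast_to(yellow_rgb, shape)

print(np.array_equal(apple_red, red_alt))
print(np.array_equal(banana_yellow, yellow_alt))
```

Both comparisons hold, which confirms that slicing channels `0:2` fills red and green, the two components of yellow.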

Now, we will compare our fruits image with each colored image, pixel by pixel. One way is to calculate the Euclidean distance between corresponding pixels as follows:

$$d_{x,y} = \sqrt{\sum_{z = 1}^{3}(R_{xyz} - F_{xyz})^2} $$

where $d_{x,y}$ is the Euclidean distance between the pixels at position $(x, y)$ of the reference color image $R$ and the fruits image $F$, computed over all 3 color channels (indexed by $z$).

To implement this, we first subtract one array from the other and then take the norm of the difference along the color-channel axis. This is easily achieved with NumPy's linalg.norm function (don't forget to set the axis to 2).
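
Before applying this to the real image, here is a small worked sketch on a hypothetical 1×2 image (the arrays `F` and `R` below are made up for illustration and follow the symbols in the formula).

```python
import numpy as np

# Hypothetical 1x2 RGB image: a reddish pixel and a pure-green pixel.
F = np.array([[[0.9, 0.1, 0.1], [0.0, 1.0, 0.0]]])

# Reference image of the same shape, filled with pure red.
R = np.zeros(F.shape)
R[:, :, 0] = 1

# axis=2 collapses the channel axis, leaving one distance per pixel.
d = np.linalg.norm(R - F, axis=2)

print(d.shape)
print(d)
```

The result has one value per pixel: roughly 0.17 for the reddish pixel, since sqrt(3 × 0.1²) = sqrt(0.03), and roughly 1.41 for the green pixel, since sqrt(1² + 1²) = sqrt(2). The smaller the distance, the closer the pixel is to the reference color.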

# Subtract matrices

diff_red = fruits - apple_red 
diff_yellow = fruits - banana_yellow

# Take the norm of each difference along the color-channel axis (axis=2)

dist_red = np.linalg.norm(diff_red, axis=2)
dist_yellow = np.linalg.norm(diff_yellow, axis=2)

# Let's plot the distance matrix; imshow maps its values to colors.

# For apple(red) detector

plt.imshow(dist_red)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f55da8896d8>

One can see in the plot above that the pixels with the lowest values in the matrix are the pixels that make up the apple (see the colorbar for reference). This makes sense, as those pixels correspond to the reddest pixels in the fruits image. Let's also plot the matrix for the banana (yellow) detector.

# For banana (yellow) detector

plt.imshow(dist_yellow)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f55da7c9400>

Again, we see that the pixels with the lowest values in the matrix are the pixels that make up the banana.

Now, in order to pinpoint the apple and the banana in our fruits image, we need to find the index of the matrix element with the lowest value.

ind_red = np.argmin(dist_red)
print ("red most pixel index= ", ind_red)

ind_yellow = np.argmin(dist_yellow)
print ("yellow most pixel index = ", ind_yellow)
red most pixel index=  544887
yellow most pixel index =  225109

np.argmin returns an index into the flattened matrix. To mark this location on our fruits image, i.e. to pinpoint our object, we need the row and column (y, x) coordinates of the index. These can be obtained with the np.unravel_index function.

# We will get the height and width of our fruits image
image = np.shape(fruits)[0:2]
(y_red, x_red) = np.unravel_index(ind_red, image)
(y_yellow, x_yellow) = np.unravel_index(ind_yellow, image)
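
To see what np.unravel_index does, the sketch below redoes the conversion for the red-most pixel by hand, using the shape and flat index printed above. It assumes NumPy's default row-major ordering; the names `shape`, `flat_index`, `row_manual`, and `col_manual` are illustrative.

```python
import numpy as np

shape = (1216, 752)   # (height, width) of the fruits image
flat_index = 544887   # flat index reported for the red-most pixel

# np.unravel_index converts a flat index into (row, column) for a given shape.
row, col = np.unravel_index(flat_index, shape)

# In row-major order this equals integer division and remainder by the width.
row_manual, col_manual = divmod(flat_index, shape[1])

print(row, col)
print(row_manual, col_manual)
```

Both approaches give row 724 and column 439, because 724 × 752 + 439 = 544887. The row is the y coordinate and the column is the x coordinate used for plotting.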

Finally, it's time to pinpoint our objects! Let's mark the apple in the left plot and the banana in the right plot.

fig, (ax1, ax2) = plt.subplots(1,2)

# Apple 
ax1.scatter(x_red, y_red, c='black', s = 100, marker = 'X')
ax1.imshow(fruits)

# Banana
ax2.scatter(x_yellow, y_yellow, c='black', s = 100, marker = 'X')
ax2.imshow(fruits)
<matplotlib.image.AxesImage at 0x7f55da6f0e48>

As you can see, our method correctly pinpoints both objects. Note that this method is far from perfect, as it is based purely on color. If there were other red or yellow objects in the image, we could no longer be sure that our markers would land on the apple or the banana.
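
This limitation can be illustrated with a small sketch on a hypothetical 5×5 black image containing two equally red pixels (the image and the name `ties` are made up for illustration). np.argmin reports only the first minimum it encounters, so the second red pixel is silently ignored.

```python
import numpy as np

# Hypothetical 5x5 black image containing two pure-red pixels.
img = np.zeros((5, 5, 3))
img[1, 1] = [1.0, 0.0, 0.0]
img[3, 4] = [1.0, 0.0, 0.0]

# Distance of every pixel to pure red (the triple is broadcast over the image).
dist = np.linalg.norm(img - np.array([1.0, 0.0, 0.0]), axis=2)

# argmin reports only the first minimum in row-major order.
y, x = np.unravel_index(np.argmin(dist), dist.shape)
print(y, x)

# All pixels that tie for the minimum distance.
ties = np.argwhere(dist == dist.min())
print(ties)
```

The marker lands on the pixel at row 1, column 1, even though the pixel at row 3, column 4 is just as red.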

In my other notebooks on computer vision, you'll find much more sophisticated algorithms to detect objects in images.

Let me know if you have any comments or suggestions. Thanks!