Saturday, July 29, 2017

How to compress color spaces using k-means clustering

One exciting application of k-means clustering is the compression of image color spaces. Although True-color images come with a 24-bit color depth (allowing 16,777,216 color variations), a large number of colors within any particular image will typically be unused—and many of the pixels in the image will have similar or identical colors. In this post I am going to show you how you can use k-means clustering via OpenCV and Python to reduce the color palette of an image to a total of 16 colors.

Compressing color spaces using k-means

Generally speaking, k-means clustering searches for a predetermined number of k clusters (or groups) within an unlabeled multidimensional dataset. It does so by using two simple assumptions about what an optimal clustering should look like:

  • The center of each cluster is simply the arithmetic mean of all the points belonging to the cluster.
  • Each point in the cluster is closer to its own center than to other cluster centers.

We can use k-means to reduce the color palette of a True-color image by following a simple three-step procedure:

  1. Load the image.
  2. Apply k-means to find the k most dominant colors in the image.
  3. Recolor the image using the k new colors.

The trick here is to think of the cluster centers as the reduced color palette. Then k-means will automatically organize the millions of colors in the original image into the appropriate number of colors!

Step 1: Loading the image

Let's have a look at a particular image:
In [1]: import cv2
In [2]: lena = cv2.imread('data/lena.jpg', cv2.IMREAD_COLOR)

We can plot it using Matplotlib:

In [3]: import matplotlib.pyplot as plt
In [4]: # In a Jupyter Notebook, you will need the following:
...     %matplotlib inline
In [5]: plt.imshow(cv2.cvtColor(lena, cv2.COLOR_BGR2RGB))

The picture looks like this:

Step 2: Applying k-means

The image itself is stored in a 3D array of size (height, width, depth) containing blue/green/red contributions as integers from 0 to 255. Because every color channel has 256 possible values, the number of possible colors is 256 x 256 x 256, or 16,777,216, as mentioned prior to this. One way to visualize the sheer amount of different colors in the image is to reshape the data to a cloud of points in a 3D color space. We also scale the colors to lie between 0 and 1:

In [6]: img_data = lena / 255.0
...     img_data = img_data.reshape((-1, 3))
...     img_data.shape
Out[6]: (50625, 3)

Now let's reduce these 16 million colors to just 16 colors by telling k-means to cluster all color variations into 16 distinct clusters. We use the before mentioned procedure, but now specify 16 as the number of clusters:

In [7]: criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER,
...                 10, 1.0)
...     flags = cv2.KMEANS_RANDOM_CENTERS
...     img_data = img_data.astype(np.float32)
...     compactness, labels, centers = cv2.kmeans(img_data,
...                                               16, None, criteria,
...                                               10, flags)

The resulting cluster corresponds to the 16 colors of our reduced color palette. Visual inspection of the centers array reveals that all colors have three entries—B, G, and R—with values between 0 and 1:

In [8]: centers
Out[8]: array([[ 0.29973754, 0.31500012, 0.48251548],
               [ 0.27192295, 0.35615689, 0.64276862],
               [ 0.17865284, 0.20933454, 0.41286203],
               [ 0.39422086, 0.62827665, 0.94220853],
               [ 0.34117648, 0.58823532, 0.90196079],
               [ 0.42996961, 0.62061119, 0.91163337],
               [ 0.06039202, 0.07102439, 0.1840712 ],
               [ 0.5589878 , 0.6313886 , 0.83993536],
               [ 0.37320262, 0.54575169, 0.88888896],
               [ 0.35686275, 0.57385623, 0.88954246],
               [ 0.47058824, 0.48235294, 0.59215689],
               [ 0.34346411, 0.57483661, 0.88627452],
               [ 0.13815609, 0.12984112, 0.21053818],
               [ 0.3752504 , 0.47029912, 0.75687987],
               [ 0.31909946, 0.54829341, 0.87378371],
               [ 0.40409693, 0.58062142, 0.8547557 ]], dtype=float32)

Step 3: Recoloring the image

These 16 colors correspond to the 16 cluster labels contained in the labels vector. So we want all data points with label 0 to be colored according to row 0 in the centers array; all data points with label 1 to be colored according to row 1 in the centers array; and so on. In other words, we want to use labels as an index into the centers array—these are our new colors:

In [9]: new_colors = centers[labels].reshape((-1, 3))

In order to see the effect of this recoloring, we have to plot new_colors as an image. In order to get from the image to the data matrix, we flattened the earlier image. Now we need to do the inverse to get back to the image, which is to reshape new_colors according to the shape of the Lena image:

In [10]: lena_recolored = new_colors.reshape(lena.shape)

Then we can visualize the recolored Lena image like any other image:

In [11]: plt.imshow(cv2.cvtColor(lena_recolored, cv2.COLOR_BGR2RGB))
...      plt.title('16-color image')

The result looks like this:

Pretty cool, right?

Although some detail is arguably lost, overall, the preceding image is still clearly recognizable. This is pretty remarkable, considering that we just compressed the image by a factor of around 1 million!

You can repeat this procedure for any desired number of colors.

More information can be found in the book Machine Learning for OpenCV, Packt Publishing Ltd., July 2017.

As usual, all source code is available for free on GitHub (refer to Chapter 8: Discovering Hidden Structures with Unsupervised Learning).

No comments :

Post a Comment