epilys

# Mapping concepts to colors (terribly) with the Oklab perceptual colorspace

## TL;DR

• What? I wanted a way to semi-automate associating colors with the topic tags of stories posted on https://sic.pm.
• Why not any color? I was curious to see what was reasonable within this approach.
• What’s the approach? I had no idea about colorspaces or image processing before I started. My strategy was to download the top results of a query using DuckDuckGo image search, calculate the dominant colors of each result, and then calculate the overall dominant colors. Any suggestions/corrections are most welcome!
• Show me the result:

## What’s a colorspace and what’s “Perceptually uniform”?

A colorspace is basically a way to map colors to a set of attributes. The well-known RGB colorspace maps each color to Red, Green and Blue components.

If a space has three attributes, we can view them as coordinates in a 3D space (any n attributes can be viewed as an n-dimensional vector space). We can then define color distance as the usual Euclidean distance we use for tangible objects in the real world.
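As a minimal sketch, the distance between two colors in any three-coordinate space is just:

```python
import math

def color_distance(c1, c2):
    # plain Euclidean distance between two colors given as 3-tuples of
    # coordinates; works the same for RGB or Oklab, only the meaning of
    # the coordinates differs
    return math.dist(c1, c2)

color_distance((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))  # → 1.0
```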

A perceptually uniform colorspace aims to satisfy the following property: equal spatial distance between two colors corresponds to an equal amount of perceived color difference. The precise definitions of these terms can be found in color science books and research.

Oklab is a perceptual color space designed by Björn Ottosson to make working with colors in image processing easier. After reading the introductory blog post, I wondered if I could apply it to finding dominant colors of an image.

Oklab has three coordinates:

• L: perceived lightness
• a: how green/red the color is
• b: how blue/yellow the color is
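For intuition, the conversion from linear sRGB to Oklab can be sketched with the matrices published in Ottosson's post (note the input here is assumed to be linear sRGB in [0, 1], not gamma-encoded values):

```python
import numpy as np

# Matrices from Björn Ottosson's Oklab post: linear sRGB -> LMS cone
# response, then a cube-root nonlinearity, then LMS' -> Lab.
M1 = np.array([
    [0.4122214708, 0.5363325363, 0.0514459929],
    [0.2119034982, 0.6806995451, 0.1073969566],
    [0.0883024619, 0.2817188376, 0.6299787005],
])
M2 = np.array([
    [0.2104542553,  0.7936177850, -0.0040720468],
    [1.9779984951, -2.4285922050,  0.4505937099],
    [0.0259040371,  0.7827717662, -0.8086757660],
])

def linear_srgb_to_oklab(rgb):
    lms = M1 @ np.asarray(rgb, dtype=float)
    return M2 @ np.cbrt(lms)

L, a, b = linear_srgb_to_oklab([1.0, 1.0, 1.0])  # white: L ≈ 1, a ≈ 0, b ≈ 0
```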

## Dominant colors

Suppose we take an image and average all its colors. What would that produce?

Something terrible. Obviously this approach can’t work when multiple distinct colors are present in a picture. Only if the picture were mostly one color would it be somewhat useful:
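The naive averaging above can be sketched in a few lines (a toy example, not the post's actual code):

```python
import numpy as np

# toy "image": half pure red, half pure blue pixels, as RGB floats in [0, 1]
pixels = np.array([[1.0, 0.0, 0.0]] * 50 + [[0.0, 0.0, 1.0]] * 50)
average = pixels.mean(axis=0)
print(average)  # [0.5 0.  0.5] — a murky purple that appears nowhere in the image
```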

## k-means clustering

From signal processing comes this dazzling technique: given a set of colors c, partition them into k buckets as follows:

1. Initially assign k average colors somehow, for example by picking them at random. We will incrementally improve on those averages to arrive at a centroid color: the mean (average) color of a cluster.
2. Assign every color c to the closest average m_k by calculating the Euclidean distance from c to each m.
3. Recalculate each m_k as the average of its updated cluster k.
4. Repeat until the assignments are the same as in the previous step; we’ve reached convergence, which is not necessarily correct/optimal.
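The steps above can be sketched directly in NumPy (a from-scratch illustration; the sample code at the end uses SciPy's kmeans instead):

```python
import numpy as np

def kmeans_colors(colors, k, iters=100, seed=0):
    colors = np.asarray(colors, dtype=float)
    rng = np.random.default_rng(seed)
    # 1. pick k initial "average" colors at random from the data
    means = colors[rng.choice(len(colors), size=k, replace=False)].copy()
    assignment = None
    for _ in range(iters):
        # 2. assign each color to its nearest mean (Euclidean distance)
        dists = np.linalg.norm(colors[:, None, :] - means[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        # 4. stop once assignments no longer change (convergence)
        if assignment is not None and np.array_equal(assignment, new_assignment):
            break
        assignment = new_assignment
        # 3. recompute each mean as the average of its current cluster
        for j in range(k):
            members = colors[assignment == j]
            if len(members):
                means[j] = members.mean(axis=0)
    return means, assignment
```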

Since we will use a perceptually uniform colorspace, we expect each cluster’s centroid to be perceptually close to the actual colors it contains.

And since we will be working with lots of sample images, we can calculate the overall dominant colors by putting all the colors together.

## Implementation

To visualize the results, I chose to calculate the dominant colors for each image, then calculate the overall dominant colors from those.

I also uniformly split the Oklab colorspace into colors and clustered all the dominant colors again, in order to see the difference between the calculated dominant colors and the uniformly sampled ones:

The image results for most queries are stock photos or text, hence there is a lot of black and white. We can deduce how black or greyscale-looking a color is from its coordinates: in Oklab, the a, b coordinates of a greyscale color are close to zero, and in HSL (Hue-Saturation-Lightness) a low L value means the color is close to black. We can discard such colors by checking those values.
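A filter along those lines might look like this (the thresholds are arbitrary guesses to tune, not values from the post):

```python
import math

def drop_greyscale(oklab_colors, chroma_min=0.03, l_min=0.1, l_max=0.95):
    # In Oklab, chroma is sqrt(a² + b²); near zero means the color is grey.
    # Very low/high L means near-black/near-white regardless of chroma.
    return [
        (L, a, b)
        for (L, a, b) in oklab_colors
        if math.hypot(a, b) >= chroma_min and l_min <= L <= l_max
    ]

drop_greyscale([(0.0, 0.0, 0.0), (0.6, 0.15, 0.05)])  # keeps only the second
```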

## Results

Searching for non-abstract things such as fruits returns pictures of the things themselves, so we get good results:

Searching for pharmaceuticals returns lots of pictures of colorful pills:

Searching for ethics returns pictures of signs that point to stuff such as “Right” and “Wrong” and “Principles”:

Searching for design returns a boring sea of brown and beige thanks to interior design trends:

Searching for programming identifies the classic green terminal color along with other syntax highlighting palettes:

Finally, philosophy returns pictures of books and statues, so the results are predictable and omitted.

## Improving the sample source

I’ve had some luck getting “better” results by searching for “book about {query}” and “book about {query} cover” expecting topical books to share color schemes, like the distinctive palettes O’Reilly uses in its programming books.

I found that Google Images shows fewer junk results, but it has no API you can use without an account.

## Conclusions and notes

As expected, this doesn’t produce particularly mind-blowing results, since abstract concepts lack color associations in general. Even if you have some type of vision synesthesia, the colors you see are usually unique to each person.

To get back to the original motivation behind this experiment, associating post tags with colors: you can cluster the colors of existing tags, calculate the dominant colors of each new tag, and choose the one that falls in the smallest cluster. That way you avoid common colors like black/white/blue/orange saturating your tag cloud.
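One way to sketch that selection (pure NumPy, with hypothetical names; `centroids` would come from a prior k-means run over the existing tag colors):

```python
import numpy as np

def pick_tag_color(existing, candidates, centroids):
    # existing:   (n, 3) Oklab colors already assigned to tags
    # candidates: (m, 3) dominant colors computed for the new tag
    # centroids:  (k, 3) cluster centres of `existing`, e.g. from k-means
    existing, candidates, centroids = map(np.asarray, (existing, candidates, centroids))
    # count how many existing colors fall in each cluster
    assign = np.linalg.norm(existing[:, None] - centroids[None, :], axis=2).argmin(axis=1)
    sizes = np.bincount(assign, minlength=len(centroids))
    # find which cluster each candidate would land in
    cand_cluster = np.linalg.norm(candidates[:, None] - centroids[None, :], axis=2).argmin(axis=1)
    # pick the candidate that lands in the least crowded cluster
    return candidates[sizes[cand_cluster].argmin()]
```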

## Sample code

```python
import decimal
import itertools

import colorio
import numpy as np
from scipy.cluster.vq import vq, kmeans
from wand.image import Image

wand_color_to_arr = lambda c: np.array([c.red_int8, c.green_int8, c.blue_int8])

OKLAB = colorio.cs.OKLAB()
# clamp a channel value into the valid [0, 255] range
color_abs = lambda v: 0xFF if v > 0xFF else v if v >= 0 else 0

oklab_to_rgb255 = lambda o: OKLAB.to_rgb255(o)
rgb_to_hex = lambda rgb: "#%s" % "".join("%02x" % p for p in rgb)
oklab_to_hex = lambda o: rgb_to_hex(map(color_abs, map(int, oklab_to_rgb255(o))))

dec_ctx = decimal.Context(prec=2, rounding=decimal.ROUND_HALF_DOWN)
arr_display = lambda arr: ["%.2f" % dec_ctx.create_decimal_from_float(i) for i in arr]


def image_to_colors(img: Image):
    # shrink the image so we only look at a manageable number of pixels
    img.thumbnail(200, 200)
    colors = set(c for row in img for c in row)
    return [OKLAB.from_rgb255(wand_color_to_arr(c)) for c in colors]


class Bucket:
    def __init__(self, rep):
        self.rep = rep  # the cluster's representative (centroid) color
        self.colors = []

    def __len__(self):
        return len(self.colors)

    def append(self, color):
        self.colors.append(color)


def dominant_colors(oks, n=20):
    centroids, _ = kmeans(oks, min(n, len(oks)))
    # sort dominant colors by cluster size
    buckets = [Bucket(rep) for rep in centroids]
    assignments, _ = vq(oks, centroids)
    for idx, c in enumerate(oks):
        buckets[assignments[idx]].append(c)
    buckets.sort(key=len, reverse=True)
    return [b.rep for b in buckets]


def make_uniform_clusters(oks, n=20):
    def make_grid(n=20):
        # split each Oklab axis into n uniform steps and take the product
        code_steps = np.linspace(-1.0, 1.0, num=n)
        return list(itertools.product(code_steps, code_steps, code_steps))

    grid = make_grid(n)
    buckets = [Bucket(rep) for rep in grid]
    assignments, _ = vq(oks, grid)
    for idx, c in enumerate(oks):
        buckets[assignments[idx]].append(c)
    buckets.sort(key=len, reverse=True)
    return buckets
```