http://rubinghsoftware.de/projects/alfavision_comp/algorithm.html


Alfavision programming test:

Algorithm description



Page contents
ALGORITHM
    Basic idea
    The output quantities we have to compute
    The component counting algorithm (introduction)
    The component counting algorithm (completed)
ALGORITHM OPTIMIZATIONS AND IMPLEMENTATION
    Algorithm optimization 1: lineAcc
    Algorithm optimization 2: lm_t, and Recycling of label numbers


This page describes the algorithm I used for solving the Alfavision programming test for extraction of the number of pixels, bounding box, and center of mass of each of the 4-connected components in a binary image.  The text below is the exact algorithm description that I included as a comment in one of the source files (only reformatted from plain ASCII text to HTML for more convienient readability). 



ALGORITHM

Basic idea

The algorithm is derived from the algorithms for "Connected Components Labeling" that I read about in Haralick and Shapiro, Computer and Robot Vision, Addison-Wesley, 1992.

Connected Components Labeling means that each white pixel in the image is labeled with an integer ID number (called a label) that is unique for one of the connected components.  (Each component consisting of four- connected white pixels.)
    We start with an unlabeled image (all labels having the value LABEL_EMPTY).  At the end, each white pixel should have a valid (nonempty) label value (the black pixels remain unlabeled).

The basic idea of the algorithm is as follows.  We run through the image line by line ("line" = horizontal row of pixels) from top to bottom, going through each individual line from left to right.  During this loop, in each still-unlabeld white pixel P we encounter, we give this pixel P a label number derived from the labels on the two immediate neighbour pixels of P that were already processed in the loop, i.e. the left and upper neighbours of P. 
    When our white pixel P (that is to be labeled) has no white pixels as its left of upper neighbours, then we give the pixel P a new unused label number.  When our white pixel P does have a white pixel neighbour to its left or above it, then we propagate the label of this neighbour into our pixel P. 

Example input image:

       +---+---+---+---+---+---+
       | X | X | X | . | X | . |    X = white pixel
       +---+---+---+---+---+---+    . = black pixel
       | . | . | X | . | X | . | 
       +---+---+---+---+---+---+
       | . | . | X | X | X | . | 
       +---+---+---+---+---+---+

Running the labeling algorithm described above and issuing new label numbers in order 1, 2, ... produces:

       +---+---+---+---+---+---+
       | 1 | 1 | 1 | . | 2 | . |    
       +---+---+---+---+---+---+    
       | . | . | 1 | . | 2 | . | 
       +---+---+---+---+---+---+
       | . | . | 1 | 1 | P | . | 
       +---+---+---+---+---+---+

In the last diagram, we see the case arising that the white pixel P to be labeled has a left and an upper white-pixel neighbour with different label numbers.  For Connected Components Labeling, where an explicit labeled output image is desired (with a unique label for each component), this case is troublesome, and needs at least a second pass over the image. 
    However, as explained in detail below, for our purpose the above case turns out not to be troublesome; and for our purpose it turns out to be sufficient to iterate once (and only once) over the input image in a way similar Connected Components Labeling, without storing or creating the complete labeled image that is the output of Connected Components Labeling. 

The output quantities we have to compute

Our goal is not the labeled output image that is created by Connected Components Labeling.  Our goal is simpler, namely we only have to determine for each connected component: the number of pixels in it, its bounding box, and its center of mass. 

The total number of pixels in the component and the bounding box can be determined by a function that is fed all the pixels of the component one by one, where the function keeps "running sums" for the total and for min_x, min_y, max_x, max_y

The center of mass (xC,yC) of a component is

xC  =  Σi[ xi ] / n
yC  =  Σi[ yi ] / n
where i in the sum runs over all pixels in the component, and where n is the total number of pixels in the component. The two sums
xS  =  Σi[ xi ]
yS  =  Σi[ yi ]
can also be computed by a function that is fed all the pixels of the component one by one, keeping running sums for xS and yS.  Just before the data for the component is written to output, we divide the sums xS and yS by n to obtain the desired (xC,yC). 

Therefore all output quantities we have to compute can be determined by a counting function that is given the pixels of a the components one by one.  This function needs a memory, namely for each component it needs a separate set of running sums (for n, min_x, min_y, max_x, max_y, xS, and yS).  My implementation uses comp_t for these running sums for a component (using bary_x and bary_y for holding xS and yS). 

All our output quantities can therefore be determined by looping once through the image pixels, and for each pixel giving the counting function the pixel's x and y, and its label (so that the function knows which counts to add to.)

The component counting algorithm (introduction)

For our purposes, we do not need to create the complete labeled image that is the output of Connected Components Labeling.  The only thing we need the labels for is to identify which set of running sums (comp_t) each white pixel that we encounter should be added to as we loop through the pixels of the image.  Consider again the earlier example:

       +---+---+---+---+---+---+
       | 1 | 1 | 1 | . | 2 | . |    
       +---+---+---+---+---+---+       In P:
       | . | . | 1 | . | 2 | . |         n1 = 6  
       +---+---+---+---+---+---+         n2 = 2
       | . | . | 1 | 1 | P | . | 
       +---+---+---+---+---+---+
As we iterate through the pixels from left to right and from top to bottom, we add the labeled pixels (1, 2) to the comp_t running sums for the two components 1 and 2 respectively.  The diagram above shows also the values of the running totals n1 = comp_t::pixels for component 1 and n2 = comp_t::pixels for component 2, just before reaching pixel P. 

Arriving at pixel P, where the left and upper neighbours have different labels, we discover that labels 1 and 2 are not for different components but for the same component.  All we have to do here is to add the running sums (comp_t) for the components 1 and 2 together, store the result under one of these labels (in my implementation I always take the left label, here "1"), and continue processing with the label of the combined component (1).  Component 2 has now been merged into the component 1, and the comp_t for label 2 should be thrown away. 

Finally we of course also have to add pixel P itself to the comp_t of the merged component 1. 

The result is:

       +---+---+---+---+---+---+
       | 1 | 1 | 1 | . | 2 | . |    
       +---+---+---+---+---+---+       In P:
       | . | . | 1 | . | 2 | . |         n1 = 6 + 2 + 1 = 9
       +---+---+---+---+---+---+        
       | . | . | 1 | 1 | 1 | . | 
       +---+---+---+---+---+---+
We see that after processing pixel P, we have arrived at the correct count n1 = comp_t::pixel for the connected component.

The component counting algorithm (completed)

In the last diagram above, the labels "2" on the pixels above P are wrong.  For that example image, this is irrelevant because in that example, these labels are not "visible" anymore (not inspected anymore by to the algorithm) after we proceed to the next pixel after P (if we only take one loop over the image). 

The next example shows that we can not simply forget about labels in the row above, and shows that the algorithm needs a small extra ingredient:

       +---+---+---+---+---+---+---+
       | X | X | . | X | X | X | . |     X = white pixel
       +---+---+---+---+---+---+---+     . = black pixel
       | . | X | . | X | . | X | . | 
       +---+---+---+---+---+---+---+
       | . | X | X | X | . | X | X | 
       +---+---+---+---+---+---+---+
After running our algorithm (as described so far) up to the pixel in the middle of the bottom row (pixel P) we have:
       +---+---+---+---+---+---+---+
       | 1 | 1 | . | 2 | 2 | 2 | . |      At P:
       +---+---+---+---+---+---+---+        n1 = 5
       | . | 1 | . | 2 | . | 2 | . |        n2 = 5
       +---+---+---+---+---+---+---+
       | . | 1 | 1 | P | . | X | X | 
       +---+---+---+---+---+---+---+
Pixel P now gets label "1", and we add the counts for component 2 into the counts for component 1:
       +---+---+---+---+---+---+---+
       | 1 | 1 | . | 2 | 2 | 2 | . |     
       +---+---+---+---+---+---+---+      At Q:
       | . | 1 | . | 2 | . | 2 | . |        n1 = 5 + 5 + 1 = 11
       +---+---+---+---+---+---+---+
       | . | 1 | 1 | 1 | . | Q | X | 
       +---+---+---+---+---+---+---+
Continuing to the next white pixel, Q, we encounter in the pixel above Q the label "2", which is the label of a component that has been merged into the combined component 1. 

At pixel Q, the algorithm needs one more piece of information, namely the information that label 2 is no longer in use, and belongs to a component that has been merged into component 1.  That is, we need the information that label 2 has to be mapped to label 1. 

This information must be created at pixel P.  At that pixel, we have to register (store) the information that label 2 no longer has its own comp_t, and instead maps to label 1.  Every time we merge the comp_t of label K into the comp_t of label L, we have to register the mapping

label K   -->   label L
of a label (K) that no longer has a comp_t, to the label (L) which does have a comp_t and into which the component K has been merged.  The algorithm therefore has to keep a list of mappings.  This set of mappings is called the "set of equivalences" or the "equivalence mapping list". 

Considering again pixel Q, we see that every time we inspect a pixel's upper neighbour and retrieve its label (idUp), we have to look up this label in the equivalence mapping list, and (of found) replace it by the label it maps to. 

At pixel Q we therefore retrieve the mapping 2 --> 1 from the equivalence mapping list.  With this new label number, we then continue normal processing: We give to pixel Q the label 1, and add the pixel to the comp_t of component 1. 

This ensures that the labels of all components that have been merged into other components have an entry in the equivalence mapping list.  And it ensures that any time a label L is encountered that that no longer has a comp_t, it is replaced by the label of a component that does have a comp_t, namely the component into which L has been merged. 

The equivalence mapping list tells us how to remap the "old" labels in the line above to the latest current labels, i.e. the labels containing the comp_t in which the counts are stored.  The equivalence mapping list "reroutes" things so that counts are added to the correct newest labels, instead of added to "old" labels. 

This gets us the correct counts (number of pixels, bounding box, xS, yS) for each connected component after ONE single loop through the image. 



ALGORITHM OPTIMIZATIONS AND IMPLEMENTATION

Algorithm optimization 1: lineAcc

The diagrams for the examples above suggested that a separate image is used to store the label numbers for each pixel.  However this is not necessary for our purpose.  We only loop through the image once, line by line.  At any pixel P during the loop, we only need the label of the pixel left of P and the label of the pixel above P.  Therefore it suffices for us to store only one line of label data. 

I use an array of label numbers, called the "line accumulator" or lineAcc, of the same size as the width of the image.  The input image is processed line by line, the lineAcc moving from top to bottom over the input image:

            +---------------------------+
            |                           |
            |                           |
           +-----------------------------+  |         
           |                             |  |  lineAcc  
           +-----------------------------+  V         
            |                           |
            |                           |
            |                           |
            |                           |
            |                           |
            +---------------------------+
lineAcc is empty initially (all elements set to LABEL_EMPTY). 

The lineAcc is moved over the image line by line, top to bottom.  In each successive vertical position of lineAcc, the pixels of that line of the image are iterated through left to right. 

Consider the algorithm in pixel P, at pixel coordinates (iX,iY).  The elements left of iX in lineAcc have been overwritten by the labels of the pixels on line iY left of pixel P.  The elements in lineAcc at iX and further to the right still hold the labels of the pixels in the line iY-1 above.  (lineAcc is represented by the hashes #.)

            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---##################
            |   |   |   #   #   #   #    #  lineAcc
           ###############################
  lineAcc  #    #   #   # P |   |   |   |   --- iY
           ##############---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
                          |
                         iX
The new label value Lp of pixel P is determined from the label of the pixel to its left (which is lineAcc[iX-1], and from the label of the pixel above, which at that point still sits in lineAcc[iX], left from the processing of the previous line.  The label value Lp is then written into lineAcc[iX], overwriting its previous contents (which isn't needed anymore).  Then we advance to the next pixel Q on the right and repeat. 
            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---##############
            |   |   |   |   #   #   #    #  lineAcc
           ###############################
  lineAcc  #    #   #   #   # Q |   |   |   --- iY
           ##################---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+
                              |
                             iX

Algorithm optimization 2: lm_t, and Recycling of label numbers

The labels only serve as references to the data associated with the label.  This data consists of:

For the algorithm to be fast, it is necessary that given a label number, the data for the label can be accessed very quickly. 

Logically, the functionality needed is a so-called map datastructure (associative array) where the key into the map is the label number, and where the data values stored for a given key are the ones listed above. 

The fastest map implementation for integer keys is a simple array, where the array index is the integer key. 

Images can be very big, with many components, and many instances of merging components.  This can lead to "sparse" sets of labels for which data needs to be stored, even if the new labels issued are successive integers.  This would make the array implementation wasteful of memory space, and would suggest that a more elaborate map implementation, e.g. with balanced trees, would be better.  However, the latter lose some of the unmatched speed of the array. 

The array implementation can however be made to work, if we recycle label numbers.  "Recycle" means that during the processing of the image, once it is certain the label number of a component is no longer needed, we reuse the label number for a new component.  This limits the number different label numbers drastically (compared to using the successive integers 1, 2, ... for each new component encountered, throughout the image). 

In our line-by-line processing, we determine the labels of the current line from the labels of two lines, namely the current line and the line above it.  Therefore largest number of different labels that the algorithm can have to keep account at any time is the maximum number of different components that can fit inside two successive lines.  The diagram below shows that for our four-connected components, this maximum number equals the width SizeX of the image in pixels. 

       +---+---+---+---+---+---+---+
       | X | . | X | . | X | . | X |     X = white pixel
       +---+---+---+---+---+---+---+     . = black pixel
       | . | X | . | X | . | X | . | 
       +---+---+---+---+---+---+---+
       |                           |
       |<-------SizeX pixels------>|
All the label numbers that were once used in the lines further above can no longer influence the algorithm, and can therefore be recycled for use for new components.  Before reusing an old label number, its comp_t (when not empty) is written to output as a completed component, and then cleared so that the new component starts with zero counts. 

With label recycling, SizeX different labels therefore suffice, which makes the array implementation very feasible (an array of size SizeX). 

Recycling of label numbers means that lists of free and used labels must be kept.  The "used-list" must be random-access, i.e. it must be possible to return any of the of the used labels to the free list. 

Since the array implementation is the fastest possible, I chose to use that implementation.  I use one and the same array for the for the map from labels to comp_t, for the equivalence mapping list (which is a map from labels to labels), and also for the free-list and the used-list.  The resulting data structure I called lm_t (for Label Map), and is implemented in lmap.h and lmap.c.  The header file 'lmap.h' contains documentation about the implementation details.