Automated Feature Extraction with Machine Learning and Image Processing

Prof. Dr. Stefan Bosse

University of Siegen - Dept. Maschinenbau
University of Koblenz - Dept. Computer Science

1 / 56

PD Stefan Bosse - AFEML - Module C: Image Feature Extraction and Marking -

Image Feature Extraction and Marking

In this module we learn the basic workflow of image processing consisting of:

  1. Image feature marking: Image → Image

    • Cellular Automata
    • Multiagent Systems
    • Convolution and Convolutional Filtering
    • CNN Pixel Classifier
  2. Image feature clustering (Point list → List of point lists)

    • Density-based Clustering
  3. Geometric processing of point clouds (Point list → Point list)

    • Hull Computation
2 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing -

Cellular Automata and Image Processing

How can we apply algorithms to image data?

3 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing -

Cellular Automata and Image Processing

How can we apply algorithms to image data?

Relation between Images and Cellular Automata - local versa global state processing

4 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing -

Cellular Automata and Image Processing

How can we apply algorithms to image data?

Relation between Images and Cellular Automata - local versa global state processing

From local to global state using Point Clustering and Polygon Hulls

5 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing -

Cellular Automata and Image Processing

How can we apply algorithms to image data?

Relation between Images and Cellular Automata - local versa global state processing

From local to global state using Point Clustering and Polygon Hulls

Kernel-based Transformations and Convolutional Neural Networks!

6 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cellular Automaton

  • A cellular automaton (CA) is basically an active (functional) matrix

  • Applications:

    • Analysis and exploration of complex space-time dynamics and of the behavior of dynamical systems
    • Image transformations (state-less single-step and state-based/iterative)
    • Feature Analysis (Extraction, Amplification)
    • Parallel computation model!
7 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cellular Automaton

  • Basic assumption:

Complex phenomena are the result of the collective dynamics of a very large number of parts obeying simple rules

8 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cellular Automaton

We want to define the simplest nontrivial model of a cellular system. We base our model on the following concepts:

  • Cell and cellular space
  • Neighborhood (local interaction)
  • Cell state
  • Transition rule
  • We do not model all the details and characteristics of biological multicellular organisms but we obtain simple models where many interesting global and local phenomena can be observed.
9 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Space

Cellular Space

  • The cellular space constructs the cell network featuring
    • Dimension
    • Size
    • Neighbourhood

Floreano et al.

10 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cellular Automaton

  • A cellular automaton (CA) consists of a set of state-based cells CA={c1,c2,..,cn}.

  • Each cell i has a number of cell variables {x1,..,xk}i

  • The variables define the data state S: S(zi)=Si={x1,..,xk}

  • The next state of the cells is calculated via an activity function Act.

  • The state transition of a cell usually takes place in the activity function, optionally prepared by before (pre) or after (post) functions.

11 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Scheduling

  • A CA is a parallel computational architecture. All cell state transitions can be computed in parallel at the same time.

  • But CA processing by generic computers requires a (semi-) sequential algorithm to perform cell state transitions ⇒ CA simulation with a scheduler

  • In a parallel system the following formula is valid for each cell (①right hand side, ②left hand side):

Si(t+1)=Act(Si(t),{Sj(t)|jiand|jR})

  • Sequential computation would modify right hand side input before used!
  • Therefore, the old state of cells must be saved before the new state is computed!
12 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

  • A scheduler processes the individual calculation functions of all cells sequentially in three phases:

    1. before: ∀ cells do pre(S)
    2. activity: ∀ cells do Act(S)
    3. after: ∀ cells do post(S)

pre(S):SiSi,0Act(S):Si,0×ΣSipost(S):SiSi

  • with Σ as the state of neighbourhood cells (or global state in general):

Σ(R)i={Sj|jiand|jR}

13 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Architecture and basic principle

  • The Cellular Automaton is a simulation tool that represents an artificial two-dimensional world of entities!
  • For visualization, each cell gets a color from a defined color palette

Cellular automaton as a network of simple communicating computational units

14 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cell States

There is a

Data State
Defined by the set of all cell states (variables), visible
Control State
The cell activation (update) sequence, hidden (Scheduler)

Typically, the data state can be compsoed of:

  • Input space X, e.g., a mapping of image pixels on cells 1:1
  • Intermediate hidden states T, i.e., auxiliary and temporary variables, e.g., t=x;x=y,y=t
  • Output space Y, e.g., mapping of cell states on an output image (pixels)
15 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cell Neighbourhood

  • Informally, it is the set of cells that can influence directly a given cell
  • In homogeneous cellular models it has the same shape for all cells
  • Neighbourhood defined some kind of communication (data transfer, but w/o synchronisation)
Moore
Moore neighbourhood is a squared region around a center cell position with a radius R.
Neumann
Neumann neighbourhood is an approximated "circular" region around a center cell position with a radius R. R=1 neighbourhood defines only the direct neighbours North, South, West, East.
16 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Floreano et al. Comparison of Moore and Neumann neighbourhood in different space dimensions

17 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cell State Transitions

  • The value of the state of each cell belong to a finite set, whose elements are commonly numbers.

  • The value of the state is often represented by cell colors. A color function maps cell states on colors (from a palette)

  • There can be a special quiescent state s0.

  • The transition rule is the fundamental element of the CA.

  • It must specify the new state corresponding to each possible configuration of states of the cells in the neighborhood.

18 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Floreano et al. A transition rule maps states of neighbouring cells on new cell states. Cell states are visualized by colors.

19 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Cellular Automaton

Cell updates by transition rules modify the cell states over (discrete) time. In order to start with the updating of the cells of the CA we must specify the initial state of the cells (initial conditions or seed)

Floreano et al.

20 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Image processing with CA

Image processing with CA

An input image is mapped 1:1 on a cell matrix. The CA activation (iteratively) modifies the state of the cells, finally mapped 1:1 on an output (feature) image.

21 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Kernel-based Transformations

Kernel-based Transformations

  • Convolution or folding is a linear operation applied to each CA cell → Linear and state-less CA transition rules
  • Common kernel-based transformations:
    • Denoising
    • Intensity gradient
    • Edge detection / amplification
  • A kernel-based transformation produces always a linear output:

I(x,y)=ijI(xi+a,yj+b)k(i,j)

  • with k: kernal/folding matrix, a,b: index of center point of folding matrix k
22 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Kernel-based Transformations

Effect Kernel
Average
19111111111
Sharpening
010151010
Relief
210111012
Effect Kernel
Edge, Laplace
010141010
Edge, Sobel
101202101
Noise?, Bosse
110001100
23 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Kernel-based Transformations

Most kernel-based edge and gradient computations are sensitive in a specific direction (axis).

  • E.g., there are two Sobel edge filter kernel, one for the x-, and one for the y-axis sensitive, finally combined:

Sx=101202101Sy=SxT=121000121zx,y= (3k=1Sxk,lzx+k1,y+l1)2+(3l=1Syk,lzx+k1,y+l1)2

24 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Iterative Transformations

Iterative Transformations

  • Kernel-based transformations usually terminates after one step, except if the input is the output of the previous update phase!

  • In iterative transformations, the input of the next update phase is the output from the last (or the initial seed image data)

25 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Edge Detection with CA

Edge Detection with CA

For edge detection of intensity images, some approaches first convert the intensity images into binary ones, and then evolve two-state cellular automata using specific state transition rules to determine edge pixels, while the others directly update pixel states based on the relationship of the central pixel with its neighbourhood, mostly a 1-ring von Neumann or Moore neighbourhood.

Popovici et al. proposed an edge detection approach based on the state differences between the central pixel and the pixels in its von Neumann neighbourhood. If all the absolute state differences are less than a threshold ε, then the state of the central pixel becomes 0, otherwise it remains unaltered.

26 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Edge Detection with CA

Edge Detection with CA

  • The rule can be formulated as:

s+i={0sjsi<δ,jNisiotherwise

  • where s and s+ are the current and the updated states of the central cell i, Ni is the von Neumann neighbourhood of the cell c, and sj is the current state value of the cell j in Ni.

  • In the case of feature marking, the conditional expression || < δ can be applied to other input variables (input image), and the state s represents a marked feature count only. But in this case it is a one-step update.

27 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Edge Detection with CA

Edge Detection with CA

Wongthanavasu and Sadananda proposed a conditional rule to update the cellular state as:

s+i={sisi<(smaxsmin)smaxsminotherwisesmax=max{sjjNi},smin=min{sjjNi}

  • where smax and smin are the maximum and minimum states, respectively, in the von Neumann neighbourhood Ni of the central cell i.
28 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Agent-based Image Processing: Feature Marking

Agent-based Image Processing: Feature Marking

Lui proposed a similar approach of neighbourhood exploration for feature marking in mages using mobile agents.

  • In contrast to Ca cells, the state of an agent is not fixed to a specific position.

  • There are explorations agents with the following behaviour:

    • Replication. If the agent finds a feature at the current position, it stays (or increasing the feature count), then replicate a specific number r of child agents that migrate to a neighboruing cell (randomly chosen).
    • Diffusion. If the agent do not find a feature, it migrates to neighboruing cell (randomly chosen).
    • The number of diffusions and hops is limited.
    • At each new place the feature detection starts again.
29 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Agent-based Image Processing: Feature Marking

  • Conditional expression for feature detection:

s+i=si+{1(vjvi<δ,jNi)[ϵ1,ϵ2]0otherwise

  • with v as the input pixel intensity and s as the feature marking count.
30 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Agent-based Image Processing: Feature Marking

CA Implementation of FM-MAS

state : { input:0, output:0, weight:1, deltaweight:0 },
activity : function (x,y) {
// neighbors: ordering [NW,N,NW,W,E,SW,S,SE]
var countdn = this.ask(this.neighbors, 'absdiffcount', 'input',this.delta,'<='),
mark = countdn>=this.eps1 && countdn <= this.eps2;
if (mark) {
this.output += this.weight;
// simulate new agent replications on heighbor nodes by increasing weight
for (var i=0;i<this.replicate*this.weight;i++) {
var delta = [0,0];
while (delta[0]==0 && delta[1]==0)
delta=[Math.random.select([-1,0,1]),Math.random.select([-1,0,1])];
this.set(delta,'deltaweight',this.get(delta,'deltaweight')+1);
}
} else if (this.weight>0) {
this.deltaweight--;
// simulate aagent diffusion to heighbor node by increasing weight
var delta = [0,0];
while (delta[0]==0 && delta[1]==0)
delta=[Math.random.select([-1,0,1]),Math.random.select([-1,0,1])];
this.set(delta,'deltaweight',this.get(delta,'deltaweight')+1);
}
},
after : function (x,y) {
this.weight += this.deltaweight;
this.deltaweight=0;
},
31 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Learning CA

Learning CA

Which transition rules leads to a specific (global) emergent behaviour, e.g., edge detection?

32 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Learning CA

Learning CA

Which transition rules leads to a specific (global) emergent behaviour, e.g., edge detection?

Besides classical folding kernels, the answer is unknown.

33 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Learning CA

Learning CA

Which transition rules leads to a specific (global) emergent behaviour, e.g., edge detection?

Besides classical folding kernels, the answer is unknown.

Solution: Learn the rules, i.e., the kernel matrix! E.g., by using evolutinary and genetic algorithms.

34 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

Density-based Pointcloud Clustering DBSCAN

  • The output of the CA is an image with pixels representing point-wise features

  • But point clouds are still high-dimensioal

  • Dimensionality reduction by finding groups of points close together ⇒ Clustering

Density-based spatial clustering of applications with noise (DBSCAN) is a well-known data clustering algorithm that is commonly used in data mining and machine learning.

35 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

Algorithm

  • Based on a set of points (let’s think in a bidimensional space as exemplified in the figure), DBSCAN groups together points that are close to each other based on a distance measurement (usually Euclidean distance) and a minimum number of points.

  • It also marks as outliers the points that are in low-density regions.

Input
List of all point coordinates, can be pre-clustered in independent lists if the point belong to different classes
Output
List of point index lists with respect to the original input list. Each sub-list is a cluster group.
36 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

https://www.kdnuggets.com/2020/04/dbscan-clustering-algorithm-machine-learning.html Points: Core - This is a point that has at least m points within distance n from itself. Border - This is a point that has at least one core point at a distance n. Noise - This is a point that is neither a Core nor a Border. And it has less than m points within distance n from itself.

37 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

  1. The algorithm proceeds by arbitrarily picking up a point in the dataset (until all points have been visited).

  2. If there are at least ‘minPoint’ points within a radius of ‘ε’ to the point then we consider all these points to be part of the same cluster.

  3. The clusters are then expanded by recursively repeating the neighborhood calculation for each neighboring point

38 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

Parameters

The DBSCAN algorithm basically requires 2 parameters:

https://towardsdatascience.com/how-dbscan-works-and-why-should-i-use-it-443b4a191c80

eps
specifies how close points should be to each other to be considered a part of a cluster. It means that if the distance between two points is lower or equal to this value (eps), these points are considered neighbors.
minPoints
the minimum number of points to form a dense region. For example, if we set the minPoints parameter as 5, then we need at least 5 points to form a dense region.
39 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

Parameter estimation

The parameter estimation is a problem for every data mining task. To choose good parameters we need to understand how they are used and have at least a basic previous knowledge about the data set that will be used.

eps
if the eps value chosen is too small, a large part of the data will not be clustered. It will be considered outliers because don’t satisfy the number of points to create a dense region. On the other hand, if the value that was chosen is too high, clusters will merge and the majority of objects will be in the same cluster. The eps should be chosen based on the distance of the dataset (we can use a k-distance graph to find it), but in general small eps values are preferable.
40 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

minPoints
As a general rule, a minimum minPoints can be derived from a number of dimensions (D) in the data set, as minPoints ≥ D + 1. Larger values are usually better for data sets with noise and will form more significant clusters. The minimum value for the minPoints must be 3, but the larger the data set, the larger the minPoints value that should be chosen.
41 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Density-based Pointcloud Clustering DBSCAN

More information:

Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd (Vol. 96, №34, pp. 226–231).

42 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Polygon Hulls

Polygon Hulls

  • Up to here, we have still point clouds, although clustered.

  • For, e.g., area or perimeter computation of a cluster, a (concave) polygon hull approximation is needed (simplifying, averaging, and denoising point cloud)

  • Well known algorithms and software:

    • Graham Scan
    • Polylidar

The hull may not be limited to monotonic hulls without holes and caves (mixing of convex and concave hull segments)!

43 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Polygon Hulls

  • A hull can be described by a list of ordered boundary points

  • A hull can be described by a list of pice-wise linear lines (between boundary points)

https://gis.stackexchange.com/questions/143821/how-to-find-the-concave-hull-for-a-cloud-of-points-in-3d-space The outline of a point cloud as a concave hull

44 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Polygon Hulls

Further readings:

  1. Implementation of a fast and efficient concave hull algorithm, http://www.it.uu.se/edu/course/homepage/projektTDB/ht13/project10/Project-10-report.pdf
45 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Convolutional Neural Networks

Convolutional Neural Networks

CNN are suitable for classifying different structural features in the data regardless of location and orientation

A CNN is based on matrix algebra with convolution operations

For further reading: https://towardsdatascience.com/covolutional-neural-network-cb0883dd6529?gi=521747216671; G. Paaß, Artificial Intelligence, What is behind the technology of the future?, Jumper

Well-known software framework for the browser: convnet.js https://cs.stanford.edu/people/karpathy/convnetjs

46 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Folding Operation

Folding Operation

20 First calculation step (left) and second calculation step (right) in the convolution layer for a shifted small area of the input matrix. The kernel is "pushed" successively over the entire input matrix and the result matrix is filled

47 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Convolutional Neural Networks

Convolutional Neural Networks

  • A CNN is composed of different layers:
    • Convolution layers
    • Merging (pooling) layers
    • Output by classification layers (softmax)

20 A convolution layer contains k kernels and k result matrices, which are each combined into tensors

48 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Convolutional Neural Networks

https://towardsdatascience.com General construction of a CNN with alternating layers of convolutions, merging, and finally binary classification

49 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Convolutional Neural Networks

https://towardsdatascience.com Depending on the number of structural features, there may be a large number of the following folding and merging layers

50 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - CNN Point Classifier

CNN Point Classifier

  • As with the CA approach, a CNN can be used to predict feature output images ⇒ CNN Image Feature Marking

  • The idea: A CNN is used to predict the feature class (e.g., marking of pixel belonging to a crack shape) with a small moving segment window.

    • The central pixel of this segment (e.g., (10,10) in a 20 × 20 pixel segment) is the pixel for which the CNN should predict the feature classification
51 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - CNN Point Classifier

Iterative image pixel feature marking by a moving segment window

  • Training is supervised with labelled input data.

  • Training and prediction poses high computational complexity.

  • The CNN can be simple, e.g.: two convoultional layers with 8 and 16 filter, respectively, one soft-max layer. But even such a simple CNN has about 10000 parameters!

52 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - CNN Point Classifier

Training
We need a set of labelled images. The regions with a feature are described with closed polygon paths. For each input image of size w × h, we get a feature map image of size w × h pixels. A randomly selected and shuffled segment set is used for training (created in advance). Segmentation can be strided (stride=Δx,Δy)
Prediction
The model is applied to all pixels of an input image with optional striding. The result is a feature map images of size w/stride × h/stride pixels.
53 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - CNN Point Classifier

Example

Feature marking with a trained CNN pixel classifier of pores in X-ray images (Aluminum diecasted plates, here synthetic X-ray images simulated with gVXR)

54 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Summary

Summary

In this module we learnrd the basic workflow of feature extraction with image processing consisting of:

  1. Image feature marking: Image → Image

    • Cellular Automata
    • Multiagent Systems
    • Convolution and Convolutional Filtering
    • CNN Pixel Classifier
  2. Image feature clustering (Point list → List of point lists)

    • Density-based Clustering
  3. Geometric processing of point clouds (Point list → Point list)

    • Hull Computation
  4. Finally we can compute aggregate paramters (from hull):

    • Mass of Center (MoC)
    • Geometric model fit (e.g., ellipse or ellipsoid fit), ellipse parameters
    • Area, Volume
55 / 56

PD Stefan Bosse - AFEML - Module C: Cellular Automata and Image Processing - Further Reading

Further Reading

  1. Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies by Dario Floreano and Claudio Mattiussi, MIT Press
56 / 56