PD Stefan Bosse
University of Siegen - Dept. of Mechanical Engineering (Maschinenbau)
University of Bremen - Dept. Mathematics and Computer Science
PD Stefan Bosse - AFEML - Module A: Data and Data Features -
Metrics and taxonomy of Data
Features of Data
Analysis of Data
Data
In general, data and their values can be divided into:
Data have the dimensionality $\mathbb{X}^N$
Data Reduction
$P(X^N): X^N \rightarrow Y^M,\quad |Y| < |X|,\ M < N$
Materials science, metrology, and construction engineering use:
    # Data reduction: three numerical sensor values -> one Boolean value
    isRaining = function(temp, sunrad, moisture) {
      if (temp < 0) FALSE
      else if (temp > 40) FALSE
      else if ((sunrad - moisture) > 30) FALSE
      else TRUE
    }
An R example from measurement technology with a data reduction function $\mathbb{R}^3 \rightarrow \mathbb{B}$
Data classes
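    m = 1                    # numerical scalar
    m = [1.0,1.5,2.5]        # numerical vector
    c = 'A'                  # categorical value (character)
    c = ['A','B','A']        # categorical vector
    c = [TRUE,FALSE,TRUE]    # logical (Boolean) vector
    c = factor(m,levels=[1,1.5,2,2.5],labels=['A','B','C','D'])  # numerical -> categorical: A B D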
R examples of numerical and categorical values and conversion (factorization)
Data Aggregations
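    v = c(4)                     # vector constructor
    v = [1.0,1.5,2.5]            # numerical vector (literal notation)
    v[1] = 1.2                   # element access, indices start at 1
    l = list(a=1,b=2)            # named list
    l = {a=1,b=2}                # named list (literal notation)
    l = {1.0,1.5,2.5}            # unnamed list (literal notation)
    l$a = 9                      # element access by name
    m = matrix(0,nrow=2,ncol=3)  # 2×3 matrix filled with zeros
    m = [1,2,3;4,5,6]            # 2×3 matrix (literal notation, rows separated by ';')
    a = array(0,dim=[3,2,4])     # three-dimensional array
    df = data.frame(a={1,2,3},b={3,4,5})  # data table with the columns a and b
R examples of aggregated data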
Data classes (longitudinal)
A digitized sensor signal is always discrete in time, but the physical variable that the sensor measures is continuous in time (note the sampling theorem)
Data
$\vec{X}=(X_1,X_2,\ldots,X_d)$
$\vec{d}_j=\vec{x}_j=(x_{j,1},x_{j,2},\ldots,x_{j,d})$
    df = data.frame(
      X1 = {'x1,1','x2,1','...'},
      X2 = {'x1,2','x2,2','...'},
      X3 = {'x1,3','x2,3','...'})
    print(df)
    # Output: row j holds the data vector d_j = (x_j,1, x_j,2, x_j,3)
    #     X1     X2     X3
    #   1 "x1,1" "x1,2" "x1,3"
    #   2 "x2,1" "x2,2" "x2,3"
    #   3 "..."  "..."  "..."
Input and Output Variables
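$\vec{X}_{xy}=(X_1,X_2,\ldots,X_u,Y_1,Y_2,\ldots,Y_v)$
$\vec{X}=(X_1,X_2,\ldots,X_u)$
$\vec{Y}=(Y_1,Y_2,\ldots,Y_v)$
$\vec{d}_j=(x_{j,1},x_{j,2},\ldots,x_{j,u},y_{j,1},y_{j,2},\ldots,y_{j,v})$
$F(\vec{X}): \vec{X} \rightarrow \vec{Y}$,
with $u+v=d$.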
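A minimal Python sketch of splitting a data matrix into its input and output parts, assuming each row $\vec{d}_j$ stores the $u$ input values first and the $v$ output values last. The function name `split_xy` and the numbers are hypothetical.

```python
def split_xy(rows, u):
    """Split every data vector d_j into its inputs x_j (first u values)
    and its outputs y_j (remaining v = d - u values)."""
    xs = [row[:u] for row in rows]
    ys = [row[u:] for row in rows]
    return xs, ys


# Hypothetical data matrix with d = 4 columns: u = 3 inputs, v = 1 output
data = [
    [0.1, 0.2, 0.3, 1.0],  # d_1
    [0.4, 0.5, 0.6, 0.0],  # d_2
]
X, Y = split_xy(data, u=3)
print(X)  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(Y)  # [[1.0], [0.0]]
```

A data-driven model $F$ is then fitted on the pairs $(\vec{x}_j, \vec{y}_j)$.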
Example of a data matrix
Computed Strain-stress diagram
www.precifast.de/elastizitaetsmodul-e-modul
Measurement data from strain test
Strain [mm] | Force [kN] |
---|---|
0 | 0 |
0.1 | 0.2 |
0.2 | 0.7 |
0.3 | 1.5 |
0.4 | 1.7 |
0.5 | 1.9 |
0.6 | 2.0 |
0.7 | 0.2 |
0.8 | -0.5 |
    tt = data.frame(
      Strain = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8],
      Force  = [0.0,0.2,0.7,1.5,1.7,1.9,2.0,0.2,-0.5])
Measurement data stored in an R data.frame
The measured variables X1 to X4 are metric data variables, the variable X5=y is a categorical variable!
The measured variables X1 to X4 (i.e. sensors) are called attributes because they are properties and descriptive variables of the target variable y.
Sensors
Which sensors and measurement data do you know?
Sensor
Measurement
When measuring with sensors, a distinction is made between:
Socio-technical systems, surveys
Generally available data
Sensor model
A sensor is a transducer (indicator for a property that is not directly measurable)
A sensor therefore generally maps a physical quantity x to another quantity y:
$S(x): x \rightarrow y,\quad K: \mathrm{correct}(x \rightarrow y)$
There is usually a calibration function $K(f,x,y)$
Examples are:
Sensor data
Sensors S are data sources d of physical, sociological or other natural variables x that cannot be detected directly
The data values (numeric) will be in a definable interval
$S(x): x \rightarrow d,\quad d \in [a,b] \Rightarrow \{v_0, v_1, \ldots, v_i\}$
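A digital sensor delivers values from a finite set. As a hedged illustration, assuming an ideal uniform analog-to-digital converter with a resolution of `bits` and the measuring range $[a,b]$ (the function `quantize` and the numbers are hypothetical), the mapping $d \in [a,b] \Rightarrow \{v_0, \ldots, v_i\}$ can be sketched in Python:

```python
def quantize(value, a, b, bits):
    """Ideal uniform quantizer (assumption): clip the value to [a, b] and
    map it to the nearest of 2**bits equally spaced levels v_0 .. v_i."""
    levels = 2 ** bits
    clipped = min(max(value, a), b)
    index = round((clipped - a) / (b - a) * (levels - 1))
    return index, a + index * (b - a) / (levels - 1)


# Hypothetical 3-bit sensor with the measuring range [0, 5] (e.g., volts)
for v in (-1.0, 0.0, 2.0, 5.0, 7.0):
    print(v, quantize(v, 0.0, 5.0, 3))
```

Values outside $[a,b]$ saturate at $v_0$ or $v_i$, which is one reason why the measuring range of a sensor has to match the expected physical values.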
PD Stefan Bosse - AFEML - Module A: Measurement and sensory systems - Sensor data
The origin of data for analysis and machine learning!
A sensor rarely comes alone.
Measurement methods
A distinction is made between two different measurement methods:
Acoustic Emission measuring technologies can belong to both classes,
Guided Ultrasonic Waves belong to class A, and
X-ray imaging commonly belongs only to class P.
PD Stefan Bosse - AFEML - Module A: Signal Features - Measurement methods
Statistical Features
Spatial Features (Images, geometric features)
Frequency and spectral Features (time and space)
Differences to reference signals
Transformed Signals
Statistical Features
Assumption: Data series
There is a data series d related to one variable x (from sensor s):
$\vec{d}=\{d_1,d_2,\ldots,d_n\},\quad s: x \rightarrow d$
Feature | Formula |
---|---|
Sample Size | n |
Extrema | min(x),max(x) |
Sample Mean | $\bar{x}=\frac{1}{n}\sum_{i=1}^{n} x_i$ |
Standard Deviation | $s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$ |
Sample Variance | $s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ |
... and many more
    use math
    Force = [0.0,0.2,0.7,1.5,1.7,1.9,2.0,0.2,-0.5]
    statsForce = fivenum(Force)
    statsForce$sd = sd(Force)   # add the standard deviation
    cprint(statsForce)
    # {min : -0.5 , q1 : 0.2 , median : 0.7 , mean : 0.855 , q3 : 1.7 , max : 2, sd: 0.93}
Statistical analysis of data series or vectors in R
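A dependency-free Python sketch of the features in the table, applied to the Force series of the strain test (the dictionary keys are illustrative names). It computes the variance both with the factor $1/n$ and with the Bessel-corrected factor $1/(n-1)$; only the latter reproduces the `sd` value 0.93 printed by the R example.

```python
import math

force = [0.0, 0.2, 0.7, 1.5, 1.7, 1.9, 2.0, 0.2, -0.5]  # Force [kN]

n = len(force)
mean = sum(force) / n
ss = sum((x - mean) ** 2 for x in force)  # sum of squared deviations

features = {
    "n": n,
    "min": min(force),
    "max": max(force),
    "mean": mean,
    "var_biased": ss / n,            # divide by n
    "var": ss / (n - 1),             # Bessel-corrected, divide by n - 1
    "sd": math.sqrt(ss / (n - 1)),
}
print(features)
```

The $1/(n-1)$ estimate is unbiased for the variance; for large $n$ the difference between the two versions becomes negligible.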
Feature | Formula |
---|---|
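N-th moment about point a, e.g., $a=\bar{x}$ | $\mu_n(a)=\sum (x-a)^n P(x)$ |
Gaussian Distribution | $P(x)=\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ |
Fisher Skewness | $\gamma_1=\frac{\mu_3}{\mu_2^{3/2}}=\frac{\mu_3}{\sigma^3},\quad \sigma=\sqrt{\mu_2}$ |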
... and many more
    use math
    Force = [0.0,0.2,0.7,1.5,1.7,1.9,2.0,0.2,-0.5]
    mn = moment(Force,order=2,central=TRUE)   # 2nd central moment
    print(mn)
Higher order moment analysis of data series or vectors in R
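For an empirical data series, each sample gets the probability $P(x_i)=1/n$, so the moment formula becomes a plain average. A Python sketch under that assumption, including the Fisher skewness; the helper `moment` is hypothetical and only mirrors the idea of the R call, not its exact interface.

```python
force = [0.0, 0.2, 0.7, 1.5, 1.7, 1.9, 2.0, 0.2, -0.5]


def moment(xs, order, a=None):
    """Empirical moment of the given order about point a, with
    P(x_i) = 1/n for every sample. Default a is the sample mean."""
    if a is None:
        a = sum(xs) / len(xs)
    return sum((x - a) ** order for x in xs) / len(xs)


mu2 = moment(force, 2)        # 2nd central moment (variance with 1/n)
mu3 = moment(force, 3)        # 3rd central moment
gamma1 = mu3 / mu2 ** 1.5     # Fisher skewness mu_3 / mu_2^(3/2)
print(mu2, mu3, gamma1)
```

For this series the skewness is close to zero; a symmetric series such as `[1.0, 2.0, 3.0]` gives a third central moment of exactly zero.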
Meaning of higher order moments (Wikipedia)
Statistical analysis is applied to the same static variable X with unordered values from repeated measurements of X under the same conditions
Statistical measures computed over data series (e.g., time-dependent signals) of dynamic variables, with values measured under different conditions, are not valid in the statistical sense ("nonsense"). However, statistical measures such as the mean value or higher-order moments can still be used as signal features if they correlate with the target features (e.g., damage).
An ordered data series {di} can be considered as an ordered series of different variables {Xi}!
$\mathrm{Stat}(\vec{X}): \vec{X} \rightarrow \vec{X}_f,\quad \vec{X}=(X_1,\ldots,X_i),\ \vec{X}_f=(X_{f1},\ldots,X_{fj}),\ i \gg j$
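One possible instance of such a reduction $\mathrm{Stat}: \vec{X} \rightarrow \vec{X}_f$ is sketched below in Python, assuming (hypothetically) that the long ordered series is cut into non-overlapping windows and each window is summarized by its mean and standard deviation. The window length and the test signal are illustrative choices.

```python
import statistics


def window_features(series, window):
    """Reduce an ordered series X = (X_1, .., X_i) to a short feature
    vector X_f: mean and population standard deviation of each
    non-overlapping window. An incomplete last window is dropped."""
    features = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        features.append(statistics.mean(chunk))
        features.append(statistics.pstdev(chunk))
    return features


# Hypothetical signal with i = 1000 samples -> j = 20 features
signal = [(k % 10) / 10 for k in range(1000)]
xf = window_features(signal, window=100)
print(len(xf))  # 20
```

Whether such features are useful depends on their correlation with the target variable, not on their statistical validity.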
Image Features
Low Level
High Level
Reduce Picture Dimension
A simple way to reduce the dimension of our feature vector is to decrease the size of the image with decimation (downsampling) by reducing the resolution of the image.
If the color component is not relevant, we can also convert pictures to grayscale to reduce the number of dimensions by a factor of three.
Intensity homogenisation using transfer functions
A grayscale image is a two-dimensional matrix; a color image is a three-dimensional matrix (rows × columns × color channels).
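A minimal Python sketch of decimation for a grayscale image stored as a list of rows. Instead of simply keeping every `factor`-th pixel, it averages each `factor` × `factor` block, an assumption chosen because the averaging acts as a simple low-pass filter before downsampling; the function name `decimate` is hypothetical.

```python
def decimate(img, factor):
    """Downsample a grayscale image (list of rows) by averaging
    non-overlapping factor x factor pixel blocks. Incomplete blocks at the
    right/bottom border are dropped."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - factor + 1, factor):
        row = []
        for x in range(0, w - factor + 1, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
print(decimate(img, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```

A 4×4 image with 16 values becomes a 2×2 image with 4 values, i.e., the feature vector shrinks by $factor^2$.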
Conversion from color to grayscale uses a specific color model transformation. Be careful: different models yield different intensity values.
$I(x,y)=\frac{R(x,y)+G(x,y)+B(x,y)}{3}$
$I(x,y)=0.299\,R(x,y)+0.587\,G(x,y)+0.114\,B(x,y)$
$I(x,y)=\frac{f(R(x,y))+f(G(x,y))+f(B(x,y))}{3},\quad f(i,a)=(1-a)\,k+a\,i,\quad a=\frac{A(x,y)}{k},\quad k=255$
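The three conversions can be sketched per pixel in Python. The weights 0.299, 0.587, 0.114 are the ITU-R BT.601 luma coefficients; reading the third formula as blending each channel with white according to the alpha channel $A(x,y)$ is an interpretation, and all function names are hypothetical.

```python
def gray_mean(r, g, b):
    """Unweighted average of the three channels."""
    return (r + g + b) / 3


def gray_luma(r, g, b):
    """Weighted sum with the luma weights 0.299, 0.587, 0.114."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_alpha(r, g, b, alpha, k=255):
    """Average after blending each channel with white using the opacity
    a = alpha / k, i.e. f(i, a) = (1 - a) * k + a * i."""
    a = alpha / k

    def f(i):
        return (1 - a) * k + a * i

    return (f(r) + f(g) + f(b)) / 3


def to_gray(rgb_img, convert=gray_luma):
    """Apply a per-pixel conversion to an image given as rows of (r, g, b)."""
    return [[convert(*px) for px in row] for row in rgb_img]


green = (0, 255, 0)
print(gray_mean(*green), gray_luma(*green))
```

Pure green gives 85.0 with the plain mean but about 149.7 with the luma weights, which shows how strongly the chosen model changes the intensity features.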
Intensity distributions can be transformed with continuous functions, e.g., a gamma correction (power law), or by using a look-up table.
A look-up table can be considered as a discrete mapping function f(x): x → y, whereby the index, i.e., a specific row, is given by the (discrete) x value, and y is the value in that row.
Only meaningful for small and discrete intensity value ranges, e.g., 8 bit [0,255]
Only a rough approximation of an intensity transfer function for continuous value distributions, but a fast method!
    use plot,math,imager
    vals = [1,3,5,6,7.5,8,8.5,9,9.5,10]
    mylut = lut(vals,range=[0,9])
    img = matrix(runif(100)*10,10,10)   # random 10×10 matrix, values in [0,10)
    img.isca = mylut(img)               # apply the LUT to all elements
    plot(img,auto.scale=TRUE)
    hist(img,breaks=20)
    plot(img.isca,auto.scale=TRUE)
    hist(img.isca,breaks=20)
LUT function in R(+) applied to a random matrix
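A Python sketch of an 8-bit look-up table that approximates a continuous gamma correction $y=(L-1)\,(x/(L-1))^{\gamma}$. The table is computed once; applying it costs one index operation per pixel. The names `gamma_lut` and `apply_lut` are hypothetical.

```python
def gamma_lut(gamma, levels=256):
    """Precompute the table y = (L-1) * (x / (L-1)) ** gamma for all
    discrete input values x = 0 .. L-1 (L = 256 for 8-bit images)."""
    top = levels - 1
    return [round(top * (x / top) ** gamma) for x in range(levels)]


def apply_lut(img, lut):
    """Map every pixel through the table: one index operation per pixel."""
    return [[lut[p] for p in row] for row in img]


lut2 = gamma_lut(2.0)
print(apply_lut([[0, 64], [128, 255]], lut2))  # [[0, 16], [64, 255]]
```

The rounding to integers is where the LUT only approximates the continuous transfer function.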
The HOG feature descriptor is a popular technique used in computer vision and image processing for detecting objects in digital images.
The HOG descriptor is a type of feature descriptor that encodes the shape and appearance of an object by computing the distribution of intensity gradients in an image.
    use math,plot
    img = matrix(runif(100),10,10)   # uniformly distributed random image
    plot(img,auto.scale=TRUE)
    hist(img,ylim=[0,1])
    img[img<=0.5]=0                  # binarization with threshold 0.5
    img[img>0.5]=1
    plot(img,auto.scale=TRUE)
    hist(img,ylim=[0,1])
Histogram of a uniformly distributed random image and image binarization
The intensity of an image can vary significantly across the spatial x-y plane, e.g., as a result of the measuring method and conditions.
Image processing and transformation algorithms can be sensitive to intensity inhomogeneity.
Algorithms:
Microcracks Image
Intensity Profiles
https://docs.opencv.org/3.4/d4/d1b/tutorial_histogram_equalization.html
https://github.com/YuAo/Accelerated-CLAHE
Histogram equalization (HE) is a method in image processing of contrast adjustment using the image's histogram.
This method usually increases the global contrast of many images, especially when the usable data of the image is represented by close contrast values.
Through this adjustment, the intensities can be better distributed on the histogram.
This allows for areas of lower local contrast to gain a higher contrast and attention in visual inspection.
$H_{cd}(i)=\frac{1}{N}\sum_{0 \le j \le i} H(j)$, with $N$ the number of pixels
$I_{eq}(x,y)=H_{cd}(I(x,y))$
    use math,plot
    m = matrix(runif(100),10,10)
    h = hist(m,ylim=[0,1],breaks=20,plot=FALSE)
    print(h$density)
    p = h$counts/sum(h$counts)     # normalized histogram H(j)/N
    cdf = vector('numeric',length(p))
    for (i in 1:length(p)) {
      cdf[i] = sum(p[1:i])         # cumulative sum H_cd(i)
    }
    plot(cdf,auto.scale=TRUE,main='CDF')
Histogram and cumulative distribution function (CDF) of a random matrix in R
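The two equations of the histogram equalization can be sketched in Python for an integer grayscale image. The final scaling of $H_{cd} \in [0,1]$ back to the range $[0, L-1]$ is an added assumption so that the result is again an 8-bit image; the function name `equalize` is hypothetical.

```python
def equalize(img, levels=256):
    """Global histogram equalization of an integer grayscale image with
    values 0 .. levels-1 (list of rows)."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    hist = [0] * levels                # H(j)
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:                 # H_cd(i) = sum_{j <= i} H(j) / N
        running += count
        cdf.append(running / n)
    top = levels - 1                   # scale [0, 1] back to [0, L-1]
    return [[round(top * cdf[p]) for p in row] for row in img]


print(equalize([[10, 10, 10, 20]]))  # [[191, 191, 191, 255]]
```

Two close gray values (10 and 20) are spread far apart, which is exactly the contrast gain described above.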
This simple histogram equalization does not take spatial intensity inhomogeneities and variations into account! A spatially uniform intensity distribution is assumed!
Intensity variations can be a result of a statistical process or due to the measuring technology and conditions
Methods based on a spatial filtering of the images use the assumption that the bias field (intensity inhomogeneity) consists of a low spatial frequency intensity variation ⇒ Applying a High-pass filter in the wavenumber space!?
Low-pass filtering methods can be used to extract non-uniformity
Trivial Approach
$x_l=x_0+a\,p,\quad y_l=y_0+b\,p,\quad l(p): p \rightarrow (x,y),\quad l_\perp(p,q): q \rightarrow (x,y)$
(Left) Computing the average intensity $I_{avg}(p)$ perpendicular to a line along the intensity gradient (Right) Correcting all pixels perpendicular to the correction line with an equalization factor
https://github.com/YuAo/Accelerated-CLAHE
CLAHE (Contrast Limited Adaptive Histogram Equalization) is an algorithm for enhancing local contrast in images, and is frequently used in application areas like underwater photography, traffic control, astronomy, and medical imaging.
CLAHE can also be used in the tone mapping operation for displaying an HDR (High Dynamic Range) image.
Adaptive histogram equalization (AHE) differs from ordinary histogram equalization in the respect that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image.
It is therefore suitable for improving the local contrast and enhancing the definitions of edges in each region of an image.
AHE has a tendency to overamplify noise in relatively homogeneous regions of an image.
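$|\mathrm{DFT}(s)|: s(t) \rightarrow S(\omega)$
$\mathrm{DFT}(\{x_n\}): \{x_n\} \rightarrow \{X_k\}$
$X_k=\sum_{0 \le n < N} x_n\, e^{-\frac{2 i \pi}{N} k n}$
$X_k=\sum_{0 \le n < N} x_n \left(\cos\left(\frac{2\pi}{N} k n\right) - i \sin\left(\frac{2\pi}{N} k n\right)\right)$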
The DFT transforms a series of complex numbers $\{x_n\}$ into a sequence of complex numbers $\{X_k\}$.
Low-, high-, and band-pass filtering can be performed by applying a mask function to the frequency distribution $\{X_k\}$ and transforming the result back into the time domain (blending in frequency space)
TU Graz, IVU_frequency_2017
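A direct $O(N^2)$ Python sketch of the DFT formula, its inverse, and a low-pass filter realized as a rectangular mask in frequency space. The rectangular mask is only one possible mask function (an illustrative choice); the names `dft`, `idft`, and `lowpass` are hypothetical.

```python
import cmath
import math


def dft(x):
    """X_k = sum_n x_n * exp(-2*pi*i*k*n/N), direct O(N^2) evaluation."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse transform: x_n = (1/N) * sum_k X_k * exp(+2*pi*i*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def lowpass(x, cutoff):
    """Zero all frequency bins above `cutoff` and transform back.
    Bin k and bin N-k belong to the same frequency magnitude."""
    N = len(x)
    X = dft(x)
    for k in range(N):
        if min(k, N - k) > cutoff:
            X[k] = 0
    return [v.real for v in idft(X)]


# Constant level 1 plus a cosine with 4 periods over N = 16 samples
x = [1 + math.cos(2 * math.pi * 4 * n / 16) for n in range(16)]
y = lowpass(x, cutoff=1)   # removes the cosine, keeps the constant part
```

In practice an FFT computes the same $\{X_k\}$ in $O(N \log N)$; a hard rectangular mask can cause ringing, so smoother masks are often preferred.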
Images can be transformed into the frequency space, too; for spatial signals it is called the wavenumber space
A two-dimensional (2D) DFT is used (output is a matrix, too)
$I_F(k,l)=\sum_{0 \le m < N}\sum_{0 \le n < N} I(m,n)\, e^{-2 i \pi \left(\frac{k m}{N}+\frac{l n}{N}\right)}$
TU Graz, IVU_frequency_2017
For real-valued signals and images, the frequency distribution (magnitude spectrum) is symmetric!
TU Graz, IVU_frequency_2017
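A direct Python evaluation of the 2D DFT formula for a small square image, used to check the symmetry: for a real-valued image $I_F(k,l)$ equals the complex conjugate of $I_F(-k \bmod N, -l \bmod N)$, so both have the same magnitude. The test image and the name `dft2` are hypothetical.

```python
import cmath
import math


def dft2(img):
    """2D DFT of a square N x N image:
    I_F(k, l) = sum_m sum_n I(m, n) * exp(-2*pi*i*(k*m/N + l*n/N))."""
    N = len(img)
    return [[sum(img[m][n] * cmath.exp(-2j * math.pi * (k * m + l * n) / N)
                 for m in range(N) for n in range(N))
             for l in range(N)]
            for k in range(N)]


img = [[0, 1, 2],
       [3, 4, 5],
       [6, 7, 9]]
F = dft2(img)
N = len(img)
# Real-valued input: I_F(k, l) equals the complex conjugate of I_F(-k, -l)
symmetric = all(abs(F[k][l] - F[-k % N][-l % N].conjugate()) < 1e-9
                for k in range(N) for l in range(N))
print(symmetric)  # True
```

Because of this symmetry, roughly half of the coefficients are redundant when the spectrum is used as a feature vector.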
A disadvantage of Fourier transformations is the loss of the time or spatial information.
Wavelet decomposition is a way of breaking down a signal in both space and frequency. In the case of pictures, this means breaking down the image into its horizontal, vertical, and diagonal components.
Parida et al., 2017: Decomposition of an image with a 2-D discrete wavelet transform using filter banks (2-D DWT)
Bosse et al., doi:10.3390/computers10030034 Example of a DWT signal decomposition of a US time-dependent signal
An image wavelet is a two-dimensional function Φ(x,y), so two-dimensional convolution operations are required. This is time-consuming!
Examples of 2D wavelets (Left) Haar (Right) Mexican Hat https://www.section.io/engineering-education/wavelet-transform-analysis-of-images-using-waveletanalyzer-toolbox-in-matlab/
Instead of performing a 2-D wavelet convolution, we can apply the 1-D transformation to the rows and columns of the image as a separable 2-D transformation.
In most applications where wavelets are used for image processing, this approach is more practical due to the low computational complexity of separable transformations.
Each decomposition reduces the image size by a factor of 2 in each dimension: DWT: M × M → M/2 × M/2 (per sub-band);
The DWT decomposition can be repeated by using the output of the previous level
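A Python sketch of one separable decomposition level with the Haar wavelet: a 1-D step on all rows, then on all columns, giving four M/2 × M/2 sub-bands. It uses the unnormalised average/half-difference form of the Haar filters (an illustrative choice; libraries usually scale by $1/\sqrt{2}$) and assumes even image dimensions. The sub-band names are descriptive labels, not a library convention.

```python
def haar_1d(v):
    """One unnormalised Haar step on an even-length sequence:
    pairwise averages (low-pass) followed by pairwise half-differences
    (high-pass). Length is preserved: n -> n/2 + n/2."""
    lo = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v) - 1, 2)]
    hi = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v) - 1, 2)]
    return lo + hi


def haar_2d(img):
    """Separable one-level 2D Haar DWT: transform all rows, then all
    columns, and split the result into four M/2 x M/2 sub-bands."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]           # transpose back
    h, w = len(img) // 2, len(img[0]) // 2
    approx = [r[:w] for r in out[:h]]             # low-pass in x and y
    detail_x = [r[w:] for r in out[:h]]           # differences along x
    detail_y = [r[:w] for r in out[h:]]           # differences along y
    detail_xy = [r[w:] for r in out[h:]]          # diagonal differences
    return approx, detail_x, detail_y, detail_xy


img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
approx, dx, dy, dxy = haar_2d(img)
print(approx)              # [[1.0, 2.0], [3.0, 4.0]]
print(haar_2d(approx)[0])  # second level: [[2.5]]
```

Repeating `haar_2d` on the approximation sub-band yields the multi-level decomposition.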
Wavelet 1st Level
Wavelet 2nd Level
https://www.section.io/engineering-education/wavelet-transform-analysis-of-images-using-waveletanalyzer-toolbox-in-matlab/
Wavelet Image Decomposition
Wavelet Image Reconstruction
The (intensity) gradient of an image is the vector ∇I(x,y). It is characterized by a magnitude m and a direction φ in the image:
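With the partial derivatives $I_x$ and $I_y$, the usual definitions are $m=\sqrt{I_x^2+I_y^2}$ and $\varphi=\operatorname{atan2}(I_y, I_x)$. A Python sketch that approximates the derivatives by central differences at an interior pixel; the image is a list of rows with $y$ pointing downwards (an assumed convention), and the name `gradient` is hypothetical.

```python
import math


def gradient(img, x, y):
    """Intensity gradient at interior pixel (x = column, y = row) from
    central differences; returns magnitude m and direction phi (radians)."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2   # dI/dx
    gy = (img[y + 1][x] - img[y - 1][x]) / 2   # dI/dy
    m = math.hypot(gx, gy)                     # sqrt(gx^2 + gy^2)
    phi = math.atan2(gy, gx)
    return m, phi


ramp_x = [[0, 1, 2],
          [0, 1, 2],
          [0, 1, 2]]                # intensity increases along x
print(gradient(ramp_x, 1, 1))       # (1.0, 0.0)
```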
Another important image transformation is the Laplacian of an image with intensity I(x,y) that is defined by:
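The Laplacian is the sum of the second derivatives, $\nabla^2 I = \frac{\partial^2 I}{\partial x^2}+\frac{\partial^2 I}{\partial y^2}$. A common discrete approximation is the 5-point stencil sketched below in Python (an illustrative choice; other stencils exist). The function name `laplacian` is hypothetical.

```python
def laplacian(img, x, y):
    """Discrete Laplacian at interior pixel (x = column, y = row) with the
    5-point stencil [[0, 1, 0], [1, -4, 1], [0, 1, 0]]."""
    return (img[y][x + 1] + img[y][x - 1]
            + img[y + 1][x] + img[y - 1][x]
            - 4 * img[y][x])


# I(x, y) = x^2 + y^2 has the continuous Laplacian 2 + 2 = 4
quad = [[x * x + y * y for x in range(3)] for y in range(3)]
print(laplacian(quad, 1, 1))  # 4
```

A linear intensity ramp gives 0, so the Laplacian responds only to changes of the gradient, e.g., at edges where it changes sign (zero crossing).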
Two main strategies:
These strategies rely on the fact that edges correspond to 0-order discontinuities of the intensity function.
The derivative computation requires a pre-filtering of the images.
The Sobel filter is an x- and y-sensitive gradient filter that uses a convolution operation with two 3×3 kernels. The x- and y-gradients are finally merged into one image.
    use math,imager,plot
    img.sobel <- sobelEdges(img,blur=2,gradient=TRUE)
    print(summary(img.sobel))
    plot(img.sobel,auto.scale=TRUE)
Sobel edge filter. The Gaussian blurring is essential to reduce noise.
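A plain Python sketch of the Sobel filter with the two standard 3×3 kernels, merging both gradients into a magnitude image. It omits the Gaussian pre-blurring recommended above and leaves border pixels at 0 (simplifying assumptions); the function names are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_kernel(img, kernel, x, y):
    """Weighted sum of the 3x3 neighbourhood of interior pixel (x, y).
    (This is correlation; true convolution flips the kernel, which for the
    Sobel kernels only changes the sign, not the gradient magnitude.)"""
    return sum(kernel[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))


def sobel(img):
    """Gradient magnitude image; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = apply_kernel(img, SOBEL_X, x, y)
            gy = apply_kernel(img, SOBEL_Y, x, y)
            out[y][x] = math.hypot(gx, gy)   # merge x- and y-gradients
    return out


# Vertical step edge between columns 1 and 2
step = [[0, 0, 1, 1] for _ in range(4)]
for row in sobel(step):
    print(row)
```

Both pixels next to the step respond with magnitude 4, while the flat regions stay at 0.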
The Canny edge filter is a multi-stage algorithm. After denoising, the intensity gradients of the image are computed for the x- and y-direction, then a non-maximum suppression is applied, and finally a hysteresis threshold filtering is performed.
    use math,imager,plot
    img.canny <- cannyEdges(img,t1=0,t2=50,blur=4)
    print(summary(img.canny))
    plot(img.canny,auto.scale=TRUE)
Canny edge filter. The Gaussian blurring is essential to reduce noise. The edge detection thresholds t1 and t2 relate to the intensity gradient and must be set carefully. https://docs.opencv.org/4.x/da/d22/tutorial_py_canny.html
Convolution uses a kernel matrix to extract certain features from images.
https://towardsdatascience.com/types-of-convolution-kernels-simplified-f040cb307c37
Geometric Transformations
Simple geometrical operations of entire image or parts of the image are:
Advanced geometrical operations of entire image:
Geometric Distortions
Local geometric distortions caused by optical imaging (lens distortion) https://www.image-engineering.de/library/image-quality/factors/1062-distortion
Measurement error and confidence
Random errors affect the accuracy of a measurement (noise).
Noise affects input and target feature computation (ML output)!
If a measurement of a quantity X that is affected only by random errors is repeated, the measured values S = {s1, s2, ..., sn} follow a Gaussian frequency distribution centered around the mean value $\bar{S}$ (the number of measurements n must be large).
Frequency distribution according to Gauss of measured values centered around an average value
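A Python sketch of how repeated measurements are summarized: the mean $\bar{S}$ plus an approximate confidence interval $\bar{S} \pm z\, s/\sqrt{n}$. The factor $z=1.96$ (about 95 %) assumes Gaussian random errors and a large $n$; for few readings, as in the hypothetical example values, a Student t factor would be more appropriate.

```python
import math
import statistics


def mean_with_ci(samples, z=1.96):
    """Mean of repeated measurements with an approximate confidence interval
    mean +/- z * s / sqrt(n); z = 1.96 corresponds to about 95 % for
    Gaussian random errors and large n."""
    n = len(samples)
    mean = statistics.mean(samples)
    se = statistics.stdev(samples) / math.sqrt(n)   # standard error of the mean
    return mean, (mean - z * se, mean + z * se)


# Hypothetical repeated readings of the same quantity X
readings = [9.8, 10.0, 10.2, 10.0]
m, (lo, hi) = mean_with_ci(readings)
print(m, lo, hi)
```

The interval shrinks with $1/\sqrt{n}$: quadrupling the number of measurements halves the uncertainty of the mean.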
Examples: Statistical Analysis
Summary
Data can be classified into:
All sensor variables are subject to measurement errors:
A (statistical) data analysis is often the first step in the ML workflow
The signal feature selection and extraction is the first step to compute and detect target features like damages using data-driven models.