# STEFAN BOSSE<sup>1,2\*</sup>

- <sup>1</sup> Institute of Computer Science, Research Group Practical Computer Science, University of Koblenz
- <sup>2</sup> Department of Mechanical Engineering, Lehrstuhl für Materialkunde und Werkstoffprüfung, University of Siegen











DESIGN OF ANALOG COMPUTERS: BUILDING BLOCKS AND ADVANCED METHODS FOR TINY ANALOG MACHINE LEARNING WITH SURROGATE MODELING

Stefan Bosse

Sysint 2025 Conference



#### CONTENT



#### Introduction

Motivation: In-sensor computation and organic electronics




#### Analog Computers

Electronic circuits for numerical computations - from history to challenges and limits

#### Analog ANN (AANN)

Analog Artificial Neural Networks - Design and Training Methodologies



#### **Surrogate Models**

Using Electronic Simulation and Surrogate ML Models for Training and Test of AANN



#### Discussion

Workflow - Experiments - Examples



#### Conclusions and Outlook

Issues and pitfalls - Lessons learned



### INTRODUCTION

- (Distributed) Sensor networks are deployed in a wide range of applications and environments. A sensor network is a distributed system given by a connected graph of communicating sensor nodes.
   Each sensor node measures physical properties of its local environment.
- Sensor density has increased exponentially, and sensors are now integrated systems ("Smart Sensors").



FlexSmell Project with silicon electronics on flexible substrates



Assumption: Data in sensor networks is inherently distributed and must be processed locally at the sensor-node level: In-sensor Computation.



## INTRODUCTION



(Left) MEMS for Distributed Wireless Sensor Networks, Warnecke et al. (Right) Smart Dust - Hitachi

- Digital computing based on the binary number system has been the standard for numerical computation for about 60 years. Digital computers can perform highly complex numerical computations with only a small set of instructions, using high-level programming languages and compilers.
- With ongoing <u>miniaturization</u>, computation is integrated into sensors and devices, moving towards material-integrated sensor networks (in-sensor computation). But miniaturization towards the 1 mm<sup>3</sup> scale reduces computational power and memory capacity significantly.



Hypothesis: Analog (electronic) circuits can perform numerical computations such as Artificial Neural Networks with fewer resources and less electrical energy than digital circuits (and possibly faster).



#### WHY ANALOG PROCESSING?





### ANALOG COMPUTERS: THE HIGH-PERFORMANCE CLASSICS...

- Digital: Discrete Value Distribution
- Analog: Continuous Value Distribution (t,s)
- Basic Model: Ideal Operational Amplifier
- Functional Composition: Wire Interconnect
- Parameters: Variable Resistors
- Technology: Silicon Electronics





Analog Computer EAI8800 (1986) with Hardware-in-the-loop!



### ANALOG COMPUTERS: THE ANCIENT!

- Digital: Discrete Value Distribution
- Analog: Continuous Value Distribution (t,s)
- Basic Model: Ideal Operational Amplifier
- Functional Composition: Wire Interconnect
- Parameters: Variable Resistors
- Technology: Silicon Electronics




Analog Computer EAI8800 (1993) with rust!

#### ANALOG COMPUTERS: PROGRAMMABLE CHIPS - NEW AGE?

- Mixed-Analog-Digital: Semi-Continuous Value Distribution (t',s')
- Basic Model: Mixed A/D, OPAMP, Transistors, AD/DA Conversion
- Functional Composition: Switched Matrix
- Parameters: Digital Resistors, Switched Capacitors
- Technology: Silicon Electronics



Field Programmable Analog Array Computer FPAA (2003-2024) on a chip! [Hasler et al., Okika dev.]



#### ANALOG COMPUTATIONAL SYSTEMS: TAXONOMY






- The Operational Amplifier (OPAMP) is the basic cell of any accurate analog computational system!
- An OPAMP is a difference amplifier with a non-inverting and an inverting input, *i*<sub>+</sub> and *i*<sub>-</sub>, respectively. There is one output *o*.
- An OPAMP can implement a(ny) (perhaps time-dependent) function y=f(x) by adding up to 7 functional blocks (e.g., resistive) that define the transfer function of the entire circuit:





- The mathematical model, i.e., the ideal OPAMP, features an infinite (or, realistically, very large) open-loop gain (G<sub>0</sub>), no offset/bias or any other technical deviation, an infinite common-mode rejection ratio (CMRR), zero noise, and an **infinite output voltage range**.
- It is a difference amplifier, i.e.:
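The difference-amplifier relation of the ideal model can be written as V<sub>out</sub> = G<sub>0</sub> (V<sub>+</sub> - V<sub>-</sub>). The sketch below is a minimal illustration of that textbook relation and of how a resistive feedback block turns the huge open-loop gain into a well-defined closed-loop gain. The function names, the resistor values, and the default G<sub>0</sub> = 10<sup>6</sup> are assumptions made for this example, not values from the talk.

```javascript
// Ideal difference amplifier: Vout = G0 * (V+ - V-), no offset, noise, or clipping.
function idealOpamp(vPlus, vMinus, G0 = 1e6) {
  return G0 * (vPlus - vMinus);
}

// Inverting amplifier: non-inverting input grounded, input resistor Rin,
// feedback resistor Rf. Solving the node equation at the inverting input
// together with Vout = -G0 * Vminus gives
//   Vout = -(Rf/Rin) * Vin / (1 + (1 + Rf/Rin) / G0),
// which approaches -(Rf/Rin) * Vin as G0 grows.
function invertingAmp(vin, Rin, Rf, G0 = 1e6) {
  const k = Rf / Rin;
  return (-k * vin) / (1 + (1 + k) / G0);
}

// Example: a gain of about -10 set only by the resistor ratio.
const vout = invertingAmp(1.0, 1e3, 1e4);
```

With a very large G<sub>0</sub> the resistor ratio alone sets the gain, which is why variable resistors can serve as the parameters of an analog computation.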





- Real transistor circuits show significant deviation from the ideal mathematical model!
- The (static) error *E* is introduced by a function  $\Gamma$ , which depends on a large set of parameters:





- We need accurate simulation to design analog circuits from given computational models!



- A real OPAMP requires about 10-20 transistors; a lower transistor count increases the error:
  - Output offset $V_{out} \neq 0$, although $\Delta V = 0$
  - Drift
  - Limited Gain (Amplification) G
  - Non-linearity
  - Temperature dependency (including spatial gradients)
  - Asymmetric input and output transfer functions
  - Limited output range V<sub>out</sub> (Saturation, Clipping)
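Three of the deviations listed above (output offset, limited gain, and output saturation) are easy to express in a few lines. The sketch below is a deliberately simplified static model with hypothetical parameter values. Drift, non-linearity, temperature dependency, and asymmetry are not modeled here.

```javascript
// Hypothetical parameter values, chosen only for illustration.
const defaults = { gain: 1e4, offset: 0.05, vMax: 10 };

// Ideal model: output proportional to the input difference, no offset, no clipping.
const idealOut = (dV, gain = defaults.gain) => gain * dV;

// Simplified real model: finite gain, output offset, saturation at +/- vMax.
function realOut(dV, params = {}) {
  const { gain, offset, vMax } = { ...defaults, ...params };
  const v = gain * dV + offset;
  return Math.max(-vMax, Math.min(vMax, v));
}

// Static error E between the simplified real model and the ideal model.
const staticError = (dV) => realOut(dV) - idealOut(dV);
```

Even this reduced model shows a non-zero output for a zero input difference and a hard limit on the output range, both of which a training procedure has to cope with.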





### ANALOG COMPUTATIONAL SYSTEMS: ANALOG ANN

- We implement an analog artificial neuron (AAN) with OPAMP architecture.
- The neuron is composed of three simple OPAMP blocks and a sigmoid block with a bipolar path architecture [different paths for negative and positive parameters]:





### ANALOG COMPUTATIONAL SYSTEMS: ANALOG ANN

- We implement an analog artificial neuron (AAN) with a simple transistor circuit.
- The neuron is composed of three simple OPAMP blocks and a sigmoid block with a bipolar path architecture using 12 bipolar transistors and 27 resistors (excluding weights):







- Starting point: A standard mathematical/numerical model of a neuron is used, consisting of two functions: Linear sum-of-products (SOP) and non-linear activation function (A).
- Weights are considered as amplification factors for the OPAMP blocks.
- Semi-realistic limitations by model clipping: Limited output of SOP blocks (e.g., ±10), limited amplification (e.g. w<sub>max</sub>=10); input and output scaling; modified sigmoid function
- Classical training algorithms based on **analytical gradients** can be used: sgd/adam/adagrad.
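A clipped neuron model of this kind might look as follows. This is a minimal sketch: the limits ±10 and w<sub>max</sub>=10 are the example values from the list above, the standard logistic function stands in for the modified sigmoid (whose exact form is not given here), and clipping the bias to the same range as the weights is an assumption.

```javascript
const clip = (v, lo, hi) => Math.max(lo, Math.min(hi, v));

// Clipped neuron: limited weights, limited SOP output, sigmoid activation.
function neuron(x, w, b, { sopMax = 10, wMax = 10 } = {}) {
  // Assumption: the bias is limited to the same range as the weights.
  let sop = clip(b, -wMax, wMax);
  for (let i = 0; i < x.length; i++) {
    sop += clip(w[i], -wMax, wMax) * x[i]; // limited amplification
  }
  sop = clip(sop, -sopMax, sopMax);        // limited SOP block output
  return 1 / (1 + Math.exp(-sop));         // standard logistic as stand-in
}
```

Outside the clipped regions every operation stays differentiable, so analytical gradients remain usable for this baseline model.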



- Next step: Training with analog model function f(x,w): x→y (digital twin) of the electronic circuit containing SOP and A functions! SOP and A can be non-linear!
- Bipolar architecture: Separated paths for negative and positive weight/bias parameters trained simultaneously.
- Realistic limitations by **analog model** through:
  - 1. Real measurements
  - 2. Electronic simulation
  - 3. Surrogate Model (SM) derived from data (real or simulated), e.g., FC-ANN (tanh activation functions)
- Training: gradient descent (gd) with <u>numerically computed gradient</u> df(x,w)/dw<sub>i</sub>

$$w_{i} = w_{i} - \alpha\, e(f, x, y)\, \frac{\Delta f(x)}{\Delta w_{i}} \quad \text{(single sample)}$$

$$w_{i} = w_{i} - \alpha \sum_{j=1}^{batchsize} e(f, x_{j}, y_{j})\, \frac{\Delta f(x_{j})}{\Delta w_{i}} \quad \text{(batch)}$$

w: weight parameter, e: backpropagated error, α: update rate
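The batch update rule above needs only forward evaluations of f, so it also works when f is a black box. The sketch below applies it to a single neuron on the OR truth table. Everything concrete in it is an assumption for illustration: the stand-in model `f`, the bipolar parameter layout (effective weight = positive path minus negative path, both kept non-negative), the central-difference step `h`, and the error term e = f(x) - y of a single output neuron.

```javascript
const sigmoid = (s) => 1 / (1 + Math.exp(-s));

// Hypothetical stand-in for the analog model f(x, w).
// Assumed layout: p = [w1+, w1-, w2+, w2-, b+, b-], all non-negative.
function f(x, p) {
  const sop = (p[0] - p[1]) * x[0] + (p[2] - p[3]) * x[1] + (p[4] - p[5]);
  return sigmoid(Math.max(-10, Math.min(10, sop)));
}

// Central-difference estimate of df/dp_i using forward evaluations only.
function numGrad(x, p, i, h = 1e-4) {
  const up = p.slice(), dn = p.slice();
  up[i] += h;
  dn[i] -= h;
  return (f(x, up) - f(x, dn)) / (2 * h);
}

// Batch gradient descent; parameters are kept inside [0, pMax] after each step.
function train(data, p0, { alpha = 0.5, epochs = 2000, pMax = 10 } = {}) {
  let p = p0.slice();
  for (let epoch = 0; epoch < epochs; epoch++) {
    const step = p.map((_, i) =>
      data.reduce((s, [x, y]) => s + (f(x, p) - y) * numGrad(x, p, i), 0));
    p = p.map((v, i) => Math.min(pMax, Math.max(0, v - alpha * step[i])));
  }
  return p;
}

const OR = [[[0, 0], 0], [[0, 1], 1], [[1, 0], 1], [[1, 1], 1]];
const trained = train(OR, [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]);
const predictions = OR.map(([x]) => Math.round(f(x, trained)));
```

Each parameter costs two extra forward evaluations per sample, which is one reason the speed of the model function matters so much for training time.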

Next step: Pre-Conditioning of parameter space using <u>Simulated Annealing</u> before error gradient backpropagation is performed!

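```
// Simulated annealing used to pre-condition the parameters before gradient descent.
// loss(data, params) must be provided by the caller and return a scalar.
function gaussianRandom(mean, sigma) {
    // Box-Muller transform; 1 - Math.random() avoids log(0)
    const u = 1 - Math.random(), v = Math.random()
    return mean + sigma * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v)
}
function SA(data, params, num_steps=1000, noise=0.01, cooling=0.999) {
    let optimal_params = params, temp = 1.0
    let best_loss = loss(data, params)
    for (let i = 0; i < num_steps; i++) {
        temp = temp * cooling
        // perturb every parameter with Gaussian noise
        const new_params = params.map(p => p + gaussianRandom(0, noise))
        const new_loss = loss(data, new_params)
        // always accept improvements; accept worse candidates with a temperature-dependent chance
        if (new_loss < best_loss || Math.random() * temp > best_loss / new_loss) {
            params = new_params
            if (new_loss < best_loss) { optimal_params = params; best_loss = new_loss }
        }
    }
    return optimal_params
}
```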







## TRAINING OF ANALOG ANN: SURROGATE MODEL APPROACH

- The Analog Electronic Model (AM) is one monolithic parameterized function f(x,w): x → y describing the input-to-output relation used for forward and backward computations of the AANN.
- The mapping $\mathbf{x} \rightarrow \mathbf{y}$ can be derived by:
  - 1. Measurement in real parameterizable electronic circuits (hardware-in-the-loop): Slow! Too slow;
  - 2. Electronic simulation: Not so slow, but still too slow;
  - **3.** A data-driven surrogate model SM/S-Model (e.g., a neural network, too): Faster, let's try it!
  - 4. An analytical function: Fastest, but impossible (or too simplified).
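The idea behind option 3 can be sketched with a much simpler surrogate than a neural network: sample the slow model once, then answer all later queries from the samples. The code below uses a lookup table with linear interpolation as a swapped-in stand-in for the FC-ANN surrogate. `simulateCircuit` is a hypothetical placeholder for an electronic simulation, and its tanh shape, range, and grid size are assumptions.

```javascript
// Hypothetical stand-in for a slow circuit simulation:
// SOP input current -> output voltage (assumed sigmoid-like shape with offset).
function simulateCircuit(i) {
  return 5 * Math.tanh(0.8 * i) + 0.1;
}

// Sample the slow model once on a grid inside the intended operating range
// and return a fast function that interpolates between neighbouring samples.
function buildSurrogate(slow, lo, hi, n) {
  const dx = (hi - lo) / (n - 1);
  const ys = Array.from({ length: n }, (_, k) => slow(lo + k * dx));
  return (x) => {
    // Inputs outside [lo, hi] are clamped to the nearest sampled value.
    const t = Math.min(n - 1, Math.max(0, (x - lo) / dx));
    const k = Math.min(n - 2, Math.floor(t));
    return ys[k] + (t - k) * (ys[k + 1] - ys[k]);
  };
}

const surrogate = buildSurrogate(simulateCircuit, -4, 4, 81);

// Largest deviation from the slow model inside the sampled range.
const maxErr = Math.max(...Array.from({ length: 400 }, (_, k) => {
  const x = -4 + (8 * k) / 399;
  return Math.abs(surrogate(x) - simulateCircuit(x));
}));
```

Inside the sampled range the deviation stays small, but the table knows nothing about the circuit beyond its boundaries. A neural-network surrogate shares that limitation.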







Monolithic f(SOP,A,w,b)




### TRANSFER FUNCTION S-MODEL VS. CIRCUIT

- S-Model analysis within interpolation range: Promising results ... we are done?





#### S-Model

- Low prediction error compared with circuit data, but error oscillates in the steep region



### TRANSFER FUNCTION S-MODEL VS. CIRCUIT

- What is wrong here? The reality gap of data-driven models ... never extrapolate



Circuit

- Output voltage versus input current (ix1=fixed)
- Monotonic behavior

Values outside training range

- Data outside the training space in the right branch
- Left branch seems still valid, really?
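One practical way to enforce "never extrapolate" is to wrap the surrogate in a range check, so that a query outside the training space fails loudly instead of returning a plausible-looking value. The sketch below shows one possible form of such a guard. The wrapper name, the bounds format, and the toy surrogate are all hypothetical.

```javascript
// Wrap a surrogate so that it refuses inputs outside the training range.
// bounds[i] = [lo, hi] is the range covered by the training data for input i.
function makeGuarded(surrogate, bounds) {
  return (x) => {
    x.forEach((v, i) => {
      const [lo, hi] = bounds[i];
      if (v < lo || v > hi) {
        throw new RangeError(
          `input ${i} = ${v} is outside the training range [${lo}, ${hi}]`);
      }
    });
    return surrogate(x);
  };
}

// Toy surrogate with two inputs (purely illustrative).
const toySurrogate = ([a, b]) => Math.tanh(a - b);
const guarded = makeGuarded(toySurrogate, [[-4, 4], [-4, 4]]);
```

During training, such an exception can be used to reject or clamp a parameter update that would drive the model into the untrained region.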



### TRANSFER FUNCTION S-MODEL VS. CIRCUIT

What is wrong here? The reality gap of data-driven models ... the details!




### EXPERIMENTS AND RESULTS

- Bipolar clipped mathematical model (baseline model), Gradient Desc. (+ Simulated Annealing)

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 1 |

#### Logic OR Gate

- [2,1] neurons
- Convergence: < 20 Epochs, 100% success rate
- Class. Error: 0%
- Bipolar separated parameters: yes
- Linear problem



#### Logic EXOR Gate

- [2,1] neurons
- Convergence: < 50 Epochs, > 70% success rate
- Class. Error: 0%
- Bipolar separated parameters: yes
- Non-linear problem

| w | l | pw | pl  | Cls |
|---|---|----|-----|-----|
| 3 | 3 | 2  | 1.5 | Set |
| 4 | 2 | 1  | 2.4 | Ver |
| 3 | 4 | 2  | 2   | Sac |
| 2 | 3 | 1  | 2   | Ver |

#### IRIS Benchmark

- [3,3] neurons
- Convergence: < 200 Epochs, > 90% success rate
- Class. Error: 2-4%
- Bipolar separated parameters: yes
- Linear and non-linear problem
- SA-only training: success
- 1  $\mu$ s forward, 2  $\mu$ s backward



#### EXPERIMENTS AND RESULTS



- Bipolar analog electronic surrogate model (from simulated data), Gradient Desc. + Simulated Annealing

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 0 |
| 0  | 1  | 1 |
| 1  | 0  | 1 |
| 1  | 1  | 1 |

#### Logic OR Gate

- [2,1] neurons
- Convergence: < 50 Epochs, > 90% success rate
- Class. Error: 0%
- Linear problem



#### Logic EXOR Gate

- [2,1] neurons
- Convergence: < 100 Epochs, > 50% success rate
- Class. Error: 0%
- Non-linear problem

| w | l | pw | pl  | Cls |
|---|---|----|-----|-----|
| 3 | 3 | 2  | 1.5 | Set |
| 4 | 2 | 1  | 2.4 | Ver |
| 3 | 4 | 2  | 2   | Sac |
| 2 | 3 | 1  | 2   | Ver |

#### IRIS Benchmark

- [3,3] neurons
- Convergence: > 30, < 500 Epochs, 20% success rate (w/o SA: < 10%)
- Class. Error: 2-4%
- Bipolar separated parameters: no
- Linear and non-linear problem
- SA-only training: failed
- 100 μs forward, 800 μs backward



#### CONCLUSIONS

| Analog ANN | Surrogate Model | ML Training and Results |
|------------|-----------------|-------------------------|
| <ul><li>Bipolar Difference OP Amplifier Architecture</li><li>Transistor circuit: 12 Transistors</li><li>SOP with non-linearity, Offset</li><li>SOP with limited weights (&lt;40)</li><li>Training: Indirect with bipolar clipped numerical model with post-synthesis or direct analog electronic surrogate model</li></ul> | <ul><li>Derived from electronic simulation of OPAMP circuit</li><li>Current Controlled Voltage Source Model. Input: Positive and negative SOP currents, Output: Voltage of Sigmoid approximation circuit</li><li>Low approx. error, but deviation at the boundaries</li><li>High error beyond trained parameter space</li></ul> | <ul><li>Training with Error Gradient Descent Backpropagation</li><li>Parameter space initialization with random process</li><li>Parameter space pre-conditioning with Simulated Annealing</li><li>Low convergence, training instability, but low class. error</li><li>Iterative monitored training process with fallback on low progress</li></ul> |



# THANK YOU

Stefan Bosse sbosse@uni-koblenz.de www.edu-9.de



