# JET5

## Neural Network

An artificial neural network (ANN) is a software simulation of many interconnected neurons and synapses. ANNs can self-learn from training data consisting of inputs and outputs, and once trained can respond to previously unseen data, often producing creepily accurate outputs.

###### ANN "squashing" function (0..1)
$$a = {1 \over 1 + e^{-sharpness \cdot x}}$$
where $x$ is the neuron's summed weighted input and $a$ is its resulting activation.
###### ANN output error function
$$E = {1 \over 2} \sum_{k \in K} (a_k - t_k)^2$$
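For readers who prefer code to notation, here is a minimal Python sketch of the two functions above. The names `squash` and `output_error` are my own for illustration, and the clamp on the exponent is an added safety measure (an assumption, not part of the formula) to stop `math.exp` overflowing on extreme inputs.

```python
import math

def squash(x, sharpness=1.0):
    """Logistic "squashing" function: maps any real x into the range 0..1."""
    # Clamp the exponent so math.exp cannot overflow (illustrative safety guard).
    z = max(-60.0, min(60.0, sharpness * x))
    return 1.0 / (1.0 + math.exp(-z))

def output_error(outputs, targets):
    """E = half the sum of squared differences between outputs a_k and targets t_k."""
    return 0.5 * sum((a - t) ** 2 for a, t in zip(outputs, targets))
```

A summed input of zero sits exactly in the middle of the squashing range, so `squash(0.0)` returns `0.5`; a higher `sharpness` simply makes the S-curve steeper.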

It has been proved mathematically that an ANN with enough hidden neurons can approximate any continuous mapping from inputs to outputs to any desired accuracy, no matter how complex. ANNs process new data in a very organic way and can perform pattern analyses that are difficult to achieve with other statistical techniques. Neural networks have been applied to some mathematically challenging problems, including predicting stock market movements, weather forecasting, and controlling self-driving cars.

### XOR

#### "One or the other, but not neither nor both"

A classic machine-learning challenge that requires a network with hidden neurons to solve. The training set format is 2 inputs and 1 output, all binary, with the following requirements (inputs = output):

0,0=0;
1,1=0;
1,0=1;
0,1=1;
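The same four training patterns could be written down in Python as follows. This is just one possible representation (the name `XOR_PATTERNS` is mine), with a sanity check against Python's built-in `^` operator.

```python
# Each pattern is ((input_1, input_2), expected_output).
XOR_PATTERNS = [
    ((0, 0), 0),
    ((1, 1), 0),
    ((1, 0), 1),
    ((0, 1), 1),
]

# Sanity check: every target equals the bitwise XOR of its two inputs.
assert all((a ^ b) == target for (a, b), target in XOR_PATTERNS)
```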


Here's a live working example. The network has 2 inputs, a 2-neuron hidden layer, and a single output (9 synapses, counting those from the bias neurons). Bias neurons are green. The derivative of the "squashing" function implements Fahlman's Solution to speed up training. Training settings: learning rate = 0.8, momentum = 0.9, sharpness = 1, and error target = 0.0001.

[Interactive demo: the XOR network with its INPUT(S) and OUTPUT(S), an epoch counter, and the current output error. It requires a modern browser (Chrome, Firefox, Edge, Safari or IE>8).]

### Wow!

#### So what happened when I clicked "Train Me"?

All network synapses were initialised with a small random weighting: the network starts off officially "stupid" and only able to produce random results. Then training begins.

The four training patterns are applied to the inputs in turn and the network is "fired": simulated action potentials propagate along the weighted synapses and neurons fire in response. Finally the output neuron(s) fire with an intensity which reflects the behaviour of the entire network.
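Here is a rough Python sketch of "firing" a 2-2-1 network like the one above. It is not the code behind the demo: the function names, the weight layout (one row per receiving neuron, with the bias synapse stored last), and the initial weight range of ±0.5 are all my assumptions for illustration.

```python
import math
import random

def squash(x, sharpness=1.0):
    z = max(-60.0, min(60.0, sharpness * x))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def init_weights(n_in, n_out, rng, scale=0.5):
    """Small random weights: one row per receiving neuron, bias synapse last."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def fire_layer(inputs, weights):
    """Each neuron sums its weighted inputs (plus the bias, fixed at 1) and squashes."""
    extended = list(inputs) + [1.0]
    return [squash(sum(w * x for w, x in zip(row, extended))) for row in weights]

def fire(inputs, w_hidden, w_output):
    """Propagate one input pattern forward; returns hidden and output activations."""
    hidden = fire_layer(inputs, w_hidden)
    return hidden, fire_layer(hidden, w_output)

rng = random.Random(0)
w_hidden = init_weights(2, 2, rng)  # 2 inputs + bias -> 2 hidden neurons: 6 synapses
w_output = init_weights(2, 1, rng)  # 2 hidden + bias -> 1 output neuron: 3 synapses
hidden, outputs = fire([1, 0], w_hidden, w_output)
```

With untrained random weights the output is just some value between 0 and 1 with no relation to XOR, which is the "stupid" starting state described above.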

Each output is then compared to the expected output: there will be errors (it starts off randomised) and the size and direction of these errors are noted. Each output error is now "back-propagated" through the network and a slight adjustment to each synapse weight is made so that in future the error from this pattern of inputs will be reduced.

Then the next training pattern is applied and the same process followed. Once all training patterns have been applied, their order is randomised and they are resubmitted one at a time. After repeated epochs (an epoch is a single pass through all the training patterns) the mean output error should fall towards zero, and training stops once it drops below a predefined threshold. When this is achieved, the network of synaptic weights has encoded within it all the information required to correctly map inputs to outputs. The network has been successfully trained and the synapse weights can be saved for future use.
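The whole cycle can be sketched in Python as below. Treat it as an illustration under stated assumptions rather than the demo's actual code: the function names, the ±0.5 initial weight range, the `max_epochs` cap, and the fixed `seed` are mine, and I am assuming "Fahlman's Solution" means adding a constant 0.1 to the sigmoid derivative so that saturated neurons keep learning. The learning rate, momentum, and error target defaults are the settings quoted earlier.

```python
import math
import random

PATTERNS = [((0, 0), 0), ((1, 1), 0), ((1, 0), 1), ((0, 1), 1)]

def squash(x):
    z = max(-60.0, min(60.0, x))                 # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    ext = list(inputs) + [1.0]                   # append the bias neuron (value 1)
    return [squash(sum(w * x for w, x in zip(row, ext))) for row in weights]

def mean_error(w_h, w_o):
    """Mean over all patterns of 0.5 * (output - target)^2."""
    total = 0.0
    for inputs, target in PATTERNS:
        out = layer(layer(inputs, w_h), w_o)[0]
        total += 0.5 * (out - target) ** 2
    return total / len(PATTERNS)

def train(rate=0.8, momentum=0.9, target_error=0.0001,
          max_epochs=5000, flat_spot=0.1, seed=1):
    rng = random.Random(seed)
    # Small random starting weights; last column of each row is the bias synapse.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(3)]]
    prev_h = [[0.0] * 3 for _ in range(2)]       # previous weight changes (momentum)
    prev_o = [[0.0] * 3]
    patterns = list(PATTERNS)
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(patterns)                    # new random order every epoch
        for inputs, target in patterns:
            hidden = layer(inputs, w_h)
            out = layer(hidden, w_o)[0]
            # Output delta: error times sigmoid derivative (plus flat-spot offset).
            d_out = (out - target) * (out * (1 - out) + flat_spot)
            # Hidden deltas, back-propagated through the not-yet-updated output weights.
            d_hid = [d_out * w_o[0][j] * (hidden[j] * (1 - hidden[j]) + flat_spot)
                     for j in range(2)]
            for j, x in enumerate(hidden + [1.0]):
                prev_o[0][j] = -rate * d_out * x + momentum * prev_o[0][j]
                w_o[0][j] += prev_o[0][j]
            for i in range(2):
                for j, x in enumerate(list(inputs) + [1.0]):
                    prev_h[i][j] = -rate * d_hid[i] * x + momentum * prev_h[i][j]
                    w_h[i][j] += prev_h[i][j]
        err = mean_error(w_h, w_o)
        if err <= target_error:                  # trained: stop early
            break
    return epoch, err, w_h, w_o
```

Calling `epochs, err, w_h, w_o = train()` returns the number of epochs used, the final mean error, and the trained weights. Depending on the seed, a run may hit `max_epochs` without reaching the error target.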

### It failed to train...

Sometimes the network will fail to train within the allotted time. Why is this? Backprop works by a process of error-descent, starting at a random point in a multi-dimensional error-space that is defined by the training data and creeping "downhill" until the error is minimised. This error-space may contain "saddles" (points with no or very low error-gradient) or "local minima" (points where the error is locally relatively low but that are not the lowest error that can be achieved), and the algorithm can get stuck in either of these situations. The use of a momentum term (where movement within the error-space literally has a momentum) is designed to help the algorithm escape these traps, but it does not always succeed, and if momentum is too high the algorithm can end up bouncing around in the error-space. A rhinoceros finds it more difficult to make tight twists and turns than a mouse but is less likely to get stuck in a small hole: which is best?
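As a sketch of how momentum enters the arithmetic, the textbook form of the weight update is shown below. The symbols are my own choice of notation: $\eta$ is the learning rate (0.8 in the example above), $\alpha$ is the momentum (0.9), and $\Delta w(t)$ is the change applied to a synapse weight $w$ at step $t$.

$$\Delta w(t) = -\eta \, {\partial E \over \partial w} + \alpha \, \Delta w(t-1)$$

On a flat saddle the gradient term is close to zero, so the second term keeps the weight moving in its previous direction. With $\alpha$ near 1 the previous step dominates, which is the rhinoceros end of the scale.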

### Discussion

In this example the training set and network are small, but XOR is non-linear (no single input has a direct relationship with the output) and therefore logically challenging. Yet training is typically very fast indeed - blink and you'll miss it.

Real-world training, on say clinical data, takes considerably longer because the problem presented is a large, complex dataset containing multiple sources of error and logical inconsistencies. Training ANNs can be frustrating and test your patience - but the potential rewards are great.

### Topology

Jet5's ANN is a standard fully-connected feed-forward network using backpropagation [1] for training (with some refinements to speed things up). Although other ANN types may learn more quickly, backprop ANNs have been the workhorse of neural learning since the '80s and have perfectly acceptable learning speeds - also most of my hands-on experience has been with backprop ANNs.

[1] Rumelhart et al., 1986

### Data pre-processing

Most training data has to be pre-processed for two reasons: first to scale the data to meet the range of inputs and outputs the ANN expects; and secondly to present the data in a way which takes into account the likely impact an input will have on an output.
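The first of these reasons, scaling, is the easy one. A minimal sketch, assuming the ANN expects values in the range 0..1 (matching the squashing function) and using a helper name of my own invention:

```python
def scale_to_unit(values, lo=None, hi=None):
    """Linearly map values onto 0..1, clipping anything outside [lo, hi].

    If lo/hi are not given, the minimum and maximum of the data are used.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:                      # constant column: nothing to scale
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]
```

Passing explicit `lo` and `hi` limits (taken from the training data or from domain knowledge) means unseen data is scaled the same way as the training data was.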

Pre-processing includes data-cleansing and feature-engineering. It is essential to have domain knowledge to set this up. The process includes: duplicate-removal, balancing, stratification, normalisation, standardisation, scaling, pivoting, binning, data-replacement (cutting, splitting, or merging), attribute-weighting, and the use of statistical functions to replace missing data.

Before you then throw your data at an ANN you must also think about feature selection, validation, and testing of features.

For example: if an input is haemoglobin concentration and one of the outputs is morbidity, then it would be expected that deviations either below or above the normal range might cause morbidity by different mechanisms. So best to present haemoglobin concentration as two separate inputs: an "anaemia" input scaled from 0 to lower normal range, and a "polycythaemia" input ranging from the upper normal range to an estimated highest possible value. Effort spent at this stage will speed up training and increase the likelihood of producing a reliable, sensitive, generalising solution.
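A sketch of that split in Python follows. The function name and the direction of scaling are my assumptions (each input is 0 within the normal range and rises towards 1 as the value becomes more abnormal), and the numbers in the usage note are placeholders for illustration only, not clinical reference values.

```python
def split_haemoglobin(hb, lower_normal, upper_normal, max_plausible):
    """Turn one haemoglobin value into two 0..1 inputs: (anaemia, polycythaemia)."""
    def clip(v):
        return min(1.0, max(0.0, v))
    # 0 at or above the lower normal limit, rising to 1 as hb falls to zero.
    anaemia = clip((lower_normal - hb) / lower_normal)
    # 0 at or below the upper normal limit, rising to 1 at the highest plausible value.
    polycythaemia = clip((hb - upper_normal) / (max_plausible - upper_normal))
    return anaemia, polycythaemia
```

With placeholder limits of 130, 170 and 250, a value of 150 gives `(0.0, 0.0)`, a value of 65 gives `(0.5, 0.0)`, and a value of 210 gives `(0.0, 0.5)`. At most one of the two inputs is ever non-zero.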

### Bias

The ANN has "bias" neurons added (green in the network above). There is one for every layer (apart from the output layer). These neurons have no input synapses, have a fixed value of 1, and are fully forward connected to all the neurons in the next layer. They can be thought of as "normalising" inputs for the hidden layers by shifting the activation function to the left or right, thereby speeding up learning and improving the network's fit.

### Pruning

During training, neurons fed by synapses which ALL tend to vanishingly low weightings are pruned: so the topology will auto-simplify to the fewest active synapses required to produce a solution. The resulting network will be smaller and should generalise better. Pruning algorithms are not implemented in the above example.
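The pruning test itself can be sketched in a few lines. This is only an illustration of the "ALL incoming weights vanishingly low" rule, not Jet5's pruning algorithm: the function name and the `threshold` value are hypothetical.

```python
def prunable_neurons(weights, threshold=1e-3):
    """Return the indices of neurons whose incoming synapse weights are ALL near zero.

    `weights` holds one row of incoming synapse weights per neuron.
    """
    return [i for i, row in enumerate(weights)
            if all(abs(w) < threshold for w in row)]
```

A neuron with even one substantial incoming weight is kept, so `prunable_neurons([[0.5, -0.2], [1e-5, -1e-6], [0.0, 0.3]])` flags only the middle neuron.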

### Avoiding over-training

By design, the Jet5 training process avoids the all-too-common mistake of over-training the ANN, that is, training to the lowest possible output error on the training data set. Over-training produces an ANN which performs very well with the training data, because it has started to learn individual input sets (overfitting), but performs poorly with new and unseen data.

So the ANN is trained on only part of the training data, and after every epoch the network is tested on a subset of the training data which is not used for training. The ANN is therefore trained to maximise performance on unseen data, i.e. how well it generalises. Training is considered complete when the error on the test data reaches a minimum: although further training will reduce the output errors from training data, output errors from test data may well start to rise.
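One common way to code this stopping rule is sketched below. It is an assumption about the general shape, not the Jet5 implementation: `train_one_epoch` and `test_error` are hypothetical callables supplied by the caller, and `patience` (how many epochs to wait for a new minimum before giving up) is my own parameter.

```python
def train_with_early_stopping(train_one_epoch, test_error,
                              max_epochs=1000, patience=20):
    """Train until the held-out test error stops improving.

    Returns (best_epoch, best_error): the epoch at which the test error
    was lowest, and that lowest error.
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()             # one pass through the training patterns
        err = test_error()            # error on data NOT used for training
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                     # no new minimum for `patience` epochs
    return best_epoch, best_error
```

A real implementation would also keep a copy of the synapse weights from the best epoch, so that the saved network is the one that generalised best rather than the one from the final epoch.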

### What is the point of all this?

Jet5 plans to introduce a range of trained neural networks on this website which can be freely used. Anaesthesia applications of ANNs to date have been limited, with interest in depth-of-anaesthesia monitoring and ITU mortality prediction. None of these applications have entered mainstream medical practice. However there are a number of areas in anaesthesia which would benefit from the strengths of a Neural Computing approach and I will be seeking co-workers to develop these.

ANNs are currently in the media spotlight thanks to Google's DeepMind project, which aims to predict AKI by applying Neural Learning techniques to a clinical dataset containing medical records and lab results from 1.6 million UK NHS patients. We don't have the luxury of access to this size of dataset, but as practising clinicians we have the ability to develop small, focused, well-designed neural applications and that is the plan here.

If you are a QEUH trainee, and interested in participating in our first big ANN project, see Dr Graeme Hilditch or myself. We have identified a suitable large clinical dataset (which includes output data) and have a surgeon who is very interested to see where this project might lead. Advanced Neural Computing comes to the QEUH!

Alan Hope for Jet5, Nov 2017