(Artificial) Neural Networks

Artificial neural networks are computational models inspired by the structure and/or functional aspects of biological neural networks. They are used for many kinds of problem solving, most commonly in machine learning and data mining.

Properties of Human Neural Networks

  • Neuron switching time ≈ 0.001 second
  • Number of neurons ≈ $10^{10}$
  • Connections per neuron ≈ $10^{4}$ to $10^{5}$
  • Scene recognition time ≈ 0.1 second

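With a neuron switching time of about 0.001 second and a scene recognition time of about 0.1 second, recognising a scene can take only about 100 sequential inference steps. That does not seem like enough, so there must be a great deal of parallel computation.

Parts of a biological neuron:

  • Soma = cell body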
  • Dendrites = inputs
  • An axon = outputs
  • Synapses = connections between cells

Synapses can be excitatory or inhibitory and may change over time. An excitatory synapse makes the receiving neuron more likely to fire when the sending neuron fires; an inhibitory synapse makes it less likely to fire.

When the combined input reaches some threshold, an action potential (an electrical pulse) is sent along the axon to the outputs.
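As a rough artificial analogue of this firing rule, the sketch below treats positive weights as excitatory synapses and negative weights as inhibitory ones, and the unit "fires" only when its weighted input reaches a threshold. The function name `threshold_unit`, the weights and the threshold value are illustrative assumptions, not taken from the notes.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 (fire) if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Hypothetical unit: two excitatory inputs (+1.0) and one inhibitory input (-2.0).
weights = [1.0, 1.0, -2.0]
print(threshold_unit([1, 1, 0], weights, 1.5))  # both excitatory inputs active: 1
print(threshold_unit([1, 1, 1], weights, 1.5))  # inhibitory input suppresses firing: 0
print(threshold_unit([1, 0, 0], weights, 1.5))  # one excitatory input is not enough: 0
```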

Properties of Artificial Neural Networks

  • Many neuron-like threshold switching nodes
  • Many weighted edges between nodes (inputs and outputs)
  • An activation level (function of the inputs)
  • Highly parallel, distributed process
  • Emphasis on tuning weights automatically

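The input function is the weighted sum of the activation levels of the inputs:

$$in_i = \sum_j W_{j,i} \, a_j$$

where $W_{j,i}$ is the weight on the edge from unit $j$ to unit $i$ and $a_j$ is the activation level of input $j$. The activation level is a non-linear transfer function $g$ of this input:

$$a_i = g(in_i) = g\left(\sum_j W_{j,i} \, a_j\right)$$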
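A minimal sketch of a single unit computing its activation level, assuming the logistic sigmoid as the transfer function g (one common choice; the notes do not fix a particular g). The names `sigmoid` and `unit_activation` are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic function, used here as the non-linear transfer function g."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_activation(input_activations, weights, g=sigmoid):
    """Weighted sum of the input activations, passed through g."""
    weighted_input = sum(w * a for w, a in zip(weights, input_activations))
    return g(weighted_input)

a = unit_activation([0.5, -1.0, 2.0], [0.4, 0.3, 0.1])  # weighted input is 0.1
print(round(a, 4))  # 0.525
```

Passing a step function as `g` instead turns the same unit back into a threshold switch.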

When to use Neural Networks

  • Input is high-dimensional discrete or real-valued (e.g. raw sensor input)
  • Output is discrete or real-valued
  • Output is a vector of values
  • Possibly noisy data
  • Form of target function is unknown
  • Human readability of the result is unimportant

Example applications:

  • Mapping English text to speech phonemes (NETtalk)
  • Image classification (e.g. face recognition)
  • Autonomous Driving
  • Game Playing
  • Credit Card Fraud Detection
  • Handwriting Recognition
  • Financial Prediction

Autonomous Driving (ALVINN)

ALVINN (Autonomous Land Vehicle In a Neural Network) was groundbreaking at the time, although it has since been superseded.
ALVINN drove at 70 mph on highways.


Later versions had 29 hidden units (which improved performance dramatically) and used sonar as well as a camera.
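The sketch below shows the general shape of an ALVINN-style network: a flattened camera image feeds a hidden layer of 29 sigmoid units, which feeds a layer of steering outputs. Only the 29 hidden units come from the notes; the 30 x 32 image size, the 30 steering outputs, the random weights, the missing bias terms and the choice to read the most active output as the steering direction are all simplifying assumptions. The network is untrained, so its output means nothing until the weights are learned.

```python
import math
import random

N_INPUTS = 30 * 32   # assumed: a small low-resolution camera image, flattened
N_HIDDEN = 29        # hidden units, as in the notes
N_OUTPUTS = 30       # assumed: one output unit per steering direction

def sigmoid(x):
    # Logistic transfer function for every unit.
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """Sigmoid activations of a fully connected layer (one weight row per unit)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def steer(image, w_hidden, w_out):
    """Forward pass image -> hidden -> outputs; return the index of the strongest output."""
    hidden = layer(image, w_hidden)
    outputs = layer(hidden, w_out)
    return max(range(len(outputs)), key=outputs.__getitem__)

# Untrained network with small random weights and a random "image".
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.05, 0.05) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
w_out = [[rng.uniform(-0.05, 0.05) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
image = [rng.random() for _ in range(N_INPUTS)]
direction = steer(image, w_hidden, w_out)  # an index in 0..29
```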

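In the mid 1990s, a successor system steered a car for almost the whole of a coast-to-coast drive.

Perceptrons

The perceptron is a very simple type of neural network: a single unit that applies a threshold to the weighted sum of its inputs.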
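To make this concrete, here is a minimal sketch of a perceptron trained with the classic perceptron training rule, w_i <- w_i + eta * (t - o) * x_i, on the Boolean AND function. The -1/+1 output convention, the learning rate and the function names are assumptions chosen for illustration.

```python
def predict(weights, x):
    """Perceptron output: +1 if w0 + sum(w_i * x_i) > 0, else -1 (w0 is the bias weight)."""
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else -1

def train_perceptron(examples, eta=0.1, epochs=100):
    """Apply the perceptron training rule until every example is classified correctly."""
    n = len(examples[0][0])
    weights = [0.0] * (n + 1)
    for _ in range(epochs):
        errors = 0
        for x, t in examples:
            o = predict(weights, x)
            if o != t:
                errors += 1
                weights[0] += eta * (t - o)  # bias input is fixed at 1
                for i, xi in enumerate(x):
                    weights[i + 1] += eta * (t - o) * xi
        if errors == 0:  # converged
            break
    return weights

# Boolean AND is linearly separable, so the rule converges.
AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_perceptron(AND)
print([predict(w, x) for x, _ in AND])  # [-1, -1, -1, 1]
```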

More About ANNs

Expressive Capabilities

  • Every Boolean function can be represented by a network with a single hidden layer, but it may require a number of hidden units exponential in the number of inputs.
  • Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer. Any function can be approximated to arbitrary accuracy by a network with two hidden layers.
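The first point can be seen on a small case. XOR cannot be computed by a single threshold unit, because it is not linearly separable, but one hidden layer of two threshold units is enough. The hand-picked weights and thresholds below are one illustrative choice among many.

```python
def step(x):
    """Threshold activation: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # Hidden layer with hand-chosen weights and thresholds.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)  # fires only if both inputs are 1
    # Output unit: "OR but not AND".
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```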

Overfitting of ANN

After thousands of weight updates, overfitting starts to occur. (Recap: overfitting is when the error on the training data is much smaller than the error on new, unseen data.)


Overfitting can be addressed by:

  • Limiting the number of hidden nodes/connections
  • Using a validation set and limiting training time
  • Penalising large weights
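A minimal sketch of the last two remedies, using a single linear unit rather than a full multilayer network to keep it short. The error being minimised adds a weight-decay penalty gamma * sum(w_i^2) to the usual squared error, and training stops once the error on a separate validation set has stopped improving. The data are synthetic, and the learning rate, gamma, patience and all names are illustrative assumptions.

```python
import random

def predict(w, x):
    """Linear unit output: w[0] is the bias weight, w[1:] weight the inputs."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def mse(w, data):
    """Mean squared error of the unit on a list of (inputs, target) pairs."""
    return sum((t - predict(w, x)) ** 2 for x, t in data) / len(data)

def train(train_data, val_data, eta=0.01, gamma=0.001, max_epochs=500, patience=20):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2 + gamma * sum_i w_i^2.

    Early stopping: keep the weights with the lowest validation error and stop
    once it has not improved for `patience` consecutive epochs.
    """
    n = len(train_data[0][0])
    w = [0.0] * (n + 1)
    best_w, best_val, since_best = list(w), mse(w, val_data), 0
    for _ in range(max_epochs):
        grad = [0.0] * (n + 1)
        for x, t in train_data:
            err = t - predict(w, x)
            grad[0] -= err                  # bias input is fixed at 1
            for i, xi in enumerate(x):
                grad[i + 1] -= err * xi     # gradient of the squared-error term
        for i in range(1, n + 1):
            grad[i] += 2 * gamma * w[i]     # weight-decay penalty (bias not penalised)
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
        val = mse(w, val_data)
        if val < best_val:
            best_w, best_val, since_best = list(w), val, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_w, best_val

# Synthetic illustration only: noisy samples of t = 2*x1 - x2.
rng = random.Random(0)

def sample():
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    return x, 2 * x[0] - x[1] + rng.gauss(0, 0.1)

train_set = [sample() for _ in range(20)]
val_set = [sample() for _ in range(10)]
weights, val_error = train(train_set, val_set)
```

The same two ideas carry over to back-propagation in multilayer networks: the penalty adds 2 * gamma * w to each weight's gradient, and the validation check runs after each epoch.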

Alternative Error Functions


Potential Problems

Despite some nice properties of ANNs, such as

  • generalisation to deal sensibly with unseen input patterns
  • robustness to losing neurons (prediction performance can degrade gracefully)

they still have some problems:

  • Back-propagation does not appear to scale well – large nets may have to be partitioned into separate modules that can be trained independently
  • ANNs are not very transparent – it's hard to understand the representation of what has been learned

Possible solution: exploit the success of tree-structured approaches in ML, combining them with non-linear regression.