An activation function is used to determine whether a node's output in a feedforward neural network is passed on from one layer to the next. The function is meant to model a biological neural system, in which a neuron may or may not fire and pass information on to the next neuron, depending on its input.
As an input is passed through the layers of a neural network, each node (neuron) transforms it by taking the dot product of the inputs and weights and then adding a bias. This can be thought of as a linear combination of the inputs and weights from the previous layer. The activation function then evaluates the result to determine whether or not the node should fire. It is important that activation functions introduce non-linearity into the system (so that it can model non-linear outputs) and that they be differentiable (to allow for back-propagation).
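As a sketch of the per-node computation just described (the function names and example numbers here are illustrative, not from the original; sigmoid is used as one possible differentiable activation):

```python
import math

def sigmoid(z):
    # A smooth, differentiable activation mapping any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    # Linear combination: dot product of inputs and weights, plus a bias...
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...then the activation function evaluates the result
    return activation(z)

out = neuron_output([0.5, -0.2], [0.8, 0.4], 0.1, sigmoid)
```

Here the weighted sum is 0.5·0.8 + (−0.2)·0.4 + 0.1 = 0.42, and the sigmoid squashes that value into the interval (0, 1).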
A simple activation function is the step function, which has an output of 1 (activated) if the input exceeds some threshold (commonly 0), and otherwise has an output of 0 (not activated). This kind of binary activation is suitable for some classification problems, but is insufficient for others.
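A minimal sketch of such a step function (the default threshold of 0 is an assumption, though it is the conventional choice):

```python
def step(z, threshold=0.0):
    # Binary activation: the node fires (1) only when the input
    # exceeds the threshold; otherwise it stays inactive (0)
    return 1 if z > threshold else 0
```

Note that the step function is not differentiable at the threshold and has zero gradient elsewhere, which is one reason smoother functions are preferred when training with back-propagation.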
The most common type of activation function is the rectified linear unit (ReLU), though many others exist. Different layers in a neural network may use different activation functions. In particular, the final layer often needs to produce a definite outcome, especially in classification problems; the softmax function is one example of a function useful in the final layer.
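The two functions mentioned above can be sketched as follows; the max-subtraction trick in softmax is a common numerical-stability detail assumed here, not something stated in the original:

```python
import math

def relu(z):
    # Rectified linear unit: passes positive values through
    # unchanged and clamps negative values to zero
    return max(0.0, z)

def softmax(zs):
    # Converts a list of raw scores into a probability
    # distribution; subtracting the max avoids overflow in exp
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax is useful in a final classification layer because its outputs are non-negative and sum to 1, so the largest raw score maps to the highest predicted class probability.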