Explore how neural networks learn, step by step. Created by Leonardo Cofone
A neural network is a computer system that learns from many labeled examples and tries to think a bit like a human brain. It's made up of many units called neurons, which are connected to each other in layers.
Each neuron receives some numbers, performs a small calculation, and passes the result (a number) to the next neurons. By working together, these neurons can recognize patterns, solve problems, and make predictions. Just like we learn from experience, a neural network makes a prediction, checks how wrong it was, and updates its connections accordingly to improve.
This website shows you exactly how a neural network learns, step by step! Below this introduction, you'll find a legend (read it carefully, as it will help you understand everything better) and a brief explanation of what an activation is. Right after that, there's a small, real neural network. You can watch it train in real time and see how it improves at each step. Below the network, there's a panel explaining each training step, plus a practical example showing how it works in the real world.
z = wx + b
As mentioned before, activation functions are like small rules each neuron uses to decide what number to send next.
They help the network learn complex things by introducing non-linearity; without it, the whole network could only model straight-line relationships.
In our example, we use the Sigmoid function because it's simple and good for yes/no decisions, like classifying between two categories.
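If you like reading code, here is the Sigmoid as a minimal Python sketch (the sample inputs are just illustrative):

```python
import math

def sigmoid(z):
    # Squashes any real number into the (0, 1) range,
    # which makes it handy for yes/no style outputs.
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))   # 0.5    (undecided)
print(sigmoid(4))   # ~0.982 (a strong "yes")
print(sigmoid(-4))  # ~0.018 (a strong "no")
```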
Hi! I’m your guide to understanding how a neural network learns. Imagine taking a microscopic look inside the "brain" of an AI. To start our journey, click the "Start training" button. This will launch a detailed step-by-step visualization of a full training cycle, covering both how the network makes predictions (forward propagation) and how it learns from its mistakes (backpropagation). You can advance step by step to truly grasp each concept.
Imagine a neuron that needs to decide whether to turn on a light (output 1) or keep it off (output 0), based on two inputs:
Suppose our network has an input layer, a hidden layer with 3 neurons, and an output layer with 1 neuron. Let's follow how the output neuron makes its decision and learns.
Initial Scenario: It is dark (Input 1 = 0.9) and there is motion (Input 2 = 0.8). The initial weights and biases of the network are set randomly.
Click Next Step to see how this simple network processes information, step by step.
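If you'd like to follow along in code, here is a minimal Python sketch of this setup. The random starting weights are hypothetical stand-ins for the ones the page draws; only the inputs (0.9 and 0.8) come from the scenario above:

```python
import random

random.seed(0)  # just to make this illustrative run reproducible

# Inputs from the scenario: darkness (0.9) and motion (0.8)
x = [0.9, 0.8]

# A 2 -> 3 -> 1 network: each hidden neuron gets one weight per input
# plus a bias; the output neuron gets one weight per hidden neuron.
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b_hidden = [random.uniform(-1, 1) for _ in range(3)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
b_out = random.uniform(-1, 1)
```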
The input values are passed to the neurons in the hidden layer. Each hidden neuron calculates its weighted sum (z) and its activation (a), here the Sigmoid, based on these inputs and its own weights and bias.
Input 1:
Input 2:
Hidden Neuron 1:
z_H1 = (Input 1 × Weight I1,H1) + (Input 2 × Weight I2,H1) + Bias H1 = ( × ) + ( × ) + () =
a_H1 = Sigmoid(z_H1) =
Hidden Neuron 2:
z_H2 = (Input 1 × Weight I1,H2) + (Input 2 × Weight I2,H2) + Bias H2 = ( × ) + ( × ) + () =
a_H2 = Sigmoid(z_H2) =
Hidden Neuron 3:
z_H3 = (Input 1 × Weight I1,H3) + (Input 2 × Weight I2,H3) + Bias H3 = ( × ) + ( × ) + () =
a_H3 = Sigmoid(z_H3) =
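In code, this step might look like the sketch below. The weights and biases here are made-up example values (the page uses its own random ones); the two inputs are from the scenario:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x = [0.9, 0.8]                # Input 1, Input 2
w_hidden = [[0.5, 0.2],       # weights into Hidden Neuron 1 (illustrative)
            [-0.3, 0.6],      # weights into Hidden Neuron 2
            [0.8, -0.1]]      # weights into Hidden Neuron 3
b_hidden = [0.1, -0.2, 0.05]  # one bias per hidden neuron

z_hidden, a_hidden = [], []
for w, b in zip(w_hidden, b_hidden):
    z = w[0] * x[0] + w[1] * x[1] + b  # weighted sum: z = w·x + b
    z_hidden.append(z)
    a_hidden.append(sigmoid(z))        # activation: a = Sigmoid(z)

print(z_hidden)  # ≈ [0.71, 0.01, 0.69] with these example weights
print(a_hidden)  # ≈ [0.670, 0.502, 0.666]
```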
The activations from the hidden layer (a_H1, a_H2, a_H3) now become the inputs for the output neuron. The output neuron calculates its weighted sum (z) and final activation (a).
a_H1:
a_H2:
a_H3:
Output Neuron:
z_Out = (a_H1 × Weight H1,Out) + (a_H2 × Weight H2,Out) + (a_H3 × Weight H3,Out) + Bias Out
z_Out = ( × ) + ( × ) + ( × ) + () =
a_Out = Sigmoid(z_Out) =
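Continuing with the same illustrative numbers, the output neuron's forward pass is a single weighted sum plus the Sigmoid (the output weights are again made-up values):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

a_hidden = [0.670, 0.502, 0.666]  # hidden activations from the previous step
w_out = [0.4, -0.7, 0.3]          # illustrative hidden-to-output weights
b_out = 0.2

z_out = sum(a * w for a, w in zip(a_hidden, w_out)) + b_out
a_out = sigmoid(z_out)
print(z_out, a_out)               # ≈ 0.316  0.578
```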
We compare the network’s final output (a_Out) with the desired target value. The Mean Squared Error (MSE) loss function quantifies how "wrong" the prediction is.
Formula: Error = (Target - Activation)²
Calculation: Error = ( - )² =
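With an assumed target of 1 (the light should turn on in this scenario), the error calculation is one line:

```python
target = 1.0   # assumed desired output: turn the light on
a_out = 0.578  # illustrative output from the forward pass

error = (target - a_out) ** 2
print(error)   # ≈ 0.178: the network is still quite wrong
```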
To correct the network, we calculate the error gradient (Delta) for the output neuron. This tells us how much the error changes with respect to the net input (z) of the output neuron, considering the derivative of its activation function.
Formula: Delta_Out = (a_Out - Target) × f′(z_Out), where f′ is the derivative of the Sigmoid
Since f′(z) = a × (1 - a), we get:
Delta_Out = ( - ) × ( × (1 - )) =
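In code, using the same illustrative numbers and expressing the Sigmoid derivative through the activation itself:

```python
target = 1.0   # assumed desired output
a_out = 0.578  # illustrative output activation

# For the Sigmoid, f'(z) = a * (1 - a), so we don't need z here
delta_out = (a_out - target) * a_out * (1 - a_out)
print(delta_out)  # ≈ -0.103 (negative: the output needs to go up)
```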
Now, we propagate the error backward to the hidden layer. The error gradient (Delta) of each hidden neuron depends on its contribution to the output error, weighted by its connections to the output neuron, and the derivative of its own activation function.
Formula: Delta_Hj = (Σ_k Delta_k × W_Hj,k) × f′(z_Hj)
Hidden Neuron 1:
Delta_H1 = (Delta_Out × Weight H1 to Out) × f′(z_H1) = ( × ) × ( × (1 - )) =
Hidden Neuron 2:
Delta_H2 = (Delta_Out × Weight H2 to Out) × f′(z_H2) = ( × ) × ( × (1 - )) =
Hidden Neuron 3:
Delta_H3 = (Delta_Out × Weight H3 to Out) × f′(z_H3) = ( × ) × ( × (1 - )) =
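Here is a sketch of this step with the same illustrative values; since there is only one output neuron, the sum over k collapses to a single term:

```python
delta_out = -0.103                # output Delta from the previous step
w_out = [0.4, -0.7, 0.3]          # illustrative hidden-to-output weights
a_hidden = [0.670, 0.502, 0.666]  # hidden activations from the forward pass

delta_hidden = [delta_out * w * a * (1 - a)
                for w, a in zip(w_out, a_hidden)]
print(delta_hidden)               # ≈ [-0.0091, 0.0180, -0.0069]
```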
Now we calculate specific gradients for each weight and bias. These indicate the direction and magnitude of adjustment needed to reduce the error.
Weight gradient formula: ∂Loss/∂w_ij = a_i × Delta_j
Bias gradient formula: ∂Loss/∂b_j = Delta_j
Output Layer Gradients:
Gradient w_H1,Out = a_H1 × Delta_Out = × =
Gradient w_H2,Out = a_H2 × Delta_Out = × =
Gradient w_H3,Out = a_H3 × Delta_Out = × =
Gradient b_Out = Delta_Out =
Hidden Layer Gradients:
Gradient w_I1,H1 = Input 1 × Delta_H1 = × =
Gradient w_I2,H1 = Input 2 × Delta_H1 = × =
Gradient b_H1 = Delta_H1 =
(Similar calculations for Hidden Neurons 2 and 3)
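The gradient rule is the same for every layer: multiply the Delta of the receiving neuron by the activation flowing into the connection. A sketch, still with the illustrative values:

```python
x = [0.9, 0.8]                             # network inputs
a_hidden = [0.670, 0.502, 0.666]           # hidden activations
delta_out = -0.103                         # output Delta
delta_hidden = [-0.0091, 0.0180, -0.0069]  # hidden Deltas

# Output layer: the incoming activations are the hidden activations
grad_w_out = [a * delta_out for a in a_hidden]
grad_b_out = delta_out

# Hidden layer: the incoming activations are the raw inputs
grad_w_hidden = [[xi * d for xi in x] for d in delta_hidden]
grad_b_hidden = list(delta_hidden)

print(grad_w_out)  # ≈ [-0.069, -0.052, -0.069]
```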
Finally, we update the network’s weights and biases using the calculated gradients and the learning rate (η). This is the core of learning, where the network adjusts to make better predictions.
Learning Rate (η): 0.1
Formula: New Parameter = Old Parameter - η × Gradient
Output Layer Updates:
w_H1,Out,new = - × =
w_H2,Out,new = - × =
w_H3,Out,new = - × =
b_Out,new = - × =
Hidden Layer Updates (Example for Hidden Neuron 1):
w_I1,H1,new = - × =
w_I2,H1,new = - × =
b_H1,new = - × =
(Similar updates for Hidden Neurons 2 and 3)
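The update itself is plain gradient descent: each parameter takes a small step against its gradient, scaled by η. A sketch for the output layer, with the illustrative gradients from above:

```python
eta = 0.1                              # learning rate from this example

w_out = [0.4, -0.7, 0.3]               # illustrative old parameters
b_out = 0.2
grad_w_out = [-0.069, -0.052, -0.069]  # illustrative gradients
grad_b_out = -0.103

# New parameter = old parameter - eta * gradient
w_out = [w - eta * g for w, g in zip(w_out, grad_w_out)]
b_out = b_out - eta * grad_b_out
print(w_out, b_out)  # every parameter grows a little, pushing a_Out toward 1
```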
This process of forward pass, loss calculation, backpropagation, and weight update forms the core of how neural networks learn from examples. By repeating it many times with different examples, the network learns to make more and more accurate predictions.
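Putting every step together, here is a minimal end-to-end training loop for the same 2 -> 3 -> 1 network, written as a sketch (random illustrative starting weights, a single training example, assumed target of 1). The printed error shrinking over the epochs is exactly what the visualization above shows one step at a time:

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

random.seed(0)  # reproducible illustrative run
x, target, eta = [0.9, 0.8], 1.0, 0.1

# 2 -> 3 -> 1 network with random starting weights
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b_h = [random.uniform(-1, 1) for _ in range(3)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
b_o = random.uniform(-1, 1)

for epoch in range(1001):
    # Forward pass
    a_h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_h, b_h)]
    a_o = sigmoid(sum(a * w for a, w in zip(a_h, w_o)) + b_o)

    # Backpropagation
    d_o = (a_o - target) * a_o * (1 - a_o)
    d_h = [d_o * w * a * (1 - a) for w, a in zip(w_o, a_h)]

    # Gradient-descent updates
    w_o = [w - eta * d_o * a for w, a in zip(w_o, a_h)]
    b_o -= eta * d_o
    w_h = [[w - eta * d * xi for w, xi in zip(ws, x)]
           for ws, d in zip(w_h, d_h)]
    b_h = [b - eta * d for b, d in zip(b_h, d_h)]

    if epoch % 200 == 0:
        print(f"epoch {epoch}: error = {(target - a_o) ** 2:.5f}")
```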
The new values for this example (after this epoch) are:
Click "Reset Network" to start again with new random weights, or "Start training" to see another epoch with the updated weights!