Explore how neural networks learn, step by step. Created by Leonardo Cofone
A neural network is a computer system that learns from many labeled examples and tries to think a bit like a human brain. It's made up of many units called neurons, which are connected to each other in layers.
Each neuron receives some numbers, performs a small calculation, and passes the result (a number) to the next neurons. By working together, these neurons can recognize patterns, solve problems, and make predictions. Just like we learn from experience, a neural network makes a prediction, checks how wrong it was, and updates its connections accordingly to improve.
This website shows you exactly how a neural network learns, step by step! Below this introduction, you'll find a legend (read it carefully, as it will help you understand everything better) and a brief explanation of what an activation is. Right after that, there's a small, real neural network. You can watch it train in real time and see how it improves at each step. Below the network, there's a panel explaining each training step, plus a practical example showing how it works in the real world.
z = wx + b
As mentioned before, activation functions are like small rules each neuron uses to decide what number to send next.
They help the network learn complex things by introducing non-linearity; without it, the whole network could only model straight-line relationships.
In our example, we use the Sigmoid function because it's simple and good for yes/no decisions, like classifying between two categories.
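If you like reading code, here is the Sigmoid as a minimal Python sketch (the sample inputs are just illustrative):

```python
import math

def sigmoid(z):
    # Squashes any real number into the (0, 1) range,
    # which makes it handy for yes/no style outputs.
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))   # 0.5    (undecided)
print(sigmoid(4))   # ~0.982 (a strong "yes")
print(sigmoid(-4))  # ~0.018 (a strong "no")
```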
Hi! I’m your guide to understanding how a neural network learns. Imagine taking a microscopic look inside the "brain" of an AI. To start our journey, click the "Start training" button. This will launch a detailed step-by-step visualization of a full training cycle, covering both how the network makes predictions (forward propagation) and how it learns from its mistakes (backpropagation). You can advance step by step to truly grasp each concept.
Imagine a neuron that needs to decide whether to turn on a light (output 1) or keep it off (output 0), based on two inputs:
Suppose our network has an input layer, a hidden layer with 3 neurons, and an output layer with 1 neuron. Let's follow how the output neuron makes its decision and learns.
Initial Scenario: It is dark (Input 1 = 0.9) and there is motion (Input 2 = 0.8). The initial weights and biases of the network are set randomly.
Click Next Step to see how this simple network processes information, step by step.
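If you'd like to follow along in code, here is a minimal Python sketch of this setup. The random starting weights are hypothetical stand-ins for the ones the page draws; only the inputs (0.9 and 0.8) come from the scenario above:

```python
import random

random.seed(0)  # just to make this illustrative run reproducible

# Inputs from the scenario: darkness (0.9) and motion (0.8)
x = [0.9, 0.8]

# A 2 -> 3 -> 1 network: each hidden neuron gets one weight per input
# plus a bias; the output neuron gets one weight per hidden neuron.
w_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b_hidden = [random.uniform(-1, 1) for _ in range(3)]
w_out = [random.uniform(-1, 1) for _ in range(3)]
b_out = random.uniform(-1, 1)
```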
The input values are passed to the neurons in the hidden layer. Each hidden neuron calculates its weighted sum (z) and its activation (a), here the Sigmoid, based on these inputs and its own weights and bias.
Input 1:
Input 2:
Hidden Neuron 1:
z_H1 = (Input 1 × Weight I1,H1) + (Input 2 × Weight I2,H1) + Bias H1 = ( × ) + ( × ) + () =
a_H1 = Sigmoid(z_H1) =
Hidden Neuron 2:
z_H2 = (Input 1 × Weight I1,H2) + (Input 2 × Weight I2,H2) + Bias H2 = ( × ) + ( × ) + () =
a_H2 = Sigmoid(z_H2) =
Hidden Neuron 3:
z_H3 = (Input 1 × Weight I1,H3) + (Input 2 × Weight I2,H3) + Bias H3 = ( × ) + ( × ) + () =
a_H3 = Sigmoid(z_H3) =
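In code, this step might look like the sketch below. The weights and biases here are made-up example values (the page uses its own random ones); the two inputs are from the scenario:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x = [0.9, 0.8]                # Input 1, Input 2
w_hidden = [[0.5, 0.2],       # weights into Hidden Neuron 1 (illustrative)
            [-0.3, 0.6],      # weights into Hidden Neuron 2
            [0.8, -0.1]]      # weights into Hidden Neuron 3
b_hidden = [0.1, -0.2, 0.05]  # one bias per hidden neuron

z_hidden, a_hidden = [], []
for w, b in zip(w_hidden, b_hidden):
    z = w[0] * x[0] + w[1] * x[1] + b  # weighted sum: z = w·x + b
    z_hidden.append(z)
    a_hidden.append(sigmoid(z))        # activation: a = Sigmoid(z)

print(z_hidden)  # ≈ [0.71, 0.01, 0.69] with these example weights
print(a_hidden)  # ≈ [0.670, 0.502, 0.666]
```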
The activations from the hidden layer (a_H1, a_H2, a_H3) now become the inputs for the output neuron. The output neuron calculates its weighted sum (z) and final activation (a).
a_H1:
a_H2:
a_H3:
Output Neuron:
z_Out = (a_H1 × Weight H1,Out) + (a_H2 × Weight H2,Out) + (a_H3 × Weight H3,Out) + Bias Out
z_Out = ( × ) + ( × ) + ( × ) + () =
a_Out = Sigmoid(z_Out) =
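Continuing with the same illustrative numbers, the output neuron's forward pass is a single weighted sum plus the Sigmoid (the output weights are again made-up values):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

a_hidden = [0.670, 0.502, 0.666]  # hidden activations from the previous step
w_out = [0.4, -0.7, 0.3]          # illustrative hidden-to-output weights
b_out = 0.2

z_out = sum(a * w for a, w in zip(a_hidden, w_out)) + b_out
a_out = sigmoid(z_out)
print(z_out, a_out)               # ≈ 0.316  0.578
```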
We compare the network’s final output (a_Out) with the desired target value. The Mean Squared Error (MSE) loss function quantifies how "wrong" the prediction is.
Formula: Error = (Target - Activation)²
Calculation: Error = ( - )² =
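With an assumed target of 1 (the light should turn on in this scenario), the error calculation is one line:

```python
target = 1.0   # assumed desired output: turn the light on
a_out = 0.578  # illustrative output from the forward pass

error = (target - a_out) ** 2
print(error)   # ≈ 0.178: the network is still quite wrong
```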
To correct the network, we calculate the error gradient (Delta) for the output neuron. This tells us how much the error changes with respect to the net input (z) of the output neuron, considering the derivative of its activation function.
Formula: Delta_Out = (a_Out - Target) × f′(z_Out), where f′ is the derivative of the Sigmoid
Since f′(z) = a × (1 - a), we get:
Delta_Out = ( - ) × ( × (1 - )) =
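In code, using the same illustrative numbers and expressing the Sigmoid derivative through the activation itself:

```python
target = 1.0   # assumed desired output
a_out = 0.578  # illustrative output activation

# For the Sigmoid, f'(z) = a * (1 - a), so we don't need z here
delta_out = (a_out - target) * a_out * (1 - a_out)
print(delta_out)  # ≈ -0.103 (negative: the output needs to go up)
```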
Now, we propagate the error backward to the hidden layer. The error gradient (Delta) of each hidden neuron depends on its contribution to the output error, weighted by its connections to the output neuron, and the derivative of its own activation function.
Formula: Delta_Hj = (Σ_k Delta_k × W_Hj,k) × f′(z_Hj)
Hidden Neuron 1:
Delta_H1 = (Delta_Out × Weight H1 to Out) × f′(z_H1) = ( × ) × ( × (1 - )) =
Hidden Neuron 2:
Delta_H2 = (Delta_Out × Weight H2 to Out) × f′(z_H2) = ( × ) × ( × (1 - )) =
Hidden Neuron 3:
Delta_H3 = (Delta_Out × Weight H3 to Out) × f′(z_H3) = ( × ) × ( × (1 - )) =
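Here is a sketch of this step with the same illustrative values; since there is only one output neuron, the sum over k collapses to a single term:

```python
delta_out = -0.103                # output Delta from the previous step
w_out = [0.4, -0.7, 0.3]          # illustrative hidden-to-output weights
a_hidden = [0.670, 0.502, 0.666]  # hidden activations from the forward pass

delta_hidden = [delta_out * w * a * (1 - a)
                for w, a in zip(w_out, a_hidden)]
print(delta_hidden)               # ≈ [-0.0091, 0.0180, -0.0069]
```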
Now we calculate specific gradients for each weight and bias. These indicate the direction and magnitude of adjustment needed to reduce the error.
Weight gradient formula: ∂Loss/∂w_ij = a_i × Delta_j
Bias gradient formula: ∂Loss/∂b_j = Delta_j
Output Layer Gradients:
Gradient w_H1,Out = a_H1 × Delta_Out = × =
Gradient w_H2,Out = a_H2 × Delta_Out = × =
Gradient w_H3,Out = a_H3 × Delta_Out = × =
Gradient b_Out = Delta_Out =
Hidden Layer Gradients:
Gradient w_I1,H1 = Input 1 × Delta_H1 = × =
Gradient w_I2,H1 = Input 2 × Delta_H1 = × =
Gradient b_H1 = Delta_H1 =
(Similar calculations for Hidden Neurons 2 and 3)
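The gradient rule is the same for every layer: multiply the Delta of the receiving neuron by the activation flowing into the connection. A sketch, still with the illustrative values:

```python
x = [0.9, 0.8]                             # network inputs
a_hidden = [0.670, 0.502, 0.666]           # hidden activations
delta_out = -0.103                         # output Delta
delta_hidden = [-0.0091, 0.0180, -0.0069]  # hidden Deltas

# Output layer: the incoming activations are the hidden activations
grad_w_out = [a * delta_out for a in a_hidden]
grad_b_out = delta_out

# Hidden layer: the incoming activations are the raw inputs
grad_w_hidden = [[xi * d for xi in x] for d in delta_hidden]
grad_b_hidden = list(delta_hidden)

print(grad_w_out)  # ≈ [-0.069, -0.052, -0.069]
```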
Finally, we update the network’s weights and biases using the calculated gradients and the learning rate (η). This is the core of learning, where the network adjusts to make better predictions.
Learning Rate (η): 0.1
Formula: New Parameter = Old Parameter - η × Gradient
Output Layer Updates:
w_H1,Out,new = - × =
w_H2,Out,new = - × =
w_H3,Out,new = - × =
b_Out,new = - × =
Hidden Layer Updates (Example for Hidden Neuron 1):
w_I1,H1,new = - × =
w_I2,H1,new = - × =
b_H1,new = - × =
(Similar updates for Hidden Neurons 2 and 3)
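The update itself is plain gradient descent: each parameter takes a small step against its gradient, scaled by η. A sketch for the output layer, with the illustrative gradients from above:

```python
eta = 0.1                              # learning rate from this example

w_out = [0.4, -0.7, 0.3]               # illustrative old parameters
b_out = 0.2
grad_w_out = [-0.069, -0.052, -0.069]  # illustrative gradients
grad_b_out = -0.103

# New parameter = old parameter - eta * gradient
w_out = [w - eta * g for w, g in zip(w_out, grad_w_out)]
b_out = b_out - eta * grad_b_out
print(w_out, b_out)  # every parameter grows a little, pushing a_Out toward 1
```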
This process of forward pass, loss calculation, backpropagation, and weight update forms the core of how neural networks learn from examples. By repeating it many times with different examples, the network learns to make more and more accurate predictions.
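Putting every step together, here is a minimal end-to-end training loop for the same 2 -> 3 -> 1 network, written as a sketch (random illustrative starting weights, a single training example, assumed target of 1). The printed error shrinking over the epochs is exactly what the visualization above shows one step at a time:

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

random.seed(0)  # reproducible illustrative run
x, target, eta = [0.9, 0.8], 1.0, 0.1

# 2 -> 3 -> 1 network with random starting weights
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
b_h = [random.uniform(-1, 1) for _ in range(3)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
b_o = random.uniform(-1, 1)

for epoch in range(1001):
    # Forward pass
    a_h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_h, b_h)]
    a_o = sigmoid(sum(a * w for a, w in zip(a_h, w_o)) + b_o)

    # Backpropagation
    d_o = (a_o - target) * a_o * (1 - a_o)
    d_h = [d_o * w * a * (1 - a) for w, a in zip(w_o, a_h)]

    # Gradient-descent updates
    w_o = [w - eta * d_o * a for w, a in zip(w_o, a_h)]
    b_o -= eta * d_o
    w_h = [[w - eta * d * xi for w, xi in zip(ws, x)]
           for ws, d in zip(w_h, d_h)]
    b_h = [b - eta * d for b, d in zip(b_h, d_h)]

    if epoch % 200 == 0:
        print(f"epoch {epoch}: error = {(target - a_o) ** 2:.5f}")
```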
The new values for this example (after this epoch) are:
Click "Reset Network" to start again with new random weights, or "Start training" to see another epoch with the updated weights!