nn_transformer.jpg
Source: https://arxiv.org/pdf/1706.03762, Figure 1: The Transformer - model architecture.

This image is a flowchart diagram illustrating the architecture of the Transformer neural network model. Data flows generally from the bottom of the page to the top, through two main vertical processing blocks placed side by side: the encoder on the left and the decoder on the right.

At the very bottom left, text labeled "Inputs" feeds an arrow pointing upward into a light pink rectangular box labeled "Input Embedding." From this box, an arrow continues up to a circle containing a plus sign (+). A squiggly line icon to the left of this circle, labeled "Positional Encoding," connects into it. From this addition point, the flow enters a large, light grey rounded rectangle. On the outside left of this grey block is the text "Nx," indicating that the components inside are repeated N times. Inside this block, from bottom to top, the path goes through several layers:

1. An orange rectangular box labeled "Multi-Head Attention." A curved arrow (a residual connection) branches off the main path before this box and bypasses it.
2. A yellow rectangular box labeled "Add & Norm," where that residual connection rejoins the main path.
3. A light blue rectangular box labeled "Feed Forward," again bypassed by a curved residual arrow.
4. At the top of this grey block, another yellow rectangular box labeled "Add & Norm," where the second residual connection rejoins.

An arrow exits the top of this left-hand grey block and curves to the right, entering the second main column.

The second main column (the decoder side) starts at the bottom right with text labeled "Outputs (shifted right)." An arrow points up into a light pink rectangular box labeled "Output Embedding," and from there up to a circle with a plus sign (+). A squiggly line icon to the right of this circle, labeled "Positional Encoding," connects into it. From this addition point, the flow enters another large, light grey rounded rectangle.
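The encoder path just described (embedding, plus positional encoding, then N repeated attention/feed-forward blocks with residual "Add & Norm" steps) can be sketched as a minimal NumPy dataflow. This is an illustrative toy, not the paper's exact configuration: it uses a single attention head instead of multi-head attention, and all dimensions, weight matrices, and function names are assumptions chosen for brevity.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding: sin on even indices, cos on odd.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def layer_norm(x, eps=1e-6):
    # "Add & Norm": normalize each position's feature vector.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def attention(q, k, v):
    # Scaled dot-product attention (one head; the figure's box is multi-head).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ v

def encoder_block(x, W1, W2):
    # Residual arrow bypasses the attention sublayer, rejoins at "Add & Norm".
    x = layer_norm(x + attention(x, x, x))
    # "Feed Forward" sublayer, also wrapped in a residual + "Add & Norm".
    ff = np.maximum(0.0, x @ W1) @ W2
    return layer_norm(x + ff)

rng = np.random.default_rng(0)
d_model, seq_len, n_layers = 8, 4, 2           # toy sizes (assumptions)
W1 = rng.normal(size=(d_model, 16)) * 0.1      # hypothetical feed-forward weights
W2 = rng.normal(size=(16, d_model)) * 0.1

x = rng.normal(size=(seq_len, d_model))        # stands in for "Input Embedding"
x = x + positional_encoding(seq_len, d_model)  # the "+" circle in the figure
for _ in range(n_layers):                      # the "Nx" repetition
    x = encoder_block(x, W1, W2)
print(x.shape)                                 # (4, 8)
```

The output of this loop is what the figure shows leaving the top of the left-hand grey block and curving into the decoder column.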
On the outside right of this block is the text "Nx," indicating the same repetition. Inside this block, from bottom to top:

1. An orange rectangular box labeled "Masked Multi-Head Attention," bypassed by a curved residual arrow.
2. A yellow rectangular box labeled "Add & Norm," where that residual connection rejoins the main path.
3. An orange rectangular box labeled "Multi-Head Attention." In addition to its own internal flow, this layer receives the arrow coming from the left-hand block (the encoder output). It is also bypassed by a residual arrow.
4. A yellow rectangular box labeled "Add & Norm."
5. A light blue rectangular box labeled "Feed Forward," bypassed by a residual arrow.
6. A final yellow rectangular box at the top of this block labeled "Add & Norm."

From the top of this right-hand grey block, an arrow points upward to a light purple/blue rectangular box labeled "Linear," and from there up to a green rectangular box labeled "Softmax." Finally, an arrow points from the Softmax box to the text at the very top: "Output Probabilities."

This description was generated automatically. Please feel free to ask if you have further questions about the image or its meaning within the presentation.
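The decoder column and the output head above it (masked self-attention, cross-attention over the encoder output, feed-forward, then "Linear" and "Softmax") can be sketched in the same minimal single-head NumPy style. As before, every dimension, weight matrix, and name here is an illustrative assumption, not the paper's actual setup:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def attention(q, k, v, causal=False):
    # One-head scaled dot-product attention; causal=True is the "Masked"
    # variant (each position attends only to itself and earlier positions).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    return softmax(scores) @ v

def decoder_block(y, enc_out, W1, W2):
    # "Masked Multi-Head Attention" + "Add & Norm"
    y = layer_norm(y + attention(y, y, y, causal=True))
    # Cross-attention: queries from the decoder, keys/values from the
    # encoder output (the arrow arriving from the left-hand block).
    y = layer_norm(y + attention(y, enc_out, enc_out))
    # "Feed Forward" + final "Add & Norm"
    ff = np.maximum(0.0, y @ W1) @ W2
    return layer_norm(y + ff)

rng = np.random.default_rng(1)
d_model, vocab, tgt_len, src_len = 8, 10, 3, 4   # toy sizes (assumptions)
enc_out = rng.normal(size=(src_len, d_model))    # stands in for the encoder output
y = rng.normal(size=(tgt_len, d_model))          # shifted-right embeddings + PE
W1 = rng.normal(size=(d_model, 16)) * 0.1        # hypothetical weights
W2 = rng.normal(size=(16, d_model)) * 0.1
W_out = rng.normal(size=(d_model, vocab)) * 0.1  # the "Linear" box

for _ in range(2):                               # the "Nx" repetition
    y = decoder_block(y, enc_out, W1, W2)
probs = softmax(y @ W_out)                       # "Softmax" -> "Output Probabilities"
print(probs.shape)                               # (3, 10)
```

Each row of `probs` sums to 1, matching the figure's final "Output Probabilities": one distribution over the vocabulary per target position.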