Understanding How AI Generates Language
A Reproduction Notebook on “Attention Is All You Need”
Author: Koichi Kamachi, CPA
Date: October 2025
Framework: TensorFlow 2.x / Google Colab
Introduction
How does AI generate language?
For many people, this question still lies outside intuitive understanding.
Since the arrival of generative AI, we have been astonished by its fluency, and yet, somewhere inside, many wonder:
“Isn’t it just searching a massive database and stitching pieces together?”
I have been using ChatGPT since its earliest public release, not casually but daily: for research, for thinking, and for structuring my own ideas. As I began to speak about AI in public seminars, I noticed something striking: the longer someone had lived in the world of computers, the harder it was for them to grasp generation. They would say, “It’s still just search, only bigger.”
But AI is not search. What I use every day belongs to an entirely different realm, the realm of generation.
The Origin: Attention Is All You Need
In 2017, a paper titled “Attention Is All You Need” changed the course of AI.
It introduced what we now call the Transformer architecture, which later became the core of models like ChatGPT.
In this single paper lies the seed of how machines “create” language.
Still, what does that mean, exactly?
Many books say, “AI predicts the next word.”
But what kind of prediction is that?
How does it learn to produce meaning rather than retrieve it?
When I first tried to understand this, I was lost in a maze of mathematics.
After weeks of trial and error, I arrived at my own way of seeing the structure —
and eventually drew the following diagram to visualize what’s happening inside.
🟩 Diagram: Learning Phase of the Transformer
(© Koichi Kamachi | Bookkeeping Whisperer)
This figure shows how a Transformer processes tokens, generates the Q/K/V matrices, applies attention weights, and produces updated sequence vectors through the feed-forward layers. It reflects my attempt to understand how language emerges within the model, not through memorization but through a dynamic process of weighted transformation.
In this phase, every token interacts with every other token.
The model computes relationships, refines them through layers,
and gradually learns to represent meaning across the entire sequence.
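To make this concrete, here is a minimal sketch of the scaled dot-product attention that the diagram depicts, written in TensorFlow 2.x. It is an illustration rather than the notebook's actual code; the layer sizes and variable names are my own choices for the example.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    matmul_qk = tf.matmul(q, k, transpose_b=True)        # (batch, seq_q, seq_k)
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(d_k)        # scale by sqrt(d_k)
    if mask is not None:
        scaled_logits += mask * -1e9                     # hide masked positions
    weights = tf.nn.softmax(scaled_logits, axis=-1)      # one weight per token pair
    return tf.matmul(weights, v), weights                # weighted sum of the values

# Toy example: one sequence of 4 tokens, embedding size 8.
x = tf.random.normal((1, 4, 8))
wq, wk, wv = (tf.keras.layers.Dense(8) for _ in range(3))
output, attention = scaled_dot_product_attention(wq(x), wk(x), wv(x))
print(output.shape, attention.shape)   # (1, 4, 8) and (1, 4, 4)
```

The attention matrix is tokens by tokens, which is exactly the point above: every token is weighted against every other token in the sequence.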
From Reading to Building
For a long time, I thought I understood the concept. But something still felt incomplete, perhaps because I had not built it myself.
Learning is not only about reading. It is also about writing, and in this case that meant writing the code. So I decided to reconstruct the Transformer architecture from scratch.
After two months of implementation, I succeeded in running a minimal working model, a small but complete reproduction of Attention Is All You Need using TensorFlow 2.x. In doing so, I rediscovered something familiar: the same TensorFlow library I once used to develop a hidden-stroke risk model years ago now served as the backbone of this exploration into language generation.
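For readers who want a feel for what such a reproduction involves, the sketch below shows one encoder layer of the kind the paper stacks: multi-head self-attention followed by a feed-forward network, each wrapped in a residual connection and layer normalization. It is a simplified illustration built from standard Keras layers, not the notebook's own implementation, and the dimensions are arbitrary.

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    """One Transformer encoder layer: self-attention + feed-forward,
    each with a residual connection and layer normalization."""
    def __init__(self, d_model=128, num_heads=4, d_ff=512, dropout=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop = tf.keras.layers.Dropout(dropout)

    def call(self, x, training=False):
        attn_out = self.mha(x, x, x)                     # self-attention: Q = K = V = x
        x = self.norm1(x + self.drop(attn_out, training=training))
        ffn_out = self.ffn(x)                            # position-wise feed-forward
        return self.norm2(x + self.drop(ffn_out, training=training))

# A batch of 2 sequences, 10 tokens each, already embedded into d_model dimensions.
layer = EncoderLayer()
print(layer(tf.random.normal((2, 10, 128))).shape)      # (2, 10, 128)
```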
What I Learned
To truly understand LLMs, one must clearly separate training from inference.
Only by tracing the moment of generation inside the code
can one see that AI does not “search.”
It creates.
Through probabilities and weighted representations,
it brings into existence sentences that never existed before.
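This is also where the difference between training and inference becomes visible in code. The sketch below is a schematic generation loop, assuming a trained model that maps a token sequence to next-token logits; the function and argument names are illustrative, not part of any particular library.

```python
import tensorflow as tf

def generate(model, prompt_ids, max_new_tokens=20, temperature=1.0):
    """Illustrative autoregressive decoding: `model` is assumed to return
    logits of shape (batch, seq_len, vocab_size) for the next token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(tf.constant([ids]))[:, -1, :]     # logits for the next position only
        scaled_logits = logits / temperature
        # Nothing is looked up: the next token is sampled from a probability
        # distribution the model computes on the spot.
        next_id = tf.random.categorical(scaled_logits, num_samples=1)[0, 0]
        ids.append(int(next_id.numpy()))
    return ids
```

Each pass through the loop turns weighted representations into a distribution over the vocabulary and draws one token from it; the resulting sentence need never have appeared in any training text.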
This small experiment reminded me that genuine knowledge is not passive comprehension but active reconstruction, a step beyond understanding into making.
