This blog post is part of my 38 before 38 series: one post every single day for the 38 days leading up to my 38th birthday.
ChatGPT is not “AI”. That is, it is not an intelligent agent that knows how to reason in human terms. It is a chat interface to a Large Language Model, which is a fancy way of saying it is a “What Word Comes Next Predictor Machine”. You give it words. It puts those words into its Word Predictor equation and gives you the most likely next word. And the word after that. And the word after that. Until the next word is likely to be the end of the text.
I will now try to explain, in the simplest of terms possible, what it is actually doing, how it is doing it, and what it can’t do. Most of it will be a gross oversimplification, but it will give you an abstract idea of how this technology works.
Some Mathematics
NO NO, please don’t go! I promise to keep it simple.
In your statistics class, you might have studied regression analysis. As a refresher, it is the equation y = mx + b. Yes, this will be the only equation in this essay. It means that for any given value of x you can calculate the closest possible value of y, even if you don’t have the data. You can do that for more than one “x”, so it can be y = m1x1 + m2x2 + ... + m1,000,000,000x1,000,000,000 + b.
This is the simplest form of how ChatGPT predicts the next word.
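If you like seeing ideas as code, here is that equation as a toy Python sketch. The inputs, weights, and constant are made-up numbers purely for illustration; a real model has billions of them:

```python
# Toy illustration of the regression idea: predict y from several x's.
# All numbers here are made up for the example.

def predict(xs, ms, b):
    """Weighted sum of inputs plus a constant: y = m1*x1 + m2*x2 + ... + b."""
    return sum(m * x for m, x in zip(ms, xs)) + b

xs = [2.0, 5.0, 1.0]   # the inputs (the x's)
ms = [0.5, 1.5, -2.0]  # the weights (the m's)
b = 3.0                # the constant (the b)

print(predict(xs, ms, b))  # 0.5*2 + 1.5*5 - 2.0*1 + 3 = 9.5
```

ChatGPT’s “equation” is unimaginably bigger than this, but at its heart it is still a weighted sum of numbers.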
A Capulet by any other Token
I lied to you. ChatGPT is not a “Word Predicting Machine”. It is a “Token Predicting Machine”.
“Hey!”, you must be thinking, “what is this token nonsense?”
I did say “the simplest terms possible”. A token is the base unit of a Token Predicting Machine. When you give it words, it breaks them down into tokens. So the sentence “What’s in a name? That which we call a rose by any other name would smell as sweet.” would be tokenized to:
[What][’s][ in][ a][ name][?][ That][ which][ we][ call][ a][ rose][ by][ any][ other][ name][ would][ smell][ as][ sweet][.]
Each of those blocks is a token. How they are broken down is not relevant to our discussion. ChatGPT then assigns a number to each token, puts those numbers into its Super Duper Ultra Regression Equation, comes up with the next most likely numbers, converts them back to tokens, and displays them as words for you.
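Here is a deliberately tiny sketch of that tokenize → predict → detokenize loop in Python. The vocabulary is made up, and the “predictor” is a stand-in that simply continues a sentence it already knows; a real model computes its prediction from billions of learned numbers:

```python
# Toy sketch of the tokenize -> predict -> detokenize loop.
# The vocabulary and "predictor" are made up for illustration only.

vocab = {"What": 0, "'s": 1, " in": 2, " a": 3, " name": 4, "?": 5}
id_to_token = {i: t for t, i in vocab.items()}

def tokenize(text):
    """Greedily split text into known tokens (toy version)."""
    token_ids, rest = [], text
    while rest:
        for tok in sorted(vocab, key=len, reverse=True):
            if rest.startswith(tok):
                token_ids.append(vocab[tok])
                rest = rest[len(tok):]
                break
        else:
            raise ValueError(f"unknown text: {rest!r}")
    return token_ids

def predict_next(token_ids):
    """Stand-in for the real model: just continue the one sentence it knows."""
    continuation = [0, 1, 2, 3, 4, 5]  # "What's in a name?"
    return continuation[len(token_ids)] if len(token_ids) < len(continuation) else None

ids = tokenize("What's")
while (nxt := predict_next(ids)) is not None:
    ids.append(nxt)  # keep appending "the next token" until there isn't one

print("".join(id_to_token[i] for i in ids))  # What's in a name?
```

The loop at the bottom is the whole trick: feed the tokens in, get one token out, append it, and repeat until the machine decides the text is done.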
But how does it come up with the Regression Equation?
The Great Maw
The creators of ChatGPT took all the text available freely (and some not so freely) and fed it into their tokenizer. Every public domain novel, play, song, and poem. Every blog post, Wikipedia article, Reddit thread, forum conversation, and publicly leaked email. Then some very, very smart scientists and engineers programmed what the “y” is and what the “x”s are that would be used to predict that “y”.
This process is called “training”. It creates the “equation”, or “algorithm”, that would be used in the Prediction Machine. You might be thinking, “But algorithms have existed for centuries. What makes this one special?” It is that no human knows what this algorithm is. It is a secret. We can approximate it, but it is almost impossible for us to know it exactly.
This is what makes it “intelligent”. However, that would take us in a philosophical direction. For another day!
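The “training” idea above can be sketched as a tiny loop that nudges the equation’s numbers until they fit the data. The data points and learning rate here are made up, and real training involves billions of parameters and far fancier math, but the spirit is the same:

```python
# Toy sketch of "training": nudge m and b until y = mx + b fits the data.
# The data points and the 0.01 learning rate are made up for illustration.

data = [(1, 5), (2, 7), (3, 9)]  # points that happen to lie on y = 2x + 3
m, b = 0.0, 0.0                  # start with a completely wrong guess

for _ in range(5000):
    for x, y in data:
        error = (m * x + b) - y  # how wrong is the current guess?
        m -= 0.01 * error * x    # nudge the slope to shrink the error
        b -= 0.01 * error        # nudge the constant too

print(round(m, 2), round(b, 2))  # prints 2.0 3.0
```

No one hand-wrote “m = 2 and b = 3” into the program; the numbers emerged from the data. Now imagine that with billions of numbers instead of two, and you can see why nobody can read the finished equation and say what it “means”.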
Caveat Emptor
I’d like to note again that this is an oversimplification, and there are some quite glaring inaccuracies; any technically adept reader would tell you that. My goal was oversimplification, however. I wanted to explain this in the language of an average high-school graduate with little to no technical knowledge, and that limits the frames of reference I can apply. I hope I was successful.