
AI for Dummies: How does AI ‘learn’?

Writer: Lauren Casey




While working for NuWave, I’m also a student at ASU studying computer science, and my courses have focused heavily on artificial intelligence. It’s remarkable how popular AI has become in the past couple of years. The chatbot market expanded from less than $1 billion in 2016 to approximately $20 billion in 2024, and the computing power used to train AI increased by a factor of more than 300,000 between 2012 and 2023. The speed at which AI has infiltrated our lives is wild! Yet despite all this growth, most people don’t actually know how AI works. That’s what I’m here to explain. I believe it’s important to understand the tools we use in our everyday lives, and maybe understanding them will help us use them even better.

 

There are many forms of AI, including image recognition, video analysis, image generation, machine translation, large language models (LLMs), and many more. Today, LLMs are the most popular; they power assistants like ChatGPT, Apple Intelligence, and Gemini. To explain how AI works, I’m going to assume we’re talking about large language models, but keep in mind that different kinds of models may alter the process as they need.


With the LLMs we use today, we ask a question and the model responds. But how does it know how to respond? The process is as follows: construct the model, train the model, test the model, and deploy the model (the deployed model is the bot you’re talking to when you use something like ChatGPT). When we initialize a learning model like a chatbot, we must first define a few things: training data, testing data, neurons, layers, connections, activation functions, algorithms, and parameters. So, let’s initialize these.


Let’s first set up our neural network (the brain of the language model). The network consists of neurons, just like the human brain. These neurons take in input, do something with it, and spit out output. The ‘do something with it’ part is where it gets a little complex; we’ll cover that later. But first, to have a functioning chatbot, you must train it.


The training material that teaches the AI how to respond to the user is constructed by us, the creators of the model. We can construct this material so that each question has an answer. So, for this article specifically, our training material will be a list of questions and their corresponding answers. In short, we train the model by giving it the answers to a ton of questions. A simple, real-life example: if I show you over and over how to make a fire, eventually you’ll be able to make a fire on your own. The same goes for AI. If I show the model how it should respond to thousands of different questions, it will eventually learn how to respond to almost any question. It’s simple pattern recognition, and that’s what the training material provides. We give the neurons a question, the model uses predetermined algorithms and parameters to come up with an answer, and then it learns from its mistakes and tries again.
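To make that concrete, here’s a minimal sketch (in Python, with made-up questions) of what question-and-answer training material could look like as data:

```python
# Hypothetical training material: each entry pairs a question with
# the answer we want the model to learn to produce.
training_data = [
    ("What color is the sky?", "blue"),
    ("What color is grass?", "green"),
    ("How many legs does a spider have?", "eight"),
]

# During training, the model sees each question, generates its own
# answer, and compares it against the answer stored here.
for question, answer in training_data:
    print(f"Q: {question}  ->  A: {answer}")
```

Real LLM training sets aren’t neat pairs like this, but the idea is the same: examples of input paired with the output we want.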


Exactly how big is the training sample? I mentioned that we train the model by giving it the answers to a ton of questions. So, what is a ton? That’s harder to define, because it depends on how the model responds. A ton could mean 100,000 data samples of questions and their corresponding answers. It could also mean 3 trillion. There’s a sweet spot, because a model can be undertrained or overtrained. For example, you can imagine that training our model with a data sample of 5 (5 questions and their 5 answers) won’t train it enough; it will never be smart enough to answer whatever the user asks. But overtraining causes problems too: the model can end up memorizing its training answers instead of learning the underlying patterns, which makes it less accurate on new questions. Figuring out exactly how much data to use in training is part of the process.

 

Before we go through a step-by-step example, let me clarify what I mean by ‘algorithms and parameters.’ There are a ton of options for each, and every one has its own use in different models and ways of processing data. An example of an algorithm is the sigmoid function, which takes in any real number and uses a mathematical equation to transform it into a value between 0 and 1. This is particularly useful for probabilities and classification networks (like determining whether a picture looks more like a cat or a dog). Parameters may include a weight or bias variable that affects the output of the sigmoid function. Now, our LLM may not be using the sigmoid function at all; I’m simply giving this example to show that there are many different options for algorithms and parameters, and for how they process the input data. So rather than specifically defining these in our neural network, when I say ‘algorithms and parameters,’ just assume that the neuron is processing the input in a way that will output English text.
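As an illustration, a sigmoid neuron with a weight and a bias can be sketched in a few lines of Python (the specific numbers here are just placeholders):

```python
import math

def sigmoid(x):
    """Squash any real number into a value between 0 and 1."""
    return 1 / (1 + math.exp(-x))

def neuron(x, weight, bias):
    """One neuron: scale the input by a weight, shift it by a bias,
    then pass the result through the sigmoid activation."""
    return sigmoid(weight * x + bias)

print(sigmoid(0))              # 0.5: right in the middle of 0 and 1
print(neuron(2.0, 0.8, -0.5))  # sigmoid(1.1), roughly 0.75
```

Changing the weight or the bias changes what the neuron outputs for the same input, and that’s exactly the knob that training turns.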

 

Let’s begin! Assume we have a neural network of one neuron. That means the question being asked goes into that one neuron, the neuron ‘does something with it,’ and it spits out an answer. The question in the training data we are inputting is “What color is the sky?”, and the answer in the training data is “blue”.



Step 1: The question, “What color is the sky?”, goes into the neuron.

Step 2: The algorithm and parameters will then process the question and output the answer “octopus” (for example).

Step 3: The model compares the answer generated, “octopus”, to the actual answer, “blue”.

Step 4: The model sees that the answers do not match and tries changing something in the algorithm or parameter. This may be a value within an equation. This may be a weight or some other variable. The way the process is defined determines what variable is changed.

Step 5: This process (Steps 1-4) repeats with the updated values until the neuron outputs “blue”, the correct answer.

Step 6: Finally, if we ask our bot, “What color is the sky?”  it will know that it’s blue without having to look at the training data answer we provided.
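Steps 1-5 can be sketched as a training loop. Real models turn text into numbers and use backpropagation across billions of parameters; in this toy stand-in, the “question” is simply the number 1.0, the correct “answer” is 0.9, and a single sigmoid neuron nudges its weight and bias until its output matches:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Toy stand-in: a numeric "question" and "answer" instead of text.
x, target = 1.0, 0.9
weight, bias = 0.0, 0.0      # start with arbitrary parameters
learning_rate = 0.5

for step in range(2000):
    # Steps 1-2: push the question through the neuron to get an answer.
    output = sigmoid(weight * x + bias)
    # Step 3: compare the generated answer to the actual answer.
    error = output - target
    # Step 4: nudge the parameters to shrink the error
    # (gradient of the squared error through the sigmoid).
    grad = error * output * (1 - output)
    weight -= learning_rate * grad * x
    bias -= learning_rate * grad
    # Step 5: repeat until the answer matches closely enough.
    if abs(error) < 0.001:
        break

print(round(output, 2))  # 0.9 -- the neuron learned the "answer"
```

The shape of the loop, generate, compare, adjust, repeat, is the part that carries over to real language models; everything else here is simplified.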


These steps show an example of a single-neuron neural network and a single training question and answer. If you were to expand this to multiple neurons, multiple layers of neurons, each using their own algorithms and parameters, and a billion more questions... then you’ve got yourself a modern large language model. Training a model of our size (one neuron, one layer, and let’s say 100,000 data samples for training) could take seconds to minutes. Not long at all. If we expand this to a much more complex model, like what chatbots use today, then depending on how complex it is and how much data is used, training could take days to weeks, or even longer. This also depends on computing power, along with other factors that affect runtime.

 

After training comes testing. Once the margin of error between the actual answers and the generated answers is as small as possible, the model enters the testing phase. This is where we give it a bunch of questions, without providing the answers like before, and see how it responds. If it’s not up to our expectations, we can change our approach: add or remove layers or neurons, or decide we need more (or less) training material to make it smarter. The process loops over and over until we’re satisfied with how the AI responds on its own, and then we can deploy it to the world as a chatbot.
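Here’s a small sketch of that testing phase: ask questions without showing the model the answers, then score its responses. The `model_answer` function is a hypothetical stand-in for running a trained network:

```python
# Hypothetical testing phase: held-out questions with known answers
# that the model is NOT shown -- we only use them for scoring.
test_data = [
    ("What color is the sky?", "blue"),
    ("What color is grass?", "green"),
]

def model_answer(question):
    # Stand-in for a trained neural network; imagine the network
    # produced these responses entirely on its own.
    responses = {
        "What color is the sky?": "blue",
        "What color is grass?": "purple",   # an imperfect model
    }
    return responses.get(question, "unknown")

correct = sum(model_answer(q) == a for q, a in test_data)
accuracy = correct / len(test_data)
print(f"accuracy: {accuracy:.0%}")  # accuracy: 50% -- back to training!
```

A score like this is what tells us whether to deploy the model or loop back and adjust the layers, neurons, or training data.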

 

That’s it! That is the process of AI. A different kind of model, like image recognition, may use different algorithms, but the general idea of training and testing in a loop is the same, because that is what ‘learning’ is to an AI model. It works a lot like the human brain (hence the name ‘neural network’). It’s not too complicated when you think of it that way. Maybe you’ve noticed, throughout your use of modern chatbots, how one seems to learn to respond to you, and hopefully understanding how it learns will help you use it more efficiently. As AI continues to evolve, I hope your knowledge of it does as well, and I think this is a great beginning to that.


  

At NuWave Technologies, our mission is to develop innovative, intuitive enterprise solutions and provide exceptional support, all while making a positive impact. Renowned for our leadership in HPE NonStop integration, our solutions empower mission-critical systems to seamlessly integrate with enterprises, enhancing the world's most valuable data. 

  

Our clientele, which includes numerous Fortune 500 and Global 2000 businesses, chooses NuWave for our unparalleled ability to modernize legacy applications with intuitive and high-performing products. Through our solutions, we vastly improve business performance and efficiency. 

  

At NuWave Technologies, our dedication extends beyond innovation to include sustainability and positive global impact. Join us in shaping a better future for our planet. 

  

 
 
