Getting Started with Language AI — How Machines Understand Text
Generative AI has been the talk of the town for the last few years.
But before diving deep into Generative AI and Large Language Models (LLMs), it’s essential to understand one foundational concept — how machines represent language.
Computers don’t understand text, emotions, or meaning the way humans do.
They only understand numbers — 0s and 1s.
So, for machines to “understand” text, we must convert words into numerical representations — a process called text vectorization.
Bag of Words (2000)
One of the earliest and simplest ways to represent text numerically is the Bag of Words (BoW) model.
Let’s consider two example sentences:
- Sentence 1: "OpenAI is one of the leading artificial intelligence research and deployment companies"
- Sentence 2: "Anthropic is one of the leading artificial intelligence (AI) research and development companies in the world"

Step 1. Build the Vocabulary
We create a combined list of unique words from both sentences.
That’s our vocabulary.
Vocabulary = ["OpenAI", "is", "one", "of", "the", "leading", "artificial", "intelligence", "research", "and", "deployment", "companies", "Anthropic", "AI", "development", "in", "world"]
Each sentence is then represented as a vector of 0s and 1s,
depending on whether a word exists in that sentence.
Example (simplified):
Sentence 1 → [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Sentence 2 → [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
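To make these two steps concrete, here is a minimal pure-Python sketch (no libraries) that builds the vocabulary and the presence vectors shown above. It assumes a naive lowercasing, whitespace-splitting tokenizer, which is enough for this toy example:
def tokenize(sentence):
    # Naive tokenization: lowercase, drop parentheses, split on whitespace
    return sentence.lower().replace("(", " ").replace(")", " ").split()
sentence_1 = "OpenAI is one of the leading artificial intelligence research and deployment companies"
sentence_2 = "Anthropic is one of the leading artificial intelligence (AI) research and development companies in the world"
# Step 1: collect the unique words from both sentences
vocabulary = []
for sentence in (sentence_1, sentence_2):
    for word in tokenize(sentence):
        if word not in vocabulary:
            vocabulary.append(word)
# Step 2: mark each vocabulary word as present (1) or absent (0) in a sentence
def to_vector(sentence):
    words = set(tokenize(sentence))
    return [1 if word in words else 0 for word in vocabulary]
print(vocabulary)
print(to_vector(sentence_1))
print(to_vector(sentence_2))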
Example: Bag of Words using scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
sentences = [
"OpenAI is one of the leading artificial intelligence research and deployment companies",
"Anthropic is one of the leading artificial intelligence (AI) research and development companies in the world"
]
# Learn the vocabulary from both sentences and build the count matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)
print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())  # one row of word counts per sentence

This outputs the vocabulary and a numeric representation of each sentence.
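Note that by default CountVectorizer lowercases tokens and counts occurrences, so "the" gets a count of 2 in the second sentence rather than a 1. To reproduce the simplified presence/absence vectors above, one option is the binary flag:
# binary=True records only whether a word appears, not how many times
binary_vectorizer = CountVectorizer(binary=True)
X_binary = binary_vectorizer.fit_transform(sentences)
print(X_binary.toarray())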
⚠️ Limitations of Bag of Words
- No context awareness: It doesn’t understand meaning; it just counts words. For example, “Apple” (fruit) and “Apple” (company) are treated the same.
- Different languages: Some languages (like Japanese or Chinese) don’t use spaces between words. Example: “インドへようこそ” (“Welcome to India”) can’t be easily tokenized by splitting on spaces.
- High dimensionality: As the vocabulary grows, vectors become large and sparse.
Word2Vec (2013)
The next major leap in text representation came in 2013 with Word2Vec, introduced by Google researchers.
Unlike Bag of Words, Word2Vec uses neural networks to capture meaning and context.
It learns how words relate to each other by looking at the words that appear around them within a sliding context window.
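To illustrate the sliding-window idea, here is a small plain-Python sketch (with a made-up example sentence and a window size of 2) that lists the (target, context) word pairs a skip-gram-style model would learn from:
words = ["the", "car", "needs", "petrol", "today"]
window = 2  # how many neighbours on each side count as context
pairs = []
for i, target in enumerate(words):
    # every word within `window` positions of the target is its context
    for j in range(max(0, i - window), min(len(words), i + window + 1)):
        if j != i:
            pairs.append((target, words[j]))
print(pairs)  # e.g. ('car', 'the'), ('car', 'needs'), ('car', 'petrol'), ...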

🔍 Example Concept
Words that appear in similar contexts will have similar vector representations.
"car" → [0.8, 0.45, -0.22, ...]"petrol" → [0.78, 0.50, -0.18, ...]
These vectors are close to each other in vector space because they’re semantically related.
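A common way to measure how “close” two vectors are is cosine similarity. The sketch below computes it for the truncated, illustrative values shown above (toy numbers, not real Word2Vec output):
import numpy as np
# Toy, truncated vectors from the illustration above
car = np.array([0.8, 0.45, -0.22])
petrol = np.array([0.78, 0.50, -0.18])
# Cosine similarity is 1.0 for identical directions and near 0 for unrelated ones
similarity = np.dot(car, petrol) / (np.linalg.norm(car) * np.linalg.norm(petrol))
print(similarity)  # roughly 0.997 for these toy values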
🧪 Example: Word2Vec using gensim
from gensim.models import Word2Vec
sentences = [
["car", "needs", "petrol"],
["electric", "car", "uses", "battery"],
["bats", "are", "nocturnal", "animals"],
["cricket", "bats", "are", "made", "of", "willow"]
]
# Train a small skip-gram model (sg=1) with 50-dimensional vectors
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)
print(model.wv.most_similar("car"))  # words whose vectors are closest to "car"
print(model.wv["bats"])  # the 50-dimensional vector learned for "bats"

This produces a 50-dimensional vector for each word and shows the words most similar to "car".
Contextual Limitation of Word2Vec
While Word2Vec captures relationships, it still struggles with polysemy —
words that have multiple meanings depending on context.
Example:
- A bat is a nocturnal animal.
- Cricket bats are made from willow.
Here, “bat” means two different things.
Word2Vec can’t differentiate these because each word has a single static embedding.
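You can see this limitation directly with the gensim model trained earlier: no matter which sentence “bats” came from, the lookup returns the same single vector.
# Reusing `model` from the Word2Vec example above
animal_bats = model.wv["bats"]   # as in "bats are nocturnal animals"
cricket_bats = model.wv["bats"]  # as in "cricket bats are made of willow"
print((animal_bats == cricket_bats).all())  # True: there is only one static embedding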

Transformers: Attention Is All You Need (2017)
In 2017, researchers at Google published a groundbreaking paper:
👉 “Attention Is All You Need”
This introduced the Transformer architecture, which revolutionized AI.
Transformer Architecture Components
- Encoder – Processes input text
- Decoder – Generates output (e.g., translated sentence)
- Self-Attention Mechanism – Understands context across the entire sequence
- Softmax Layer – Converts numerical scores into probabilities
Why It Mattered
Transformers solved the context problem by using attention —
a mechanism that lets the model “focus” on the relevant parts of a sentence while processing each word.
For instance, in the sentence:
“The animal didn’t cross the street because it was too tired.”
The model learns that “it” refers to “animal,” not “street.”
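To make this concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation behind self-attention. The token embeddings are random toy values; real Transformers add learned query/key/value projections and multiple attention heads:
import numpy as np
def softmax(scores):
    # Turn raw attention scores into probabilities that sum to 1 per row
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)
def scaled_dot_product_attention(Q, K, V):
    # Compare every token's query with every token's key,
    # then use the resulting weights to mix the value vectors
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores)
    return weights @ V, weights
np.random.seed(0)
tokens = ["the", "animal", "was", "tired"]
X = np.random.rand(len(tokens), 4)  # toy 4-dimensional embeddings, one per token
output, weights = scaled_dot_product_attention(X, X, X)
print(weights.round(2))  # how strongly each token attends to every other token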
This architecture paved the way for modern models like GPT, BERT, Claude, and LLaMA — the foundation of today’s Language AI revolution.
Summary
| Era | Method | Key Idea | Limitation |
|---|---|---|---|
| 2000 | Bag of Words | Count word occurrences | No meaning/context |
| 2013 | Word2Vec | Semantic relationships | Static embeddings |
| 2017 | Transformers | Context-aware attention | Compute intensive |
Final Thoughts
Language AI has evolved from counting words to understanding them.
Each generation of models — from Bag of Words to Word2Vec, and now Transformers — has brought us closer to enabling machines to comprehend human language.