Embeddings Deep Dive: Models, Dimensions, Normalization, and Production Patterns
Embeddings are the foundational primitive of modern AI applications: semantic search, retrieval-augmented generation (RAG), clustering, classification, deduplication, and anomaly detection all depend on high-quality vector representations. This guide covers model choice, dimension reduction, normalization, caching, and the usage patterns you need to run embeddings in production.
What is an embedding?
An embedding maps text (or other data) to a fixed-size vector of floats. Semantically similar texts produce similar vectors — measured by cosine similarity or dot product.
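Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on direction, not magnitude. The snippet below calls a `cosineSimilarity` helper without defining it. A minimal sketch of what such a helper could look like:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Vectors from different models or dimension settings are not comparable
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The dimension check is deliberate: comparing vectors of different lengths is a bug, not a low score.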
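```typescript
// `embed` and `cosineSimilarity` are helpers; the numbers are illustrative.
// "cat" and "kitten" → similar vectors
// "cat" and "database" → dissimilar vectors
const catVec = await embed('cat');       // [0.12, -0.87, 0.34, ...]
const kittenVec = await embed('kitten'); // [0.11, -0.89, 0.36, ...]
const dbVec = await embed('database');   // [-0.45, 0.23, -0.67, ...]

cosineSimilarity(catVec, kittenVec); // high, e.g. ~0.93
cosineSimilarity(catVec, dbVec);     // much lower, e.g. ~0.08
```

Absolute scores depend on the model: some models place even unrelated texts at fairly high similarity, so compare scores relative to each other and tune any threshold per model.

Embedding model comparison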
| Model | Dimensions | Max tokens | Cost/1M tokens | Best for |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | 8191 | $0.020 | General, cost-sensitive |
| text-embedding-3-large | 3072 | 8191 | $0.130 | Max accuracy |
| text-embedding-ada-002 | 1536 | 8191 | $0.100 | Legacy |
| BAAI/bge-large-en-v1.5 | 1024 | 512 | Free (self-hosted) | Open-source, on-prem |
| nomic-embed-text | 768 | 8192 | Free (Ollama) | Local dev, privacy |
| Cohere embed-v3 | 1024 | 512 | $0.100 | Multilingual |
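The Dimensions column translates directly into storage. Assuming vectors are stored as 32-bit floats (4 bytes per dimension) and ignoring index structures and metadata, raw storage is `count × dimensions × 4` bytes. A small sketch (`rawVectorBytes` and `formatGB` are illustrative helpers) makes the comparison concrete:

```typescript
// Raw storage for float32 vectors: count × dimensions × 4 bytes.
// Index structures (e.g. HNSW graphs) and metadata add more on top.
const BYTES_PER_FLOAT32 = 4;

function rawVectorBytes(count: number, dimensions: number): number {
  return count * dimensions * BYTES_PER_FLOAT32;
}

function formatGB(bytes: number): string {
  return `${(bytes / 1e9).toFixed(2)} GB`;
}

// One million vectors at each model's default size
const MODELS: [string, number][] = [
  ['text-embedding-3-large', 3072],
  ['text-embedding-3-small', 1536],
  ['bge-large-en-v1.5', 1024],
  ['nomic-embed-text', 768],
];

for (const [model, dims] of MODELS) {
  console.log(model, formatGB(rawVectorBytes(1_000_000, dims)));
}
```

At one million vectors, 1536 dimensions need about 6.1 GB of raw float32 storage, and 3072 dimensions double that.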
Dimension reduction with Matryoshka
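OpenAI's text-embedding-3 models support dimension reduction through the `dimensions` parameter. They were trained with Matryoshka Representation Learning, which concentrates the most important information in the leading dimensions, so a shortened vector stays useful. Smaller vectors are cheaper to store and faster to search, at a small cost in accuracy: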
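OpenAI's documentation recommends the `dimensions` parameter, but notes that if you shorten an already-generated embedding yourself, you must re-normalize it. A minimal sketch of that manual path, assuming you already store full-size text-embedding-3 vectors (`shortenEmbedding` is a hypothetical helper):

```typescript
// Shorten a Matryoshka-trained embedding after the fact: keep the leading
// `dimensions` values, then rescale to unit length so dot product = cosine.
function shortenEmbedding(embedding: number[], dimensions: number): number[] {
  if (dimensions <= 0 || dimensions > embedding.length) {
    throw new RangeError(`dimensions must be in 1..${embedding.length}`);
  }
  const truncated = embedding.slice(0, dimensions);
  const norm = Math.sqrt(truncated.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) throw new Error('Cannot normalize a zero vector');
  return truncated.map(v => v / norm);
}
```

This only makes sense for models trained this way, such as text-embedding-3; truncating ada-002 vectors discards information arbitrarily.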
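```typescript
// Assumes an OpenAI client: import OpenAI from 'openai'; const openai = new OpenAI();
// Request a shorter vector directly (text-embedding-3 models only)
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: text,
  dimensions: 256, // default 1536 → 256 (about 83% less storage)
});
```

In OpenAI's published MTEB results, text-embedding-3-large shortened to 256 dimensions still outscores ada-002 at its full 1536 dimensions, despite having 6x fewer dimensions, and text-embedding-3-small at 512 dimensions also edges out ada-002. Check the size you choose against your own data.

Batch embedding for efficiency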
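```typescript
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// The API accepts up to 2048 inputs per request; smaller batches keep
// each request well under the per-request token limit for long texts.
async function batchEmbed(texts: string[], model = 'text-embedding-3-small'): Promise<number[][]> {
  const BATCH_SIZE = 100; // conservative for long documents
  const results: number[][] = [];

  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    const batch = texts.slice(i, i + BATCH_SIZE);
    const response = await openai.embeddings.create({ model, input: batch });
    // Sort by `index` so output order is guaranteed to match input order
    const ordered = [...response.data].sort((a, b) => a.index - b.index);
    results.push(...ordered.map(d => d.embedding));

    // Small pause between requests; the SDK also retries 429s with backoff
    if (i + BATCH_SIZE < texts.length) {
      await new Promise(r => setTimeout(r, 100));
    }
  }
  return results;
}
```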
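A fixed `BATCH_SIZE` ignores how long each text is. A sketch of an alternative that groups texts by an estimated token budget as well as an item cap. It assumes the rough rule of thumb of about 4 characters per token for English text; a real tokenizer such as tiktoken gives exact counts. `planBatches`, `maxItems`, and `maxTokens` are hypothetical names, and the default budget is illustrative, not an official API limit:

```typescript
// Rough token estimate: ~4 characters per token for English (heuristic only).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Group texts into batches that respect both an item cap and a token budget.
// Returns arrays of indices into `texts` so results can be mapped back.
function planBatches(
  texts: string[],
  maxItems = 2048,
  maxTokens = 100_000, // illustrative budget, not an official limit
): number[][] {
  const batches: number[][] = [];
  let current: number[] = [];
  let currentTokens = 0;

  for (let i = 0; i < texts.length; i++) {
    const tokens = estimateTokens(texts[i]);
    const wouldOverflow =
      current.length >= maxItems || currentTokens + tokens > maxTokens;
    if (wouldOverflow && current.length > 0) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    // An oversized text still gets its own batch; the API rejects inputs
    // above the model's per-input limit, so chunk such texts beforehand.
    current.push(i);
    currentTokens += tokens;
  }

  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each index batch can then be mapped to its texts and sent in place of the fixed-size slices.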
```typescript
// Index 10,000 documents
const allTexts = documents.map(d => d.content);
const allEmbeddings = await batchEmbed(allTexts);
// 100 requests of 100 texts instead of 10,000 single-text API calls
```

Vector normalization
```typescript
// OpenAI embeddings are already normalized to unit length.
// For other models, normalize before storing.
function normalize(vector: number[]): number[] {
  const magnitude = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  if (magnitude === 0) throw new Error('Cannot normalize a zero vector');
  return vector.map(v => v / magnitude);
}

// For unit vectors, cosine similarity equals the dot product,
// so the two magnitude computations per comparison can be skipped.
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}
```

Storing normalized vectors means each similarity check is a single dot product, rather than a dot product plus two vector magnitudes.

Embedding cache
```typescript
import { createHash } from 'crypto';
import OpenAI from 'openai';
import { Redis } from '@upstash/redis';

const openai = new OpenAI();
// Upstash's REST client needs both the URL and the token
const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});

async function cachedEmbed(text: string, model = 'text-embedding-3-small'): Promise<number[]> {
  // Key on model + content hash. If you pass `dimensions`, include it in the key too.
  const key = `embed:${model}:${createHash('sha256').update(text).digest('hex')}`;
  const cached = await redis.get<number[]>(key);
  if (cached) return cached;
  const response = await openai.embeddings.create({ model, input: text });
  const embedding = response.data[0].embedding;
  await redis.set(key, embedding, { ex: 60 * 60 * 24 * 30 }); // cache 30 days
  return embedding;
}
```
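`cachedEmbed` handles one text at a time. When indexing many texts, caching and batching can be combined: look up every text first, drop duplicates, and send only the misses in one batch. A minimal sketch of the planning step, assuming an in-memory `Map` as a stand-in for the real cache (with Redis you would fill it from a multi-key lookup); `planCacheFill` is a hypothetical helper:

```typescript
// Decide which texts actually need an embeddings call.
// `cache` maps text → embedding (a stand-in for your real cache lookup).
function planCacheFill(texts: string[], cache: Map<string, number[]>) {
  // Unique texts that are not cached yet, in first-seen order
  const toEmbed = Array.from(new Set(texts)).filter(t => !cache.has(t));

  // Merge freshly computed vectors back in input order.
  // `fresh[i]` must be the embedding of `toEmbed[i]`.
  function assemble(fresh: number[][]): number[][] {
    if (fresh.length !== toEmbed.length) {
      throw new Error(`Expected ${toEmbed.length} embeddings, got ${fresh.length}`);
    }
    toEmbed.forEach((text, i) => cache.set(text, fresh[i]));
    return texts.map(t => cache.get(t)!);
  }

  return { toEmbed, assemble };
}
```

In use, `toEmbed` goes to `batchEmbed` and the result goes to `assemble`, so repeated or already-cached texts never reach the API.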
For static content such as a product catalog or documentation, pre-compute the embeddings once and store them permanently rather than relying on a TTL.

Use case patterns
Semantic search
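`vectorStore.query` in the snippet that follows is a placeholder for whatever vector database you use. Conceptually, a nearest-neighbor query ranks stored vectors by similarity to the query vector. A brute-force in-memory sketch, assuming unit-normalized vectors so the dot product serves as cosine similarity (`StoredVector` and `topKByDot` are hypothetical names); production stores typically use approximate indexes such as HNSW instead of scanning every vector:

```typescript
interface StoredVector {
  id: string;
  vector: number[]; // assumed unit-normalized
}

// Exact (brute-force) top-K search by dot product: O(n · d) per query.
function topKByDot(query: number[], items: StoredVector[], k: number) {
  return items
    .map(item => ({
      id: item.id,
      score: item.vector.reduce((sum, v, i) => sum + v * query[i], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

For a few thousand vectors this exact scan is often fast enough and avoids running a separate database.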
```typescript
// Embed the query, then find the nearest documents in the vector store.
// `vectorStore` stands in for your vector database client.
const queryVec = await cachedEmbed(userQuery);
const results = await vectorStore.query({ vector: queryVec, topK: 5 });
```

Deduplication
```typescript
// Find near-duplicate pairs (cosine similarity above `threshold`).
// dotProduct equals cosine only for unit vectors (true for OpenAI embeddings).
// Compares every pair, O(n²): fine for small sets, use a vector index for large ones.
async function findDuplicates(texts: string[], threshold = 0.95): Promise<[number, number][]> {
  const embeddings = await batchEmbed(texts);
  const duplicates: [number, number][] = [];
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      if (dotProduct(embeddings[i], embeddings[j]) > threshold) {
        duplicates.push([i, j]);
      }
    }
  }
  return duplicates;
}
```

Classification without fine-tuning
```typescript
// Zero-shot classification via embedding similarity to label descriptions
const labels = ['billing issue', 'technical bug', 'feature request', 'general inquiry'];
const labelEmbeddings = await batchEmbed(labels); // compute once, reuse for every call

async function classify(text: string): Promise<string> {
  const textEmbedding = await cachedEmbed(text);
  const scores = labelEmbeddings.map(le => dotProduct(textEmbedding, le));
  const bestIdx = scores.indexOf(Math.max(...scores));
  return labels[bestIdx];
}
```
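`classify` always returns some label, even for text that matches none of them well. A sketch of one common refinement, assuming you tune the cut-offs on labeled examples: report the best score and its margin over the runner-up, and return no label when either is too small so the case can go to a fallback such as human review. `decideLabel`, `minScore`, and `minMargin` are hypothetical names with illustrative defaults:

```typescript
interface LabelDecision {
  label: string | null; // null = not confident enough; route to a fallback
  score: number;        // similarity to the best label
  margin: number;       // best score minus second-best score
}

function decideLabel(
  scores: number[],
  labels: string[],
  minScore = 0.3,   // illustrative; tune per model on labeled data
  minMargin = 0.02, // illustrative
): LabelDecision {
  if (scores.length === 0 || scores.length !== labels.length) {
    throw new Error('scores and labels must be non-empty and the same length');
  }
  // Label indices sorted by descending score
  const order = scores.map((_, i) => i).sort((a, b) => scores[b] - scores[a]);
  const best = scores[order[0]];
  const second = order.length > 1 ? scores[order[1]] : -Infinity;
  const margin = best - second;
  const confident = best >= minScore && margin >= minMargin;
  return { label: confident ? labels[order[0]] : null, score: best, margin };
}
```

`decideLabel` takes the same `scores` array that `classify` computes, so it can replace the final `indexOf` step.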
```typescript
const category = await classify('My payment failed with error code 402');
// 'billing issue'
```

Anomaly detection
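```typescript
// Flag messages that are semantically far from normal patterns.
// `historicalNormalMessages` is your own sample of known-good messages.
const normalMessages = await batchEmbed(historicalNormalMessages);

// Centroid = element-wise mean of the normal embeddings
const centroid = normalMessages[0].map((_, i) =>
  normalMessages.reduce((sum, v) => sum + v[i], 0) / normalMessages.length
);
```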
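The `threshold = 0.5` default in `isAnomalous` is arbitrary, and typical similarity ranges differ between models. One way to choose it, sketched here as an assumption rather than a prescribed method, is to score the historical normal messages against the normalized centroid and take a low percentile of those scores as the cut-off. `percentileThreshold` is a hypothetical helper using the nearest-rank percentile:

```typescript
// Nearest-rank percentile of the similarity scores of known-normal messages.
// Messages scoring below the returned value would be flagged as anomalous.
function percentileThreshold(similarities: number[], percentile = 1): number {
  if (similarities.length === 0) throw new Error('Need at least one score');
  if (percentile <= 0 || percentile > 100) {
    throw new RangeError('percentile must be in (0, 100]');
  }
  const sorted = [...similarities].sort((a, b) => a - b);
  const rank = Math.ceil((percentile / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}
```

The input scores would be `dotProduct(embedding, normalize(centroid))` for each entry in `normalMessages`, the same comparison `isAnomalous` makes for new messages.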
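```typescript
// Normalize the centroid once instead of on every call
const unitCentroid = normalize(centroid);

async function isAnomalous(message: string, threshold = 0.5): Promise<boolean> {
  const embedding = await cachedEmbed(message);
  const similarity = dotProduct(embedding, unitCentroid); // cosine, for unit vectors
  return similarity < threshold; // 0.5 is a placeholder; calibrate on your data
}
```

Recommendation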
```typescript
// Find similar items in a catalog.
// `getStoredEmbedding` and `vectorStore` are placeholders for your own data layer.
async function findSimilarProducts(productId: string, topK = 5) {
  const productEmbedding = await getStoredEmbedding(productId);
  const results = await vectorStore.query({
    vector: productEmbedding,
    topK: topK + 1, // +1 because the product is its own nearest neighbor
    filter: { type: 'product' },
  });
  return results.filter(r => r.id !== productId).slice(0, topK);
}
```

Production checklist
- Cache embeddings for static content — never re-embed the same text twice.
- Version your embedding model and dimension setting: changing either invalidates all stored vectors, because vectors from different models or sizes cannot be compared.
- Normalize vectors before storage for faster cosine similarity computation.
- Use batching for indexing — never call the embeddings API in a loop one-by-one.
- Pre-compute label embeddings for classification — they never change.
- Monitor embedding API latency separately from generation latency.
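One way to make the versioning rule enforceable, sketched under the assumption that each index records the model and dimension it was built with (`EmbeddingVersion`, `indexName`, and `assertCompatible` are hypothetical names): derive the index name from the version, and refuse to query an index built with a different one.

```typescript
interface EmbeddingVersion {
  model: string;      // e.g. 'text-embedding-3-small'
  dimensions: number; // e.g. 256
}

// Derive a stable name, e.g. 'docs__text-embedding-3-small__256'
function indexName(base: string, v: EmbeddingVersion): string {
  return `${base}__${v.model}__${v.dimensions}`;
}

// Fail fast instead of silently comparing incompatible vectors
function assertCompatible(index: EmbeddingVersion, query: EmbeddingVersion): void {
  if (index.model !== query.model || index.dimensions !== query.dimensions) {
    throw new Error(
      `Index built with ${index.model}@${index.dimensions}, ` +
        `query uses ${query.model}@${query.dimensions}: re-index before switching`,
    );
  }
}
```

Switching models or dimensions then means building a new index under a new name and cutting over once it is fully populated.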
Takeaway
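Use text-embedding-3-small as your default and reduce `dimensions` (for example to 256 or 512) when storage matters. OpenAI's published MTEB results show it still edging out ada-002 at 512 dimensions; smaller sizes trade away a little more accuracy, so check the size you pick against your own data. Cache aggressively, batch your indexing pipeline, and never change your embedding model mid-project without re-indexing everything.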