Anthropic Claude API Guide: Messages, Tool Use, Vision, and Cost Control
Claude's API offers a different set of strengths from GPT-4o: a 200k-token context window, strong instruction-following, and extended thinking for complex reasoning. This guide covers the core API surface with production-oriented patterns.
Setup
npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
maxRetries: 3,
timeout: 60_000,
});
Model selection
| Model | Context | Best for | Relative cost |
|---|---|---|---|
| claude-opus-4-5 | 200k | Complex reasoning, research, long docs | $$$$ |
| claude-sonnet-4-5 | 200k | Balanced — coding, analysis, agents | $$ |
| claude-haiku-4-5 | 200k | Speed-critical, high-volume tasks | $ |
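One way to keep model choice in a single place is a small lookup keyed by workload. The sketch below is only an illustration of how an application might encode the table above: the `Workload` names and the `pickModel` helper are hypothetical, and only the model IDs come from the table.

```typescript
// Hypothetical workload categories; only the model IDs come from the table above.
type Workload = 'deep-reasoning' | 'balanced' | 'high-volume';

const MODEL_BY_WORKLOAD: Record<Workload, string> = {
  'deep-reasoning': 'claude-opus-4-5',
  'balanced': 'claude-sonnet-4-5',
  'high-volume': 'claude-haiku-4-5',
};

// Returns the model ID for a workload, defaulting to the balanced tier.
function pickModel(workload: Workload = 'balanced'): string {
  return MODEL_BY_WORKLOAD[workload];
}
```

Calling `pickModel('high-volume')` returns the Haiku model ID, so a change of tier only touches the lookup rather than every call site.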
Basic message
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
system: 'You are a helpful engineering assistant focused on practical examples.',
messages: [
{ role: 'user', content: 'Explain database indexing in simple terms.' }
],
});
console.log(message.content[0].type === 'text' ? message.content[0].text : '');
console.log('Input tokens:', message.usage.input_tokens);
console.log('Output tokens:', message.usage.output_tokens);
System prompt best practices
// Claude responds strongly to role definition and constraints
const systemPrompt = `You are an expert database consultant with 15 years of PostgreSQL experience.

Your communication style:
- Lead with the most actionable recommendation
- Use concrete examples with actual SQL
- Flag trade-offs explicitly
- Mention performance implications when relevant

Constraints:
- Never recommend deprecated features
- Always note when a suggestion requires a specific PostgreSQL version
- If you're uncertain about something, say so explicitly`;
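The prompt above has three parts: a role, a communication style, and constraints. If several prompts share that shape, it may help to build them from structured data. This is a minimal sketch under that assumption; `PromptSpec` and `buildSystemPrompt` are hypothetical names, not part of the SDK.

```typescript
// Hypothetical shape for the parts of a system prompt; not an SDK type.
interface PromptSpec {
  role: string;
  style: string[];
  constraints: string[];
}

// Joins the parts into one string, one bullet per style rule and constraint.
function buildSystemPrompt(spec: PromptSpec): string {
  const bullets = (items: string[]) => items.map((item) => `- ${item}`).join('\n');
  return [
    spec.role,
    '',
    'Your communication style:',
    bullets(spec.style),
    '',
    'Constraints:',
    bullets(spec.constraints),
  ].join('\n');
}

const dbConsultantPrompt = buildSystemPrompt({
  role: 'You are an expert database consultant with 15 years of PostgreSQL experience.',
  style: ['Lead with the most actionable recommendation', 'Flag trade-offs explicitly'],
  constraints: ['Never recommend deprecated features'],
});
```

The resulting string can be passed as the `system` parameter in the same way as a hand-written prompt.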
Multi-turn conversation
// Messages must alternate user/assistant
const conversationHistory: Anthropic.MessageParam[] = [];
async function chat(userMessage: string): Promise<string> {
conversationHistory.push({ role: 'user', content: userMessage });
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
system: systemPrompt,
messages: conversationHistory,
});
const assistantMessage = response.content[0].type === 'text' ? response.content[0].text : '';
conversationHistory.push({ role: 'assistant', content: response.content });
return assistantMessage;
}
await chat('What index type should I use for a text search column?');
await chat('How does that compare to a full-text search index?');
const answer = await chat('Which would you recommend for a 10M row table?');
Vision (image analysis)
import fs from 'fs';
// Base64 image
const imageData = fs.readFileSync('diagram.png').toString('base64');
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
messages: [{
role: 'user',
content: [
{
type: 'image',
source: {
type: 'base64',
media_type: 'image/png',
data: imageData,
},
},
{
type: 'text',
text: 'Analyze this system architecture diagram and identify potential single points of failure.',
},
],
}],
});
// URL-based image
const response2 = await anthropic.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
messages: [{
role: 'user',
content: [
{ type: 'image', source: { type: 'url', url: 'https://example.com/chart.png' } },
{ type: 'text', text: 'What trend does this chart show?' },
],
}],
});
Extended thinking (Claude 3.7+)
// Extended thinking lets Claude reason through complex problems before answering
const response = await anthropic.messages.create({
model: 'claude-opus-4-5',
max_tokens: 16_000,
thinking: {
type: 'enabled',
budget_tokens: 10_000, // tokens allocated for internal reasoning
},
messages: [{
role: 'user',
content: `Design a distributed rate limiting system that works across 50
microservices with sub-millisecond overhead. Consider Redis, token buckets,
sliding windows, and eventual consistency trade-offs.`,
}],
});
// Response includes thinking blocks (optional to display)
for (const block of response.content) {
if (block.type === 'thinking') {
console.log('Claude thought:', block.thinking); // internal reasoning
}
if (block.type === 'text') {
console.log('Final answer:', block.text);
}
}
Streaming
const stream = await anthropic.messages.stream({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Write a detailed PostgreSQL performance tuning guide.' }],
});
for await (const event of stream) {
if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
process.stdout.write(event.delta.text);
}
}
const finalMessage = await stream.getFinalMessage();
console.log('\nTotal tokens:', finalMessage.usage.input_tokens + finalMessage.usage.output_tokens);
Prompt caching (cost optimization)
// Cache large stable prefixes — saves up to 90% on cached tokens
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 1024,
system: [
{
type: 'text',
text: largeStaticDocument, // e.g., full product docs
cache_control: { type: 'ephemeral' }, // cache this block
},
],
messages: [{ role: 'user', content: userQuestion }],
});
console.log('Cache read tokens (cheap):', response.usage.cache_read_input_tokens);
console.log('Cache write tokens (billed above the normal input rate):', response.usage.cache_creation_input_tokens);
Tool use pattern
const tools: Anthropic.Tool[] = [{
name: 'analyze_query',
description: 'Analyze a SQL query and return performance metrics',
input_schema: {
type: 'object' as const,
properties: {
query: { type: 'string', description: 'The SQL query to analyze' },
dialect: { type: 'string', enum: ['postgresql', 'mysql', 'sqlite'] },
},
required: ['query'],
},
}];
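// Placeholder dispatcher (hypothetical, not part of the SDK): replace the body with real tool implementations
async function executeToolByName(name: string, input: unknown): Promise<unknown> {
  throw new Error(`No implementation registered for tool: ${name}`);
}
async function runToolLoop(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [{ role: 'user', content: userMessage }];
  // Cap the number of round trips so a misbehaving tool cannot loop forever
  for (let step = 0; step < 5; step++) {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-5',
      max_tokens: 2048,
      tools,
      messages,
    });
    messages.push({ role: 'assistant', content: response.content });
    if (response.stop_reason !== 'tool_use') {
      // end_turn, max_tokens, and so on: return whatever text was produced
      const textBlock = response.content.find(b => b.type === 'text');
      return textBlock?.type === 'text' ? textBlock.text : '';
    }
    // Run every requested tool and send all results back in one user message
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type !== 'tool_use') continue;
      try {
        const result = await executeToolByName(block.name, block.input);
        toolResults.push({ type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result) });
      } catch (err) {
        // Report the failure to Claude instead of aborting the loop
        toolResults.push({ type: 'tool_result', tool_use_id: block.id, content: String(err), is_error: true });
      }
    }
    messages.push({ role: 'user', content: toolResults });
  }
  throw new Error('Tool loop did not finish within 5 steps');
}
Cost comparison vs OpenAI (June 2026)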
| Task | Best Claude model | Best OpenAI model |
|---|---|---|
| Long document analysis (>50k tokens) | claude-haiku-4-5 | gpt-4o-mini (higher cost) |
| Complex reasoning | claude-opus-4-5 + thinking | o3 |
| Coding tasks | claude-sonnet-4-5 | gpt-4o |
| High-volume classification | claude-haiku-4-5 | gpt-4o-mini |
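The tables in this guide give only relative cost. To turn the `usage` counts from the basic message example into a dollar estimate, something like the following sketch could work. `Pricing`, `TokenUsage`, and `estimateCostUsd` are hypothetical names, and the prices in the example are placeholders rather than published rates, so substitute the current figures for the model in use.

```typescript
// Per-million-token prices in USD, supplied by the caller. The numbers
// used in the example below are placeholders, not published rates.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Mirrors the two fields read from message.usage in the basic example.
interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
}

// Estimated cost in USD for one response.
function estimateCostUsd(usage: TokenUsage, pricing: Pricing): number {
  const inputCost = (usage.input_tokens / 1_000_000) * pricing.inputPerMTok;
  const outputCost = (usage.output_tokens / 1_000_000) * pricing.outputPerMTok;
  return inputCost + outputCost;
}

// Placeholder prices chosen only to make the arithmetic easy to follow.
const placeholderPricing: Pricing = { inputPerMTok: 2, outputPerMTok: 10 };
const cost = estimateCostUsd(
  { input_tokens: 500_000, output_tokens: 100_000 },
  placeholderPricing,
);
```

Logging this value alongside each request makes it easier to spot which task types dominate spend before deciding where a cheaper model is acceptable.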
Takeaway
Use Claude for long-document analysis (200k context), complex multi-step reasoning with extended thinking, and applications where consistent instruction-following is critical. Use prompt caching for repeated system prompts or large static documents: cached reads cost up to 90% less than regular input tokens.
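The 90% figure applies to cache reads, while the first request still pays to write the cache, so the realized saving depends on how often the prefix is reused. The sketch below works through that arithmetic. The two multipliers are assumptions: the read multiplier follows from the 90% figure, and the write premium should be checked against current pricing. All function names are hypothetical.

```typescript
// Assumed billing multipliers relative to the base input price.
// The read multiplier matches the 90% saving quoted above; the write
// premium is an assumption to verify against current pricing.
const CACHE_READ_MULTIPLIER = 0.1;
const CACHE_WRITE_MULTIPLIER = 1.25;

// Relative input cost of sending the same prefix `requests` times,
// in units of "one prefix token at base price". Assumes the first request
// writes the cache and every later request reads it before it expires.
function prefixCostWithCache(prefixTokens: number, requests: number): number {
  if (requests <= 0) return 0;
  const write = prefixTokens * CACHE_WRITE_MULTIPLIER;
  const reads = prefixTokens * CACHE_READ_MULTIPLIER * (requests - 1);
  return write + reads;
}

// Same workload with every request paying the full input price.
function prefixCostWithoutCache(prefixTokens: number, requests: number): number {
  return prefixTokens * Math.max(requests, 0);
}

// Fraction of prefix cost saved by caching; negative when caching
// costs more than it saves (for example, a single request).
function cachingSavings(prefixTokens: number, requests: number): number {
  const without = prefixCostWithoutCache(prefixTokens, requests);
  if (without === 0) return 0;
  return 1 - prefixCostWithCache(prefixTokens, requests) / without;
}
```

Under these assumptions a prefix used once costs 25% more with caching, while the same prefix reused across 100 requests saves about 89%, approaching 90% as reuse grows.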