Anthropic Claude API Guide: Messages, Tool Use, Vision, and Cost Control

Claude's API offers a different set of strengths from GPT-4o: a 200k-token context window, strong instruction-following, and extended thinking for complex reasoning. This guide walks through the main API features with patterns you can adapt for production.

Setup

npm install @anthropic-ai/sdk

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  maxRetries: 3,
  timeout: 60_000,
});

Model selection

Model	Context	Best for	Relative cost
claude-opus-4-5	200k	Complex reasoning, research, long docs	$$$$
claude-sonnet-4-5	200k	Balanced — coding, analysis, agents	$$
claude-haiku-4-5	200k	Speed-critical, high-volume tasks	$

Basic message

const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  system: 'You are a helpful engineering assistant focused on practical examples.',
  messages: [
    { role: 'user', content: 'Explain database indexing in simple terms.' }
  ],
});

console.log(message.content[0].type === 'text' ? message.content[0].text : '');
console.log('Input tokens:', message.usage.input_tokens);
console.log('Output tokens:', message.usage.output_tokens);
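The snippet above reads only the first content block. A response can contain several blocks (for example thinking or tool-use blocks alongside text), so a small helper that joins every text block may be more robust. The following is a minimal sketch; `ContentBlockLike` and `extractText` are hypothetical names, and the local types only approximate the SDK's content block shapes.

```typescript
// Local stand-ins for the SDK's content block types (assumption: only the
// fields needed here are modeled).
type TextBlockLike = { type: 'text'; text: string };
type NonTextBlockLike = { type: 'tool_use' | 'thinking' };
type ContentBlockLike = TextBlockLike | NonTextBlockLike;

// Concatenate the text of every text block, skipping all other block types.
function extractText(content: ContentBlockLike[]): string {
  return content
    .filter((block): block is TextBlockLike => block.type === 'text')
    .map((block) => block.text)
    .join('');
}
```

With the SDK, something like `extractText(message.content)` should work wherever the block shapes line up with these local types.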

System prompt best practices

// Claude responds strongly to role definition and constraints
const systemPrompt = `You are an expert database consultant with 15 years of PostgreSQL experience.

Your communication style:
- Lead with the most actionable recommendation
- Use concrete examples with actual SQL
- Flag trade-offs explicitly
- Mention performance implications when relevant

Constraints:
- Never recommend deprecated features
- Always note when a suggestion requires a specific PostgreSQL version
- If you're uncertain about something, say so explicitly`;

Multi-turn conversation

// Messages must alternate user/assistant
const conversationHistory: Anthropic.MessageParam[] = [];

async function chat(userMessage: string): Promise<string> {
  conversationHistory.push({ role: 'user', content: userMessage });

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    system: systemPrompt,
    messages: conversationHistory,
  });

  const assistantMessage = response.content[0].type === 'text' ? response.content[0].text : '';
  conversationHistory.push({ role: 'assistant', content: response.content });

  return assistantMessage;
}

await chat('What index type should I use for a text search column?');
await chat('How does that compare to a full-text search index?');
const answer = await chat('Which would you recommend for a 10M row table?');
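Because `conversationHistory` grows with every call, long sessions eventually become expensive or exceed the context window. One simple mitigation is to keep only the most recent messages. The sketch below is one way to do that under stated assumptions: `Turn` and `trimHistory` are hypothetical names, message content is modeled as a plain string, and trimming is by message count rather than by tokens.

```typescript
type Turn = { role: 'user' | 'assistant'; content: string };

// Keep at most `maxMessages` of the most recent messages, then drop any
// leading assistant messages so the result still begins with a user turn.
function trimHistory(history: Turn[], maxMessages: number): Turn[] {
  // slice(-0) would return the whole array, so handle non-positive limits explicitly.
  const recent = maxMessages > 0 ? history.slice(-maxMessages) : [];
  const firstUser = recent.findIndex((turn) => turn.role === 'user');
  return firstUser === -1 ? [] : recent.slice(firstUser);
}
```

Trimming by count is crude. A token-based budget would track cost more closely, at the price of needing a token count for each message.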

Vision (image analysis)

import fs from 'fs';

// Base64 image
const imageData = fs.readFileSync('diagram.png').toString('base64');

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [{
    role: 'user',
    content: [
      {
        type: 'image',
        source: {
          type: 'base64',
          media_type: 'image/png',
          data: imageData,
        },
      },
      {
        type: 'text',
        text: 'Analyze this system architecture diagram and identify potential single points of failure.',
      },
    ],
  }],
});

// URL-based image
const response2 = await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [{
    role: 'user',
    content: [
      { type: 'image', source: { type: 'url', url: 'https://example.com/chart.png' } },
      { type: 'text', text: 'What trend does this chart show?' },
    ],
  }],
});
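The base64 example hardcodes `media_type: 'image/png'`. When file names vary, the media type can be derived from the extension. This is a small sketch, assuming the four image formats the API documents as supported (JPEG, PNG, GIF, WebP); `mediaTypeFor` is a hypothetical helper, not part of the SDK.

```typescript
type ImageMediaType = 'image/jpeg' | 'image/png' | 'image/gif' | 'image/webp';

// Extension-to-media-type lookup. A Map avoids accidental hits on
// Object.prototype keys such as "constructor".
const MEDIA_TYPES = new Map<string, ImageMediaType>();
MEDIA_TYPES.set('jpg', 'image/jpeg');
MEDIA_TYPES.set('jpeg', 'image/jpeg');
MEDIA_TYPES.set('png', 'image/png');
MEDIA_TYPES.set('gif', 'image/gif');
MEDIA_TYPES.set('webp', 'image/webp');

// Return the media type for a file path, or throw for unknown extensions.
function mediaTypeFor(filePath: string): ImageMediaType {
  const dot = filePath.lastIndexOf('.');
  const ext = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  const mediaType = MEDIA_TYPES.get(ext);
  if (mediaType === undefined) {
    throw new Error(`Unsupported image extension: "${ext}"`);
  }
  return mediaType;
}
```

The result can be passed as the `media_type` field of a base64 image source, for example `media_type: mediaTypeFor('diagram.png')`.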

Extended thinking (Claude 3.7+)

// Extended thinking lets Claude reason through complex problems before answering
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 16_000,
  thinking: {
    type: 'enabled',
    budget_tokens: 10_000,  // tokens allocated for internal reasoning
  },
  messages: [{
    role: 'user',
    content: `Design a distributed rate limiting system that works across 50 
    microservices with sub-millisecond overhead. Consider Redis, token buckets, 
    sliding windows, and eventual consistency trade-offs.`,
  }],
});

// Response includes thinking blocks (optional to display)
for (const block of response.content) {
  if (block.type === 'thinking') {
    console.log('Claude thought:', block.thinking);  // internal reasoning
  }
  if (block.type === 'text') {
    console.log('Final answer:', block.text);
  }
}
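In the request above, `budget_tokens` (10,000) is smaller than `max_tokens` (16,000), which leaves room for the final answer after the reasoning. A small guard can catch misconfigured budgets before a request is sent. This is a sketch under two assumptions that should be checked against the current documentation: a minimum budget of 1,024 tokens, and a budget strictly below `max_tokens`. `buildThinkingConfig` is a hypothetical name.

```typescript
type ThinkingConfig = { type: 'enabled'; budget_tokens: number };

// Assumed minimum thinking budget; verify against the current API docs.
const MIN_THINKING_BUDGET = 1024;

// Build a thinking config, rejecting budgets that are too small or that
// would leave no tokens for the final answer.
function buildThinkingConfig(maxTokens: number, budgetTokens: number): ThinkingConfig {
  if (!Number.isInteger(budgetTokens) || budgetTokens < MIN_THINKING_BUDGET) {
    throw new Error(`budget_tokens must be an integer >= ${MIN_THINKING_BUDGET}`);
  }
  if (budgetTokens >= maxTokens) {
    throw new Error('budget_tokens must be smaller than max_tokens');
  }
  return { type: 'enabled', budget_tokens: budgetTokens };
}
```

Usage would look like `thinking: buildThinkingConfig(16_000, 10_000)` in the request body.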

Streaming

const stream = anthropic.messages.stream({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Write a detailed PostgreSQL performance tuning guide.' }],
});

for await (const event of stream) {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    process.stdout.write(event.delta.text);
  }
}

// The TypeScript SDK exposes finalMessage() on the stream helper
const finalMessage = await stream.finalMessage();
console.log('\nTotal tokens:', finalMessage.usage.input_tokens + finalMessage.usage.output_tokens);
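The loop above filters for `text_delta` events inline. If the same check is needed in several places (for example to print chunks and also keep the full text), it can be factored into a pure function. The sketch below models only a subset of event shapes with local types; `StreamEventLike`, `textFromEvent`, and `accumulateText` are hypothetical names, not SDK exports.

```typescript
// Simplified stand-in for the SDK's stream event union (assumption: only
// the variants relevant to text extraction are modeled in detail).
type StreamEventLike =
  | {
      type: 'content_block_delta';
      delta:
        | { type: 'text_delta'; text: string }
        | { type: 'input_json_delta'; partial_json: string };
    }
  | { type: 'message_start' | 'content_block_start' | 'content_block_stop' | 'message_stop' };

// Return the text carried by an event, or '' for events without text.
function textFromEvent(event: StreamEventLike): string {
  if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
    return event.delta.text;
  }
  return '';
}

// Fold a sequence of already-received events into the full response text.
function accumulateText(events: StreamEventLike[]): string {
  return events.reduce((text, event) => text + textFromEvent(event), '');
}
```

Inside a `for await` loop, the per-event function can be used directly, for example `fullText += textFromEvent(event)`.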

Prompt caching (cost optimization)

// Cache large stable prefixes; tokens read from the cache cost up to 90% less than regular input tokens
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5',
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: largeStaticDocument,  // e.g., full product docs
      cache_control: { type: 'ephemeral' },  // cache this block
    },
  ],
  messages: [{ role: 'user', content: userQuestion }],
});

console.log('Cache read tokens (discounted):', response.usage.cache_read_input_tokens);
console.log('Cache write tokens (billed at a premium over base input):', response.usage.cache_creation_input_tokens);
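To see what caching saves on a given request, the usage fields can be turned into an estimated input cost. The following is a rough sketch, not a billing calculator. The multipliers (cache reads at 0.1x and cache writes at 1.25x the base input price) and the example price of $3 per million tokens are assumptions to verify against the current pricing page. It also assumes `input_tokens` excludes cached tokens. `estimateInputCostUsd` is a hypothetical name.

```typescript
type UsageLike = {
  input_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
};

// Assumed pricing multipliers relative to the base input price.
const CACHE_READ_MULTIPLIER = 0.1;
const CACHE_WRITE_MULTIPLIER = 1.25;

// Estimate the input-side cost in USD for one request.
// basePricePerMTok: base input price in USD per million tokens.
function estimateInputCostUsd(usage: UsageLike, basePricePerMTok: number): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const write = usage.cache_creation_input_tokens ?? 0;
  const weightedTokens =
    usage.input_tokens + read * CACHE_READ_MULTIPLIER + write * CACHE_WRITE_MULTIPLIER;
  return (weightedTokens / 1_000_000) * basePricePerMTok;
}

// Example: a 10,000-token document served from cache vs. sent uncached,
// at an assumed base price of $3 per million input tokens.
const cachedCost = estimateInputCostUsd(
  { input_tokens: 100, cache_read_input_tokens: 10_000, cache_creation_input_tokens: 0 },
  3,
);
const uncachedCost = estimateInputCostUsd({ input_tokens: 10_100 }, 3);
console.log({ cachedCost, uncachedCost });
```

Under these assumptions, the first request that writes the cache costs slightly more than an uncached one, so caching pays off only when the same prefix is reused.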

Tool use pattern

const tools: Anthropic.Tool[] = [{
  name: 'analyze_query',
  description: 'Analyze a SQL query and return performance metrics',
  input_schema: {
    type: 'object' as const,
    properties: {
      query:   { type: 'string', description: 'The SQL query to analyze' },
      dialect: { type: 'string', enum: ['postgresql', 'mysql', 'sqlite'] },
    },
    required: ['query'],
  },
}];

async function runToolLoop(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [{ role: 'user', content: userMessage }];

  for (let step = 0; step < 5; step++) {
    const response = await anthropic.messages.create({
      model: 'claude-sonnet-4-5',
      max_tokens: 2048,
      tools,
      messages,
    });

    messages.push({ role: 'assistant', content: response.content });

    if (response.stop_reason === 'end_turn') {
      const textBlock = response.content.find(b => b.type === 'text');
      return textBlock?.type === 'text' ? textBlock.text : '';
    }

    if (response.stop_reason === 'tool_use') {
      const toolUseBlocks = response.content.filter(b => b.type === 'tool_use');
      const toolResults: Anthropic.ToolResultBlockParam[] = [];

      for (const block of toolUseBlocks) {
        if (block.type === 'tool_use') {
          const result = await executeToolByName(block.name, block.input);
          toolResults.push({ type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result) });
        }
      }

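      messages.push({ role: 'user', content: toolResults });
      continue;
    }

    // Any other stop reason (e.g. 'max_tokens') cannot be continued safely here
    throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
  }

  // Avoid silently returning undefined when the step limit is reached
  throw new Error('Tool loop exceeded 5 steps without a final answer');
}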
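The loop calls `executeToolByName`, which this guide does not define. One possible shape is a registry that maps tool names to handlers and returns errors as data, so the model can see a failure and react to it. This is a sketch only: the `analyze_query` handler returns placeholder metrics rather than real query analysis, and `ToolHandler` and `toolHandlers` are hypothetical names.

```typescript
type ToolHandler = (input: unknown) => Promise<unknown> | unknown;

// Registry of tool implementations, keyed by the tool name sent to the API.
const toolHandlers = new Map<string, ToolHandler>();

toolHandlers.set('analyze_query', (input) => {
  const { query, dialect = 'postgresql' } = input as { query?: unknown; dialect?: string };
  if (typeof query !== 'string') {
    throw new Error('query must be a string');
  }
  // Placeholder metrics; a real handler might run EXPLAIN against a database.
  return { dialect, usesSelectStar: /select\s+\*/i.test(query) };
});

// Dispatch a tool call by name. Failures are returned as { error } objects
// instead of thrown, so they can be sent back as tool results.
async function executeToolByName(name: string, input: unknown): Promise<unknown> {
  const handler = toolHandlers.get(name);
  if (!handler) {
    return { error: `Unknown tool: ${name}` };
  }
  try {
    return await handler(input);
  } catch (err) {
    return { error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning errors as data keeps the loop running when a tool fails. Whether to also set an error flag on the tool result is a design choice left to the caller.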

Cost comparison vs OpenAI (June 2026)

Task	Best Claude model	Best OpenAI model
Long document analysis (>50k tokens)	claude-haiku-4-5	gpt-4o-mini (higher cost)
Complex reasoning	claude-opus-4-5 + thinking	o3
Coding tasks	claude-sonnet-4-5	gpt-4o
High-volume classification	claude-haiku-4-5	gpt-4o-mini

Takeaway

Use Claude for long-document analysis (200k context), complex multi-step reasoning with extended thinking, and applications where consistent instruction-following is critical. Use prompt caching for repeated system prompts or large static documents: tokens read from the cache cost up to 90% less than regular input tokens, though cache writes carry a small premium.