Understanding the LLM Client
The LLM client is the core of RANA's interaction with AI models. This lesson covers configuration, provider switching, and advanced options for optimizing your AI applications.
The LLMClient Class
RANA provides a unified client that abstracts away provider differences:
```typescript
import { LLMClient } from '@rana/core';

// Create a client with default settings
const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  apiKey: process.env.ANTHROPIC_API_KEY
});

// Or use auto-detection based on environment variables
const autoClient = LLMClient.auto();
```
Configuration Options
The client accepts numerous configuration options:
```typescript
const client = new LLMClient({
  // Required
  provider: 'anthropic' | 'openai' | 'google' | 'azure',
  model: string,

  // Authentication
  apiKey: string,
  baseURL?: string,       // For custom endpoints
  organization?: string,  // OpenAI org ID

  // Request defaults
  temperature?: number,   // 0.0 - 1.0
  maxTokens?: number,     // Max response tokens
  topP?: number,          // Nucleus sampling
  topK?: number,          // Top-k sampling

  // Timeouts and retries
  timeout?: number,       // Request timeout (ms)
  maxRetries?: number,    // Auto-retry count
  retryDelay?: number,    // Delay between retries

  // Advanced
  stream?: boolean,       // Default streaming mode
  cache?: CacheConfig,    // Response caching
  rateLimit?: RateLimitConfig
});
```
Making Requests
Basic Chat Completion
```typescript
const response = await client.chat({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing.' }
  ]
});

console.log(response.content);
console.log(response.usage); // { inputTokens, outputTokens, totalTokens }
```
Streaming Responses
```typescript
const stream = client.stream({
  messages: [
    { role: 'user', content: 'Write a short story about robots.' }
  ]
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

// Or collect all chunks
const fullResponse = await stream.collect();
```
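Streaming also makes early termination straightforward: if the user cancels or you have seen enough output, simply stop consuming the stream. The sketch below is an assumption about behavior, it relies on exiting the `for await` loop (which calls the iterator's `return()`) being enough to end the underlying request; check whether RANA exposes an explicit abort method if you need a stronger guarantee.

```typescript
// Early termination sketch: stop consuming once enough output has arrived.
const story = client.stream({
  messages: [{ role: 'user', content: 'Write a very long story about robots.' }]
});

let received = '';
for await (const chunk of story) {
  received += chunk.content;
  if (received.length > 500) break; // exits the loop and (assumed) ends the stream
}
```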
With Tool Calling
```typescript
const response = await client.chat({
  messages: [
    { role: 'user', content: 'What is the weather in Tokyo?' }
  ],
  tools: [{
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' }
      },
      required: ['location']
    }
  }]
});

if (response.toolCalls) {
  for (const call of response.toolCalls) {
    console.log(call.name, call.arguments);
  }
}
```
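The model only requests the tool; your code executes it and sends the result back so the model can produce a final answer. The exact shape of a tool-result message is not shown above, so the sketch below is an assumption: the `getWeather` helper is invented for illustration, and the `role: 'tool'` message with a `toolCallId` field may differ from RANA's actual API.

```typescript
// Hypothetical follow-up turn: run each requested tool, then return the
// results to the model. getWeather and the tool-message shape are assumptions.
async function getWeather(location: string): Promise<string> {
  return `Sunny, 22°C in ${location}`; // stand-in for a real weather lookup
}

if (response.toolCalls) {
  const toolMessages = await Promise.all(
    response.toolCalls.map(async (call) => ({
      role: 'tool',
      toolCallId: call.id, // assumed field name
      content: await getWeather(call.arguments.location)
    }))
  );

  // Send the conversation plus the tool results back for a final answer
  const followUp = await client.chat({
    messages: [
      { role: 'user', content: 'What is the weather in Tokyo?' },
      { role: 'assistant', content: response.content, toolCalls: response.toolCalls },
      ...toolMessages
    ]
  });
  console.log(followUp.content);
}
```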
Provider-Specific Features
Anthropic-Specific
```typescript
const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  // Anthropic-specific options
  anthropicBeta: ['prompt-caching-2024-07-31'],
  cacheControl: true
});

// Use prompt caching for repeated system prompts
const response = await client.chat({
  messages: [
    {
      role: 'system',
      content: longSystemPrompt,
      cacheControl: { type: 'ephemeral' }
    },
    { role: 'user', content: 'User query here' }
  ]
});
```
OpenAI-Specific
```typescript
const client = new LLMClient({
  provider: 'openai',
  model: 'gpt-4o',
  // OpenAI-specific options
  organization: 'org-xxxxx',
  responseFormat: { type: 'json_object' }
});

// Use JSON mode
const response = await client.chat({
  messages: [
    { role: 'user', content: 'Return a JSON object with name and age' }
  ],
  responseFormat: { type: 'json_object' }
});
```
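JSON mode constrains the model to emit valid JSON, but the client still returns it as text, so you parse it yourself. A minimal sketch, assuming `response.content` holds the raw JSON string; the `Person` shape is just what the prompt above asks for:

```typescript
// Parse and validate the JSON-mode output
interface Person {
  name: string;
  age: number;
}

let person: Person;
try {
  person = JSON.parse(response.content) as Person;
} catch {
  throw new Error('Model returned malformed JSON: ' + response.content);
}

console.log(person.name, person.age);
```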
Error Handling
```typescript
import { LLMError, RateLimitError, AuthError } from '@rana/core';

// Small helper; this example assumes retryAfter is in milliseconds
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

try {
  const response = await client.chat({ messages });
} catch (error) {
  if (error instanceof RateLimitError) {
    // Wait and retry once
    await delay(error.retryAfter);
    return client.chat({ messages });
  }
  if (error instanceof AuthError) {
    console.error('Check your API key');
  }
  if (error instanceof LLMError) {
    console.error('LLM error:', error.code, error.message);
  }
  throw error;
}
```
Caching Responses
Enable caching to reduce costs and latency for repeated queries:
```typescript
const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  cache: {
    enabled: true,
    ttl: 3600,            // 1 hour
    storage: 'redis',     // or 'memory', 'file'
    keyPrefix: 'llm-cache:'
  }
});

// Responses are automatically cached
const response1 = await client.chat({ messages });
const response2 = await client.chat({ messages }); // Served from the cache
```
Rate Limiting
Built-in client-side rate limiting keeps requests under provider quotas so you avoid throttling errors:
```typescript
const client = new LLMClient({
  provider: 'openai',
  model: 'gpt-4o',
  rateLimit: {
    requestsPerMinute: 60,
    tokensPerMinute: 90000,
    strategy: 'sliding-window' // or 'fixed-window'
  }
});

// Requests are automatically queued and throttled
```
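From the caller's side nothing changes: you issue requests as usual and the client schedules them under the configured limits. A small sketch, assuming each `client.chat` call simply resolves once the limiter releases it:

```typescript
// Fire a batch of requests; the rate limiter queues them so the configured
// requests-per-minute and tokens-per-minute limits are not exceeded.
const topics = ['vector databases', 'prompt caching', 'tool calling'];

const summaries = await Promise.all(
  topics.map((topic) =>
    client.chat({
      messages: [{ role: 'user', content: `Summarize ${topic} in one sentence.` }]
    })
  )
);

summaries.forEach((res, i) => console.log(topics[i], '->', res.content));
```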
Switching Providers
RANA makes it easy to switch between providers:
```typescript
// Factory function for easy switching
function createClient(provider: 'anthropic' | 'openai' | 'google') {
  const configs = {
    anthropic: {
      provider: 'anthropic',
      model: 'claude-sonnet-4-20250514',
      apiKey: process.env.ANTHROPIC_API_KEY
    },
    openai: {
      provider: 'openai',
      model: 'gpt-4o',
      apiKey: process.env.OPENAI_API_KEY
    },
    google: {
      provider: 'google',
      model: 'gemini-pro',
      apiKey: process.env.GOOGLE_API_KEY
    }
  } as const; // keep provider and model as literal types

  return new LLMClient(configs[provider]);
}

// Switch based on env or config
const client = createClient(process.env.LLM_PROVIDER as any);
```
Best Practices
- Reuse clients - Create one client instance and reuse it rather than creating new ones for each request
- Set appropriate timeouts - Longer for complex queries, shorter for simple ones
- Use streaming for long responses - Better UX and allows early termination
- Enable caching for repeated queries - Significant cost savings for common requests
- Handle errors gracefully - Implement proper retry logic with exponential backoff (see the sketch below)
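The `maxRetries` option already retries failed requests for you; when you manage retries yourself, back off exponentially so a struggling endpoint is not hammered. A minimal sketch, reusing only the `client.chat` and `RateLimitError` APIs shown earlier; `chatWithBackoff` is an illustrative helper, not part of RANA:

```typescript
import { RateLimitError } from '@rana/core';

// Retry a chat request with exponential backoff (1s, 2s, 4s, ...).
async function chatWithBackoff(
  messages: { role: string; content: string }[],
  maxAttempts = 4
) {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.chat({ messages });
    } catch (error) {
      lastError = error;
      // Prefer the server-suggested wait (assumed to be in ms) for rate
      // limits, otherwise back off exponentially
      const wait =
        error instanceof RateLimitError && error.retryAfter
          ? error.retryAfter
          : 1000 * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
  throw lastError;
}
```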
What's Next?
Now that you understand the LLM client, the next lesson covers RANA's React hooks for building interactive AI interfaces.