Understanding the LLM Client
The LLM client is the core of RANA's interaction with AI models. This lesson covers configuration, provider switching, and advanced options for optimizing your AI applications.
The LLMClient Class
RANA provides a unified client that abstracts away provider differences:
```typescript
import { LLMClient } from '@rana/core';

// Create a client with default settings
const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  apiKey: process.env.ANTHROPIC_API_KEY
});

// Or use auto-detection based on environment variables
const autoClient = LLMClient.auto();
```
Configuration Options
The client accepts numerous configuration options:
```typescript
const client = new LLMClient({
  // Required
  provider: 'anthropic' | 'openai' | 'google' | 'azure',
  model: string,

  // Authentication
  apiKey: string,
  baseURL?: string,       // For custom endpoints
  organization?: string,  // OpenAI org ID

  // Request defaults
  temperature?: number,   // 0.0 - 1.0
  maxTokens?: number,     // Max response tokens
  topP?: number,          // Nucleus sampling
  topK?: number,          // Top-k sampling

  // Timeouts and retries
  timeout?: number,       // Request timeout (ms)
  maxRetries?: number,    // Auto-retry count
  retryDelay?: number,    // Delay between retries

  // Advanced
  stream?: boolean,       // Default streaming mode
  cache?: CacheConfig,    // Response caching
  rateLimit?: RateLimitConfig
});
```
Making Requests
Basic Chat Completion
```typescript
const response = await client.chat({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing.' }
  ]
});

console.log(response.content);
console.log(response.usage); // { inputTokens, outputTokens, totalTokens }
```
Streaming Responses
```typescript
const stream = client.stream({
  messages: [
    { role: 'user', content: 'Write a short story about robots.' }
  ]
});

for await (const chunk of stream) {
  process.stdout.write(chunk.content);
}

// Or collect all chunks
const fullResponse = await stream.collect();
```
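Streaming also makes early termination straightforward: if the user cancels or you have seen enough output, simply stop consuming the stream. The sketch below is an assumption about behavior, it relies on exiting the `for await` loop (which calls the iterator's `return()`) being enough to end the underlying request; check whether RANA exposes an explicit abort method if you need a stronger guarantee.

```typescript
// Early termination sketch: stop consuming once enough output has arrived.
const story = client.stream({
  messages: [{ role: 'user', content: 'Write a very long story about robots.' }]
});

let received = '';
for await (const chunk of story) {
  received += chunk.content;
  if (received.length > 500) break; // exits the loop and (assumed) ends the stream
}
```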
With Tool Calling
```typescript
const response = await client.chat({
  messages: [
    { role: 'user', content: 'What is the weather in Tokyo?' }
  ],
  tools: [{
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' }
      },
      required: ['location']
    }
  }]
});

if (response.toolCalls) {
  for (const call of response.toolCalls) {
    console.log(call.name, call.arguments);
  }
}
```
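The model only requests the tool; your code executes it and sends the result back so the model can produce a final answer. The exact shape of a tool-result message is not shown above, so the sketch below is an assumption: the `getWeather` helper is invented for illustration, and the `role: 'tool'` message with a `toolCallId` field may differ from RANA's actual API.

```typescript
// Hypothetical follow-up turn: run each requested tool, then return the
// results to the model. getWeather and the tool-message shape are assumptions.
async function getWeather(location: string): Promise<string> {
  return `Sunny, 22°C in ${location}`; // stand-in for a real weather lookup
}

if (response.toolCalls) {
  const toolMessages = await Promise.all(
    response.toolCalls.map(async (call) => ({
      role: 'tool',
      toolCallId: call.id, // assumed field name
      content: await getWeather(call.arguments.location)
    }))
  );

  // Send the conversation plus the tool results back for a final answer
  const followUp = await client.chat({
    messages: [
      { role: 'user', content: 'What is the weather in Tokyo?' },
      { role: 'assistant', content: response.content, toolCalls: response.toolCalls },
      ...toolMessages
    ]
  });
  console.log(followUp.content);
}
```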
Provider-Specific Features
Anthropic-Specific
```typescript
const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  // Anthropic-specific options
  anthropicBeta: ['prompt-caching-2024-07-31'],
  cacheControl: true
});

// Use prompt caching for repeated system prompts
const response = await client.chat({
  messages: [
    {
      role: 'system',
      content: longSystemPrompt,
      cacheControl: { type: 'ephemeral' }
    },
    { role: 'user', content: 'User query here' }
  ]
});
```
OpenAI-Specific
```typescript
const client = new LLMClient({
  provider: 'openai',
  model: 'gpt-4o',
  // OpenAI-specific options
  organization: 'org-xxxxx',
  responseFormat: { type: 'json_object' }
});

// Use JSON mode
const response = await client.chat({
  messages: [
    { role: 'user', content: 'Return a JSON object with name and age' }
  ],
  responseFormat: { type: 'json_object' }
});
```
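JSON mode constrains the model to emit valid JSON, but the client still returns it as text, so you parse it yourself. A minimal sketch, assuming `response.content` holds the raw JSON string; the `Person` shape is just what the prompt above asks for:

```typescript
// Parse and validate the JSON-mode output
interface Person {
  name: string;
  age: number;
}

let person: Person;
try {
  person = JSON.parse(response.content) as Person;
} catch {
  throw new Error('Model returned malformed JSON: ' + response.content);
}

console.log(person.name, person.age);
```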
Error Handling
```typescript
import { LLMError, RateLimitError, AuthError } from '@rana/core';

// Small helper; this example assumes retryAfter is in milliseconds
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

try {
  const response = await client.chat({ messages });
} catch (error) {
  if (error instanceof RateLimitError) {
    // Wait and retry once
    await delay(error.retryAfter);
    return client.chat({ messages });
  }
  if (error instanceof AuthError) {
    console.error('Check your API key');
  }
  if (error instanceof LLMError) {
    console.error('LLM error:', error.code, error.message);
  }
  throw error;
}
```
Caching Responses
Enable caching to reduce costs and latency for repeated queries:
```typescript
const client = new LLMClient({
  provider: 'anthropic',
  model: 'claude-sonnet-4-20250514',
  cache: {
    enabled: true,
    ttl: 3600,            // 1 hour
    storage: 'redis',     // or 'memory', 'file'
    keyPrefix: 'llm-cache:'
  }
});

// Responses are automatically cached
const response1 = await client.chat({ messages });
const response2 = await client.chat({ messages }); // Served from the cache
```
Rate Limiting
Built-in client-side rate limiting keeps requests under provider quotas so you avoid throttling errors:
```typescript
const client = new LLMClient({
  provider: 'openai',
  model: 'gpt-4o',
  rateLimit: {
    requestsPerMinute: 60,
    tokensPerMinute: 90000,
    strategy: 'sliding-window' // or 'fixed-window'
  }
});

// Requests are automatically queued and throttled
```
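From the caller's side nothing changes: you issue requests as usual and the client schedules them under the configured limits. A small sketch, assuming each `client.chat` call simply resolves once the limiter releases it:

```typescript
// Fire a batch of requests; the rate limiter queues them so the configured
// requests-per-minute and tokens-per-minute limits are not exceeded.
const topics = ['vector databases', 'prompt caching', 'tool calling'];

const summaries = await Promise.all(
  topics.map((topic) =>
    client.chat({
      messages: [{ role: 'user', content: `Summarize ${topic} in one sentence.` }]
    })
  )
);

summaries.forEach((res, i) => console.log(topics[i], '->', res.content));
```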
Switching Providers
RANA makes it easy to switch between providers:
```typescript
// Factory function for easy switching
function createClient(provider: 'anthropic' | 'openai' | 'google') {
  const configs = {
    anthropic: {
      provider: 'anthropic',
      model: 'claude-sonnet-4-20250514',
      apiKey: process.env.ANTHROPIC_API_KEY
    },
    openai: {
      provider: 'openai',
      model: 'gpt-4o',
      apiKey: process.env.OPENAI_API_KEY
    },
    google: {
      provider: 'google',
      model: 'gemini-pro',
      apiKey: process.env.GOOGLE_API_KEY
    }
  } as const; // keep provider and model as literal types

  return new LLMClient(configs[provider]);
}

// Switch based on env or config
const client = createClient(process.env.LLM_PROVIDER as any);
```
Best Practices
- Reuse clients - Create one client instance and reuse it rather than creating new ones for each request
- Set appropriate timeouts - Longer for complex queries, shorter for simple ones
- Use streaming for long responses - Better UX and allows early termination
- Enable caching for repeated queries - Significant cost savings for common requests
- Handle errors gracefully - Implement proper retry logic with exponential backoff (see the sketch below)
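The `maxRetries` option already retries failed requests for you; when you manage retries yourself, back off exponentially so a struggling endpoint is not hammered. A minimal sketch, reusing only the `client.chat` and `RateLimitError` APIs shown earlier; `chatWithBackoff` is an illustrative helper, not part of RANA:

```typescript
import { RateLimitError } from '@rana/core';

// Retry a chat request with exponential backoff (1s, 2s, 4s, ...).
async function chatWithBackoff(
  messages: { role: string; content: string }[],
  maxAttempts = 4
) {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.chat({ messages });
    } catch (error) {
      lastError = error;
      // Prefer the server-suggested wait (assumed to be in ms) for rate
      // limits, otherwise back off exponentially
      const wait =
        error instanceof RateLimitError && error.retryAfter
          ? error.retryAfter
          : 1000 * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
  throw lastError;
}
```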
What's Next?
Now that you understand the LLM client, the next lesson covers RANA's React hooks for building interactive AI interfaces.