Reliability

Build resilient AI applications with automatic retries, circuit breakers, fallback chains, and health monitoring.

npm install @rana/core

Automatic Retry

Smart retry logic with exponential backoff and jitter

import { withRetry, RetryConfig } from '@rana/core';

const config: RetryConfig = {
  maxAttempts: 3,
  initialDelay: 1000,      // 1 second
  maxDelay: 30000,         // 30 seconds max
  backoffMultiplier: 2,
  jitter: true,            // Add randomness to prevent thundering herd
  retryOn: [
    'rate_limit',
    'timeout',
    'server_error'
  ]
};

// Wrap any async function
const result = await withRetry(
  () => chat('Hello, world!'),
  config
);

// Or use as middleware
const client = createClient({
  retry: config
});
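With this config, the delay between attempts grows as initialDelay × backoffMultiplier^attempt, capped at maxDelay, and jitter randomizes each delay so many clients don't retry in lockstep. A minimal sketch of that schedule, assuming "full jitter" (the `computeBackoffDelay` helper is illustrative, not part of @rana/core):

```typescript
// Illustrative helper showing the exponential backoff + jitter schedule.
// Not part of @rana/core -- shown only to make the delay math concrete.
interface BackoffOptions {
  initialDelay: number;      // base delay in ms
  maxDelay: number;          // upper bound in ms
  backoffMultiplier: number; // growth factor per attempt
  jitter: boolean;           // randomize to avoid thundering herd
}

function computeBackoffDelay(attempt: number, opts: BackoffOptions): number {
  // Exponential growth: initialDelay, initialDelay*m, initialDelay*m^2, ...
  const exponential = opts.initialDelay * Math.pow(opts.backoffMultiplier, attempt);
  const capped = Math.min(exponential, opts.maxDelay);
  // "Full jitter": pick a uniform delay in [0, capped].
  return opts.jitter ? Math.random() * capped : capped;
}
```

With jitter disabled and the config above, attempts 0, 1, 2 wait 1000ms, 2000ms, 4000ms; later attempts are clamped to maxDelay.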

Circuit Breaker

Prevent cascading failures with the circuit breaker pattern

import { CircuitBreaker } from '@rana/core';

const breaker = new CircuitBreaker({
  failureThreshold: 5,     // Open after 5 failures
  successThreshold: 2,     // Close after 2 successes
  timeout: 30000,          // Half-open after 30s
  volumeThreshold: 10      // Minimum requests before tripping
});

// Execute with circuit breaker
try {
  const result = await breaker.execute(
    () => externalAPICall()
  );
} catch (error) {
  if (error.name === 'CircuitOpenError') {
    // Circuit is open, use fallback
    return fallbackResponse();
  }
  throw error;
}

// Check circuit state
console.log(breaker.state);  // 'closed' | 'open' | 'half-open'
console.log(breaker.stats);  // { failures: 3, successes: 10, ... }

Fallback Chains

Chain multiple providers with automatic failover

import { FallbackChain } from '@rana/core';

const chain = new FallbackChain({
  providers: [
    {
      name: 'primary',
      provider: openAIClient,
      timeout: 10000
    },
    {
      name: 'secondary',
      provider: anthropicClient,
      timeout: 15000
    },
    {
      name: 'local',
      provider: localLLMClient,
      timeout: 30000
    }
  ],
  onFallback: (from, to, error) => {
    console.log(`Falling back from ${from} to ${to}: ${error.message}`);
  }
});

// Automatically tries each provider
const result = await chain.chat({
  messages: [{ role: 'user', content: 'Hello' }]
});

console.log(result.provider);  // Which provider succeeded
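Under the hood, a fallback chain amounts to trying each provider in order until one succeeds, invoking the fallback callback between attempts. A minimal sketch of that loop, with per-provider timeouts omitted (the `tryProviders` helper is illustrative, not part of @rana/core):

```typescript
// Illustrative sequential failover across an ordered list of providers.
interface ProviderEntry<T> {
  name: string;
  call: () => Promise<T>; // the provider invocation; timeout handling omitted
}

async function tryProviders<T>(
  providers: ProviderEntry<T>[],
  onFallback?: (from: string, to: string, error: Error) => void,
): Promise<{ provider: string; result: T }> {
  let lastError: Error | undefined;
  for (let i = 0; i < providers.length; i++) {
    try {
      const result = await providers[i].call();
      // Report which provider ultimately answered.
      return { provider: providers[i].name, result };
    } catch (error) {
      lastError = error as Error;
      const next = providers[i + 1];
      if (next && onFallback) onFallback(providers[i].name, next.name, lastError);
    }
  }
  // Every provider failed: surface the last error.
  throw lastError ?? new Error('no providers configured');
}
```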

Timeout Management

Configurable timeouts with graceful cancellation

import { withTimeout, TimeoutError } from '@rana/core';

// Simple timeout wrapper
try {
  const result = await withTimeout(
    longRunningOperation(),
    { timeout: 5000 }  // 5 seconds
  );
} catch (error) {
  if (error instanceof TimeoutError) {
    console.log('Operation timed out');
  }
}

// With cleanup on timeout
const result = await withTimeout(
  streamingChat(messages),
  {
    timeout: 30000,
    onTimeout: async (controller) => {
      await controller.abort();  // Clean up resources
    }
  }
);
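A timeout wrapper like this is conventionally a `Promise.race` between the operation and a timer that rejects. A minimal sketch under that assumption (not the library's actual implementation; `raceWithTimeout` and `SketchTimeoutError` are illustrative names):

```typescript
// Illustrative timeout wrapper built on Promise.race.
class SketchTimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation timed out after ${ms}ms`);
    this.name = 'TimeoutError';
  }
}

function raceWithTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new SketchTimeoutError(ms)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot leak.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer!));
}
```

Note that racing alone does not cancel the underlying work; that is why the library's `onTimeout` hook hands you a controller to abort the in-flight operation.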

Rate Limiting

Client-side rate limiting to respect API limits

import { RateLimiter } from '@rana/core';

const limiter = new RateLimiter({
  requests: {
    max: 100,
    window: '1m'
  },
  tokens: {
    max: 100000,
    window: '1m'
  }
});

// Check before making request
const allowed = await limiter.checkLimit('requests');
if (!allowed.success) {
  console.log(`Rate limited. Retry in ${allowed.retryAfter}ms`);
  await delay(allowed.retryAfter);
}

// Or use automatic queuing
const result = await limiter.execute(
  () => chat(messages),
  { cost: { requests: 1, tokens: estimatedTokens } }
);
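A client-side limiter like this can be built on a sliding window: record request timestamps, evict those older than the window, and compare the remaining count against the max. A minimal sketch for a single dimension such as `requests` (the `WindowLimiter` class is illustrative, not the @rana/core implementation):

```typescript
// Illustrative sliding-window counter for client-side rate limiting.
class WindowLimiter {
  private timestamps: number[] = [];

  constructor(private max: number, private windowMs: number) {}

  check(now: number = Date.now()): { success: boolean; retryAfter: number } {
    // Evict entries that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.max) {
      this.timestamps.push(now);
      return { success: true, retryAfter: 0 };
    }
    // The oldest remaining entry determines when a slot frees up.
    return { success: false, retryAfter: this.timestamps[0] + this.windowMs - now };
  }
}
```

Token limits work the same way, except each request consumes a weighted cost (the estimated token count) rather than a single slot.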

Health Checks

Monitor provider health and availability

import { HealthChecker } from '@rana/core';

const health = new HealthChecker({
  providers: ['openai', 'anthropic', 'local'],
  checkInterval: 30000,  // Check every 30s
  timeout: 5000
});

// Start monitoring
health.start();

// Get current status
const status = await health.getStatus();
// {
//   openai: { healthy: true, latency: 150, lastCheck: '...' },
//   anthropic: { healthy: true, latency: 200, lastCheck: '...' },
//   local: { healthy: false, error: 'Connection refused', lastCheck: '...' }
// }

// Get healthy providers only
const available = health.getHealthyProviders();

// Subscribe to status changes
health.on('statusChange', (provider, status) => {
  if (!status.healthy) {
    alertOps(`${provider} is unhealthy: ${status.error}`);
  }
});

Best Practices

  1. Always configure timeouts - never let requests hang indefinitely
  2. Use circuit breakers for external dependencies to prevent cascading failures
  3. Implement fallback chains with at least 2 providers for critical paths
  4. Add jitter to retries to prevent thundering herd problems
  5. Monitor health checks and alert on degraded states before failures