Andy Hinkle

Building a Circuit Breaker for LLM Services in Laravel

If you're integrating LLM services like OpenAI, Anthropic, or vLLM into your Laravel application, you've probably experienced the frustration of service outages. Your queue fills up with failed jobs, your workers keep hammering a dead service, and your application grinds to a halt.

Let me introduce you to the circuit breaker - a design pattern that helps your application gracefully handle failures in external services, especially LLMs.

What is a Circuit Breaker?

Think of a circuit breaker like the one in your home's electrical panel. When there's an electrical fault, the breaker trips and cuts power to prevent damage. Once the issue is resolved, you reset the breaker and power flows again.

In software, a circuit breaker monitors for failures. When failures exceed a threshold, the circuit "opens" and stops sending requests to the failing service. After a cooldown period, it allows a single probe request through. If that succeeds, the circuit closes and normal operation resumes.

The pattern has three core states:

  • Closed: Normal operation. Requests flow through.
  • Open: Service is down. Requests are blocked to prevent cascading failures.
  • Half-Open: Testing recovery. A single probe request determines if the service is back.

Circuit Breaker State Diagram

Why LLM Services Need This

LLM services are particularly prone to outages. They're resource-intensive and can fail for various reasons: GPU memory exhaustion, model loading issues, network problems, or simply being overwhelmed with requests.

Without a circuit breaker, here's what happens when your LLM service goes down:

  1. Jobs start failing with connection timeouts
  2. Laravel's queue worker retries each job multiple times
  3. Your queue backs up with thousands of failing jobs
  4. Each retry hammers the already-struggling service
  5. When the service recovers, it gets immediately overwhelmed again

With a circuit breaker, the pattern is much healthier:

  1. A few jobs fail with connection timeouts
  2. The circuit breaker trips after the threshold is reached
  3. Workers stop processing LLM jobs entirely
  4. After a backoff period, a lightweight probe checks if the service is back
  5. If the probe succeeds, workers gradually spin back up

The State Machine

Beyond the basic three states, a robust implementation includes extended states for handling repeated failures:

enum CircuitBreakerState: string
{
    case Closed = 'closed';
    case Open = 'open';
    case HalfOpen = 'half_open';
    case OpenExtended = 'open_extended';
    case HalfOpenExtended = 'half_open_extended';

    public function isOpen(): bool
    {
        return in_array($this, [self::Open, self::OpenExtended], true);
    }

    public function allowsProcessing(): bool
    {
        return $this === self::Closed;
    }

    public function allowsProbe(): bool
    {
        return in_array($this, [self::HalfOpen, self::HalfOpenExtended], true);
    }
}

The extended states handle a common scenario: the service comes back briefly, the probe succeeds, but then fails again immediately. Instead of using the short initial backoff, the extended states use a longer backoff to give the service more time to stabilize.
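One way to wire that in is to let each open state report its own backoff window. Here's a sketch of a helper you could add to the enum - backoffMinutes() is a name I've made up, and the config keys match the configuration shown in the next section:

// Hypothetical helper on CircuitBreakerState
public function backoffMinutes(): int
{
    return match ($this) {
        // Short backoff after the first trip
        self::Open => Config::integer('llm.circuit_breaker.initial_backoff_minutes', 5),
        // Longer backoff once a probe has already failed
        self::OpenExtended => Config::integer('llm.circuit_breaker.extended_backoff_minutes', 15),
        default => 0,
    };
}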

Configuration

Make your circuit breaker configurable so you can tune it for your specific use case:

// config/llm.php
'circuit_breaker' => [
    // Number of failures within the window to trip the circuit
    'failure_threshold' => env('CIRCUIT_BREAKER_FAILURE_THRESHOLD', 3),

    // Rolling window (minutes) for counting failures
    'failure_window_minutes' => env('CIRCUIT_BREAKER_FAILURE_WINDOW', 5),

    // Wait time (minutes) after circuit opens before first probe
    'initial_backoff_minutes' => env('CIRCUIT_BREAKER_INITIAL_BACKOFF', 5),

    // Wait time (minutes) after probe failure before next probe
    'extended_backoff_minutes' => env('CIRCUIT_BREAKER_EXTENDED_BACKOFF', 15),

    // Time (minutes) at each scale level before increasing workers
    'scale_up_interval_minutes' => env('CIRCUIT_BREAKER_SCALE_INTERVAL', 5),

    // Maximum number of worker processes when fully scaled
    'max_workers' => env('CIRCUIT_BREAKER_MAX_WORKERS', 4),
],

These defaults mean: if 3 jobs fail within 5 minutes, trip the circuit. Wait 5 minutes, then probe. If the probe fails, wait 15 minutes before trying again.
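Inside the breaker itself, recording a failure is mostly bookkeeping: push a timestamp into a rolling list, prune anything outside the window, and trip the circuit once the threshold is hit. A minimal sketch - the cache key layout and the transitionTo() call (shown later in this post) are illustrative:

// Sketch of CircuitBreaker::recordFailure(); storage shape is illustrative
public function recordFailure(string $reason): void
{
    $window = Config::integer('llm.circuit_breaker.failure_window_minutes', 5);
    $threshold = Config::integer('llm.circuit_breaker.failure_threshold', 3);

    $key = "circuit_breaker:{$this->connection}:failures";

    // Keep only failures inside the rolling window, then add this one
    $failures = collect(Cache::get($key, []))
        ->filter(fn (int $timestamp) => $timestamp >= now()->subMinutes($window)->timestamp)
        ->push(now()->timestamp);

    Cache::put($key, $failures->values()->all(), now()->addHours(24));

    Log::warning("LLM circuit breaker failure: {$reason}");

    if ($this->state() === CircuitBreakerState::Closed && $failures->count() >= $threshold) {
        $this->transitionTo(CircuitBreakerState::Open);
    }
}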

Detecting Relevant Failures

Not every job failure should trip the circuit breaker. A validation error or business logic exception shouldn't affect the circuit state. You only care about infrastructure failures:

class FailureDetector
{
    private const FAILURE_PATTERNS = [
        'Connection timeout',
        'Connection refused',
        'Operation timed out',
        'cURL error',
        'Failed to connect',
        'Connection reset',
        'Network is unreachable',
    ];

    public function isCircuitBreakerFailure(Throwable $exception): bool
    {
        if ($exception instanceof ConnectionException) {
            return true;
        }

        return Str::contains(
            $exception->getMessage(),
            self::FAILURE_PATTERNS,
            ignoreCase: true
        );
    }
}
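The listener in the next section also asks the detector for a short, human-readable reason to store alongside the failure. A minimal extractReason() - a hypothetical companion method, shown here only so the later snippet reads cleanly - could just truncate the exception message:

// Hypothetical helper on FailureDetector
public function extractReason(Throwable $exception): string
{
    // Keep the stored reason short; the full trace lives in failed_jobs anyway
    return Str::limit($exception->getMessage(), 120);
}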

Listening for Failures

Use Laravel's JobFailed queue event to record failures:

class LlmJobFailedListener
{
    public function __construct(
        private readonly CircuitBreakerManager $manager,
        private readonly FailureDetector $detector,
    ) {}

    public function handle(JobFailed $event): void
    {
        // Only monitor the LLM queue
        if ($event->job->getQueue() !== 'llm') {
            return;
        }

        // Only trip on infrastructure failures
        if (! $this->detector->isCircuitBreakerFailure($event->exception)) {
            return;
        }

        $circuitType = $this->resolveCircuitType($event);

        $this->manager
            ->for($circuitType)
            ->recordFailure($this->detector->extractReason($event->exception));
    }
}
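If you're not relying on Laravel's automatic event discovery, register the listener yourself. A sketch in a service provider (the App\Listeners namespace is just an assumption about where you put the class):

// app/Providers/AppServiceProvider.php
use App\Listeners\LlmJobFailedListener;
use Illuminate\Queue\Events\JobFailed;
use Illuminate\Support\Facades\Event;

public function boot(): void
{
    Event::listen(JobFailed::class, LlmJobFailedListener::class);
}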

The Probe Job

When the circuit transitions to half-open, dispatch a lightweight probe job to test if the service is back:

class CircuitBreakerProbeJob implements ShouldQueue
{
    public int $timeout = 60;
    public int $tries = 1;

    public function __construct(
        public readonly string $circuitType,
    ) {
        $this->onQueue('llm');
    }

    public function handle(CircuitBreakerManager $manager): void
    {
        $breaker = $manager->for($this->circuitType);

        // Send a minimal request against this circuit's connection to test connectivity
        $response = Llm::connection($this->circuitType)
            ->timeout(30)
            ->chat([
                ['role' => 'user', 'content' => 'Reply with exactly: OK'],
            ]);

        if ($response->failed()) {
            $breaker->probeFailure();

            throw new RuntimeException("Probe failed: {$response->text()}");
        }

        // Success! Close the circuit
        $breaker->recordSuccess();
    }

    public function failed(Throwable $exception): void
    {
        // If the job itself fails, record that as a probe failure
        app(CircuitBreakerManager::class)
            ->for($this->circuitType)
            ->probeFailure();
    }
}

The probe is intentionally minimal. You're not trying to do real work, just checking if the service responds at all.
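The other half of the probe story is deciding when it's time to send one. The daemon shown later calls shouldTransitionToHalfOpen() on each loop; here's a sketch of that check, assuming the breaker records an opened_at timestamp when it trips and reusing the backoffMinutes() helper sketched earlier:

// Sketch of CircuitBreaker::shouldTransitionToHalfOpen(); key names are illustrative
public function shouldTransitionToHalfOpen(): bool
{
    $state = $this->state();

    if (! $state->isOpen()) {
        return false;
    }

    // opened_at would be written whenever the circuit trips
    $openedAt = Cache::get("circuit_breaker:{$this->connection}:opened_at");

    return $openedAt !== null
        && now()->timestamp - $openedAt >= $state->backoffMinutes() * 60;
}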

Adaptive Worker Scaling

Here's where things get interesting. Rather than going from 0 workers to full capacity when the circuit closes, scale up gradually. This prevents overwhelming a service that just recovered:

public function checkScaleUp(): bool
{
    if ($this->state() !== CircuitBreakerState::Closed) {
        return false;
    }

    $stableSince = $this->scaleStableSince();
    $currentLevel = $this->scaleLevel();
    $maxWorkers = Config::integer('llm.circuit_breaker.max_workers', 4);

    if ($currentLevel >= $maxWorkers) {
        return false;
    }

    // Must be stable for the configured interval before scaling up
    if ($this->minutesSince($stableSince) < Config::integer('llm.circuit_breaker.scale_up_interval_minutes', 5)) {
        return false;
    }

    $newLevel = min($currentLevel + 1, $maxWorkers);

    $this->remember('scale_level', $newLevel);
    $this->remember('scale_stable_since', now()->timestamp);

    return true;
}

This means after the circuit closes, you start with 1 worker. After 5 minutes of stability, scale to 2. Another 5 minutes, scale to 3. And so on up to your max.
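For that to work, closing the circuit has to drop the scale level back to 1. A sketch of recordSuccess(), the method the probe job calls - again, the storage helpers are illustrative:

// Sketch of CircuitBreaker::recordSuccess()
public function recordSuccess(): void
{
    $this->transitionTo(CircuitBreakerState::Closed);

    // Forget the failure history from before the outage
    Cache::forget("circuit_breaker:{$this->connection}:failures");

    // Restart cautiously: one worker, stability clock reset
    $this->remember('scale_level', 1);
    $this->remember('scale_stable_since', now()->timestamp);
}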

The Daemon Command

Tie it all together with a daemon that manages workers based on circuit state:

class LlmQueueDaemonCommand extends Command
{
    protected $signature = 'llm:daemon {--check-interval=10}';

    private array $workers = [];

    private bool $shouldRun = true; // toggled off by signal handling (omitted here)

    public function handle(CircuitBreakerManager $manager): int
    {
        $this->info('Starting LLM Queue Daemon with Circuit Breaker...');

        while ($this->shouldRun) {
            foreach ($manager->all() as $type => $breaker) {
                // Check if we should probe
                if ($breaker->shouldTransitionToHalfOpen()) {
                    $breaker->transitionToHalfOpen();
                    CircuitBreakerProbeJob::dispatch($type);
                }

                // Check if we can scale up
                if ($breaker->state() === CircuitBreakerState::Closed) {
                    $breaker->checkScaleUp();
                }
            }

            // Adjust workers to match target scale level
            $targetWorkers = $manager->getEffectiveScaleLevel();
            $this->adjustWorkers($targetWorkers);

            sleep((int) $this->option('check-interval'));
        }

        return self::SUCCESS;
    }

    private function adjustWorkers(int $target): void
    {
        $current = count($this->workers);

        if ($current < $target) {
            $this->startWorkers($target - $current);
        } elseif ($current > $target) {
            $this->stopWorkers($current - $target);
        }
    }
}

Run this as a long-running process: php artisan llm:daemon. It continuously monitors the circuit state and adjusts workers accordingly.
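The excerpt above leaves out startWorkers() and stopWorkers(). One way to implement them is with Symfony's Process component, which ships with Laravel, spawning queue:work processes pinned to the llm queue. A sketch - the worker flags are illustrative, not required:

use Symfony\Component\Process\Process;

private function startWorkers(int $count): void
{
    for ($i = 0; $i < $count; $i++) {
        $process = new Process(
            [PHP_BINARY, 'artisan', 'queue:work', '--queue=llm', '--tries=1'],
            base_path()
        );

        $process->start();

        $this->workers[] = $process;
    }
}

private function stopWorkers(int $count): void
{
    for ($i = 0; $i < $count; $i++) {
        // Stop the most recently started worker, giving it 10 seconds to finish
        array_pop($this->workers)?->stop(10);
    }
}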

Using Cache for State

Store the circuit state in cache so it's shared across all workers and survives restarts:

private function transitionTo(CircuitBreakerState $state): void
{
    Cache::put(
        "circuit_breaker:{$this->connection}:state",
        $state->value,
        now()->addHours(24)
    );
}

public function state(): CircuitBreakerState
{
    $state = Cache::get("circuit_breaker:{$this->connection}:state");

    return CircuitBreakerState::tryFrom($state ?? '') ?? CircuitBreakerState::Closed;
}

Use Redis or another shared cache driver in production. File cache won't work across multiple servers.
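If your default store isn't shared, you can also pin the breaker to a specific store; a quick sketch (the 'redis' store name assumes one is defined in config/cache.php):

// Write circuit state to a shared store regardless of the app default
Cache::store('redis')->put(
    "circuit_breaker:{$this->connection}:state",
    $state->value,
    now()->addHours(24)
);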

Separate Circuits for Different Services

If you're using multiple LLM services (e.g., one for chat, one for embeddings), use separate circuit breakers for each:

class CircuitBreakerManager
{
    /** @var array<string, CircuitBreaker> */
    private array $breakers = [];

    public function for(string $type): CircuitBreaker
    {
        return $this->breakers[$type] ??= new CircuitBreaker($type);
    }

    public function getEffectiveScaleLevel(): int
    {
        // If any circuit is open, scale to 0
        foreach ($this->breakers as $breaker) {
            if ($breaker->state()->isOpen()) {
                return 0;
            }
        }

        // Otherwise, use the minimum scale level across all circuits
        return min(array_map(
            fn ($b) => $b->scaleLevel(),
            $this->breakers
        ));
    }
}

This way, if your embedding service goes down, you can still process chat requests (assuming they don't depend on each other).
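As a quick usage sketch (the circuit names here are just examples):

// A failing embeddings service only affects the embeddings circuit
$manager->for('embeddings')->recordFailure('Connection refused');

$manager->for('embeddings')->state(); // Open once the threshold is reached
$manager->for('chat')->state();       // still Closed, chat jobs keep flowing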

Health Checks Before Probing

For self-hosted services, you might have a health check endpoint. Use it before dispatching a probe to avoid unnecessary work:

if ($manager->healthCheckBeforeProbe() && ! $vllmMetrics->isAvailable()) {
    $this->line("Health check failed, skipping probe");

    return;
}

$breaker->transitionToHalfOpen();
CircuitBreakerProbeJob::dispatch($type);

It's a bit of work to set up, but the first time your LLM service goes down at 2 AM and your app just handles it - you'll be glad you did.