
Capacity Planning Guide

This guide provides sizing recommendations for deploying the Varpulis CEP engine based on measured benchmark data. All figures were collected on commodity hardware (AMD Ryzen 5800X or equivalent, DDR4-3200, NVMe SSD) and represent single-core throughput unless otherwise noted. Real-world results will vary depending on event payload size, pattern complexity, JIT/OS tuning, and I/O characteristics.

For latency and availability targets, see SLO Definitions.


CPU Sizing

The primary factor in CPU sizing is the type of operations applied to each stream. The table below shows measured throughput per core for each operation type.

| Operation | Throughput per Core | Notes |
|---|---|---|
| Simple filter (.where()) | 234K events/sec | Predicate evaluation only |
| Sequence pattern (SASE) | 256K events/sec | 2-event sequence with time window |
| Kleene+ pattern | 97K events/sec | Match-all semantics; throughput decreases with longer matches |
| Hamlet trend aggregation (1 query) | 6.9M events/sec | Single aggregation query |
| Hamlet trend aggregation (5 queries) | 2.8M events/sec | Shared Kleene structure |
| Hamlet trend aggregation (10 queries) | 2.1M events/sec | Shared Kleene structure |
| Hamlet trend aggregation (50 queries) | 950K events/sec | Shared Kleene structure |
| PST prediction (single symbol) | ~19.6M predictions/sec | 51 ns per prediction |
| PST PMC forecast (1 active run) | 93K events/sec | 10.8 µs per event |
| PST online learning | 5.4M updates/sec | Incremental tree updates |
| PST online learning + pruning | 5.0M updates/sec | With KL-divergence pruning |

How to Estimate CPU Requirements

  1. Identify the bottleneck operation for each stream (typically the slowest op in the pipeline).
  2. Divide your target event rate by the per-core throughput of that operation.
  3. Add 30% headroom for GC pauses, OS scheduling, and burst absorption.

Example: A workload of 200K events/sec through a Kleene+ pattern requires 200K / 97K = 2.06 cores. With 30% headroom: 3 cores.
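The three steps above can be expressed as a small sizing calculator. This is a sketch, not part of the Varpulis tooling; the throughput constants are the per-core figures from the table, and the function name is illustrative:

```python
import math

# Measured per-core throughput (events/sec) from the table above.
THROUGHPUT_PER_CORE = {
    "filter": 234_000,
    "sequence": 256_000,
    "kleene_plus": 97_000,
}

def cores_needed(event_rate: float, bottleneck_op: str, headroom: float = 0.30) -> int:
    """Divide the target rate by per-core throughput, then add headroom."""
    raw = event_rate / THROUGHPUT_PER_CORE[bottleneck_op]
    return math.ceil(raw * (1 + headroom))

# 200K events/sec through a Kleene+ pattern -> 3 cores, as in the example.
print(cores_needed(200_000, "kleene_plus"))  # 3
```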


Memory Sizing

| Component | Memory Usage | Notes |
|---|---|---|
| Base process (no streams) | ~10 MB RSS | Runtime, allocator overhead |
| Per stream definition | ~2-5 KB | Stream definition, router entry, compiled ops |
| Per event in time window | ~200-500 bytes | Depends on field count and value sizes |
| SASE per active run | ~500 bytes to 2 KB | Grows with pattern length and partial matches |
| PST tree (10 event types, depth 5) | ~50-100 KB | Per forecasting stream |
| Preloaded events (CLI simulate, default) | ~400 bytes/event | Includes parsed fields and timestamp |
| Typical production (10 streams, 100K events/sec) | 50-200 MB RSS | Varies with window sizes and match fanout |

Memory Estimation Formula

```
Total RSS = Base (10 MB)
          + Streams * 5 KB
          + Sum(window_duration_sec * event_rate * 350 bytes)   # per-stream window
          + SASE_active_runs * 1 KB                             # per-stream
          + PST_streams * 100 KB                                # if using .forecast()
          + 30% headroom
```

Example: 5 streams, each with a 60-second window at 20K events/sec, 100 active SASE runs per stream:

```
10 MB + 5 * 5 KB + 5 * (60 * 20000 * 350 B) + 5 * 100 * 1 KB + 30%
= 10 MB + 25 KB + 2.1 GB + 500 KB + 30%
= ~2.7 GB
```
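The formula translates directly into code. A sketch (the function name and signature are illustrative, not Varpulis tooling; units are decimal, matching the worked example's byte arithmetic):

```python
KB = 10**3   # decimal units, as in the worked example above
MB = 10**6

def estimate_rss(num_streams, windows, sase_runs_total, pst_streams=0, headroom=0.30):
    """Memory estimate per the formula above.

    windows: list of (window_duration_sec, event_rate) tuples, one per stream.
    """
    total = 10 * MB                                           # base process
    total += num_streams * 5 * KB                             # stream definitions
    total += sum(dur * rate * 350 for dur, rate in windows)   # window state
    total += sase_runs_total * 1 * KB                         # active SASE runs
    total += pst_streams * 100 * KB                           # PST trees (.forecast())
    return total * (1 + headroom)

# 5 streams, 60 s windows at 20K events/sec each, 100 SASE runs per stream.
rss = estimate_rss(5, [(60, 20_000)] * 5, sase_runs_total=5 * 100)
print(f"{rss / 10**9:.1f} GB")  # ~2.7 GB, matching the worked example
```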

Window size dominates memory. Reduce window durations or use .partition_by() to distribute state across workers when memory is constrained.


Disk Sizing

| Component | Disk Usage | Notes |
|---|---|---|
| Dead Letter Queue (DLQ) | ~500 bytes per failed event | JSONL format, includes original event + error |
| RocksDB checkpoints | Proportional to window + SASE state | Checkpoint size roughly matches in-memory state |
| Log output (INFO level) | 1-10 MB/day | Higher at DEBUG/TRACE; rotate with logrotate or equivalent |
| Binary size | ~30-50 MB | Single statically-linked binary |

DLQ Projection

```
DLQ growth/day = failed_events_per_day * 500 bytes
```

At a 0.01% failure rate with 100K events/sec: 0.0001 * 100000 * 86400 * 500 B = ~430 MB/day. Configure DLQ rotation accordingly.
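The projection above as a one-function sketch (helper name is illustrative; the 500-byte entry size is the DLQ figure from the table):

```python
def dlq_growth_per_day(event_rate, failure_rate, bytes_per_entry=500):
    """Daily DLQ growth in bytes: failed events/day x entry size."""
    return event_rate * 86_400 * failure_rate * bytes_per_entry

# 0.01% failures at 100K events/sec -> ~432 MB/day, as computed above.
print(f"{dlq_growth_per_day(100_000, 0.0001) / 10**6:.0f} MB/day")
```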


Network Bandwidth

Per-Event Overhead by Connector

| Connector | Protocol Overhead | Typical Event Payload | Total per Event |
|---|---|---|---|
| MQTT | ~100 bytes | 100-500 bytes | 200-600 bytes |
| NATS | ~50 bytes | 100-500 bytes | 150-550 bytes |
| Kafka | ~100 bytes | 100-500 bytes | 200-600 bytes |
| REST API | ~200 bytes | 100-500 bytes | 300-700 bytes |

Bandwidth Estimation

```
Inbound bandwidth  = event_rate * avg_total_event_size
Outbound bandwidth = emit_rate * avg_total_event_size
```

Example: 100K events/sec via NATS with 300-byte average payloads (~350 bytes total including NATS protocol overhead): 100000 * 350 B = 35 MB/sec (~280 Mbps).
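A sketch of the same estimate as code, using the per-connector overhead figures from the table (function and dictionary names are illustrative):

```python
# Approximate per-event protocol overhead (bytes) from the connector table.
OVERHEAD = {"mqtt": 100, "nats": 50, "kafka": 100, "rest": 200}

def bandwidth_mbps(event_rate, avg_payload_bytes, connector):
    """Inbound bandwidth in Mbps: rate x (payload + protocol overhead) x 8."""
    total_size = avg_payload_bytes + OVERHEAD[connector]
    return event_rate * total_size * 8 / 10**6

# 100K events/sec of 300-byte payloads over NATS -> 280 Mbps.
print(f"{bandwidth_mbps(100_000, 300, 'nats'):.0f} Mbps")
```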

Cluster Coordination Traffic

| Traffic Type | Size | Frequency |
|---|---|---|
| Heartbeat | ~1 KB | Every 5 seconds per worker |
| Partition assignment | ~2-5 KB | On rebalance only |
| Health check | ~500 bytes | Every 10 seconds |

Cluster coordination overhead is negligible (under 1 Mbps even with 100 workers).
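A quick back-of-the-envelope check of that claim, using the heartbeat and health-check figures from the table (steady state only; rebalance traffic is excluded since it occurs only on membership changes):

```python
def coordination_bps(workers):
    """Steady-state coordination traffic in bits/sec for a cluster."""
    heartbeat = 1_000 / 5   # ~1 KB every 5 s, per worker
    health = 500 / 10       # ~500 B every 10 s, per worker
    return workers * (heartbeat + health) * 8

# Even at 100 workers, steady-state traffic is well under 1 Mbps.
print(f"{coordination_bps(100) / 10**6:.2f} Mbps")  # 0.20 Mbps
```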


Reference Configurations

Small: 10K events/sec

| Resource | Recommendation |
|---|---|
| CPU | 1 core |
| RAM | 256 MB |
| Disk | 1 GB (logs + DLQ) |
| Network | 10 Mbps |
| Topology | Single process, no clustering |

Suitable for: Development, testing, low-volume monitoring, edge deployments.

Medium: 100K events/sec

| Resource | Recommendation |
|---|---|
| CPU | 2-4 cores |
| RAM | 1 GB |
| Disk | 10 GB (logs + DLQ + checkpoints) |
| Network | 100 Mbps |
| Topology | Single process or 2-node cluster |

Suitable for: Production workloads with moderate pattern complexity, typical enterprise monitoring.

Large: 1M events/sec

| Resource | Recommendation |
|---|---|
| CPU | 8-16 cores |
| RAM | 4-8 GB |
| Disk | 50 GB (logs + DLQ + checkpoints) |
| Network | 1 Gbps |
| Topology | 3+ node cluster (1 coordinator + 2+ workers) |

Suitable for: High-volume production, complex patterns with Kleene+, multi-query trend aggregation.


Scaling Guidance

When to Scale

| Indicator | Threshold | Action |
|---|---|---|
| CPU utilization | > 70% sustained | Add cores or workers |
| Queue backlog | > 10K events | Add workers or increase batch size |
| Memory utilization | > 80% RSS | Reduce window sizes, add RAM, or partition |
| Event latency (p99) | > SLO target | Profile bottleneck op; scale vertically or horizontally |

Coordinator vs Worker Sizing

  • Coordinator: Minimal CPU requirements (mostly coordination and health monitoring). 1 core, 256-512 MB RAM is sufficient for most deployments.
  • Workers: CPU and memory scale with event throughput and pattern complexity. Size according to the tables above.

Vertical vs Horizontal Scaling

| Workload Type | Preferred Scaling | Rationale |
|---|---|---|
| Sequence/Kleene patterns | Vertical (faster cores) | SASE runs are single-threaded per partition; single-core performance dominates |
| Multi-query aggregation | Horizontal (more workers) | Hamlet shares structure across queries; benefits from parallelism |
| High fan-out (many streams) | Horizontal (more workers) | Distribute independent streams across workers |
| PST forecasting | Vertical | Online learning and prediction are CPU-bound per stream |

Partition-Based Scaling

Use .partition_by(field) in VPL to distribute events across workers by a key field. This enables horizontal scaling while maintaining ordering guarantees within each partition.

```vpl
stream Alerts = SecurityEvent as e
    .partition_by(e.source_ip)
    .within(5m)
    .where(e.severity > 3)
```

Each worker processes a disjoint subset of partitions, allowing near-linear throughput scaling.
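Varpulis's internal assignment function is not specified here, but the general technique can be sketched as stable hash partitioning: hashing the key deterministically maps every event with the same key to the same worker, which is what preserves per-key ordering while spreading distinct keys across the cluster. The sketch below illustrates the idea (not the engine's actual implementation):

```python
import hashlib

def partition_for(key: str, num_workers: int) -> int:
    """Stable hash partitioning: the same key always maps to the same
    worker, so per-key event order is preserved while distinct keys
    spread across workers."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

# All events from one source_ip land on the same worker...
assert partition_for("10.0.0.7", 4) == partition_for("10.0.0.7", 4)

# ...while distinct source_ips spread across the available workers.
ips = [f"10.0.0.{i}" for i in range(100)]
print(sorted({partition_for(ip, 4) for ip in ips}))
```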

Scaling Limits

  • Single-node practical limit: ~500K events/sec (CPU-bound operations).
  • Cluster practical limit: Scales linearly with workers for partitioned workloads up to connector throughput limits.
  • MQTT single-connection ceiling: ~6K events/sec (QoS 0). Use multiple connections or switch to NATS/Kafka for higher throughput.

Monitoring Recommendations

Track these metrics for capacity planning decisions:

  • varpulis_events_processed_total -- Total events processed (rate = throughput).
  • varpulis_event_latency_p99 -- Processing latency at p99.
  • process_resident_memory_bytes -- RSS memory usage.
  • varpulis_sase_active_runs -- Number of active SASE pattern runs (correlates with memory and CPU).
  • varpulis_queue_depth -- Internal queue backlog (leading indicator of saturation).
  • varpulis_dlq_events_total -- Dead letter queue growth rate.

Set alerts at 70% of capacity limits to allow time for scaling actions before SLO breaches.
