Varpulis vs Kafka Streams / ksqlDB for CEP

Kafka Streams and ksqlDB are popular tools in the Apache Kafka ecosystem for stream processing. Varpulis is a purpose-built complex event processing (CEP) engine. While both process event streams in real time, they are designed for fundamentally different workloads.

This page explains the key differences and helps you decide when to use each -- or when to use them together.

The Core Difference

Kafka Streams / ksqlDB are general-purpose stream processing tools. They excel at filtering, transforming, aggregating, and joining streams. They operate on individual events or windows of events.

Varpulis is a specialized CEP engine. It excels at detecting complex temporal patterns across sequences of events -- "event A followed by event B followed by one or more events C, all within 10 minutes, for the same user." This is a fundamentally different problem from filtering or aggregation.

| Capability | Kafka Streams / ksqlDB | Varpulis |
| --- | --- | --- |
| Filter events | Yes | Yes |
| Transform events | Yes | Yes |
| Windowed aggregation | Yes | Yes |
| Stream joins | Yes | Yes |
| Temporal sequences (A -> B -> C) | Manual implementation | Native -> operator |
| Kleene closures (A+, A*) | Not supported | Native SASE+ operator |
| Negation (A -> NOT B -> C) | Manual implementation | Native NOT operator |
| Pattern forecasting | Not supported | Built-in PST engine |
| Multi-event correlation | Custom code | Declarative patterns |

At a Glance

| Dimension | Varpulis | Kafka Streams | ksqlDB |
| --- | --- | --- | --- |
| Primary focus | Complex event processing | Stream processing | Stream processing (SQL) |
| Language | VPL (declarative DSL) | Java/Kotlin API | SQL-like (KSQL) |
| Runtime | Native Rust binary | JVM library | JVM server |
| Kafka dependency | Optional (has Kafka connector) | Required | Required |
| Deployment | Standalone binary | Embedded in application | ksqlDB server cluster |
| Pattern model | SASE+ (NFA-based, academic) | None | None |
| Memory baseline | ~10--50 MB | 200 MB+ (JVM) | 500 MB+ (JVM + RocksDB) |
| Cold start | < 100 ms | Seconds (JVM + rebalance) | Seconds to minutes |

Pattern Matching: Where the Gap Is Largest

The biggest difference between these tools is in temporal pattern detection. Consider detecting a fraud pattern: "Login from a new IP, followed by three or more small transactions, followed by a large withdrawal -- all within 2 hours, for the same user."

Varpulis -- 20 lines of declarative pattern

```vpl
event Login:
    user_id: str
    ip_address: str

event Transaction:
    user_id: str
    amount: float
    type: str

stream FraudPattern = Login as login
    -> Transaction where user_id == login.user_id
                     and amount < 100 and type == "purchase" as small1
    .within(1h)
    -> Transaction where user_id == login.user_id
                     and amount < 100 and type == "purchase" as small2
    .within(30m)
    -> Transaction where user_id == login.user_id
                     and amount < 100 and type == "purchase" as small3
    .within(30m)
    -> Transaction where user_id == login.user_id
                     and type == "withdrawal" and amount > 1000 as withdrawal
    .within(2h)
    .emit(
        alert_type: "FRAUD_PATTERN",
        user_id: login.user_id,
        withdrawal_amount: withdrawal.amount
    )
```

Kafka Streams -- 80+ lines of stateful processing

```java
// Kafka Streams requires manual state management for multi-step patterns.
// You need: a custom transformer, a KeyValueStore for pattern state,
// a state machine to track which step each user is on,
// timestamp checks to enforce windows, punctuators to expire
// partial patterns, and handling for out-of-order events.

events.transformValues(() -> new ValueTransformerWithKey<String, Event, Alert>() {
    private KeyValueStore<String, PatternState> stateStore;

    @Override
    public void init(ProcessorContext context) {
        this.stateStore = context.getStateStore("pattern-state");
        context.schedule(Duration.ofMinutes(1),
            PunctuationType.WALL_CLOCK_TIME, this::expireWindows);
    }

    @Override
    public Alert transform(String key, Event event) {
        // 50+ lines: manual state machine logic for
        // Login -> SmallTx -> SmallTx -> SmallTx -> LargeWithdrawal
        // with per-step timestamp checks and state transitions
        return null; // stub -- returns an Alert only when the full pattern completes
    }

    @Override
    public void close() {}
}, "pattern-state")
.filter((k, v) -> v != null)
.to("fraud-alerts");
```

ksqlDB -- Not directly expressible

ksqlDB has no syntax for multi-step temporal sequences. You could chain windowed joins (one per step), but there is no way to enforce ordering between intermediate results, and each additional step requires a new stream and join. ksqlDB has no sequence operator, no Kleene closure, and no pattern state machine.

The fundamental limitation is that Kafka Streams and ksqlDB operate on pairs of events (via joins) or sets of events (via windows), but have no native concept of ordered sequences of events with correlation constraints.
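To make that limitation concrete, here is a minimal plain-Java sketch of the per-key bookkeeping a manual implementation needs for even a two-step pattern (Login followed by Withdrawal for the same user within a window). The class and method names are hypothetical, and this deliberately omits what production Kafka Streams code would also require: a persistent state store, punctuators to expire stale entries, and out-of-order handling.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-key state a hand-rolled A -> B detector must track.
class SequenceDetector {
    private static final long WINDOW_MS = 2 * 60 * 60 * 1000; // 2h window
    private final Map<String, Long> loginAt = new HashMap<>(); // userId -> login time

    /** Returns an alert string if Login -> Withdrawal completes, else null. */
    String onEvent(String userId, String type, long timestampMs) {
        if (type.equals("login")) {
            loginAt.put(userId, timestampMs); // step 1: remember the login
            return null;
        }
        if (type.equals("withdrawal")) {
            Long t = loginAt.get(userId);
            if (t != null && timestampMs - t <= WINDOW_MS) {
                loginAt.remove(userId);   // consume the partial match
                return "ALERT:" + userId; // step 2 completed inside the window
            }
        }
        return null;
    }
}
```

Every additional step in the pattern adds another state field, another transition, and another expiry rule to this machinery -- which a declarative sequence operator generates for you.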

Kleene Closures: Variable-Length Patterns

One of the most powerful CEP features is Kleene closure -- detecting patterns with a variable number of repetitions. For example: "a rising price sequence of any length."

Varpulis -- native Kleene+ operator

```vpl
pattern RisingPrices = SEQ(
    StockTick as first,
    StockTick+ where price > first.price as rising,
    StockTick where price > rising.price as last
) within 60s partition by symbol
```

This pattern matches sequences of 3 or more events where prices continuously rise. Varpulis's SASE+ engine finds all valid subsequences -- in benchmarks, this produces 99K+ matches from 100K events, compared to ~20K with greedy matching approaches.

Kafka Streams / ksqlDB have no equivalent. You would need to build a custom state machine that tracks open sequences, handles branching (multiple concurrent rising sequences), and manages memory for potentially exponential match combinations. This is exactly the problem that CEP engines are designed to solve.

Performance

Varpulis Throughput (Measured)

From our benchmark suite, Varpulis achieves the following throughput on a single node:

| Workload | Throughput | Memory |
| --- | --- | --- |
| Simple filter | 234K--745K events/sec | 10--54 MB |
| Windowed aggregation | 335K--1.46M events/sec | 51 MB |
| Sequence detection (A -> B) | 256K--2.15M events/sec | 36 MB |
| Kleene+ pattern | 97K--1.7M events/sec | 58 MB |
| Negation pattern | 2.2M events/sec | ~50 MB |
| Pattern forecasting | 93K events/sec | ~50 MB |

Throughput varies by benchmark mode: lower numbers are end-to-end (including I/O, parsing); higher numbers are pure engine benchmarks (no I/O).

Kafka Streams Throughput

Kafka Streams throughput depends heavily on workload complexity:

  • Simple filter/transform: Comparable to Varpulis (limited by Kafka broker I/O, not computation).
  • Windowed aggregation: Well-optimized in Kafka Streams with RocksDB state.
  • Pattern detection: No native support. Custom implementations add significant overhead from state store reads/writes, serialization/deserialization, and manual state management.

The key insight: for simple operations (filter, aggregate, join), Kafka Streams performs well because these are its core design targets. For CEP-specific operations (sequences, Kleene closures, negation), Kafka Streams requires custom code that is both harder to write and slower to execute than a purpose-built pattern engine.

Resource Comparison

| Metric | Varpulis | Kafka Streams | ksqlDB |
| --- | --- | --- | --- |
| Memory (simple workload) | 10 MB | 200--500 MB | 500 MB+ |
| Memory (stateful pattern) | 50 MB | 500 MB--2 GB | 1 GB+ |
| Cold start | < 100 ms | 5--30 seconds | 10--60 seconds |
| Kafka dependency | Optional | Required | Required |
| State storage | In-memory | RocksDB (local disk) | RocksDB |
| Horizontal scaling | Cluster mode | Kafka partition-based | ksqlDB cluster |

Feature Comparison

| Feature | Varpulis | Kafka Streams | ksqlDB |
| --- | --- | --- | --- |
| Event filtering | .where() | filter() | WHERE |
| Windowed aggregation | .window().aggregate() | windowedBy() | WINDOW TUMBLING/HOPPING |
| Stream joins | .join() | join() / leftJoin() | JOIN ... WITHIN |
| Temporal sequences | Native -> operator | Custom transformer | Not expressible |
| Kleene closures | Native A+, A* | Not supported | Not supported |
| Negation patterns | Native NOT | Not supported | Not supported |
| Pattern forecasting | Built-in PST engine | Not supported | Not supported |
| Trend aggregation | Built-in Hamlet engine | Not supported | Not supported |
| Delivery semantics | At-least-once | Exactly-once | Exactly-once |
| State recovery | Checkpointing | Changelog topics + RocksDB | Changelog topics |
| Connectors | Kafka, MQTT, NATS, PostgreSQL CDC | Kafka (native) | Kafka (native) |
| Horizontal scaling | Coordinator + workers | Kafka partition count | ksqlDB cluster |
| Interactive queries | REST API | Interactive queries | Pull queries |

When to Use Kafka Streams / ksqlDB

Kafka Streams and ksqlDB are the better choice when you need:

  • General-purpose stream processing within the Kafka ecosystem: Filtering, transforming, enriching, and routing events between Kafka topics. This is their core strength.
  • Simple aggregations and joins: Counting, summing, and joining streams by key within time windows. Both tools handle this well with SQL-like syntax (ksqlDB) or a fluent API (Kafka Streams).
  • Exactly-once processing: Kafka Streams provides exactly-once semantics via idempotent producers and transactional consumers -- a strong guarantee for financial workloads.
  • Massive state with disk-backed storage: RocksDB state stores can handle terabytes of state, spilling to disk when RAM is insufficient.
  • No additional infrastructure: If you already run Kafka, Kafka Streams adds no new services (it is a library). ksqlDB adds a server but is tightly integrated.
  • Simple event routing and enrichment: Joining event streams with lookup tables, enriching events with metadata, filtering and routing to different topics.

When to Use Varpulis

Varpulis is the better choice when you need:

  • Complex temporal patterns: Detecting ordered sequences of events (A -> B -> C) with temporal constraints and correlation predicates. This is what Varpulis is built for.
  • Kleene closure matching: Finding variable-length repeated patterns (one or more rising prices, three or more failed logins). Kafka Streams has no equivalent.
  • Negation in patterns: Detecting that something did NOT happen within a sequence (login followed by NO password change within 1 hour, then large transfer).
  • Exhaustive matching: SASE+ finds all valid pattern instances, not just the first or longest. Critical for security monitoring and compliance.
  • Pattern forecasting: Predicting pattern completions before they happen, with 51 ns per prediction.
  • Low resource requirements: 10--50 MB baseline vs 200 MB+ for JVM-based tools. Important for edge, IoT, or high-density container deployments.
  • Kafka-independent operation: Varpulis works with MQTT, NATS, PostgreSQL CDC, or file-based input. Kafka is supported but not required.
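A manual version of the negation case above ("login followed by NO password change within the window, then large transfer") shows why absence is awkward to hand-roll: the detector must remember every event that could *veto* a match, not just the pattern steps themselves. A minimal plain-Java sketch with hypothetical event names:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of manual negation: Login -> (no PasswordChange) -> LargeTransfer.
class NegationDetector {
    private static final long WINDOW_MS = 60 * 60 * 1000; // 1h window
    private final Map<String, Long> loginAt = new HashMap<>();
    private final Map<String, Long> pwChangeAt = new HashMap<>(); // veto events

    /** Returns an alert string when the pattern completes un-vetoed, else null. */
    String onEvent(String userId, String type, long ts) {
        switch (type) {
            case "login" -> loginAt.put(userId, ts);
            case "password_change" -> pwChangeAt.put(userId, ts);
            case "large_transfer" -> {
                Long login = loginAt.get(userId);
                if (login == null || ts - login > WINDOW_MS) return null;
                Long pw = pwChangeAt.get(userId);
                // The match survives only if no password change fell inside it.
                boolean vetoed = pw != null && pw > login && pw < ts;
                if (!vetoed) return "ALERT:" + userId;
            }
        }
        return null;
    }
}
```

A declarative NOT operator expresses the veto condition directly; here it is scattered across two maps and an interval check, and real code would also need expiry of both maps.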

Using Them Together

Varpulis and Kafka Streams are complementary tools, not mutually exclusive. The recommended architecture uses Kafka as the event backbone, with Varpulis consuming from and producing to Kafka topics:

[Sources] -> [Kafka] -> [Kafka Streams: ETL/Enrichment] -> [Kafka]
                                                              |
                                                              v
                                                         [Varpulis: CEP]
                                                              |
                                                              v
                                                         [Kafka: Alerts]
                                                              |
                                                     +--------+--------+
                                                     |                 |
                                              [Alert Service]  [Dashboard]

Varpulis Kafka Connector

Varpulis has a built-in Kafka connector. No Kafka Connect or external bridges required:

```vpl
connector Broker = kafka(brokers: "kafka:9092", group_id: "varpulis-cep")

event Login:
    user_id: str
    ip_address: str

event Transaction:
    user_id: str
    amount: float
    status: str

stream Logins = Login.from(Broker, topic: "user-logins")
stream Transactions = Transaction.from(Broker, topic: "user-transactions")

stream FraudAlerts = Logins as login
    -> Transactions where user_id == login.user_id
                      and status == "failed" as failed_tx
    .within(10m)
    .emit(
        alert_type: "LOGIN_THEN_FAILED_TX",
        user_id: login.user_id,
        amount: failed_tx.amount
    )
    .to(Broker, topic: "fraud-alerts")
```

This pattern is common in production: Kafka Streams handles ETL and enrichment, Varpulis handles CEP pattern detection, and alerts flow back to Kafka for downstream consumers. Each tool handles the part of the pipeline where it excels.

Summary

Kafka Streams and ksqlDB are strong general-purpose stream processing tools that excel within the Kafka ecosystem. Varpulis is a specialized CEP engine that excels at detecting complex temporal patterns. For simple filtering, aggregation, and joins, all three tools work well. For multi-step sequences, Kleene closures, negation patterns, and pattern forecasting, Varpulis provides native capabilities that Kafka Streams and ksqlDB do not have. The two approaches are complementary: use Kafka for the event backbone and stream processing, and Varpulis for specialized pattern detection.

Varpulis - Next-generation streaming analytics engine