SASE+ Pattern Matching Guide
Advanced guide to SASE+ pattern matching in Varpulis for detecting complex event sequences.
Overview
SASE+ (Sequence Algebra for Stream Events) is a pattern matching algorithm for Complex Event Processing. Varpulis implements SASE+ based on the SIGMOD 2006 paper "High-Performance Complex Event Processing over Streams" by Wu, Diao, and Rizvi.
Key Features
- NFA-based matching: Efficient finite automaton execution
- Kleene closures: Match one or more (+) or zero or more (*) events
- Negation: Detect absence of events within time windows
- Logical operators: AND (any order), OR (either)
- Partition-by optimization: Independent matching per partition key
- Temporal constraints: Patterns must complete within time bounds
Pattern Types
Sequence (->)
Events must occur in the specified order.
// A followed by B
pattern Simple = A -> B
// A followed by B followed by C
pattern ThreeStep = A -> B -> C
// With conditions
pattern Conditional =
Login[status == "success"] -> Action -> Logout
NFA Structure:
[Start] -> [Match A] -> [Match B] -> [Match C] -> [Accept]
Kleene Plus (+)
One or more occurrences of a pattern.
// One or more failed logins
pattern RepeatedFails = LoginFailed+
// Failed logins followed by success
pattern BruteForce = LoginFailed+ -> LoginSuccess within 10m
NFA Structure:
[Start] -> [Match Event] -> [Accept]
              ^           |
              +-----------+ (self-loop)
Implementation Notes:
- Uses a stack to track Kleene state
- Each match pushes to the stack
- Non-greedy by default (first complete match)
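The self-loop behavior can be illustrated with a toy Python runner for the BruteForce pattern above. This is a sketch of the matching idea only, not the Varpulis NFA from sase.rs:

```python
# Toy NFA for "LoginFailed+ -> LoginSuccess" (illustrative sketch).
# The "kleene" state loops on itself for each additional LoginFailed,
# mirroring the self-loop in the diagram above.
def match_brute_force(events):
    state, fails = "start", []
    for ev in events:
        if state == "start" and ev == "LoginFailed":
            state = "kleene"
            fails.append(ev)                # first match enters the Kleene state
        elif state == "kleene" and ev == "LoginFailed":
            fails.append(ev)                # self-loop: accumulate more fails
        elif state == "kleene" and ev == "LoginSuccess":
            return fails                    # accept: one or more fails, then success
    return None                             # no complete match

print(match_brute_force(["LoginFailed", "LoginFailed", "LoginSuccess"]))
# -> ['LoginFailed', 'LoginFailed']
```

Note that a lone LoginSuccess never matches: Kleene plus requires at least one LoginFailed before the accepting transition.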
Kleene Star (*)
Zero or more occurrences.
// Start, any events, then end
pattern Session = SessionStart -> Activity* -> SessionEnd
// Optional middleware
pattern Request = ClientRequest -> Middleware* -> ServerResponse
Key Difference from +:
- A* accepts immediately (zero matches allowed)
- A+ requires at least one match
Negation (NOT)
Detect the absence of an event within a time window.
// Order not confirmed within 1 hour
pattern UnconfirmedOrder =
OrderPlaced -> NOT(OrderConfirmed) within 1h
// Payment started but never completed
pattern AbandonedPayment =
PaymentStart -> NOT(PaymentComplete) within 5m
How Negation Works:
- The NFA enters a "negation state" after matching the preceding pattern
- A timeout is set based on the within clause
- If the negated event occurs before timeout, the pattern fails
- If timeout expires without the event, the pattern succeeds
Implementation: See NegationInfo and StateType::Negation in sase.rs:108
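The timeout logic can be sketched in a few lines of Python. This is an assumed simplification of StateType::Negation, not the actual sase.rs code:

```python
# Sketch of negation-with-timeout semantics: after the preceding pattern
# matches at start_ts, watch for the negated event until the deadline.
def check_negation(events, negated_type, start_ts, window):
    deadline = start_ts + window
    for ts, etype in events:
        if ts > deadline:
            break                          # window expired without the event
        if etype == negated_type and ts >= start_ts:
            return False                   # negated event arrived in time: pattern fails
    return True                            # absence confirmed: pattern succeeds

# OrderPlaced at t=0, NOT(OrderConfirmed) within 3600s
events = [(100, "Shipment"), (4000, "OrderConfirmed")]
print(check_negation(events, "OrderConfirmed", 0, 3600))
# -> True (the confirmation arrived after the window closed)
```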
AND (Any Order)
Both patterns must match, but order doesn't matter.
// Both documents required (any order)
pattern BothDocs =
AND(DocumentA, DocumentB) within 1h
// Application complete when both submitted
pattern ApplicationComplete =
AND(FormSubmitted, PaymentReceived) within 24h
Implementation:
- Creates an AND state that tracks which branches have matched
- Accepts when all branches are satisfied
- See AndConfig and StateType::And in sase.rs
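The branch-tracking idea can be sketched as follows. This is a toy simplification of the AND state, not the actual AndConfig implementation:

```python
# Sketch of AND-state semantics: accept once every branch has matched,
# in any order, within the time window.
def and_match(events, branches, window):
    pending = set(branches)                # branches still waiting for a match
    start = None
    for ts, etype in events:
        if etype in pending:
            start = ts if start is None else start
            if ts - start > window:
                return False               # window expired before all branches matched
            pending.discard(etype)
            if not pending:
                return True                # all branches satisfied
    return False

events = [(10, "PaymentReceived"), (50, "FormSubmitted")]
print(and_match(events, {"FormSubmitted", "PaymentReceived"}, 24 * 3600))
# -> True (order does not matter)
```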
OR (Either)
Either pattern matches.
// Accept either payment method
pattern PaymentReceived =
OR(CreditCard, BankTransfer)
// Multiple termination conditions
pattern SessionEnd =
OR(Logout, Timeout, ForceDisconnect)
Predicates
Field Comparisons
pattern HighValue =
Transaction[amount > 10000]
pattern SpecificUser =
Login[user_id == "admin" and ip != "10.0.0.1"]
Operators: ==, !=, <, <=, >, >=
Reference Comparisons
Reference fields from earlier events using aliases:
pattern SameUser =
Login as login
-> Activity[user_id == login.user_id] as activity
-> Logout[user_id == login.user_id]
within 1h
Compound Predicates
pattern Complex =
Event[(value > 100 and status == "active") or priority == "high"]
Temporal Constraints
Pattern-Level Constraints
Apply to the entire pattern:
pattern MustBeQuick =
Start -> Middle -> End
within 5m
Per-Transition Constraints
Different timeouts for different steps:
pattern VariableTiming =
FastEvent
-> SlowEvent within 1h
-> FinalEvent within 5m
Timeout Handling
When a pattern times out:
- Partial matches are discarded
- Resources are freed
- No match is emitted
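The pruning step can be sketched in one function. This is an assumed behavior sketch, not the actual cleanup code in sase.rs:

```python
# Sketch of timeout pruning: partial runs whose window has expired are
# dropped silently, freeing their state without emitting a match.
def prune_expired(runs, now, window):
    return [r for r in runs if now - r["start_ts"] <= window]

runs = [{"id": 1, "start_ts": 0}, {"id": 2, "start_ts": 290}]
print([r["id"] for r in prune_expired(runs, now=310, window=300)])
# -> [2] (run 1 started 310s ago, past the 300s window)
```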
Partition Strategies
Partition-By Attribute
Process patterns independently per key:
pattern PerUser =
Login+ -> Logout
within 1h
partition by user_id
Benefits:
- Parallel processing across partitions
- Memory isolation (one partition doesn't affect others)
- Natural grouping for user/device/session patterns
Memory Impact:
- Each partition maintains its own NFA state
- N partitions = N × base memory
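The isolation property amounts to routing each event to its own per-key state, which can be sketched as below. The routing function is illustrative, not the Varpulis implementation:

```python
# Sketch of partition-by routing: each key value gets its own bucket,
# so one user's events can never complete another user's pattern.
from collections import defaultdict

def route(events, key):
    partitions = defaultdict(list)
    for ev in events:
        partitions[ev[key]].append(ev)     # independent matcher state per key
    return partitions

events = [{"user_id": "u1", "type": "Login"},
          {"user_id": "u2", "type": "Login"},
          {"user_id": "u1", "type": "Logout"}]
parts = route(events, "user_id")
print(len(parts), len(parts["u1"]))
# -> 2 2 (two partitions; u1 holds both of its events)
```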
Without Partitioning
Global pattern matching across all events:
pattern GlobalPattern =
SystemAlert -> AdminResponse
within 30m
Event Selection Strategies
SASE+ supports different strategies for selecting events when multiple matches are possible.
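The three strategies described in this section can be contrasted with a toy Python enumeration for the pattern A -> B. This is illustrative only, not the Varpulis matcher:

```python
# Toy enumeration of A -> B matches under the three selection strategies.
def matches(events, strategy):
    out = []
    if strategy == "skip-till-any":
        # every A pairs with every later B (overlapping matches allowed)
        for i, a in enumerate(events):
            if a.startswith("A"):
                out += [(a, b) for b in events[i + 1:] if b.startswith("B")]
    elif strategy == "skip-till-next":
        # each event participates in at most one match
        pending_a = None
        for ev in events:
            if ev.startswith("A") and pending_a is None:
                pending_a = ev
            elif ev.startswith("B") and pending_a:
                out.append((pending_a, ev))
                pending_a = None
    elif strategy == "strict":
        # A and B must be immediately adjacent
        for a, b in zip(events, events[1:]):
            if a.startswith("A") and b.startswith("B"):
                out.append((a, b))
    return out

evs = ["A1", "B1", "A2", "B2"]
print(matches(evs, "skip-till-any"))        # [('A1','B1'), ('A1','B2'), ('A2','B2')]
print(matches(evs, "skip-till-next"))       # [('A1','B1'), ('A2','B2')]
print(matches(["A1", "C1", "B1"], "strict"))  # [] (C1 breaks contiguity)
```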
Skip-Till-Any-Match (Default)
Produce every possible match, so a single event may participate in several overlapping matches.
// Given events: A1, B1, A2, B2
// Pattern: A -> B
// Matches: (A1, B1), (A1, B2), (A2, B2)
Skip-Till-Next-Match
Each event participates in at most one match.
// Given events: A1, B1, A2, B2
// Pattern: A -> B
// Matches: (A1, B1), (A2, B2)
Strict Contiguity
Events must be immediately adjacent (no skipping).
// Given events: A1, C1, B1
// Pattern: A -> B (strict)
// Matches: none (C1 breaks contiguity)
Debugging Patterns
Verbose Output
Use --verbose with simulation to see pattern state:
varpulis simulate -p rules.vpl -e events.evt --verbose
Pattern Tracing
Enable trace logging:
RUST_LOG=varpulis_runtime::sase=trace varpulis simulate ...
Common Issues
1. Pattern Never Matches
Possible causes:
- Event types don't match exactly (case-sensitive)
- Predicates are too restrictive
- Timeout is too short
Debug steps:
// Remove predicates to test basic matching
pattern Debug1 = A -> B within 1h
// Add predicates back one at a time
pattern Debug2 = A[field > 0] -> B within 1h
2. Too Many Matches
Possible causes:
- Missing predicates to constrain matches
- Skip-till-any-match creating overlapping matches
- Missing partition by causing cross-user matches
Solution:
// Add partition to isolate matches
pattern Isolated =
Login -> Action -> Logout
within 1h
partition by user_id
3. Memory Growth
Possible causes:
- Unbounded Kleene closure (+ or *)
- Too many partitions
- Timeout too long
Solutions:
// Limit Kleene matches
pattern Limited =
Event+ within 5m // Natural bound via timeout
// Reduce partition cardinality
partition by category // Use low-cardinality field
4. Negation Not Triggering
Possible causes:
- Event type mismatch in NOT clause
- Timeout too short (event arrives just after)
- Negated event arriving before pattern start
Debug:
// Ensure event types match exactly
pattern Debug =
Start -> NOT(Exactly_This_Type) within 10m
Performance Considerations
NFA Complexity
| Pattern | States | Transitions |
|---|---|---|
| A -> B | 3 | 2 |
| A -> B -> C | 4 | 3 |
| A+ | 2 | 2 (with self-loop) |
| AND(A, B) | 4 | 4 |
| A -> B+ -> C | 4 | 4 |
Memory Usage
Memory = O(partitions × active_runs × events_per_run)
Recommendations:
- Use short timeouts to limit active runs
- Partition by low-cardinality fields
- Avoid unbounded Kleene without timeout
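The formula above can be turned into a back-of-envelope estimator. The 256-byte per-event constant is purely an assumption for illustration, not a measured Varpulis figure:

```python
# Rough memory estimate from Memory = O(partitions x active_runs x events_per_run).
# bytes_per_event is an assumed constant, not a measured value.
def estimate_bytes(partitions, active_runs, events_per_run, bytes_per_event=256):
    return partitions * active_runs * events_per_run * bytes_per_event

print(estimate_bytes(1000, 10, 20) // (1024 * 1024))
# -> 48 (roughly 48 MiB for 1000 partitions with 10 runs of 20 events each)
```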
Throughput
Typical throughput on modern hardware:
- Simple patterns: 500K+ events/sec
- Complex patterns with Kleene: 100K+ events/sec
- Patterns with ZDD optimization: 200K+ events/sec
ZDD Optimization
Varpulis uses Zero-suppressed Decision Diagrams (ZDD) to represent Kleene capture combinations during SASE+ pattern matching. When a Kleene pattern like A -> B+ -> C matches many B events, ZDD compactly encodes all possible subsets — e.g., 100 matching B events produce ~100 ZDD nodes instead of 2^100 explicit combinations.
Benefits
- Exponential compression of Kleene match combinations
- Efficient subset/superset operations for match enumeration
- Reduced memory for patterns with many Kleene captures
When ZDD Helps
- Kleene+ patterns with many matching events
- Patterns with overlapping match candidates
- High-throughput streams where match combination count would explode
Implementation
See varpulis-zdd crate for ZDD data structures and sase.rs for integration with the pattern matcher.
Note: ZDD is used for pattern matching (Kleene captures). For trend aggregation over patterns, Varpulis uses the Hamlet engine — see trend aggregation.
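The compression effect is easy to demonstrate with a toy ZDD. The sketch below is a minimal illustration of the zero-suppression idea, not the varpulis-zdd crate: a chain of n nodes represents the family of all 2^n subsets of n Kleene captures:

```python
# Toy ZDD sketch: nodes are (var, lo, hi) stored in a table;
# indices 0 and 1 are the terminals (empty family, {empty set}).
from functools import lru_cache

nodes = {}                  # (var, lo, hi) -> node index (hash-consing)
table = [None, None]        # node index -> (var, lo, hi)

def mk(var, lo, hi):
    if hi == 0:             # zero-suppression rule: drop nodes with empty hi-branch
        return lo
    key = (var, lo, hi)
    if key not in nodes:
        table.append(key)
        nodes[key] = len(table) - 1
    return nodes[key]

@lru_cache(maxsize=None)
def count_sets(i):
    if i <= 1:
        return i            # terminal 0 -> 0 sets, terminal 1 -> 1 set (the empty set)
    _, lo, hi = table[i]
    return count_sets(lo) + count_sets(hi)

def all_subsets(n):
    # family of ALL subsets of n elements: each element absent (lo) or present (hi)
    f = 1
    for var in range(n):
        f = mk(var, f, f)
    return f

root = all_subsets(100)
print(count_sets(root) == 2 ** 100)   # True: 2^100 subsets represented
print(len(table) - 2)                 # -> 100 nodes, matching the ~100-node claim
```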
Examples
Fraud Detection
// Multiple small transactions followed by large withdrawal
pattern SmurfingPattern =
Transaction[amount < 1000]+
-> Transaction[amount > 9000]
within 1h
partition by account_id
stream FraudAlerts = Transaction
pattern SmurfingPattern
emit alert("Smurfing", "Account {account_id} suspicious pattern")
Session Analysis
// Complete user session
pattern UserSession =
Login as start
-> PageView*
-> OR(Logout, SessionTimeout) as end
within 24h
partition by user_id
stream Sessions = *
pattern UserSession
emit log("Session: {start.user_id}, duration: {end.timestamp - start.timestamp}")
SLA Monitoring
// Request without response within SLA
pattern SLABreach =
Request as req
-> NOT(Response[request_id == req.id]) within 5s
stream SLAAlerts = *
pattern SLABreach
emit alert("SLABreach", "Request {req.id} not responded within SLA")
IoT Device Monitoring
// Device going offline
pattern DeviceOffline =
Heartbeat as last_beat
-> NOT(Heartbeat[device_id == last_beat.device_id]) within 1m
partition by device_id
stream OfflineAlerts = Heartbeat
pattern DeviceOffline
emit alert("DeviceOffline", "Device {last_beat.device_id} stopped responding")
Order Processing
// Order completion flow
pattern OrderComplete =
OrderPlaced as order
-> PaymentReceived[order_id == order.id]
-> AND(
ItemsPicked[order_id == order.id],
AddressVerified[order_id == order.id]
)
-> Shipped[order_id == order.id]
within 7d
partition by order_id
stream CompletedOrders = *
pattern OrderComplete
emit log("Order {order.id} completed successfully")
Trend Aggregation Mode
For patterns where you need statistics over trends (COUNT, SUM, AVG) rather than individual matches, use .trend_aggregate() instead of the default detection mode:
# Instead of detecting each rising price pattern individually...
# Count how many rising trends exist (without enumerating them)
stream TrendCount = StockTick as first
-> all StockTick where price > first.price as rising
.within(60s)
.partition_by(symbol)
.trend_aggregate(count: count_trends())
.emit(symbol: first.symbol, trends: count)
This uses the Hamlet engine (SIGMOD 2021) for O(n) aggregation instead of explicit trend construction, which can be exponential. Multiple queries sharing Kleene sub-patterns are automatically optimized via shared aggregation.
See Trend Aggregation Reference and Trend Aggregation Tutorial for details.
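The "count without enumerating" idea can be illustrated with a toy counter for an A -> B+ pattern. This is a simplified sketch of incremental trend counting, not the Hamlet engine itself:

```python
# Count A -> B+ trends incrementally, without materializing them.
# Each B event doubles the count of existing trends (the B may be
# included or skipped) and opens one new trend per A seen so far,
# so the update is O(1) per event instead of exponential enumeration.
def count_trends(events):
    open_runs = 0      # number of A starts seen so far
    trends = 0         # number of (A, non-empty subset of later Bs) combinations
    for ev in events:
        if ev == "A":
            open_runs += 1
        elif ev == "B":
            trends = trends * 2 + open_runs
    return trends

print(count_trends(["A", "B", "B"]))
# -> 3 (the subsets {B1}, {B2}, {B1,B2})
```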
Pattern Forecasting
Use .forecast() to predict whether a partially-matched pattern will complete, and when. This uses Prediction Suffix Trees (PST) combined with the SASE NFA to form a Pattern Markov Chain (PMC):
stream FraudForecast = Transaction as t1
-> Transaction as t2 where t2.amount > t1.amount * 5
-> Transaction as t3 where t3.location != t1.location
.within(5m)
.forecast(confidence: 0.7, horizon: 2m, warmup: 500)
.where(forecast_probability > 0.8)
.emit(
probability: forecast_probability,
expected_time: forecast_time
)
Parameters: confidence (min probability, default 0.5), horizon (forecast window), warmup (learning period, default 100), max_depth (PST depth, default 5)
Built-in variables (available after .forecast()): forecast_probability, forecast_time, forecast_state, forecast_context_depth
The PST learns online from the event stream — no pre-training required. See Forecasting Tutorial and Forecasting Architecture for details.
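The core idea of learning transition statistics online and using them to score a partial match can be sketched with a first-order Markov model. This is a deliberate simplification for illustration: a real PST conditions on variable-length contexts, not just the previous event:

```python
# Toy online forecast sketch: learn first-order transition counts from
# the stream, then estimate the probability that a partial A -> B -> C
# match completes on the next event.
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))

def observe(stream):
    # update transition counts online; no pre-training required
    for prev, nxt in zip(stream, stream[1:]):
        counts[prev][nxt] += 1

def p_next(prev, nxt):
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

observe(["A", "B", "C", "A", "B", "D", "A", "B", "C"])
# After matching A -> B, probability the next event completes the pattern:
print(round(p_next("B", "C"), 3))
# -> 0.667 (2 of the 3 observed B-transitions went to C)
```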
See Also
- Language Tutorial - VPL basics
- Windows & Aggregations - Windowed pattern matching
- Trend Aggregation - .trend_aggregate() reference
- Forecasting Tutorial - PST-based pattern forecasting
- Forecasting Architecture - PST/PMC design
- Troubleshooting Guide - Pattern debugging tips
- SIGMOD 2006 Paper - Original SASE+ research