Arun Lakshman Ravichandran
Software Engineer, AWS

Inside Flink's Control Plane: How Apache Pekko Powers the RPC Layer

· 21 min read

Flink's distributed components communicate constantly. TaskManagers report task state changes to the JobMaster. The JobMaster requests slots from the ResourceManager. The Dispatcher serves REST API queries about job status. These requests all touch shared state, particularly the ExecutionGraph. Guarding that state with traditional multi-threading and locks would invite race conditions, deadlocks, and hard-to-maintain code. Flink instead adopts the Actor Model through the Akka/Pekko framework. Each component processes all requests on a single thread through a FIFO mailbox. Because only that thread ever touches the component's state, this design rules out concurrent-access bugs by architecture, not by locks.
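The single-thread, FIFO-mailbox idea can be illustrated with a minimal Java sketch. This is not Flink's or Pekko's API: `MailboxEndpoint`, `taskStarted`, and `runningTasks` are hypothetical names, and a single-threaded `ExecutorService` stands in for the actor mailbox. The point is only that state mutated exclusively by one thread needs no lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a component such as a JobMaster; not a Flink class.
final class MailboxEndpoint implements AutoCloseable {
    // One worker thread plus an unbounded FIFO queue: this plays the mailbox.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Plain field, no lock and no volatile: only the mailbox thread touches it.
    private long runningTasks = 0;

    // Callers on any thread enqueue a message instead of mutating state directly.
    CompletableFuture<Void> taskStarted() {
        return CompletableFuture.runAsync(() -> runningTasks++, mailbox);
    }

    // Reads also go through the mailbox, so they are ordered after earlier messages.
    CompletableFuture<Long> runningTasks() {
        return CompletableFuture.supplyAsync(() -> runningTasks, mailbox);
    }

    @Override
    public void close() {
        mailbox.shutdown(); // already queued messages still run
    }
}

final class MailboxSketch {
    // Sends senders * perSender messages from several threads; returns the final count.
    static long run(int senders, int perSender) {
        try (MailboxEndpoint endpoint = new MailboxEndpoint()) {
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < senders; i++) {
                Thread t = new Thread(() -> {
                    for (int j = 0; j < perSender; j++) {
                        endpoint.taskStarted();
                    }
                });
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) {
                t.join(); // all messages are enqueued once every sender has finished
            }
            return endpoint.runningTasks().join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(8, 10_000)); // prints 80000
    }
}
```

No update is lost even though `runningTasks++` is not atomic, because every increment runs on the same thread in arrival order. The trade-off is that a slow message delays everything queued behind it.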

The Universal Primitive - How CAS Became the Foundation of Concurrent Programming

· 31 min read

This blog post is inspired by the first 6 chapters of The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit.

Imagine building a shared counter that must handle millions of updates per second across dozens of threads. Traditional locks serialize access, creating bottlenecks. You need something better: a way for threads to coordinate without blocking, without deadlocks, and without the performance collapse that comes with contention. This isn't just a performance optimization problem; it's a fundamental question about which synchronization primitives are actually necessary. Can we build wait-free concurrent data structures? Which hardware instructions must processors provide? The answer, discovered through decades of theoretical work, is that one primitive, Compare-And-Swap (CAS), is universal: given CAS, any concurrent object can be implemented in a wait-free manner.
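As a concrete illustration of the counter scenario, here is a minimal Java sketch of a CAS retry loop built on `java.util.concurrent.atomic.AtomicLong`. The class names `CasCounter` and `CasCounterSketch` are hypothetical. Note that this simple loop is lock-free rather than wait-free: some thread always makes progress, but an individual thread may retry an unbounded number of times under contention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter that coordinates threads with CAS instead of a lock.
final class CasCounter {
    private final AtomicLong value = new AtomicLong();

    long increment() {
        while (true) {
            long current = value.get();  // read the value we expect to replace
            long next = current + 1;     // compute the new value locally
            // Succeeds only if no other thread changed the value in between.
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the race: loop and retry with a fresh read.
        }
    }

    long get() {
        return value.get();
    }
}

final class CasCounterSketch {
    // Runs `threads` workers that each increment `perThread` times; returns the total.
    static long run(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(16, 100_000)); // prints 1600000
    }
}
```

In production Java code `AtomicLong.incrementAndGet()` does the same job; the explicit loop is written out here only to show the read, compute, compare-and-set, retry pattern.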