Author Profile

Posts by Benjamin Herta

Benjamin Herta
2 years ago

Using Spark's cache for correctness, not just performance

RDDs are immutable. Right? This is one of the first things we learn when we read about Apache Spark™. Here’s a little program which appears to contradict this. This Scala program creates a small RDD, performs a few simple transformations on it, and then calls RDD.count() on the same RDD twice. The values of the two calls to count are compared with an assert, and at first glance, we would think tha...

Benjamin Herta
2 years ago

From the Driver to the Executors

I have worked with a number of people who are new to Apache Spark™ and have an existing program that they want to port to it. Spark supports a number of programming languages, is relatively easy to...