Reserve Locks, Apache Spark Performance

Tamiya Onodera and Tatsuhiro Chiba

Let us first talk about Java locks. As you know, Java provides built-in support for multithreaded programming, and its class libraries include many synchronized methods. As a result, locks (or monitors) are exercised far more heavily in Java than in any other programming language. The early Java Virtual Machines (JVMs) all suffered from high locking overhead, prompting many in industry and academia to work on reducing it.

These efforts all followed the golden rule of optimization: optimize the common case. If you look at the functions that acquire and release a lock, the uncontended path is taken far more frequently than the contended path. IBM researchers invented bimodal locks to optimize the uncontended path [1, 2]. When a lock is not contended, a thread acquires it with one atomic instruction and releases it without any atomic instruction.
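The uncontended path can be sketched in plain Java as follows. This is only an illustrative model, not the JVM's implementation: a real JVM keeps the lock word in the object header and inflates the lock on contention, whereas the hypothetical `ThinLockSketch` class below simply reports failure. The class and method names are assumptions made for this sketch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only; the names here are hypothetical, not JVM internals.
final class ThinLockSketch {
    private static final long UNLOCKED = 0L;

    // Lock word: 0 when free, otherwise the id of the owning thread (ids are positive).
    private final AtomicLong lockWord = new AtomicLong(UNLOCKED);

    // Nesting depth beyond the first acquisition; touched only by the owner.
    private int recursion = 0;

    /** Uncontended acquire: one compare-and-swap. Returns false if another thread owns the lock. */
    boolean tryAcquire() {
        long me = Thread.currentThread().getId();
        if (lockWord.get() == me) {
            recursion++;            // nested acquire by the owner: no atomic instruction
            return true;
        }
        return lockWord.compareAndSet(UNLOCKED, me); // the single atomic instruction
    }

    /** Release: an ordinary store to the lock word, no compare-and-swap. */
    void release() {
        long me = Thread.currentThread().getId();
        if (lockWord.get() != me) {
            throw new IllegalMonitorStateException("not the owner");
        }
        if (recursion > 0) {
            recursion--;            // leave a nested acquisition
            return;
        }
        lockWord.set(UNLOCKED);     // only the owner can reach this store
    }

    boolean isHeld() {
        return lockWord.get() != UNLOCKED;
    }
}
```

The release needs no atomic read-modify-write because only the owner can ever reach that store; a contended `tryAcquire` is where a real implementation would switch to the heavier mode.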

IBM researchers then invented an exotic alternative, called lock reservation, after observing that a lock tends to be acquired and released (repeatedly) by a single thread throughout its lifetime [3]. The assumption obviously holds if a Java program is single threaded. It also holds when a thread invokes synchronized methods on an object that it created and that never escapes from the thread.
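A small example of the second case, using the JDK's `StringBuffer`, whose `append` and `toString` methods are synchronized. The enclosing class and method names are hypothetical and chosen only for illustration.

```java
// Hypothetical helper illustrating a thread-local object with synchronized methods.
final class LocalBufferExample {
    static String joinWithCommas(String[] parts) {
        // Created by the calling thread and never handed to any other thread.
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                buf.append(',');    // acquires and releases buf's monitor
            }
            buf.append(parts[i]);   // and again here
        }
        return buf.toString();      // toString is synchronized as well
    }
}
```

Every call in the loop locks and unlocks the same monitor from the same thread, which is exactly the pattern lock reservation targets.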

In lock reservation, when a lock is acquired by a thread for the first time, the lock is reserved for that thread. In the reserved mode, the thread acquires and releases the lock without any atomic instruction. If a second thread attempts to acquire the lock, the system cancels the reservation and falls back to the bimodal locking algorithm.
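The mode transitions just described can be traced with a small bookkeeping model. This is not a usable lock and not the algorithm from [3]: it is a single-threaded sketch that takes thread ids as plain numbers and counts how many atomic instructions each acquire/release pair would need. It assumes that installing the reservation costs one atomic instruction, and it does not model the cost of the cancellation itself. All names are hypothetical.

```java
// Single-threaded bookkeeping model of lock reservation; not a real lock.
final class ReservationModel {
    enum Mode { UNRESERVED, RESERVED, BIMODAL }

    private Mode mode = Mode.UNRESERVED;
    private long reserver = -1;     // thread the lock is currently reserved for
    private int atomicOps = 0;      // atomic instructions the model would issue
    private int cancellations = 0;  // how often a reservation was cancelled

    /** Records one acquire followed by one release performed by threadId. */
    void acquireAndRelease(long threadId) {
        if (mode == Mode.UNRESERVED) {
            // First acquisition: reserve the lock for this thread.
            mode = Mode.RESERVED;
            reserver = threadId;
            atomicOps++;            // assumption: the reservation is installed atomically
        } else if (mode == Mode.RESERVED && threadId == reserver) {
            // Reserved mode, same thread: no atomic instruction at all.
        } else if (mode == Mode.RESERVED) {
            // A second thread shows up: cancel and fall back to bimodal locking.
            mode = Mode.BIMODAL;
            cancellations++;
            atomicOps++;            // bimodal acquire (cancellation cost not modelled)
        } else {
            atomicOps++;            // bimodal: one atomic on acquire, none on release
        }
    }

    Mode mode() { return mode; }
    int atomicOps() { return atomicOps; }
    int cancellations() { return cancellations; }
}
```

In this model, a lock that only one thread ever touches costs a single atomic instruction no matter how often it is acquired, while after a cancellation every acquisition costs one again.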

Lock reservation is quite effective for some applications, but unfortunately not for J2EE (Java 2 Enterprise Edition) servers, and so it is not enabled by default.

Let us now turn our attention to Apache Spark. Spark programs are data parallel over Resilient Distributed Datasets (RDDs), which are read-only, partitioned collections of records. Spark worker threads process their respective RDD partitions. That is, partitions are basically not shared among the worker threads. This sounds like an ideal situation for lock reservation, although the benefit depends on how frequently the threads end up invoking synchronized methods on objects in, or derived from, RDD partitions.

To verify this, we ran TPC-H with Spark SQL without and with lock reservation. TPC-H is a decision support benchmark, consisting of a suite of business-oriented ad hoc queries and concurrent data modifications [4]. We ran all the queries against a 100GB dataset on Spark 1.4.1 using IBM J9 VM Version 8 SR1 FP10 on a 24-core POWER8 machine running at 3.3GHz with 1TB of RAM.

We made tuning and optimization efforts in multiple layers, such as rewriting queries for better query plans, controlling NUMA policy, configuring JVMs, and setting JVM options. While we will not go into the details here, we deployed 8 worker instances (Java virtual machines) on the 24-core machine, each with 6 worker threads and a 24GB Java heap (192GB of Java heap in total, meaning we do not fully utilize the 1TB of RAM in this experiment).

The figure shows the results for the 22 queries without and with lock reservation. We enabled lock reservation with the -XlockReservation option of the IBM J9 Java Virtual Machine. We ran each query once for warm-up and then five times for measurement, taking the best result. The bars show execution times (left vertical axis), while the line plot shows the reduction in execution time due to lock reservation (right vertical axis). As the figure shows, lock reservation provides improvements of 10% or better for 7 of the 22 queries, with up to 18% for Query 5. It degrades performance in only one case, Query 19 (by 5%).
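The measurement procedure can be written down as a short helper. This is a sketch under assumptions: the post does not give the exact formula for the reduction ratio, so the code assumes it is the relative decrease in the best execution time, (without - with) / without, expressed as a percentage. The class and method names are hypothetical.

```java
// Hypothetical helper for the "one warm-up run, five measured runs, take the best" procedure.
final class ReductionRatio {
    /** Best (smallest) time among the measured runs; element 0 is the warm-up and is ignored. */
    static double bestAfterWarmup(double[] runSeconds) {
        if (runSeconds.length < 2) {
            throw new IllegalArgumentException("need a warm-up run and at least one measured run");
        }
        double best = Double.POSITIVE_INFINITY;
        for (int i = 1; i < runSeconds.length; i++) {
            best = Math.min(best, runSeconds[i]);
        }
        return best;
    }

    /** Assumed definition: percentage by which lock reservation shortens execution time. */
    static double reductionPercent(double withoutReservation, double withReservation) {
        return 100.0 * (withoutReservation - withReservation) / withoutReservation;
    }
}
```

Under this definition a positive value means lock reservation made the query faster, and a negative value (as for Query 19) means it made the query slower.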

Although we are still performing a detailed analysis of the improvements observed, we can say that it is certainly worth trying lock reservation for Spark applications!

[1] David F. Bacon, Ravi Konuru, Chet Murthy, and Mauricio Serrano, “Thin Locks: Featherweight Synchronization for Java”, Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, pp.258–268, 1998.
[2] Tamiya Onodera and Kiyokuni Kawachiya, “A Study of Locking Objects with Bimodal Fields”, Proceedings of the 14th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp.223–237, 1999.
[3] Kiyokuni Kawachiya, Akira Koseki, and Tamiya Onodera, “Lock Reservation: Java Locks Can Mostly Do Without Atomic Operations”, Proceedings of the 17th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp.130–141, 2002.

