During the last year we have been using Apache Spark to perform analytics on large volumes of sensor data. The Spark applications we have developed evolve around cleansing, filtering, and ingesting historical data. These applications need to be executed on a daily basis, therefore, it was essential for us to understand Spark resource utilization, and thus, provide customers with optimal time-to-insight and cost control throughbetter calculated infrastructure needs.
To date, the conventional way of identifying bottlenecks in a Spark application would include the inspection of the Spark Web UI (either throughout the duration of the jobs/stage execution and/or postmortem) which is limited to job-level application metrics reported by the built-in metric system of Spark (e.g. stage completion time). The current version of the Spark metric system supports recording the values of the metrics in local CSV files and also integration with external metrics systems like Ganglia.
We found it cumbersome to manually consume and efficiently inspect these CSV files generated at the Spark worker nodes. Although using an external monitoring system like Ganglia would automate this process, we were still plagued with the inability to derive temporal associations between system-level metrics (e.g. CPU utilization) and job-level metrics (e.g. job or stage ID) as reported by Spark. For instance, we were not able to trace back the root cause of a peak in HDFS Reads or CPU usage to the code in our Spark application causing the bottleneck.
To overcome these limitations we developed SparkOscope. Taking advantage of the job-level information available through the existing Spark Web UI and to minimize source-code pollution, we use the existing Spark Web UI to monitor and visualize job-level metrics of a Spark application (e.g. completion time). More importantly, we extend the Web UI with a palette of system-level metrics of the server/VM/container that each of the Spark job’s executor ran on. Using SparkOScope, the user can navigate to any completed application and identifyapplication-logic bottlenecks by inspecting the various plots providing in-depth timeseriesforall relevant system-level metrics related to the Spark executors, while also easily associating them with stages, jobs and even source code lines incurring the bottleneck. On a related use, system-level metrics are essential for us when it comes to careful infrastructure planning for the set of Spark applications we know our customers will be running on a daily basis.In the backend, SparkOScope leverages the open-source Hyperic Sigar library to capture OS-level metrics at the nodes where executors are launched. In the screenshot below, onecan see that the user is able to select which metric to visualize, among the new family of OS-level metrics, for all executors launched by the Spark job under scrutiny.
We used our tool to evaluate the RAM requirements on one of our most frequent workloads. Initially, we assigned each Spark executor 80GB RAM and inspected the metrics of RAM Utilization and KBytes Written on the disk.
It became clear to us that Spark utilizes all the available RAM given to the executors and once there is no more available it resolves to heavy disk writing of intermediate data. However, in the plots produced we could see that the nodes have more RAM available which can be assigned to the executors, thus reducing the need for heavy disk writes and finally faster completion time. With our tool, the user can potentially plot additional metrics of interest like the volume of HDFS Read bytes.
The Sparkoscope project is open source. Take a look.
This contribution is part of an ongoing effort of the High Performance Systems team, IBM Research Ireland to better realize Spark’s performance bottlenecks by utilizing visualization of the key metrics. That way we can be in a better position to extract useful insights that will drive our future work. We are exploring various possible avenues to make this available to the Spark user community.