Open Sourcing Design Through Apache Zeppelin


The Spark Technology Center (STC) focuses on making data accessible to everyone. One way we can get there is by improving the very tools that Data Scientists and Data Engineers use. Apache Zeppelin is an open source notebook with a few key features that make it stand out: it provides built-in Apache Spark integration and supports multiple programming languages within a single notebook. Although it is still incubating under the Apache Software Foundation, Zeppelin already has a devoted community of contributors, developers, and adopters.

Apache Zeppelin naturally sets the stage for a more inclusive experience in data science. Moon Soo Lee and his team at NFLabs have been the driving force behind the Zeppelin project, so the STC Design team decided to collaborate closely with them and forge a shared vision for the Zeppelin notebook. We invited Moon and his colleagues Chae-Sung, Mina, and Ah Young to our Design Studio at STC in San Francisco. We spent the day sharing recent progress on Zeppelin and discussing its future. Having face time with Moon and his team helped accelerate our endeavor.

Community and Pluggability

One of the key takeaways from our discussion was that Zeppelin, as an Apache incubator project, is deeply rooted in Apache's collaborative, consensus-based development process. Community support is fundamental: a contribution takes effect only when the community backs it. Zeppelin publishes its roadmap for upcoming releases ahead of time, so the community knows what's coming and where it can contribute. Notable focus areas include R language support, translation, pluggable visualizations, and a repository of pluggable modules. Zeppelin already has a pluggable back-end, which allows any language or data processing back-end to be plugged into it. Moon shared that, in his experience, "if there's pluggability, then people will use it."

With the community and pluggability in mind, our common goal is to improve the user experience of Zeppelin and make it customizable and extensible. Our designers shared prototypes that addressed pain points identified by observing Data Scientists using Zeppelin. Moon and his team liked the proposed UX improvements, which generally aligned with their own plans. The user community can already appreciate one such improvement: the ability to point and click to insert a new paragraph directly within a notebook. Novel ideas were also well received. We are looking to provide an open source design language that will serve not only as a guide for contributors within the open source community but also as a template that can be easily customized and extended for use in other products. We expect the open source community to take a larger stake in UX contributions and to provide valuable feedback. It is especially vital that the community feels involved and on board with changes to the user experience; after all, they are the users.

Open Sourcing Design


Spearheading the project at the STC Design Studio is Lead Designer Jeremy Anderson. In just a few months, Jeremy has assembled a close-knit team of Designers, Developers, and User Researchers for this initiative. He wants to invest in a long-standing relationship with the open source community through meaningful contributions. He explains, "Design is crucial to the success of products and ideas. Sharing our process with the open source community and offering resources and tools that will help them improve the experiences around the projects they are involved in only seems natural."

Making Data Available to Everyone

We want to make Zeppelin accessible to everyone, not just data scientists and engineers, because our ultimate goal is to make data and insights available to all. By working on Zeppelin and improving its user experience, we are taking a bold step in this direction.