
Welcome to our blog!

My name is Fred Reiss, and I work at IBM’s Spark Technology Center. The STC is a new part of IBM, located in downtown San Francisco. Our mission is to serve as an interface between IBM and the Apache Spark community. We will be contributing directly to the open-source Apache Spark project, we will help IBM leverage Spark across our product lines, and we will engage many of IBM’s customers and customer-facing organizations with Apache Spark. The STC is slated to grow to hundreds of data practitioners, developers, designers, and business-minded professionals over the next few months.

At the core of our work at the STC is our commitment to making IBM a major contributor to the Apache Spark project. An important part of our contribution is being good citizens of the open-source community. We already have over a dozen developers fixing bugs and improving performance — working full time on reducing Spark’s backlog. In the coming months, we will be contributing major features and components to Apache Spark; among the most significant will be our machine learning contribution. We will be bringing advanced IBM technology directly into open source.

The Spark Technology Center provides a single point of contact for IBM’s many product groups for all things Spark. We accelerate IBM’s adoption of new Spark technologies as they are released. For example, we helped bring Spark 1.3.1 to the IBM Open Platform with Apache Hadoop within days of its Apache release. We’re also responsible for making sure that new releases of Spark continue to work well with IBM’s entire hardware and software product line. And we are building a pool of deep Spark expertise to help IBMers design solutions and troubleshoot difficult problems related to Spark.

The third part of our mission at the Spark Technology Center is outreach, both inside and outside IBM. We bring IBMers up to speed on this exciting technology as quickly as possible, whether it’s client representatives learning how to advise customers on leveraging Spark, or systems programmers building the next generation of our Spark-enabled cloud. We educate the broader enterprise community about the capabilities of Apache Spark technology, and we create training materials that focus on solving end-to-end business problems using Spark. And we create open and free assets to enable Spark adoption throughout the community.

We will be posting much more about Spark and the STC on this blog in the weeks to come — including demos of business applications built on Spark, discussions of the role of data design in creating usable, intuitive applications, and contributions from data scientists at IBM and across the wider open-source community. Please comment and subscribe — and remember, we are hiring for a wide variety of technical and design roles. If you're interested in joining our team, take a look at our open job postings!
