Datapalooza! Real World Data Products

Before I get started, I highly recommend you buy a ticket now. The event may sell out before you finish reading this post.

Now let's begin.

Back in 2008, Apache Hadoop wasn’t yet a household name, and the Business Intelligence group held all of the keys to the data. Statistics was left to research projects that surfaced from time to time to provide insight. In 2010, we were suddenly handed a free gift: a new, malleable distributed environment in which to sculpt data into different shapes, producing what we now call data products. Data products are the result of applying transformations, applied mathematics, and domain-specific knowledge to data to turn it into a usable product for a specific outcome. A common example is PYMK (People You May Know), the data product that helped LinkedIn grow its user base. It is not an insight that helped a decision maker, nor is it the LinkedIn application itself. This is the stuff dreams are made of: it’s what every business, large and small, wants to uncover, and many have. Silicon Valley companies like Netflix, Facebook, Twitter, and Nest, along with more traditional organizations like Peugeot, Constant Contact, the US Open, and NASA, are all disrupting their industries with data products.

At a large big data and analytics conference last week, I was struck by the sense that we are no longer focused on making data products effective. Most of the talks were given by vendors selling technology or by researchers describing the deeply technical capabilities of the emerging technologies flooding the market. I, for one, am less interested in the latest technology and more interested in what people are achieving with it. For this reason, I am pleased to announce a first-of-its-kind event that combines a music concert with data + design workshops. We at the Spark Technology Center call it Datapalooza.


Not another conference where people speak at you. Come build a data product.

From November 10th to 12th, the Spark Technology Center in San Francisco hosts the first-ever Datapalooza: a deep dive with industry leaders in building data products from AMPLab, Galvanize, Typesafe, Silicon Valley Data Science, IBM Watson, Spare5, Declara, and many other organizations. Take your data skills to the next level with hands-on experience and one-on-one coaching to build a data product in only three days. We will support three main tracks throughout the entire three-day event.

Data Engineering

Harmonize the Instruments

These courses aim to teach a suite of data engineering skills in data wrangling, data munging, and data pipelines. Our instructors will cover topics such as analyzing Twitter data with Apache Spark and Watson, building Word2Vec models, natural language processing, and more.

Data Science

Compose the Music

Build foundational knowledge of data variables, models, and scoring methods through a set of courses focused on hot topics such as recommendation algorithms, machine learning, and full-text and geospatial search. These courses will show you how these techniques can be used to create beautifully designed data products, with examples like RedRock.

Data App Development

Produce the Concert

What makes Datapalooza unique? Our instructors will tie together sessions from the Data Engineering and Data Science tracks to help you bridge the gap between analysis, building, and deployment. These courses focus on application frameworks, product launches, storytelling, and data visualization. Featured data products like CalTrain Rider are at the core of our curriculum.

Oh, and did I mention that the band Big Data will headline our event?

How awesome is that!

At Datapalooza, you’ll combine analytical and creative skills to attack real-world challenges using natural language processing, machine learning, cognitive computing, stream computing, distributed processing, design thinking, reactive platforms, and many more key skills to make your product a success.

San Francisco is the kickoff event for a world tour that will bring Datapalooza to a city near you.

Join the movement

P.S. If you share this post, I’ll send you an InMail with a discount code for 20% off the registration fee.