PixieDust is an open-source Python helper library that works as an add-on to Jupyter notebooks to improve the user experience of working with data. It also provides extra capabilities that fill a gap when the notebook is hosted in the cloud and the user has no access to configuration files. (Check out the project's GitHub repository.)
Its current capabilities include:
- packageManager. Lets you install Apache Spark™ packages inside a Python notebook. Hosted Jupyter notebooks do not offer this today, a gap that prevents developers from using a large number of Spark package add-ons. You can install packages or plain JARs in your notebook's Python kernel without modifying a configuration file.
- Visualizations. A single API called display() lets you visualize your Spark object in different ways: tables, charts, maps, and so on. This module is designed to be extensible, providing an API that lets anyone easily contribute a new visualization plugin.
- Export. Easily download data as CSV, HTML, JSON, and other formats, either locally to your laptop or into a variety of back-end data stores, such as Cloudant, dashDB, GraphDB, and Object Storage.
- Scala Bridge. Use Scala directly in your Python notebook. Variables are automatically transferred from Python to Scala and vice versa:
First define the variable in Python...
    pythonVar = "pixiedust"
Then in Scala...
    %%scala
    val demo = com.ibm.cds.spark.samples.StreamingTwitter
    demo.setConfig("twitter4j.oauth.consumerKey", "XXXXX")
    demo.setConfig("twitter4j.oauth.consumerSecret", "XXXXX")
    demo.setConfig("twitter4j.oauth.accessToken", "XXXXX")
    demo.setConfig("twitter4j.oauth.accessTokenSecret", "XXXXX")
    demo.setConfig("watson.tone.url", "https://watsonplatform.net/tone-analyzer/api")
    demo.setConfig("watson.tone.password", "XXXXX")
    demo.setConfig("watson.tone.username", "XXXX")
    import org.apache.spark.streaming._
    demo.startTwitterStreaming(sc, Seconds(10))
    println(pythonVar)
    val __fromScalaVar = "Hello from Scala"
And back to Python to use the Scala variable...
    print(__fromScalaVar)
A sample visualization plugin can use d3 to show the different flight routes for each airport.
- Embed Applications. Encapsulate your analytics into compelling user interfaces better suited for line-of-business users.
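As a rough illustration of what the Export capability in the list above amounts to for local files, the following sketch writes the same rows to CSV and JSON using only Python's standard library. It is not PixieDust's export code: `rows`, `to_csv`, and `to_json` are hypothetical names chosen for this example, the sample data is made up, and the snippet assumes a standalone Python 3 interpreter rather than a notebook kernel.

```python
import csv
import io
import json

# Hypothetical sample rows; in a notebook these would come from a DataFrame.
rows = [
    {"airport": "BOS", "flights": 12},
    {"airport": "SFO", "flights": 7},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the first record's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to a JSON array."""
    return json.dumps(records)


csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```

Sending the same text to a back-end data store instead of printing it would only change the final step, which is why a single export feature can plausibly target several destinations.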
Note: PixieDust currently works with Spark 1.6 and Python 2.7.
Note: PixieDust currently supports Spark DataFrames, Spark GraphFrames, and Pandas DataFrames, with more to come. If you can't wait, write your own today and contribute it back.
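To illustrate the kind of extensibility described above, here is a minimal sketch of how a single entry point could dispatch to handlers registered per data type. This is not PixieDust's actual plugin API: `register_handler`, `show`, and the two renderers are hypothetical names, and plain Python lists and dicts stand in for Spark or pandas objects.

```python
# Registry mapping a data type to the function that knows how to render it.
_handlers = {}


def register_handler(data_type):
    """Decorator that registers a rendering function for data_type (hypothetical API)."""
    def decorator(func):
        _handlers[data_type] = func
        return func
    return decorator


def show(entity):
    """Single entry point: use the first registered handler matching the entity's type."""
    for data_type, handler in _handlers.items():
        if isinstance(entity, data_type):
            return handler(entity)
    raise TypeError("No handler registered for %s" % type(entity).__name__)


@register_handler(list)
def render_table(rows):
    # Render a list of rows as tab-separated lines.
    return "\n".join("\t".join(str(cell) for cell in row) for row in rows)


@register_handler(dict)
def render_pairs(mapping):
    # Render a mapping as sorted key=value lines.
    return "\n".join("%s=%s" % (key, mapping[key]) for key in sorted(mapping))


print(show([("BOS", 12), ("SFO", 7)]))
print(show({"b": 2, "a": 1}))
```

In a design like this, supporting a new data type means registering one more handler, without touching `show` itself.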