RedRock.

RedRock is an alpha app that lets users act on data-driven insights discovered from Twitter. Powered by IBM Analytics running on Spark, it finds patterns in tweets to surface influential individuals, related topics of interest, and where in the world the conversation is taking place. In the hands of a marketer, this tool could become an extremely powerful way to connect with a target demographic or find emerging markets you might not have thought to look for. In the hands of someone at the increasingly overwhelming SXSW, it could help filter weather, unannounced private corporate events, surprise artists, pop-up studios, even food.

The data science algorithms used in RedRock are Word2Vec and K-means. The Word2Vec algorithm is based on deep neural networks and assigns a numerical vector to each word in the Twitter data. Once a feature matrix is formed with Word2Vec, K-means is applied to cluster the words. These two algorithms are used to build the screens in the app.
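To make the clustering stage concrete, here is a minimal, self-contained sketch. The tiny 2-D "word vectors" are hypothetical stand-ins for real Word2Vec output (which would be high-dimensional), and this naive K-means is deliberately simpler than Spark MLlib's implementation.

```python
import math

def kmeans(vectors, centroids, iters=20):
    """Naive K-means: assign each vector to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    k = len(centroids)
    assignments = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assignments = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to its cluster's mean.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return assignments

# Hypothetical 2-D word vectors forming two loose topical groups.
words = ["music", "concert", "band", "taco", "bbq", "brisket"]
vecs = [(0.9, 0.8), (1.0, 0.9), (0.8, 1.0),
        (-0.9, -0.8), (-1.0, -0.7), (-0.8, -0.9)]
labels = kmeans(vecs, centroids=[list(vecs[0]), list(vecs[3])])
clusters = {c: [w for w, a in zip(words, labels) if a == c] for c in set(labels)}
```

With well-separated vectors like these, the two clusters recover the two topical groups of words, which is what drives the topic screens in the app.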

SETI + Spark Explore Space.

The SETI Institute’s mission is to explore, understand and explain the origin and nature of life in the universe. The IBM jStart team has joined with the SETI Institute to develop a Spark application to analyze 100 million radio events detected over several years. The analysis looks for faint signals in these events that may betray the presence of intelligent extraterrestrial life. The complex nature of the data demands sophisticated mathematical models to find faint signals, and machine-learning algorithms to separate terrestrial interference from signals truly of interest.

This application uses the IPython Notebook service on Apache Spark, deployed on IBM Cloud Data Services (CDS). Data is loaded into the CDS object store in a format that facilitates signal processing and experimentation. Data scientists from NASA's Space Science Division, Penn State, and IBM Research build and refine analytic methodologies using IPython notebooks. These notebooks create a self-documenting repository of signal-processing research that is collaboratively searched, referenced, and improved.
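The notebooks' core task, pulling a faint narrowband signal out of noisy radio data, can be illustrated with a toy spectral analysis. This is not the SETI team's actual code: the synthetic samples, the chosen tone bin, and the naive DFT below are illustrative stand-ins for real observation data and optimized FFT routines.

```python
import cmath
import math
import random

def power_spectrum(samples):
    """Naive discrete Fourier transform power spectrum.
    Fine for a short demo; real pipelines use optimized FFTs."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))) ** 2 / n
        for k in range(n // 2)  # keep only the non-negative frequencies
    ]

# Synthetic "observation": a weak tone in one frequency bin, buried in noise
# that dwarfs it sample by sample.
rng = random.Random(42)
n = 256
tone_bin = 37  # hypothetical bin carrying the signal
samples = [0.8 * math.sin(2 * math.pi * tone_bin * t / n) + rng.gauss(0, 1.0)
           for t in range(n)]

spectrum = power_spectrum(samples)
detected = max(range(len(spectrum)), key=spectrum.__getitem__)
```

Even though the tone is invisible in the raw samples, its energy concentrates in a single spectral bin, so the peak of the spectrum recovers it; machine-learning stages then decide whether such a peak is interference or a candidate of interest.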

Read more about the data collected from SETI here.


Ask Spark.

Ask Spark looks like a simple browser search engine—but beneath its user-friendly surface, Spark is running powerful and complex algorithms on massive amounts of data to generate sentiment analysis, in real time.

You enter a Twitter hashtag of your choice, and AskSpark shows the live Twitter feed and the general public opinion of that hashtag through a sentiment meter, so you can easily spot trends in the data in real time. Behind the scenes, Spark Streaming is running K-means, DecisionTree, and linear regression machine-learning algorithms against live data to create dynamic visualizations.
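The meter itself boils down to a running aggregate over a stream of per-tweet scores. The real app scores tweets with Spark Streaming and MLlib models; the sketch below swaps in a simple word-list scorer so it is self-contained, and the tweets and word lists are made up.

```python
# Hypothetical sentiment lexicons standing in for a trained model.
POSITIVE = {"love", "great", "awesome", "amazing"}
NEGATIVE = {"hate", "awful", "terrible", "broken"}

def tweet_score(text):
    """Score one tweet in [-1, 1] from positive/negative word counts."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(pos + neg, 1)

class SentimentMeter:
    """Running mean over the stream: the live gauge for one hashtag."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, text):
        self.total += tweet_score(text)
        self.count += 1
        return self.total / self.count

meter = SentimentMeter()
stream = [
    "Love this awesome keynote #spark",
    "The wifi here is terrible #spark",
    "Great demo, amazing speakers #spark",
]
readings = [meter.update(t) for t in stream]
```

Each reading is the meter's position after one more tweet arrives; in the real app the same update happens per Spark Streaming micro-batch, and the UI redraws from the latest value.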

Because of AskSpark’s responsive browser-based design, you can run multiple search queries side by side, and view the app on your mobile device.

To stay up to date with the progress of the project, visit http://askspark.com/

Bluemix Genomics.

As scientists work to understand how genetics contribute to complex disease, they are processing and analyzing massive amounts of genome data. The typical person’s genome generates about 200 GB of raw data, making genome sequencing and data collection both a scientific and a computational challenge. Over the past four years, the end-to-end cost of sequencing one whole genome has dropped from $20,000 to $5,000, and the price continues to go down. Lower cost makes it easier for doctors, researchers, and biologists to prescribe genome sequencing and secondary analysis. What’s not as easy: access to the faster and cheaper data storage required for gaining insight from this massive amount of data.

Bluemix Genomics runs on IBM Bluemix and Spark™: a cost-effective, auto-scalable cloud system that increases the speed of genomics data analysis at scale. It offers scientists easier genomic data exploration, and because costs are lower, doctors are able to run critical tests when they are needed. Medical researchers and practitioners get better and more complete analysis of the genome, and patients get the information they need to make informed decisions.
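Processing at that scale follows the split-apply-combine pattern that Spark automates across a cluster. As a toy illustration (with a made-up sequence and a thread pool standing in for Spark workers), here is per-chunk aggregation of a simple genomic statistic, GC content:

```python
from concurrent.futures import ThreadPoolExecutor

def gc_counts(chunk):
    """Map step: count G/C bases and total bases in one chunk."""
    gc = sum(base in "GC" for base in chunk)
    return gc, len(chunk)

def gc_content(sequence, chunk_size=8):
    """Split the sequence, process chunks in parallel, combine the counts."""
    chunks = [sequence[i:i + chunk_size]
              for i in range(0, len(sequence), chunk_size)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(gc_counts, chunks))
    gc = sum(g for g, _ in results)       # reduce step: combine partial counts
    total = sum(n for _, n in results)
    return gc / total

# Made-up fragment; a real genome would be billions of bases and far
# larger chunks, spread over many machines.
seq = "ATGCGCGTATTAGCGCATATGGCCTTAA"
ratio = gc_content(seq)
```

Because each chunk's counts combine by simple addition, the work parallelizes cleanly, which is exactly the property Spark exploits when the "sequence" is 200 GB rather than 28 bases.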

SFPD Loves Spark.

Imagine you knew when something bad was going to happen, in time to prepare for it—or to stop it from happening at all. This app, SFPD Loves Spark, is the start of predictive crime prevention.

Created at a Spark Hackathon, SFPD Loves Spark took San Francisco crime data from 2003-2006 and overlaid it on a map of the city to highlight locations where crimes had occurred. From the data, the team divided crime incidents into low, medium, and high severity, and assigned a color to each category. The map was divided into a series of small squares, each square lit up with one of those three colors representing crime severity, creating a heatmap of San Francisco’s criminal activity.

Using a decision tree algorithm, the team was able to get an average precision of ~67% and an average recall of ~57%. The team plans to further optimize the app using random forest or boosting methods. While some of the findings from the data seem obvious (most crimes happen at night, for example), analyzing region, time of day, and predictability will result in information that helps police know where and when to assign more patrols, with the ultimate goal of using police presence to stop crime before it happens.
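Figures like those are macro-averaged precision and recall over the three severity classes. As a self-contained sketch (the tiny label arrays are made up for illustration, not the team's data), the computation looks like this:

```python
def macro_precision_recall(actual, predicted, classes):
    """Per-class precision and recall, averaged with equal class weight."""
    precisions, recalls = [], []
    for c in classes:
        tp = sum(a == c and p == c for a, p in zip(actual, predicted))
        fp = sum(a != c and p == c for a, p in zip(actual, predicted))
        fn = sum(a == c and p != c for a, p in zip(actual, predicted))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    k = len(classes)
    return sum(precisions) / k, sum(recalls) / k

# Made-up severity labels for six map squares.
actual    = ["low", "low", "med", "med", "high", "high"]
predicted = ["low", "med", "med", "med", "high", "low"]
prec, rec = macro_precision_recall(actual, predicted, ["low", "med", "high"])
```

Precision answers "when the model lights a square high-severity, how often is it right?" while recall answers "of the truly high-severity squares, how many did it catch?", which is why both matter when deciding where patrols go.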


IBM is inventing a new class of tools with Spark.

Working on something incredible? We'd love to show the world.