Meet our Turtlebot! J White Bear will be speaking about ATVs and SLAM at Spark Summit East on Feb 9th, and at Boston Machine Learning's Meetup on Feb 8th.

The autonomous vehicle (ATV) has become all the rage. While autonomous navigation has been a longstanding problem for military applications, including guided missiles, reconnaissance robots, and submarines, the ATV has now crossed into commercial applications. We dream of one day letting our cars safely navigate bumper-to-bumper traffic on the way home from work.

But we’re not dreaming only of a driverless, accident-free future; we’re moving rapidly toward fully integrating our everyday lives into the Internet of Things (IoT). What does that mean? Smart homes that tell us when we’re out of ice cream, refrigerators that count the calories we consume, heating and cooling systems that learn our preferences, and showers that track our overall health and weight.

The question is, why have these things remained relatively separate? Why have we not innovated with connected cars so that they have the same “smart” features as, say, a smart home? Why is the ATV not automatically part of the IoT space?

The reason is that self-driving cars are still a hard problem. Google and Tesla have both had high-profile and, in some cases, fatal accidents, stalling the release of their fully autonomous vehicles. Reliance on GPS and embedded computer vision as the sole means of identifying roadside hazards is inadequate. When those systems fail, as when Tesla’s ATV ran into a semi-truck because it was unable to distinguish the white truck from a cloud in the sky, there is no check on that embedded process. As enticingly close as the ATV seems, we have not solved the problem, because it has been attacked in isolation, and because until recent years we didn’t have the infrastructure to support anything but an embedded system.

The introduction of the “cloud” has changed all that. We no longer have to rely on the assumptions of previous computational eras. Because of the cloud, we can bring the ATV into the IoT space and create a safer operational environment for everyone. For instance, if all vehicles were in the IoT space, the Tesla vehicle could have verified the results of its computer vision algorithm against the cloud, known ahead of time that a semi-truck was present, and determined its exact position. An even simpler solution: if the truck’s driver had a cloud-enabled cell phone, the Tesla could have presumed the presence of a person, avoided the semi-truck, and saved a life.

When the ATV is not part of the “cloud,” GPS and onboard processing alone have no access to information like “there’s a group of pedestrians in the crosswalk,” and if anything goes wrong with onboard processing we risk tragedy. Moreover, GPS is unavailable in indoor spaces and not completely reliable outdoors.

Fortunately, the solution is simple: bring the ATV into the hybrid cloud, integrating its on-board processing with the public cloud, with our smart cities, and with our future infrastructure.

What is SLAM?

The real-time navigational challenges summarized above are part of a much larger problem in vehicle navigation, one considered fundamental both in robotics and for any ATV. Vehicle navigation is formally modeled by the simultaneous localization and mapping (SLAM) algorithm. But what is SLAM, exactly?

SLAM is a longstanding, fundamental ATV problem that couples two tasks: determining where a vehicle or robot is in a given space, with or without GPS (localization), and building a map of that space, whether known, unknown, or changing (mapping), simultaneously, so that the vehicle is always “aware” of its current position. The algorithm is necessarily probabilistic and in most standard cases relies on Bayesian inference to estimate quantities like the current state (pose, position, and so on) at each time step. Below is an example from the Extended Kalman Filter (EKF) implementation of the algorithm, which we use in our use case. This equation is part of the update step used to estimate the current state of the vehicle or robot:

[Image: Extended Kalman Filter update equation]
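For reference, the EKF measurement-update step can be written in common textbook notation (e.g., Thrun, Burgard, and Fox, Probabilistic Robotics); the exact form used in our implementation may differ slightly:

```latex
\begin{aligned}
K_t      &= \bar{\Sigma}_t H_t^{\top}\left(H_t \bar{\Sigma}_t H_t^{\top} + Q_t\right)^{-1} \\
\mu_t    &= \bar{\mu}_t + K_t\left(z_t - h(\bar{\mu}_t)\right) \\
\Sigma_t &= \left(I - K_t H_t\right)\bar{\Sigma}_t
\end{aligned}
```

Here \(\bar{\mu}_t, \bar{\Sigma}_t\) are the predicted state mean and covariance, \(h\) is the measurement model with Jacobian \(H_t\), \(z_t\) is the incoming sensor measurement, \(Q_t\) is the measurement noise covariance, and \(K_t\) is the Kalman gain that weighs the measurement against the prediction.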

The overall flow of the Kalman Filter within the SLAM algorithm is shown below:

[Image: Overall flow of the Kalman Filter in the SLAM algorithm]


SLAM is ideal in many ways. Primarily, it makes a good use case for Kafka and Spark Streaming because it is designed as a real-time algorithm, and it uses built-in probabilistic machinery, such as covariance matrices and normalized Gaussian distributions, to quantify the “noise” of real machinery and electronics, something that has historically been difficult to do. Kalman filters are recursive and keep no memory beyond the current state estimate, which presents a major drawback: how do you derive better-informed distributions with no memory? In the past, the limited processing power of embedded systems made this a hard problem, and we fell back on the standard default Gaussian distributions. But bringing the ATV into the cloud gives us increased processing power by distributing the matrix-based computations, allowing us to tune our distributions more accurately across vehicles and in different environments.
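To make the update step concrete, here is a deliberately minimal EKF-style measurement update in NumPy, using a two-dimensional position state and a direct linear measurement model; the variable names and noise values are illustrative, not taken from our Turtlebot implementation:

```python
import numpy as np

def ekf_update(mu_pred, sigma_pred, z, H, Q):
    """One EKF measurement update.

    mu_pred    : predicted state mean, shape (n,)
    sigma_pred : predicted state covariance, shape (n, n)
    z          : sensor measurement, shape (m,)
    H          : measurement Jacobian, shape (m, n); here h(mu) = H @ mu
    Q          : measurement noise covariance, shape (m, m)
    """
    # Kalman gain: how much to trust the measurement vs. the prediction.
    S = H @ sigma_pred @ H.T + Q            # innovation covariance
    K = sigma_pred @ H.T @ np.linalg.inv(S)

    # Correct the state with the measurement residual (innovation).
    mu = mu_pred + K @ (z - H @ mu_pred)

    # Incorporating a measurement can only reduce uncertainty.
    sigma = (np.eye(len(mu_pred)) - K @ H) @ sigma_pred
    return mu, sigma

# Toy example: the robot believes it is at (0, 0) with unit covariance,
# and a noisy sensor directly observes its position near (1, 1).
mu_pred = np.zeros(2)
sigma_pred = np.eye(2)          # predicted covariance
H = np.eye(2)                   # sensor observes the state directly
Q = np.eye(2) * 0.1             # measurement noise
z = np.array([1.0, 1.0])

mu, sigma = ekf_update(mu_pred, sigma_pred, z, H, Q)
```

After the update, the estimate moves most of the way toward the measurement (because the sensor noise is small relative to the prediction uncertainty) and the covariance shrinks, which is exactly the behavior the equations above describe.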

For instance, when your vehicle encounters a rainy night and a slick road, Spark analytics can calculate the best safety precautions, including optimal speed, braking distance, and proximity to the vehicles in front of and behind you, using data from previous rainfalls and from numerous vehicles at that specific location.
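A toy sketch of that idea in plain Python, with made-up numbers and a hypothetical aggregation rule rather than the actual Spark job:

```python
# Hypothetical historical observations at one road segment:
# (rain_mm_per_hr, speed_kmh, incident) tuples reported by past vehicles.
history = [
    (0.0, 100, False),
    (4.0, 90, True),
    (4.5, 70, False),
    (5.0, 80, True),
    (5.2, 65, False),
]

def recommended_speed(current_rain_mm_per_hr, history, window=1.0):
    """Recommend the highest speed past vehicles drove incident-free
    under similar rainfall (within +/- window mm/hr)."""
    similar = [
        speed
        for rain, speed, incident in history
        if abs(rain - current_rain_mm_per_hr) <= window and not incident
    ]
    # Fall back to a conservative default when no similar data exists.
    return max(similar) if similar else 50

print(recommended_speed(5.0, history))  # → 70
```

In the real system this aggregation would run as a distributed Spark computation over data from many vehicles and many storms, but the shape of the query, "what worked safely under similar conditions at this location," is the same.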

We know that ATVs come from various manufacturers, that each vehicle carries different sensors, and that there will be some reasonable expectation of privacy for the proprietary learning algorithms embedded in the ATV.

Kafka and Apache Spark offer a great solution to the problem:

1) Kafka provides an open-source bidirectional means of communication between the private cloud (embedded system) and the public cloud.
2) Spark Streaming allows real-time streaming, analytics, and integration of sensor data. Data can be easily integrated across vehicles, within a smart city, or on a private network within the home.
3) Integrating with the cloud allows data storage and advanced analytics to improve future outcomes and safety, not just for your vehicle, but for other drivers, pedestrians, and the community in general.

The use of open-source technologies allows us to bypass the problematic space of integrating proprietary messages across vehicles or any sensor data in the IoT space. We can focus on formulating formats and analytics that “discover” a sensor, “learn” its data format, and integrate it immediately into the cloud architecture for real-time analytics. We create a plug-and-play hybrid cloud.
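What might such a self-describing sensor message look like? Here is one hypothetical, schema-on-read style format in plain JSON; the field names and the `discover` helper are invented for illustration, not a standard:

```python
import json

# A hypothetical self-describing message a vehicle might publish to Kafka:
# the envelope carries enough metadata for the cloud side to "discover"
# the sensor and "learn" its payload format without prior coordination.
raw = json.dumps({
    "vehicle_id": "turtlebot-quorom-01",
    "sensor": {"type": "lidar", "vendor": "acme", "schema_version": 1},
    "fields": {"range_m": "float", "bearing_rad": "float"},
    "payload": {"range_m": 2.37, "bearing_rad": 0.52},
})

def discover(message, registry):
    """Register an unseen (vehicle, sensor) pair and return its readings."""
    msg = json.loads(message)
    key = (msg["vehicle_id"], msg["sensor"]["type"])
    # "Learn" the data format the first time we see this sensor.
    registry.setdefault(key, msg["fields"])
    return {name: msg["payload"][name] for name in registry[key]}

registry = {}
readings = discover(raw, registry)
```

The first message from a new sensor registers its format; every later message is parsed against the learned schema and flows straight into the real-time analytics pipeline.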

Let’s take a look at the framework to do this:

  1. Bidirectional Kafka architecture. The ATV is equipped with a Kafka producer and consumer architecture. The sensor data produced on the ATV is used internally by the embedded system and is externally shipped as Kafka messages to the public cloud.

[Image: Bidirectional Kafka Architecture]

  2. The public cloud “discovers” the vehicle sensors, receives the vehicle sensor data as Kafka messages, and stores metadata about the vehicle, the type of sensor, and the message used for analytics. It can then push new data to the vehicle, such as “reduce speed,” because the public cloud is in the IoT space and knows the ATV is approaching a pedestrian or an obstacle.
  3. Overall, the framework provides a continuous feedback loop of all the IoT’s integrated devices, plus massive storage and analytics capabilities, while allowing for real-time streaming and analytics.

Our Use Case:
In my Spark Summit talk on Feb 9th, we take a look at a specific use case, using the Turtlebot 2 as a model of our ATV. We get into some of the nuts and bolts of how to distribute and partition the Kafka architecture, explore examples of real-time analytics on the Spark Streaming nodes with some example code for processing and updating RDDs, and look at how learning informs the SLAM model.

Our Turtlebot, Quorom, will be there to help inform the talk! Hope to see you there!

Find out more about J — including her background in computational biology.


