GA-CCRi Analytical Development Services

Using Kafka with GeoMesa to visualize streaming data

GeoMesa’s best-known use cases center on its ability to work with large, scalable column-oriented databases such as Accumulo, HBase, and Google Cloud Bigtable. These systems can store very large amounts of data, and GeoMesa brings a range of geospatial capabilities to them, letting you perform analytics and visualization on petabytes of data.

In many domains, data now arrives in fire hoses so large that saving all of it is not worthwhile, yet analyzing it as it arrives can still provide valuable insights. Because of this, “stream processing” has become a common way to work with data in many classes of applications. The open source Apache Kafka project, originally developed at LinkedIn, is a message queuing system that has become very popular for handling streaming data, and it plays a role in systems at LinkedIn, Netflix, PayPal, and Uber.

GeoMesa can use Kafka as a data source as well. This is especially useful when using GeoMesa with GA-CCRi’s browser-based geographical visualization tool Stealth. For example, if your system is reading position data about a fleet of vehicles, GeoMesa can read the data from Kafka and render thousands of animated points in Stealth with sub-second latency, giving a near real-time view of the vehicles’ positions.

We recently learned of Irish Rail’s API for accessing train statuses, so for St. Patrick’s Day we used it with GeoMesa and Kafka to display their train locations as they moved to and from Dublin during yesterday’s evening rush hour. The following shows a repeating loop of this movement:

[Animation: Irish Rail train positions, March 16, 5-6 GMT, looped]

(As you might imagine, the animation above is sped up; those trains aren’t hurtling down the rails quite that fast.) Along with the latitude and longitude of the trains, the API provides additional data such as each train’s code (shown here as red numbers accompanying the green dots above), direction of travel, public message, and status, and GeoMesa makes this information available via Stealth as well. Below, on a zoomed-in view showing more detail around the Dublin area, we see a user checking the status of a running (“R”) train and then a terminated (“T”) train, and then entering a CQL filter query so that the map only shows the running trains:

[Animation: checking train statuses and applying a CQL filter in Stealth]
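
The effect of such a filter can be approximated in a few lines of plain Python. This is only a sketch: the CQL text would look something like status = 'R', but the attribute name status, the other field names, and the sample records are assumptions made for illustration, not the schema used in the demo.

```python
# Hypothetical train records; field names and values are illustrative only.
trains = [
    {"code": "E101", "status": "R", "lat": 53.35, "lon": -6.25},
    {"code": "E102", "status": "T", "lat": 53.29, "lon": -6.13},
    {"code": "D201", "status": "R", "lat": 53.45, "lon": -6.15},
]


def running_only(records):
    """Keep only records whose status is "R" (running).

    This mirrors what a CQL filter such as  status = 'R'  would select,
    assuming the layer exposes an attribute named "status".
    """
    return [r for r in records if r["status"] == "R"]


running_codes = [t["code"] for t in running_only(trains)]
print(running_codes)
```

In the real system the filter is evaluated against the features in the map layer, so only the matching trains are drawn; the list comprehension above simply stands in for that step.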

Data streamed via Kafka can also be stored in one of the large-scale data stores that GeoMesa supports. We saved data about this morning’s first flights across Ireland, the UK, and northwestern Europe into Accumulo, and the video below shows how this data can be played back. It also shows how Stealth’s interactive sliders let you control the speed of the replay and the number of recent points shown for each airplane, making the displayed “tails” longer or shorter so that it is easier to see either common flight patterns or the current location of each plane.
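
The “tail” behind each plane amounts to keeping only the most recent N points per track. Here is a minimal Python sketch of that idea using a bounded deque. The class name TrackTails, the track IDs, and the coordinates are hypothetical, and this is not a description of how Stealth implements its slider.

```python
from collections import deque


class TrackTails:
    """Keep the most recent `tail_length` points for each track (hypothetical helper)."""

    def __init__(self, tail_length):
        self.tail_length = tail_length
        self._tails = {}

    def add(self, track_id, lon, lat):
        # A deque with maxlen drops its oldest point automatically
        # once the tail is full.
        if track_id not in self._tails:
            self._tails[track_id] = deque(maxlen=self.tail_length)
        self._tails[track_id].append((lon, lat))

    def tail(self, track_id):
        # Oldest point first, newest (current position) last.
        return list(self._tails.get(track_id, ()))


tails = TrackTails(tail_length=3)
for step in range(5):
    tails.add("FLIGHT-1", -6.0 + step, 53.0)
print(tails.tail("FLIGHT-1"))
```

A longer tail makes each track's recent path easier to see, while a tail length of 1 leaves only the current position.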

This kind of rewind and playback is also possible with data coming directly from Kafka, because GeoMesa provides a Kafka message consumer that allows you to “rewind” animations of the movements around the map. (See the Kafka Introduction for more on the roles of producers and consumers.) The Irish Rail API provides only a few dozen data points every 30 seconds, but GeoMesa can retrieve thousands of data points per second from Kafka and render them in Stealth. As more IoT devices include geospatial coordinates with the data they transmit, this creates great new possibilities for the kinds of visualization and analytics that you can perform with GeoMesa.
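
Rewinding works because Kafka retains messages in an ordered log rather than discarding them once they are read, so a consumer can go back to an earlier position and read forward again. The following Python sketch models that idea with an in-memory list. ReplayLog and its methods are hypothetical names for illustration and are not part of Kafka’s or GeoMesa’s API.

```python
import bisect


class ReplayLog:
    """Toy append-only log that can be replayed from a given timestamp."""

    def __init__(self):
        self._timestamps = []
        self._messages = []

    def append(self, timestamp, message):
        # Assumes messages arrive in non-decreasing timestamp order.
        self._timestamps.append(timestamp)
        self._messages.append(message)

    def replay_from(self, timestamp):
        # Find the first message at or after `timestamp`, then return
        # everything from that position onward, in arrival order.
        start = bisect.bisect_left(self._timestamps, timestamp)
        return self._messages[start:]


log = ReplayLog()
log.append(10, {"train": "E101", "lat": 53.35, "lon": -6.25})
log.append(40, {"train": "E101", "lat": 53.36, "lon": -6.24})
log.append(70, {"train": "E101", "lat": 53.37, "lon": -6.23})

# "Rewind" to time 40 and play forward from there.
for message in log.replay_from(40):
    print(message)
```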
