A Software Architect Blog

Data insights and products

December 15, 2019

We started looking at how to improve our website insights. The existing solution relied heavily on GA (Google Analytics) and an operational data store.

The main issue with GA was that getting real-time data was hard: the data was available in BigQuery, but only nightly. The operational data store, based on Vertica and RabbitMQ, was useful but was starting to show its age.

The Requirements

We wanted to get insights into consumer events in real time. GA is a great self-serve system, but there can be a delay before the data appears. We also wanted to provide streams of user events for Data Products.

Solution

Snowplow is an event data platform that can run in different clouds. It can be managed by Snowplow or self-hosted. It lets you create schemas that define user events and the data carried in each message.
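To make the idea of schema-defined events concrete, here is a minimal Python sketch; it is not Snowplow's own code. It assumes Snowplow's self-describing JSON shape (a `schema` key holding an Iglu URI such as `iglu:vendor/name/jsonschema/1-0-0`, plus a `data` key) and checks the payload against a hypothetical in-memory registry. The vendor `com.example`, the `product_view` event, its fields and the helper names are made up for illustration, and the simple required-field and type check stands in for the full JSON Schema validation a real schema registry performs.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Iglu URI layout: iglu:<vendor>/<name>/<format>/<MODEL>-<REVISION>-<ADDITION>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[A-Za-z0-9_.-]+)/(?P<name>[A-Za-z0-9_-]+)/"
    r"(?P<fmt>[A-Za-z0-9_-]+)/(?P<model>\d+)-(?P<revision>\d+)-(?P<addition>\d+)$"
)


@dataclass(frozen=True)
class SchemaKey:
    vendor: str
    name: str
    schema_format: str
    version: tuple[int, int, int]


def parse_schema_uri(uri: str) -> SchemaKey:
    """Split an Iglu URI into its parts; raise ValueError if it is malformed."""
    m = IGLU_URI.match(uri)
    if m is None:
        raise ValueError(f"not an Iglu URI: {uri!r}")
    version = (int(m["model"]), int(m["revision"]), int(m["addition"]))
    return SchemaKey(m["vendor"], m["name"], m["fmt"], version)


# Hypothetical registry: required fields and their Python types per schema version.
REGISTRY: dict[SchemaKey, dict[str, type]] = {
    SchemaKey("com.example", "product_view", "jsonschema", (1, 0, 0)): {
        "product_id": str,
        "price": float,
    },
}


def validate(event: dict) -> SchemaKey:
    """Check a self-describing event {"schema": ..., "data": ...} against REGISTRY."""
    key = parse_schema_uri(event["schema"])
    required = REGISTRY.get(key)
    if required is None:
        raise LookupError(f"unknown schema: {event['schema']}")
    for field, expected_type in required.items():
        if field not in event["data"]:
            raise ValueError(f"missing field: {field}")
        if not isinstance(event["data"][field], expected_type):
            raise TypeError(f"{field} should be {expected_type.__name__}")
    return key


# Example: a well-formed event for the hypothetical product_view schema.
event = {
    "schema": "iglu:com.example/product_view/jsonschema/1-0-0",
    "data": {"product_id": "sku-123", "price": 19.99},
}
print(validate(event))
```

Keying the registry on the full `MODEL-REVISION-ADDITION` version means a change to an event's shape would ship as a new schema version rather than silently altering existing data, which is one way a pipeline like this keeps its data versioned.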

Conclusions

A combination of Kafka and Snowplow has given us a clean data pipeline that can be viewed in real time and can drive data products.

Good

  • Schema-validated data.
  • Cleaned and versioned data.
  • Easy to stream and process.
  • Stored in BigQuery, so it is easy to access.
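As a rough illustration of what "easy to stream and process" can look like once events are clean, the sketch below counts events per name in fixed (tumbling) one-minute windows and emits each window as soon as the next one starts. In a real deployment the input would be records read from a Kafka topic; here it is a plain Python list so the example runs without a broker. The `ts` (epoch seconds) and `event_name` fields and the `tumbling_counts` helper are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def tumbling_counts(events: Iterable[dict], window_s: int = 60) -> Iterator[tuple[int, Counter]]:
    """Yield (window_start, counts_per_event_name) for each fixed-size window.

    Assumes events arrive ordered by their hypothetical "ts" field (epoch seconds);
    a window is emitted as soon as an event from a later window is seen.
    """
    current_start: int | None = None
    counts: Counter = Counter()
    for ev in events:
        start = ev["ts"] - ev["ts"] % window_s  # align timestamp to its window start
        if current_start is not None and start != current_start:
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[ev["event_name"]] += 1
    if current_start is not None:
        yield current_start, counts  # flush the final, still-open window


# Stand-in for records consumed from a Kafka topic.
events = [
    {"ts": 0, "event_name": "page_view"},
    {"ts": 10, "event_name": "page_view"},
    {"ts": 59, "event_name": "add_to_basket"},
    {"ts": 61, "event_name": "page_view"},
]
for window_start, window_counts in tumbling_counts(events):
    print(window_start, dict(window_counts))
```

Handling late or out-of-order events would need something like a grace period before closing each window, which is where a dedicated stream-processing framework becomes worthwhile.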

Andy Riley

Follow me on Twitter @andyianriley
or see andyianriley on LinkedIn.