December 15, 2019
We started looking at how to improve our website insights. The existing solution relied heavily on GA (Google Analytics) and an operational data store.
The main issue with GA was that getting real-time data was hard: the data landed in BigQuery, but only via a nightly load. The operational data store, built on Vertica and RabbitMQ, was useful but starting to show its age.
We wanted insights into consumer events, but in real time. GA is a great self-serve system, but there can be a delay before the data is visible. We also wanted to provide streams of user events to feed data products.
Snowplow is an event data platform that can run in different clouds, either managed by Snowplow or self-hosted. It lets you define schemas that describe your user events and the data each message carries, and then track those events from your site, as in the sketch below.
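To make that concrete, here is a minimal sketch of tracking a custom self-describing event with the snowplow-tracker Python library. The collector hostname, schema URI, and field names are all hypothetical, and constructor arguments can vary between library versions, so treat this as illustrative rather than definitive.

```python
from snowplow_tracker import Emitter, Tracker, SelfDescribingJson

# The emitter points at your Snowplow collector (hypothetical hostname).
emitter = Emitter("collector.example.com")

# The tracker wraps the emitter; namespace and app_id identify the source.
tracker = Tracker(emitter, namespace="web", app_id="example-site")

# A self-describing event: the Iglu schema URI names the event type,
# and the data payload must validate against that schema.
tracker.track_self_describing_event(SelfDescribingJson(
    "iglu:com.example/article_read/jsonschema/1-0-0",
    {"articleId": "abc-123", "readTimeMs": 45000},
))
```

The schema-first approach is the key point: because every event is validated against a versioned schema, downstream consumers can trust the shape of what arrives on the stream.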
Combining Kafka and Snowplow has given us a clean data pipeline whose events can be viewed in real time and used to drive data products, for example by consuming the enriched-event topic as sketched below.
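As an illustration of the real-time side, the sketch below consumes events from a Kafka topic and keeps a running count per event name. The topic name and broker address are assumptions, and it assumes the pipeline writes JSON to the topic (Snowplow's enriched format is typically TSV, so a real consumer would parse that instead).

```python
import json
from collections import Counter

from kafka import KafkaConsumer  # kafka-python

# Topic and broker address are assumptions for illustration.
consumer = KafkaConsumer(
    "enriched-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

counts = Counter()
for message in consumer:
    event = message.value
    # Tally events by name as they arrive, giving a simple real-time view.
    counts[event.get("event_name", "unknown")] += 1
    print(dict(counts))
```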
Follow me on Twitter @andyianriley or find andyianriley on LinkedIn.