Jan-David Stärk

Developing a Self-Hosted Web Analytics Service - Part I

Dec 30, 2019

What, why and the battle plan

A good challenge for every web-focused full-stack developer is to build a custom, self-hosted analytics service. Just like Google Analytics - but more compact.

In this series, I'll walk you through each of my steps. This first article is about creating a solid battle plan for this project.

What?

I want to create a simple tracking tool for websites like this one. There are plenty of tools for this task - Google Analytics, Matomo, and etracker, just to name a few.

Why?

Because... why not? Seriously - this project will cover a lot of different aspects of software development:

  • Web: Tracking Code
  • API Connections
  • Data Processing
  • Data Visualization
  • Project Management

Plus, if this project works out, I'll (hopefully) have a nice alternative to the big players like Google Analytics. And it's always nicer to develop software with a real goal instead of just developing into thin air.

Goals

We need some goals for this project. Good goals will define and guide the development and thinking process.

Good goals are S.M.A.R.T.: Specific, Measurable, Achievable, Relevant and Time-bound. So if you want to set a goal for a given task, it should be defined by these five characteristics.

But screw these - we won't use SMART goals today. Why? Because I'm not sure yet in which direction this project will go, and I don't want to feel bound to fixed goals. I want the process of developing a tracking/analytics tool to determine where the project goes next. But I've got a few guidelines. The tool should be:

  • Privacy focused
  • Simple
  • Small (esp. the client code. Bandwidth is scarce - even in 2020)

To be honest, these guidelines are pretty obvious and apply to most projects. I mean, who doesn't want their project to be simple? But let's stick to these for now. Because our development will be very agile, we can modify these guidelines later.

First things first: What is a tracking tool?

A tracking tool is a (small) program that tracks a user on the internet. Websites often use one to track user behavior. The tool is not a single piece of software, but is split into several pieces with different tasks:

  • Tool for the web client (typically a JavaScript snippet)
  • Tool for the web server (receives the data from the JavaScript snippet)
  • (External) Tool for the data processing

Why do people use tracking tools?

The reason for embedding tracking software on a website is quite simple: tracking user behavior.

  • How much time does the user spend on my site?
  • What links does the user click on?
  • ...

This data then allows us to optimize our website. And optimization is extremely important for a website. Let's look at an example. Think of an e-commerce website like Amazon (fictitious numbers):

  • 100,000,000 visitors per day
  • On average, 8 % of the visitors buy products worth 50 €
  • That makes 100,000,000 * 0.08 * 50 € = 400,000,000 € in sales per day

If Amazon uses tracking to understand its users' behavior, it may be able to increase the share of buyers from 8 % to 8.01 %:

  • 100,000,000 * 0.0801 * 50 € = 400,500,000 €

That small increase of just 0.01 percentage points results in 500,000 € of additional sales per day! That's insane. And there are a lot of levers we can pull to optimize Amazon. So it's obvious that a small optimization of a website can result in a large amount of cash. That's why tracking tools are popular and important.
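The arithmetic above can be checked with a few lines of JavaScript. This is only a sketch: the function name dailyRevenue is made up for illustration, and the conversion rate is passed in basis points (800 = 8 %, 801 = 8.01 %) so that the calculation stays in whole numbers and avoids floating-point rounding.

```javascript
// Hypothetical helper: daily revenue for a given conversion rate.
// rateBasisPoints is the conversion rate in hundredths of a percent (800 = 8 %).
function dailyRevenue(visitors, rateBasisPoints, orderValueEur) {
  const buyers = (visitors * rateBasisPoints) / 10000;
  return buyers * orderValueEur;
}

const revenueBefore = dailyRevenue(100000000, 800, 50); // 8 % conversion
const revenueAfter = dailyRevenue(100000000, 801, 50);  // 8.01 % conversion
const gain = revenueAfter - revenueBefore;              // additional sales per day

console.log(revenueBefore, revenueAfter, gain); // 400000000 400500000 500000
```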

Battle plan

I want to create a simple version of a tracking tool. It should be constructed like a classic tracking tool:

Please do not judge my drawing skills

I hope that the above image is more or less self-explanatory. The website loads a tracking script. Based on events (onload, onclick, ...), the tracking script sends a request with the relevant data to a web service (which can be either the origin host or a completely different, but reachable, host).
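To make this flow concrete, here is a minimal sketch of what such a tracking script might look like. Everything specific in it is an assumption and not a decision made for this project: the endpoint path /collect, the function names buildEvent and sendEvent, and the fields in the payload. It uses navigator.sendBeacon, a standard browser API for sending small amounts of data without blocking the page.

```javascript
// Hypothetical endpoint of the web service; not decided yet.
const ENDPOINT = '/collect';

// Build the payload for one tracked event. Kept deliberately small.
function buildEvent(type, pageUrl, timestamp) {
  return { type: type, url: pageUrl, time: timestamp };
}

// Send the event to the web service.
// Returns false when sendBeacon is not available (e.g. outside a browser).
function sendEvent(event) {
  if (typeof navigator === 'undefined' || typeof navigator.sendBeacon !== 'function') {
    return false;
  }
  return navigator.sendBeacon(ENDPOINT, JSON.stringify(event));
}

// Register the event handlers only when running in a browser.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  window.addEventListener('load', function () {
    sendEvent(buildEvent('pageview', window.location.href, Date.now()));
  });
  document.addEventListener('click', function () {
    sendEvent(buildEvent('click', window.location.href, Date.now()));
  });
}
```

Keeping buildEvent separate from sendEvent means the payload format can be changed and tested without a browser.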

The Software Stack

I haven't quite finished thinking about the software stack. But it will probably look like this:

  • Tracking script on the client-side: JavaScript
  • Communication between the client and the server: RESTful API
  • Server-side: Node.js or ASP.NET Core with C#
  • Data storage: Maybe a simple text file (first) or a database
  • Data processing: ??? / Cube.js
  • Data visualization: Cube.js (Web-Dashboard) (?)
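As a rough idea of how the server side and the "simple text file" storage could fit together, here is a sketch using Node.js and only its built-in http and fs modules. The stack is not decided yet, so treat all of it as an assumption: the /collect path, the file name events.log, the port, and the helper names toLogLine and createCollector are made up for illustration. Each event is stored as one line of JSON.

```javascript
const http = require('http');
const fs = require('fs');

// Turn a raw request body into one line for the log file.
// Returns null if the body is not a valid JSON object.
function toLogLine(rawBody) {
  try {
    const event = JSON.parse(rawBody);
    if (event === null || typeof event !== 'object' || Array.isArray(event)) {
      return null;
    }
    return JSON.stringify(event) + '\n';
  } catch (err) {
    return null;
  }
}

// Create (but do not start) a server that appends events to logPath.
function createCollector(logPath) {
  return http.createServer(function (req, res) {
    if (req.method !== 'POST' || req.url !== '/collect') {
      res.writeHead(404);
      res.end();
      return;
    }
    let body = '';
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      const line = toLogLine(body);
      if (line === null) {
        res.writeHead(400);
        res.end();
        return;
      }
      // Append the event; answer 204 on success, 500 if the write failed.
      fs.appendFile(logPath, line, function (err) {
        res.writeHead(err ? 500 : 204);
        res.end();
      });
    });
  });
}

// Usage (hypothetical file name and port):
// createCollector('events.log').listen(3000);
```

A text file with one JSON object per line is easy to inspect by hand and easy to import into a database later. A real service would also need to limit the request body size.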

As you can see, I haven't finished this one yet. But I'm working on it. And because the current state-of-the-art software project management follows very agile techniques, we don't have to decide every little detail yet.

Conclusion

That's it for now. I'll probably start working on a prototype* soon. We'll see where that leads us.

See you soon!

* Old software wisdom: Nothing lasts longer than a temporary prototype. So yeah...
