Why we chose PostgreSQL and Apache Avro to create a billing data store

Recently at Smartcar, we started to build an integrated billing infrastructure that facilitates, tracks, and automates all payment processing for our API platform. Our first step in this process was to create a reliable billing service that would allow us to store and retrieve all of our users’ API usage data.

But before getting started, we needed to make a couple of decisions: Which type of data store should we use? What types of data should we store? And how should we build the data store? This blog post will outline why we chose to build a relational data model in PostgreSQL, why we chose to build it as a microservice, and why we're using Apache Avro for data serialization.

We hope that you’ll find this post useful when creating a data store for your own product. Happy reading and happy billing, everyone! 💰💳

‍

1. Choosing a database

Which type of database should we use?

Our first step in creating a data store was to choose what type of database would best serve our use case. To make this decision easier, we created a list of criteria that we found most important:

Durability: As reliable data sits at the heart of billing, durability was our most important criterion. Errors in automatically generated bills can compromise user satisfaction and trust. Hence, our database should never accidentally erase or fail to record any transactions.
Portability: To make the best use of our billing database, we should be able to transfer its data to different services and keep it easy to migrate in the future.
Queryability: To be flexible in using our database, we want to be able to construct complex queries to retrieve data from it. That's why our database should have a structured, consistent interface, allowing us to retrieve data performantly and reliably.
Extensibility: Finally, we should be able to easily add additional components to our database. What and how we bill may change over time, so our database should make it easy to add, expand, and reuse code.

With those criteria in mind, we evaluated a few different databases. The table below shows whether each type of database is able to entirely meet each criterion.

When looking at this comparison, we quickly identified Postgres as the winner:

To ensure both durability and queryability, we decided that a relational database management system (RDBMS) like Postgres was the right choice for us. Postgres gives us transactional guarantees that makes our data durable. Data stored in relational databases is also highly queryable. Unlike noSQL alternatives or in-memory data store variants, relational databases support structured data organization and storage using schemas and tables. This allows us to easily construct our queries to calculate billing and flexibly change queries, should our business requirements change. A Postgres database is also extensible, as it can support different types of queries while keeping the same structure. Finally, as Postgres is open source, it offers a whole ecosystem of tools to export and transport data from our billing database to other services.

What types of data should we store?

In an ideal world, we would want to track and store all types of requests made to Smartcar, but this would require a lot of instrumentation work. To save time and effort, we prioritized to only track what we called “billable” requests, with the possibility of expanding that scope in the future. We defined a billable request as a successful API request made to a real vehicle. For each billable request, we store only the data that’s necessary for invoicing and auditing, meaning:

Metadata including the application’s client ID, timestamp, endpoint path, response time, and status code
No personally identifiable information

2. Building the data store

Architecture: building a microservice

Now that we decided on a type of data store and what data to store, it was time to choose an architecture and get building!

We knew that multiple services on Smartcar’s platform would need to interact with our billing database:

Both our API system and our auth service need to write events into the billing database.
Our dashboard needs to read events from the billing database in order to display usage metrics to developers.
Our payments service needs to read events from the billing database in order to calculate invoices.

As several services would depend on our billing database, we decided to build it as a separate microservice. In its code base, we wanted to apply the logic of doing one thing really well. We looked for inspiration and decided to follow the principle of decoupled persistent storage. This Medium Engineering post about monolithic and decoupled data persistent data storage helped us make this decision.

Communication: data serialization in Apache Avro

Now that we had decided to build our data store as a separate microservice, we had to find a way to easily communicate to and from it.

We considered to either build an internal JSON-based HTTP server interface, or to use a binary serialization format with a Remote Procedure Call (RPC) framework like Apache Avro. We decided to go with the latter, for the following reasons:

Avro allows for strict typing of interfaces, permitting us to easily create a service through its RPC framework with strong guarantees.
Smartcar’s tech stack is entirely built in Node.js. Thanks to its strict typing, Avro doesn’t require us to do a lot of manual validation work, as opposed to a JSON-based HTTP server. This means less work and less risk of error.
Smartcar already uses Avro in other parts of its stack.

With those three factors in mind, using Avro was the natural choice for us.

Final thoughts

Let’s recap the steps we took when creating our first billing data store at Smartcar. We decided to:

Build a relational data model in PostgreSQL
Track only billable requests and no personally identifiable information
Build a microservice with decoupled persistent data storage
Implement our interface and communication layer using Apache Avro

This is how our billing data store interacts with other services on our platform:

Flow of data between our billing data store and other services

We are now a few months into using this data store as the single source of truth for all our billing data. In this short amount of time, the way we architected the data store has already saved us a colossal amount of engineering work, and has improved user experience for the developers using our API. We have also begun building other services that depend on our billing data store, and we’re looking forward to using, testing, and improving it more as we build out our entire billing infrastructure.

If you've made it all the way to the end of this blog post, and this topic sounds like an interesting challenge to you, we'll tell you a secret: 🔐👉 We’re hiring!