An Overengineered View Counter

I thought it would be cool to start counting and displaying total unique reads for each of my blog posts. There are plenty of third-party analytics tools to choose from, but they typically provide an ephemeral overview of page hits rather than a long-running counter of total reads, so I decided to roll my own and have some fun building an overengineered view counter. Building this in a way that is accurate and secure turned out to be an interesting design problem.

There are a few main challenges with making view counts accurate and secure:

  1. Readers are unauthenticated
  2. I only want to count actual readers, not just page views or bots loading the page
  3. I want to prevent duplicate views from the same reader inflating the view count

High Level Overview

Before building this, my blog was a simple frontend-only Next.js site hosted on Vercel. In order to persist view counts, I needed to add a database. I also needed two API endpoints – one to fetch view counts

GET /api/reads

and one to increment the view count of a specific post.

POST /api/reads/{postId}

I decided to set up a simple Lambda, DynamoDB, and API Gateway backend. This might sound like overkill for my use case, but this is actually my go-to stack for building quick and dirty CRUD apps, and I've been wanting to set up a backend for my blog anyway.

CDK Infrastructure

I set up this backend stack with a few lines of CDK code.
```typescript
// Imports assume CDK v2 (aws-cdk-lib).
import * as path from 'path';
import * as cdk from 'aws-cdk-lib';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as dynamoDB from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Everything below lives inside the stack's constructor, hence `this`.

// DynamoDB table for storing read counts and temporary hash values
const blogReadsTable = new dynamoDB.Table(this, 'BlogReads', {
  partitionKey: { name: 'pk', type: dynamoDB.AttributeType.STRING },
  sortKey: { name: 'sk', type: dynamoDB.AttributeType.STRING },
  billingMode: dynamoDB.BillingMode.PAY_PER_REQUEST,
  timeToLiveAttribute: 'ttl',
});

// Lambda function to handle incrementing read counts
const incrementReadsFunction = new lambda.Function(this, 'IncrementReadsFunction', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'incrementReads.handler',
  code: lambda.Code.fromAsset(path.join(__dirname, 'lambda')),
  environment: { TABLE_NAME: blogReadsTable.tableName },
});

// API Gateway with security configurations
const api = new apigateway.RestApi(this, 'BlogApi', {
  restApiName: 'Blog Backend API',
  defaultCorsPreflightOptions: {
    allowOrigins: ['https://owenmc.dev'],
    allowMethods: ['GET', 'POST', 'OPTIONS'],
    allowHeaders: ['Content-Type', 'X-Api-Key'],
  },
  apiKeySourceType: apigateway.ApiKeySourceType.HEADER,
});

// API routes and methods
const reads = api.root.addResource('reads');
const postRead = reads.addResource('{postId}');
postRead.addMethod('POST', new apigateway.LambdaIntegration(incrementReadsFunction), {
  apiKeyRequired: true,
});

// Grant the Lambda read/write access to the table
blogReadsTable.grantReadWriteData(incrementReadsFunction);

// Outputs
new cdk.CfnOutput(this, 'ApiUrl', {
  value: api.url,
  description: 'URL of the API Gateway endpoint',
});
```
This snippet provides the infrastructure for the /api/reads/{postId} endpoint. My actual CDK stack has some additional resources for the other endpoint and the Lambda implementations, but this gives a general idea of how the stack is set up.

Only Counting Real Readers

The naive solution here is to just hit the /api/reads/{postId} endpoint on page load. The problem with this approach is that the system would increment the counter even if the user were to click on the page and exit right away. A bot scraping the page would likely trigger the counter as well. To get around this, I copied a mechanism I saw mentioned in a Tweet: once the user scrolls past a certain threshold, an on-scroll hook fires off a POST request.
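
As a rough illustration of the scroll-threshold idea, here is a minimal sketch, not the exact hook running on this blog. The names `ScrollMetrics`, `scrollProgress`, and `createReadTracker` are hypothetical, and the 0.75 threshold in the usage comment is an assumed value. The logic is kept free of DOM calls so it can be exercised outside a browser.

```typescript
// Hypothetical shape describing how far the reader has scrolled.
interface ScrollMetrics {
  scrollTop: number;      // pixels scrolled from the top of the document
  viewportHeight: number; // visible height of the window
  documentHeight: number; // full scrollable height of the page
}

// Fraction of the page the bottom of the viewport has reached, clamped to [0, 1].
function scrollProgress(m: ScrollMetrics): number {
  if (m.documentHeight <= 0) return 0;
  const reached = (m.scrollTop + m.viewportHeight) / m.documentHeight;
  return Math.min(1, Math.max(0, reached));
}

// Returns a handler that calls onRead at most once, the first time
// the reader's progress reaches the threshold.
function createReadTracker(threshold: number, onRead: () => void) {
  let fired = false;
  return (m: ScrollMetrics): void => {
    if (fired || scrollProgress(m) < threshold) return;
    fired = true;
    onRead();
  };
}

// Browser wiring (assumed, shown as comments so the sketch runs anywhere):
//
// const track = createReadTracker(0.75, () => {
//   void fetch(`/api/reads/${postId}`, { method: "POST" });
// });
// window.addEventListener("scroll", () => track({
//   scrollTop: window.scrollY,
//   viewportHeight: window.innerHeight,
//   documentHeight: document.documentElement.scrollHeight,
// }), { passive: true });
```

Keeping the "fire once" state inside the tracker means repeated scroll events after the threshold cost nothing more than a boolean check.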

Preventing Duplicates

The next thing to solve for is eliminating duplicates. The same reader should not be able to increment the read count more than once within a certain time frame. I decided I would allow "re-reads," meaning that if someone comes back to the post at a later date, I'll count that as another read. To achieve this, I added some client-side and server-side caching.

On the client side, I added some writes to local storage and session storage to tell the browser to skip the POST request if the user has already "read" the post within a certain TTL.
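
The client-side check could look something like the sketch below. It is an illustration under assumptions rather than the blog's actual code: the key format `read:<postId>`, the 24-hour `READ_TTL_MS`, and the function names are all hypothetical. The `KeyValueStore` interface is the small subset of the Web Storage API the sketch needs, so `localStorage` or `sessionStorage` can be passed in directly.

```typescript
// Minimal subset of the Web Storage API; localStorage satisfies this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const READ_TTL_MS = 24 * 60 * 60 * 1000; // assumed 24-hour window

// Hypothetical key format for the per-post marker.
function readKey(postId: string): string {
  return `read:${postId}`;
}

// True if this browser recorded a read of the post within the TTL.
function hasRecentRead(
  store: KeyValueStore,
  postId: string,
  now: number,
  ttlMs: number = READ_TTL_MS,
): boolean {
  const raw = store.getItem(readKey(postId));
  if (raw === null) return false;
  const readAt = Number(raw);
  return Number.isFinite(readAt) && now - readAt < ttlMs;
}

// Records the read timestamp so later visits inside the TTL skip the POST.
function markRead(store: KeyValueStore, postId: string, now: number): void {
  store.setItem(readKey(postId), String(now));
}

// Browser usage (assumed):
//
// if (!hasRecentRead(localStorage, postId, Date.now())) {
//   markRead(localStorage, postId, Date.now());
//   // ...fire the POST request here
// }
```

Storing a timestamp rather than a bare flag is what allows "re-reads" to count again once the TTL has passed.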

On the server side, I store a temporary hash of the client source IP using the DynamoDB TTL feature. This way, if the user were to clear their browser storage and return to the page within the TTL, the counter would not be incremented again.
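
To make the server-side idea concrete, here is a hedged sketch of the request inputs a Lambda might send to DynamoDB. Only the `pk`, `sk`, and `ttl` attribute names come from the CDK snippet above; the item layout (`POST#...`, `READER#...`, `COUNT`), the `reads` attribute, the salt, and the 24-hour window are assumptions made for illustration. The objects are shaped for the document-client `PutCommand` and `UpdateCommand` in `@aws-sdk/lib-dynamodb`, but the sketch only builds them, so it has no SDK dependency.

```typescript
import { createHash } from "node:crypto";

const DEDUP_TTL_SECONDS = 24 * 60 * 60; // assumed 24-hour window

// One-way hash so the raw IP is never stored. The salt is a hypothetical extra.
function hashIp(ip: string, salt: string): string {
  return createHash("sha256").update(`${salt}:${ip}`).digest("hex");
}

// Input for a conditional put. The write is rejected with a
// ConditionalCheckFailedException when an unexpired marker already exists.
// Expired items can linger before DynamoDB deletes them, so the condition
// also accepts markers whose ttl is already in the past.
function buildMarkerPut(tableName: string, postId: string, ipHash: string, nowSeconds: number) {
  return {
    TableName: tableName,
    Item: {
      pk: `POST#${postId}`,
      sk: `READER#${ipHash}`,
      ttl: nowSeconds + DEDUP_TTL_SECONDS, // DynamoDB TTL expects epoch seconds
    },
    ConditionExpression: "attribute_not_exists(pk) OR #ttl < :now",
    ExpressionAttributeNames: { "#ttl": "ttl" }, // ttl is a reserved word
    ExpressionAttributeValues: { ":now": nowSeconds },
  };
}

// Input for an atomic counter increment, sent only if the marker put succeeded.
function buildCounterUpdate(tableName: string, postId: string) {
  return {
    TableName: tableName,
    Key: { pk: `POST#${postId}`, sk: "COUNT" },
    UpdateExpression: "ADD #reads :one",
    ExpressionAttributeNames: { "#reads": "reads" },
    ExpressionAttributeValues: { ":one": 1 },
  };
}
```

Making the marker write conditional lets the database decide whether a read is a duplicate, so two concurrent requests from the same reader cannot both increment the counter.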

Of course, this does have some limitations. Consider, for example, multiple users on the same network using different devices that are NATed to the same public IP: these reads will be undercounted. This is a conscious tradeoff, though. I would rather undercount reads than let duplicates inflate the number.

Dealing with Unauthenticated Users

When building apps for unauthenticated users, it's difficult to build a 100% airtight API. Fundamentally, you have an unauthenticated endpoint that allows writes to your database, so there is only so much you can do to lock things down without adding a CAPTCHA or requiring a login.

At a minimum, I added an API key to both of the API Gateway endpoints, and put them behind Vercel routes. The secret key is provided by Vercel as an environment variable. This effectively locks down the AWS backend endpoints, and conceals the API key from the client.
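
A server-side forwarding step along these lines might look like the sketch below. It assumes the key is sent in the `x-api-key` header, which is where API Gateway looks when the key source is the request header. The names `forwardRead`, `BackendConfig`, and `SendFn`, along with the environment variable names in the usage comment, are hypothetical.

```typescript
// The slice of fetch this sketch needs; the global fetch is compatible with it.
type SendFn = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number }>;

interface BackendConfig {
  apiUrl: string; // base URL of the API Gateway stage
  apiKey: string; // secret that only exists on the server
}

// Forwards a read to the backend, attaching the API key on the server side
// so it never appears in code shipped to the browser.
async function forwardRead(postId: string, config: BackendConfig, send: SendFn): Promise<number> {
  const base = config.apiUrl.replace(/\/+$/, ""); // drop any trailing slashes
  const url = `${base}/reads/${encodeURIComponent(postId)}`;
  const res = await send(url, {
    method: "POST",
    headers: { "x-api-key": config.apiKey },
  });
  return res.status;
}

// Usage inside a server route (assumed variable names):
//
// const status = await forwardRead(
//   postId,
//   { apiUrl: process.env.BLOG_API_URL!, apiKey: process.env.BLOG_API_KEY! },
//   fetch,
// );
```

Passing the sender in as a parameter is only a convenience for testing; a real route would most likely call `fetch` directly.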

For requests to the Vercel routes, however, there is only so much I can do because these routes fundamentally need to be exposed to the unauthenticated reader in the browser. I added some additional validation against the request headers, but at the end of the day, a determined attacker could figure out the right headers to spoof to get around this validation. After the validation, the temporary IP hash is the next barrier, but this could also be sidestepped by using different public IPs.
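
The post does not say which headers are validated, so the following is purely a hypothetical example of what such a check could look like: it requires the `Origin` header to match the blog's origin (the same value used for CORS in the CDK snippet) and rejects empty or obviously bot-like `User-Agent` values. As noted above, every one of these headers can be spoofed, so this only filters out casual or accidental traffic.

```typescript
const ALLOWED_ORIGIN = "https://owenmc.dev"; // same origin as the CORS config

type HeaderMap = Record<string, string | undefined>;

// Looks up a header by name, ignoring case.
function getHeader(headers: HeaderMap, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

// Hypothetical checks: the request must claim to come from the blog's origin
// and carry a User-Agent that does not look like a crawler.
function looksLikeReaderRequest(headers: HeaderMap): boolean {
  if (getHeader(headers, "origin") !== ALLOWED_ORIGIN) return false;
  const userAgent = getHeader(headers, "user-agent") ?? "";
  return userAgent.length > 0 && !/bot|crawler|spider/i.test(userAgent);
}
```

A route could return a 4xx response when `looksLikeReaderRequest` is false and only then fall through to the IP-hash check on the backend.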

Given all of this, someone would have to be pretty motivated to maliciously increment my blog view counter. If one day I notice 100,000 views on my hello-world post, I'll know that this post reached the right audience.