The Story of CalypsDoH

So context:

I had developed a service that watched the logs from a NextDNS.io account and alerted an admin to any suspicious requests made by any of the named devices tied to the monitored NextDNS.io profile. It worked great, but ultimately my service depended on the will of a third party. That meant I could not charge a service fee even if I wanted to, and any breaking change the third party made broke my service immediately, and it stayed broken until I had time to troubleshoot NextDNS.io’s API changes.

The Goal:

Build my own DNS server that can handle requests quickly and efficiently block requests matching publicly available, constantly updated lists for ads, malware, trackers, spam, and several other categories typically harmful to home and work networks.
I wanted it not to rely on any one third-party service. That meant it should work with various block list providers and upstream DNS providers, and use a log format that I maintain myself.
I also wanted to use server-side encryption for the logs as an extra safety measure.
Requests need to be quick, at least competitive with similar services like CleanBrowsing or Cloudflare’s malware and family filters.
It needed to work with many different types of devices to some capacity.

The Path:

The service I wanted to update was written in PHP and used SES for notifications. It was really just a log walker that would walk each log entry to decide whether it should notify the admin.

I decided that if I could keep the dependencies contained within PHP, I would not need to worry about finding a web host, because the code would be highly portable.

I had been noticing that more and more devices have added support for a DNS protocol called DNS over HTTPS, or DoH for short. It works by sending every DNS packet that would traditionally have been sent as a plaintext UDP packet to an HTTPS endpoint instead. Thankfully, the approach has already been standardized, and major operating systems have started adding native support for sending DNS requests over DoH.

What is in a DNS Packet:

a request ID
whether the message is a query or a response
a status code
an array of questions, each a requested domain name and optionally a particular record type
an array of answers
an array of authorities
and an array of additionals
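
To make those fields concrete, here is a minimal Python sketch (not the PHP code the project uses) that reads the 12-byte header and the first question out of a raw DNS message. The function name parse_query and the sample packet are made up for illustration, and the sketch assumes the question name is not compressed, which is the usual case for queries.

```python
import struct

def parse_query(packet: bytes) -> dict:
    """Parse the DNS header and the first question (hypothetical helper)."""
    # Header: ID, flags, then the four section counts, all 16-bit big-endian.
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", packet[:12]
    )
    # A name is a run of length-prefixed labels ending with a zero byte.
    # Assumption: no compression pointers in the question.
    offset = 12
    labels = []
    while True:
        length = packet[offset]
        offset += 1
        if length == 0:
            break
        labels.append(packet[offset:offset + length].decode("ascii"))
        offset += length
    qtype, qclass = struct.unpack("!2H", packet[offset:offset + 4])
    return {
        "id": msg_id,
        "is_response": bool(flags & 0x8000),  # QR bit
        "rcode": flags & 0x000F,              # status code
        "questions": qdcount,
        "answers": ancount,
        "authorities": nscount,
        "additionals": arcount,
        "name": ".".join(labels),
        "qtype": qtype,
        "qclass": qclass,
    }

# Hand-built sample: ID 0x1234, recursion desired, one question for
# example.com, type A (1), class IN (1).
SAMPLE_QUERY = bytes.fromhex(
    "123401000001000000000000"      # header
    "076578616d706c6503636f6d00"    # 7 "example" 3 "com" 0
    "00010001"                      # QTYPE, QCLASS
)

if __name__ == "__main__":
    print(parse_query(SAMPLE_QUERY))
```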

The Path (Continued):

After reading Google’s Public DNS over HTTPS documentation and Cloudflare’s 1.1.1.1 DNS over HTTPS documentation, I quickly learned that DoH uses two main HTTP methods: a GET, where the DNS request payload is base64url encoded and passed in the ?dns= query parameter, and a more commonly used POST, where the body of the request is the raw DNS payload.

DNS payloads for these requests use something called wire format, and there is a whole RFC describing the shape of the messages. After searching StackOverflow for a while, I found a helpful post saying there should be a from-wire and a to-wire method. I found a Python library with just that, so the first version of the project used an exec call to run Python with the DNS payload passed as a command-line argument. This worked as a start, but I quickly realized that the more exec calls you have, the slower every request to the DoH server is. More on tuning later.
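
As a rough illustration of the two request styles, here is a Python sketch using only the standard library. The helper names are made up, and the endpoint in the usage comment is a placeholder, not a real server. The standardized GET form uses the URL-safe base64 alphabet with the trailing = padding removed, and both forms use the application/dns-message media type.

```python
import base64
import urllib.request

DNS_MESSAGE = "application/dns-message"

def doh_get_url(endpoint: str, query: bytes) -> str:
    """Build the GET form: wire-format query, base64url, no padding."""
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"

def doh_post_request(endpoint: str, query: bytes) -> urllib.request.Request:
    """Build the POST form: the body is the raw wire-format query."""
    return urllib.request.Request(
        endpoint,
        data=query,
        headers={"Content-Type": DNS_MESSAGE, "Accept": DNS_MESSAGE},
        method="POST",
    )

# Usage (placeholder endpoint; sending is left to the caller):
#   request = doh_post_request("https://doh.example/dns-query", wire_query)
#   with urllib.request.urlopen(request) as response:
#       wire_answer = response.read()
```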

I later found a PHP repo with a whole Composer library capable of doing all sorts of neat things with DNS. I saw the license was MIT, so I quickly pulled out just the message parsing portion and brought it into my project.

This library was able to parse almost every request and get me the requested domain name, which I now needed to match against public block lists.

I found another repo named blocklist project. They had fresh lists that appeared to be updated regularly, in a format I could very easily use: one domain per line.

I quickly wrote some PHP code to pull the lists from their endpoints and cache them locally for a configurable duration; one day should work.
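
The caching idea can be sketched in a few lines of Python. This is not the project’s PHP code; the function names, the one-day default, and the temp-file swap are my own illustrative choices. The cached copy is reused while its modification time is newer than the time-to-live and is downloaded again otherwise.

```python
import os
import time
import urllib.request

CACHE_TTL_SECONDS = 24 * 60 * 60  # one day, as suggested in the text

def fetch_url(url: str) -> bytes:
    """Download a list (default fetcher; swap in your own if needed)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()

def cached_list(url, cache_path, ttl=CACHE_TTL_SECONDS, fetch=fetch_url):
    """Return cache_path, refreshing the local copy first when it is stale."""
    fresh = (
        os.path.exists(cache_path)
        and time.time() - os.path.getmtime(cache_path) < ttl
    )
    if not fresh:
        data = fetch(url)
        # Write to a temp file and swap it in, so a request that reads the
        # list during a refresh never sees a half-written file.
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, cache_path)
    return cache_path
```

Passing the fetcher in as a parameter keeps the sketch easy to try without touching the network.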

I then needed a way to quickly find matches in the lists. With a list for each category, I realized that a single exec call to grep is likely more efficient than reading each file line by line in PHP.

Still, to maintain as few dependencies as possible, I opted to use exec only if it is available on the web host and to fall back to a custom PHP grep line reader I wrote, named GrepLR.

So at this point I was able to parse the requested domain name out of the DNS payload and check it against a block list. Now I needed to actually create a response payload so that the requesting device would accept a blocked response.

My first version just called die(); but I knew this was a hack, so I wanted to create a valid response.

I went back to the PHP repo I had used to parse the message, because surely they had a way to serialize a message too. Sure enough, a couple of copies later I was creating responses.
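
Here is a hedged Python sketch of that "grep if available, otherwise scan" idea. It is not GrepLR itself, and the function names are invented for illustration. It assumes each list holds one bare domain per line, and it passes the domain as a separate argument (no shell), so a hostile domain name cannot inject a command.

```python
import shutil
import subprocess

def is_blocked_grep(domain, list_paths):
    """One grep process across all lists; exit status 0 means a match."""
    result = subprocess.run(
        # -F fixed string, -x whole-line match, -q quiet, -- ends options
        ["grep", "-F", "-x", "-q", "--", domain, *list_paths],
        stdin=subprocess.DEVNULL,
    )
    return result.returncode == 0

def is_blocked_scan(domain, list_paths):
    """Pure-Python fallback: read each list line by line."""
    for path in list_paths:
        with open(path, "r", encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                if line.strip() == domain:
                    return True
    return False

def is_blocked(domain, list_paths):
    """Prefer grep when the host provides it, else fall back to scanning."""
    if not list_paths:
        return False
    if shutil.which("grep"):
        return is_blocked_grep(domain, list_paths)
    return is_blocked_scan(domain, list_paths)
```

The whole-line match matters: without it, a list entry such as ads.example.com would also block a lookup for example.com.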

At this point all that was left was to create some sort of logging. My first attempt stored everything in one unencrypted log file, which had the odd behavior of portions of the log getting wiped out by writes from other requests. I realized I needed to use file locking, or flock calls.
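
For readers who have not used advisory locks, here is a small Python sketch of a locked append. It is Unix-only, uses the standard fcntl module, and the function name is hypothetical; PHP exposes the same idea through its flock() function. Each writer takes an exclusive lock before writing, so concurrent requests queue up instead of clobbering each other.

```python
import fcntl

def append_line(path: str, line: str) -> None:
    """Append one line to a shared log while holding an exclusive lock."""
    with open(path, "a", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            handle.write(line.rstrip("\n") + "\n")
            handle.flush()  # push the data out before releasing the lock
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

The lock is advisory, so it only helps if every writer to the file goes through the same function.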

I also realized that there should be separate logs for each device and type of log.

As the logs grew in complexity, so, unbeknownst to me, did the query times for devices using the DNS server. At worst, logging was adding up to 15 seconds to response times. On a site with 20 separate resource endpoints, this slowed normal page loads to a crawl, which simply wouldn’t do.

Time to Optimize:

The first thing I did was set up timers throughout the code base so I could gauge which piece was taking so incredibly long. There were some pieces I was able to optimize, like using exec for grep when available, but mostly I knew I needed to simplify logging immensely. So I rewrote logging to use one log file per account, rather than one per device and type of log. I also changed the logger to encrypt the logs line by line, which allows me to append new entries anytime I need to and read logs line by line from the other end of the file. Each log line is an encrypted JSON payload, which lets log walkers (my main intended use case, from the service I already had) handle the influx in a simple manner. This also means that I don’t have to read all of the logs into memory for each request, and an entry is appended only if the request needs to be logged.
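
A rough Python sketch of that log layout follows. The post does not say which cipher the project uses, so the encrypt and decrypt arguments are placeholders the caller must supply, and all function names are invented. Each entry becomes one base64 line, so the file can be appended to freely and walked from the newest entry backwards in small chunks.

```python
import base64
import json
import os

def seal_entry(entry: dict, encrypt) -> str:
    """Serialize one entry to JSON, encrypt it, return a single text line.

    `encrypt` is a caller-supplied bytes -> bytes function (assumption:
    the real cipher is not specified in the text).
    """
    ciphertext = encrypt(json.dumps(entry).encode("utf-8"))
    return base64.b64encode(ciphertext).decode("ascii")

def open_entry(line: str, decrypt) -> dict:
    """Reverse of seal_entry, using the matching bytes -> bytes decrypt."""
    return json.loads(decrypt(base64.b64decode(line)).decode("utf-8"))

def read_lines_reversed(path: str, chunk_size: int = 4096):
    """Yield lines from last to first without loading the whole file."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        tail = b""  # start of a line whose beginning is in an earlier chunk
        while position > 0:
            step = min(chunk_size, position)
            position -= step
            handle.seek(position)
            chunk = handle.read(step) + tail
            lines = chunk.split(b"\n")
            tail = lines.pop(0)  # possibly partial; finish it next round
            for line in reversed(lines):
                if line:
                    yield line.decode("ascii")
        if tail:
            yield tail.decode("ascii")
```

A log walker can then loop over read_lines_reversed(path), call open_entry on each line, and stop as soon as it reaches entries it has already seen.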

After the optimizations, I found a DoH performance testing CLI tool written in NodeJS 11 called Dohzer, and it worked great at letting me compare my project to other public DoH providers.

On average, other filtering DoH providers give responses within 100-200ms. After I made my changes, I saw response times as low as 40ms, with an average under 100ms.

This was going to work, I thought, so I shared it on Twitter.

A colleague asked if I could open source it so I did:

https://github.com/blaineam/CalypsDoH

I then noticed that Safari-based DNS requests seemed to hang for a long time.

I tried various alterations to the project but was unable to solve it so I shelved the issue for a week.

I was then trying to figure out how to set up DoH for Windows 11, since the native implementation was not working and YogaDNS kept issuing timeouts even with sub-100ms responses.

I found an open source repo from the AdGuard team called DNSProxy and gave it a spin. Sure enough, it worked great if I kept a cmd window open on Windows, but I wanted to run it in the background.

I found a rather recent build of the code in a Docker Hub container and gave it a spin, since I have Docker for Windows running. After a bit of fiddling with run parameters, I got it working with Windows pointed to 127.0.0.1, and everything worked well on my Windows install.

I got to thinking that maybe I could mimic the blocking procedure my firewall uses instead of just creating my own implementation. So I set up a custom block on my firewall, and sure enough, Safari had no issues with the site being blocked and stopped trying to request it immediately.

I was then able to use DNSProxy on my Mac to troubleshoot which requests Safari makes behind the scenes that result in a halted page load. I saw the requests I was expecting, but they never resolved. I then tweaked some settings in my project, and the requests had a slightly different error.

I was able to understand 2 issues:

Using the NXDOMAIN DNS response status code is perfectly fine, and I don’t need to answer the request with 0.0.0.0 or ::
I needed to respond to the DNS request with the same ID as the request.
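
Both points can be shown in one short Python sketch (again, not the project’s PHP serializer, and the function name is made up). It builds a blocked reply by copying the ID and the question from the query and setting the response code to 3, which is NXDOMAIN. It assumes the query carries exactly one uncompressed question, and it drops any additional records the client sent.

```python
import struct

def nxdomain_response(query: bytes) -> bytes:
    """Build an NXDOMAIN reply that echoes the query's ID and question."""
    msg_id, flags = struct.unpack("!2H", query[:4])
    # Walk the question name (length-prefixed labels) to find where the
    # question ends. Assumption: one question, no compression pointers.
    offset = 12
    while query[offset] != 0:
        offset += 1 + query[offset]
    offset += 1 + 4  # terminating zero byte, then QTYPE and QCLASS
    response_flags = (
        0x8000              # QR: this message is a response
        | (flags & 0x7800)  # copy the opcode from the query
        | (flags & 0x0100)  # copy the "recursion desired" bit
        | 0x0080            # recursion available
        | 0x0003            # RCODE 3: NXDOMAIN
    )
    # Same ID as the query, one question, no answer/authority/additional.
    header = struct.pack("!6H", msg_id, response_flags, 1, 0, 0, 0)
    return header + query[12:offset]
```

Because the answer section stays empty, there is no need to invent a 0.0.0.0 or :: record; the status code alone tells the client the name does not exist.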

After I adjusted my project to respond with those payloads, Safari started loading everything almost instantly, and it was working infinitely better.

I was so glad I was able to get this fully operational. Not only is it a fast content blocker, it also has very few dependencies.

I decided to name the project CalypsDoH for 2 reasons:

Calypso is such a fun name to say, and it comes from a figure in Greek mythology who traps the hero for years, so I figured it made sense for the intended use case of blocking various content on your devices.
DoH sounds very funny if you just pronounce it “dough”.

Thanks for reading about my journey through DNS filtering.

– Blaine
