Route 53 Domain Outage Recovery (AWS)

Amazon Route 53 • CloudFront • DNS Troubleshooting • Beginner Incident Response

Summary

Project Type: Real-world AWS troubleshooting incident

Services Used: Amazon Route 53, Amazon CloudFront, ACM

Objective: Restore website reachability after missed AWS domain verification and investigate DNS failure

Project Overview

Earlier today (26th March), one of my previously working domains suddenly stopped resolving. This turned out to be due to my own oversight, as I had missed an AWS verification email when I first set the site up. I later received a notification that the domain had been suspended and needed email verification to be restored. After completing the verification, I expected the site to come back online shortly, but it remained unreachable. Not ideal timing, especially as I’m currently job hunting and relying on the site as part of my CV.

At first, I thought this might just be a delay caused by DNS propagation. After testing the domain more carefully, it became clear that this was not a normal delay. The domain was returning NXDOMAIN, which means DNS was reporting that the domain name did not exist at all, rather than a record simply being slow to update.

The goal of this troubleshooting exercise was to identify whether the outage was temporary or whether there was still a configuration issue inside AWS.

Architecture

The website uses a Route 53 hosted zone for DNS and points the root domain to a CloudFront distribution. ACM validation CNAME records were also present for certificate-related checks. I plan on creating a separate post in the future outlining how I built this site and the services I used.

User Browser
      ↓
DNS Lookup for jackdanielpainter.com
      ↓
Amazon Route 53 Hosted Zone
      ↓
Apex A Record / Alias
      ↓
Amazon CloudFront Distribution
      ↓
Website Content

Incident Symptoms

The main problem was that the website could not be reached in a browser, even after domain verification had been completed.

A DNS lookup from the command line returned:

nslookup jackdanielpainter.com

*** can't find jackdanielpainter.com: Non-existent domain

This was the key clue. A result like this does not normally point to a web server problem. It points to a DNS-level problem, which means the domain itself is not resolving correctly.
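
To make that distinction concrete, here is a minimal Python sketch that classifies the text a lookup tool prints. The function name `classify_lookup` and the marker strings are my own assumptions, based on the nslookup output shown above and on wording I have commonly seen from nslookup and dig; other versions may phrase things differently.

```python
def classify_lookup(output: str) -> str:
    """Roughly classify nslookup/dig output text (illustrative, not exhaustive)."""
    text = output.lower()
    # Assumed markers: Windows nslookup prints "Non-existent domain",
    # while Unix nslookup and dig print "NXDOMAIN".
    if "non-existent domain" in text or "nxdomain" in text:
        return "NXDOMAIN"  # the name does not exist in DNS: a DNS-level problem
    if "servfail" in text or "server failed" in text:
        return "SERVFAIL"  # a resolver or nameserver could not answer
    if "timed out" in text:
        return "TIMEOUT"  # no DNS server responded in time
    return "ANSWERED_OR_UNKNOWN"


sample = "*** can't find jackdanielpainter.com: Non-existent domain"
print(classify_lookup(sample))  # NXDOMAIN
```

An NXDOMAIN result means the request never got as far as CloudFront, so there is no point debugging the web layer until DNS answers.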


Initial Checks

I checked the Route 53 registered domain details first. These showed that the domain itself was active and using AWS nameservers.

This suggested that the suspension itself was no longer the cause. It also suggested that the problem sat somewhere between the Route 53 DNS records and the CloudFront target.

I then checked the hosted zone and found these records:

jackdanielpainter.com      A       → d1e90n9pxjm7ij.cloudfront.net
jackdanielpainter.com      NS      → ns-649.awsdns-17.net
                                      ns-172.awsdns-21.com
                                      ns-1352.awsdns-41.org
                                      ns-1618.awsdns-10.co.uk
jackdanielpainter.com      SOA
ACM validation CNAME records

At first glance, this looked correct. However, the website was still down.

Deeper DNS Testing

The next useful test was to look up the domain’s NS records to see which nameservers it was delegated to:

nslookup -type=ns jackdanielpainter.com

This confirmed that the domain was delegated to the expected AWS nameservers. That was an important finding because it ruled out one of the most common Route 53 problems: mismatched nameservers.
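
The comparison itself is simple enough to sketch in Python. The helper names `normalize` and `delegation_matches` are hypothetical, and the `lookup_ns` list is typed in by hand for illustration rather than copied from real tool output. The hosted zone values are the four NS entries listed earlier.

```python
def normalize(ns: str) -> str:
    """Lower-case a nameserver name and drop any trailing dot."""
    return ns.strip().lower().rstrip(".")


def delegation_matches(lookup_ns, hosted_zone_ns) -> bool:
    """True when both lists name exactly the same set of nameservers."""
    return {normalize(n) for n in lookup_ns} == {normalize(n) for n in hosted_zone_ns}


# NS values from the hosted zone (trailing dots as Route 53 displays them).
hosted_zone_ns = [
    "ns-649.awsdns-17.net.",
    "ns-172.awsdns-21.com.",
    "ns-1352.awsdns-41.org.",
    "ns-1618.awsdns-10.co.uk.",
]

# Hypothetical NS lookup result, entered by hand for this example.
lookup_ns = [
    "NS-1618.awsdns-10.co.uk",
    "ns-1352.awsdns-41.org",
    "ns-172.awsdns-21.com",
    "ns-649.awsdns-17.net",
]

print(delegation_matches(lookup_ns, hosted_zone_ns))  # True
```

Comparing sets rather than lists means the order in which nameservers are returned does not matter, which is useful because DNS tools often rotate the order.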

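So at that stage, the domain registration was active, the nameserver delegation was correct, and the hosted zone records looked right, yet lookups were still returning NXDOMAIN.

That meant the issue was not simply “wait longer”. There was still a DNS record problem that needed attention.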

Root Cause

The issue turned out to be with the apex A record for the root domain. The website depended on Route 53 returning the CloudFront target correctly for jackdanielpainter.com.

Even though the record looked present in the hosted zone, the domain was still failing resolution. That meant the website was not going to recover by itself.

From a beginner AWS perspective, this was a useful lesson: seeing a record in the console does not always mean the DNS response is healthy. The real test is what external DNS tools return.
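
As a rough illustration of checking from the outside, the sketch below builds dig commands that ask specific servers for the apex A record. The helper name `build_dig_command` is hypothetical, and the choice of servers is an assumption: one of the hosted zone's own nameservers plus two well-known public resolvers. It only prints the commands, so it is safe to run anywhere.

```python
def build_dig_command(domain: str, server: str, record_type: str = "A") -> list:
    """Build a dig command that asks one specific server for one record type."""
    return ["dig", f"@{server}", domain, record_type, "+short"]


# Assumed servers to compare: an authoritative Route 53 nameserver for the
# zone, then two public resolvers (Google and Cloudflare).
servers = ["ns-649.awsdns-17.net", "8.8.8.8", "1.1.1.1"]

for server in servers:
    print(" ".join(build_dig_command("jackdanielpainter.com", server)))
```

If the authoritative nameserver returns an answer but the public resolvers do not, caching is the likely explanation. If the authoritative nameserver itself returns nothing, the problem is in the hosted zone.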

Recovery Actions

To recover the site, I focused on the Route 53 apex record for the root domain and attempted to recreate it correctly.

The record needed to represent the root domain and point traffic to CloudFront:

Record name: jackdanielpainter.com
Record type: A
Target: CloudFront distribution

While working on this, I also hit a Route 53 console issue where deleting the A record and recreating it immediately produced this error:

Route 53 Record Creation Error
Error occurred: A record with the specified name already exists. (Tried to create resource record set [name='jackdanielpainter.com.', type='A'] but it already exists)

Thankfully, the deletion had not fully gone through yet, even though the record looked deleted in the console. No harm done, and no further action was required.
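
I made the change in the console, but the same record can also be expressed as a Route 53 change batch. The sketch below is one possible way to build it in Python and is not what I ran during the incident. The function name `apex_alias_change_batch` is hypothetical. The `UPSERT` action is my own choice here because it creates the record if it is missing and overwrites it if it exists, which avoids the "already exists" error above. `Z2FDTNDATAQYW2` is the fixed hosted zone ID Route 53 uses for CloudFront alias targets.

```python
import json

# Fixed hosted zone ID used for all CloudFront alias targets in Route 53.
CLOUDFRONT_ALIAS_ZONE_ID = "Z2FDTNDATAQYW2"


def apex_alias_change_batch(domain: str, cloudfront_domain: str) -> dict:
    """Build a change batch that points the apex A record at CloudFront."""
    return {
        "Comment": "Recreate apex alias record",
        "Changes": [
            {
                # UPSERT: create the record, or replace it if it already exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": domain,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": CLOUDFRONT_ALIAS_ZONE_ID,
                        "DNSName": cloudfront_domain,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


batch = apex_alias_change_batch(
    "jackdanielpainter.com.", "d1e90n9pxjm7ij.cloudfront.net."
)
print(json.dumps(batch, indent=2))
```

The printed JSON could be saved to a file and passed to `aws route53 change-resource-record-sets` with the `--change-batch` option, along with the ID of your own hosted zone.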

Troubleshooting Process

This incident was a good example of working through AWS troubleshooting step by step rather than guessing.

The main technical lesson was understanding the difference between a normal propagation delay, which clears on its own, and an NXDOMAIN response, which does not.

Troubleshooting Notes for Beginners

As someone still learning AWS, this incident helped me understand a few important things more clearly.

It was a valuable reminder that cloud troubleshooting works best when you isolate the problem layer by layer:

Domain Registration
        ↓
Nameserver Delegation
        ↓
Hosted Zone Records
        ↓
CloudFront Target
        ↓
Website Response

If one layer fails, the whole site can appear offline.
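
The layered approach can be sketched as a small Python helper that walks the layers from top to bottom and reports the first one that fails. The name `first_failing_layer` is hypothetical, and the results are entered by hand to roughly mirror this incident. Layers below the failure are marked as not passing simply because they could not be tested yet.

```python
LAYERS = [
    "Domain Registration",
    "Nameserver Delegation",
    "Hosted Zone Records",
    "CloudFront Target",
    "Website Response",
]


def first_failing_layer(results: dict):
    """Return the first layer, top to bottom, whose check did not pass."""
    for layer in LAYERS:
        # A layer with no recorded result is treated as not passing.
        if not results.get(layer, False):
            return layer
    return None  # every layer passed


# Hand-entered results for illustration (an assumption, not tool output).
incident = {
    "Domain Registration": True,    # domain active after verification
    "Nameserver Delegation": True,  # NS lookup matched the hosted zone
    "Hosted Zone Records": False,   # apex record was not resolving
}

print(first_failing_layer(incident))  # Hosted Zone Records
```

Checking in this order stops you from debugging CloudFront or the website when the real fault is higher up the chain.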

Skills Demonstrated

Key Takeaways

This incident gave me hands-on experience troubleshooting Route 53. It showed me how domain verification, hosted zone records, nameserver delegation, and CloudFront fit together, and why a website will not recover on its own if DNS is still returning NXDOMAIN.