DNS Outage – MediaBlaze Hosts

This is an update to the degraded service event experienced by MediaBlaze Hosts.

Following the DNS outage caused by a DDoS attack on 22/09/20 – 25/09/20, please find below a full report into the events that occurred against our DNS servers, which we had been using a 3rd party supplier to provision. These DNS servers have been in use since 2014 and served their purpose well, without fail, until this very large and sophisticated attack took place.

We previously communicated via Twitter and posted updates on our network status page. There has been no recurrence of the event since 01:00 on Friday 25th September, and we have now closed this case internally. This update also serves as a fault report.

Whilst we want to be as open and transparent as possible, we also want this report not to be “too techie”, which is difficult given the nature of the attack. If you would like anything clarified further, please contact our Support Team.

If you would like to skip the timeline and explanation as to what happened, please scroll to the bottom of this mail to the heading of “The Future”.

THE ATTACK

As previously communicated, the degraded service was due to a Distributed Denial of Service (DDoS) attack. We deal with DDoS attacks on a routine basis, but this attack was exceptionally large, at circa 80-100 Gbps, and was sustained against multiple targets for multiple hours. In this instance, the attack specifically targeted our DNS name servers, which host the customer zones for the majority of our shared cloud services.

People who used external DNS services or reverse proxy services such as CloudFlare, or who were otherwise not using our primary and secondary DNS servers (ns1.mediablazehosts.coop and ns2.mediablazehosts.coop), were largely unaffected, and our hosting servers were able to serve traffic freely as requests arrived.

There were three distinct phases of the attack:
1. Circa 15:30 GMT 22/09/20 to 00:15 23/09/20, against 149.255.60.1 and 149.255.60.9 (aka ns1.mediablazehosts.coop and ns2.mediablazehosts.coop).
2. Circa 12:00 GMT 23/09/20 to 00:00 24/09/20, against the same name servers.
3. Circa 12:20 GMT 24/09/20 to 01:00 25/09/20, repeat event as above.
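
To put the three phases into perspective, their lengths can be worked out from the timestamps listed above. The following is only an illustrative sketch: the names `PHASES` and `phase_durations` are our own, and the times are the approximate ("circa") figures quoted in this report.

```python
from datetime import datetime, timedelta

# Approximate start and end of each phase, as listed above (GMT).
PHASES = [
    ("22/09/20 15:30", "23/09/20 00:15"),
    ("23/09/20 12:00", "24/09/20 00:00"),
    ("24/09/20 12:20", "25/09/20 01:00"),
]

FMT = "%d/%m/%y %H:%M"


def phase_durations(phases):
    """Return the length of each phase as a timedelta."""
    return [
        datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        for start, end in phases
    ]


durations = phase_durations(PHASES)
total = sum(durations, timedelta())

for number, duration in enumerate(durations, start=1):
    print(f"Phase {number}: {duration}")
print(f"Total: {total}")
```

On these figures the phases lasted roughly 8 hours 45 minutes, 12 hours and 12 hours 40 minutes, or about 33 hours 25 minutes of attack traffic in total.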

THE IMPACT

The most immediate impact came from the attacks of 22/09/20 and 23/09/20:
1. We quickly identified the problem and began transferring our DNS glue records from our primary network onto a secondary service, in line with our supplier's recommendations, ensuring the service had traffic-scrubbing DDoS mitigation. This was complete by the early evening, and the Internet “glue” records, which define the IP addresses of the name servers, were changed at the registrars. Due to the way these records work, propagation of this change took some time, and the issue was not expected to fully resolve across the Internet before the next morning.

2. The attack stopped later in the evening, and DNS services appeared to be resolving correctly by 01:00 23/09/20. Because the DDoS attack had ceased, and because of the time taken for the glue records to propagate around the Internet, neither we nor our supplier were able to validate that the new infrastructure for ns1.mediablazehosts.coop and ns2.mediablazehosts.coop could serve requests with an attack in progress, but we had no reason to doubt that it would.

3. The attack recommenced on 23/09/20 circa 13:00 GMT, and it became apparent that the new provider seemingly could not scrub the volume of DDoS traffic involved while still passing clean DNS requests through to the servers. In fact, during the attack of 23/09/20, DNS requests against ns1.mediablazehosts.coop and ns2.mediablazehosts.coop were taking between 5 and 15 seconds to resolve. Effectively, this meant that the requests timed out and the name servers were functionally unavailable against this volume of traffic. We then selected another supplier, moved the secondary name servers to it, and again changed the glue records. This time, the chosen supplier could cope with the volume of the attack and maintain clean DNS requests to the secondary name servers.

4. Whilst propagation of the glue record change around the Internet again took several hours, we believe that the vast majority of DNS requests were being answered and customer services had largely returned to normal by the evening of 23/09/20. This is based on observed traffic, which had returned to almost nominal levels. However, customers who were using custom name servers, or old, previously deprecated DNS server records that had not been updated when previously instructed, were not covered by this change and may have continued to see problems.
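
For readers who would like to see why a 5 to 15 second response is effectively an outage, the sketch below times a single lookup and compares it with a timeout. It is a minimal illustration only: the 5 second threshold is an assumption (a common default, but resolvers vary), the function names are our own, and the lookup goes through your local resolver and its cache rather than querying our name servers directly.

```python
import socket
import time

# Assumption: 5 seconds is a common default resolver timeout; adjust to suit.
TIMEOUT_SECONDS = 5.0


def time_lookup(hostname, resolve=socket.gethostbyname):
    """Return (elapsed_seconds, succeeded) for one lookup of hostname.

    `resolve` can be swapped for another callable, e.g. for testing.
    """
    started = time.perf_counter()
    try:
        resolve(hostname)
        succeeded = True
    except OSError:  # socket.gaierror is a subclass of OSError
        succeeded = False
    return time.perf_counter() - started, succeeded


def is_effectively_down(elapsed_seconds, succeeded, timeout=TIMEOUT_SECONDS):
    """A lookup that fails, or takes at least the timeout, counts as down."""
    return (not succeeded) or elapsed_seconds >= timeout
```

For example, `is_effectively_down(*time_lookup("example.com"))` times one lookup and reports whether a visitor would have given up waiting. Any answer arriving after 5 to 15 seconds would be classed as down under this assumption.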

Further to this, the websites of MediaBlaze Hosts were attacked on the 24/09/20:
1. Whilst these websites individually exist on different infrastructure, the attackers again attacked multiple IP endpoints simultaneously. Rather than moving the websites off network, as had been done with the secondary name servers, our supplier elected to move the websites behind CloudFlare and leave the web servers on our network.

2. Again, this required DNS changes to propagate around the Internet and took some hours for service to be restored. Whilst the client portals were not available for direct login, support was still functioning using e-mail.

3. We restored primary customer access to MediaBlaze Hosts by the early evening of 24/09/20.
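
The delay described above comes from caching: a resolver that fetched a record shortly before a change may keep serving the old answer until that record's time-to-live (TTL) expires. As a rough illustration, the sketch below works out the latest time an old answer could still be served. The change time and TTL here are hypothetical values chosen for the example, not figures from this incident, and the function name is our own.

```python
from datetime import datetime, timedelta


def latest_stale_answer(changed_at, ttl_seconds):
    """Worst case: a resolver cached the old record just before the change."""
    return changed_at + timedelta(seconds=ttl_seconds)


# Hypothetical values, for illustration only.
changed_at = datetime(2020, 9, 24, 13, 0)
ttl_seconds = 4 * 60 * 60  # a 4-hour TTL

print(latest_stale_answer(changed_at, ttl_seconds))
```

In practice some resolvers do not honour TTLs exactly, so this should be read as a guide rather than a guarantee.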

THE INCREASED DEMAND ON SUPPORT

We apologize for any delays in customer support handling. Please understand that support volumes to both our suppliers and ourselves were significantly raised over the period, which made rectifying these issues with our supplier extremely difficult.

During this period we continued to answer the majority of telephone calls and worked faster to respond to tickets.
As a small business with a small team, we worked through several nights and into the early hours of the morning to resolve these issues, respond to customer queries and research the best way forward, both in dealing with our supplier and in safeguarding our clients and business.

We hope that you can appreciate the situation we were placed in and the extra work put in. Whilst there will be individual cases where things possibly could have been done a little better, we do ask that you please appreciate the sheer number of tickets and issues which our team had to get through during this period. Our staff really did their best.

THE FUTURE

In terms of observations, lessons learned and future planning:
1. Our supplier had already intended to move the secondary name servers off-network and to different suppliers, but for a variety of reasons, including COVID-19, that planned migration was delayed. Changing secondary name server locations usually requires significant lead times for notification where customers may have defined custom name server records, and we did not want to enforce this change during times when customers might be on furlough or otherwise unavailable. Obviously, the attack changed the balance from being cautious to making rapid changes to restore services for the many.

2. By 24/09/20, the move of the secondary name servers behind a DDoS scrubbing service had largely nullified the attack, barring custom or improper name server records that still needed changing. With the above process already completed, the same attack vector should not be able to succeed in the future.

3. By 25/09/20, the move of our customer-facing websites, together with the obfuscation of their location, had largely resolved availability. Whilst attacks against websites are continuously evolving, the attacks used on 23/09/20 and 24/09/20 against our customer-facing websites should no longer be successful.

4. Whilst we have already treated the infrastructure and services impacted between 22/09/20 and 25/09/20, and our suppliers have their own DDoS network mitigation for the cloud servers, they are also now reviewing the use of an external DDoS scrubbing service for our inbound BGP traffic as another line of defense. Tests are already being conducted against a small proportion of services to check for any problems or regressions, and we would expect that testing to continue during October with, hopefully, a rollout in November.

5. We at MediaBlaze Hosts have provisioned 2 new DNS servers, 213.229.72.144 and 151.236.54.75 (aka ns1.coopdns.network and ns2.coopdns.network). This gives us greater independence and control within our infrastructure. All of this is built upon open source software.
This proved to be a successful move, as clients who updated the name server records with their domain name provider, and those who registered domains directly with us, began to see services restored. We are now working hard to assist those migrating across to our new DNS servers; however, full service has now been restored to our existing DNS servers ns1.mediablazehosts.coop and ns2.mediablazehosts.coop. We will also ensure that these DNS servers are protected and secure to provide increased reliability and longevity.
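
If you are unsure which name servers your domain currently uses, you can list its NS records (for example with `dig +short NS yourdomain.example`, where the domain is a placeholder) and compare them with the names given in this report. The sketch below shows one way to do that comparison. The function names and the category labels are our own and purely illustrative.

```python
# Name servers as given in this report.
EXISTING_NAME_SERVERS = {"ns1.mediablazehosts.coop", "ns2.mediablazehosts.coop"}
NEW_NAME_SERVERS = {"ns1.coopdns.network", "ns2.coopdns.network"}


def normalise(name):
    """Lower-case a host name and drop surrounding space and any trailing dot."""
    return name.strip().lower().rstrip(".")


def classify_name_servers(ns_records):
    """Classify a domain's NS records against the sets above."""
    names = {normalise(record) for record in ns_records if record.strip()}
    if not names:
        return "unknown"
    if names <= NEW_NAME_SERVERS:
        return "new"
    if names <= EXISTING_NAME_SERVERS:
        return "existing"
    if names & (EXISTING_NAME_SERVERS | NEW_NAME_SERVERS):
        return "mixed"
    return "external"


print(classify_name_servers(["ns1.coopdns.network.", "ns2.coopdns.network."]))
```

A result of "mixed" or "external" suggests custom name server records, which is the situation where we would recommend contacting our Support Team.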

We hope you have found this report informative about what happened and how we dealt with it, and also reassuring about how we have mitigated (and will continue to mitigate) the risk of it happening again.

If you have any further questions then please don’t hesitate to ask us, and once again we thank you for your understanding and patience.
Kind regards,

The MediaBlaze Hosts Team.