We already have the tools to stop this from happening today. The problem is not the technology but the fact that companies do not want to work together to fix it. It is sad that we let the internet break because people are too slow to use the safety features we have.
> we pushed a change via our policy automation platform to remove the BGP announcements from Miami
Is there any way to test these changes against a simulation of real-world routes, including a check that traffic that shouldn’t hit Cloudflare servers continues to resolve to routes that don’t hit Cloudflare?
I have to imagine there’s academic research on how to simulate a fork of global BGP state, no? Surely there’s a tensor representation of the BGP graph that can be simulated on GPU clusters?
If there’s a meta-rule I think of when these incidents occur, it’s that configuration rules need change management, and change management is only as good as the level of automated testing. Just because code hasn’t changed doesn’t mean you shouldn’t test the baseline system behavior. And here, that means testing that the Internet works.
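To make that concrete, here is a toy sketch of the kind of pre-deploy check I have in mind. It is nowhere near a simulation of global BGP state: it just evaluates an old and a new export policy against a fixed snapshot of routes and flags anything that would be newly announced, or withdrawn without being expected. Every name here (`Route`, `check_change`, the `site:mia` tag, the policies) is made up for illustration, and the prefixes and ASNs come from the ranges reserved for documentation. I have no idea what Cloudflare's policy automation actually looks like.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Route:
    """One route in a snapshot of the routing table (hypothetical model)."""
    prefix: str
    origin_asn: int
    tags: FrozenSet[str] = frozenset()


# An export policy returns True when the route should be announced.
Policy = Callable[[Route], bool]


def exported(policy: Policy, routes: Iterable[Route]) -> Set[str]:
    """Prefixes the policy would announce for this snapshot."""
    return {r.prefix for r in routes if policy(r)}


def check_change(old: Policy, new: Policy, routes: List[Route],
                 expected_withdrawn: Set[str]) -> List[str]:
    """Diff old vs. new exports; return a list of problems (empty means OK)."""
    before, after = exported(old, routes), exported(new, routes)
    problems = []
    newly_exported = after - before
    if newly_exported:
        problems.append(f"unexpected new exports: {sorted(newly_exported)}")
    withdrawn = before - after
    if withdrawn != expected_withdrawn:
        problems.append(f"unexpected withdrawals: {sorted(withdrawn ^ expected_withdrawn)}")
    return problems


OWN_ASN = 64500  # documentation ASN, assumed to be "ours"

ROUTES = [
    Route("192.0.2.0/24", OWN_ASN, frozenset({"site:mia"})),
    Route("198.51.100.0/24", OWN_ASN, frozenset({"site:other"})),
    Route("203.0.113.0/24", 64501, frozenset({"peer"})),  # learned from a peer
]


def old_policy(r: Route) -> bool:
    return r.origin_asn == OWN_ASN


def good_policy(r: Route) -> bool:
    # Intended change: stop announcing the Miami-tagged prefixes.
    return r.origin_asn == OWN_ASN and "site:mia" not in r.tags


def bad_policy(r: Route) -> bool:
    # Buggy change: the origin check was dropped, so peer routes get exported.
    return "site:mia" not in r.tags


ok = check_change(old_policy, good_policy, ROUTES, {"192.0.2.0/24"})
bad = check_change(old_policy, bad_policy, ROUTES, {"192.0.2.0/24"})
print(ok)
print(bad)
```

The useful part is the invariant, not the model: a change described as "remove announcements" should never cause a prefix to be announced that wasn't before. The hard part, which this skips entirely, is getting a snapshot of routes that is representative of what the routers actually hold.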