Plenary session
8 May 2017
4 p.m.
CHAIR: Hello everyone. Sit down for the second Plenary session of RIPE 74. We would like to remind everyone there is a Programme Committee election. There are two seats up for election. Please, we love new blood, we love newcomers, please submit a short bio to pc [at] ripe [dot] net to be considered for the Programme Committee. Please submit your bio by Tuesday afternoon as elections will start then. And we also have one quick schedule update with the lightning talks. We are moving the anti shut down policy lightning talk by Andrew to the last lightning talk. So the other one will move to the first lightning talk and your BGP is moving to the second lightning talk.
And with the quickest Plenary session in existence, Brian Nisbet will be speaking, and your 60 seconds start now.
BRIAN NISBET: I have just used up 15 seconds of my 60. So, hello. My name is Brian Nisbet. I am many different things. I am, but for this talk mostly I am the network operations manager in HEAnet. I am also a Working Group Chair, I am on the PC here and I am contractually bound to say that I am on the Programme Committee of INOG, the Irish network operators group. INOG.net. You should all check it out. Come and give talks to us. We have a lovely bunch of people.
So, this is not a 60 second talk.
Sadly, you don't get out that quickly. But, as a network operations manager, I spend a lot of time thinking about SLAs and processes and primarily thinking about how to make sure my clients stay online. This is, I think, a preoccupation that should be common to a lot of people in the room.
And I'm really hoping, this is a talk about that. I am hoping it's a talk that none of you need. I'm hoping everybody sits here during the next 20, 25 minutes and goes, of course, we know all that, Brian, we are all doing it. Why the hell are you telling us this? But hopefully maybe there are some things you haven't thought of or some things that maybe you are not doing as much of, and I'm going to repeat some words. You may hear the word 'communication' crop up a couple of times in the next 20 minutes, which is I think one of the most important things in our industry. Now, all of this is from the perspective of a national research and education network. The majority of you are working for commercial operators, fumbling in the greasy till for the profit, but I think that all of these things apply to you as well regardless of where you are coming from. And there is a lot here about NRENs, about academic users of ISPs, flexibility of services, and one of the big things I have been obsessing about in the last while is ubiquitous access for users wherever they happen to be. For instance, I have some clients in a small furniture college on the west coast of Ireland, far away from everything, and I want them to have as good a service as the people back in the cities, and that is a huge, huge challenge.
So, let us cast our minds back to a golden age, to the Garden of Eden, or thereabouts, when Telnet was safe, DDoS was almost unheard of, and no one thankfully had ever mentioned the Cloud or the Internet of Things. And importantly, all of us could break the Internet and get away with it. So we used to have a maintenance window which was from 8 to 10 on a Wednesday morning and we could do anything. We could shut entire colleges down with no problems whatsoever. In fact, I distinctly recall informing University College Cork one day that we actually needed to extend that maintenance window from two hours to four hours and then we'd reconnect them. And their operations manager, instead of throwing me out of the building, went, yeah, sure, whatever. This was 2003, it was a different time. And indeed commercial telcos could get away with it to a certain extent as well because the Internet wasn't the Internet. It was this thing that nerds played with.
But certainly our SLAs at the time were a lot more relaxed and we could get away with a lot. So, we certainly worked on a basis that our clients would know about their outage because we had rung them and said, Hi, you have an outage. They'd be like, oh, yeah, that thing isn't working, yeah... you are fixing it then... yeah, yeah, sure, and we'd go on about our day. We live in a very, very different world, ladies and gentlemen, as I think you are all aware. So when I first gave this talk at the TNC conference last year, I promised a magic trick, and what I was doing was taking all of our reaction times, all of those SLAs that you think you have that mention 99.5 or 99.99 or we have an hour to respond to your outage or whatever else. And I'm going to replace them with 60 seconds. Why 60 seconds? 60 seconds is roughly the amount of time that you have between the service going down and the user noticing it's not their problem.
So, it's not their computer, their Internet connection hasn't been eaten by a family pet or their cable hasn't been eaten by a family pet, someone hasn't suddenly lead‑lined their rooms, whatever it happens to be, it's that period of time. They'll take out their phone, check their 4G connection, things like that, but they will realise there is a problem with the service and they will do that really in about 60 seconds. And that's 24 by 7. That's not, oh, it's fine, it's after five o'clock, we can slack off. That's the entire time. From our point of view, education is certainly an absolutely 24 by 7 world at this point in time. We have clients who are setting up, Irish clients who are setting up campuses in China or, alternatively, they have just got students who are on their online courses and they want to connect to Moodle or Drupal or whatever virtual learning environment they have as much as anybody else does, and of course students are well known for having their submissions in weeks in advance, not one of them would ever be typing up at three o'clock in the morning desperate to submit it to a virtual learning environment. And nobody really wants a maintenance window during the day, especially researchers who are running big jobs, and realistically speaking what we are learning is that no one wants a service outage ever. It must always be on. This is a bit of a problem. Now, as I mentioned, I come from the ivory tower of academia, we are special in some undefined way. We have a lot fewer clients. I have 60 clients. Those of you who run DSL networks are like, no, you have missed some zeros, surely. 60 clients and then 4,000 schools but we treat them as one large client who mostly wants Windows updates and YouTube. But we are, we can give this specialised level of service. But we still have to react now in the way that commercial service providers would. Because people are expecting more and more from that. 
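To put the availability figures mentioned above (99.5 and 99.99) next to the 60-second window, the following is a minimal arithmetic sketch. It assumes a 365-day, 24 by 7 year and treats the figures as percentages of yearly availability; the function names are illustrative only and do not come from the talk.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 24x7, non-leap year


def downtime_budget_seconds(availability_percent: str) -> Fraction:
    """Yearly outage time, in seconds, that an availability target still allows."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return SECONDS_PER_YEAR * unavailable


def full_windows(availability_percent: str, window_seconds: int = 60) -> int:
    """How many complete 'user notices' windows fit into that yearly budget."""
    return downtime_budget_seconds(availability_percent) // window_seconds


if __name__ == "__main__":
    for target in ("99.5", "99.99"):
        budget = downtime_budget_seconds(target)
        print(f"{target}%: {float(budget) / 3600:.2f} hours per year, "
              f"{full_windows(target)} windows of 60 seconds")
```

Under these assumptions a 99.5% target still permits 43.8 hours of outage a year, and 99.99% permits about 52 minutes, which is only 52 of the 60-second windows the speaker describes.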
But we have the flexibility, I think, to communicate with our clients and talk to them in ways that maybe commercial people wouldn't be able to, because you have got thousands of clients. Now, there are many good examples of how to deal with things and there are many bad examples of how to deal with this kind of thing that we have looked at to inform the next few slides. The Amazon Web Services status page is a terrible reference for any of this. We talk about communication, information, we do not talk about a status page which says everything is fine, all of it is fine apart from this tiny little 'i' which tells you that storage is down or DNS. On the flip side you have things like Mythic Beasts in the UK who apologise on a Sunday evening. Normally they'd like to do that within 15 minutes. There is this huge variety of different kinds of reactions in the commercial world.
But really it boils down to the fact that the clients, whether they are your clients, whether they are my students, they want their Facebook, they want their Netflix, they probably want their work portal or virtual learning environments, they just need it to work.
So, this is an operator's life now. This is obviously the poor 'This is fine' dog, the unofficial motto of 2016. How can we live in this world? Can we actually have, as operators, down time any more? We need our sleep. We need to eat. So, what do we do and how do we persuade the senior stakeholders who are telling us that everything must be up 24 by 7 that eternal uptime is a myth? It's a fairy story told to children. Things break and sometimes we need to turn them off before they break. And where is their appetite for risk versus budget, and how do we need to evolve as engineers to stop talking technically and start talking in their language? I have been a manager just over five years so I'm worried it might be my language and maybe I don't know how to program a router any more.
So where is the cost benefit? Where is the risk? How much more do you pay for resilience? How happy are they to live with the possible outage? For instance, when we built our optical network a number of years ago we said, hey, for an extra million, we can make the whole thing resilient, and our clients were like, no, it sounds like a very big number. We don't want that, it's fine, we trust you. And then one of the optical nodes in Dublin went off line because they plugged it into the wrong power source and an entire university, the biggest one in Ireland went off line. Suddenly everyone was like, so a million, that sounds like a bargain, where do we sign? Now we have a fully resilient optical network to our universities which meant that when some nice people recently knocked down a building and ripped some fibre out from under it, half of the universities in Dublin didn't go off line, we were very pro this.
From a commercial telco point of view, there are lots of different clients we have and there's a question you have to ask yourselves about what does an outage mean and how much can you afford to lose a client? Now, from my point of view all clients are created equally. I have 60 of them. A client is a client is a client, I love them. From your point of view, you have ‑‑ you can see it, just about see it ‑‑ how much do you care about your clients? Maybe, well, the very thin slice at the top is your home DSL users. Maybe, I don't know, this is how I see these things. Then we have, the slightly larger pieces is your small and medium enterprises, then you have enterprises and then you have your hyper mega global corporations, Facebooks, Googles, whatever huge, huge companies may still use your commercial services. And you can lose some DSL clients, sure, this happens all the time, this is the churn when your competition say that they can get a million megabits a day for €5, or whatever the race to the bottom is at the moment. But can you lose your enterprise clients? And certainly can you lose the really big clients? And when you are talking about how many 60 seconds does it take for you to lose a client and when you start losing clients, how many more clients will you lose after that?
So, I suppose the crux of all of this is what can we do about that? What tricks can you pull on the assumption you don't have an infinite budget to give everybody, including your home DSL users, 8 different egress points, etc., etc.? How do you deal with that? And this is hopefully the obvious bit.
So, first of all, you do put resilience everywhere as much as possible. Now, again, no one is expecting resilience to a home DSL user. Well, your home DSL users are, but they are not going to pay you any extra for it, so the reality is no one is expecting that. But there's a lot of places you can put it. A lot of places where you may not think about it as obviously. Not just the links, the data centres, the servers, the provisioning tools, the monitoring tools. What is your disaster recovery situation like? When was the last time you tested it? Resilience looks really good on paper, but it doesn't mean anything unless you have gone and you have tested it and worked with your clients, especially your large enterprise clients who may have their own network setup and they may have changed something since you last did a failover test, and then when something breaks they are like, our network is down, and you are going, it's fine. Nope! Half of our network is offline. Did you change anything? Of course we didn't change anything. Has anyone here ever had someone tell them that and it not be a lie? Your links, your services, your back ends, your front ends. People forget about making their front ends resilient and then they try and log into things. Identity provision systems, things like that. And as I said, test it, test it and test it again. BCP testing, resilience testing is a pain, but it is time eminently well spent.
Automation: Robots coming and taking our jobs, etc., etc., etc. Thankfully, the robots are still contained in Ansible scripts and are incapable of doing everything yet.
But you have got a question about who you trust more. I would really like to think the automation argument is completely won at this point in time. But I think there are still people out there who are just that little bit worried about handing over their network to a bunch of scripts and a GUI. We in HEAnet have been doing this for 11 or 12 years at this point in time, so we're well past it, although the new system we have for our new network is shaking my faith in this at the moment. Also, automatic server resolution. People like that have come and spoken at RIPE meetings about what they are doing. Netflix has their Chaos Monkey. There is a risk appetite about how much you want to unleash on your network, but you should push that envelope and push it with the senior stakeholders and should show them the benefit and some of the fancy graphics that companies can bring to it. This is all done on an academic budget.
Change control, incident management. For some reason when I show this presentation on a large screen, the bottom arrow goes away. There should be a bottom arrow there and it's very important. This is not a single process. Change control. Management control. Change management even, where are you in your cycle? What is your level of maturity? Engineers tend to hate change control. They feel it somehow shackles them unfairly, and then you run change control for a while and suddenly they are going, you know, change requests are awesome, because someone has reviewed their change and they have plausible deniability after that point. They weren't the only person who looked at it. But in all seriousness, it becomes a wonderful liberating tool when you actually have to explain your code to somebody else. Whether it's the debugging duck on your desk or another engineer, it's a really useful thing, and it's one of these things that gets a lot of resistance but senior stakeholders love it. Clients love it because you are going, here is my process. Once you have got that process you actually have to follow it. It can't just be a dusty piece of documentation somewhere that you are ignoring.
You should have change control everywhere in everything, but your methodology is up to you. It can be ITIL, it can be something that isn't ITIL, it can be some bastardised version of ITIL, it's entirely up to you, just make sure it's there and people are paying attention to it.
Incident management is similarly the case, this will solve you so many problems and if you think that's fine, when something breaks we fix it, you might know that, and maybe if you are in a very small company that works, but the moment you get any bigger than that, any more clients than that it all begins to fall apart because communication, communication, communication. If something breaks, that is what your clients want. They don't want to ‑‑ none of them will have an unshakeable faith that you are fixing it unless you tell them you are. And the moment you don't communicate, the moment you don't follow up, then they will assume you are sitting back having coffee, complaining about IOS or Junos or whatever, I mean something, we complain a lot. But if you take nothing out of this talk other than communication, communication, communication, then that makes this whole thing worthwhile. There is, bar you turning up at your client's house at three o'clock in the morning knocking on the door and telling them you are working on it, there is literally no level of communication you can go to that is too much. They will appreciate everything, and the moment you convince them you're working on it, you will have more time. It will just stretch out because they will ‑‑ they realise things break, but if they know you are fixing it, it's great. There was a Dublin County Council who had a water mains burst and everyone was complaining about them and they put up pictures of the digger and the pipes and what they were doing and the community was going yeah, we can see them fixing it, this is awesome!
We have this thing called social media these days you may have heard of. That's really useful. Don't be one of those people who gives canned answers on your social media account which just either ask the same question, have you tried it on wired as well as wireless? Or we don't see any problem in your area at the moment! What are those flames in the distance? Nothing to be concerned about. Use it, leverage it properly. Whatever communications method you have, it's entirely possible that you think I might be bothering my clients a bit. They will probably appreciate it and give them details and most importantly and this should be an engineering crime punishable by something terrible. If you tell them you are going to ring them in an hour's time, even if there is no change, and you don't ring them, just it's a bad thing, you are bad people, and you have done a bad thing.
It is vital that any promise you make in regards to that is kept.
And then you take that communication, you look at it afterwards, you look at the incident, blame free postmortems, anyone who shouts at their engineers or gets shouted at by their boss, you are a bad boss, or you have a bad boss, no one will ever fix anything if they think they are going to be blamed the moment they put up their hand about it. It's very difficult to do. It's very difficult to get that culture working, but it's absolutely vital because it's going to be the best possible thing to do.
You go back to this, and you check your communication, you're refining your processes, and that involves communicating internally as well whenever you change a process. And literally again you can't do that internally enough either. Engineers don't read e‑mails when they are sent. They might hear you when you say it in a team meeting, then you tattoo it on to their forehead in language they can read in a mirror and they are probably going to get there. I have a lot of real estate for that, it's the only reason I went with this choice of hairstyle. You have to improve the processes, you have to look at it afterwards again. What this all comes down to is, the first time you make a mistake on one of these fairly simple things, it will probably be okay, but the fifth time you do it, when they don't believe you care, then that's the point at which you are going to lose those clients. For me, I have 60 clients. Losing one of them is pretty much a disaster. For you, maybe losing one of them isn't a disaster but maybe if you add a few zeros onto the end of that it begins to become quite complicated.
So, to kind of, you know, recap through some of that, users expect their service to work. How long do we give the poor NCC operations staff when wi‑fi goes down? How quickly do the knives come out and we look dangerously to the desk at the top of the room? We expect it to work because they do such a fantastic job of it. Users won't tolerate an outage for very long, but with communication, with having put the right processes in place so your clients and your engineers know what's happening, with that resilience that solves the problem in the first place, and with proper, well‑argued, well‑analysed cost benefit and risk analysis with your senior stakeholders, you can do a lot of magic, you can do more magic and make that 60 seconds last a very long time indeed.
Thank you very much.
(Applause)
CHAIR: Okay. Does anyone have any questions for Brian? We do have some time.
AUDIENCE SPEAKER: I'll bite. Daniel Karrenberg, no operational responsibility any more, thank God!
I have been in a spot a couple of times when there was a choice in communications. Either be brutally honest and say we screwed up and we did it in the most embarrassing way, and make the argument we learnt from it, or basically say, well, something went wrong and we fixed it. And inevitably everybody around me was for the second option. What would be your answer and why?
BRIAN NISBET: I believe everyone around you was wrong. I believe in ‑‑ I believe in being honest. Now, okay, there is ways of crafting the message, obviously, and there is ways of telling that story while still telling the truth but I genuinely believe that being honest with people might be problematic in the short‑term but in the longer term it's going to be for the best for you and your operations and everything else.
I have never worked in a commercial telco so maybe everyone is sitting there going, oh, God, the poor innocent boy, but I genuinely believe in telling the truth and going, hey, we screwed up, we did this thing, here is some details on what we did. Here is what we know ‑‑ not just, we're going to fix it, but here is how we are going to fix it and here is how we are going to make sure it doesn't happen again is a vitally important thing.
Okay. And everyone else does agree with me.
Thank you very much.
(Applause)
CHAIR: Next up we are going to hear about how to create the SDN platform of the future, from Alex Saroyan.
ALEX SAROYAN: Hello. I am from XCloud Networks. I have been a network engineer before and still I am kind of.
So, about one‑and‑a‑half years ago I came to a question because I asked myself I am network engineer, me and my team are building and maintaining telecom and data centre networks for about 15 years, what we do besides configuring and managing network? We are always busy, but why? And I figured out that most of the time we are doing this. Please open access between this IP address and that IP address on that port. Increase traffic policer for this customer or configure BGP, and so we are basically doing what people are requesting. And of course we heard a lot of like my request is urgent, I am going to escalate, this is important, this is blocking the project and so on. So people are pretty urgent and they have a lot of requests.
So, we are not only building and maintaining networks, but most of the time is going for doing what other people want us to do. And it is not like they always request right things, I believe.
So this was the problem. And I wanted to solve it. So I decided to create a system where customers, consumers, sales people, system engineers, network engineers, all people that need something related to the network will be able to define their requirements in an intuitive web portal using a couple of mouse clicks, and the things will be done automatically without wasting the precious time of network engineers.
So, after about two, three months of trying, testing, failing with a couple of my friends, we got a precise understanding of what the concept would be and what we were going to do it with. So we decided to take a white box switch. Most of you know what it is: it's a kind of equipment available from many vendors which is kind of open. So it's like a Cisco switch but without Cisco IOS. It's still using industry-standard silicon. We need Cumulus Linux because it's an operating system, a production operating system for white box switches, and we need a piece of coding. So everything looks achievable.
So, the idea was to build a fabric using bare metal switches running Cumulus Linux, then set up BGP adjacencies between the switches and develop an intuitive web portal and develop a piece of software which will run right inside the switches themselves. So, we didn't want to develop something which is going to SSH into the switch or kind of connect to the switch and do something there. We decided to do something which would run deep inside the switch. So we built this in our lab. So we set up BGP, we started using BGP unnumbered. BGP unnumbered is using extended next hop encoding, which is defined by RFC 5549. Unfortunately, it is not available from most of the vendors yet, but thanks to Cumulus's efforts on the Quagga project, which is currently called Free Range Routing, BGP Unnumbered is available on Cumulus Linux.
So, with BGP Unnumbered, it is very easy and it's very quick to configure the switches. You don't need to set up a link IP address. You just say, I want to run BGP on this port and use this AS number, and it's up and running. It's pretty easy. Then we set up IP addresses on the loopback interfaces of those switches and started advertising those loopback IP addresses, so every switch gets reachability to every other switch's loopback IP address. We connected a small virtual machine to this network. We set up our software there, which started to act as a controller.
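As an illustration of how little per-link state this needs, the sketch below renders a BGP unnumbered stanza in the style of Free Range Routing, using its `neighbor <interface> interface remote-as` form. The AS number, loopback address and port names are hypothetical, and the speaker's actual configuration was not shown, so treat this as an assumption about what such a configuration could look like rather than as the one used.

```python
def render_bgp_unnumbered(asn: int, loopback: str, ports: list[str]) -> str:
    """Render an FRR-style BGP unnumbered config for one switch.

    Only the loopback carries an IP address; the fabric ports are named,
    not numbered, so no link addressing appears anywhere in the output.
    """
    lines = [
        f"router bgp {asn}",
        f" bgp router-id {loopback}",
    ]
    for port in ports:
        # One line per fabric-facing port: "run BGP on this port".
        lines.append(f" neighbor {port} interface remote-as external")
    lines += [
        " address-family ipv4 unicast",
        f"  network {loopback}/32",  # advertise the loopback to the fabric
        " exit-address-family",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(render_bgp_unnumbered(65001, "10.0.0.1", ["swp1", "swp2"]))
```

Adding a link then means adding one port name to the list; there is no subnet to allocate for it.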
We developed the UI inside the controller and actually the piece of software which is running in every switch on top of the Cumulus Linux was contacting with the controller, pulling kind of configuration, like I am this, this, switch, do you have any information for me and pulling configuration and deploying locally, the configuration started to deploy in distributed manner.
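The pull model described here, where each switch asks the controller whether there is anything new for it and deploys it locally, can be sketched as follows. This is a simplified in-memory model under stated assumptions: the class names, the version counter and the idea of storing configuration as a dictionary are all hypothetical, and a real agent would talk to the controller over the network and program the switch instead of appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Stand-in for the central controller: desired config per switch."""
    desired: dict = field(default_factory=dict)  # switch_id -> (version, config)

    def publish(self, switch_id: str, config: dict) -> None:
        """Record a new desired config for one switch and bump its version."""
        version = self.desired.get(switch_id, (0, None))[0] + 1
        self.desired[switch_id] = (version, config)

    def fetch(self, switch_id: str, have_version: int):
        """Answer 'I am this switch, do you have anything newer for me?'."""
        version, config = self.desired.get(switch_id, (0, None))
        return (version, config) if version > have_version else None


@dataclass
class SwitchAgent:
    """Stand-in for the software running on top of the switch OS."""
    switch_id: str
    version: int = 0
    applied: list = field(default_factory=list)

    def poll_once(self, controller: Controller) -> bool:
        """Pull from the controller; deploy locally if something changed."""
        update = controller.fetch(self.switch_id, self.version)
        if update is None:
            return False
        self.version, config = update
        self.applied.append(config)  # placeholder for the local deployment step
        return True
```

Because every agent pulls only its own entry, the work of deploying is spread across the switches, which matches the distributed manner the speaker describes.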
Because we were initially focused on the case of the carrier network, we decided to use VXLAN, which is supported inside the hardware, so on the ASIC level, and VXLAN has up to 16 million circuits. So the customer traffic which is entering this switch, for example here, the Ethernet frame is encapsulated into the VXLAN packet, which is an IP packet, and later on it is transmitted over the network from one IP address to another IP address, and we know that layer 3 can scale to the size of the Internet, so we were sure on the scaling.
There are some virtual machines called service nodes and they are used for VXLAN signalling, and we also run policers on every switch on the ingress; again this is supported by most of the white box switch silicon.
We had a case when we had to calculate the price. So in a carrier network, how much will the cost be for the circuit between this point and that point? And we decided that BGP will be our friend in this case. So, as I told, these switches are speaking BGP between each other, and if two switches are located in the same logical location, we run iBGP, and the switches which are in different locations are running eBGP. So when the billing engine has to figure out which switches will be on the way between these two end points, the billing engine just checks the BGP routing table, and looking into the AS path it is clear which switches are on the path between the end points of the circuit.
Then, we needed someone who would be using this thing, and Sofia Connect, an innovative carrier, started using this technology on their production network, which helped us a lot actually. We did a lot of improvements, but the approach proved to be kind of production ready.
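The figure of 16 million follows from the 24-bit VXLAN Network Identifier defined in RFC 7348. The sketch below builds and parses only the 8-byte VXLAN header around an Ethernet frame; the outer UDP and IP headers that carry it between switch addresses are left out, and the function names are illustrative.

```python
import struct

VNI_BITS = 24
MAX_VNIS = 1 << VNI_BITS  # number of distinct VXLAN segments

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, 24 reserved bits, VNI, 8 reserved bits."""
    if not 0 <= vni < MAX_VNIS:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni: int, ethernet_frame: bytes) -> bytes:
    """Prefix a customer Ethernet frame with the VXLAN header."""
    return vxlan_header(vni) + ethernet_frame


def parse_vni(vxlan_payload: bytes) -> int:
    """Recover the VNI from an encapsulated payload."""
    _flags_word, vni_word = struct.unpack("!II", vxlan_payload[:8])
    return vni_word >> 8
```

With 24 bits there are 16,777,216 possible identifiers, compared with 4,096 for a 12-bit VLAN tag, which is why one circuit per identifier scales for a carrier.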
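A rough sketch of that billing lookup follows. It assumes one AS number per logical location, as the iBGP and eBGP split implies, and it invents a simple pricing model (a fixed price per location traversed) purely for illustration; the AS numbers, location names and prices are hypothetical and were not given in the talk.

```python
def locations_on_path(local_location: str, as_path: list[int],
                      asn_to_location: dict[int, str]) -> list[str]:
    """List the locations a circuit crosses, starting from the local one.

    The AS path of the route towards the far end names one AS per location;
    repeated entries (for example from prepending) are collapsed.
    """
    path = [local_location]
    for asn in as_path:
        location = asn_to_location[asn]
        if path[-1] != location:
            path.append(location)
    return path


def circuit_price(local_location: str, as_path: list[int],
                  asn_to_location: dict[int, str],
                  price_per_location: dict[str, int]) -> int:
    """Hypothetical pricing: sum a fixed price for every location on the path."""
    crossed = locations_on_path(local_location, as_path, asn_to_location)
    return sum(price_per_location[location] for location in crossed)
```

For example, a route seen in location A with AS path 65002 65003 would be billed for A, B and C under a mapping where 65002 is B and 65003 is C.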
Up to that point, we understood that we solved automation problem and we get some freedom of choosing among different, between different hardware vendors because as I told the white box switches which work with this approach are available from about eight vendors on the market. We were inspired and decided to go further to find out new problem and new challenges.
So what about data centre deployment? During decades, network to network realisations in the data centre were evolving. Starting from the access layer, data centre networks became kind of advanced fabrics running BGP and overlay technologies with Internet scalability, but server to network connectivity is still using legacy technologies, so spanning tree, broadcast, multi‑‑ some other problems are still in the data centres. So we decided, let's try to remove the shared segment between server and ToR switches and see what will be. We will remove broadcast and LAG and there will be no wasting of IP addresses on sub‑netting. So VLAN borders, and also we decided that stacking ToR switches is not a very good idea. So we decided that servers should become part of the switching fabric.
And we came to this network. So, we built a fabric of switches running BGP on every switch, and as I said, thanks to Cumulus Networks, it is today easy to run BGP unnumbered not only on the switch side but also on the server side. So we decided to run BGP between server and switch. It is not actually hard. It's very simple, because with BGP unnumbered, we don't need a link IP address. So, we just say I want BGP on this port. We set up just one IP address on the loopback. We connect all the server links to all possible switches, and we run BGP over all those physical links. So, in this case the top-of-rack switches don't need to be stacked, so one switch doesn't know anything about the other switch. And that's great. So, a failure on one switch can't bring a failure on another switch.
We have BGP attributes for traffic engineering, which is very flexible.
And to even simplify the configuration on server side, we came to kind of standard configuration and we put this configuration into Quagga package, when network engineer doesn't need to deeply learn BGP but just to install the Quagga package which comes with the standard configuration already inside.
With this network, we decided that access ‑‑ because switching silicone has ability to filter traffic, so we decided to move access list on the level of switch silicone. But switch silicone has comparatively smaller TCAM size comparing with big firewalls, so if we apply access list on the switch globally, then we will end up with wasting the TCAM. So BGP is our friend. Again, we decided to use BGP here. Because we run BGP between servers and switches, we precisely know which IP addresses is located behind which physical port. So, when a web server engineer wants to get an access to the database, they go to intuitive web portal, they use their mouse and some keys on the keyboard, it's very simple, just five seconds, they type this IP, this IP and this port. Then the software which is running on every switch, pulls this information from the centralised database and then compares with the BGP table to figure out that this particular rule is relevant for this particular port and sends this rule to the ASIC for the proper port. So with this approach we found a way to use switch silicone resources efficiently.
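The per-port placement described above could look roughly like this. It is a sketch under assumptions: the rule format (source, destination, TCP port), the table of locally learned prefixes and the choice to install a rule wherever either end of it is attached are all hypothetical, since the talk does not spell out these details.

```python
import ipaddress


def ports_for_rule(rule: dict, local_routes: dict[str, str]) -> set[str]:
    """Return the server-facing ports on this switch that a rule concerns.

    local_routes maps each prefix learned over BGP from a directly attached
    server to the physical port it was learned on.
    """
    src = ipaddress.ip_address(rule["src"])
    dst = ipaddress.ip_address(rule["dst"])
    ports = set()
    for prefix, port in local_routes.items():
        network = ipaddress.ip_network(prefix)
        if src in network or dst in network:
            ports.add(port)
    return ports


def plan_acl_entries(rules: list[dict], local_routes: dict[str, str]) -> dict[str, list[dict]]:
    """Group rules by port so that only relevant entries consume TCAM space."""
    per_port: dict[str, list[dict]] = {}
    for rule in rules:
        for port in ports_for_rule(rule, local_routes):
            per_port.setdefault(port, []).append(rule)
    return per_port
```

A switch with neither end of a rule attached gets an empty plan for that rule, so it spends no TCAM entries on it.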
So, considering this, Broadcom Trident2 allows 4,000 of ingress ACLs. And server mobility came easier, because when you move ‑‑ currently when you move server from one rig to another rig, first you don't need to tell network engineer to configure the VLAN on another rig, you just go to the interface and say, I am moving my server there, and when BGP comes up in new location, the access lists are following the server. So, in new place, the server is up and running with the access list in place.
So, Anycast for load balancing has been used in the Internet for decades, or for load balancing for redundancy of DNS servers or some CDM platforms. So since we have BGP here, we decided why not to use Anycast for load balancing here. Switch silicone has ability of ECMP hashing. So it works in the following way:
We configure Unicast addresses on the server. Another Anycast address which is the same for any server. So servers are advertising the Anycast addresses for the same matrix for all the switches and there is the ECMP standard, eco cost multipath routing. So traffic is balanced between servers. Considering hashing based on source destination, IP proposal and port. So when server goes down, BGP is down and traffic is rerouted. And what to do if Apache is down for example but the server is operational and BGP is up. So piece of software which is running on every switch is contacting with the Unicast IP addresses, checking if service is alive. And in case server is not alive, service is not alive but server is running, again traffic is routed using BGP attributes.
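The behaviour described can be modelled in a few lines. This sketch is only a software analogy: real switch silicon uses its own hardware hash rather than SHA-256, and the health check is represented by a caller-supplied function, both of which are assumptions made here for illustration.

```python
import hashlib


def healthy_next_hops(servers: list[str], is_alive) -> list[str]:
    """Keep only servers whose service answered the health check.

    This models the per-switch software that probes each server's unicast
    address and steers traffic away when the service is down.
    """
    return [server for server in servers if is_alive(server)]


def pick_next_hop(flow: tuple, next_hops: list[str]) -> str:
    """Choose one equal-cost next hop for a flow.

    flow is (source IP, destination IP, IP protocol, source port,
    destination port). Hashing the tuple keeps every packet of a flow on
    the same server while spreading different flows across servers.
    """
    if not next_hops:
        raise ValueError("no next hop advertises the anycast address")
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return sorted(next_hops)[index]
```

One property of this simple modulo scheme is that removing a server can move existing flows to a different server, so it illustrates the balancing idea rather than connection persistence.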
We decided that we should do the same with virtual machines; it would be great if physical and virtual machines had the same treatment, the same approach. We have currently tried this with the OpenStack and Proxmox virtualisation platforms. So what we did is, this is the classic hypervisor where virtual servers have some virtual interfaces. The traffic passes through iptables into the bridge, which connects to the physical interfaces. Because the idea was to run BGP between the server and the network, we built this topology. So, we run Quagga, we run BGP inside the hypervisor, so there is a BGP session between the hypervisor and the switches, the fabric, and we pull every virtual interface out of the bridge and terminate it locally, enabling IPv6, because BGP unnumbered uses IPv6 for establishing the BGP session, IPv6 link local I mean. So we pull out the virtual interfaces, we enable IPv6 on those interfaces, they get IPv6 link locals, and they establish BGP between the virtual machine and the hypervisor. So everything advertised from the virtual machine goes to the hypervisor, which advertises it to the rest of the network.
What about Microsoft operating systems? Microsoft running BGP on the host? It sounds strange, at least. But actually, Microsoft Windows Server, starting from 2012 R2, supports BGP natively without installing anything. It really does support it.
Of course, not without problems. So, of course, BGP unnumbered was not supported. It refuses to work when advertising IPv4 over an IPv6 adjacency; according to the documentation they claim that it should work, but it doesn't. I don't know why. But we found a kind of trick. We configure IPv4 link local addresses, start advertising the real IP address from the loopback, and thanks, Microsoft, it works. So Microsoft Windows understands that a link local IP address is not good for originating traffic, and that helps, so it knows to put the loopback IP into the source of the packet, which is very crucial for this approach.
A lot of people told us that, okay, that's interesting, but it's not production‑ready. So, thanks to another creative company called Innova, they actually run this routing on the host approach on their live infrastructure, and they don't waste IP addresses any more on sub‑netting, so one server can just use one IP address, I mean IPv4 address, and they don't need to lose another 20 IP addresses on sub‑netting. And because of that, Innova was able to remove all the NAT they used. Today, their networks are running purely on public IP addresses and they are kind of bringing back the original concept of the Internet, which says that the Internet is a network where every IP address has reachability to every other IP address.
And finally, they decided to fully swap their existing Cisco/Juniper/F5 infrastructure for our solution, currently serving a user base of 24 million using the approach which I have described. So routing on the host is something real and it is already used on a live production network.
Summarising what we got: automation, with access lists collected in a centralised place but deployed at the network level. Load balancing, in many cases without using separate load balancers.
Hardware disaggregation from giant vendors. I respect everyone at Cisco and Juniper, but let's try to disaggregate, maybe it's time.
And better agility with the operations.
So, I want to say a special thanks to Atilla De Groot, Sean Cavanaugh, Pete and Stefan, and the rest of the Cumulus team, they were helping us a lot. So thank you, guys.
Thank you.
(Applause)
CHAIR: Thank you. Any questions? And I'd also like to encourage people who aren't necessarily the usual suspects to also step up, so...
AUDIENCE SPEAKER: Hi, my name is Osama. I am not representing anybody. So, I wanted to ask a question with regards to the BGP routing, are you doing it in the virtual environment scenario? Are you doing the BGP routing between every VM and hypervisor?
ALEX SAROYAN: So every VM has a BGP adjacency with a hypervisor and every hypervisor has a BGP adjacency with the top‑of‑rack switches.
AUDIENCE SPEAKER: Have you considered relying on the hypervisor again ‑‑ without the need to run the BGP with all the VMs so the hypervisor even knows all the IP addresses of all the VMs, simply in the status of all the VMs?
ALEX SAROYAN: I got your point, but because we are doing this Anycast load balancing, so it was actually important that the virtual machine advertises its Anycast IP address locally.
AUDIENCE SPEAKER: So if the hypervisor itself knows the status of every VM so it can say if it's up or down and it can basically remove or add a route.
ALEX SAROYAN: Yes, but there is also the case of moving a virtual machine, live migration. With live migration, sometimes there is a situation where the instance has been moved but the older hypervisor still thinks that the VM is running. So that can cause problems.
AUDIENCE SPEAKER: Jen Linkova. Could you please elaborate on BGP scalability here, because it looks like you are running a lot of BGP sessions? Do you use some kind of route reflectors and so on?
ALEX SAROYAN: Yes, of course we run route reflectors. The spine ‑‑ between leaf and spine so spines are route reflectors. And the spine layer and hypervisor layer are different AS numbers, so there is eBGP.
AUDIENCE SPEAKER: And the second question is, with all this multipath, especially using the link local, what is your experience with troubleshooting this, when you need to make sure traffic is sent over a particular path?
ALEX SAROYAN: Well, so Paris traceroute, and sometimes we apply some access lists on the physical switches and check counters to figure out if particular traffic is here or not. Mostly that.
AUDIENCE SPEAKER: Emile Aben from the RIPE NCC, with a question from a remote participant, Sascha Luck. The question is, what is the Quagga Cumulus package you are using on the host?
ALEX SAROYAN: Cumulus Networks is running an open source project which is actually a fork of Quagga and which, today, has been renamed Free Range Routing. It is based on, partially based on, the Quagga source, and Cumulus Networks developed the BGP unnumbered part, which was very important for doing all this stuff. But it is still an open source project, so anyone can contribute to it and can use the code freely, like a normal open source project.
AUDIENCE SPEAKER: Benedikt Stockebrand. I see one huge problem here, but maybe that's because I have spent too much time in enterprise environments rather than an ISP or whatever context. It's bad enough to have the VM people trying to run these things come to grips with basic network concepts like VLANs, and I'm not even going to talk about dynamic routing. But actually running BGP on a Windows box sounds scary to me. Not because of BGP or Windows, but because of the combination and the problem in front of it. And I'm not even going to comment on having database people get control over firewall configurations, because that will inevitably end up as allow everything from everybody to everybody else. So, as interesting as this is from a technological point of view, in a lot of environments I have seen, this approach is infeasible at best and absolutely scary if you just really think about it.
ALEX SAROYAN: It is actually, and, as I said, not without problems, and we experienced a lot of problems with Windows. But right now, at this very moment, as I said, the company called Innova, they are in the gaming business and most of the game servers require Windows, and it actually works. Finally it works.
AUDIENCE SPEAKER: In that case if we talk about gaming servers, yes, this is a very difficult business, but if something goes down, yes it can be scary for commercial operators of these servers, but it's not like in a lot of enterprises where you don't have the luxury, as Brian mentioned before, of having the occasional unscheduled down time and no repercussions.
CHAIR: Let us remember that there are many database people who have a very wide range of knowledge, and also let's cut off the lines after everyone in line. Randy?
AUDIENCE SPEAKER: Christian pay trick stand I can. I have one question about the security. If you have a web portal where every customer can do the stuff he needs, how do you prevent that you open up a hole you don't want to open up?
ALEX SAROYAN: Thanks for the question. Actually, what we do is create different tenants. We assign users to tenants. So a tenant represents a customer or a project. Then we assign different resources. It can be physical ports or an IP address or a physical server. So we assign resources to the tenants, and every tenant can act freely with the resources within that tenant. At the moment when a tenant wants to do something between itself and another tenant, for example to create a circuit or to request access to the neighbour's database server, the system checks whose resources are involved in this request and sends the request through an approval procedure to the other tenant. The other tenant should say, okay, I agree that's going to happen, and only after that do we apply it. So it doesn't avoid security procedures, kind of, but it improves them. So the lifecycle is kind of easier.
AUDIENCE SPEAKER: And do you have a standard access list or something else where some stuff is blocked?
ALEX SAROYAN: Yes. Yes.
AUDIENCE SPEAKER: Interesting stuff. Thank you.
CHAIR: Let's keep these two questions brief, if you can, Randy.
AUDIENCE SPEAKER: Randy Bush. About 15 years ago I moved from a highly automated network to one of the larger telcos in the world, and I started to talk about automation and the first thing I heard was the network is the database of record. That's a form of suicide. I strongly suggest that Bernhard's advice be taken by all my competitors and do not use database to manage your network. Do it with monkeys and keyboards, it's great.
ALEX SAROYAN: Okay. Thanks for the comment. We actually work a lot to make a database‑driven network safer. And maybe next time I will propose a presentation on how we do it, what precautions we take to make a database‑driven network safer.
AUDIENCE SPEAKER: Friso Feenstra. We do have a network which you would call legacy: we have still got VLANs, we have still got switches, we have still got Cisco, we have still got F5s, and I want to talk about the F5 thing. One of the things about load balancing you were talking about, saying, we use BGP multipath and that way the traffic is spread over all the various servers. One of the things our F5s do is keep something called session state. A lot of multipath and Multicast or Anycast service over the Internet works, for instance, with DNS, which is not session orientated. We have a lot of services which are session orientated, and the session state is very important for the session to actually work and give something back, and queue connect, those types of protocols need session state. If in ICMP ‑‑ sorry, in multipath, you suddenly switch to another server which is not aware of the session state, the session is normally broken, and you don't want that in a normal network. How do you cope with that in your solution?
ALEX SAROYAN: Thanks for the question. Actually, we are not trying to replace F5‑like firewalls and load balancers 100%, but in some cases this approach works. First, ECMP uses hashing, so until a switchover the sessions keep going to the same servers. As you said, yes, the switchover is a problem, but with this approach there is another opportunity for the network: they can use tools like HAProxy or other open source proxying software, and with a combination of Anycast and such proxies that result can be achieved. Of course it is not 100% of an expensive load balancer, but many companies don't need all the features which expensive load balancers introduce.
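The switchover concern raised in this exchange can be made concrete with a small experiment. The sketch below assumes a plain hash-modulo mapping of flows to servers, which is a simplification and not necessarily what any given switch does; the server names and flow tuples are made up.

```python
import hashlib


def pick(servers, flow):
    """Map a flow tuple to a server with a simple hash-modulo scheme."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def moved_flows(flows, before, after):
    """Flows whose server changes when the server set changes."""
    return [f for f in flows if pick(before, f) != pick(after, f)]


# 1000 hypothetical client flows towards one Anycast service address.
flows = [("198.51.100.7", 40000 + i, "192.0.2.10", 443, "tcp") for i in range(1000)]
before = ["srv-a", "srv-b", "srv-c", "srv-d"]
after = ["srv-a", "srv-b", "srv-c"]  # srv-d has been withdrawn

moved = moved_flows(flows, before, after)
on_removed = [f for f in flows if pick(before, f) == "srv-d"]
print(f"{len(on_removed)} flows were on srv-d, {len(moved)} flows changed server")
```

While the server set is stable, no flow moves. Once `srv-d` is withdrawn, every flow that was on it has to move, and with a plain modulo scheme flows on the surviving servers can be remapped as well, which is where stateful sessions break. Some platforms offer resilient or consistent hashing to limit that second effect.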
CHAIR: Thank you very much. And I look forward to your next submission about security.
(Applause)
CHAIR: And now we are going to learn about periodic behaviour in traceroute sequences, for our first of three lightning talks.
MATTIA IODICE: So, good afternoon, my name is Mattia Iodice. Today, with this presentation, I'm going to present to you the current state of a project on periodic behaviour in Internet measurements. I am working on this project with these two gentlemen.
In particular, it is our research on periodicity in Internet measurements, both at the BGP and at the traceroute level.
I'd like to begin with the reason why we use RIPE Atlas. The main reason is that it's the largest hardware measurement platform available, and compared to the software ones it gives us better accuracy, and accuracy is a key requirement for our purpose. Indeed, the process of automatic periodicity inference is a hard task, especially when you are facing measurement noise and inaccuracy, so RIPE Atlas is probably the best solution for this type of work.
On the slide is how periodicities impact network management. You can characterise the stability between two hosts, and if you are an operator and you want to check whether load balancing policies are working properly, you can perform a periodicity analysis on traceroute data. You can also use it to study BGP instability from a different point of view and to define stable Internet regions. Finally, if you have a problem, a study of periodicity may be useful to understand its impact in this field.
Okay. Let's start from traceroute data analysis. Here you can see a plot of a traceroute sequence from a probe to an anchor, taken from anchoring measurement data. In this diagram you have the traceroute IDs on the Y axis and the sampling instants on the X axis. It means that at the same height you have the same traceroute path. It's easy to see that there is a continuous alternation, and from this kind of scenario you want to understand if there is some periodicity and eventually characterise it in terms of the traceroutes involved and the time of oscillation. What you have got is something very similar to a signal, and detecting periodicity in signals is well documented in the literature. The main problem is that it's not possible to use the usual tools because we can't order the traceroutes; in other words, there is no defined metric that allows us to apply autocorrelation to traceroutes without depending on the choice of traceroute IDs. So in order to solve this problem, we first tried converting traceroutes to strings and ordering them by string distance from the most common one. But this approach doesn't work in all situations. So we modified autocorrelation to perform an exact match between traceroute IDs, and in this way we managed to do it.
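One way to read "autocorrelation with an exact match between traceroute IDs" is sketched below. This is an interpretation, not the authors' code: the function names, the scoring (fraction of equal pairs at each lag) and the threshold are assumptions.

```python
def exact_match_autocorrelation(path_ids, max_lag=None):
    """Score each lag by the fraction of samples equal to the sample lag steps later.

    path_ids is a sequence of opaque traceroute identifiers, one per
    sampling instant. Only equality is used, so the result does not
    depend on how the identifiers are numbered or ordered.
    """
    n = len(path_ids)
    if max_lag is None:
        max_lag = n // 2
    scores = {}
    for lag in range(1, max_lag + 1):
        pairs = n - lag
        if pairs <= 0:
            break
        matches = sum(1 for i in range(pairs) if path_ids[i] == path_ids[i + lag])
        scores[lag] = matches / pairs
    return scores


def candidate_period(path_ids, threshold=1.0):
    """Smallest lag whose score reaches the threshold, or None."""
    scores = exact_match_autocorrelation(path_ids)
    for lag in sorted(scores):
        if scores[lag] >= threshold:
            return lag
    return None


# Three paths alternating A, B, C, A, B, C, ... over twelve samples.
sequence = ["A", "B", "C"] * 4
scores = exact_match_autocorrelation(sequence)
period = candidate_period(sequence)
```

For this toy sequence the score is 0 at lags 1 and 2 and 1 at lag 3, so `period` is 3 samples. A constant sequence would trivially report a period of 1, and real data is noisy, so a lower threshold and some filtering would be needed in practice.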
This is just another view of how it works. Let me show you this, this is an example of performing this kind of analysis. In particular, thanks to the use of Paris traceroute, which is a parameter in anchoring measurements, you can study the sponsored custody of the measurement. In this way it's very easy to check if the load balancing is working properly or not. Using this approach, it's also possible to discover the topology of a network from the outside; in particular, from a set of traceroutes you can build a graph of the connections of a cluster of nodes or of a data centre. We have applied the technique to a set of 50 IPv4 anchoring measurements and we observed that in 10% of the analysed pairs there was some kind of periodicity. In particular, we followed a conservative approach, so we considered an observation periodic only if it showed at least two traceroute subsequences recurring. It's a very interesting result because the sampling period of RIPE Atlas is about 900 seconds. There are some problems of aliasing, for example.
We have also adapted the technique to allow its application in the BGP context; in particular, we executed a process of periodicity characterisation on a set of unstable Internet prefixes reported on a list. Due to the huge amount of data, we could only perform the analysis on a restricted time window, on an observation no longer than six or seven hours. What happened is that in many cases the analysis showed the presence of recurring states at regular intervals. This is an example of what I'm talking about, showing some captures from BGPlay. This is affected by a 570‑second oscillation. So, the frames indicated by the red arrow show a recurring state. It means that every ten minutes the system reaches exactly the same configuration. It's a scenario due to the establishment of a BGP bad gadget or to load balancing policies not properly configured, and it's probably one of the best examples of how periodicity may affect network management.
In conclusion, we think this kind of analysis may be more useful if associated with a visualisation tool, especially in the traceroute domain. It would be interesting to compare BGP instabilities with traceroute oscillations. And this approach matters for network management for several reasons. So, the goal is the development of a system able to identify anomalies in periodic measurements such as the ones RIPE Atlas produces, or in BGPlay, or a misconfiguration in BGP data. In the specific case of traceroute, we want to integrate such a system into TraceMON, which is a new tool for traceroute visualisation and network troubleshooting that will be presented on Wednesday in the MAT Working Group. We want to integrate such a system into TraceMON, into the visualisation of the discovered issues.
Anyway, it's not sufficiently explored. So if you have questions or if you have ideas or some considerations, you can contact us at this e‑mail, or if you have a question now ‑‑ thank you for your attention.
(Applause)
CHAIR: Okay. If anyone has any questions, we have a couple of minutes.
CHAIR: It looks like we don't have any questions. So thank you very much.
(Applause)
CHAIR: Next we're going to hear about monitoring your BGP routing at a glance.
LUCA MARZIALETTI: Hello. I am from the same university. With this project I am happy to show you this new proposal for monitoring BGP routing at a glance. This project was made in cooperation with my colleagues from the previous talk and with Professor Giuseppe Di Battista.
When we are talking about BGP ‑‑ when we are talking about routing, we are also talking about BGP. That's clear. That's fine, because these are the questions that are maybe often asked: how are my prefixes, my resources, visible on the Internet, and is the routing stable? A good answer to these questions is BGPlay, of course, a tool which shows the visibility by means of a graph. Each path shows the BGP best path connected to the origin, which is the node in red. The other nodes are in blue and black: in blue are the probes and in black we see the autonomous systems. That's good. But sometimes the graph gets very complicated, and with the topology and the whole interaction with this tool, maybe it's not so well suited to be a monitoring tool. It's good for analysis. So we were thinking about a new proposal, a new tool, that works in cooperation with BGPlay. This shows the stability of a prefix. It is a much simpler tool because there is no interaction with it, it just shows a snapshot of BGP for some prefix. This is the BGPStream graph, and you can see that on the X axis is the time. On the Y axis there is the percentage, obviously. So that is the model. The colours are associated with autonomous systems, so for each range of time you can see the various streams of visibility. You can fork it on GitHub and you can try it at the URL I will show you later. We found that this approach was already used by the ‑‑ but we don't know much about it because there is no tool that is online, so we can't try it. This is a comparison of BGPlay and BGPStream graph. These tools are not made to be opposed to one another. They are made to work together. What I mean:
With BGPlay, we can see how the visibility of the routing evolves on the single path, but it requires interactivity to get the job done. BGPStream graph, on the other hand, is not interactive, or maybe is less interactive, and shows if and when something happens. This could be much clearer with a demo. I have taken some unstable prefixes that Geoff Huston published and tried them on my tool. What you see here is just a simple example showing a single target with just two upstreams, in red and blue, left and right, for both tools. That's the demo, and I have synchronised and slowed it down. You can easily see the changing of the paths on BGPlay, and the changing of the areas on BGPStream graph. But remember, to see that on BGPlay you have to do it as an animation. This shows just a capture. So, you see the changes here, and here in the peaks.
Okay. Let's go into a bit more detail about stream graphs. Stream graphs are a quantitative model, so they show the visibility as a percentage, so you get a better hint. We are also trying to develop a heat map model that is a step between BGPlay and BGPStream graph. The heat map is built with, for each row, just a single probe and, for each column, just a single one, but it is still in alpha testing, so it is not really ready.
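As a rough illustration of the quantity a stream graph layer could represent, the sketch below computes, for one instant, the percentage of collector peers that reach the prefix through each upstream AS. It assumes that "upstream" means the AS adjacent to the origin in the AS path and that the denominator is the number of peering sessions, as explained later in the discussion; the function name and sample AS numbers are made up and this is not the tool's actual code.

```python
from collections import Counter


def visibility_shares(peer_paths):
    """Percentage of peers seeing the prefix via each upstream AS.

    peer_paths maps a collector peer to the AS path it currently holds
    for the prefix (origin last), or to None when the peer has no route.
    Assumption: the upstream is the AS just before the origin; a path
    of length one is attributed to the origin itself.
    """
    total = len(peer_paths)
    if total == 0:
        return {}
    counts = Counter()
    for path in peer_paths.values():
        if not path:
            continue  # this peer has no route: it adds to no layer
        upstream = path[-2] if len(path) >= 2 else path[-1]
        counts[upstream] += 1
    return {asn: 100.0 * n / total for asn, n in counts.items()}


snapshot = {
    "peer1": [3333, 64500, 64510],
    "peer2": [1299, 64500, 64510],
    "peer3": [174, 64501, 64510],
    "peer4": None,  # prefix not visible from this peer
}
shares = visibility_shares(snapshot)
```

Repeating this for every time step and stacking the per-upstream percentages gives the layers of the graph; here `shares` adds up to 75% because one of the four peers has no route.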
With this kind of representation there are some problems. What do I mean? The ordering of the layers is factorial: if you have N elements, you have N factorial alternatives. To do a good drawing, we had to rate the drawings, and we built up metrics and heuristics to get a very good drawing. We also found it very interesting to use the trie data structure, which makes permutation of the datasets very easy and is very comfortable to use.
Now you can get it on the URL if you download the slides.
Now let's see some outages.
So in the 2015 Facebook outage, it is really clear what happened here, because in each of the states you can see the visibility is stable. But at some moment the visibility goes down, and after the incident the visibility just returns to the same configuration. That's clear. This is another incident, the YouTube knock‑out by Pakistan Telecom. This is a hijack, and here it's easy to see that at this moment the Pakistan AS is taking all the visibility. Then there is the fix for this, announcing two more specifics.
These are the other two examples. This is a submarine cable cut in Ireland. At one moment everything is stable, and you can see here that the drop is just a single one. Then again we get stable routing in the same configuration.
This one was the research attack was the DDoS attack. Again you see the initial configuration is stable, but at a certain moment something kicks in in the middle, taking part of the traffic, and then again something happens and it comes back to the same configuration as before. You can just click on the moment and go to BGPlay and see what happened there, so you can drill down.
So if you have any questions. Thank you.
(Applause)
AUDIENCE SPEAKER: Hi. Thomas from DFN. Just maybe I missed the point but what exactly is the Y axis showing? It's a percentage, but a percentage of what?
LUCA MARZIALETTI: A percentage of visibility. Sorry, I interrupted you.
AUDIENCE SPEAKER: A percentage of visibility. Can you go a bit deeper, what do you mean by that?
LUCA MARZIALETTI: I mean, we have the probes, and the probes collect the BGP announcements. At a given moment a prefix is available through many upstreams, and these upstreams each carry a part of the traffic. With this you can see which upstream is taking care of your routing. So, I don't know if it's clear, but it shows the percentage of visibility taken by these upstreams.
AUDIENCE SPEAKER: Geoff Huston. I am still very confused. Your sort of introduction said I'm going to talk about visibility of prefixes across the Internet. It's something we have all sort of worried about because when you look at RIS or route views, not everyone actually advertises the same set of routes even though we call it default. But then your talk talks about specific prefixes against some Y axis that I still don't understand over time and you seem to be tracking a prefix in some limited context of visibility that a small set of BGP systems are giving you information on. I am kind of lost here in the talk about exactly what bounds your visibility and why.
LUCA MARZIALETTI: I don't know if I got the question. But all the drawings are made from the same data as BGPlay, so what BGPlay sees is what the stream graph sees. So, the data comes from RIS. I don't know if...
AUDIENCE SPEAKER: But that it comes from RIS is what I heard.
AUDIENCE SPEAKER: I can give the answer. So the Y ‑‑ Massimiliano Stucchi, RIPE NCC. The Y axis basically is the percentage of peering sessions with our route collectors.
CHAIR: Thank you, Massimiliano. And now for our last lightning talk is ‑‑ thank you very much. Sorry.
(Applause)
CHAIR: Our last lightning talk is a talk about anti‑shutdown policies. I'm sure this will be slightly controversial, so I want to remind everyone before we get to the questions, please think of a way to formulate your question to make it very short.
ANDREW ALSTON: Firstly, thanks for having me.
Okay. We want to talk a little bit about the anti‑shutdown policy, the rationale for it and why we proposed it before AfriNIC.
Let's get into it.
The anti‑shutdown policy that was proposed before AfriNIC firstly attempts to define what an Internet shutdown is, and we came up with a definition that says a government‑ordered blocking of access to the general Internet. Said definition doesn't preclude a government from censoring content that is not legally permissible within the laws of the said country, on the provision that said censorship does not rest on a law that says you can block anything irrespective of what it is. We went on to say that the policy creates a definition of when a shutdown has occurred. An Internet shutdown is deemed to have occurred when it can be proved that there was an attempt, failed or successful, to restrict access to the Internet for a segment of the population irrespective of the provider or access medium that they utilise.
What does the policy actually say in the first draft? We said, if a state shuts down the Internet, then for 12 months after they have shut down the Internet we'll stop allocating them any resources. If they shut down the Internet three or more times within a period of ten years, we'll revoke their resources. And the policy goes pretty wide in the first draft: it includes all government‑owned entities and entities with direct government relations.
So, what's the rationale behind proposing something like this? Shutdowns between June 2015 and June 2015 had an estimated cost of 22 billion dollars to the economies that were affected. In November 2016, there were 63,000 domains, sub‑domains of the Cameroonian ccTLD from the stats we have been given by domain tools. As of three days ago after 94 days of shutdown the figure had dropped to 31,000. We speculate that's because people said enough and they took the business off the continent and out of the country. That has a dramatic effect on the economy and it has a dramatic effect on the Internet. So shutdowns are hurting the African economies. They are hurting investment, they are damaging to the Internet ecosystem and something had to be done.
So what do we hope to achieve by proposing something like this?
Firstly, shine a spotlight on an ever‑increasing problem. The problem is increasing.
Secondly, we want to create a debate: let's hear the ideas that are out there and find a way to put an end to these shutdowns. We know that the policy is draconian. We know that it's flawed in its first instance, so let's take the ideas out of that debate and modify the policy until we find something that works for the whole community. But we have got to create the debate first. In the next revision we have taken some of the comments that we have had back from the policy list and we have asked, how does this trigger? Someone comes to AfriNIC saying there is a shutdown. AfriNIC then goes back to the community and says, okay, where is the evidence? Give it a two‑week time period. Then hand it off to the governance committee and say, adjudicate on the evidence and make a decision. We also added a definition of what a partial shutdown is, as you see on the slide there, to try and define what happens if people partially shut down the Internet, and to include it in this policy.
We have also said we'll create an exclusion for academia because there is a lot of places where academia is state controlled and you don't want to turn off academia. That would be bad. And then limit the policy to target the state directly, and only entities in which the state holds a 50% or greater shareholding.
So we are reducing the potential collateral damage of the policy in the second draft.
Now, we have heard some other ideas. They are not part of the policy but here they are.
It was proposed to us, why don't we just ask ICANN to strip the country's ccTLD? That, we thought, has the potential for some rather high collateral damage, so rather not at this point, but it was something that was suggested.
We have also said let's change the policy. Let's not revoke the space. We will put a rather massive financial penalty per day that you are shut down and then we'll turn around and say to the ITU you have an UN mandate, you go and collect that financial penalty and then you can give it to the Internet freedom of speech advocacy groups. It's an interesting idea that was proposed. What we're looking for here is to create the debate. Open the floor and let's see how we stop this. Because it's costing the economies, it's costing the people, and if we are going to stand for freedom of speech and freedom of Internet, we have to stand up and do something.
So what's next? More debate, more dialogue, more ideas, more pressure. And the fight goes on. We can't claim any longer that this is somebody else's problem. It affects all of us. And it's time that we as an Internet community take a stand, whatever that stand is, but let's debate until we find that stand and do something about it.
And so, finally, we are always open to dialogue and suggestions. The e‑mail addresses are there, they are on the policy. You can also get on the AfriNIC policy debate list, have your say, let's have the debate, open up the floor and put an end to this, because it's hurting people. Thanks for hearing us out. We know it's not perfect, but we welcome your input. Hopefully there are still no rotten tomatoes in the room. But questions...
(Applause)
CHAIR: So, thank you very much. Because this is the last part of the lightning talks, we want to keep the mikes open as long as there is continuing discussion, but I will interrupt when the time is officially up so people that want to go outside and get coffees can go do so. So...
AUDIENCE SPEAKER: I'm Daniel, I am an Internet citizen. Have you considered the damage that you are doing by using our bodies, our communities of self‑regulation, to address this political issue? And do you just consider it collateral damage, or is the principle much more important than anything else?
ANDREW ALSTON: The answer, firstly, is yes, we did consider at length the potential ramifications of a policy like this. At this point, though, those ramifications are unsubstantiated and we welcome the debate for people to actually put those forward. We also looked at the bylaws of AfriNIC and whether or not the RIRs are truly apolitical, and there's a clause, 3.4.5, I believe, of the AfriNIC bylaws, which says that the RIR, in its mandate, should be lobbying for legislative change; that is taking a partisan position, that is not apolitical. That clause, interestingly enough, came straight out of, it was a copy and paste when the AfriNIC bylaws were created, straight out of the APNIC bylaws. And so, yes, we realise that there are potential ramifications, but I would say this to you: a government that shuts down the Internet and in that communications blackout uses that blackout to commit atrocities, people die, and if I have to trade that for the potential ramifications of creating a debate, I'm going to create the debate all day, every day, and I'm going to sleep soundly doing it.
AUDIENCE SPEAKER: Malcolm Hutty, concerned Internet citizen. Thank you for your passionate piece of advocacy. I have apparently quite a lot of sympathy for the concerns that you're seeking to raise, and I wish you the best of luck in stimulating the debate and shining a spotlight on what is clearly a real problem. However, in my opinion, your proposal stinks. And the specific reasons I would say, are that if we were to do what you did ‑‑ what you are asking, it will have ‑‑ it will only exacerbate the very problem that you are seeking to deal with. This community, the RIR communities are here to enable those that need Internet resources to have access to them so that they can use them. We are not here to take away Internet resources from people that are doing bad things we don't like. If we cross that line and start to become a body that takes away Internet resources from people we don't like, we will become very quickly just a tool for exactly what you are talking about, what you are highlighting. So, for that reason, we really should be very, very, very wary of going down that route. Even in your own terms.
Secondly, clearly you are asking this community, and the RIR communities generally, to set themselves up in direct one‑to‑one head‑on confrontation with governments, not on a matter of a particular policy, but simply a pure tussle for power as to who has more power to exert policy control. That's only something that we will lose if we set something like that up, and you would then get these resources under the direct control of the governments that are doing the kinds of things that you're complaining about, where it will only again exacerbate the problem, not mitigate it. So, I respect your concerns and I support what you're trying to do in terms of shining a spotlight on a real problem here, but do not do it this way. It's a terrible idea.
(Applause)
ANDREW ALSTON: So, if I can respond to that. Firstly, I respect your concerns and I understand the position that you have taken, but I look at it like this: so firstly, the argument that the Internet address space will simply end up under the control of governments. In 2006, the ITU stood at a meeting in Cairo and they said, let's put all the Internet space under the governments, we'll put it all into national Internet registries, and someone said, okay, great, how are you going to divide the resources between the countries? And the ITU said, we'll divide it up by country GDP. They made a mistake because they did that in front of 40 of the world's poorest countries. That idea went nowhere. We don't believe that that could actually happen. We understand that, yes, the policy is flawed, as we have said. We want the ideas, but at the same time to sit and say that we should do nothing because we don't have a better idea, let's face it, all the RIRs have released statements, so has ISOC, saying we condemn this, but shutdowns are now sitting last year at 116 separate shutdowns and rising. So, we can sit and go, somebody else's problem. But when is it going to stop? Where does the responsibility actually lie, and, if it's not us, do we sit and wait for somebody else to claim it and do nothing? Are we not being the Pontius Pilate of the modern day? Washing our hands?
CHAIR: We are officially at the end of time, so those of you who want to stay and keep discussing, we'll keep the mikes open for a little bit, but everyone else is free to enjoy some coffee.
AUDIENCE SPEAKER: Martin, Internet citizen also. I'm really thinking again and again of this very bad solution. So, think just two seconds about dictators, they exist. So, they consider publicly or not publicly, that Internet is their enemy. So with this solution you are really serving them, saying hey, so we will revoke your resources. Good, thank you! You want to apply for more space. Thank you. That's what exactly you are giving them. Think of those dictators and think again of your solution.
ANDREW ALSTON: One quick comment on that. One of the most ardent supporters around the AfriNIC list of this policy was the Zimbabwean regulator POTRAZ who came out in favour of this. Now, you are thinking why POTRAZ, why Zimbabwe who is run by Robert Mugabe, etc.? And the argument was simple. Yes, the government tells us to shut off the Internet. But up until now we have nothing to show them that this will actually cost them. This gives us a way to push back to the government, and the regulator itself, a regulator that has done shutdowns is disagreeing with what you say.
AUDIENCE SPEAKER: One last comment: There was a known debate in France about financial allowance to poor families for their children. Some of the recommendations were to cancel the financial allowance if the children behave badly. Think of that also. So, you are going to accelerate that objective. Full stop.
AUDIENCE SPEAKER: Emile Aben, RIPE NCC, with a question from a remote participant. This is Mark Elkins from Posix Systems asking: what about making this a global policy?
ANDREW ALSTON: That's an interesting one and it is something that we have thought about and said maybe we should have the ASO look at it. I, however, think that your views on this policy are also dictated by the socio‑economic status and by the situation you find yourselves in. So, I'm not sure that this is a one‑size‑fits‑all solution. In looking at what we faced in Africa, it probably is more applicable than in certain other areas. But it possibly could be modified. As I said, we welcome the debate. We welcome the ideas and we'll modify based on what we get. So it's not something we are opposed to. Let's hear the ideas.
AUDIENCE SPEAKER: Hi. My name is Steve Crocker, I am Chairman of the board of ICANN and I am here to help you.
I just want to speak to the one small point that you put in there that maybe you would formulate requests that ICANN should take out names from the root zone for the affected ccTLDs.
ANDREW ALSTON: Just a comment. That was an idea that was proposed to us. Not something that we have decided whether or not we support. Just need to add that.
AUDIENCE SPEAKER: While you are thinking about that, I want to add just two points of context. The idea of taking names out of the root as a mechanism for dealing with whatever problem is perceived has come up once in a while and even has been claimed to have happened, but it hasn't, there never has been an instance of that. But even taking the proposal on its face value, let me tell you from where I'm sitting how to proceed in order to make that happen. We have put an awful lot of work into a multistakeholder model precisely to prevent capture by one group or another, by governments or whatever, and so the process that we have is that there would have to be a policy development process, which would have to come presumably through the ccNSO, the Country Code Names Supporting Organisation, and it would also be subjected to review by the Generic Names Supporting Organisation, by the Governmental Advisory Committee, which includes representatives from various governments. So, if you want help initiating that process, we have a staff standing ready to help you. My estimate is somewhere between two and five years from the time that you initiate the process in order to get to a point where we can have a decision about whether to proceed or not. And I'm sure that that will be immediately responsive to the problem that you have. Thank you.
ANDREW ALSTON: Thank you.
CHAIR: I'm just going to say that I'm closing the lines, so, we'll get to everyone standing and then that's it.
AUDIENCE SPEAKER: Hi. Aaron Glenn, general Internet curmudgeon. I don't mean to be flippant, this is obviously a very important subject and I fully support doing something about it, but my question to you is, what exactly are you doing by revoking allocations or IP space other than updating a database, what is the end result? Do you hand out those IPs to other organisations? Do you expect DPS RIPE database because those IPs were removed. I don't see large tier ones that make money from these governments saying, oh, well, sorry, you are not in the RIPE database, you are not in the AfriNIC database, you are not in the RIR database, so we're not going to route your traffic any more.
ANDREW ALSTON: Yes, we expected that question. And the answer to it is this: We know that potentially revoking the space is not really feasible. It may have some effect on limiting reachability for people who are respecting bogons lists if the RIR updates them, etc. However, the proposal, as I said, was there to start a debate and create ideas that come to a final solution. We wanted the debate. That was one of the methods that we could have put in there. There are a number, to create the debate. The debate is now happening. I have got people here at the microphone into your coffee break, and this is great. But, we know that it's not perfect. We want to hear the ideas that are better than what we have got. And let's hear them and let's do something is the simple point.
AUDIENCE SPEAKER: So you are serious about it. Dmitry. I have a question. You said three strikes in ten years is bad, so the government killing the Internet once every five years is okay, right? I mean, actually, they can do this every three years; by the time the next government is out due to a coup or maybe another dictator, they say, you know what, we are a new government, we are not liable. So why not do this as three strikes in three years? I can put many other arguments against your proposal but I just ask you this.
ANDREW ALSTON: Again, I 100 percent agree with you. The three strikes in ten years rule was something that we actually put in there after a number of ‑‑ a bunch of feedback coming back saying you kind of need to give a warning shot first. If you look at the idea, however, that's come back about the financial penalties option, there is no three‑strikes rule. So, there are a number of options, and again we want the debate. We are looking for what those options are and what I'd say to everybody here is: get on the lists, e‑mail the authors your ideas, right. Let's find the right solution; the authors have already come out with a draft two that's about to be published based on feedback. We are not saying that our solution is the right solution. We're saying let's debate it until we have the right solution.
AUDIENCE SPEAKER: Well, good luck with that.
AUDIENCE SPEAKER: Leslie Daigle. I want to agree with others that I think this is ‑‑ certainly is a very important problem and it's a problem that needs to be solved. I might be accused of having penned some of the statements that ISOC made several years ago on earlier instances of outages, but I want to point out that everything that you said, as I understand it, is pivoting around the concept that if we do this we will stop the problem in its attribution. And I didn't see anything actually in what you presented that proved to me in any sense that that was even proved.
ANDREW ALSTON: It's an interesting one. Because when we originally proposed this, we thought, we'll create a debate within Africa and it probably won't go too much further and governments may well just laugh it off and ignore us. The reality is that's not what's happened. The Kenyan government ran a two‑page article in the newspapers saying, we'll be on the floor to protest this. We know that other governments have been in conversation about this. We are watching governments saying they'll be at the AfriNIC meeting. Now, that tells us something. We have hit a nerve. And they are thinking about it. And the question now is, next time they want to shut down, are those thoughts going to play in their mind? Again, as I said, we know this is not a perfect solution. But ‑‑ the debate has to happen.
LESLIE DAIGLE: Right. But the point about the solution not being perfect is, it's not even a solution, it's actually brinksmanship. It's excellent there will be some discussion in a venue about the problem, but the problem with pursuing this particular line of reasoning in these venues is ‑‑ the second point I wanted to make was, I thought your reference to past activities was a little too shorthanded, the reference to, in 2006, the ITU debating what happens if we have all of the Internet resources to distribute at our disposal. Well, many people in this room spent a large part of the next decade going to a bunch of ITU meetings in many parts of the world to explain in very short words wherever we could that this was exactly the kind of thing that didn't work for a number of reasons and why this open multistakeholder environment was created to handle resources in a way that was neutral, that helped people get on the Internet and, as somebody else said, didn't take people off the Internet, so having the debate in those terms is actually seriously undermining the credibility of this entire process, and that's why I'm saying, into my coffee break, I'm all over trying to solve the problem of the shutdowns. It's not these people. It's not this room. I'm happy to have the dialogue but this is a solution that fits no one.
(Applause)
AUDIENCE SPEAKER: One before ‑‑ I will try to be short. My name is Alain Durand, and I'm speaking in my own name, not on behalf of any employers, past, present or future. For full disclosure I work for ICANN right now. I'm very concerned about a policy like this one where we try to single out something and say we are against it, so we are going to do something about it and we are going to remove resources. What's the next one? It's a very slippery slope. There is going to be another community that does something we don't like and we are going to remove resources. For example, we don't like the government which doesn't have a democratic process, we are going to remove resources or we don't like a government that doesn't support gay and lesbians and transsexuals and we are going to remove resources or we don't like them because they have black hair or red hair or whatever hair or blue hair or... whatever reasons of the day that we don't like them and this is a really, really bad direction to take this community to. I'd rather like this community to focus on the mission which is to allocate addresses and maintain the accuracy of a database. That's the core mission. Policing the network is not the core mission. I think this is a bad direction.
ANDREW ALSTON: My final comment on this. Why this issue? Why did we pick this issue? The United Nations is trying to classify the Internet as a human right. I heard Microsoft on a session I was moderating saying that they are fighting for a Geneva‑style convention about the Internet as a human right. This is not just about what we like. It's not about black hair or red hair or whether or not I like the way your face is or what you have got to say. This is about what we believe is a fundamental human right. This is about whether or not people are being really seriously hurt. This is about the fact that if I look back in 1994, there was a communications blackout in one country, except they kept the state media on. The next hundred days a million people died, and the world didn't see it till years afterwards, until it was too late. Yes, we can say not our problem. Yes, we can wash our hands. I can't do that any more. We know that the solution is not perfect. We want to find the right solution. But to say that we can't have the debate. We stand here promoting freedom of speech. That's what the Internet is all about, that's what this community is about. To say that an idea is too dangerous to debate is to stand against the freedom of speech which we have all stood for. That cannot be right. It is a contradiction in terms. That's why we have an open multistakeholder debate. That's why anybody can propose the policies, because we respect that freedom of speech. So, if an idea is too dangerous to debate, how does that gel with the idea of freedom of speech and the ability to promote those ideas? But like I said, thanks for hearing us out and I look forward to the debate. I look forward to the comments, and we'll see where we go from here.
Thanks very much.
(Applause)
CHAIR: So thank you very much everybody. Please remember to rate the talks, rate the Plenaries. Rate the lightning talks. It's all on the website.
If you are interested in hearing more talks like this or less talks like this, please submit your application to be on the Programme Committee to help decide on what comes up on stage.
So that's the end of this Plenary. The Task Force starts in about 10 or 15 minutes or so. So if you are interested in BCOP, which hopefully you are, get some coffee and come on back and if you are not, enjoy.
(Applause)
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.