Connect Working Group
Wednesday 10 May 2017
At 11 a.m.:
REMCO VAN MOOK: Good morning everyone. If you can all go find your seats or some seat or someone's lap or go outside or sit down on the floor or something, we are going to get started in about two minutes' time. Let's give this a start. Good morning everyone. If you are in this room because you are interested in Address Policy, first of all you have my sympathy; second, that is next door. So it pleases me to no end that we have managed to pull our loyalty cards and get Connect upgraded to the big room this time, so you are welcome. We have a full agenda again today. I am one of the co-chairs, Florence is my dear co-chair. We have a full agenda. This is the introduction, clearly. You have all obviously read the minutes that were sent out to the list a long time ago and you have carefully read them and you have no comments. Is that correct? Am I correct in assuming there were no suggestions or changes? Thank you. So, thank you very much to the scribe for compiling them and with this I would like to approve those minutes. The scribe for this time is the lovely Shelby, writing down all of the nonsense that we spout on this stage and trying to make sense of it, so good luck with that.
Agenda: So this is the agenda. Anyone have any comments, suggestions, changes, stuff that should be on there, should not be on there? Going once, going twice. So this is the set agenda. And with that, I would like to invite up the first two presenters, for the IXP Tools Code Sprint and Eyeball-Jedi.
VESNA MANOJLOVIC: Good morning everyone. It's an honour to be the first one in the Connect Working Group. My name is Vesna, I am from the RIPE NCC, I am a community builder and I am here to very briefly present results from one of the events that we have organised. So, as a part of our community building at the RIPE NCC we organise hackathons, because they are a great opportunity to bring together different parts of the RIPE community and to expand even to larger communities than RIPE itself, so we like to bring together operators, researchers, students, developers, designers, put them in the same room for two days, feed them -- and hope that they will come up with some genius results. We have been doing this for three years already, and this is just a reminder of where you can find all the results from all of our hackathons. The important part is that the code is contributed back to the community: you can find all of the results on GitHub, you can use them, you can modify them and contribute them back.
So, at the previous RIPE meeting, just before the RIPE meeting in the weekend, we had a hackathon about IXP tools, and a lot of those results were presented at the RIPE meeting itself; it was a large event. One of the challenges that we have experienced over these several years is that, however intensive this work is, it ends very quickly after the hackathon is over, so we wanted to introduce some continuation. One of the ways that we went about that is that we invited one of the participants from the hackathon to be an intern at the RIPE NCC, so he is going to talk after me about his work as a part of that internship. The other way to continue the work from the hackathon was to invite the winner, although we don't really do winners, but to invite one person to join us for the next event, and the next event was a smaller one called the Code Sprint. So we had Petros and Vasileios, and we invited some more people: two intensive days of coding and only a smaller number of projects, we were working on two concrete projects, we published the results and again, there was a lot of brainstorming. We were also very proud to show off our new offices in Amsterdam and of course, everything was powered by...
So, what were they working on? Petros was working on modifying the IXP Country Jedi, which is a tool that uses RIPE Atlas traceroutes, does a mesh between all the probes per country and tests whether these paths stay within a country or not, and whether they go through a local Internet Exchange or not. We wanted to have different views of this information, and one of the ways that we achieved that is to show the view per network, so you can now go to this interface and drill down and find your own network and see who your peers are, whether the paths between you and these peers go outside of the country, and if you do not have direct peering, what the networks between you are, what the traceroutes look like and so on. So this was one of the projects. And the other project was a continuation of detecting remote peerings, so the small team that was working on this said, we want to make this useful for third parties, for other organisations, and so they developed different API calls that enable everybody else to build on this research and use the data that is processed by us, to build this into their own dashboards and look up the information that they need. So you can now find views there about a network, about a specific IXP and about the metrics that determine if a peering is remote or if it is local peering, based on the RIPE Atlas data.
So these were those two projects. And also, the people who came by said: oh yeah, and we also continued the work of the hackathon in our own time and we actually implemented one of these projects and put it into production, so now you can see the so-called Bird's Eye looking glass implemented at this URL, and this is what it looks like. So, it's like a micro service for querying BIRD.
So, that is it. I am trying to keep it short because the agenda is quite full and there are many more interesting projects to be presented. We are going to go on with the hackathons: this year we will have at least another one and we are planning to continue next year, so if you are interested, come talk to me. If you are a coder, just apply and join us; if you are an operator and you want to be an expert or propose some projects that are interesting for you, you can also do that; you can sponsor us; and if you are a developer yourself, go and grab the software, modify it and use it for your own organisation and let us know, because we would like to feature that on RIPE Labs and to show what the consequences of us having these hackathons are, and we are always looking for more success stories. Thank you.
(Applause)
Petros Gkigkis: I am pursuing my university degree at the University of Crete and currently doing an internship at the RIPE NCC, and I am going to talk to you about a new prototype tool called Eyeball-Jedi, where we try to see the RIPE Atlas coverage in eyeball networks. Probes can be moved around and placed anywhere in a network, while users typically access the Internet from a limited physical area, typically at their site of work.
So, if you want to study end users, you have to study the networks that carry the connections between the users. So, the main motivation for this tool is to discover the RIPE Atlas coverage in the networks with the largest user populations and how these networks interconnect with each other, but also to identify the differences between countries and some kind of correlation between countries in the same region or based on the country size.
At a previous RIPE hackathon the IXP Country Jedi was presented. The IXP Country Jedi selects all the ASNs that have at least one probe and does a full mesh measurement, and here we can see the AS-to-AS matrix: rows and columns are source and destination ASNs, and the colours of the matrix are determined by whether the traceroute path goes out of the country or stays inside the country, and whether it crosses an IXP or not.
But today I am going to talk about the eyeball networks; with the term eyeball networks we refer to the user-facing networks with the largest user populations. APNIC provides user estimates per ASN. We use two different thresholds to select the eyeball networks for every country: the first threshold is that we select only the ASNs that cover at least one percent of the total users inside that country, and for the second one we use a cumulative fraction of 95% of the Internet users in that country. We have an average coverage of 90.5% per country, but we have some outliers like Russia, which is an outlier due to a very highly diverse eyeball ecosystem. To better understand the connections between eyeball networks we created a structured matrix called the eyeball-to-eyeball matrix. Here you can see the eyeball networks for Canada, and again rows and columns are source and destination ASNs respectively. We size the boxes according to the APNIC user estimate per ASN and we colour the boxes based on the traceroute path: if it stays inside the country we colour it green, and if not, orange. You can see there are some boxes that are coloured grey; grey indicates that in that eyeball network there is no RIPE Atlas coverage, so please, if this is your network, please apply for a probe.
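As a rough illustration of the two-threshold selection just described, here is a minimal Python sketch; the input format for the per-ASN user estimates is an assumption, not the actual APNIC data layout, and the exact ordering of the two thresholds follows my reading of the description above.

```python
# Minimal sketch of the eyeball-network selection described above.
# The (asn, estimated_users) input format is an assumption; APNIC's real
# per-ASN user estimates would need to be fetched and reshaped first.

def select_eyeball_asns(user_estimates, min_share=0.01, cumulative_cap=0.95):
    """Return the ASNs forming the 'eyeball' set for one country."""
    total_users = sum(users for _, users in user_estimates)
    if total_users == 0:
        return []

    # Threshold 1: keep only ASNs serving at least 1% of the country's users.
    candidates = [(asn, users) for asn, users in user_estimates
                  if users / total_users >= min_share]

    # Threshold 2: walk the candidates from largest to smallest and stop
    # once roughly 95% of the country's users are covered.
    candidates.sort(key=lambda item: item[1], reverse=True)
    selected, covered = [], 0.0
    for asn, users in candidates:
        if covered >= cumulative_cap:
            break
        selected.append(asn)
        covered += users / total_users
    return selected

# Toy example with made-up numbers:
print(select_eyeball_asns([(64500, 5_000_000), (64501, 3_000_000),
                           (64502, 40_000), (64503, 1_500_000)]))
```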
Next, for Hungary we can see that 63% of the traceroute paths between eyeball networks stay inside the country. As I said before, Russia is an outlier, and we can see how much of the eyeball ecosystem we do not measure due to the threshold we use of at least 1% of the total population. And this is a world map of the RIPE Atlas coverage of eyeball networks: for every country we take all the eyeball networks from the APNIC data set, and if there is at least one active probe there, we say that we cover that network. You can see that a large part of Africa is grey, so we don't have coverage there. And there is also the case of Russia, which is yellow, where we have low coverage because of the first threshold that I mentioned.
The limitations of our work are that we measured traffic paths and not traffic volume. Also, RIPE Atlas vantage points are a bit of a biased sample of connectivity in a country. We looked at geolocation accuracy, because some databases that provide geolocation services are not accurate enough to geolocate routers. And we don't have ground truth validation for the APNIC estimates.
Our future work is to investigate more sophisticated probe selection strategies to capture all the possible diversity of ASNs, and also to identify intermediate ASNs between eyeball networks. Moreover, we want to extend the study from eyeball networks towards eyeball networks in other enabled countries and, finally, study eyeball networks towards popular CDNs.
The Eyeball-Jedi tool is publicly available at this link and you can check your country or network.
And some actions that we want to ask you to take: please check the first link, and if you are an ambassador or a network operator or a user of a network that RIPE Atlas doesn't cover, please apply for a probe. We want your feedback to tell us what we should focus on, and if you have some kind of data set about users per ASN, please make it freely available to the community.
So, thank you, if you have any questions, I will be here for the whole RIPE meeting.
REMCO VAN MOOK: All right. Thank you.
(Applause)
Any questions for Petros or Vesna who has already gone back into hiding? No. All right. Thank you very much.
(Applause)
Then next up is Bijal. You are awake, excellent.
BIJAL SANGHANI: Hello everybody. It's good to be back here at the RIPE meeting and the Connect Working Group. So, I am going to talk about the IXP database and tools today. So what do we do at Euro-IX? We have two forums a year and maintain and develop the website, a database and tools; we do an annual report on IXPs; we have different programmes to help support IXPs, and we do that via a fellowship and a Mentor-IX programme where an existing IXP can help out a new one with any issues they may be having, whether that be commercial or technical or just somebody to talk to -- it's a good programme for new IXPs. And we have a benchmarking club.
We have a mailing list with lots of different Working Groups, and we have a newsletter -- if you are interested to hear news about IXPs, please join, you can subscribe there -- and of course we are on social media, the usual places, so if you want to find us, follow us or like us on Facebook.
We are an association, a membership association of IXPs and currently we have 81 affiliated Internet Exchange points, while it says euro in the ‑‑ in our name, we are actually a little bit more than Europe, I can say global almost, we have members from Brazil, Japan, India, Africa, so we do cover all the continents as well.
We have had two new members already in 2017, and that is AAS-IX, which is in Casablanca, and the Beirut IX, who have just joined.
We have 13 patrons; we don't have a sponsorship model but patrons, and they are, if you like, organisations related in the industry to IXPs that have an interest in meeting with the IXPs, so they can join the membership as patrons, and this is our current list.
So, on to what I really want to talk about, which is the IXP database. I mentioned the website earlier, and what we have done in the last year is make some major improvements. You may have seen the old website during presentations that I have done in the past, but we have made some major improvements to the website and this is what the front page looks like now. The top five improvements include ASN automation, so IXPs can now actually send their ASN information automatically; we have a switch database which now contains a lot more information about the switches, and the route servers as well, that the Internet Exchange points are using, and this goes down to the model, the type and the software. We have found this really useful for Internet Exchange points that may be having problems with a certain version of code they are using: they can just quickly look up on the database and see if there is anyone else who is using the same code whom they could approach and speak to.
It may sound trivial, but users are now able to edit their own profiles, and we have the peering matrix, the service matrix and the ASN database all working and kept updated.
So, we changed the structure of the way that we actually collect the data on the exchanges, and what we did is put in an organisation level, because what was happening before was that all IXPs were just listed, so whether an organisation was running one IX or two or three, it was all listed separately. So what we have done is create an organisation profile, and within the organisation profile you then have the IXPs listed below that. For example, here you can see AMS-IX's organisation profile and under IXPs you can see all the different IXPs that they manage. And this is the front page, and here again you can see a kind of snapshot of what the IXP offers and the traffic graph as well, and this is there for each and every IXP in the database.
So I mentioned our ASN database; we had a little bit of a glitch where it wasn't collecting all the data, and this is now fully functional. IXPs are now updating their data, whether that is automatically, or manually for those that can't do it automatically yet, and again we are trying to push more automation in this area.
Again, in the IXP database we have various different statistics, like how many participants there are. We have broken it down into the different regions, so Euro-IX covers Europe, APIX covers Asia, LAC-IX covers Latin America and Af-IX covers Africa, and we have a global view, and you can see the unique ASNs in each of those regions as well.
The IXP database, so this is the most common IXs; this shows the number of IXs that a particular ASN is connected to, so not only does it say that this organisation is connected to 80 IXPs, in the drop-down list you can see which IXPs they are connected to. Of course this can be really useful if you are looking to join an exchange, or a particular exchange where you are trying to find certain networks: you can use this tool and see where they are, and that would be a good way to figure out which IX you want to join.
The switch database ‑ again this is just a quick look at the information that we collect from the IXPs on their switches and their route servers. We have a service matrix and again, this is a really, really good tool if you are a peering coordinator and if you are looking for a lot ‑‑ if you are looking at a large number of Internet Exchange points and you just want to have a real snapshot of what services they provide, you can do this here and you can see the number of ASNs connected, whether they are doing IPv6, whether they support Jumbo Frames, whether they have re‑seller programmes, it's quite a big matrix and I couldn't get it all on to the page but feel free to go and have a look there for information about the exchanges.
And we have a peering matrix and here you can see, you can compare and see which ASNs are ‑‑ how many ‑‑ how many ASNs are connected to the exchanges and then compare them to different exchanges as well. So all these, the blue numbers here are clickable and you can see a lot of information from that.
We have improved the search tool, so now not only can you do a search for an IXP or an ASN or an organisation, but you can also use the advanced filters and say, well, I want to see which ASNs are connected at this exchange point but not at that exchange point, or are connected at this exchange point and at that exchange point. So if you are looking for unique ASNs that you may want to connect to, this is a great tool to do that, so you can see where your network wants to go, especially if you are looking for particular networks to connect to.
So, IXP member lists. We started doing some work a couple of years ago and wanted to make these member lists available for peering coordinators. Since I have got some time I can tell you a bit more about the story: a couple of years ago now we actually had a hackathon looking at the IX-F database and also at PeeringDB, and we looked at the data and the Internet Exchange points that were present in both, and we did a comparison to see which exchanges were still alive, which were inactive, and which ones we couldn't find any contacts for. So what we did was grey out all the ones we couldn't find contacts for or that were not active any more, to do a bit of a sync. With that, we found that there were a lot of things we could do with this data, and something that would be useful as a starting point, to try and encourage networks and IXPs to do a bit more automation, was if they could easily get access to the IXP member list. So, in the couple of slides before, you would have seen that we collect ASN information, and obviously this is critical to that. So what we decided to do was create an IXP member list in a JSON format. At the moment what this contains -- and there is a lot of development work that we are continually doing on this -- is both IXP data and the IXP participant data, so you can find information about the IXPs: this includes the member list, locations, the switches, the route servers, and like I said, there is more development work on this, so there are more items we are going to be adding to that.
The good thing with that is that it's open, consistent and, because of the way that it's designed, easy to scale and grow. We currently have 24 independent IXP implementations and you can find the list of those on GitHub. But as well as collecting this through automation, we also have data available which is entered manually by the IXs through the websites of the IXPAs. So, for example, at the moment Euro-IX members and IXPs can enter their data manually through their IXP profiles and that information is also pushed into the API for everyone to use.
So, there is an open source implementation in IXP Manager; for those of you who don't know what that is, it is a tool for IXPs to manage their services, their network and lots of other good things as well. So, for any IXPs already using IXP Manager, that is a built-in facility, which just means that the data we are collecting is going to be more up to date. More information about the member list can be found on the IX-F website, at ml.ix-f.net, and the source is all available on GitHub.
So, you could sit there and think, well, why would I want to use this data? I can get this from the IXP website, I can log on to hundreds of different IXP websites and collect this data by myself. What the API gives you is this: it's one API and it includes many IXs, and I think the key thing here is that you get all the information in a standard format, the data is fed directly from the IXPs, and the IXPs own the data, so therefore they have the most accurate data.
And like I said it's portable, and scaleable.
And here is a quick use case from Andy, who is unfortunately not here at the RIPE meeting, but I want to thank him for providing this example. Using the API: who am I not peering with at LONAP? You have your script, and what you need is a complete list of peers to compare the differences, and that is where the API comes in.
So, this is the API, and this is your information on the IXPs: you put in the existing adjacencies and it throws out a list of all the networks at that IXP that you could peer with. This is a very simple and small example, but you can see that there is a lot of potential for this to grow and become even more useful.
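As a hedged sketch of that use case, the few lines of Python below fetch an IX-F style member list and diff it against the ASNs you already peer with. The URL is a placeholder and the field names ("member_list", "asnum") follow my reading of the IX-F member export schema, so treat both as assumptions and check ml.ix-f.net for the real endpoints and schema.

```python
# Sketch: "who am I not peering with at this IXP?" using an IX-F member list.
# The URL is a placeholder; "member_list" and "asnum" are assumed field names.
import json
import urllib.request

MEMBER_LIST_URL = "https://example.net/ixf/member-list.json"  # placeholder
my_existing_peers = {64496, 64497}  # ASNs you already have sessions with

with urllib.request.urlopen(MEMBER_LIST_URL) as response:
    export = json.load(response)

members_at_ixp = {member["asnum"] for member in export.get("member_list", [])}
not_yet_peering = sorted(members_at_ixp - my_existing_peers)

print("Potential new peers:", not_yet_peering)
```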
So, we are always in search of accurate information, and this is key, especially as we move more into the world of automation. You don't want your router scripts picking up IP addresses or AS numbers with typos. What this means is that networks can go to different places to get their information and then compare if they need to, and this increases the accuracy. There are tools and a portal available on the Euro-IX website, and we are looking into development for the other IXPAs, so APIX, LAC-IX and Af-IX will have their own interfaces in which they can locally add information. One of the reasons is that in South America they like to do things in Spanish or Portuguese, and this allows them to have an interface in their own language, which of course is a nice thing to have as well.
And also because we have the structure of the federation, the IXPs all have regional ‑‑ a regional group so when we have the different various meetings we are all encouraging kind of like our own group of IXs to keep their data updated.
And the database and services are complementary to other databases out there, and also to the RIRs and the NIRs and PeeringDB, of course.
So, we are in search of funding. We have got to a pretty good point, as you have seen from the example, and we need funds for continued development. This includes an interface to the database for all the IXPAs that I mentioned; we want to be able to fetch data from a central database, and from other external databases, so we can do validation and check data; we want to enhance the data; and of course outreach. So, if you think that this is an interesting project and it's something that your organisation would like to support, then please come and speak to me later.
There is, if you are interested in IXPs, there is a global IXP discussion mailing list and you can subscribe to it there.
So while we celebrate 25 years of RIPE this year, last year we celebrated 15 years of Euro-IX. So the little baby that came out of the EIX Working Group turned 15, and we decided to have a bit of fun: during our Monday dinner, Serge came to the meeting and both of us together presented the forum Oscars, the F Oscars. One of the reasons was to acknowledge some of the work that is done within the community that isn't otherwise really noted, so again I want to say thanks to AMS-IX for their secretarial services for twelve years, INEX for their work on IXP Manager, NIX.CZ for their continued work on BIRD, and LINX's Malcolm for his regulatory affairs contribution. Malcolm regularly attends the forums and gives us updates on what is going on.
While we celebrate 15 years, we thought, let's take a look back, and this photograph was taken in 2001; that was actually the first ever Euro-IX board. What was really good was that during the meeting in Krakow they were all there again, so we took the opportunity to take a similar picture, and you can see that nobody has aged at all and they all still look completely the same. But it's just to show that this community kind of sticks together and goes on. Lastly, we have a video, so if your family members or your friends ask, what do you do and how does the Internet work and why are you travelling all the time, show them this video. It's five minutes long and it's excellent and gives a really quick view of how the Internet works, and I know for my family at least they have a better understanding of what I do now.
The video is available in various different languages, I am looking to expand that so if your language isn't there and you are interested in translating this video then also please contact me. That is all. I think I made it right on time.
(Applause)
FLORENCE LAVROFF: Thank you, Bijal. Does anybody have any questions? Give it a moment. I don't think so. So, thank you, Bijal, and we will move to the next item on our agenda: we now have a presentation from Pier Carlo Chiodi. Unfortunately he cannot be here at the moment, so we have a remote video. I have a couple of words to introduce him: he works as a system and network administrator for an IXP, and his research and development interests include Internet measurements, network data analysis and networking-oriented open source software. And we are going to start now. Thank you.
PIER CARLO CHIODI: I am going to talk to you about a tool that I am currently working on, ARouteServer. We know route servers deployed at Internet Exchanges: you connect to them, you announce your routes to them and they redistribute your routes to all the other members that are connected as well. Sometimes they perform validation and filtering, sometimes they offer functions that allow you to control how these routes are propagated, sometimes they are completely transparent. ARouteServer, my tool, is an ambitious project that I am developing in my free time. Its goal is to hopefully help Internet Exchanges to build automatic and feature-rich configurations for route servers. It is a command line tool written in Python; it's open source software and of course everyone is welcome and encouraged to contribute to it. Currently it supports BIRD and OpenBGPd. The tool also includes a framework to perform validation of the configurations it builds; this is done by running live tests on machines, and we will see more about this later.
Let's see now how it works. Here it is. This is a general picture of how it works. Let's go on with the next slide. At first, the general policy and the list of route server clients are configured in YAML files. The general policy outlines the filters, options and features that the final configuration will contain; an example is provided in the box on the left side of the slide. The client list contains the set of members that are connected to the route server and their details: AS number, IP addresses, AS macros and optional client-specific settings. Some external data sources can be used to enrich the final configuration: IRR databases to gather information about prefixes and AS numbers, PeeringDB can be used to get the prefix limits, and Euro-IX JSON files can be used to automatically build the list of clients.
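To make the client-list idea concrete, here is an illustrative Python sketch that builds an ARouteServer-style clients file from an IX-F member export. ARouteServer ships its own importer for exactly this, so in practice you would use the tool itself; the "clients"/"asn"/"ip" keys and the IX-F field names below are assumptions on my part, so check the ARouteServer documentation and the IX-F schema rather than relying on this sketch.

```python
# Illustrative only: derive a clients list from an IX-F style member export.
# Key names are assumptions; the real schemas live in the ARouteServer docs
# and the IX-F member export specification.
import json

import yaml  # pip install pyyaml

with open("ixf_member_export.json") as handle:
    export = json.load(handle)

clients = []
for member in export.get("member_list", []):
    for connection in member.get("connection_list", []):
        for vlan in connection.get("vlan_list", []):
            for family in ("ipv4", "ipv6"):
                address = vlan.get(family, {}).get("address")
                if address:
                    clients.append({"asn": member["asnum"], "ip": address})

with open("clients.yml", "w") as handle:
    yaml.safe_dump({"clients": clients}, handle, default_flow_style=False)
```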
In the end, the route server configuration is generated. Here ARouteServer stops: it's up to the Internet Exchange staff to build a script that automates testing and deployment to the production route server. Now, let's talk about policies and client configuration. As I already said, options, features, filters and so on can be set on a global scope using the general definitions. The clients inherit these options, but they can also be configured with client-specific settings that, in that case, override the global options. This allows fine control over the configuration of each member and also handling of exceptions or corner cases. The client list can be automatically generated starting from Euro-IX JSON files such as those exported by the popular IXP Manager. Moreover, every Internet Exchange is different from the others and may want to implement custom behaviours in its route server, so I have planned some hooks that can be used to inject custom bits of configuration into the main configuration built by ARouteServer. This allows both keeping the features offered by the tool and implementing custom functions or behaviours. Let's see now a brief overview of some of the features included in the configurations built using this tool; the full list of features is reported on the documentation site.
Well, next-hop enforcement, to prevent traffic diversion: it can be configured in strict mode, that is, the next-hop of a route received by the route server must coincide with the IP address of the neighbour that announces it, or it can be configured to allow the next-hop attribute to be any address among those used by the announcing autonomous system to connect to the exchange's switches. This is useful when members have multiple connections to the Internet Exchange itself. And AS path sanitisation: filters are configured to be sure the left-most AS number in the path is the one of the member that is announcing the route, to avoid routes with invalid AS numbers in the path, and also to avoid, let's call them, transit-free AS numbers in any position different from the first one.
RPKI validation can be enabled too, and also IRR database-based filters. Of course most of these features are optional and can be enabled or disabled by operators as they want.
Now, here is a list of some functions offered to clients. Blackhole filtering, with the optional rewriting of the next-hop to allow Layer 2 level filtering on the exchange's switch ports; and route propagation control, implemented using BGP communities, to allow clients to decide how the routes they announce to the route server should be propagated to other clients -- things like announce or do not announce the route to anyone or to some specific client, and the same for prepending and so on.
As I said, ARouteServer includes a framework to test the configurations it builds. In short, Docker containers and KVM virtual machines are used to run instances of BIRD or OpenBGPd, and a sort of API is exposed by ARouteServer to write test cases that interact with those instances and verify that the expected results are met. For example, in a configuration where blackhole filtering is enabled, the instance of a route server client is used to announce some tagged prefixes for which traffic should be dropped, and the instances that represent other clients are queried to ensure they receive those prefixes with the expected blackhole next-hop address.
The snippet of Python code reported in this slide is what is needed to run this test case.
Some built-in scenarios are included within the project and have been used during the development of the tool. Of course, more custom scenarios can be written by users to validate their own configurations. A hopefully comprehensive guide is provided within the documentation.
So, the tool is in a beta stage and any feedback, suggestion or review is very welcome. The source code is on GitHub, where you can also find some examples of BIRD and OpenBGPd configurations built using this tool. I hope someone will find it useful, and please let me know if you want to test it out and need any help. That is all, many thanks for your attention. Any questions?
(Applause)
FLORENCE LAVROFF: So, if you have any questions, feel free to let us know and we are going to set up a Skype session with him remotely, since he is not with us right now. Any questions, anybody? If not, we just jump to the next topic. Okay. All right. Let's move on. Thanks a lot and talk to you next time. The next topic is a presentation from Christian Urricariet about the latest trends in data centres; it's an update of a presentation that we already had a couple of years ago at another RIPE meeting. So Christian, thanks for being with us. Let's go.
CHRISTIAN URRICARIET: Good morning. Great to be back here at the Connect Working Group. So, I work for Finisar, for those of you not familiar with Finisar, we are a US‑based company. We are the largest supplier of optical transceivers, worldwide, based in US, again, present in multiple places around the world. We mainly sell directly to data centres and carriers as well.
So, I want to talk about the status of optics in data centres, about 100 gig deployments, and then a few words at the end about 200 and 400, to give you an update. After many years of 100G being deployed in the routing space, it is now getting into the data centre; there are very high volumes of optics being installed, starting with the hyperscalers in the last year or maybe even early this year. That delay was caused by multiple reasons: the optics pricing hadn't gone down in many years. This was partially due to the fragmentation in the industry -- some of you may remember CFP2 and CFP4 -- and also the different PMDs and optical interfaces that were required by many different end users caused this fragmentation and didn't allow the industry to invest in manufacturing capacity and cost reduction based on volume.
The deployment however has begun; we anticipate several -- they will be deployed in the next two to three years and have a very long tail, so those deployments will continue for quite a long time.
The demand is very high; some of you may be suffering long lead times or non-availability of 100G optics. Optical suppliers such as ourselves are really stepping up to the challenge of increasing capacity in order to support the huge ramp that we are seeing right now, but lead times continue to be very long and will continue to be for quite some time. We mentioned the proliferation of different optical interfaces; there are still many different options out there, and they serve very different applications, some of which are shown there on the right. So if you combine 100 gigabit Ethernet, 128G Fibre Channel as well as some of the ITU codes, you have this huge amount of optical interfaces, and that proliferation has impacted interoperability, the availability of multiple sources and, again, cost reduction based on volume. We are seeing some trend away from that, seeing consolidation of demand in some of the codes like CWDM4, so some of those will be easier to achieve.
The modules out there today that are being deployed include 25G SFP28 for next generation servers as well as QSFP28 for switches, and 100G Ethernet optical transceivers. There are some active optical cables as well that have the cable together with two transceivers; those are used in some environments for point-to-point applications that do not go through structured cabling, and they are also used for break-outs, in some cases in a 4-to-1 configuration into the server.
Many different optical interface types are still available out there, including multi mode and single mode, as well as parallel and duplex cabling. As you can see, in black are the LR4 and the SR4, which are the only IEEE standardised interfaces; there are many others, shown in red, based on MSAs or proprietary solutions in some cases, that tend to support specific needs for specific applications, for example extending the reach of multi mode to 200 and 300 metres, or extending the reach of single mode to 20 and 40 kilometres for data centre interconnect in many cases, as well as the one on the top right, which is a duplex multi mode version for 100G that enables the use of legacy duplex fibre, and I will talk more about that coming up. But again, that proliferation continues, driven by many different applications. If you look at a typical data centre -- and it's hard to generalise because of course different data centres have different needs and philosophies -- but if you try to generalise, more or less what we are seeing is that the transition from 10G and 40G to 25G and 100G is happening in the way I show there: primarily using multi mode optics in the rack in connections to servers; when we go into spine switches, we see a combination of multi mode and single mode codes, most notably CWDM4, which is being installed now in very high volumes by some of the major hyperscale data centre players worldwide; and the connections to core switches and routers continue to be primarily LR4 and CWDM4. These different codes will transition, and I will touch on this at the end, into higher speeds, essentially going to 50G into the server as well as 200 and 400G into the switch, although we are a ways off from that and I will talk about that at the end of the talk.
So what are some of the other applications that are out there? In addition to the regular codes that I just mentioned, there are some specific requirements coming, for example, from hyperscale players. Facebook is one that has publicly announced they had a need for a very cost-effective 100G QSFP28, adapted to their specific, well controlled infrastructure conditions, so they came out with a specification where they took the CWDM4 and made a light version of it: they went from two kilometres to 500 metres, and they have also decreased the case temperature range from 0 to 70 C to 15 to 55 C, to create an optimised, lower cost solution. They have shared that specification, they call it CWDM4-OCP, and it is out there on GitHub, publicly available. So this is an example of how a specific user has adapted existing technologies, and now this can be procured by anyone and used by anyone.
Another example of how the market has reacted to specific needs comes from smaller data centres, non-hyperscale, maybe some enterprise users, that are looking to upgrade their existing network to 40G and 100G but want to use their existing fibre infrastructure: they don't want to change to MPO but keep duplex optics. So the industry has responded with a technology called SWDM, which enables that type of application, essentially muxing four different optical wavelengths into a single fibre for transmit and one for receive. That is a multi-source solution backed by several systems as well as optical vendors, and it is already available now in the market; it's an example of how the industry has gone beyond the standards to solve specific problems for 100G deployments.
Let's talk about outside the data centre now, how to interconnect data centres for DCI, Data Centre Interconnect. When those rates went from 10G to 100G and 200G, it was necessary to move from basic NRZ modulation to coherent transmission, using coherent receivers in particular, to support 100 and 200G over 80 kilometres. So several systems support this solution with this type of boxes, which many of you may be familiar with, using coherent modules. There is even actually a white box version of this: pizza boxes were announced by one of the ODMs, so the concept of open networking, of white box open hardware, is now going into DCI as well, again using coherent optics, pluggable optics, on the line side. Optical vendors are not stopping there; we are investing already in technologies to enable higher speeds for DCI, things like 400G or 600G coherent technology for the next generation are in the works and being developed as we speak.
So, let's talk about what is coming next after 100G. As you all know, the need for bandwidth is not stopping: how do we go from 3.2 to 6.4 terabit, or 12.8? So standardisation has begun for these rates that would enable those boxes. The industry is going to a four-level type of modulation called PAM4, and vendors are already investing in those modules, in those technologies to enable it; modules like the QS -- these are module types you will be hearing about in the future that are starting to support these higher rates. This requires new technologies in optical components, more advanced VCSELs, as well as a new generation of ICs to support PAM4, and manufacturing infrastructure, so the testing and the manufacturing itself, which is new for the industry. But all that work has begun already.
So what we are talking about is shown there: essentially going from the second line, a 3.2 terabit box, to a 200G or 400G 32-port switch. The market, we envision, will still be fragmented: there are some large users of bandwidth that will go to 400G directly, and some others will stop at 200G, either because they don't have the need for bandwidth, or because they want the lower cost of 200G, or because they have less tolerance for risk -- 400G availability will take some time and it is plagued by technical risk, so if you want low risk you go to 200G as an intermediate step.
So, I am not going to linger too much on the standards unless you have questions, but there is standards work on 50G to the server, 200G to the switch and next generation 100G that is ongoing in the IEEE, covering multi mode and single mode. They are all based on PAM4 modulation. The 400G work began earlier and is focusing initially on enabling first generation 400G, which is routers and DWDM clients. There are discussions already going to four wavelengths based on 100G PAM4 technology, and CFP8 is a module type you will hear about becoming available in the next few years.
My final slide: 100G has a long life and finally, after many years of being available, provides the right solution at the right price. There are many different variants, each supporting different applications, and technologies for 200G and 400G are already being developed.
So I appreciate your attention, I will be outside actually if you have some more questions on optics, we have a table there during the break.
REMCO VAN MOOK: Thank you very much.
(Applause)
Any questions for Christian?
NICK HILLIARD: Could you give us some insight into the supply chain problems that are causing the shortages of 100 G transceivers?
CHRISTIAN URRICARIET: Sure. The current shortages in the industry have to do with single mode, and they have to do specifically with manufacturing capacity. Frankly, many of the systems end users did not forecast correctly the optics they needed; in some cases they didn't foresee the demand, in some other cases they were counting on technologies that were promised by many start-ups, and the availability of those products, like PSM4 for example, was not what they had thought, so they had to go back to CWDM4. We saw this huge upsurge; it's an industry problem, not a Finisar problem, that we are trying to work through. It is very hard: investing in capacity is possible but it takes time. We are talking about many hundreds of thousands of modules a year that have to be manufactured, and we are stepping up to the challenge -- Finisar and others are investing -- but it will take time. It's not a specific technical problem or a fundamental technology problem with a component; it's pure and simple manufacturing capacity.
REMCO VAN MOOK: We don't want to run too much out of time.
AUDIENCE SPEAKER: Alex from X Cloud Networks. I am curious about break-out cables on single mode; it seems they are not available. What is the reason, and is there any chance to see them in the future?
CHRISTIAN URRICARIET: They are not as common, either using a transceiver and separate patch cords -- or an active optical cable; multi mode is more common. Certainly in multi mode you achieve optimal cost: if you have a very short reach, a break-out cable is the cheapest way you can interconnect in a break-out configuration. You are correct that there are very few suppliers out there for both 4x10 single mode and also 4x25. In the case of 4x25 it comes down to the issue that I just mentioned with PSM4: even though there was a lot of promise by many vendors, in reality today very few of them are shipping product, due to some technical challenges. In the case of 4x10, it's essentially a market that is smaller than multi mode, and that is probably why you are seeing a lower number of suppliers.
AUDIENCE SPEAKER: Tom Hill. I was actually quite interested to see the SWDM4 developments that you are working on, I was quite interested, it sounds like a nice option and ‑‑ no one likes playing with MPO cables, I was wondering whether or not you see that coming down as cheap as SR4 or you think it's going to be a premium option, it would be very nice to have the option to pick one or the other but if there is going to be a price premium for the materials it might not work out so well.
CHRISTIAN URRICARIET: Right. In general -- we can discuss price more specifically in private with any of you that is interested -- there is going to be a small premium. It will be the same order of magnitude as SR4, maybe a small premium because it is a bit more expensive product, but the saving in the infrastructure itself will offset that, and that saving is because you can use the legacy OM3 fibre that you have today running at 10G. So you don't have to install new fibre, or even if you install new infrastructure you only use two fibres instead of 8 or 12, so it's a similar order of magnitude.
We are not talking about LR4 pricing, which is the single mode solution using a similar concept but much more expensive.
REMCO VAN MOOK: Thank you very much.
(Applause)
Next up is somebody who needs no introduction because he always introduces himself, here you go.
JOB SNIJDERS: Good morning. I work for NTT Communications, and this morning I would like to share with you some recent developments that have happened in the IETF standards body that should make our operations more efficient.
Some of you may operate networks and have come to notice that communication with your BGP peers is not always straightforward. You may not have the correct contact details, or you may be talking to the wrong department, or there can be other challenges. So we have set out to create an additional communication channel to ease operations. Currently, there is no way to signal through BGP why a session is down. We can signal that the session is now down, but that doesn't tell us anything about why that happened or when it will be back: are you depeered, did you not pay the bills, is there maintenance, who knows. The only method we have of signalling this information through BGP is by flapping /24s and creating imagery on the stat.ripe.net BGP visualiser. And this particular image has taken me two months now to create and it will take another six to eight weeks for it to finish. So this is not the path forward for communication between peers.
And we all know the phenomenon where somebody e-mails the mailing list of an IXP saying, we are going to do maintenance, and then a few famous companies will respond: what is the circuit ID, when is this happening, where is my router? This type of communication is pure noise, because it impacts a greater group than just those that are affected. To that point I introduce the BGP administrative shutdown communication. This specific Internet draft -- this is the structure: in the BGP protocol there is a Cease NOTIFICATION message, and this is the message that is sent to the other BGP speaker to signal that the session is going down. As it currently stands, you have subcodes that indicate a reason; the most famous one is perhaps that you reached the maximum prefix limit, but others are, for instance, that the session is reset, so you can expect it to come back, or that it's just shut down. And the mechanism in BGP allows for a very crude extensibility, because nothing forbids you from appending data to that particular message. So that is what we did: we specified that if bytes come after that Cease notification, those bytes should be interpreted as a string for human consumption. So what does this look like on a router?
Here we use as an example OpenBGPd as shipped in OpenBSD. What I am doing here is instructing the BGP daemon through the CLI tool to bring down the session with neighbour 16525425224 and leave a message in that neighbour's memory saying: this is the ticket ID, we are upgrading the software and we will be back in a few minutes. On the receiving side, in the syslog, you will see that the session went down and you will see that message that I typed in for your consumption. And the result of this is that when you have a neighbour that uses this mechanism and they provide a ticket ID, you can copy and paste this into your mail programme and easily pull up the maintenance notification, should there be one.
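As a rough illustration of the receiving side, here is a minimal Python sketch of how a BGP implementation could decode such a message. It assumes the NOTIFICATION has already been parsed into error code, subcode and data; the one-octet length field in front of the UTF-8 text follows my reading of the draft and is an assumption rather than something stated on stage.

```python
# Sketch: decode a shutdown communication from a Cease NOTIFICATION.
# Assumes the draft's layout: one length octet, then up to 128 octets of UTF-8.

CEASE = 6
ADMIN_SHUTDOWN, ADMIN_RESET = 2, 4

def decode_shutdown_communication(error_code, error_subcode, data):
    """Return the operator-supplied message, or None if there is none."""
    if error_code != CEASE or error_subcode not in (ADMIN_SHUTDOWN, ADMIN_RESET):
        return None
    if not data:
        return None
    length = data[0]
    message = data[1:1 + length]
    # UTF-8 is mandatory, so non-Latin scripts (and emoji) survive intact.
    return message.decode("utf-8", errors="replace")

# Toy example: what the sender above might have put on the wire.
text = "[TICKET-42] upgrading software, back in a few minutes".encode("utf-8")
print(decode_shutdown_communication(CEASE, ADMIN_SHUTDOWN, bytes([len(text)]) + text))
```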
But I hope that this will reduce the chatter between NOCs. For instance, NTT follows up on every BGP session that goes down and we send an e-mail: is everything all right? And an e-mail comes back: yeah, I was just upgrading. If we were to read a message in our syslog that says the session is going down, nothing is wrong, we will be back, or whatever they want to communicate, we can perhaps hold off on reaching out. And another example of how this could look on Juniper would be that you insert a disable keyword and a string that is used to communicate the particular message. It not only works for administrative shutdowns; it also works when you clear a session. Currently, if I clear an eBGP neighbour you may not know why I am doing this. Perhaps we had prior communication or we didn't; perhaps, prior to the clearing, I was communicating with a colleague of yours that went off shift, and you joined the shift and you saw the session flapping. With this mechanism, we can perhaps assist each other in understanding why these sessions are going down.
And this is not a theoretical exercise; there is real software out there that already supports this. OpenBGPd is the reference implementation for this feature, the Free Range Routing guys have already implemented this and merged it into their master, GoBGP supports it, and on the debugging side patches have been integrated into mainstream tcpdump and into Wireshark, so if you are a BGP developer you can use these tools to verify your implementation.
The IETF status of this document is that it's in IETF last call; that is one of the final stages before it can be published as an RFC. The next stage is IESG review, and they have to give their final approval on whether this can be a proposed standard or not. Furthermore, we received some indications that some parties are perhaps interested in implementing this, for instance Cisco, Juniper and BIRD, and what you as an audience can do is give them more encouragement to implement this feature. If you believe this feature will facilitate your operations, please e-mail your account manager stating: hey, is this on your roadmap? Can I expect this at some point in the future? And I am not asking you to put commercial leverage on this, just notify them this exists, and if enough people notify them, eventually it will happen.
And as a reference, the OpenBSD implementation was between two and three hundred lines of code, so this is really a small feature, it is very easy to implement, and it fits neatly in the existing BGP framework, so it shouldn't be a big deal.
Go out, ask your vendors to implement this.
Oh, and because apparently there are other languages than English, who knows? You can also use UTF-8, and it's now legal to send emojis over BGP. But of course, this is the funny part. In reality, the IETF has come to recognise that the world is bigger than just 7-bit clean ASCII; there are many character sets that are used by billions of people, and this standard accommodates that. I want people to communicate in languages they are comfortable with, and that is why UTF-8 is a must. This concludes my presentation. Are there any questions so far?
FLORENCE LAVROFF: Just one or two questions, please, we are running short of time, so please. Go on.
AUDIENCE SPEAKER: Amazon. I was curious if you had any thoughts around suggesting a standard format for the communication for certain events.
JOB SNIJDERS: This is a good question. And purposefully, the BGP shutdown communication is free form. The reason why is that we, up until now, did not have a mechanism to communicate anything that resembles structured messages like this, and what I hope is that in the years to come, as an industry, we come to a standard, and I hope that standard grows organically; if we see that arising, then we can standardise what the community convention is. But since we have no idea how this will work out in practice, I advocate: let's start with free form, and in the second iteration we can perhaps do better.
AUDIENCE SPEAKER: From Amazon. I think that important thing on everyone's mind is, is there a depeering emoji?
JOB SNIJDERS: Thank you for your comments.
(Applause)
FLORENCE LAVROFF: We now have a presentation from Zbynek Pospichal.
ZBYNEK POSPICHAL: I am from NIC.CZ and I would like to talk about a new way of DDoS mitigation we started to implement, and we decided to implement it in NIC.CZ because, when we speak about DDoS mitigation in exchanges today, it's typically about blackholing, which requires some kind of action from the operators and from the end users, and in fact you are really helping the attacker to keep the destination of the attack down for a longer time. So, why implement DDoS mitigation in Internet Exchanges? The reason is easy: if you are the target operator and there are many flows directed at you, the traffic of the attack could easily create a traffic jam on the port with which you are connected to the exchange, and you can't do anything about it, maybe except the blackholing I mentioned. So, we have been working on a design -- by the way, I have to say thanks to Erik Bais for what he presented at the last RIPE meeting in Copenhagen, which inspired me a little bit when I was working on this design. So, if we have a DDoS attack against, let's say, an attacked host behind the target peer -- try to imagine there is a DDoS attack against a host in your network -- with this way of mitigation you can redirect the attack traffic for just the prefix which contains the attacked host, or you can also create a more specific prefix for this, and you can redirect such traffic to a special box which is supposed to perform the mitigation somehow. It shouldn't affect the clean peer traffic in any way, and you can still keep unchanged and unmitigated all the traffic going from the peers which are the sources of the dirty traffic but destined to other prefixes. So, you really need to move only a much smaller amount of traffic to the mitigation box. This is the general design. Now, about the implementation in NIC.CZ. We found that it is possible to do it with a junk box, a junk device we already have, so we decided to use old Catalyst 6509s as DDoS mitigation boxes, which is not the best possible solution, but it is a zero budget solution and a solution we could implement right away, so it's already implemented. We currently use 40 gigs for the incoming traffic and 40 for the outgoing, but we found that with some tricks we can take this box up to 240 gigs of incoming DDoS traffic, while the traffic going out could stay at 80 gigs.
The input interface is a VRF, or more than one if we need more than 80 gigabits, because there is a limit of eight ports in an EtherChannel on this platform, and there is an access list which points to a class map, which points to a service policy. The access list and class map are created for each participant, and each participant has to create a static route for each prefix; if we use more VRFs, which is the trick to avoid the platform limit of 80 gigs, we have to add static routes for more VRFs. And of course we have statistics available, so we can see how successful we are with the limitation of the traffic. We think that a small portion of the traffic should stay there so we can see if the attack is already over.
So, this is the very simple configuration. We know it's not the best possible way how to configure boxes but currently an administrator mitigating the attack just should configure the access list which is here and the corresponding class map. Then there should be or there is prepared policy for rate limiting of the DDOS traffic, this is specialing of the UDP fragment attack facing box 147230244.1 a the operator should also this traffic route to the global routing table to for traffic to the final destination, which is the peer of the target network.
These are the statistics of the mitigation of an attack; we can see that maybe about a quarter of it is still going to the destination, still going to the victim, and we see we are dropping about 150 megs of attack traffic. Here you can see all the traffic of the target network, so we only need to move a very small portion of the traffic to the mitigation device. It is working. It is working with zero budget hardware, but we know it has a lot of limitations: you need skilled enough people administering this, which at a big exchange is a little bit problematic. We also know that it's hard to automate in this case. And we have to consider whether developing some software, which would shield the user from the platform and allow him to do just what he has to do and not everything he could, would be more efficient than buying a completely different DDoS mitigation platform which is controlled, for example, by -- from such companies. That is all.
FLORENCE LAVROFF: Questions? If you don't have any, I propose we move directly to our next topic because we are already short on time. Anyone? Cool. Thank you.
(Applause)
And now we have some cool announcement from Will van Gulik about Roman‑IX in Switzerland.
WILL VAN GULIK: I am Will, and today I am representing a newborn IXP, which is good. So, here is basically the list of founding members who are starting this. We are actually starting in Lausanne, and we expect to go down to Geneva, most likely Equinix. If anyone is actually interested in connecting to us, or is present there, we expect the first packets to flow by the end of June. The founding members sponsor hardware and so on, so we are ready to go and run. We have got a really, really tiny website without much information right now, but it's a start, so feel free to contact me and I will try to get you all the information you need. And I think I was fast, and I hope I am catching up some time for you people, so...
REMCO VAN MOOK: All right. That was really short. Thank you very much, Will.
(Applause)
So now for the final presentation it's Nurani going to talk about outer space objects or something.
NURANI NIMPUNO: I realise we are at the end of the session so I will do my very best to give a very quick presentation. I did go to the school of 300 words per minute, but I haven't learned to give presentations in negative time. Right. So, a few of us have been in the IXP industry for quite some time, and we can agree that the IXP model has proven to be a hugely successful model. A lot of things have happened, though, in the time that we have all been involved with IXPs. IXPs started out as very small local interconnection solutions and, very much due to their success, they have grown into something quite different. We think a lot of these IXPs out there meet a very substantive need, but there has also come to be a sort of perception of a current IXP paradigm, which is that all IXPs need to be present in several data centres to increase their reach, that there are big metro networks with transport across larger regions, and that IXPs have additional Cloud services, remote peering, etc. And we think there is a missing piece in the landscape of interconnection, so a few of us got together and thought about how we can meet this particular need and how we can create a solution that is focused on efficiency and on local interconnection. So we went back, basically, to the roots of IXPs, and we have come up with a solution which is really all about local interconnection: single site, no metro or transport or Cloud, all about just local interconnection. The focus was on simplicity and efficiency, but we wanted to do this with modern technology. So, we have come up with a highly automated model which is remotely managed and API driven, with one single management system, and because we have come up with a fairly simple and efficient model, it is a model that can be replicated in lots of different places. So we think it's flexible and scalable. It also means that the customers have one portal and one contract regardless of where they connect. And so this is what we call the Asteroid model, a much needed complement to the IXPs that are out there. Our approach is also a bit different in where we build: we want to build Asteroid IXPs wherever there is a need, but we don't want to take the approach of putting dots on a map and saying, we build it and they will come. We actually do it the other way around, so we have developed a model of campaigns where networks can basically tell us where they need an IXP; the campaign model identifies that need, and it also means that we won't actually build an IXP until we have enough support, so an IXP will bring value on day one that it's live. If it doesn't have value, we won't build it. Because we have this model that we think is very flexible, we also want to work together with existing IXPs: there are a lot of IXPs out there that are community-driven or don't have the resources to upgrade their portals or parts of their technology, and we think that there is room for cooperation there. We also know from experience that there are a lot of places in the world that don't have good local interconnection solutions; there might be a good local community, but they don't have the technical competence or the technical solution, and we would like to work with those local communities.
When we started talking to people about where we could do a showcase, a lot of people told us that Amsterdam would be a good place, just to show what our technology can do. So we listened to that and we are going to go live with the showcase at the end of this month. It is a live IXP, but it is also an opportunity for people to try it out and figure out what the Asteroid IXP solution is about. And that takes me to the end of my presentation. If you have questions, come and see me or come and see Remco; we are here all week and we are happy to talk to all of you. Thank you very much.
(Applause)
REMCO VAN MOOK: Thank you. Since we are already in your lunchtime, I suggest that if you have any questions you go talk to Nurani. With that, we are at the end of the Connect Working Group, thank you all for bearing with us, I am very pleased to see we had a full room. And I will see you all next time. Thank you very much.
(Applause)
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.
WWW.DCR.IE