Before I came to Cisco I wrote about Fibre Channel over Ethernet (FCoE) and did my best to help edumificate people on how the technology works. One of the most popular things I’ve ever written, in fact, was a comparison between FCoE and another convergence technology, iSCSI. Since that time I’ve come to learn and understand a lot more about both technologies, how they relate to each other, and how storage networks are designed and implemented using them.
Since Demartek recently published a piece on multiprotocol connectivity, which included some latency comparisons between the protocols, I thought it might be a good time to revisit some of those questions.
The conclusion was, effectively, that given the right set of circumstances and with proper planning, iSCSI has the potential to be a great performance technology. Many of the real-world limitations, though, have little to do with the protocol itself (the speed of the arrays has a much greater impact than the network protocol, for instance, and poor topology architectures can hurt performance as well).
Because of this, the FCoE “versus” iSCSI debate continues to rage on, even after all this time. In general, as I’ve mentioned before, this is a false dichotomy: FCoE and iSCSI are different tools in the Data Center toolbox, and each is implemented in different ways and for different reasons.
When it comes to performance, though, there simply hasn’t been a side-by-side comparison of the protocols using the same switches and the same interconnects across multiple topologies. Demartek previously published an evaluation of Fibre Channel, iSCSI, and SAS host interfaces, but that study
- focused on the host interfaces and
- did not include FCoE and
- was not designed to measure across multiple topologies.
In this latest paper, however, Demartek addresses these questions. In an “all things being equal” kind of way, what do we find when we use the same equipment and the same topologies but change the protocol? To that end, Demartek shows a couple of things of note:
- Each protocol (FC, FCoE, and iSCSI) shows consistent results in average latency and standard deviation whether we’re talking about one switch, two switches, or three switches (topology diagrams are included in the report).
- In relation to each other, there can be a marked difference in protocol latency. For example, in these tests the average latency was lowest for FCoE, by a considerable margin compared to FC and iSCSI.
To me, the amazing thing about these results is how remarkably consistent they were across the board. Moreover, because these were Layer 2 networks, there was no routing of the iSCSI traffic, which means that the only place the protocol stack became an issue was at the initiator and target.
This also puts to bed the notion that there is some sort of “encapsulation penalty” for FCoE.
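To put the “encapsulation penalty” claim in perspective, here is a back-of-envelope sketch (my own illustration, not part of the Demartek report) of the fixed per-frame header bytes each protocol carries on the wire. It assumes minimal IPv4 and TCP headers, no VLAN tag, and no iSCSI additional header segments or digests; actual latency is dominated by stack processing at the endpoints, not by header size alone.

```python
# Rough per-frame header overhead on Ethernet, in bytes.
# Assumptions: Ethernet II framing, minimal IPv4/TCP headers (no options),
# no VLAN tag, no iSCSI digests. Illustrative only.

ETHERNET = 14 + 4                    # Ethernet II header + FCS
FCOE = ETHERNET + 14 + 24 + 4        # + FCoE header, FC frame header, EOF/padding
ISCSI = ETHERNET + 20 + 20 + 48      # + IPv4, TCP, iSCSI Basic Header Segment

print(f"FCoE fixed overhead per frame:  {FCOE} bytes")
print(f"iSCSI fixed overhead per frame: {ISCSI} bytes")
```

The point of the sketch is simply that FCoE’s encapsulation is lightweight on the wire; the more significant difference in practice is that iSCSI traffic must traverse a full TCP/IP stack at the initiator and target, while FCoE does not.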
The topologies are clearly laid out in Demartek’s report. While SQLIO is a test suite designed to mirror real-world applications, this wasn’t a “let’s try to break the protocol” test. In fact, to my knowledge there hasn’t been a comparison done to this extent before, simply to explore what some of the differences might be.
When I sit down to really think about this, it’s rather interesting that we can use the same three pieces of equipment (Nexus 5500, Nexus 7000, and MDS Fibre Channel switches) and, for the most part, run any protocol at any point over any box (the notable exception being iSCSI on the MDS; the others were total mix-and-match). In that way, we were able to check the differences between the protocols regardless of the topology design.
In a nutshell, latency is only one of many variables affecting the decision to use a particular protocol, but for some customers it is extremely important (even if they don’t always understand its relative context alongside other network performance benchmarks, like IOPS). At the very least, this testing brings some of these answers to light.
What does this mean? Essentially it means that when it comes to Ethernet-based block storage protocols, iSCSI may have specific use cases and advantages in certain deployment opportunities, but from a protocol performance perspective it appears that my initial estimate of the TCP/IP stack affecting performance was on target. Likewise, the elegance of the FCoE frame format encapsulation does, in fact, appear to have a significant advantage over iSCSI.
More than two years ago I hypothesized that this would be the case, and it’s good to know that we finally have the numbers to show it. Even so, this is only one data point — but it’s a mystery (for me, at least) that has finally been solved.
Note: The twitter link on the profile is incorrect. I can be found on Twitter with the handle @drjmetz