In recent posts I have commented on topics in electronic trading on the one hand, and telecommunications on the other. It is time to unite those two threads.
I have also been following discussions on LinkedIn regarding low latency networking, and have been working on related projects with clients.
Building a low latency network is not a “black art”, but it does require a solid understanding of how networks are built, as well as an appreciation for the tradeoffs that telecommunications carriers make when building networks.
I will start with some general principles and then focus in more detail on the specifics.
The first thing to understand is that any engineering project involves trading off objectives. As the old joke goes, “I can build it fast, cheap, or reliable, pick any two.”
Let’s look at some examples for the design of networks:
Most telecom networks (i.e. networks built by AT&T, Verizon, etc.) are designed to scale well, to VERY large numbers of users. After all, network construction is capital intensive and tends to reward firms that can service a large number of users. Additionally, networks generally follow Metcalfe’s law, which says that the value of a network grows roughly with the square of the number of users.
But the design choices that enable scale are bad for latency. A basic design principle to achieve scale is to employ hierarchical designs. Rather than attempting to connect every user directly to every other user (in a “full mesh”), hierarchical networks collect traffic locally, aggregate it, deliver it to regional concentration centers, and then route traffic back out to local centers and out to the users. This kind of “hub and spoke” design (looking much like a family tree) scales well, but forces traffic to take paths that are longer than necessary and to pass through routers and switches that add latency. If you have spent any time transiting ORD, DFW, or ATL, you will know what I mean. What works well for airline economics (and helps keep fares low) isn’t so great if you are in a hurry.
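To put rough numbers on the scaling tradeoff, here is a minimal Python sketch that counts the links needed for a full mesh versus a single-hub star, the simplest possible hierarchy. The function names are illustrative only, and real carrier networks have several tiers of aggregation, so treat this as an order-of-magnitude illustration rather than a model of any actual network.

```python
def full_mesh_links(sites: int) -> int:
    """Links needed to connect every site directly to every other site."""
    return sites * (sites - 1) // 2


def star_links(sites: int) -> int:
    """Links needed when every site connects only to one central hub."""
    return sites


if __name__ == "__main__":
    # Compare how the two designs grow as the number of sites increases.
    for sites in (10, 1_000, 100_000):
        mesh = full_mesh_links(sites)
        star = star_links(sites)
        print(f"{sites:>7,} sites: mesh={mesh:>13,} links, star={star:>7,} links")
```

The mesh gives every pair of sites a direct path, but its link count grows with the square of the number of sites. The star grows linearly, at the price of sending every conversation through the hub and its equipment.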
More generally, telecommunications carriers optimize their networks for:
- Transporting large amounts of bandwidth. This is no surprise, as most telecommunications revenue is ultimately denominated in units of bandwidth.
- Multi-service provisioning, i.e. the ability to carry (and sell) many different services on one network. Carriers make substantial investments in their networks, and they want to spread those costs over as many customers and services as possible. T1s, T3s, cell phone backhaul, traditional phone calls, text messages, internet traffic… The more services, the more revenue. To achieve this, carriers build networks using layers of routers, switches, muxes, and multi-service access nodes. These enable carriers to provision everything from T1 to text messaging, but add overhead and latency.
- Servicing population and business centers. Willie Sutton robbed banks because that is where the money was. Telecommunications companies build networks to serve population centers. When laying fiber from NY to Chicago, telecommunications companies try to pass as many population centers as possible along the way. So a typical network that carries traffic from NY to Chicago might go up to Binghamton, over to Buffalo, then down to Cleveland before it goes on to Chicago (or down to Philadelphia, over to Pittsburgh, up through Akron and Cleveland, etc.) Take a look at the cable routing in any country, and the lines are far from straight; they zig and zag to stop by as many cities as possible. Who cares if it adds a few milliseconds? Your text message will still get there fast enough.
- Cost. The cheapest way to build long-haul networks is to use existing right of way, which for the most part means rail lines. Qwest started as a railroad spinout, and so did Sprint. Conveniently, railroads also zig and zag to visit key population and economic centers. Try to take an Amtrak from NY to Chicago without passing through Buffalo, Pittsburgh, or Washington DC (not to mention Albany or Philadelphia). You can’t do it, and neither can most telecommunications companies.
- Traffic patterns that are “typical”, e.g. peaks from “American Idol” messaging or Mother’s Day. Since most of the carriers’ money comes from a very broad market, they do not design their networks to consider the latency requirements or traffic patterns (e.g. peaks at US Market Open rather than Mother’s Day) of trading.
- Operational efficiencies… For example, telecom companies routinely “re-groom” circuits to free up capacity on certain links, to facilitate maintenance, etc. Their operational convenience might mean a sudden change to your latency!
None of this is intended as a criticism of the traditional carriers. It is a simple matter of the engineering and economic tradeoffs that drive the large carriers to build networks optimized for the people who pay them the most money. And electronic trading is a very niche market for them.
Networks that have been built specifically for financial markets (e.g. BT Radianz, Savvis) do somewhat better. The engineers who built those networks made tradeoffs differently than telecom companies usually do, optimizing for trading traffic, focusing on the geographic markets that matter to financial services, and focusing on services (e.g. IP multicast) that facilitate market data. But while these networks do better than general purpose telecom networks, they are still not ideal for the current generation of high frequency, low latency trading. While they were built for trading, they are still trying to address a larger marketplace than the ultra low latency HFT market, and they are international in scope. For example, one of the main objectives of RadianzNet was to provide the ability to rapidly connect financial customers to one another, and so a large “community” was key to the success of that objective. But scalability to large communities gets in the way of reducing latency.
So when BT Radianz developed its low-latency network “Ultra Access”, and when NYSE developed SFTI, the engineers removed layers of routing and generally made decisions that favored reduced latency at the cost of scalability. (For comparison, BT Radianz’ RadianzNet was designed to scale globally to tens or hundreds of thousands of clients, whereas BT Radianz Ultra was designed only to scale to thousands of clients in localized geographies.)
So how do you build a low-latency network for trading?
Well, the first conclusion is that if you REALLY care about latency, you are going to have to build it yourself. Networks like Radianz and Savvis are great for quickly connecting counterparties, and they reduce the IT load relative to firms that manage their own networks. But if you really want the lowest latency, a shared network is just not going to cut it.
So where do you start? First, simplify the topology. While hierarchical designs are cost effective and enable scale, the lowest latency design is a simple point-to-point network. In practice for most firms that will mean multiple point-to-point links, one to each venue. That is more expensive than traditional networks, and doesn’t scale well, but there are only a handful of low latency matching engines to consider.
Next, strip out every possible layer of equipment from the design that you can. The fastest solution is a direct connection from a LAN port on a server (e.g. 1G or 10G Ethernet) directly to the WAN link.
What kind of a WAN link? In almost every case this will be either a “lit” wavelength provided by a telecom company, or a dark fiber that you light yourself. (A traditional T3, OC3, OC12, or similar service will add overhead that you don’t want.) By buying a lit wavelength (1G Ethernet or 10G Ethernet wavelength), you are eliminating almost all the carrier overhead (SONET muxes, multi-service access nodes, etc.)
You may even want to go further and buy dark fiber. Simply put, dark fiber is fiber installed by a carrier that you light yourself. It may seem that there is little difference in having a carrier light the fiber versus lighting it yourself, but there may be a number of benefits to leasing dark fiber and lighting it. All things being equal, dark fiber will provide a lower latency solution:
- Dark fiber allows you to isolate traffic from different trading strategies and different matching engines onto separate wavelengths at minimal marginal cost (i.e. once fiber and equipment are leased/purchased, the cost of adding wavelengths is typically very low.) This eliminates the queuing delay that can occur when multiple trading strategies share the same wavelength.
- You can select your own equipment to light the fiber, and there ARE real differences in latency among different equipment vendors and configurations. (Carriers rarely choose or deploy optical equipment based on latency, focusing much more on the number of wavelengths that can be supported, variety of interfaces supported, operational considerations, etc.) If you light it yourself you can optimize for latency rather than optimizing for bandwidth and multi-protocol support.
Plus you know you have control and security, since you have dedicated fiber that nobody can tap, and you can have confidence that your network will not be re-routed or re-groomed.
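To see why sharing a wavelength can cost you, consider a frame that arrives just behind another strategy’s burst: it has to wait while that burst is clocked onto the wire. The Python sketch below estimates that wait from frame size and line rate. The 1500-byte frame size and the burst length are assumptions chosen for illustration, and the calculation ignores Ethernet preamble and inter-frame gap, so real waits would be slightly longer.

```python
def serialization_delay_us(frame_bytes: int, line_rate_bps: float) -> float:
    """Time in microseconds to clock one frame onto the wire."""
    return frame_bytes * 8 / line_rate_bps * 1e6


def queuing_wait_us(frames_ahead: int, frame_bytes: int, line_rate_bps: float) -> float:
    """Wait for a frame stuck behind `frames_ahead` equally sized frames."""
    return frames_ahead * serialization_delay_us(frame_bytes, line_rate_bps)


if __name__ == "__main__":
    FRAME_BYTES = 1500   # assumed frame size
    BURST_FRAMES = 20    # assumed burst from another strategy
    for label, rate in (("1G", 1e9), ("10G", 10e9)):
        wait = queuing_wait_us(BURST_FRAMES, FRAME_BYTES, rate)
        print(f"{label}: waiting behind {BURST_FRAMES} frames costs about {wait:.1f} microseconds")
```

On a dedicated wavelength the number of frames ahead of you from other strategies is zero, so this term drops out entirely.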
Of course all things are not always equal. At this level of engineering, speed of light becomes very important. (The speed of light in fiber is not the number we all learned in high school, i.e. 299,792,458 meters per second. Depending on the refractive index of the fiber, the speed of light in fiber is roughly 2/3 the speed of light in a vacuum.) So a “dark fiber” route from NY to Chicago that is 100 miles longer than an otherwise comparable “lit” wavelength could be 1-2msec slower (round trip) because of the increased mileage. But all things being equal you should be able to get lower latency and less queuing if you light it yourself.
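The mileage arithmetic is easy to check. The Python sketch below computes propagation delay over a given length of fiber. The refractive index of 1.468 is an assumed typical value for single-mode fiber, not a figure for any particular route; use the number from your fiber’s data sheet, and the optical mileage rather than the route mileage.

```python
C_VACUUM_M_PER_S = 299_792_458   # speed of light in a vacuum
METERS_PER_MILE = 1609.344


def one_way_delay_ms(fiber_miles: float, refractive_index: float = 1.468) -> float:
    """One-way propagation delay in milliseconds over a length of fiber.

    refractive_index defaults to an assumed typical value; light in the
    fiber travels at the vacuum speed divided by this index.
    """
    speed_in_fiber = C_VACUUM_M_PER_S / refractive_index
    return fiber_miles * METERS_PER_MILE / speed_in_fiber * 1000


def round_trip_delay_ms(fiber_miles: float, refractive_index: float = 1.468) -> float:
    """Round-trip propagation delay in milliseconds."""
    return 2 * one_way_delay_ms(fiber_miles, refractive_index)


if __name__ == "__main__":
    extra_miles = 100
    penalty = round_trip_delay_ms(extra_miles)
    print(f"{extra_miles} extra miles adds about {penalty:.2f} ms round trip")
```

With these assumptions, 100 extra miles works out to roughly 1.6 ms round trip, which is consistent with the 1-2 msec range quoted above.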
So when designing your wide area links, consider dark fiber, but be very careful to understand the routing of that fiber, the optical mileage (which is longer than the route mileage to allow for some slack in the fiber), and the type of fiber (which determines things like the refractive index). With lit services, make sure to get actual latency measurements, not just SLA numbers (which are usually padded to minimize risk for the carriers).
Building an ultra low latency network isn’t a black art. It just requires an in-depth understanding of how networks are built, right down to the fiber routing, and it isn’t the least expensive way to go.
When dealing with really long distances (like 5000 mi+), latency is apparently a problem for many applications. Is there a way to allow the signal to travel faster than light? Even if the refractive index is 1, the round-trip latency can’t be smaller than 133ms to reach the opposite side of the earth (assuming the shortest route between those two points). We would probably need something that breaks the laws of physics to greatly lower long distance latency.
Thanks for the comment.
No, there isn’t any way to have a signal travel faster than light. There has been speculation about faster-than-light signaling using quantum entanglement, but that is theory, a long way (if ever) from reality.
The answer isn’t breaking the laws of physics to greatly lower long distance latency though. Engineering is all about building things within constraints (like physics, budget, etc.)
While it is true that latency is a problem for many applications, to me that indicates either applications that are poorly designed, or use of an application in a way that wasn’t intended. If the application needs to operate over a really long distance, the application designer needs to understand latency and take that into consideration in their design. This isn’t new, by the way; see “the Fallacies of Distributed Computing”, first articulated by Peter Deutsch (then at Sun Microsystems) almost 20 years ago.
That doesn’t mean that it isn’t important to lower latency. It is. But you need to understand what you are trying to achieve and the costs of different alternatives. For some applications (e.g. telephony) the latency needs to be low enough to allow people to hold a conversation. Below that threshold, improvement is irrelevant. For other applications (such as many financial trading applications) what matters is not an absolute latency number, but having the lowest latency available (i.e. being first is more important than the time it takes to get there.) And lowering latency has a cost. That cost needs to be considered against the cost of alternatives. If your trading application has tens of milliseconds of overhead, you may want to spend your money on reducing that before you buy the fastest network.