The performance of any system can only be as good as the oscillator that defines and maintains its timing. While timing metrics such as stability, drift, aging, and Allan deviation are well understood, they are rarely used to demonstrate the cost savings that better timing can deliver. For example, data centers house thousands of servers, so oscillators that maintain high precision in the presence of shock, vibration, electrical noise, thermal gradients, acceleration, and pressure can significantly reduce the total cost of ownership. To understand how such tiny devices deliver such impressive results, let’s look at the characteristics of both the oscillators and the data centers themselves.

The distributed database is arguably the most common function within a data center. As an example, picture a ubiquitous Excel-style spreadsheet, which in the case of a data center would be a truly massive document. For the purposes of our illustration, assume two people bidding on an eBay item place equal bids that arrive at the eBay server at about the same time. So, who gets the product?

Keep in mind that in a data center, the spreadsheet can be stored on different servers or even in different regions of the data center, which can be distributed geographically worldwide. In Figure 1, the red columns (including Bid 1) are stored on Server A, the blue columns (including Bid 2) on Server B, and another section, used for reference, resides on Server C. Accessing them generates east-west traffic: the transfer of data packets between servers within a data center. East-west traffic represents at least 70% of data center traffic, so reducing it is very important.

Figure 1: A distributed database divides its content into multiple regions in a data center.

In our case, the bids arrive so close in time that tiny fractions of a second determine who wins. Figure 2 shows two scenarios: packet A represents Bid 1 and packet B represents Bid 2. In both cases, the timestamp marking each bid’s arrival is surrounded by a “window of uncertainty” (WOU), which is essentially the error bars of the timestamp. In Scenario 1, the WOUs overlap, requiring the servers to generate more east-west traffic to break the tie. In Scenario 2, the WOUs are narrower, so no overlap occurs. Scenario 2 makes for a more efficient data center.

Figure 2: Minimizing uncertainty eliminates overlap and contributes to increasing server utilization.
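To make the tie-break logic concrete, here is a minimal sketch in Python of how a server might decide whether two timestamped bids can be ordered locally or need extra coordination. The timestamps and WOU widths are hypothetical values chosen only to mirror the two scenarios in Figure 2.

```python
# Minimal sketch: deciding whether two timestamped bids can be ordered
# without extra coordination. Timestamps and uncertainties are
# hypothetical values in microseconds.

def can_order(ts_a, wou_a, ts_b, wou_b):
    """Return True if the two uncertainty windows do not overlap,
    i.e., the earlier event is unambiguously first."""
    earliest_a, latest_a = ts_a - wou_a, ts_a + wou_a
    earliest_b, latest_b = ts_b - wou_b, ts_b + wou_b
    return latest_a < earliest_b or latest_b < earliest_a

# Scenario 1: wide WOU (+/-10 us) -- windows overlap, so the servers
# must exchange more east-west traffic to break the tie.
print(can_order(100.0, 10.0, 112.0, 10.0))   # False -> extra traffic

# Scenario 2: narrow WOU (+/-1 us) -- same bids, no overlap, ordered locally.
print(can_order(100.0, 1.0, 112.0, 1.0))     # True  -> no extra traffic
```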

So how is the WOU related to oscillator performance? To understand this, we need to understand the source of the WOU. Figure 3 shows a typical IEEE 1588 precision time protocol (PTP) servo loop. At left, timestamps are extracted from incoming network packets and used to steer a servo loop whose goal is to filter the timestamps and identify the “lucky” (least-delayed) packets. This low-pass filtering reduces the packet delay variation (PDV) introduced by the network. During this filtering process, the job of the local oscillator (LO) is to maintain stability between servo updates: every time a new timestamp arrives, the servo corrects the loop, but between updates the LO must hold short-term stability for the client.

Figure 3: A typical IEEE 1588 precision time protocol (PTP) servo loop.

The goal is to reduce the bandwidth of the loop filter in order to reject the PDV from the network: the narrower the servo bandwidth, the more PDV is eliminated and the more accurate the timestamps become. But reducing the bandwidth simultaneously increases the amount of undesirable LO noise appearing at the client. So the goal is to reduce the loop filter bandwidth only to the point where the total noise at the client is at a minimum. Where does precision timing play a role here? We’ll discuss that next.
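The trade-off can be seen in a toy model. The sketch below treats the servo as a first-order low-pass filter and sweeps its gain (a stand-in for loop bandwidth); the noise magnitudes are made-up, but the behavior shows the minimum described above: too much bandwidth lets PDV through, too little lets LO wander accumulate.

```python
import random

# Toy model of the bandwidth trade-off in a PTP servo. All magnitudes
# are made-up illustrative numbers, not measurements.

def run_servo(alpha, n=20000, pdv_ns=500.0, lo_drift_ns=2.0, seed=1):
    """First-order servo: est += alpha * (measurement - est).
    alpha stands in for loop bandwidth. pdv_ns is timestamp noise from
    the network; lo_drift_ns is LO wander accumulated per update."""
    rng = random.Random(seed)
    true_offset, est, err_sq = 0.0, 0.0, 0.0
    for _ in range(n):
        true_offset += rng.gauss(0.0, lo_drift_ns)   # LO wander (random walk)
        meas = true_offset + rng.gauss(0.0, pdv_ns)  # timestamp + PDV
        est += alpha * (meas - est)                  # servo update
        err_sq += (est - true_offset) ** 2
    return (err_sq / n) ** 0.5                       # RMS error at the client

# Error is large at both extremes and minimized at an intermediate gain.
for alpha in (0.5, 0.1, 0.02, 0.005, 0.001):
    print(f"gain ~ {alpha:<6} RMS error = {run_servo(alpha):8.1f} ns")
```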

The banner spec conundrum

It’s interesting to note that when evaluating precision timing, the banner data sheet specification is frequency stability over temperature, in units of ppm or ppb (1 ppm = 1,000 ppb). This is the first thing people think of when selecting a precision timing device. However, in PTP applications the LO is disciplined to network timing, so oscillator frequency changes slower than the servo’s update rate are tracked out by the loop (the oscillator’s noise is effectively high-pass filtered by the servo). Thus an oscillator’s frequency-over-temperature specification, which is guaranteed over the lifetime of the oscillator, isn’t a critical parameter here. What is critical is the short-term stability under changing temperature, where “short term” means the time between servo loop updates, roughly the inverse of the servo loop’s bandwidth, i.e., the loop’s time constant.

The way we quantify an oscillator’s temperature sensitivity is with a frequency-over-temperature slope specification, which we’ll refer to more simply as ΔF/ΔT (that is, the derivative of frequency with respect to temperature). This data sheet specification quantifies the oscillator’s change in frequency with changes in temperature and has units of ppb per degree Celsius. An ideal curve is a horizontal line centered at 0 on the y-axis, meaning the output frequency doesn’t change with temperature. In practice, however, ΔF/ΔT is non-zero.
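A back-of-the-envelope calculation shows why the slope matters. With hypothetical numbers for the slope, the thermal ramp at the oscillator, and the servo bandwidth, the time error the LO accumulates while the servo isn’t correcting it is roughly the integrated frequency error:

```python
# Back-of-the-envelope: time error accumulated between servo corrections
# due to a thermal ramp. All input values are hypothetical.

dF_dT_ppb_per_C = 10.0       # oscillator slope: 10 ppb/degC
thermal_ramp_C_per_s = 0.5   # airflow-induced temperature ramp at the oscillator
servo_bw_hz = 0.05           # narrow PTP servo bandwidth

tau = 1.0 / servo_bw_hz      # loop time constant: interval the LO is on its own
dT = thermal_ramp_C_per_s * tau       # temperature change over tau
freq_err_ppb = dF_dT_ppb_per_C * dT   # peak fractional frequency error

# Time error ~ average frequency error integrated over tau
# (the error ramps from 0 to the peak, so use half the peak).
time_err_ns = 0.5 * freq_err_ppb * 1e-9 * tau * 1e9
print(f"tau = {tau:.0f} s, dT = {dT:.1f} degC, "
      f"peak freq error = {freq_err_ppb:.0f} ppb, "
      f"time error ~ {time_err_ns:.0f} ns")
# -> tau = 20 s, dT = 10.0 degC, peak freq error = 100 ppb, time error ~ 1000 ns
```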

In Figure 4, Device 2 has the better “data sheet” stability specification, exhibiting ±50 ppb frequency stability over temperature versus ±100 ppb for Device 1. However, in PTP applications the better oscillator is actually the one with the more gradual frequency-over-temperature slope. So here the ±100 ppb device delivers better PTP performance, which can be counterintuitive.

To summarize: while the banner data sheet specification for precision timing is frequency stability over temperature, the actual determinant of performance in a data center (and other synchronization applications) is short-term stability, dominated by thermal drift and reported in data sheets as ΔF/ΔT. The banner specification tells us little about network synchronization because it is a lifetime specification, often guaranteed over 10 years, and a 10-year window is irrelevant to a servo loop that is re-disciplined to the network many times per second.

Figure 4: Performance over a device's lifetime has little meaning in short periods of time.
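To make Figure 4’s comparison concrete, the sketch below scores two hypothetical devices, with invented frequency-versus-temperature curves, on both metrics. Device 2 wins the banner spec (smaller peak-to-peak) yet has the steeper worst-case slope:

```python
import math

# Invented frequency-vs-temperature curves for two hypothetical devices,
# sampled every 1 degC from -40 to +85 degC. Values are in ppb.
temps = list(range(-40, 86))

# Device 1: +/-100 ppb over temperature, but a slow, gentle variation.
dev1 = [100.0 * math.sin(math.pi * (t + 40) / 125 - math.pi / 2) for t in temps]

# Device 2: +/-50 ppb over temperature, but with fast local ripple.
dev2 = [50.0 * math.sin(2 * math.pi * (t + 40) / 25) for t in temps]

for name, curve in (("Device 1", dev1), ("Device 2", dev2)):
    p2p = max(curve) - min(curve)
    # Worst-case slope between adjacent 1-degC samples, in ppb/degC.
    slope = max(abs(b - a) for a, b in zip(curve, curve[1:]))
    print(f"{name}: peak-to-peak = {p2p:6.1f} ppb, "
          f"worst dF/dT = {slope:5.2f} ppb/degC")

# Device 2 "wins" the banner spec (~100 ppb peak-to-peak vs ~200 ppb)
# but its worst-case slope (~12.5 ppb/degC vs ~2.5 ppb/degC) makes it
# the wrong choice for a PTP-disciplined LO.
```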

In short, an oscillator with a lower ΔF/ΔT delivers more accurate timestamps, resulting in less east-west traffic. That translates into higher server utilization, meaning it’s possible to reduce the number of servers in the data center. And fewer servers mean lower capital and operating costs, which increases operator profits. The more impressive oscillator is the one with the more gradual frequency slope over temperature (Device 1), even though it may have greater peak-to-peak variation over its lifetime. Low sensitivity to temperature changes, rather than lifetime peak-to-peak stability, is the critical oscillator parameter for network synchronization.

Comparing oscillators

Crystal oscillators have served as timing references for many decades, and while they have improved over time, the inherent limitations of quartz-based timing technology can never be fully mitigated. Microelectromechanical system (MEMS)-based oscillators were created to overcome the limitations of quartz and avoid these stability problems. Because their resonators are fabricated from single-crystal silicon, they are inherently rugged, making them well suited to extremely harsh operating environments.

Show me the money

Let’s consider a scenario to illustrate the benefits that MEMS-based precision timing can produce in a data center (Table 1). In this case, the facility has 300,000 servers per building and three buildings per campus. The servers are refreshed every four years, and the total cost of ownership for each server is about $2,400. Achieving just a 1% increase in server utilization would result in annual savings of $1.8 million for each building and $5.4 million for the entire campus.

Table 1: MEMS-based precision timing can save an average data center $5 million annually.
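A quick check of Table 1’s arithmetic, assuming the per-server TCO is amortized evenly over the four-year refresh cycle:

```python
# Sanity check of the Table 1 arithmetic. Assumes the per-server TCO
# is amortized evenly over the four-year refresh cycle.

servers_per_building = 300_000
buildings_per_campus = 3
tco_per_server_usd = 2_400
refresh_years = 4
utilization_gain = 0.01   # 1% more work per server -> 1% fewer servers needed

annual_cost_per_server = tco_per_server_usd / refresh_years        # $600/yr
servers_saved = servers_per_building * utilization_gain            # 3,000
savings_per_building = servers_saved * annual_cost_per_server      # $1.8M/yr
savings_per_campus = savings_per_building * buildings_per_campus   # $5.4M/yr

print(f"Per building: ${savings_per_building:,.0f}/year")
print(f"Per campus:   ${savings_per_campus:,.0f}/year")
```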

The above scenario may be conservative, as even greater increases in server utilization can be achieved. Since designers often rely on traditional data sheet specifications when selecting oscillators, the potential upside is significant. That said, it’s important to consider the intended application when selecting an oscillator because, as the data center scenario demonstrates, a single “banner spec” can be irrelevant on its own and can even steer you toward worse system performance.