Modern disks implement many different features, such as media-based caching (e.g., using a portion of the disk space to log some random write accesses), DRAM protection (e.g., using a small NVM to temporarily hold data from the DRAM cache during a power loss, so that the write cache can always be enabled), and hybrid structures (e.g., migrating hot data to high-speed devices and cold data to low-speed devices so that the overall access time is reduced). A hybrid disk (e.g., an SSHD), one of the hybrid structures, has advantages in scenarios where data hotness is significant. Some emerging and future techniques, such as SMR, HAMR, and BPR, favor sequential access in order to diminish garbage collection, reduce energy consumption, and/or improve device life. This chapter shows how trace analysis can help identify these mechanisms via workload property analysis, using two examples: SSHD and SMR drives.
SSHD
In this section, let’s explore the mystery behind the SSHD’s performance enhancement in SPC-1C [53] under WCD: the SSD/DRAM cache and the self-learning algorithm [56, 57, 16]. I collected data from the XGIG bus analyzer and monitored the responses with a LeCroy scope, with the workload generated by the SPC-1C tool. Techniques such as pattern recognition, curve fitting, and queueing theory are applied in the analysis.

SSHD performance comparison with traditional HDDs
Similar Models Chosen for Comparison
| | SSHD | CMR A | CMR B | CMR C |
|---|---|---|---|---|
| Capacity (GB) | 600 | 900 | 600 | 900 |
| RPM | 10.5K | 10.5K | 10K | 10.5K |
| Bytes per sector | 512, 520, 524, 528 | 512 | 512 | 512 |
| Discs | 2 | 3 | 2 | 3 |
| Average latency (ms) | 2.9 | 2.9 | 3 | 2.9 |
| DRAM cache | 128MB | 64MB | 32MB | 64MB |
| NAND | 16GB eMLC | None | None | None |
| Interface | 6Gb/s SAS | 6Gb/s SAS | 6Gb/s SAS | 6Gb/s SAS |
You know from the previous chapter that random write accesses dominate the IO requests in SPC-1C, which means the write cache plays an important role. However, the write cache is supposed to be disabled under WCD. Is that true for this SSHD? To verify it, you can run a simple test by injecting random write requests into the SSHD and calculating the CCT/qCCT/TtoD times. If the write cache were really disabled, all requests would be written to the media directly, which costs roughly 10ms per request. However, from the trace, you can observe many requests with response times of less than 1ms at the beginning. Therefore, the write cache is actually active even under the WCD setting. This benefits from the NAND-backed DRAM cache protection technique, so part of the cached data can be written to NAND right after system power is lost.
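A minimal sketch of this check, assuming the bus-analyzer trace has been exported to a CSV with hypothetical `cmd` and `cct_ms` columns (the column names, the file name, and the 1ms threshold are all assumptions, not the analyzer's actual export format):

```python
import csv
from collections import Counter

# Threshold assumption: sub-1ms completions are served from the DRAM/NAND cache,
# while ~10ms completions indicate a direct write to the rotating media.
CACHE_MS = 1.0

def classify_writes(trace_csv):
    """Count cached vs. media-bound writes in an exported bus-analyzer trace.

    Assumes a CSV with columns: cmd (READ/WRITE) and cct_ms (command
    completion time in milliseconds). Adjust the names to your export.
    """
    counts = Counter()
    with open(trace_csv, newline="") as f:
        for row in csv.DictReader(f):
            if row["cmd"].upper() != "WRITE":
                continue
            cct = float(row["cct_ms"])
            counts["cached" if cct < CACHE_MS else "media"] += 1
    return counts

if __name__ == "__main__":
    c = classify_writes("sshd_wcd_trace.csv")  # hypothetical file name
    print(f"cached writes: {c['cached']}, media writes: {c['media']}")
    # Many cached writes under WCD imply the write cache is effectively enabled.
```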
Now let’s analyze two essential problems: the cache size and access isolation.
Cache Size
1. Connect the SSHD to the XGIG bus analyzer and power the SSHD off/on.
2. Send 100 random 8K write requests to the SSHD using IOMeter or another tool, and repeat the same requests 10 times.
3. Repeat Steps 1-2 four times with the same requests.
4. Compare the XGIG traces and find the access pattern via a trace analyzer tool.
5. Repeat Steps 1-4 with the request number changed to 200 and 400.
6. Run the same test on a different SSHD with the same IO pattern.
Note that the measurable write cache can be smaller than the physical DRAM size:

- The R/W DRAM cache may share the same space.
- The SSD mapping table may share the same space with the R/W DRAM cache (in a good case, the SSD mapping table uses a dedicated DRAM space).
- The SSD reboot self-learning procedure may take DRAM space.
To estimate the actual write-cache size, run a further test:

1. Power the SSHD off/on (make sure the DRAM write cache is cleared).
2. Send 1000 random 8K write requests to the SSHD with queue depth = 1 using IOMeter.
3. Repeat Steps 1-2 ten times, each time with a different request size, such as 16K, 32K, 64K, 128K, ..., 2048K.
Then analyze the collected traces (a scripted sketch of this analysis follows the list):

1. Count the write DRAM hit number in the first portion of the total accesses for each run by isolating DRAM accesses from the others (the DRAM CCT/qCCT is generally much smaller than the others).
2. Choose the maximum of each count.
3. Calculate the hit numbers and the corresponding actual cache size.
4. Find the turning point, which provides a hint of the cluster size.
5. Refine the turning point by narrowing the region. For example, if the turning point is within [256K, 512K], then more points, such as 300K, 400K, and 500K, may be used.
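A minimal sketch of Steps 1-4, assuming the per-run CCT samples have already been extracted for each request size (the data layout, the 1ms DRAM threshold, and the 20% drop heuristic are assumptions):

```python
# Sketch of the turning-point analysis (Steps 1-4). `runs` maps a request size
# in KB to a list of per-run CCT traces (each a list of CCTs in ms, in arrival
# order).
DRAM_MS = 1.0

def leading_dram_hits(ccts, threshold=DRAM_MS):
    """Count consecutive DRAM-speed completions at the start of one run (Step 1)."""
    hits = 0
    for cct in ccts:
        if cct >= threshold:
            break
        hits += 1
    return hits

def cache_size_estimates(runs):
    """Return {size_kb: (max_hit_count, estimated_cache_size_mb)} (Steps 2-3)."""
    result = {}
    for size_kb, traces in sorted(runs.items()):
        best = max(leading_dram_hits(t) for t in traces)   # Step 2
        result[size_kb] = (best, best * size_kb / 1024.0)  # Step 3
    return result

def turning_point(estimates):
    """First request size whose hit count drops noticeably (Step 4)."""
    sizes = sorted(estimates)
    for prev, cur in zip(sizes, sizes[1:]):
        if estimates[cur][0] < 0.8 * estimates[prev][0]:  # 20% drop heuristic
            return prev, cur
    return None
```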

IOMeter traces for SSHD

1K request trace details

512K request trace details

1024K request trace details
Comparison Under WCE
| SSHD | | | CMR A | | |
|---|---|---|---|---|---|
| Request size | Counted number | Cache size | Request size | Counted number | Cache size |
| 1K | 98 | 0.1M | 1K | 98 | 0.1M |
| 4K | 98 | 0.4M | 4K | 98 | 0.4M |
| 16K | 97 | 1.5M | 16K | 99 | 1.5M |
| 64K | 102 | 6.4M | 64K | 102 | 6.4M |
| 128K | 103 | 12.9M | 128K | 104 | 12.9M |
| 256K | 111 | 27.8M | 256K | 110 | 27.8M |
| 512K | 115 | 57.5M | 512K | 66 | 33M |
| 520K | - | - | 880K | 35 | 30M |
| 900K | - | - | 900K | 42 | 36.9M |
| 1024K | N.A. | - | 1000K | 36 | 36M |
Two Cases Under WCD of SSHD
| SSHD (WCD) | Test 1 | | Test 2 | |
|---|---|---|---|---|
| Request size | Counted number | Cache size | Counted number | Cache size |
| 1K | 98 | 0.1M | 101 | 101K |
| 4K | 100 | 0.4M | 100 | 400K |
| 16K | 99 | 1.5M | 100 | 1600K |
| 64K | 101 | 6.3M | 101 | 6464K |
| 128K | 54 | 6.75M | 54 | 6.75M |
| 256K | 26 | 6.5M | 26 | 6.5M |
| 512K | 12 | 6.0M | 13 | 6.5M |
| 520K | - | - | 12 | 6.1M |
From the turning point, you may also guess the cache cluster/segment size. For example, the SSHD’s write cluster size under WCD is around 64K, while CMR A’s is around 256KB. The SSHD uses up to 60MB of DRAM space as write cache under WCE, while only around 8MB with some 100 segments is used under WCD.
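As a quick check against the tables, the estimated cache size is simply the counted hit number multiplied by the request size; the rows near each turning point give roughly the figures quoted above:

$$
\text{cache size} \approx N_{\text{hit}} \times S_{\text{request}},\qquad
115 \times 512\,\text{KB} \approx 57.5\,\text{MB (WCE)},\qquad
101 \times 64\,\text{KB} \approx 6.3\,\text{MB (WCD)}.
$$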
Access Isolation
1. Send 100/200/256/257/etc. 8K requests to the SSHD, repeat 20-100 times for each number, and refine the number of commands to be sent according to the access pattern.
2. Suppose the turning point is X. Send X random read commands with sizes of 16K, 32K, ..., and 1024K to the SSHD, and find the cluster size according to the turning point.

Steady state of response time

100 8K random reads, repeated 20 times

250 8K random reads, repeated 100 times

260 8K random reads, repeated 100 times
Thus, you may guess that 256 could be the maximum read segment number, as no destage happens as long as this maximum segment number is not exceeded. Note that you can find the destage pattern by testing different time intervals and sizes.
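One way to make the destage pattern visible is to compute, per repetition round, the fraction of reads that complete slower than cache speed: if the number of distinct requests stays within the segment limit, later rounds should become almost pure cache hits, while exceeding the limit keeps forcing destages. A minimal sketch, assuming per-round lists of read CCTs in milliseconds (the 1ms threshold and the data layout are assumptions):

```python
# Fraction of media-speed reads per round; `rounds` is a list of rounds, each a
# non-empty list of read CCTs in ms extracted from the trace.
CACHE_MS = 1.0

def slow_read_fraction(rounds):
    """Return, for each round, the fraction of reads slower than the cache threshold."""
    return [sum(1 for cct in r if cct >= CACHE_MS) / len(r) for r in rounds]

# Usage idea: compare the tails of the curves for 250 vs. 260 requests per round.
# If the 250-request curve decays toward ~0 while the 260-request curve stays
# high, the read segment limit lies between the two counts.
```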

Read response time pattern over repeated rounds
Statistics on Response Time (Based on 40-90 Rounds)
| 260 requests | Statistic | CCT (ms) | qCCT (ms) | TtoD (ms) |
|---|---|---|---|---|
| Overall | Mean | 0.227 | 0.227 | 0.212 |
| | Std. | 0.096 | 0.096 | 0.096 |
| SSD | Mean | 0.279 | 0.279 | 0.264 |
| | Std. | 0.031 | 0.031 | 0.032 |
| DRAM | Mean | 0.063 | 0.063 | 0.048 |
| | Std. | 0.001 | 0.001 | 0.001 |
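The per-medium rows of this table can be reproduced by splitting the read completions on response time, since the DRAM (~0.06 ms) and SSD (~0.28 ms) populations barely overlap. A minimal sketch, assuming the samples are (CCT, qCCT, TtoD) tuples in milliseconds and using an assumed 0.15 ms cut-off between the two clusters:

```python
from statistics import mean, stdev

# 0.15 ms sits between the DRAM (~0.06 ms) and SSD (~0.28 ms) clusters; the
# cut-off and the tuple layout (cct, qcct, ttod in ms) are assumptions.
DRAM_CUTOFF_MS = 0.15

def summarize(samples):
    """Mean and std of CCT/qCCT/TtoD, overall and split by DRAM vs. SSD hits."""
    groups = {"Overall": samples,
              "DRAM": [s for s in samples if s[0] < DRAM_CUTOFF_MS],
              "SSD": [s for s in samples if s[0] >= DRAM_CUTOFF_MS]}
    table = {}
    for name, rows in groups.items():
        cols = list(zip(*rows))  # -> (ccts, qccts, ttods)
        table[name] = [(mean(c), stdev(c)) for c in cols]
    return table
```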
SMR
In drive-managed SMR, the drive manages all data accesses. Data management is complicated, similar to the FTL (flash translation layer) of an SSD, so the handling of metadata, GC (garbage collection), over-provisioning, variable performance, etc., is all inside the drive. However, no host-side changes are required, so the drive can be used as a normal one. Currently, all major SMR drives available on the market fall into this category.
In host-managed SMR, the host manages most data-related accesses via an SMR-specific file system, similar to a flash file system. Data management is complicated but can leverage mature file systems that write sequentially. A few examples are SFS [58], HiSMRfs [59], and Shingledfs [5]. Although mixed drive-host management is also possible, it is rare in practice.
SMR Characteristics vs. Workload Metrics
| SMR characteristic | SMR expectation | Workload metrics | SMR impact |
|---|---|---|---|
| Sequential write | Good for large sequential write requests | Average write request size and distribution; seek distance (LBA); sequential and near-sequential streams | The larger the size, the better; the smaller the seek distance, the more sequential; the more streams, the more sequential |
| Write once, read many | Good for fewer updates and more reads | Read/write ratio; read-on-write (ROW) hit ratio; write update ratio | The higher the read/write (ROW) ratio, the better; the smaller the write update ratio, the better |
| Garbage collection (GC) | Smaller write amplification and less GC | Device utilization, device idle time distribution, queue length; IOPS, throughput; frequented/timed/stacked write update ratio (WUR); write-on-write (WOW) hit distribution and ratio | Long and frequent idle time allows GC; a low write update ratio indicates that less GC is required |
| Sequential read to random write | Less read performance impact due to indirect mapping, e.g., sequential LBA read requests served from random physical addresses | Read-on-write (ROW) hit/size distribution and ratio | The higher the small (large) read to small (large) write ratio, the better |
| In-place or out-of-place update | Frequent and recent updates need a random access zone (RAZ)/SSD/large DRAM buffer to hold write data | Stacked write update ratio | The higher the ratio in a shorter stack, the more necessary an in-place update buffer |
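Several of the metrics in this table reduce to a single pass over a block trace. The sketch below, assuming hypothetical (op, lba, nblks) records, computes three of them: the read/write ratio, the write update ratio (writes that touch already-written blocks), and a rough count of sequential write streams (a new stream starts whenever a write does not begin where an earlier write ended).

```python
# Sketch of three SMR-relevant workload metrics over a block trace of
# (op, lba, nblks) records; the record layout is an assumption.

def smr_metrics(trace):
    reads = writes = updates = streams = 0
    written = set()       # LBAs written so far (coarse, but fine for a sketch)
    stream_tails = set()  # LBA immediately after the end of each write stream
    for op, lba, nblks in trace:
        if op == "R":
            reads += 1
            continue
        writes += 1
        blocks = range(lba, lba + nblks)
        if any(b in written for b in blocks):
            updates += 1               # write update: rewrites existing data
        written.update(blocks)
        if lba in stream_tails:
            stream_tails.discard(lba)  # extends an existing sequential stream
        else:
            streams += 1               # starts a new stream
        stream_tails.add(lba + nblks)
    return {"read_write_ratio": reads / max(writes, 1),
            "write_update_ratio": updates / max(writes, 1),
            "sequential_streams": streams}
```

The frequented/timed/stacked variants of the write update ratio follow the same pattern, with extra bookkeeping over access counts, time windows, or stack distances.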