Block Trace Analysis and Storage System Optimization

Jun Xu

Block Trace Analysis and Storage System Optimization: A Practical Approach with MATLAB/Python Tools


Jun Xu

Singapore, Singapore

ISBN 978-1-4842-3927-8

e-ISBN 978-1-4842-3928-5

https://doi.org/10.1007/978-1-4842-3928-5

Library of Congress Control Number: 2018964058

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

To Grace, Alexander, and Arthur.

Introduction

In the new era of IoT, big data, and cloud systems, better performance and higher density in storage systems have become increasingly crucial for many applications.

To increase data storage density, new techniques have evolved, including shingled magnetic recording (SMR) and heat-assisted magnetic recording (HAMR) for HDDs, and 3D Phase Change Memory (PCM) and Resistive RAM (ReRAM) for SSDs. Furthermore, hybrid and parallel access techniques, together with specially designed IO scheduling and data migration algorithms, have been deployed to develop high-performance data storage solutions.

Among the various storage system performance analysis techniques, IO event trace analysis (block-level trace analysis in particular) is one of the most common approaches for system optimization and design. However, the task of completing a systematic survey is challenging and very few works on this topic exist. Some books provide theoretical fundamentals without enough practical analysis in physical systems, and others discuss the performance of some specific storage systems without proposing a tool that can be applied widely.

To fill this gap, this book brings together IO properties and metrics, trace parsing, and result reporting, based on the MATLAB and Python platforms. It provides self-contained coverage of block-level trace analysis techniques and includes typical case studies to illustrate how these techniques and tools can be applied in real applications such as SSHD, RAID, Hadoop, and Ceph systems.

This book starts with an introduction in Chapter 1, which provides the background of data storage systems and general trace analysis. I show that the wide application of block storage devices motivates the intensive study of various block-level workload properties.

Chapter 2 gives an overview of traces, in particular block-level traces. After introducing common workload properties, I discuss trace metrics in two categories: basic and advanced.

In Chapter 3, I present ways to collect block-level traces using both hardware and software tools. In particular, I show how blktrace, the most popular tool on Linux systems, works in a simple setting.

In Chapter 4, I investigate the design of trace analyzers and discuss the interactions of the workload with system components, algorithms, structure, and applications.

Case studies are the best way to learn the methodology and the corresponding tools. Therefore, in Chapters 5 through 9, I provide typical examples showing how trace analysis can be applied to real storage system tuning, optimization, and design.

Chapter 5 presents the properties of traces from benchmark tools such as SPC-1C and PCMark. I show how to capture their main characteristics and then formulate a “synthetic” trace generator. I also show how the cache is affected by the workload and how a proper scheduling algorithm can be designed.

Chapter 6 attempts to explain the mystery behind the SSHD’s performance boost in SPC-1C under WCD (write cache disabled). Using the trace, I show how a new hybrid structure can help improve system performance.

Chapter 7 discusses the trace under two RAID systems with different read and write properties. I illustrate that the parity structure has a big impact on the overall performance.

Chapter 8 first reviews the literature on Hadoop workload analysis. I then discuss the Western Digital (WD) Hadoop cluster in a production environment and analyze its workload properties, in particular for SMR drives.

Chapter 9 analyzes Ceph system performance, covering storage as well as CPU, network, and memory. I show that these components should be considered as a unified system in order to identify the performance bottleneck.

The tools used in the book are introduced in Appendix A. I first introduce the MATLAB-based tool and then show how it is ported to the Python platform.

Acknowledgments

A major component of this work came as a result of my 16 years of R&D experience on data analytics and storage systems at Western Digital, Temasek Labs, and Data Storage Institute. I would like to acknowledge Western Digital for allowing me to publish some of my job-related work. During the preparation of this book, I received support and advice from many friends and colleagues. Here I mention only a few: Dr. Jie Yu, Dr. Guoxiao Guo, Robin O’Neill, Grant Mackey, Dr. Jianyi Wang, David Chan, Wai-Ee Wong, Dr. Yi Li, Samuel Torrez, Shihua Feng, Jiang Dan, Terry Wu, Allen Samuels, Gregory Thelin, William Boyle, David Hamilton, John Clinton, Nils Larson, Karanvir Singh, Eric Lee, and Sang Huynh. In particular, Junpeng Niu, my PhD student and colleague, also helped me write a few paragraphs in Chapter 1 on hybrid disks.

I would also like to thank the technical reviewers, Yunpeng Chai and Li Xia, for their very helpful comments. Deep appreciation also goes to the editors, Susan McDermott, Rita Fernando, Laura Berendson, Amrita Stanley, Krishnan Sathyamurthy, and Joseph Quatela, for their hard work.

Last but not least, I am most grateful to my wife, Grace, for the love and encouragement provided through my entire life, and to my two boys, Alexander and Arthur, who remind me that there is a life beyond the work. Without their great patience and enthusiastic support, I would not have been able to complete this book.

Chapter 1: Introduction

Basics of Storage

Storage Devices

HDD

SSD

Hybrid Disk

Tape and Disc

Emerging NVMs

Storage Systems

Infrastructure: RAID and EC

Implementation

System Performance Evaluation

Performance vs. Workload

Trace Collection and Analysis

System Optimization

Chapter 2: Trace Characteristics

Workload Properties

Basic Metrics

LBA Distribution

Read/Write Distribution

Inter-Arrival and Inter-Completion Time

IOPS and Throughput

Response Time

Queue Length/Depth

Busy/Idle Time

Advanced Metrics

Sequence vs. Randomness

Spatial Locality and Logical Seek Distance

Temporal Locality and Logical Stack Distance

Statistical Properties Visualization and Evaluation

Read/Write Dependency

Priority-Related Metrics

Modeling Issues

Typical Applications

Traces in File- and Object-Levels

Chapter 3: Trace Collection

Collection Techniques

Hardware Trace Collection

Software Trace Collection

Blktrace

Dtrace, SystemTap, and LTTng

Trace Warehouse

Chapter 4: Trace Analysis

Interactions with Components

HDD Factors

SSD Factors

Interactions with Algorithms

Interactions with Structure

Interactions with Applications

Chapter 5: Case Study: Benchmarking Tools

SPC-1C

Workload Properties

Synthetic Trace

PCMark

Workload Properties

Gain-Loss Analysis

Chapter 6: Case Study: Modern Disks

SSHD

Cache Size

Access Isolation

SMR

Chapter 7: Case Study: RAID

Workload Analysis

System Settings

Read-Dominated Trace

Write-Dominated Trace

Chapter 8: Case Study: Hadoop

Hadoop Cluster

Workload Metrics Evaluation

Block-Level Analysis

System-Level View

Some Further Discussions

Chapter 9: Case Study: Ceph

Filestore IO Pattern

Performance Consistency Verification

Bottleneck Identification

Appendix A: Tools and Functions

MATLAB-Based Tool: MBPAR

Python-Based Tool: PBPAR

Interaction Between MATLAB and Python

Appendix B: Blktrace and Tools

Bibliography

Index

About the Author and About the Technical Reviewers

About the Author

Jun Xu received his BS in Mathematics and his PhD in Control from Southeast University (China) and Nanyang Technological University (Singapore), respectively. He is a Lead Consultant Specialist at The Hongkong and Shanghai Banking Corporation (HSBC) and was previously a Principal Engineer at Western Digital. Before that, he worked in research and development at the Data Storage Institute, Nanyang Technological University, and the National University of Singapore. He has multidisciplinary knowledge and solid experience in complex system modeling and simulation, data analytics, data centers, cloud storage, and IoT. He has published more than 50 international papers, 15 US patents and patent applications, and one monograph. He is an editor of the journal Unmanned Systems and has served as a committee member for several international conferences. He is a senior member of IEEE and a certified FRM (Financial Risk Manager).

About the Technical Reviewers

Yunpeng Chai received his BE and PhD degrees in Computer Science and Technology from Tsinghua University in 2004 and 2009, respectively. He is currently an Associate Professor at the School of Information at Renmin University of China and Vice Dean of the Department of Computer Science and Technology. His research interests include SSD/NVM-based hybrid storage systems, distributed key-value stores, and cloud storage virtualization. He regularly publishes in prestigious journals and conferences such as IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, and MSST. He is a member of the Information Storage Technology Expert Committee of the China Computer Federation.

Li Xia is an Associate Professor at the Center for Intelligent and Networked Systems (CFINS), Department of Automation, Tsinghua University, Beijing, China. He received his BS and PhD degrees in Control Theory in 2002 and 2007, respectively, both from Tsinghua University. After graduation, he worked at IBM Research China as a research staff member (2007–2009) and at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia as a postdoctoral research fellow (2009–2011). He returned to Tsinghua University in 2011. He has been a visiting scholar at Stanford University, the Hong Kong University of Science and Technology, and other institutions. He has served as an associate editor and program committee member for a number of international journals and conferences. His research interests include methodological research in stochastic learning and optimization, queueing theory, Markov decision processes, and reinforcement learning, as well as applied research in storage systems, building energy, the energy Internet, the industrial Internet, and the Internet of Things. He is a senior member of IEEE.
