Index
Python and HDF5
Preface
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
1. Introduction
Python and HDF5
Organizing Data and Metadata
Coping with Large Data Volumes
What Exactly Is HDF5?
HDF5: The File
HDF5: The Library
HDF5: The Ecosystem
2. Getting Started
HDF5 Basics
Setting Up
Python 2 or Python 3?
Code Examples
NumPy
HDF5 and h5py
IPython
Timing and Optimization
The HDF5 Tools
HDFView
ViTables
Command Line Tools
Your First HDF5 File
Use as a Context Manager
File Drivers
core driver
family driver
mpio driver
The User Block
3. Working with Datasets
Dataset Basics
Type and Shape
Reading and Writing
Creating Empty Datasets
Saving Space with Explicit Storage Types
Automatic Type Conversion and Direct Reads
Reading with astype
Reshaping an Existing Array
Fill Values
Reading and Writing Data
Using Slicing Effectively
Start-Stop-Step Indexing
Multidimensional and Scalar Slicing
Boolean Indexing
Coordinate Lists
Automatic Broadcasting
Reading Directly into an Existing Array
A Note on Data Types
Resizing Datasets
Creating Resizable Datasets
Data Shuffling with resize
When and How to Use resize
4. How Chunking and Compression Can Help You
Contiguous Storage
Chunked Storage
Setting the Chunk Shape
Auto-Chunking
Manually Picking a Shape
Performance Example: Resizable Datasets
Filters and Compression
The Filter Pipeline
Compression Filters
GZIP/DEFLATE Compression
SZIP Compression
LZF Compression
Performance
Other Filters
SHUFFLE Filter
FLETCHER32 Filter
Third-Party Filters
5. Groups, Links, and Iteration: The “H” in HDF5
The Root Group and Subgroups
Group Basics
Dictionary-Style Access
Special Properties
Working with Links
Hard Links
Free Space and Repacking
Soft Links
External Links
A Note on Object Names
Using get to Determine Object Types
Using require to Simplify Your Application
Iteration and Containership
How Groups Are Actually Stored
Dictionary-Style Iteration
Containership Testing
Multilevel Iteration with the Visitor Pattern
Visit by Name
Multiple Links and visit
Visiting Items
Canceling Iteration: A Simple Search Mechanism
Copying Objects
Single-File Copying
Object Comparison and Hashing
6. Storing Metadata with Attributes
Attribute Basics
Type Guessing
Strings and File Compatibility
Python Objects
Explicit Typing
Real-World Example: Accelerator Particle Database
Application Format on Top of HDF5
Analyzing the Data
7. More About Types
The HDF5 Type System
Integers and Floats
Fixed-Length Strings
Variable-Length Strings
The vlen String Data Type
Working with vlen String Datasets
Byte Versus Unicode Strings
Using Unicode Strings
Don’t Store Binary Data in Strings!
Future-Proofing Your Python 2 Application
Compound Types
Complex Numbers
Enumerated Types
Booleans
The array Type
Opaque Types
Dates and Times
8. Organizing Data with References, Types, and Dimension Scales
Object References
Creating and Resolving References
References as “Unbreakable” Links
References as Data
Region References
Creating Region References and Reading
Fancy Indexing
Finding Datasets with Region References
Named Types
The Datatype Object
Linking to Named Types
Managing Named Types
Dimension Scales
Creating Dimension Scales
Attaching Scales to a Dataset
9. Concurrency: Parallel HDF5, Threading, and Multiprocessing
Python Parallel Basics
Threading
Multiprocessing
MPI and Parallel HDF5
A Very Quick Introduction to MPI
MPI-Based HDF5 Program
Collective Versus Independent Operations
Atomicity Gotchas
10. Next Steps
Asking for Help
Contributing
Index
About the Author
Colophon
Copyright