Index
  • Cover image
  • Title page
  • Table of Contents
  • Copyright
  • Foreword
  • Preface
  • Organization
  • Lots-of-cores.com
  • Acknowledgements
Chapter 1. Introduction
  • Trend: more parallelism
  • Why Intel® Xeon Phi™ coprocessors are needed
  • Platforms with coprocessors
  • The first Intel® Xeon Phi™ coprocessor
  • Keeping the “Ninja Gap” under control
  • Transforming-and-tuning double advantage
  • When to use an Intel® Xeon Phi™ coprocessor
  • Maximizing performance on processors first
  • Why scaling past one hundred threads is so important
  • Maximizing parallel program performance
  • Measuring readiness for highly parallel execution
  • What about GPUs?
  • Beyond the ease of porting to increased performance
  • Transformation for performance
  • Hyper-threading versus multithreading
  • Coprocessor major usage model: MPI versus offload
  • Compiler and programming models
  • Cache optimizations
  • Examples, then details
  • For more information
Chapter 2. High Performance Closed Track Test Drive!
  • Looking under the hood: coprocessor specifications
  • Starting the car: communicating with the coprocessor
  • Taking it out easy: running our first code
  • Starting to accelerate: running more than one thread
  • Pedal to the metal: hitting full speed using all cores
  • Easing in to the first curve: accessing memory bandwidth
  • High speed banked curve: maximizing memory bandwidth
  • Back to the pit: a summary
Chapter 3. A Friendly Country Road Race
  • Preparing for our country road trip: chapter focus
  • Getting a feel for the road: the 9-point stencil algorithm
  • At the starting line: the baseline 9-point stencil implementation
  • Rough road ahead: running the baseline stencil code
  • Cobblestone street ride: vectors but not yet scaling
  • Open road all-out race: vectors plus scaling
  • Some grease and wrenches!: a bit of tuning
  • Summary
  • For more information
Chapter 4. Driving Around Town: Optimizing A Real-World Code Example
  • Choosing the direction: the basic diffusion calculation
  • Turn ahead: accounting for boundary effects
  • Finding a wide boulevard: scaling the code
  • Thunder road: ensuring vectorization
  • Peeling out: peeling code from the inner loop
  • Trying higher octane fuel: improving speed using data locality and tiling
  • High speed driver certificate: summary of our high speed tour
Chapter 5. Lots of Data (Vectors)
  • Why vectorize?
  • How to vectorize
  • Five approaches to achieving vectorization
  • Six step vectorization methodology
  • Streaming through caches: data layout, alignment, prefetching, and so on
  • Compiler tips
  • Compiler options
  • Compiler directives
  • Use array sections to encourage vectorization
  • Look at what the compiler created: assembly code inspection
  • Numerical result variations with vectorization
  • Summary
  • For more information
Chapter 6. Lots of Tasks (not Threads)
  • OpenMP, Fortran 2008, Intel® TBB, Intel® Cilk™ Plus, Intel® MKL
  • OpenMP
  • Fortran 2008
  • Intel® TBB
  • Cilk Plus
  • Summary
  • For more information
Chapter 7. Offload
  • Two offload models
  • Choosing offload vs. native execution
  • Language extensions for offload
  • Using pragma/directive offload
  • Using offload with shared virtual memory
  • About asynchronous computation
  • About asynchronous data transfer
  • Applying the target attribute to multiple declarations
  • Performing file I/O on the coprocessor
  • Logging stdout and stderr from offloaded code
  • Summary
  • For more information
Chapter 8. Coprocessor Architecture
  • The Intel® Xeon Phi™ coprocessor family
  • Coprocessor card design
  • Intel® Xeon Phi™ coprocessor silicon overview
  • Individual coprocessor core architecture
  • Instruction and multithread processing
  • Cache organization and memory access considerations
  • Prefetching
  • Vector processing unit architecture
  • Coprocessor PCIe system interface and DMA
  • Coprocessor power management capabilities
  • Reliability, availability, and serviceability (RAS)
  • Coprocessor system management controller (SMC)
  • Benchmarks
  • Summary
  • For more information
Chapter 9. Coprocessor System Software
  • Coprocessor software architecture overview
  • Coprocessor programming models and options
  • Coprocessor software architecture components
  • Intel® manycore platform software stack
  • Linux support for Intel® Xeon Phi™ coprocessors
  • Tuning memory allocation performance
  • Summary
  • For more information
Chapter 10. Linux on the Coprocessor
  • Coprocessor Linux baseline
  • Introduction to coprocessor Linux bootstrap and configuration
  • Default coprocessor Linux configuration
  • Changing coprocessor configuration
  • The micctrl utility
  • Adding software
  • Coprocessor Linux boot process
  • Coprocessors in a Linux cluster
  • Summary
  • For more information
Chapter 11. Math Library
  • Intel Math Kernel Library overview
  • Intel MKL and Intel compiler
  • Coprocessor support overview
  • Using the coprocessor in native mode
  • Using automatic offload mode
  • Using compiler-assisted offload
  • Precision choices and variations
  • Summary
  • For more information
Chapter 12. MPI
  • MPI overview
  • Using MPI on Intel® Xeon Phi™ coprocessors
  • Prerequisites (batteries not included)
  • Offload from an MPI rank
  • Using MPI natively on the coprocessor
  • Summary
  • For more information
Chapter 13. Profiling and Timing
  • Event monitoring registers on the coprocessor
  • Efficiency metrics
  • Potential performance issues
  • Intel® VTune™ Amplifier XE product
  • Performance application programming interface
  • MPI analysis: Intel Trace Analyzer and Collector
  • Timing
  • Summary
  • For more information
Chapter 14. Summary
  • Advice
  • Additional resources
  • Another book coming?
  • Feedback appreciated
Glossary
Index