Index
Cover
Title
Copyright
About ApressOpen
Dedication
Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Part 1: Hardware Foundation: Intel Xeon Phi Architecture
Chapter 1: Introduction to Xeon Phi Architecture
History of Intel Xeon Phi Development
Evolution from Von Neumann Architecture to Cache Subsystem Architecture
Improvements in the Core and Memory
Interconnect and Cache Improvements
Intel Xeon Phi Coprocessor Chip Architecture
Applicability of the Intel Xeon Phi Coprocessor
Summary
Chapter 2: Programming Xeon Phi
Intel Xeon Phi Execution Models
Development Tools for Intel Xeon Phi Architecture
Intel Composer XE
Setting Up an Intel Xeon Phi System
Install the MPSS Stack
Install the Development Tools
Code Generation for Intel Xeon Phi Architecture
Native Execution Mode
Language Extensions to Support Offload Computation on Intel Xeon Phi
Heterogeneous Computing Model and Offload Pragmas
Language Extensions and Execution Model
Runtime Library Routines
Offload Example
Summary
Chapter 3: Xeon Phi Vector Architecture and Instruction Set
Xeon Phi Vector Microarchitecture
The VPU Pipeline
Vector Registers
Vector Mask Registers
Extended Math Unit
Xeon Phi Vector Instruction Set Architecture
Data Types
Vector Nomenclature
Vector Instruction Syntax
Xeon Phi Vector ISA by Categories
Summary
Chapter 4: Xeon Phi Core Microarchitecture
Intel Xeon Phi Cores
Core Pipeline Stages
Cache and TLB Structure
L2 Cache Structure
Multithreading
Performance Considerations
Probing the Core
Summary
Chapter 5: Xeon Phi Cache and Memory Subsystem
The Interconnect Topologies for Manycore Processors
Bidirectional Ring Topology
Two-Dimensional Mesh Topology
Two-Dimensional Torus Topology
Other Topologies
The Ring Interconnect Architecture in Intel Xeon Phi
L2 Cache
Tag Directory
Data Transactions
The Cache Coherency Protocol
Hardware Prefetcher
Memory Transactions Flow
Cacheable Memory Read Transaction
Managing Cache Hierarchy in Software
Probing the Memory Subsystem
Measuring the Memory Bandwidth on Intel Xeon Phi
Summary
Chapter 6: Xeon Phi PCIe Bus Data Transfer and Power Management
DMA Engine
Measuring the Data Transfer Bandwidth over the PCIe Bus
Reading Data from the Coprocessor
Low-Level Data Transfer APIs for Intel Xeon Phi
Placement of PCIe Cards for Optimal Data Transfer BW
Power Management and Reliability
Idle State Management
Reliability, Availability, and Serviceability Features in the Intel Xeon Phi Coprocessor
Summary
Part 2: Software Foundation: Intel Xeon Phi System Software and Tools
Chapter 7: Xeon Phi System Software
System Software Component
Ring 0 Driver Layer
Components of the MPSS
System Boot Process
Coprocessor OS
Creating a Third-Party Coprocessor OS
mic0: Transition from State Booting to Online
Host Driver
Linux Virtual File System (Sysfs and Procfs)
Networking on Xeon Phi
Network File System
Open Fabrics Enterprise Distribution and Message Passing Interface Support
System Software Application Components
Summary
Chapter 8: Xeon Phi Application Development Tools
The Application Development Tools
Intel C/C++ Composer XE
OpenMP 4.0 and Language Extensions
Pragmas
Asynchronous Data Transfer Over PCI Express
Keywords
Using Shared Virtual Memory
Valid Use of the Keywords
Macros
Intrinsics
C++ Class Libraries
Application Programming Interfaces
Environment Variables
Compiler Options
Creating Offload Libraries
Intel Fortran Composer XE
Directives
Macros
Application Programming Interfaces
Environment Variables, Compiler Options, and Creating Static Libraries
Third-Party Compilers Supporting Xeon Phi
CAPS Compiler
Debugging Xeon Phi Applications
Intel Debugger
Third-Party Debuggers
Optimization Tool: Intel VTune Amplifier XE
Libraries
Native or Symmetric Execution
Compiler-Assisted Offload
Using the Automatic Offload Version of the MKL Library
Third-Party Math Libraries
Intel Cluster Tools
Third-Party Cluster Tools
Summary
Part 3: Applications: Technical Computing Software Development on Intel Xeon Phi
Chapter 9: Xeon Phi Application Design and Implementation Considerations
Workload-Related Considerations
Gustafson’s Law Scaled Speedup
Effect of Grid Shape on Performance
Algorithm Considerations
Data Structure
Offload Overhead
Load Balancing
Implementation Considerations
Memory Management
Mixed-Precision Arithmetic
Optimizing Memory Transfer Bandwidth over the PCIe Bus
Data Alignment Considerations
Communication
Summary
Chapter 10: Application Performance Tuning on Xeon Phi
Getting Baseline Data
Timing Applications
Detecting Application Execution Bottlenecks
Some Basic Performance Events
Setting Target Performance
Optimizing Code
Compiler-Driven Optimizations
Data Alignment
Removing Pointer Aliasing
Streaming Store
Using Large Pages
Using Intel Cilk Plus Array Notation
Vectorization with Intel Compiler
Using the Math Kernel Library
Cluster-Level Tuning
Summary
Chapter 11: Algorithm and Data Structures for Xeon Phi
Algorithm and Data Structure Design Rules for Xeon Phi
General Matrix-Matrix Multiply Algorithm (GEMM)
Rules 1 and 3: Scalable Parallelization and Optimal Cache Reuse
Rule 2: Efficient Vectorization
Molecular Dynamics
Rule 1: Scalable Parallelization
Rules 2 and 3: Efficient Vectorization and Optimal Cache Reuse
Stencil Operation
Rule 1: Scalable Parallelization
Rule 2: Efficient Vectorization
Rule 3: Optimal Cache Reuse
European Option Pricing Using Monte Carlo Simulation in Financial Applications
Rule 1: Scalable Parallelization
Rule 2: Efficient Vectorization
Rule 3: Optimal Cache Reuse
Summary
Chapter 12: Xeon Phi Application Development on Windows OS
MPSS
MPSS Tools
Development Tools
Language Extensions for the Xeon Phi Coprocessor
Offload Environment Variables
Debugging Offload Execution
Logging into Xeon Phi Console using PuTTY
Using VTune Amplifier XE to Profile Offload Code on Windows
Building and Running Xeon Phi Native Applications from the Windows Host
Summary
Appendix A: OpenCL on Xeon Phi
Installation
Building and Running OpenCL Application
Performance Optimization
Appendix B: Virtual Shared Memory Programming on Xeon Phi
Placing Data on the Virtual Shared Memory Region
Shared Functions
Synchronizing Between the Host and the Coprocessors
Index