Index
A
accelerator_selector
Accessors
See Buffers, accessors
Actions
Address spaces
Ahead-of-time (AOT) compilation
vs. just-in-time (JIT)
all_of function
Amdahl’s Law
Anonymous function objects
See Lambda function
any_of function
Asynchronous errors
Asynchronous Task Graphs
atomic_fence function
Atomic operations
atomic_fence
atomic_ref class
data race
device-wide synchronization
std::atomic class
std::atomic_ref class
Unified Shared Memory
B
Barrier function
in ND-range kernels
in hierarchical kernels
Broadcast function
Buffers
access modes
accessors
context_bound
host memory
use_host_ptr
use_mutex
build_with_kernel_type
Built-in functions
C
Central Processing Unit (CPU)
Choosing devices
Collective functions
broadcast
load and store
shuffles
vote
Command group (CG)
actions
event-based dependences
execution
Communication
work-group local memory
work-items
Compilation model
Concurrency
Concurrent Execution
copy method
CPU execution
cpu_selector
CUDA code
Custom device selector
D
Data management
buffers
explicit
images
implicit
strategy selection
USM
advantage of
allocations
explicit data movement
implicit data movement
malloc
unified virtual address
Data movement
explicit
implicit
graph scheduling
memcpy
migration
Data parallelism
basic data-parallel kernels
id class
item class
parallel_for function
range class
hierarchical kernels
h_item class
parallel_for_work_group function
parallel_for_work_item function
private_memory class
loops vs. kernels
multidimensional kernels
ND-range kernels
group class
local accessors
nd_item class
nd_range class
sub_group class
sub-groups
work-groups
work-items
Data-parallel programming
Debugging
kernel code
parallel programming errors
runtime error
default_selector
depends_on()
Device code
Device information
custom device selectors
device queries
kernel queries
Device selection
Directed acyclic graph (DAG)
Direct Programming
Download code
E
Error handling
Event
Extension and specialization mechanism
F
Fallback
Fences
Fencing memory
Field Programmable Gate Arrays (FPGAs)
ahead-of-time compilation
building blocks
look-up tables
math engines
off-chip hardware
on-chip memory
routing fabric
compilation time
customized memory systems
custom memory systems
memory access
optimization
stages
static coalescing
custom operations/operation widths
emulation
pipes
First-in first-out (FIFO)
fpga_selector
FPGA emulation
Functions, built-in
functors
See Named function objects
G
get_access
get_global_id()
get_info
get_local_id()
get_pointer_type
GitHub
gpu_selector
Graphics Processing Units (GPUs)
building blocks
caches and memory
execution resources
fixed functions
device_selector
fp16
fast math functions
half-precision floating-point
predication
masking
offloading kernels
abstraction
cost of
software drivers
SYCL runtime library
profiling kernels
Graph scheduling
command group
actions
event-based dependences
host synchronization
GPU
See Graphics Processing Units
group class
Group functions
Gustafson
H
Handler class
Heterogeneous Systems
Hierarchical parallelism
Host code
Host device
development and debugging
fallback queue
host_selector
I
id class
In-order queues
Initializing data
Initiation interval
Intermediate representation (IR)
Interoperability
item class
J
Just-in-time (JIT)
vs. ahead-of-time (AOT)
K
Kernels
advantages and disadvantages
interoperability
API-defined objects
API-defined source
functionality
implementation
lambda functions
definition
elements
name template parameter
named function objects
definition
elements
in program objects
L
Lambda function
Latency and Throughput
Libraries
built-in functions
common functions
geometric functions
host and device
integer functions
math functions
relational functions
load() member function
Local Accessor
Local Memory
in ND-Range kernels
in hierarchical kernels
Loop initiation interval
Loop pipelining
M
malloc functions
Map pattern
mem_advise()
memcpy
Memory allocation
Memory consistency
Memory Fence
Memory model
barriers and fences
C++ and SYCL/DPC++
data races and synchronization
definition
memory consistency
memory_order enumeration class
memory_scope enumeration class
ordering
querying device capabilities
memory_order enumeration class
memory_scope enumeration class
memset function
Multiarchitecture binaries
Multidimensional Kernels
Multiple translation units
N
Named function objects
ND-range kernels
example
O
oneAPI DPC++ Library (oneDPL)
Out-of-order (OoO) queues
P
Pack
parallel_for
parallel_for_work_group function
parallel_for_work_item function
Parallel patterns
map
pack
properties
reduction
scan
stencil
unpack
Parallel STL (PSTL)
algorithms
DPC++ execution policy
dpstd::binary_search algorithm
FPGA execution policy
requirements
std::fill function
USM
Pipes
Pipeline parallelism
Platform model
compilation model
host device
multiarchitecture binary
SYCL and DPC++
Portability
prefetch()
Program build options
Q
Queries
device information
kernel information
local memory type
memory model
unified shared memory
Queues
binding to a device
definition
device_selector class
multiple queues
R
Race Condition
Reduction library
Reduction patterns
Run time type information (RTTI)
S
Sample code download
Scaling
Scan patterns
Selecting devices
set_final_data
set_write_back
shared allocation
Shuffle functions
Single Program, Multiple Data (SPMD)
Single-Source
Standard Template Library (STL)
std::function
Stencil pattern
store() member function
Sub-Groups
compiler optimizations
loads and stores
sub_group class
SYCL versions
Synchronous errors
T
Task graph
DAG
disjoint dependence
execution
explicit dependences
implicit dependences
in-order queue object
OoO queues
simple task graph
Throughput and Latency
throw_asynchronous()
Translation units
try-catch structure
U
Unified shared memory (USM)
aligned_malloc functions
allocations
data initialization
data movement
See Data movement
definition
device allocation
explicit data movement
host allocation
implicit data movement
malloc
unified virtual address
memory allocation
C++ allocator-style
C++-style
C-style
deallocation
new, malloc, or allocators
queries
shared allocation
Unnamed function objects
See Lambda function
Unpack patterns
update_host method
V
vec class
Vectors
explicit vector code
features and hardware
load and store operations
swizzle operations
vote functions
any_of function
all_of function
W, X, Y, Z
wait()
wait_and_throw()
Work Groups
Work-group local memory
Work-Item