Index

A

A-B-A problem
affinity_partitioner
Algorithms
parallel_deterministic_reduce
parallel_do
parallel_for
parallel_for_each
parallel_invoke
parallel_pipeline
parallel_reduce
parallel_scan
parallel_sort
pipeline
Algorithm vs. patterns
alignas() method
aligned_space
async_node
Atomic variables
auto_partitioner
available_devices() function

B

blocked_range
blocked_range2d
blocked_range3d
blocked_rangeNd
Buffering nodes

C

Cache
Cache affinity
cache_aligned_allocator template
Cache lines
Cache-oblivious algorithm
Coarse-grained locking
combinable<T> object
combine_each() function
compare_and_swap (CAS)
Composability
NCR (see Non-Composable Runtime (NCR))
nested parallelism
TBB
parallelism
thread pool (the market) and arenas
work stealing
work isolation
composite_node
Concurrent containers
Contention
Context switching
Continuation task
blocking style
continuation-passing style
ref_count
scheduler bypass
Control flow nodes

D

Data Analytics Acceleration Library (DAAL)
Data parallelism
Data placement
Data placement and processor affinity, NUMA
hwloc_alloc_membind
hwloc_get_obj_by_type
hwloc library
hwloc_set_cpubind
nodes, bind
numa_node
Data structures
associative containers
hashing
map vs. set
multiple values
unordered
unordered associative containers
Data type definitions, triad
Deadlock
Debugging macros
Dependency graphs
continue_node objects
edges
implementation
addEdges functions
continue_node objects
createNode function
dependencies
forward substitution
parallel_reduce
serial blocked code
serial tiled implementation
synchronization points
scalability
Design patterns
Determinism
Device Filters
Device selector
Divide-and-conquer pattern
3D stereoscopic images
DYLD_INSERT_LIBRARIES
Dynamic memory interface replacement
Dynamic priorities

E

Embarrassing parallelism
enumerable_thread_specific (ETS) class
combine_each() function
parallel histogram computation
reduction
Environment variable
ETS, see enumerable_thread_specific (ETS) class
Event-based coordination pattern
Exception handling
example
tbb_exception and movable_exception classes

F

Fair mutexes
False sharing
alignas() method
histogram vector
jemalloc and tcmalloc
padding
Fibonacci Sequence
Fine-grained locking
convoying
deadlock
oversubscription
Fire-and-forget tasks
First-in-first-out (FIFO)
Floating-point settings
Floating-point types
Flow graph
Flow Graph Analyzer (FGA) tool
for_each
Fork-join layer
TBB library
Fork-join pattern
Forward substitution
Functional parallelism
See also Task parallelism

G

Generic algorithms
GPU kernel execution
Grainsize
Graph object

H

Hard thread limits, see Thread limits
Hardware Transactional Memory (HTM)
Hash functions
Hash maps
Heterogeneous triad computation
Heterogeneous triad, flow graph
High-Performance Computing (HPC)
Huge pages
hwloc
hwloc and TBB
baseline triad implementation
on_scheduler_entry
PinningObserver class

I

Imbalanced pipeline
Integrated Development Environment (IDE)
Intel Advisor tool
Isolate function
for correctness
deadlock
nested parallelism
task_arena
in task arenas
correctness issues
namespace

J

jemalloc
join_node
Join nodes

K

Kernel arguments
key() function

L

Lambda expressions
Lambda expressions vs. user-defined classes
LD_PRELOAD environment variable
libnuma
Lightweight policy
likwid
likwid-bench
likwid-perfctr
limiter_node
Linear pipeline
Line of sight problem
llalloc
Local allocation policy
Locality
Lock-free techniques
Locking
Lock preemption
Low-level implementation of a wavefront
data dependence flow
2D wavefront pattern
parallelization strategy
recycling
sequential version
task-based implementation
Low-level tasking interface
lscpu
lstopo

M

Macros
malloc
Linux
macOS
Windows
map/multimap and set/multiset interfaces
Map pattern
Map vs. set
Math Kernel Library (MKL)
max_number_of_live_tokens
Memory allocation
replacing new and delete
Memory allocation/deallocation
Memory allocators
allocator concept
functions
memory_pool and fixed_pool
memory pool concept
special controls
template classes
memory_pool_allocator
Message-driven layer
Message passing
multifunction_node
Multiresolution
Mutex
Mutex flavors
Mutual exclusion

N

NDRange concept
Nested composition
Nested parallelism
composability
Nesting pattern
New/delete operators
new operators, replacing
Node granularity
FG loop function
FG loop per worker function
master loop function
serial loop function
Nodes
Non-Composable Runtime (NCR)
concurrent executions
construct processes
two-level deep nesting
Non-preemptive priorities
in task class
priority inversion
priority levels
task execution
thread priorities
threads
Non-Uniform Memory Access (NUMA)
locality
note_affinity function
numactl command

O

OpenCL
NDRange
streaming_node
opencl_buffer.begin() function
opencl_buffer.data() member function
opencl_device
opencl_program
OpenMP
composability
NCR
Ordering issues

P

Padding
Parallel Continuation Machine (PCM)
parallel_deterministic_reduce
parallel_do algorithm
Parallel execution
parallel_for algorithm
parallel_for_each algorithm
parallel_invoke algorithm
Parallel loop/pipeline
Parallel patterns vs. parallel algorithms
parallel_pipeline algorithm
filter_t
flow_control
Hello, World example
parallel_pipeline function
parallel_policy
Parallel programming, patterns
parallel_reduce algorithm
parallel_scan algorithm
parallel_sort algorithm
Parallel STL
parallel_unsequenced_policy
Partitioners
par_unseq execution policy
Patterns
algorithm structures
branch-and-bound
data parallelism
design patterns
divide-and-conquer
event-based coordination
finding concurrency
fork-join
implementation mechanisms
map pattern
nesting
parallel patterns vs. parallel algorithms
parallel programming
pipeline
reduce pattern
scan operation
scan pattern
supporting structures
TBB templates
workpile pattern
Performance portability (portable)
Pipeline
tbb::parallel_pipeline
Pipeline parallelism
Pipeline pattern
Portable Hardware Locality (hwloc) package
Precompiled kernel
Preview features
Priorities
algorithms
enqueued tasks
flow graph
task_group_context
Priority inversion
Privatization
histogram computation
Processor affinity
Proportional splitting constructor
Proxy methods
environment variables
functions
Linux
macOS
routines
tbb_mem.cpp
test program
pstlvars scripts

Q

queueing_lightweight policy
queueing_mutex

R

RandomAccessIterator
Ranges
default constructor
requirements
splitting constructors
Range type
parallel quicksort
quicksort
Recursive mutexes
Recursive implementation
reduce operation
Reduce pattern/map-reduce
associative operations
blocked_range
floating-point types
maximum value loop
numerical integration
rectangular integral method
Reduction
histogram computation
Reduction patterns (reduce and scan)
Reduction template
rejecting_lightweight policy
Relaxed sequential semantics
reset() function
Resource Acquisition Is Initialization (RAII)
Rule of thumb
10,000 cycle
1 microsecond

S

Scalability
analysis
Scalable mutexes
scalable_allocation_command function
scalable_allocation_mode function
scalable_allocator template
Scalable memory allocation
Scaling
Scan pattern
Scheduler
sequenced_policy
sequencer_node
Shared Virtual Memory (SVM)
simple_partitioner
Single Instruction Multiple Data (SIMD)
extensions
layer
operations
parallelism
STL library
Soft thread limit, see Thread limits
source_node
source_node interface
speculative_spin_mutex
SPIR
Splitting constructor
Standard Template Library (STL)
algorithms
Intel’s Parallel
pre-built packages
execution policies
parallel_policy
parallel_unsequenced_policy
sequenced_policy
unsequenced_policy
use of
iterators
SIMD parallelism
std::for_each
std::for_each_n
std::reduce
std::transform
std::transform_reduce
transform_iterator class
static_partitioner
HPC
random work stealing
thread
Static priorities
std::aligned_alloc
std::allocate_shared
std::allocator<T> signature
std::for_each
std::for_each_n
std::make_shared
std::reduce
std::transform
std::transform_reduce
STL containers
Streaming computations
Strong scaling
Synchronization
atomic<T> class
C++11 mutex
image histogram
computation
grayscale picture
mutual exclusion
sequential implementation
mutex
concept
example
scoped locking

T

task_arena
Task arenas
for isolation
abstraction
Double-Edged Sword
isolation for correctness
Task cancellation
Task granularity
task_group class
parallel Fibonacci code
recycling
run() and wait()
task_group_context (TGC)
Task groups
high-level APIs
[structured_]task_group
task_group
Task parallelism
See also Functional parallelism
Task priorities
enqueued with normal priority
executing algorithms
generic-wrapper-task approach
task_group
using concurrent_priority_queue
parallel_for algorithms
used in real time systems
Tasks
Task scheduler
approaches for setting number of threads
more than one task_scheduler_init object
single task_scheduler_init object
using class task_arena
using global_control object
architecture
controlling thread count
changes in global_control objects
task_arena class interface
global_control object
task_scheduler_init class interface
low-level APIs
task class
task_arena class
task_scheduler_init class
this_task_arena members
task_scheduler_init objects
task_scheduler_observer
Task scheduling
Task-to-thread affinity
affinity_id
affinity_partitioner
execute task trees
functions
loop algorithms
master thread’s local deque
note_affinity function
set_affinity
type affinity_id
usage
tbb_allocator template
tbb::concurrent_hash_map
TBB exceptions
template
virtual functions
tbbmalloc
TBBMALLOC_CLEAN_ALL_BUFFERS
TBBMALLOC_CLEAN_THREAD_BUFFERS
tbbmalloc_proxy library
TBBMALLOC_SET_SOFT_HEAP_LIMIT
TBB_MALLOC_USE_HUGE_PAGES
tbb::parallel_reduce
TBB_runtime_interface_version
tbbvars scripts
tcmalloc
Templates
Think Parallel
Thread limits
Thread Local Storage (TLS)
combinable
enumerable_thread_specific
flatten2d class
Thread migration
Thread pinning
Thread pools
Thread pool (the market) and task arenas
Threads
Thread-to-core affinity
creation
allow OS
hwloc package
OS
task_scheduler_observer object
usage
Thumb, rules of
tick_count class
Timing
TLS, see Thread Local Storage (TLS)
Triad computation, heterogeneous implementation
Triad vector operation
True sharing

U

unlimited_node
Unordered associative containers
bucket methods
built-in locking vs. no visible locking
collisions
concurrent_hash_map
erase methods
hash map
iterators
map/multimap and set/multiset interfaces
parallel scaling
Unsafe parallel implementation
image histogram computation
shared histogram vector
shared variable/shared mutable state
Unsequenced execution policy
unsequenced_policy

V

Vectorized execution
VTune

W, X, Y

Workpile pattern
Work stealing
cache-oblivious algorithms
dispatchers
loop pattern
per-thread task dispatchers
pseudo-code
scheduler bypass
schedulers
snapshot of
spawning mechanism
split tasks

Z

zero_allocator