Epilogue: Future Direction of DPC++
Take a moment now to feel the peace and calm of knowing that we finally understand everything about programming using SYCL and DPC++. All the puzzle pieces have fallen into place.
Before we get too comfortable, let’s note that this book was written at an exciting time for SYCL and DPC++. It has been a period of rapid development that coincided with the release of the first DPC++ specification and the SYCL 2020 provisional specification. We’ve endeavored to ensure that the code samples, in all previous chapters, compile with the open source DPC++ compiler at the time that this book was sent to publication (Q3 2020) and execute on a wide range of hardware. However, the future-looking code shown in this epilogue does not compile with any compiler as of mid-2020.
In this epilogue, we speculate on the future. Our crystal ball can be a bit difficult to read—this epilogue comes without any warranty.
The vast majority of what this book covers and teaches will endure for a long time. That said, it is too hot of an area for it to remain at rest, and changes are occurring that may disrupt some of the details we have covered. This includes several items that appeared first as vendor extensions and have since been welcomed into the specification (such as sub-groups and USM). That so many new features are on track to become part of the next SYCL standard is fantastic, but it has made talking about them complicated: should we refer to such features as vendor extensions, experimental/provisional features of SYCL, or part of SYCL?
This epilogue provides a sneak peek of upcoming DPC++ features that we are very excited about, which were unfortunately not quite finished at the time we sent the book to be published. We offer no guarantees that the code samples printed in this epilogue compile: some may already be compatible with a SYCL or DPC++ compiler released after the book, while others may compile only after some massaging of syntax. Some features may be released as extensions or incorporated into future standards, while others may remain experimental features indefinitely. The code samples in the GitHub repository associated with this book may be updated to use new syntax as it evolves. Likewise, we will have an erratum for the book, which may get additions made from time to time. We recommend checking for updates in these two places (code repository and book errata—links can be found early in Chapter 1).
Alignment with C++20 and C++23
Maintaining close alignment
between SYCL, DPC++, and ISO C++ has two advantages. First, it enables SYCL and DPC++ to leverage the newest and greatest features of standard C++ to improve developer productivity. Second, it increases the chances of heterogeneous programming features introduced in SYCL or DPC++ successfully influencing the future direction of standard C++ (e.g., executors).
SYCL 1.2.1 was based on C++11, and many of the biggest improvements to the interfaces of SYCL 2020 and DPC++ are only possible because of language features introduced in C++14 (e.g., generic lambdas) and C++17 (e.g., class template
argument deduction—CTAD).
The C++20 specification was ratified in 2020 (while we were writing this book!). It includes a number of features (e.g., std::atomic_ref, std::bit_cast) that have already been pre-adopted by DPC++ and SYCL—as we move toward the next official release of SYCL (after 2020 provisional) and the next version of DPC++, we expect to more closely align with C++20 and incorporate the most useful parts of it. For example, C++20 introduced some additional thread synchronization routines in the form of std::latch and std::barrier; we already explored in Chapter 19 how similar interfaces could be used to define device-wide barriers, and it may make sense to reexamine sub-group and work-group barriers in the context of the new C++20 syntax as well.
Work for any standard
committee is never done, and work has already begun on C++23. Since the specification is not finalized yet, adopting any of these features into a SYCL or DPC++ specification would be a mistake—the features may change significantly before making it into C++23, resulting in incompatibilities that may prove hard to fix. However, there are many features under discussion that may change the way that future SYCL and DPC++ programs look and behave. One of the most exciting proposed features is
mdspan, a non-owning view of data that provides both multidimensional array syntax for pointers and an
AccessorPolicy as an extension point for controlling access to the underlying data. These semantics are very similar to those of SYCL accessors, and
mdspan would enable accessor-like syntax
to
be used for both buffers and USM allocations, as shown in Figure
EP-1.
Hopefully it is only
a matter of time until mdspan becomes standard C++. In the meantime, we recommend that interested readers experiment with the open source production-quality reference implementation available as part of the Kokkos project.
Another exciting proposed feature is the std::simd class template, which seeks to provide a portable interface for explicit vector parallelism in C++. Adopting this interface
would provide a clear distinction between the two different uses of vector types described in Chapter 11: uses of vector types for programmer convenience and uses of vector types by ninja programmers for low-level performance tuning. The presence of support for both SPMD and SIMD programming styles within the same language also raises some interesting questions: how should we declare which style a kernel uses, and should we be able to mix and match
styles within the same kernel? We expect future vendor extensions to explore these questions, as vendors experiment with the possibilities in this space ahead of standardization.
Address Spaces
As we have seen
in earlier chapters, there are some cases in which otherwise simple codes are complicated by the existence of memory spaces. We are free to use regular C++ pointers in most places, but at other times are required to use the multi_ptr class and explicitly specify which address space(s) their code is expected to support.
Many modern architectures solve this problem by providing hardware support for a so-called generic address space; pointers may point to any allocation in any memory space, so that we (and compilers!) can leverage runtime queries to specialize code in situations where different
memory spaces require different handling (e.g., accessing work-group local memory may use different instructions). Support for a generic address space is already available in other programming languages, such as OpenCL, and it is expected that a future version of SYCL will adopt generic-by-default in place of inference rules.
This change would greatly simplify many codes and make usage of the
multi_ptr class an optional performance-tuning feature instead of one that is required for correctness. Figure
EP-2 shows a simple class written using the existing address spaces, and Figures
EP-3 and
EP-4 show two alternative designs that would be enabled by
the
introduction of a generic address space.
Extension and Specialization Mechanism
Chapter 12 introduced an expressive set of queries enabling the host to
extract information about a device at runtime. These queries enable runtime parameters such as work-group size to be tuned for a specific device and for different kernels implementing different algorithms to be dispatched to different types of device.
Future versions are expected to augment these runtime queries with compile-time queries, allowing code to be specialized based on whether an implementation understands a vendor extension. Figure
EP-5 shows how the
preprocessor
could be used to detect whether the compiler supports a specific vendor extension.
There are also plans to introduce compile-time queries enabling kernels to be specialized based on properties (which we call
aspects) of the targeted device (e.g., the device type, support for a specific extension, the size of work-group local memory, the sub-group size selected by the compiler). These aspects represent
a
special kind of constant expression not currently present in C++—they are not necessarily
constexpr when the host code is compiled but become
constexpr when the target device becomes known. The exact mechanism used to expose this
device constexpr concept is still being designed. We expect it to build on the specialization constants feature introduced in the SYCL 2020 provisional and to look and behave similarly to the code
shown
in Figure
EP-6.
Hierarchical Parallelism
As we noted back in Chapter 4, we consider the hierarchical parallelism
in older versions of SYCL to be an experimental feature and expect it to be slower than basic data-parallel and ND-range kernels in its adoption of new language features.
There are a
lot of new language features in DPC++ and SYCL 2020, and several of them are currently incompatible with hierarchical parallelism (e.g., sub-groups, group algorithms, reductions). Closing this gap would help to improve programmer productivity and would enable more compact syntax for some simple cases. The code in Figure
EP-7 shows a possible route for extending reduction support to hierarchical parallelism, enabling a hierarchical reduction: each work-group computes a sum, and the kernel as a whole computes the maximum of all
sums
across all work-groups.
The other aspect of hierarchical
parallelism that was briefly touched on in Chapter 4 is its implementation complexity. Mapping nested parallelism to accelerators is a challenge that is not unique to SYCL or DPC++, and this topic is the subject of much interest and research. As implementers gain experience with the implementation of hierarchical parallelism and the capabilities of different devices, we expect syntax in SYCL and DPC++ to evolve in
alignment with standard practice.
Summary
There is already a lot of excitement around SYCL and DPC++, and this is just the beginning! We (as a community) have a long path ahead of us, and it will take significant continued effort to distil the best practices for heterogeneous programming and to design new language features that strike the desired balance between performance, portability, and productivity.
We need your help! If your favorite feature of C++ (or any other programming language) is missing from SYCL or DPC++, please reach out to us. Together, we can shape the future direction of SYCL, DPC++, and ISO C++.