The Rise of Heterogeneous Accelerated Computing: Look Beyond the Compute Chips
Original title: Why SYCL: Elephants in the SYCL Room
By James Reinders and Michael Wong
Excerpted from: https://www.hpcwire.com/2022/02/03/why-sycl-elephants-in-the-sycl-room/
Commentary: In the second of a series of guest posts on heterogeneous computing, James Reinders, who returned to Intel last year after a short “retirement,” follows up on his piece about how SYCL will contribute to a heterogeneous future for C++. He is joined by Michael Wong of Codeplay Software Ltd., who is also the current SYCL committee chair. Together, they offer their responses to what might be called the ‘Elephants in the SYCL Room.’
The case for C++ programming, with SYCL bringing in full heterogeneous support, has been well articulated by persons close to the SYCL specification, including in a recent article, “Considering a Heterogeneous Future for C++,” and in numerous other resources enumerated on sycl.tech. SYCL is a Khronos standard that introduces support for fully heterogeneous data parallelism to C++. While SYCL is not a cure-all, it is a solution to one aspect of a larger problem: How do we adequately enable full heterogeneous programming given the emerging explosion in hardware diversity?
In this article, we offer our perspective on key questions about SYCL, based on decades of work in this domain. These important questions are asked by software developers looking to understand whether SYCL matters to them. Let’s face it: at some point, every major project has Elephants in the Room.[1] Successful projects address their elephants openly.
Elephant 1: Aren’t GPUs enough? Do other accelerators really matter?
Valid questions exist about which accelerators will stay, and which will be a passing fad. For decades, different accelerators have come and gone while CPUs persist. Today, GPUs are present in the vast majority of computer systems. Writing our applications to leverage GPUs makes a lot of sense given their near ubiquity.
As a result, one of the first elephant questions is whether we really need to generalize, i.e., do we need to be multiarchitecture and multivendor?
Experts expect “dedicated or semi-dedicated hardware accelerators” to be a must-have feature for computing in this decade. They include researchers led by Prof. Masaaki Kondo in “White Paper on Next-Generation Advanced Computing Infrastructure” and Hennessy & Patterson in their paper “A New Golden Age for Computer Architecture”.
As long as we are talking about dedicated accelerators, why stop at GPUs? Optimizing for different types of accelerators is a great objective, but we don’t want to write different code for different types of accelerators. We believe that the industry will benefit from a standardized language that everyone can contribute to and collaborate on, that is not locked into a particular vendor, and that can evolve organically based on the requirements of its members and the public.
SYCL takes an interesting approach that allows us to use common code when we want and specialize when we want. In this way, SYCL embraces accelerators in general, leaving it to us, the developers, to decide when to write common cross-architecture code, and when we feel it is sufficiently advantageous to specialize code.
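One way to picture “common code when we want, specialize when we want” is ordinary C++ template specialization. The sketch below is plain C++, not SYCL: the tag types `GenericDevice` and `WideVectorDevice`, the `SumKernel` template, and `sum_on` are hypothetical names invented for illustration, and the “specialized” path only mimics a lane-friendly strategy on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical tags naming device categories; they are not part of SYCL.
struct GenericDevice {};
struct WideVectorDevice {};

// Common code path: used for any device tag without a specialization.
template <typename Device>
struct SumKernel {
    static long long run(const std::vector<int>& v) {
        return std::accumulate(v.begin(), v.end(), 0LL);
    }
};

// Specialized path: same result, different strategy. Four partial sums
// stand in for keeping several hardware lanes busy.
template <>
struct SumKernel<WideVectorDevice> {
    static long long run(const std::vector<int>& v) {
        long long lanes[4] = {0, 0, 0, 0};
        for (std::size_t i = 0; i < v.size(); ++i) {
            lanes[i % 4] += v[i];
        }
        return lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
};

// Call sites stay identical; the tag alone selects the implementation.
template <typename Device>
long long sum_on(const std::vector<int>& v) {
    return SumKernel<Device>::run(v);
}
```

Calling `sum_on<GenericDevice>(v)` and `sum_on<WideVectorDevice>(v)` gives the same sum. Only the strategy differs, and choosing when that difference is worth the extra code is the decision left to the developer.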
Its underlying programming model, SPMD, has been shown to be usable across many architectures. SPMD is how most programmers using Nvidia CUDA/OpenCL/SYCL think: writing code from the perspective of operating on one work-item and expecting it to run concurrently on most hardware, such that multiple work-items fill vector hardware lanes.
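The SPMD idea can be sketched in plain C++, without any accelerator runtime. In the sketch below, `saxpy_item` is written for a single work-item, and `launch_over_range` is a hypothetical stand-in for a runtime launch (it is not a CUDA, OpenCL, or SYCL call). It simply loops on the host, whereas a real SPMD runtime may run the invocations concurrently across vector lanes or cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Kernel body written from the point of view of ONE work-item:
// it only knows its own index i.
inline void saxpy_item(std::size_t i, float a,
                       const std::vector<float>& x,
                       std::vector<float>& y) {
    y[i] = a * x[i] + y[i];
}

// Hypothetical stand-in for a runtime launch: invokes the kernel once per
// index in [0, n). This host version runs the invocations in order.
template <typename Kernel>
void launch_over_range(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

// Computes a * x + y element-wise and returns the result.
inline std::vector<float> saxpy(float a, const std::vector<float>& x,
                                std::vector<float> y) {
    assert(x.size() == y.size());  // every work-item needs both inputs
    launch_over_range(y.size(),
                      [&](std::size_t i) { saxpy_item(i, a, x, y); });
    return y;
}
```

Because each invocation touches only element `i`, the invocations are independent of one another, which is what lets a runtime spread them over whatever parallel hardware is available.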
SYCL offers a large degree of portability across vendors (e.g., many different sources of GPUs) as well as architecture (e.g., GPUs, FPGAs, ASICs).
Elephant 2: Why not just use Nvidia CUDA?
A vibrant GPU ecosystem is emerging thanks to competition from multiple GPU vendors. This is part of a trend toward more and more competition for accelerators in general. The installed base of CUDA applications that make use of Nvidia GPUs is poised to adapt over time to an open, multivendor, multiarchitecture software approach created to serve all vendors, not just Nvidia.
While CUDA has earned a strong following given its value proposition and the strength of Nvidia GPUs in the ecosystem, there are increasing concerns regarding the lock-in that use of CUDA creates. Such concerns stem from the proprietary focus highlighted by these factors:
1. The definition of CUDA, along with its implementation and evolution, is managed by Nvidia and evolves specifically to serve Nvidia GPU product designs. Details of new features in CUDA are generally shielded from public view until Nvidia has both hardware and software to support them. As discussed more fully below, this control stifles innovations from other vendors.
2. The licensing for CUDA tools and libraries from Nvidia specifically states they must be used to “develop applications only for use in systems with Nvidia GPUs.” Even “open source” from Nvidia includes licensing language restricting key parts in the same manner.
Nvidia CUDA can claim credit for bringing accelerated computing to the masses using Nvidia GPUs. With the explosion of competition in the accelerator market, it could appear that CUDA has become a walled garden in an increasingly open and transparent world. The desire for an open, multivendor, multiarchitecture alternative to CUDA is not going away.
Elephant 3: Why not just use AMD HIP?
AMD Heterogeneous-Computing Interface for Portability (HIP) is a C++ dialect. AMD tools include a “HIPify tool” to help transform CUDA code into HIP. AMD states that “HIP code can run on AMD hardware (through the HCC compiler) or Nvidia hardware (through the NVCC compiler) with no performance loss compared with the original CUDA code.”
HIP is a “follow CUDA” strategy: AMD develops an update to HIP as quickly as possible after Nvidia has released an update to its CUDA platform. The arguments in favor of HIP rest on the virtue of reusing a large CUDA codebase for AMD GPUs. Unfortunately, given the opaqueness of CUDA, no one can follow CUDA closely, promptly, or accurately enough. This also gives AMD no opportunity to expose unique AMD hardware innovation without forcing CUDA developers to change their code with #ifdefs for AMD GPUs.
While AMD has created value with HIP for those that seek AMD GPUs as an alternative to Nvidia GPUs, it is not hard to want more. Imagine having a solution that can keep pace with the feature innovation and performance of CUDA!
We believe that innovation will flourish the most in an open field rather than in the shadows of a walled garden.
[Editor’s note: There is a SYCL implementation called hipSYCL that sits on top of HIP and targets AMD GPUs running ROCm and Nvidia GPUs.]
Elephant 4: Why not just use OpenCL?
OpenCL provides an open multivendor alternative, but at a lower layer of the software stack than SYCL or CUDA. SYCL grew out of a desire to deliver the benefits of OpenCL’s open, multivendor, multiarchitecture approach through a standard C++ interface for heterogeneous parallel architectures. SYCL implementations often build on OpenCL, but as of SYCL 2020 they also have the flexibility to use other backends under the hood. SYCL delivers on the promise of OpenCL, in a higher productivity form through its C++ abstractions.
Elephant 5: Can’t we just use C++?
Let’s start with the assumption that we want to program heterogeneous machines, we value portability, and we do not want to pay a penalty in performance for portability.
We might answer “yes”: C++ is enough when you have SYCL support too. After all, C++ was built to be extended by template libraries like SYCL. SYCL adds no new keywords, but it does benefit from SYCL-aware C++ compilers to help with cross-compilation, fat binaries, and remote memories. Those are simply things C++ compilers have not historically made easy.
SYCL also offers a solution today, within standard C++, to address programming for full heterogeneous computing built on top of ISO C++. This includes device enumeration (info), defining work (kernels), submitting and coordinating work across devices (queue), and managing remote memories.
That brings us to “No”: the C++ standard does not define support for heterogeneous systems with disjoint (non-coherent) memories. Some think it will add that one day, and there is effort in that direction, but even those involved believe the current direction will take at least 10 years, and it is limited by the need for C++ to maintain backward compatibility with millions of lines of existing code. In fact, one of us (MW) has written papers urging C++ in that direction. The response from WG21 (ISO C++), understandable given the backward compatibility concerns, has been to start with parallel algorithms and executors and to add forward progress guarantees, instead of making radical surgical changes to the memory and addressing model. Therefore, if you are programming heterogeneous machines, the claim that “C++ is enough” is unlikely to hold. Some are trying to move in that direction, and that is the beauty of a competitive industry: we can see what will work out in the best interest of the market and consumers. However, what will work immediately today is “C++ plus SYCL” or “C++ plus CUDA” or “C++ plus OpenCL.”
The purpose of adding SYCL support to our C++ compiler and runtimes is to give C++ the full heterogeneous support that it does not offer today without SYCL. It is also a way to show how C++ can support heterogeneity in the future, as ISO standards tend to standardize best practices of pre-existing knowledge. We will show one such example below.
Elephant 6: Can SYCL queues make it into ISO C++?
Queues are how SYCL assigns work to heterogeneous devices, including handing off data within complex memory systems (not necessarily unified and coherent).
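To make the “handing off data” part concrete, here is a toy model in plain C++. It is not the SYCL API: `ToyDeviceQueue` and its member functions are hypothetical names, and the “device memory” is just a second host-side vector. What it tries to show is the shape of the workflow when memories are disjoint: copy data in, submit work, wait, copy results out.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT the SYCL API: a "device" whose memory is
// separate from host memory, so data must be copied in and out explicitly.
class ToyDeviceQueue {
public:
    using Task = std::function<void(std::vector<int>&)>;

    // Copy host data into the device-side buffer.
    void copy_to_device(const std::vector<int>& host) { device_mem_ = host; }

    // Enqueue work; nothing runs until wait() is called.
    void submit(Task task) { pending_.push_back(std::move(task)); }

    // Run all pending tasks in submission order against device memory.
    void wait() {
        for (auto& task : pending_) {
            task(device_mem_);
        }
        pending_.clear();
    }

    // Copy results back to the host; only valid once the queue is drained.
    std::vector<int> copy_to_host() const {
        assert(pending_.empty());
        return device_mem_;
    }

private:
    std::vector<int> device_mem_;
    std::vector<Task> pending_;
};

// Doubles every element "on the device" and returns a host-side copy.
inline std::vector<int> double_on_device(const std::vector<int>& host) {
    ToyDeviceQueue q;
    q.copy_to_device(host);
    q.submit([](std::vector<int>& mem) {
        for (int& v : mem) v *= 2;
    });
    q.wait();
    return q.copy_to_host();
}
```

The host vector is never modified by the submitted task; the task only sees the device-side copy. That separation is the part of the picture that a single unified address space does not capture.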
It is easy to speculate on whether a queue class belongs in C++ long-term, but such speculation is premature.
Proposals for C++23 have included various constructs to direct execution to specific devices, including “std::execution” in P2300. We know C++23 will continue to rely on a unified global memory address space and will not support disjoint remote memories (complex memory systems).
It is easy to get caught up on syntax. Eventually, if C++ expands to include full heterogeneous support, the concepts embodied in the SYCL queue will be needed. Until then, SYCL fills this void. Some important capabilities, such as parallel directives and message passing, have remained independent standards (OpenMP and MPI). While it is possible C++ will not grow to include full heterogeneous support, we believe C++ will eventually add such support incrementally.
C++ aims to standardize established best practice instead of inventing new and unproven features. SYCL is therefore an important stepping stone, as one of the many feeders of ‘established best practice’ into the intentionally slower moving C++ standardization process.
As C++23 settles and C++26 is considered, the future of C++ for heterogeneous computing will begin to take shape, including its syntax, but a full solution will likely not emerge for another 5-10 years.
SYCL offers a solution today, within standard C++, to address programming for full heterogeneous computing. This includes device enumeration (info), defining work (kernels), submitting work to devices (queue), and managing remote memories.
Elephant 7: Who is behind SYCL? Is it really Open in the true sense of the word?
We believe that open, international standards and Open Source Software (OSS) projects are good for everyone. When individuals from Intel and Codeplay get involved, we have found that they work hard to help develop and promote such standards and OSS – from WiFi, USB, PCIe to OpenMP, MPI, Fortran, C, C++, OpenCL, and SYCL.
Apple was the original force behind OpenCL, which began as a set of C interfaces at a fairly low level. SYCL originally grew out of efforts within OpenCL to consider higher level interfaces, specifically using C++. After multiple years of very open debates, SYCL was born. Codeplay has been instrumental in SYCL from the very beginning. Intel’s interest in SYCL grew after it entered the FPGA market and announced the Intel Xe architecture, which includes GPUs for compute. Intel is proud to be an active member of the SYCL committee and an active contributor to implementations that support SYCL. SYCL is a community effort, and the homes of both authors of this article (Intel and Codeplay) are enthusiastic participants along with many others.
Elephant 8: I see a herd of elephants – why should I believe in SYCL?
If you have not yet needed to program an application for multiple heterogeneous machines, you may not yet have felt the pain that explains why we are so excited about the prospects for SYCL. Questioning the need is quite logical.
There are many use cases for heterogeneous programming. In our CPPCON 2021 tutorial, we taught programmers from large companies, small companies, and national labs how to offload high throughput workloads to specialized accelerators.
Based on many experiences like that, we have every reason to be confident that interest in SYCL will continue to grow at a rapid pace because of the need for C++ programming for heterogeneous platforms.
If you believe in the power of hardware diversity and want to harness the impending explosion in architectural diversity, then SYCL is worth a look. Not only is it an open, multivendor, multiarchitecture play, but it is the key one for C++ programmers (as detailed in “Considering a Heterogeneous Future for C++”).
Open, Industry Standards are Critical to Enable High-Volume Markets
New technology often starts as proprietary developments, which may be sufficient to enable niche applications and markets. But, as these niche applications grow into technology ecosystems, so does the need for competition and industry standardization to enable widespread adoption. Accelerated computing, for many years only a niche capability, has certainly emerged with the status of “here to stay.” Multiple factors contributed to this, and they are not all going away (power wall, IPC wall, memory wall).
SYCL and related efforts, like oneAPI, were introduced to bring open, industry standards to the historically proprietary universe of accelerated computing.
The biggest question is: how many influencers are eager to promote a move to standards, vs. how many are locked up by proprietary interests?
As the Cambrian explosion of novel computer architectures unfolds, the case for open, multivendor, multiarchitecture standards only grows stronger.
SYCL is an open standard that invites feedback and contributions from everyone, both to the standard and to the open source projects that implement it. The shared goal of everyone involved is to unambiguously ensure paths to high performance for all accelerators in this exciting new golden age for computer architecture.
About the Authors
James Reinders believes the full benefits of the evolution to full heterogeneous computing will be best realized with an open, multivendor, multiarchitecture approach. Reinders rejoined Intel a year ago, specifically because he believes Intel can meaningfully help realize this open future. Reinders is an author (or co-author and/or editor) of ten technical books related to parallel programming; his latest book is about SYCL (it can be freely downloaded here).
Michael Wong is the Distinguished Engineer at Codeplay Software. He is a current Director and VP of ISOCPP Foundation, and a senior member of the C++ Standards Committee with more than 25 years of experience. He is a member of the C++ Directions Group. He chairs the WG21 SG19 Machine Learning and SG14 Games Development/Low Latency/Financials C++ groups and is the co-author of a number of C++/OpenMP/Transactional memory features including generalized attributes, user-defined literals, inheriting constructors, weakly ordered memory models, and explicit conversion operators. He has published numerous research papers and is the author of a book on C++11. He has been an invited speaker and keynote speaker at numerous conferences. He is currently the editor of SG1 Concurrency TS and SG5 Transactional Memory TS. He is also the Chair of the SYCL standard and all Programming Languages for Standards Council of Canada. Previously, he was CEO of OpenMP, involved with taking OpenMP toward Accelerator support, and the Technical Strategy Architect responsible for moving IBM’s compilers to Clang/LLVM after leading IBM’s XL C++ compiler team.
[1] Elephants in the Room can be defined as important questions that are obvious, but no one mentions them because they make at least some persons uncomfortable.