Posted on 2020-7-21 09:28 · Guangxi
Vector Execution
The superb performance and efficiency of modern graphics processors are derived from the
parallel computing capabilities of vector execution units. As Figure 8 illustrates, one of the
biggest improvements in the compute unit is doubling the size of the SIMDs and enabling
back-to-back execution. When using the more efficient wave32 wavefronts, the new SIMDs
boost IPC and cut latency by 4X.
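
One way to read the 4X figure is as a ratio of issue cycles per wavefront. The sketch below is only an illustration, not something stated in the quoted text. It assumes the earlier design pushed a 64-wide wavefront through a 16-lane SIMD, while the new design pushes a 32-wide wavefront through a 32-lane SIMD. The function name `issue_cycles` is made up for this example.

```python
import math

def issue_cycles(wavefront_size, simd_lanes):
    """Cycles needed to push one wavefront through a SIMD of the given width.

    Hypothetical model: each cycle the SIMD processes `simd_lanes` work-items,
    so a wavefront needs ceil(wavefront_size / simd_lanes) passes.
    """
    if wavefront_size <= 0 or simd_lanes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(wavefront_size / simd_lanes)

# Assumed configurations (not taken from the quoted passage):
old_cycles = issue_cycles(64, 16)  # 64-wide wavefront on a 16-lane SIMD
new_cycles = issue_cycles(32, 32)  # wave32 on a 32-lane SIMD
speedup = old_cycles / new_cycles
```

Under these assumptions the wave32 case issues in a single pass, which is where a 4X latency reduction per instruction would come from.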
[...] handling mixed precision. For larger 64-bit (or double precision) FP data, adjacent registers are
combined to hold a full wavefront of data. More importantly, the compute unit vector registers
natively support packed data including two half-precision (16-bit) FP values, four 8-bit
integers, or eight 4-bit integers.
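
To make the packed formats concrete, here is a minimal Python sketch of how one 32-bit register lane could hold two half-precision values, four 8-bit integers, or eight 4-bit integers. The helper names and the choice of putting element 0 in the lowest bits are assumptions made for this example; the quoted text does not specify the hardware's actual element ordering.

```python
import struct

def _pack_fields(values, bits):
    """Pack unsigned integers of `bits` width into one 32-bit word.

    Assumption for illustration: element 0 occupies the lowest bits.
    """
    count = 32 // bits
    if len(values) != count:
        raise ValueError(f"expected {count} values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        word |= v << (bits * i)
    return word

def pack_u8x4(values):
    """Four unsigned 8-bit integers in one 32-bit word."""
    return _pack_fields(values, 8)

def pack_u4x8(values):
    """Eight unsigned 4-bit integers in one 32-bit word."""
    return _pack_fields(values, 4)

def pack_f16x2(lo, hi):
    """Two IEEE 754 half-precision floats in one 32-bit word."""
    # struct format 'e' encodes a 16-bit half-precision float.
    lo_bits, = struct.unpack("<H", struct.pack("<e", lo))
    hi_bits, = struct.unpack("<H", struct.pack("<e", hi))
    return lo_bits | (hi_bits << 16)

def unpack_fields(word, bits):
    """Inverse of _pack_fields: split a 32-bit word into unsigned fields."""
    mask = (1 << bits) - 1
    return [(word >> (bits * i)) & mask for i in range(32 // bits)]
```

For example, `hex(pack_u8x4([1, 2, 3, 4]))` gives `0x4030201`, and `unpack_fields` with `bits=8` recovers the original list. The point is simply that all three formats occupy the same 32 bits, so no extra register space is needed for lower-precision data.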