This may be optimized quite straightforwardly in future versions. It implies a specific performance characteristic: the CUDA communication overhead constitutes a certain amount of "fixed costs" that are paid regardless of how much work is done.
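To make the "fixed costs" idea concrete, here is a minimal sketch of a linear cost model. It is an illustration only, not the measured behaviour of any particular version. The function names and all timing values are hypothetical, and the assumption that run time grows linearly with the number of items is mine rather than the source's.

```python
# Hypothetical linear cost model: all names and numbers are assumptions
# chosen for illustration, not measurements.

def cpu_time(n_items, cpu_per_item_s):
    """CPU run time: no fixed overhead, cost grows with the item count."""
    return n_items * cpu_per_item_s


def gpu_time(n_items, fixed_overhead_s, gpu_per_item_s):
    """GPU run time: a fixed communication overhead plus per-item work."""
    return fixed_overhead_s + n_items * gpu_per_item_s


def break_even_items(fixed_overhead_s, cpu_per_item_s, gpu_per_item_s):
    """Item count at which GPU and CPU times are equal.

    Returns None when the GPU is not faster per item, because the fixed
    overhead can then never be amortized.
    """
    if gpu_per_item_s >= cpu_per_item_s:
        return None
    return fixed_overhead_s / (cpu_per_item_s - gpu_per_item_s)


if __name__ == "__main__":
    # Assumed values: 1 ms overhead, 1 us/item on CPU, 0.1 us/item on GPU.
    overhead, cpu_cost, gpu_cost = 1e-3, 1e-6, 1e-7
    print("break-even items:", break_even_items(overhead, cpu_cost, gpu_cost))
    for n in (100, 10_000, 1_000_000):
        print(n, cpu_time(n, cpu_cost), gpu_time(n, overhead, gpu_cost))
```

Under this model, small workloads are dominated by the overhead and large ones amortize it, which is the usual consequence of a fixed cost.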
Those are similar numbers to the 10,240 CUDA cores and 80 SMs found in ... who claimed that the performance of the GB203 GPU rumored to be used in the RTX 5080 is “close to AD102 in raster ...