Classic task parallelism does not map directly onto SYCL (or GPUs in general), because tasks are often chosen to be tiny and to contain no further nested parallelism. A task deployed to a GPU, in contrast, should be reasonably large and should exploit the accelerator’s hardware concurrency. Within the ExaHyPE project, we work with tiny tasks.
However, we do not deploy all tasks directly to the SYCL queue. Instead, we buffer them in application-specific queues. Once sufficiently many suitable tasks have “assembled” within such a queue, we merge them into one large meta-task and deploy this meta-task to the GPU. The meta-task can then exploit multiple levels of concurrency and is reasonably expensive.
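The buffering idea can be illustrated with a minimal sketch in plain C++. It is not the ExaHyPE implementation: the names `TinyTask`, `TaskBuffer`, `runMetaTask`, and the fixed size threshold are hypothetical and serve only to show the pattern. The launcher is a host-side stand-in; in a SYCL code, we would expect it to submit one kernel to the SYCL queue, with the outer loop over tasks and the inner loop over each task's data mapped onto the device's levels of parallelism.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical tiny task: holds the small data patch it updates.
struct TinyTask {
  std::vector<double> patch;
};

// Hypothetical application-specific buffer in front of the device queue.
class TaskBuffer {
public:
  using Launcher = std::function<void(std::vector<TinyTask>&)>;

  // threshold: assumed number of tasks that makes a meta-task worthwhile.
  TaskBuffer(std::size_t threshold, Launcher launch)
      : threshold_(threshold), launch_(std::move(launch)) {
    assert(threshold_ > 0);
  }

  // Buffer a task; once enough tasks have assembled, merge and launch them.
  void enqueue(TinyTask task) {
    pending_.push_back(std::move(task));
    if (pending_.size() >= threshold_) flush();
  }

  // Merge all pending tasks into one meta-task and hand it to the launcher.
  void flush() {
    if (pending_.empty()) return;
    std::vector<TinyTask> metaTask;
    metaTask.swap(pending_);  // the buffer is empty again before the launch
    launch_(metaTask);
  }

  std::size_t pending() const { return pending_.size(); }

private:
  std::size_t threshold_;
  Launcher launch_;
  std::vector<TinyTask> pending_;
};

// Host-side stand-in for the device kernel (assumption: a real code would
// submit a single SYCL kernel here instead of running serial loops).
inline void runMetaTask(std::vector<TinyTask>& metaTask) {
  for (auto& task : metaTask)        // outer level: across the merged tasks
    for (auto& value : task.patch)   // inner level: within one task
      value *= 2.0;                  // placeholder for the actual task body
}
```

With a threshold of three, for instance, enqueueing seven tasks triggers two launches of three tasks each and leaves one task pending until `flush()` is called explicitly. A size threshold is only one possible trigger; the text above does not prescribe when the merge happens.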