Abstract—Loop tiling is a useful technique for cache optimization in scientific computations. However, general loop tiling techniques often fail to improve parallelism in certain scientific computations because of dependences among execution steps. In this paper we implement and evaluate Parameterized Diamond Tiling, a tiling technique designed around the data dependences in the program. We implement this tiling scheme in the CHiLL compiler and demonstrate its performance on four stencil computations, in which each output is calculated as a function of neighbouring points. In line with one of the primary goals of parameterization, we also observe the impact of tile sizes on performance.