c++ - In-Place CUDA Kernel for Rectangular Matrix Transpose
I have looked around for a while, but have been unable to find a suitable answer to this:
Is there any implementation of an in-place transpose for rectangular (non-square) matrices in CUDA?
I know about cublas geam, but that requires allocating another matrix to hold the result. I tried a simple in-place implementation based on this:
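For reference, here is a sketch of what I mean by the geam approach; the wrapper name and buffer names are my own, and the point is that a separate output buffer d_At is needed because geam has no transposed in-place mode:

    #include <cublas_v2.h>

    // Out-of-place transpose with cublasSgeam: C = alpha*op(A) + beta*op(B).
    // A is n x m in column-major order; d_At must be a separate m x n buffer.
    void transpose_with_geam(cublasHandle_t handle,
                             const float *d_A,  // n x m, column-major
                             float *d_At,       // m x n, column-major
                             int n, int m)
    {
        const float alpha = 1.0f;
        const float beta  = 0.0f;
        // B is never read because beta == 0; d_At is passed only as a valid pointer.
        cublasSgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                    m, n,                 // dimensions of the result A^T
                    &alpha, d_A, n,
                    &beta,  d_At, m,
                    d_At, m);
    }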
However, this only works for square matrices. Can someone explain to me why this approach does not work at all for rectangular (non-square) matrices? The naive approach does work for the transpose, although it is not in place.
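For comparison, a minimal sketch of the naive out-of-place kernel I am referring to (the kernel name and launch configuration are illustrative choices of mine, not from any library):

    // Naive out-of-place transpose: each thread copies one element.
    __global__ void naive_transpose(const float *in, float *out, int rows, int cols)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;  // column in the input
        int y = blockIdx.y * blockDim.y + threadIdx.y;  // row in the input
        if (x < cols && y < rows)
            out[x * rows + y] = in[y * cols + x];       // out is cols x rows
    }

    // Launch example:
    // dim3 block(16, 16);
    // dim3 grid((cols + 15) / 16, (rows + 15) / 16);
    // naive_transpose<<<grid, block>>>(d_in, d_out, rows, cols);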
The following paper:
The sequential algorithm for in-place matrix transposition is as follows (> O(nm) run time):
    #include <utility>  // std::swap

    // in: n rows, m cols; out: m rows, n cols
    void matrix_transpose(int *a, int n, int m)
    {
        for (int k = 0; k < n * m; k++) {
            int idx = k;
            do {  // calculate the index in the original array
                idx = (idx % n) * m + (idx / n);
            } while (idx < k);  // make sure that we do not swap elements twice
            std::swap(a[k], a[idx]);
        }
    }
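A quick host-side check of that routine (the 2 x 3 test values are my own):

    #include <cstdio>

    int main()
    {
        // 2 x 3 row-major matrix:
        // 1 2 3
        // 4 5 6
        int a[] = {1, 2, 3, 4, 5, 6};
        matrix_transpose(a, 2, 3);
        // Expected 3 x 2 result, flattened: 1 4 2 5 3 6
        for (int i = 0; i < 6; i++) printf("%d ", a[i]);
        printf("\n");
        return 0;
    }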