Conversation
|
OK, this is now working for QR on full |
lkdvos
left a comment
There was a problem hiding this comment.
Would it make sense to introduce a project_triu(!) and/or norm_triu function for this, and just overload the GPU ones to allocate?
As a side note, we might at some point try and not generate any of the gauge dependency checks if the tolerance is set to 0, as a way to disable the warnings, or alternatively put the entire thing in a @warn block so we can disable that, since these checks are starting to feel a bit expensive if we need to start allocating.
Yeah I think that could be a nice way to handle it, however I'm unsure if it's that helpful if it's only used in these checks. |
Codecov Report✅ All modified and coverable lines are covered by tests.
🚀 New features to boost your workflow:
|
|
|
||
| Q₁ = view(Q, :, 1:p) | ||
| R₁₁ = UpperTriangular(view(R, 1:p, 1:p)) | ||
| R₁₁ = UpperTriangular(R[1:p, 1:p]) |
There was a problem hiding this comment.
This is a subtle and impactful change. The UpperTriangular wrapper is really only necessary to enable the rdiv! call below. If GPUs cannot deal with UpperTriangular of a view of a GPUArray, then maybe we need to call the corresponding BLAS/LAPACK methods directly, or have some intermediate wrapper like rdiv_uppertriangular!.
There was a problem hiding this comment.
If GPUs cannot deal with UpperTriangular of a view of a GPUArray
Indeed they can't :(
There was a problem hiding this comment.
I even wonder how rdiv!(::Matrix, ::UpperTriangular) is evaluated on the GPU, since you need cuSOLVERDx to access TRSM.
There was a problem hiding this comment.
Through https://docs.nvidia.com/cuda/cublas/#cublas-t-trsm I think
There was a problem hiding this comment.
BTW cuSOLVERDx is only for device side code, so it can only be called by running CUDA kernels, not from host side code as we are doing here...
There was a problem hiding this comment.
https://github.com/JuliaGPU/CUDA.jl/blob/fbb90981cbde21d979087ad518a510f5b38f95b3/lib/cusolver/src/linalg.jl#L43 is this what you're looking for?
There was a problem hiding this comment.
Not really, this is internally in dividing using \ by a general matrix, which is then evaluated by computing its QR decomposition and then directly calling cuBLAS.trsm! on the triangular factor. Here we already have the triangular factor, so I indeed want to call trsm!, but using generic code, which is why I was using ldiv!/rdiv!. And I don't see where rdiv!(first_arg, second_arg::UpperTrangiular) is then actually lowered to cuBLAS.trsm! in the CuArray case (and why that only works for a pure CuMatrix and not for a view over it.)
There was a problem hiding this comment.
There was a problem hiding this comment.
To unblock this: shall we indeed create a helper rdiv!_uppertriangular! that for now just avoids the copy on the CPU and simply copies on the GPU with a # TODO: dispatch to trsm! directly
There was a problem hiding this comment.
But Then I don't understand why it doesn't work. The argument B::StridedCuMatrix in https://github.com/JuliaGPU/CUDA.jl/blob/fbb90981cbde21d979087ad518a510f5b38f95b3/lib/cublas/src/linalg.jl#L443 should accept a view of a CuMatrix, no? My apologies for being annoying, my lack of access to a GPU to test these things myself makes me ask these questions.
|
The Enzyme windows fail is happening on |
No description provided.