I've done something like this before, although on a smaller scale. That project did image matching with Python OpenCV on the CPU, and I converted it to run on the GPU. The issue is that Python uses the non-GPU OpenCV API, so to get the CUDA GPU features one has to go through OpenCV's C++ API. To then use the GPU results from Python, I had to get deep into OpenCV's Python wrapper (i.e. the CPython extension layer). There was really a lot of work in converting even something seemingly simple (a few cv image matching functions) to run on the GPU through OpenCV's CUDA API while staying callable from Python. Making it work across the Python OpenCV API was the time-consuming bit.
For development, I will need considerable remote access to your system with the target GPU.
This will take more time if the GPU features need to work from Python, because we would have to follow OpenCV's CPython-based Python API. It will take less time if we do not need to target the Python API. That is to say, if we can do this in C++ and not fuss with Python compatibility per se, that will save time. (My estimate is based on not having to target the Python API.) Cheers.