I'm interested in this project. I'm not sure I would rely on OpenCV alone, although we can start with it. I would rather use one of the already-trained pedestrian detector models (there are a couple of proven ones for TensorFlow), and possibly use OpenCV's tracking if you actually need to track the objects you're detecting.
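To make the "detector plus tracker" idea a bit more concrete, here is a rough sketch of one common pattern: run the expensive detector only every N frames and let a cheaper tracker carry the boxes in between. This is only an illustration under assumptions. `detect`, `make_tracker` and `redetect_every` are hypothetical placeholders, not a specific TensorFlow or OpenCV API, and no particular model is implied.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h); hypothetical box format


def detect_then_track(
    frames: List[object],
    detect: Callable[[object], List[Box]],  # hypothetical: pretrained detector wrapper
    make_tracker: Callable[[object, Box], Callable[[object], Box]],  # hypothetical tracker factory
    redetect_every: int = 10,  # assumed value; tune to the real speed budget
) -> List[List[Box]]:
    """Run the detector every `redetect_every` frames and propagate
    its boxes with per-object trackers on the frames in between."""
    results: List[List[Box]] = []
    trackers: List[Callable[[object], Box]] = []
    for i, frame in enumerate(frames):
        if i % redetect_every == 0:
            # Full detection pass: refresh boxes and restart one tracker per box.
            boxes = detect(frame)
            trackers = [make_tracker(frame, b) for b in boxes]
        else:
            # Cheap pass: only update the existing trackers.
            boxes = [update(frame) for update in trackers]
        results.append(boxes)
    return results
```

With OpenCV 4.x, `make_tracker` could wrap something like `cv2.TrackerMIL_create()`, calling `init(frame, box)` once and `update(frame)` per frame (which returns an `(ok, box)` pair), but the right tracker and re-detection interval would depend on the footage.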
When you say "faster than real time", can you explain what you mean? For example, do you have a specific frame resolution and fps in mind, and if so, what is the reason for it? (I ask in case we can achieve the same function at a lower frame rate or by downsampling.) You still have a few days before the bid end date, so if you're willing to share a sample, I can play around with it and see if I can put together a proof of concept. Thanks, let me know if you're interested.