The project description is too long for this post. Please read the attached Word doc for the full description.
We would like to buy a block of 80 hours of PostgreSQL query work on Oil & Gas data. We are hovering at around 200 million rows of data that feed multiple user interfaces. All of this data needs a ton of cleaning and curation.
The PostgreSQL stuff will not be hard to do. Lots of CASE WHEN, CONCAT, SUBSTRING… just moving data from poor formats to good formats. For example, we may get a VARCHAR that is 1/1/2017 and need to convert it to a date. Two-second change. But we have hundreds of these date fields. Sometimes, in a table with 1MM rows, we will have one badly formatted date: 1&1/2017. Getting that one bad date fixed is not the goal. We have to work around it to get the other 999,999 into date format. And sometimes the dates come in as 1/2017. Or some come in as MM/YYYY and others (in the same field) are in MM/DD/YYYY format. You get the picture!
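To make the date problem concrete, here is a rough sketch of one way to work around bad values instead of failing on them. The function name try_parse_date, the table raw_wells, and the columns spud_date_raw and c_spud_date are all made up for illustration. Treating MM/YYYY as the first day of the month is also just an assumption; the real rule would come from us per field.

```sql
-- Hypothetical helper: returns a date for the formats we recognize, NULL otherwise.
CREATE OR REPLACE FUNCTION try_parse_date(raw_value text)
RETURNS date
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
  IF raw_value ~ '^\d{1,2}/\d{1,2}/\d{4}$' THEN
    -- MM/DD/YYYY, with or without leading zeros
    RETURN to_date(raw_value, 'MM/DD/YYYY');
  ELSIF raw_value ~ '^\d{1,2}/\d{4}$' THEN
    -- MM/YYYY: assumed here to mean the first day of that month
    RETURN to_date(raw_value, 'MM/YYYY');
  END IF;
  -- Anything else (NULL, '1&1/2017', free text) is left unparsed
  RETURN NULL;
EXCEPTION
  WHEN data_exception THEN
    -- Matches the pattern but is not a real date, e.g. 2/30/2017
    RETURN NULL;
END;
$$;

-- Hypothetical staging table with a mixed-format VARCHAR date column.
CREATE TEMP TABLE raw_wells (well_id int, spud_date_raw varchar);
INSERT INTO raw_wells VALUES
  (1, '1/1/2017'),
  (2, '1&1/2017'),
  (3, '1/2017'),
  (4, '2/30/2017');

-- Convert what can be converted; bad rows come back as NULL instead of an error.
SELECT well_id,
       spud_date_raw,
       try_parse_date(spud_date_raw) AS c_spud_date
FROM raw_wells
ORDER BY well_id;

-- List the values that were skipped so they can be reviewed later.
SELECT well_id, spud_date_raw
FROM raw_wells
WHERE spud_date_raw IS NOT NULL
  AND try_parse_date(spud_date_raw) IS NULL;
```

The exception handler adds some overhead per call, so on the largest tables a stricter pattern check plus a plain CASE WHEN may be the better trade-off.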
The easy part will be PostgreSQL. The hard part? Oil & Gas data is not written in lay speak. It’s a very technical industry… tubing, casing, TVD, formations… little of it will make sense. Of course, if you know O&G, that’s a huge bonus, but it’s not necessary or expected for this project.
We will interpret all the O&G stuff (for you) and send you simple sentences on what each c_ and m_ field should look like. You will then build the queries for each table.
80 hours could just be the beginning. We are already working on a Phase II. We would like to build aggregation queries around all 200 million rows for “highlights” we can report on a webpage. This would entail running row counts/sums/etc. for a ton of fields and even some basic equations. We would update these queries every night and have a web page that reports all the “macro” changes to our data sets each day. This would be used to educate users on how vast and complex the data is behind the UIs.
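As a rough idea of what a Phase II nightly "highlights" query could look like, here is a sketch. The tables wells and daily_highlights, the columns c_spud_date and m_tvd, and the metric names are all hypothetical; the real fields and equations would be defined later.

```sql
-- Hypothetical curated table standing in for one of the real ones.
CREATE TEMP TABLE wells (well_id int, c_spud_date date, m_tvd numeric);
INSERT INTO wells VALUES
  (1, DATE '2017-01-01', 9500),
  (2, NULL,              11200),
  (3, DATE '2017-01-01', NULL);

-- One row per day, table, and metric, so the web page can show daily changes.
CREATE TEMP TABLE daily_highlights (
  snapshot_date date    NOT NULL,
  table_name    text    NOT NULL,
  metric        text    NOT NULL,
  value         numeric,
  PRIMARY KEY (snapshot_date, table_name, metric)
);

-- Nightly job: scan the table once, then unpivot the aggregates into metric rows.
-- Re-running on the same day overwrites that day's numbers.
INSERT INTO daily_highlights (snapshot_date, table_name, metric, value)
SELECT current_date, 'wells', m.metric, m.value
FROM (
  SELECT count(*)           AS row_count,
         count(c_spud_date) AS spud_dates_populated,
         sum(m_tvd)         AS total_tvd
  FROM wells
) AS s
CROSS JOIN LATERAL (
  VALUES ('row_count',            s.row_count::numeric),
         ('spud_dates_populated', s.spud_dates_populated::numeric),
         ('total_tvd',            s.total_tvd)
) AS m(metric, value)
ON CONFLICT (snapshot_date, table_name, metric)
DO UPDATE SET value = EXCLUDED.value;

-- Report: each metric alongside its change since the previous snapshot.
SELECT snapshot_date,
       table_name,
       metric,
       value,
       value - lag(value) OVER (
         PARTITION BY table_name, metric
         ORDER BY snapshot_date
       ) AS change_since_previous
FROM daily_highlights
ORDER BY table_name, metric, snapshot_date;
```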
Finally, if you know XPath, that’s a huge bonus for this project. We can engage you beyond the 80+ hours of PostgreSQL work to run our web scraper and go get more data. More data? More PostgreSQL cleaning and curation work. XPath is not mandatory, though. We prefer someone who is super comfortable writing queries. You will need that comfort while you are working with the Oil & Gas terminology.