I have crawled several thousand sites and saved the HTML from each page into JSON, along with some structured metadata. We need to import all of these JSON blobs into Elasticsearch so that we can query every page for specific terms, keywords, and phrases, and we need help getting all of this set up and done.

Each document/page is roughly 20 KB, so this data is massive (we are storing a ton of HTML and redundant data), but it's critical that we can search the HTML for specific things. I need someone who is familiar with handling this much data (terabytes) in a quick and cost-effective way. As I see it, we can either spin up one massive AWS server or a cluster of servers (from DigitalOcean), each holding a portion of the data. Overall we need help; our timeline is short, so we need someone who has already learned the hard lessons about doing this. I expect this to be in the 50-100 million document range.
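To give a sense of the shape of the pipeline we have in mind, here is a minimal sketch of bulk-loading JSON into Elasticsearch and running a phrase search. It assumes the Python elasticsearch client against an 8.x cluster at localhost:9200, newline-delimited JSON files, and "url"/"html"/"crawled_at" field names; the index name "pages" and all of those details are placeholders, not our actual schema.

```python
import json
from pathlib import Path

from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

# Create the index if it doesn't exist. One shard is a placeholder;
# at terabyte scale, shard count and sizing are real tuning decisions.
if not es.indices.exists(index="pages"):
    es.indices.create(
        index="pages",
        settings={"number_of_shards": 1, "number_of_replicas": 0},
        mappings={
            "properties": {
                "url": {"type": "keyword"},      # exact-match filtering
                "html": {"type": "text"},        # full-text search target
                "crawled_at": {"type": "date"},
            }
        },
    )

def actions(json_dir: str):
    """Yield one bulk action per JSON line across all files in json_dir."""
    for path in Path(json_dir).glob("*.json"):
        with open(path) as f:
            for line in f:
                yield {"_index": "pages", "_source": json.loads(line)}

# ~500 docs x ~20 KB each keeps each bulk request around 10 MB.
indexed = 0
for ok, _ in streaming_bulk(es, actions("crawl_output/"), chunk_size=500):
    indexed += ok
print(f"indexed {indexed} documents")

# Example phrase query against the stored HTML:
hits = es.search(index="pages", query={"match_phrase": {"html": "terms of service"}})
print(hits["hits"]["total"])
```

At this scale, the real work is in mapping design, shard sizing, and ingest throughput rather than in the bulk loop itself, which is exactly why we want someone experienced.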
We are still gathering data, but we have plenty of representative data to go ahead and get started with.
Let me know in your bid how awesome you are with ES; I'm definitely looking for an expert :)