THIS IS A QUICK PROJECT. LOWEST QUALIFIED QUICKEST TO BID WITH FASTEST TURN AROUND TIME WINS. THE JOB HAS TO BE DONE TODAY. YOU HAVE THREE HOURS TO BID AND 2 HOURS TO DO PROJECT.
Below you will find the text of the original project. The programmer did the job as asked, but it is not nearly fast enough, and I want to add a tiny twist to it. The only thing that has changed is the input data. The input data format is as follows:
email@[url removed, login to view]|Name
So a "Name" field has been added; it should appear in the output but is not included in the parse. The output just needs to include the matching name for each record where one was passed. The input data is typically 2k-5k records. The DOMAINS and WORDS files are very small, but the ADDRESSES file currently holds 700k records (see the included code for how the terms DOMAINS, WORDS, and ADDRESSES are used). To improve speed, I suggest loops that exit upon a match and go to the next test. I also suggest a different coding language. Another improvement would be to fork off multiple processes and split up the job. Possible platforms are MacOS, WIN7, and any variety of Linux. I don't really have a preference. I will be away from my desk for the next two hours, so please review carefully and I'll answer questions if there is time when I get back. Clearly illustrate how you can get this work done, and bid carefully with the time frame necessary to get this job done. An example is:
I have time to do this today. I can start within X hours and complete this in X hours. I will use Perl ver x.x and split up the processes by three ....blah blah blah
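To illustrate the speed suggestions above, here is a minimal Python sketch, not the included code. It assumes that DOMAINS, WORDS, and ADDRESSES are block lists (blocked domains, blocked substrings, and blocked exact addresses); that meaning is a guess, and the function names load_list, is_blocked, and scrub are made up for illustration. Holding the lists in sets makes the check against 700k ADDRESSES a single hash lookup, and each test returns as soon as it matches.

```python
def load_list(lines):
    """Return a set of lowercased, non-empty entries for fast membership tests."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_blocked(email, domains, words, addresses):
    """Run the tests in order and return at the first match (assumed semantics)."""
    email = email.strip().lower()
    if email in addresses:                # one hash lookup, even with 700k entries
        return True
    _local, _, domain = email.rpartition("@")
    if domain in domains:                 # exact domain match
        return True
    # any() stops at the first word found inside the address
    return any(word in email for word in words)


def scrub(records, domains, words, addresses):
    """Yield 'email|Name' records that pass; the name is carried along unparsed."""
    for record in records:
        record = record.rstrip("\n")
        email, _, _name = record.partition("|")
        if email and not is_blocked(email, domains, words, addresses):
            yield record
```

With set lookups, 2k-5k input records should need very little work per record, so forking multiple processes may turn out to be unnecessary; that would have to be measured on the real data.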
I need a script, preferably in shell under linux/unix, that will do what "Email Scrubber [url removed, login to view]" does in its entirety. The main problem with this spreadsheet is that it is limited to 65k records. I don't want to have a limit on input or output. Google "Email Scrubber [url removed, login to view]" to find this. So basically, when you are done building this script, I will have a way to maintain the data in each of the tabbed worksheets via text files, and given my email list to scrub, I will run a command like this:
[url removed, login to view] [url removed, login to view] > [url removed, login to view]
and the output will be my desired scrubbed list. Pretty simple.
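As a rough sketch of how such a command can avoid any record limit, the Python function below streams the list one line at a time instead of loading it into a sheet. The name stream_scrub and the keep callback are hypothetical; keep stands in for whatever scrubbing rules the worksheets define, which are not reproduced here.

```python
import sys


def stream_scrub(infile, outfile, keep):
    """Copy each non-empty line for which keep(line) is true; return the count kept.

    Only one line is held in memory at a time, so the size of the input
    and output is limited by disk space rather than by a row limit.
    """
    kept = 0
    for line in infile:
        line = line.rstrip("\r\n")        # tolerate Windows and Unix line endings
        if line and keep(line):
            outfile.write(line + "\n")
            kept += 1
    return kept
```

Called as stream_scrub(open(path), sys.stdout, keep) from a small wrapper script, the result can be redirected to a file with > in the usual way.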
Cost is the main factor of selection but if someone can work on this right this very minute at the right price, I will pick them immediately. I'll be on for at least a couple of hours.
############# IMPORTANT #####################################
Please do not bid or ask me questions before reading these notes and downloading the file shown above. If you do not follow this simple request, then you may be disqualified from being selected.
I already have a working version in Ruby 1.9.1 that supports multithreading. It runs 1000 times faster than your current script. I can give it to you immediately after testing with sample input and output from you.