The project is to develop a system that will de-dupe large quantities of numbers.
The numbers will be 8 to 12 digits long, with up to 1 million records per batch; each project could end up with as many as 20 million records. The program must therefore be efficient so that the de-dupe process does not take too long.
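As a rough illustration of the efficiency point, a hash set makes de-duping one batch a single pass over the records. This is only a sketch: the function name `dedupe_batch` is hypothetical, numbers are treated as strings so leading zeros survive, and skipping entries that are not 8 to 12 digits is an assumption rather than a stated requirement.

```python
def dedupe_batch(numbers):
    """Return the unique entries of one batch, keeping first-seen order."""
    seen = set()
    unique = []
    for raw in numbers:
        number = raw.strip()
        # Assumption: entries that are not 8 to 12 ASCII digits are skipped.
        if not (number.isascii() and number.isdigit() and 8 <= len(number) <= 12):
            continue
        if number not in seen:  # average O(1) membership test
            seen.add(number)
            unique.append(number)
    return unique
```

Each record is looked at once, so a batch of 1 million numbers needs about 1 million set lookups rather than a comparison of every pair.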
1. Set up a specific project in which to upload numbers, e.g. London01.
2. Upload the 1st batch of numbers and de-dupe it against itself. The system should have 2 folders: one with the original batch and one with the de-duped batch.
3. If a 2nd batch is added to the project London01, it is de-duped against itself, then de-duped against the 1st batch. There should then be a file of the remaining numbers, a folder containing the 2 de-duped batches, and another folder containing the original file.
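One possible per-project folder layout for the steps above is sketched here. The folder names `original`, `deduped` and `remaining`, and the helper names, are assumptions for illustration only; the brief does not fix them.

```python
from pathlib import Path

# Hypothetical sub-folder names; the brief only says the files are kept apart.
SUBFOLDERS = ("original", "deduped", "remaining")

def project_folders(root, project):
    """Map each sub-folder name to its path under root/project."""
    base = Path(root) / project
    return {name: base / name for name in SUBFOLDERS}

def create_project(root, project):
    """Create the project's folders on disk if they do not exist yet."""
    folders = project_folders(root, project)
    for path in folders.values():
        path.mkdir(parents=True, exist_ok=True)
    return folders
```

For example, `create_project("data", "London01")` would create `data/London01/original`, `data/London01/deduped` and `data/London01/remaining`.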
So the operation would be:
Let’s take Batch 1 as the first batch uploaded.
Batch 1 is de-duped against itself. The resulting batch is stored so it can be downloaded.
Batch 2 is de-duped against itself (this should be stored so it can be downloaded if required).
Batch 2 is then de-duped against the now de-duped Batch 1. The remaining numbers in Batch 2 are stored so they can be downloaded.
Batch 3 is de-duped against itself (this should be stored so it can be downloaded if required).
Batch 3 is then de-duped against both the de-duped Batch 1 and Batch 2. The remaining numbers in Batch 3 are stored so they can be downloaded.
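The sequence above can be sketched as one function called once per uploaded batch. The names `process_batch` and `project_seen` are hypothetical, and holding the project's numbers in an in-memory set is an assumption made to keep the example short.

```python
def process_batch(batch, project_seen):
    """De-dupe `batch` against itself, then against all earlier batches.

    Returns (self_deduped, remaining). `project_seen` is the set of numbers
    already kept for the project; it is updated with the remaining numbers.
    """
    # Step 1: de-dupe against itself (dict keys keep first-seen order).
    self_deduped = list(dict.fromkeys(batch))
    # Step 2: drop anything already present in earlier de-duped batches.
    remaining = [n for n in self_deduped if n not in project_seen]
    project_seen.update(remaining)
    return self_deduped, remaining

seen = set()  # one set per project, e.g. London01
batch1 = process_batch(["11111111", "22222222", "11111111"], seen)
batch2 = process_batch(["22222222", "33333333", "33333333"], seen)
batch3 = process_batch(["11111111", "33333333", "44444444"], seen)
```

Both returned lists would be written to storage so they can be downloaded. For a project of 20 million records, a table with a unique key in the SQL DB could take the place of the in-memory set; that choice is left open here.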
The running environment required is as follows:
Software to run on Windows Server with a SQL DB.
The interface for the end user should be web-based. There should be a master login who can see all the projects and who can also create separate logins for sub-users to set up individual projects.
Sub-users would only have access to their own projects and associated data.
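The visibility rule for the master login and sub-users could look roughly like this. The `User` and `Project` types and the `visible_projects` function are hypothetical, and real authentication and storage are out of scope for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    is_master: bool = False  # assumption: a single flag marks the master login

@dataclass(frozen=True)
class Project:
    name: str
    owner: str  # name of the sub-user who set the project up

def visible_projects(user, projects):
    """Master sees every project; a sub-user sees only their own."""
    if user.is_master:
        return list(projects)
    return [p for p in projects if p.owner == user.name]
```

The same check would apply to every download of a stored batch, so that a sub-user cannot reach another sub-user's data by guessing a project name.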