I would like to add records to my existing PostgreSQL database. I have attached a sample of the file to be added. No records are to be extracted from the database or moved across the network; data will only be moved between drives physically connected to the server. The file to be added contains around 45 million records. Most of these are already in the database, but many are not. Here is what I would like to have completed:
1. Back up the current database and its structure to an external drive on the server to ensure we can restore it in the unlikely event that any issues occur.
2. Add the new records to the existing PostgreSQL database. Use your own import script or modify the existing one we have on the server from the last import.
3. All records added need to include a source field. The database already has a place for this information. Please plan to populate this field so we can tell the source and date of this file apart from other data in the database. I will provide this information prior to the start of work.
4. Remove duplicates. A record is a duplicate if the first name, last name, date of birth and first 4 characters of the address field are identical. There is also an existing duplicate removal script, but you will need to either develop your own or determine how it works to ensure it meets the requirements of this job.
When choosing which record to remove, it is imperative that we remove the existing record, so that the record appears as an updated record with the new source field.
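To make the duplicate rule in item 4 concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the matching and replacement logic, not the script to be run against 200 million rows. The field names (`first_name`, `last_name`, `dob`, `address`, `source`) and the source label are hypothetical placeholders, and the comparison is assumed to be an exact match with no trimming or case folding.

```python
def dedup_key(record):
    """Build the duplicate key described in item 4.

    Field names are hypothetical; the comparison is exact (no trimming
    or case folding), which is an assumption to confirm before work starts.
    """
    address = record.get("address") or ""
    return (
        record["first_name"],
        record["last_name"],
        record["dob"],
        address[:4],  # only the first 4 characters of the address count
    )


def merge_new_records(existing, incoming, source):
    """Add incoming records, replacing any existing record with the same key.

    The existing record is the one removed, so the surviving row carries
    the new source value and reads as an updated record.
    """
    merged = {dedup_key(rec): rec for rec in existing}
    for rec in incoming:
        tagged = dict(rec, source=source)  # stamp every new record with its source
        merged[dedup_key(tagged)] = tagged  # overwrite the older record, if any
    return list(merged.values())


if __name__ == "__main__":
    existing = [
        {"first_name": "Ann", "last_name": "Lee", "dob": "1980-01-02",
         "address": "12 Oak St", "source": "old_file"},
    ]
    incoming = [
        {"first_name": "Ann", "last_name": "Lee", "dob": "1980-01-02",
         "address": "12 Oak Street Apt 4"},
    ]
    for row in merge_new_records(existing, incoming, "new_file"):
        print(row)
```

On the server the same rule would more likely be expressed inside PostgreSQL itself, but the logic to verify in any existing script is the same: identical four-part key, and the older record is the one that goes.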
After the above four items are completed, and after we have reviewed the results, please export the database to the external drive in increments of one million records. We currently have in excess of 200 million records in the database, and once this addition is complete, query performance should remain similar to what it is now. I will provide a URL to those interested so they can review the current performance.
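As an illustration of the export in one-million-record increments, here is a minimal Python sketch that splits any stream of rows into numbered files. The CSV output format, the file naming, and the function name `export_in_chunks` are assumptions for the example only; in the actual job the rows would come from the database (for instance through a server-side cursor) rather than from an in-memory list.

```python
import csv
from itertools import islice
from pathlib import Path


def export_in_chunks(rows, out_dir, fieldnames, chunk_size=1_000_000, prefix="export"):
    """Write rows (an iterable of dicts) to numbered CSV files.

    Each file holds at most chunk_size records plus a header line.
    Returns the list of file paths written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    row_iter = iter(rows)
    paths = []
    while True:
        # Pull the next batch; an empty batch means the input is exhausted.
        chunk = list(islice(row_iter, chunk_size))
        if not chunk:
            break
        path = out_dir / f"{prefix}_{len(paths) + 1:04d}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(chunk)
        paths.append(path)
    return paths
```

With the default `chunk_size`, a database of just over 200 million records would produce a little more than 200 files, the last one holding whatever remainder is left.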
Please ask any questions and I will be happy to reply as quickly as possible.
Thank you for your consideration.