I have a 25 mB file with 4 fields (see below) and 900,000 records. At the moment it's a tab-delimited .DAT file. I need to eliminate most of the records and to reduce the file to three fields instead of four, following some simple rules:
STEP 1 - Eliminate unnecessary records
1) If Field 3 or Field 4 is blank, delete the record and look at the next record.
2) If Field 3 or Field 4 contains any alphabet letters (A-Z), delete the record and look at the next record.
3) If the value in Field 2 of the current record matches the value in Field 2 of the previous record, delete the current record (but not the previous record), then look at the next record.
4) Repeat steps 1-3 until all of the records have been examined.
STEP 2 - Combine Fields 1 & 2
1) For all remaining records, combine Fields 1 and 2 into one Field, so if Field 1 contains "US" and Field 2 contains "92084" the new combined field should be: "US92084"
Here are the Fields in the original .dat file:
CountryCode MailCode Longitude Latitude
US 30183 34.3217 [url removed, login to view]
US 30127 33.9135
US 31535 31.5046 [url removed, login to view]
The compressed file should look like this:
CCMailCode Longitude Latitude
US30183 34.3217 [url removed, login to view]
US31535 31.5046 [url removed, login to view]
Also, I'd like to keep a copy of the program (so maybe it should be an .EXE file), because I might need to run it again in a year or so on a similar array. The program can be written in any language, I suppose. If it's in PHP, that would a bonus, so I can edit it. Thanks!