I need some Python code that will join files. The files I will be joining are large - up to 5 GB. I need code that is fast and efficient.
I can't use the Python csv module for this since I may need this code to read from HDFS in a Hadoop cluster.
I have 3 files: A, B, and C. A is the master file, and it needs something similar to a SQL LEFT OUTER JOIN against files B and C.
File A is the master file and has 6 fields:
File B has 3 fields:
File C has 4 fields:
File A joins to file B on column A
File A joins to file C on column A
The final output file will look like
The code has to be fast, since file sizes can be up to 5 GB.
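One way the join described above could be approached is sketched here, under several assumptions the post does not confirm: fields are comma-delimited, "column A" is the first column of every file, keys in B and C are unique, and the indexes for B and C fit in memory. The names `build_index` and `left_outer_join` are hypothetical. The functions take any iterable of lines, so the same code could be fed from a local file or from a stream read out of HDFS, and it uses `str.split` rather than the csv module.

```python
from typing import Dict, Iterable, Iterator, List

DELIM = ","  # assumption: the post does not state the delimiter


def build_index(lines: Iterable[str], key_col: int = 0) -> Dict[str, List[str]]:
    """Map each join key to the row's other fields (last row wins on duplicate keys)."""
    index: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split(DELIM)
        index[fields[key_col]] = fields[:key_col] + fields[key_col + 1:]
    return index


def left_outer_join(
    a_lines: Iterable[str],
    b_index: Dict[str, List[str]],
    c_index: Dict[str, List[str]],
    b_width: int,
    c_width: int,
    key_col: int = 0,
) -> Iterator[str]:
    """Stream file A once; pad with empty fields when B or C has no matching key."""
    empty_b = [""] * b_width
    empty_c = [""] * c_width
    for line in a_lines:
        fields = line.rstrip("\r\n").split(DELIM)
        key = fields[key_col]
        yield DELIM.join(fields + b_index.get(key, empty_b) + c_index.get(key, empty_c))


# Tiny in-memory usage example; real input would be open file handles.
a_rows = ["1,a,b,c,d,e", "2,f,g,h,i,j"]
b_rows = ["1,x,y"]
c_rows = ["2,p,q,r"]
joined = list(left_outer_join(a_rows, build_index(b_rows), build_index(c_rows), 2, 3))
```

Only file A is streamed; B and C are held in dictionaries, so this trades memory for a single pass over the master file. If B or C is too large to index in memory, a sort-based approach may be a better fit.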
7 freelancers are bidding on average $119 for this job
I have been working in Python for more than 3 years. I can do the job for you.
I have done the major portion of my programming in Python and am very familiar with the language and environment.
Are the items in files A, B, and C sorted by column A? If so, the task is trivial and the conversion will be very fast, similar to copying the files. Otherwise the script has to build indexes for files B and C, and use th...
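The sorted case mentioned in this bid could look roughly like the following merge join, shown for A against B only. It is a sketch under assumptions not stated in the thread: comma-delimited fields, the join key in the first column, both inputs sorted by that key in plain string order, and unique keys in B. The names `merge_left_join` and `_next_row` are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional

DELIM = ","  # assumption: the delimiter is not given in the post


def _next_row(rows: Iterator[str]) -> Optional[List[str]]:
    """Return the next row split into fields, or None at end of input."""
    line = next(rows, None)
    return None if line is None else line.rstrip("\r\n").split(DELIM)


def merge_left_join(
    a_lines: Iterable[str],
    b_lines: Iterable[str],
    b_width: int,
    key_col: int = 0,
) -> Iterator[str]:
    """Left outer join of two inputs that are both sorted by key (string order)."""
    b_iter = iter(b_lines)
    b_fields = _next_row(b_iter)
    for line in a_lines:
        a_fields = line.rstrip("\r\n").split(DELIM)
        key = a_fields[key_col]
        # Skip B rows whose key sorts before the current A key.
        while b_fields is not None and b_fields[key_col] < key:
            b_fields = _next_row(b_iter)
        if b_fields is not None and b_fields[key_col] == key:
            extra = b_fields[:key_col] + b_fields[key_col + 1:]
        else:
            extra = [""] * b_width  # no match: pad with empty fields
        yield DELIM.join(a_fields + extra)


# Tiny in-memory usage example.
merged = list(merge_left_join(["1,a", "2,b", "3,c"], ["0,p,q", "2,x,y"], 2))
```

Because each input is read once and only one row per file is held at a time, memory use stays constant regardless of file size. The sort order of the files must match Python's string comparison for the `<` test to be valid.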