I have some rather large JSON (.jl) files containing thousands of lines of HTML. Each file can be between 100 MB and 3 GB. I would like a script or command that I can point at a JSON file; it should parse the file as quickly as possible and output a histogram of word usage, i.e. I want to know the most used words in, say, a 1 GB file. The output can be a CSV file. Obviously we can't output everything, so we need an adjustable lower threshold.
The hard part of this project is dealing with a single large JSON file. We need to utilize every core of the machine we run this on, and we need to be highly efficient in how we analyze the text.
If needed, you can assume the machine has enough memory to hold the entire JSON file: for a 3 GB file, we are sure to have 3 GB of available memory. We will likely run this on a server with 8+ GB of free space.
Ideally this can run on a Windows machine, but I am open to other platforms if you can make a case for them being better.
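To make the requirements above concrete, here is a minimal Python sketch of one possible approach, not a finished solution. It assumes each line of the .jl file is one complete JSON object (JSON Lines), treats every string value in a record as text to count, strips HTML with a crude regular expression rather than a real parser, and defines a "word" as a run of lowercase letters, digits, or apostrophes. The names `count_line`, `count_file`, `write_csv`, and the `min_count` threshold are hypothetical and chosen only for illustration.

```python
import csv
import json
import re
from collections import Counter
from html import unescape
from multiprocessing import Pool

TAG_RE = re.compile(r"<[^>]+>")      # crude tag stripper, not a full HTML parser
WORD_RE = re.compile(r"[a-z0-9']+")  # assumed definition of a "word"


def iter_strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def count_line(line):
    """Count words in one JSON Lines record (runs inside a worker process)."""
    counts = Counter()
    line = line.strip()
    if not line:
        return counts
    for text in iter_strings(json.loads(line)):
        # Replace tags with spaces, decode entities, then lowercase.
        text = unescape(TAG_RE.sub(" ", text)).lower()
        counts.update(WORD_RE.findall(text))
    return counts


def count_file(path, workers=None, chunksize=64):
    """Fan lines out to a process pool and merge the per-line counters."""
    total = Counter()
    # workers=None lets Pool use one process per CPU core.
    with open(path, encoding="utf-8") as fh, Pool(workers) as pool:
        for partial in pool.imap_unordered(count_line, fh, chunksize):
            total.update(partial)
    return total


def write_csv(counts, path, min_count=100):
    """Write word,count rows for words seen at least min_count times."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["word", "count"])
        for word, n in counts.most_common():
            if n < min_count:
                break  # most_common() is sorted, so the rest are below threshold
            writer.writerow([word, n])
```

On Windows, the call to `count_file(...)` has to sit under an `if __name__ == "__main__":` guard, because worker processes re-import the script there. Splitting by line only helps if the file really has many records; a file that is one huge JSON object on a single line would need a different splitting strategy.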
15 freelancers are bidding on average $196 for this job
Hi. Interesting project. Some time ago I had a similar task and managed to implement it by splitting the file into multiple files and running a multithreaded program on them (which will be much faster, depending on the machine ...
Hello, I have some experience with this problem. As you said, the main problem here is the size of the data; otherwise, reading a JSON file and counting words is easy. It's very good that we can be sure about memory. But ...
I can write this as a multithreaded Python program; I would prefer to run it on Linux.
Hello Sir, I have a lot of experience with JSON data parsing and data mining. I can help you do it; please ping me and give me more details. Thanks!
Hi! I am an experienced Python coder. I can develop a script to read and interpret your files quickly and efficiently. Please contact me for further discussion. Thanks.
Hello, I'm very interested in your project. I'm a good C/C++, Python, and data processing expert. I understand your requirements exactly, and I am quite well experienced in these jobs. Let's go ahead together; I want to service ...
Hi billy01, I worked with you last week: I created a C# program to search HTML files for img tags without an alt attribute. For the new job, I can write another C# program to parse the JSON files, ...
Hi, I have immense experience in data processing using Python and C++. I have over 12 years of experience in this field and I can deliver your project on time and within budget. Drop me a mail and we can talk further. ...
Hello, I would be glad to build the JSON parser app in Golang for you. Golang can build a command-line executable that runs on both Linux and Windows, and it can use every CPU core to speed up the JSON ...
I am a proficient analyst using R, C++ (I like using it together with R) and Python. I would like to help you write the code to analyse the JSON files in R, with objects such as ff to help ...
Sounds like a great project. Multiprocess/multithread for the load, and the analysis is really easy. How do you want the output, another json file, text file, or database? I can do all of them perfectly.
Even a 3 GB file can be processed very quickly if all you really care about is a histogram of word use in the contents of the HTML. I'm convinced that even with single-threaded Python a single file can be parsed in lit ...
I made a low bid because I need to gain reputation here and I don't want to spend money on doing so. With that in mind, I think I would be a very good fit for the task you described. I think I am more than qualified ...