Script - threaded/parallel script to analyze JSON blob - histogram C#, C++, Python, R

I have some rather large JSON .jl files. They have thousands of lines of HTML in them. Each blob can be between 100 MB and 3 GB. I would like a script or command that I can point at a JSON file; it should parse the file as quickly as possible and output a histogram of word usage, i.e. I want to know the most used words in, say, a 1 GB file. The output can be a CSV file. Obviously we can't output everything, so we need an adjustable lower threshold.

The hard part of this project is dealing with the large single JSON file. We need to utilize every core of the machine we run this on, and we need to be highly efficient in how we analyze the text.

If needed, you can assume we have enough memory on the machine to fit the entire JSON file in it, so for a 3 GB file we are sure to have 3 GB of available memory. We will likely run this on a server with 8+ GB of free space.

Ideally this can run on a Windows machine, but I am open to other platforms if you can make a case for one being better.
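A minimal Python sketch of the kind of tool described above, under several assumptions that are not stated in the posting: the .jl file holds one JSON record per line (JSON Lines), the HTML can sit in any string field of a record, tags can be stripped with a crude regular expression instead of a real HTML parser, and "words" are runs of ASCII letters. The names count_lines, batches, histogram, min_count and batch_size are hypothetical. Worker processes each count a batch of lines and the parent merges the partial counts.

```python
import csv
import json
import re
import sys
from collections import Counter
from itertools import islice
from multiprocessing import Pool

TAG_RE = re.compile(r"<[^>]+>")              # crude tag stripper (assumption)
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")  # ASCII words, optional apostrophe


def iter_strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def count_lines(lines):
    """Count words in a batch of JSON Lines records (runs in a worker)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of aborting the run
        for text in iter_strings(record):
            text = TAG_RE.sub(" ", text).lower()
            counts.update(WORD_RE.findall(text))
    return counts


def batches(lines, size):
    """Yield lists of up to `size` items from an iterator of lines."""
    while True:
        batch = list(islice(lines, size))
        if not batch:
            return
        yield batch


def histogram(path, min_count=100, batch_size=200, workers=None):
    """Return [(word, count), ...] for words seen at least min_count times."""
    total = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        with Pool(workers) as pool:  # workers=None uses every CPU core
            parts = pool.imap_unordered(count_lines, batches(handle, batch_size))
            for partial in parts:
                total.update(partial)
    return [(w, c) for w, c in total.most_common() if c >= min_count]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python wordhist.py input.jl output.csv [min_count]
    threshold = int(sys.argv[3]) if len(sys.argv) > 3 else 100
    with open(sys.argv[2], "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["word", "count"])
        writer.writerows(histogram(sys.argv[1], min_count=threshold))
```

Processes rather than threads are used here because CPython threads do not run pure-Python parsing on several cores at once. The main-module guard is what lets this run on Windows, where worker processes are spawned rather than forked. The pool reads batches ahead of the workers, so memory use can approach the file size, which the posting says is acceptable. Whether this is fast enough for a 3 GB file would need measuring.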

Skills: C# Programming, C++ Programming, Python, Statistics


About the Employer:
( 454 reviews ) Austin, United States

Project ID: #6559602

15 freelancers are bidding an average of $196 for this job


Hi. Interesting project. Some time ago I had a similar task and managed to implement it by splitting the file into multiple files, then running a multithreaded program on them (which will be much faster depending on the machine ...

in 1 day (USD)
(25 Reviews)

Hello, I have some experience with this problem. As you said, the main problem here is the size of the data; otherwise, reading a JSON file and counting words is easy. It's very good that we can be sure about memory. But ...

$250 USD
(20 Reviews)

I can write this in multithreaded Python; I prefer to run it on Linux.

$278 USD
(30 Reviews)

Hello Sir, I have a lot of experience with JSON data parsing and data mining, and I can help you with this. Please ping me and give me more details. Thanks!

$222 USD
(12 Reviews)

Hi! I am an experienced Python coder. I can develop a script to read and interpret your files quickly and efficiently. Please contact me for further discussion. Thanks

$222 USD
(11 Reviews)

Hello, I'm very interested in your project. I'm a good C/C++, Python, and data processing expert. I understand your requirements exactly. I am quite well experienced in these jobs. Let's go ahead with me. I want to service ...

$200 USD
(2 Reviews)

Hi billy01, I worked with you before: last week I created a C# program to search HTML files for img tags without an alt attribute. For the new job, I can write another C# program to parse the JSON files, ...

$150 USD
(5 Reviews)

Hi, I have immense experience in data processing using Python and C++. I have over 12 years of experience in this field and I can deliver your project on time and within budget. Drop me a mail and we can talk further. ...

$222 USD
(3 Reviews)

No proposal has been made yet

$105 USD
(3 Reviews)

A proposal has not yet been provided

in 1 day (USD)
(0 Reviews)

Hello, I would be glad to build the JSON parser app in Golang for you. Golang can build a command-line executable that runs on both Linux and Windows. Golang can use each core of the CPUs to speed up the json ...

$166 USD
(0 Reviews)

I am a proficient analyst using R, C++ (I like using it together with R), and Python. I would like to help you write the code to analyse the JSON files in R, with objects such as ff to help ...

$155 USD
(0 Reviews)

A proposal has not yet been provided

$244 USD
(0 Reviews)

Sounds like a great project. Multiprocess/multithread for the load, and the analysis is really easy. How do you want the output, another json file, text file, or database? I can do all of them perfectly.

$155 USD
(0 Reviews)

Even a 3 GB file can be processed very quickly if all you really care about is a histogram of word use in the contents of the HTML. I'm convinced that even with single-threaded Python a single file can be parsed in lit ...

$222 USD
(0 Reviews)

I made a low bid because I need to gain reputation here and I don't want to spend money on doing so. With that in mind, I think I would be a very good fit for the task you described. I think I am more than qualified ...

$115 USD
(0 Reviews)