Script - threaded/parallel script to analyze json blob - histogram c#, C++, Python, R

$30-250 USD

Closed

Posted: more than 9 years ago

Paid on delivery

I have some rather large JSON .jl files. They contain thousands of lines of HTML, and each blob can be between 100 MB and 3 GB. I would like a script or command that I can point at a JSON file; it should parse the file as quickly as possible and output a histogram of word usage, i.e. I want to know the most used words in a 1 GB file. The output can be a CSV file. Obviously we can't output everything, so we need an adjustable lower threshold. The hard part of this project is dealing with the single large JSON file. We need to utilize every core of the machine we run this on, and we need to be highly efficient in how we analyze the text. If needed, you can assume the machine has enough memory to hold the entire JSON file, so for a 3 GB file we are sure we have 3 GB of available memory. Likely we will run this on a server with 8+ GB of free space. Ideally this can run on a Windows machine, but I am open to others if you can make a case for them being better.
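A minimal Python sketch of the requested tool, under several assumptions the posting does not state: each line of the .jl file is one JSON object, the HTML sits under a key called `html` (hypothetical), and a "word" is a lowercase run of ASCII letters and apostrophes. The file is cut into byte ranges that end on a newline, each range is counted in a separate worker process so every core can be used, the per-range `Counter`s are merged, and only words at or above an adjustable `min_count` are written to CSV. Workers open the file themselves and seek to their range, so no record data is piped between processes.

```python
import csv
import json
import os
import re
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

# Assumption: a "word" is a lowercase run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")
# Crude tag stripper; a real HTML parser is more accurate but slower.
TAG_RE = re.compile(r"<[^>]*>")


def chunk_bounds(path, n_chunks):
    """Split the file into about n_chunks byte ranges that end on a newline."""
    size = os.path.getsize(path)
    step = max(1, size // n_chunks)
    bounds, start = [], 0
    with open(path, "rb") as f:
        while start < size:
            end = min(start + step, size)
            if end < size:
                f.seek(end)
                f.readline()  # move end forward to the next line boundary
                end = f.tell()
            bounds.append((start, end))
            start = end
    return bounds


def count_range(task):
    """Worker: count words in the records that lie in one byte range.

    `field` is a hypothetical key; the posting does not say which JSON key
    holds the HTML.
    """
    path, start, end, field = task
    counts = Counter()
    with open(path, "rb") as f:
        f.seek(start)
        pos = start
        while pos < end:
            line = f.readline()
            if not line:
                break
            pos += len(line)
            if not line.strip():
                continue
            record = json.loads(line)
            text = TAG_RE.sub(" ", record.get(field, ""))
            counts.update(WORD_RE.findall(text.lower()))
    return counts


def word_histogram(path, field="html", workers=None):
    """Count words in parallel worker processes and merge the per-chunk counts."""
    workers = workers or os.cpu_count() or 1
    tasks = [(path, s, e, field) for s, e in chunk_bounds(path, workers * 4)]
    total = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_range, tasks):
            total.update(partial)
    return total


def write_csv(counts, out_path, min_count):
    """Write word,count rows, most frequent first, dropping rare words."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        for word, n in counts.most_common():
            if n < min_count:
                break
            writer.writerow([word, n])
```

A call such as `write_csv(word_histogram("dump.jl"), "histogram.csv", min_count=100)` should sit under an `if __name__ == "__main__":` guard, which is required on Windows because worker processes are spawned there rather than forked. The regex tag stripper is the weakest assumption: it also counts words inside `<script>`/`<style>` blocks and the names of HTML entities such as `&amp;`.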

Project ID: 6559602

About the project

14 bids

Remote project

Last activity: 9 years ago

14 freelancers are bidding on average $194 USD for this job

@Yknox

Hello, I'm interested in your project. I'm a good C/C++, Python, and data-processing expert. I understand your requirements exactly, and I'm quite experienced with jobs like this. Let's go ahead; I'd like to keep working for you continuously. Thanks

$200 USD in 3 days

4.9

(758 reviews)

9.1

@chirgeo

Hi. Interesting project. Some time ago I had a similar task and managed to implement it by splitting the file into multiple files and running a multithreaded program on them (which will be much faster, depending on the machine you have). The problem I see here is not the big file size, it's the HTML format. Basically, each blob needs to be parsed with an HTML parsing tool to extract only the text. This means the data will shrink by at least half (because of the HTML tags). Can you give me a simple example file so that I can run some simple tests on it? Thx, hope we will collaborate.
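The point that the text has to be pulled out of the HTML before counting can be sketched with Python's standard-library `html.parser`. This is an illustrative assumption, not the bidder's code: it keeps text nodes, drops tags, decodes entities such as `&amp;`, and skips `<script>`/`<style>` bodies, which a plain regex tag stripper would count as words.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, joined with spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

A pure-Python parser like this is likely slower than a regex strip, so on multi-gigabyte inputs it would be called inside each worker (one fresh `TextExtractor` per record) rather than in a single thread.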

$250 USD in 1 day

4.9

(100 reviews)

7.2

@anuyadav1

I can write this in Python, multithreaded; I prefer to run it on Linux.

$278 USD in 7 days

4.8

(60 reviews)

5.9

@nitelfreelance

Hello, I have some experience with this problem. As you said, the main problem here is the size of the data; otherwise, reading a JSON file and counting words is easy. It's very good that we can be sure about memory, but you should know that the amount of memory we need is not exactly equal to the size of the data file. One key factor is how the JSON-reading algorithm is implemented in the language or library. Sometimes we need 3 to 4 GB of memory for a 1 GB data file. But if we have enough memory, we can be sure that the processing time is linear: a 3 GB file takes three times as long as a 1 GB file, which is very important for big files. Also make sure the storage on the target machine is an SSD, not an HDD; an SSD gives you at least 2x faster file reading.

I think there is a better solution that depends on the structure of your JSON data. The regular solution is:
1. Read the file.
2. Parse the JSON data.
3. Process the data.
But we might be able to remove the JSON-parsing step. In that case we go through the data once instead of twice. It depends on your data structure and the results you need.

Regarding threading: not all threaded applications use all CPU cores. For example, if you have 4 CPU cores and the application runs 4 threads to process data, the OS might put these 4 threads on just 1 or 2 cores. The solution is to use multiple processes instead of multiple threads. It's similar in coding and in the way it runs, but with multiple processes we can be sure that 4 processes will use all 4 cores. Regards, Iman
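The idea of dropping the JSON-parse step can be illustrated with a sketch that tokenizes each raw line as bytes. This is an assumption-laden illustration, not the bidder's design: it assumes one record per line with the HTML inside a JSON string, it only recognizes ASCII words, and the names `count_raw_line`, `ESCAPE_RE`, `TAG_RE`, and `WORD_RE` are hypothetical.

```python
import re
from collections import Counter

# JSON string escapes (\n, \t, \", \uXXXX, ...) are replaced first so that
# e.g. "\n" is not counted as the word "n" and \" does not break tag matching.
ESCAPE_RE = re.compile(rb"\\(?:u[0-9a-fA-F]{4}|.)")
TAG_RE = re.compile(rb"<[^>]*>")
WORD_RE = re.compile(rb"[a-z']+")


def count_raw_line(line: bytes, counts: Counter) -> None:
    """Count words in one raw JSON Lines record without calling json.loads.

    Trade-offs of skipping the parse: JSON key names (e.g. "html") and any
    other string fields are counted too, and \\uXXXX-escaped characters are
    dropped instead of decoded.
    """
    text = ESCAPE_RE.sub(b" ", line)
    text = TAG_RE.sub(b" ", text)
    counts.update(WORD_RE.findall(text.lower()))
```

`counts` ends up keyed by `bytes`, so decoding with `word.decode()` is left to whatever writes the output. Because this is CPU-bound pure Python, using every core still means separate processes: CPython's global interpreter lock keeps threads from executing Python bytecode in parallel, which is a concrete reason the multi-process advice applies to a Python implementation.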

$250 USD in 2 days

4.9

(29 reviews)

5.9

@bven09

Hi, I have immense experience in data processing using Python and C++. I have over 12 years of experience in this field, and I can deliver your project on time and within budget. Drop me a mail and we can talk further. My Gmail and Skype handles are bg32014. Best regards, Bala

$222 USD in 10 days

4.8

(9 reviews)

4.8

@GeorgeKazi

Hi! I am an experienced Python coder. I can develop a script to read and interpret your files quickly and efficiently. Please contact me for further discussion. Thanks

$222 USD in 4 days

4.9

(18 reviews)

4.0

@SatoKun

A proposal has not yet been provided

$200 USD in 1 day

5.0

(1 review)

3.6

@ahmedsobherfan

Hi billy01, I worked with you last week: I created a C# program to search HTML files for img tags without an alt attribute. For the new job, I can write another C# program to parse the JSON files, create the histogram, and save it to CSV. I will send you the code as usual, with payment after job completion. Waiting for your answer. Regards

$150 USD in 3 days

5.0

(5 reviews)

2.8

@chefarov

A proposal has not yet been provided

$105 USD in 3 days

5.0

(3 reviews)

2.6

@joeguo

Hello, I'd be glad to build the JSON parser app in Go (golang) for you. Go can build a single command-line executable that runs on both Linux and Windows, and it can use every CPU core to speed up the JSON parsing. Regards, Joe

$166 USD in 4 days

0.0

(0 reviews)

1.7

@oluochjoshua

I am a proficient analyst using R, C++ (I like using it together with R), and Python. I would like to help you write the code to analyse the JSON files in R, with objects such as ff to help with memory usage and easy writing in word. If given the job, I would use C++ code to speed things up and greatly reduce the computation time, while keeping RAM usage in check with ff objects.

$155 USD in 3 days

0.0

(0 reviews)

0.0

@waqasmoeedlx

A proposal has not yet been provided

$244 USD in 6 days

0.0

(0 reviews)

0.0

@bradykelly69

Sounds like a great project. Multiprocess/multithread for the load, and the analysis is really easy. How do you want the output: another JSON file, a text file, or a database? I can do all of them perfectly.

$155 USD in 5 days

0.0

(0 reviews)

0.0

@ei04004

I made a low bid because I need to gain reputation here and I don't want to spend money doing so. With that in mind, I think I would be a very good fit for the task you described. I am more than qualified for it, and the low bid is the advantage I can offer you. I hope you are interested. Best regards, André Meneses

$115 USD in 3 days