In Progress

504047 Fuzzy Text Match between 2 Firm Names (about 50 chars)

Hi, I'd like to hire someone to perform a "fuzzy" text match on two datasets. That is, I have a set of firm names in Data A (along with "state" and "month") that I'd like to match to a set of firm names in Data B (along with "state", "month", and "day"). For each (Firm Name, State, Month) entry in Data A, I'd like to retrieve the matching "day" from Data B.
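
One common way to structure this kind of match is "blocking". Data B is indexed by (state, month), so each Data A row is compared only against the Data B rows that share its state and month. With about 6,000 Data B rows spread across states and months, each comparison set stays small. The Python sketch below illustrates that idea under stated assumptions. The function name `match_all` and the tuple layouts are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure is finally chosen.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def match_all(data_a, data_b):
    """Blocked fuzzy match (illustrative sketch, not a finished tool).

    data_a: iterable of (firm, state, month) tuples   -- hypothetical layout
    data_b: iterable of (firm, state, month, day) tuples -- hypothetical layout
    Returns a list of (firm_a, state, month, best_firm_b, day, score).
    """
    # Index Data B by (state, month) so each Data A row is only compared
    # with Data B rows that share its state and month ("blocking").
    blocks = defaultdict(list)
    for firm, state, month, day in data_b:
        blocks[(state, month)].append((firm, day))

    results = []
    for firm, state, month in data_a:
        best = (None, None, 0.0)  # (best_firm_b, day, score); stays empty if no block
        for cand, day in blocks.get((state, month), []):
            # Case-insensitive character-level similarity in [0, 1].
            score = SequenceMatcher(None, firm.lower(), cand.lower()).ratio()
            if score > best[2]:
                best = (cand, day, score)
        results.append((firm, state, month) + best)
    return results
```

A Data A row whose (state, month) has no counterpart in Data B comes back with `(None, None, 0.0)`, which makes unmatched rows easy to filter out.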

All told, I have about 200-400k records in Data A and about 6,000 in Data B. Please note that the firm names in Data A are "messy": there may be odd spacing, punctuation, abbreviations, and misspelled words. In a few cases, the text field may inadvertently contain additional fields (such as a city name). Data B is fairly clean.
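
Messy names like these are usually normalized before any fuzzy comparison. Typical steps are lower-casing, removing punctuation, collapsing whitespace, and dropping trailing legal-form words such as "Inc" or "LLC". The sketch below shows one way to do this in Python. The suffix list and the function name `normalize_firm_name` are hypothetical assumptions, and the 50-character cap mirrors the field length mentioned in this posting. Normalization does not remove extra fields such as a trailing city name: "Valley Plating Works;Inc; Commerce CA" still keeps "commerce ca".

```python
import re

# Hypothetical list of legal-form suffixes; extend it for the real data.
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "lp"}


def normalize_firm_name(raw: str, max_len: int = 50) -> str:
    """Normalize a messy firm name for fuzzy comparison (illustrative only)."""
    name = raw.lower()
    # Delete periods so abbreviations like "l.l.c" collapse to "llc".
    name = name.replace(".", "")
    # Replace every character other than a-z, 0-9 and space with a space.
    name = re.sub(r"[^a-z0-9 ]+", " ", name)
    # split() with no argument also collapses runs of whitespace.
    tokens = name.split()
    # Drop trailing legal-form tokens such as "inc" or "llc".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    # Cap the length, matching the ~50-character field limit.
    return " ".join(tokens)[:max_len]
```

For example, `normalize_firm_name("BUOT STUDIO; L.L.C")` yields `"buot studio"`, and `normalize_firm_name("G & B Anderson; Inc")` yields `"g b anderson"`.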

I'm flexible, but I'd like some kind of "probability score" that indicates the strength of the match, as well as the best-matching day. I'd also like some manual checks done to confirm that the matches with high scores look correct.
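
A simple starting point for such a score is a string-similarity ratio in [0, 1]. This ratio is not a calibrated probability, so a threshold chosen by manual review can flag which matches a person should inspect. The Python sketch below assumes `difflib.SequenceMatcher` as the similarity measure. The `review_below=0.85` threshold and the function names `similarity` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after lower-casing and
    whitespace collapsing. Not a calibrated probability."""
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    return SequenceMatcher(None, a, b).ratio()


def best_match(name, candidates, review_below=0.85):
    """Return (best_firm_b, day, score, needs_review) for one Data A name.

    candidates: list of (firm_name, day) tuples from Data B (hypothetical layout).
    review_below: hypothetical threshold; scores under it are flagged for review.
    """
    best_name, best_day, best_score = None, None, 0.0
    for cand_name, day in candidates:
        score = similarity(name, cand_name)
        if score > best_score:
            best_name, best_day, best_score = cand_name, day, score
    return best_name, best_day, best_score, best_score < review_below
```

The threshold is a tuning knob. A hand-checked sample of matches can show where true matches stop and false ones begin. A sample like that could also be used to calibrate the ratio into something closer to a probability.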

Specifically, the data looks like this:

DATA A:

Firm Name (plus clean State and Month fields)

123. ACEP

124. church ville inn

125. Metro Metropolitan State Hospital

126. WNA Wealth Advisors; Inc

127. Valley Plating Works;Inc; Commerce CA

128. Ferrara Fire Apparatus; Inc

129. Guest House Inn

130. BUOT STUDIO; L.L.C

131. AIAA DESIGN/BUILD/FLY TEAM AT AUBURN UNIVERSITY

132. G & B Anderson; Inc

133. Loss Mit Rep

134. Kerr Drug Inc

135. PHH Mortgage (Randstad)

I'd probably limit the firm name fields to about 50 characters.

Finally, I'd like to have the underlying source code so that I can rerun the algorithm on future data.

Thanks!

Skills: Anything Goes, PHP


About the Employer:
(0 reviews)

Project ID: #2249971