String classification - Create code to guess a person's ethnicity based on their name

I have a dataset of over 500 000 rows. Each row has the following columns of data:

1) a person's name (first name OR last name),

2) this person's ethnicity. There are 18 different ethnicity groups in the data in total, such as "english", "chinese", "russian", "indian", "german", "latino" etc.

3) popularity of this name among this ethnicity. That is, how many times our system has detected a person with this name who was from this ethnicity group.

4) popularity of this name among other ethnicities. That is, how many times our system has detected a person with this name who was from another ethnicity group.
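
As an illustration of how columns 3 and 4 relate, the two counts can be turned into the share of sightings of a name that fall in the row's own ethnicity group. This is only a sketch: the function name and the example row are hypothetical, and the real CSV column order may differ.

```python
def name_share(count_in_group: int, count_in_others: int) -> float:
    """Fraction of all sightings of a name that belong to the row's ethnicity group."""
    total = count_in_group + count_in_others
    if total == 0:
        return 0.0  # no sightings at all, so no evidence either way
    return count_in_group / total


# Hypothetical row: (name, ethnicity, count in this group, count in other groups)
row = ("smith", "english", 900, 100)
share = name_share(row[2], row[3])
print(f"{row[0]} -> {row[1]}: {share:.2f}")
```

A share like this only covers names that already appear in the data, so on its own it does not meet the requirement to handle unseen names.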

A sample of the dataset is attached to this task as a CSV file.

Your job is to create a program or script that takes a person's name as input (first name and last name, or only one name element) and outputs its guess as to which ethnicity group this person belongs to, based on training on the dataset, AND a confidence or probability number that tells how sure the system is that this ethnicity is the correct answer.

For example, should the program receive the input "john smith", it should output the ethnicity class "english" and a confidence number indicating how sure the system is that "john smith" is "english".
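
One possible way to handle a two-part input such as "john smith" is to score each name element separately and then combine the results. The sketch below multiplies per-element class probabilities and renormalises them, which assumes the elements are independent given the ethnicity. The function name and the probability values are made up for illustration and do not come from the dataset.

```python
def combine_token_probs(token_probs):
    """Multiply per-element class probabilities and renormalise (naive independence)."""
    if not token_probs:
        raise ValueError("at least one name element is required")
    labels = set().union(*[p.keys() for p in token_probs])
    eps = 1e-9  # floor for classes missing from one element's distribution
    scores = {}
    for label in labels:
        score = 1.0
        for probs in token_probs:
            score *= probs.get(label, eps)
        scores[label] = score
    total = sum(scores.values())
    combined = {label: s / total for label, s in scores.items()}
    best = max(combined, key=combined.get)
    return best, combined[best]


# Hypothetical per-element distributions for "john" and "smith"
first = {"english": 0.7, "german": 0.3}
last = {"english": 0.8, "german": 0.2}
label, confidence = combine_token_probs([first, last])
print(label, round(confidence, 3))
```

The second return value can serve as the confidence number, although it tends to be overconfident because the independence assumption rarely holds exactly.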

Thus, this is basically a kind of classic string classification problem.

The code must be implemented in a way that it can guess the ethnicity of a person whose name does not exist in the dataset. In other words, the code must be some sort of learning system (such as an artificial neural network trained on the sample dataset), OR it must use other ways to extract traits from the names that hint at which ethnicity a person most likely is, for example n-grams, Bayesian analysis or something else.
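
To make the n-gram and Bayesian option concrete, here is a minimal sketch of a character-trigram naive Bayes classifier with Laplace smoothing, using only the Python standard library. It is one possible approach, not a required design. The class name, the toy rows and the choice to weight each name by its popularity count are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(token, n=3):
    """Character n-grams of a lowercased token, padded with start/end markers."""
    padded = "^" + token.lower().strip() + "$"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha                        # Laplace smoothing constant
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram -> weighted count
        self.label_counts = Counter()             # label -> total weight of its names
        self.vocab = set()

    def fit(self, rows):
        """rows: iterable of (name, label, weight); weight is the popularity count."""
        for name, label, weight in rows:
            if weight <= 0:
                continue  # rows without sightings carry no evidence
            self.label_counts[label] += weight
            for gram in char_ngrams(name, self.n):
                self.ngram_counts[label][gram] += weight
                self.vocab.add(gram)
        return self

    def predict_proba(self, name):
        # Score each whitespace-separated name element with the same model.
        grams = [g for tok in name.split() for g in char_ngrams(tok, self.n)]
        total_weight = sum(self.label_counts.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        log_scores = {}
        for label, label_weight in self.label_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(label_weight / total_weight)  # class prior
            for gram in grams:
                score += math.log((counts[gram] + self.alpha) / denom)
            log_scores[label] = score
        # Normalise in log space to avoid underflow.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}

    def predict(self, name):
        probs = self.predict_proba(name)
        best = max(probs, key=probs.get)
        return best, probs[best]


# Toy rows, invented for illustration: (name, ethnicity, popularity count)
toy_rows = [
    ("smith", "english", 900), ("johnson", "english", 700),
    ("williams", "english", 650), ("ivanov", "russian", 800),
    ("petrov", "russian", 600), ("smirnov", "russian", 500),
]
model = NgramNaiveBayes(n=3).fit(toy_rows)
label, confidence = model.predict("sidorov")  # a name that is not in the toy rows
print(label, round(confidence, 4))
```

Because the model scores character fragments rather than whole names, it can still produce a guess for a name it has never seen, which a pure lookup cannot do.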

The code must not simply be a search algorithm that looks up the dataset for hits and cannot produce any guess when there are none (e.g. if the name 'john' does not exist in the dataset but the user input is 'john', a pure lookup could not guess that 'john' sounds like an "english" name).

The code should be written either in PHP 7.x, or in a way that it can be called from a PHP script (e.g. as a Perl or Python script).
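
If the solution is a Python script, one simple way to make it callable from PHP is a command-line wrapper that prints JSON, which PHP could run with `shell_exec` (with the argument passed through `escapeshellarg`) and parse with `json_decode`. The sketch below only shows that interface. `guess_ethnicity` is a hypothetical placeholder that returns a fixed answer, and the script name `guess.py` is an assumption.

```python
import json
import sys


def guess_ethnicity(name):
    """Hypothetical placeholder: a trained model would be called here instead."""
    return "english", 0.5


def format_response(name):
    """Build the JSON string that the calling PHP script would decode."""
    label, confidence = guess_ethnicity(name)
    return json.dumps(
        {"name": name, "ethnicity": label, "confidence": round(confidence, 4)}
    )


def main(argv):
    if len(argv) < 2:
        print(json.dumps({"error": "usage: guess.py <name>"}))
        return 1
    # Join all arguments so that "john smith" works with or without quoting.
    print(format_response(" ".join(argv[1:])))
    return 0


# Only act as a command-line tool when a name was actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Printing a single JSON object keeps the PHP side independent of how the Python side computes its guess.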

In your bid, please tell me what kind of method you would use.

Skills: Algorithm, Artificial Intelligence, Machine Learning, Pattern Matching


About the Employer:
(618 reviews) Turku, Thailand

Project ID: #17753084



Hello, I have worked with NLTK problems before, so I believe the job won't be a problem for me. Please check my profile and feel free to ask me anything! I would use a neural network as the predictor. Regards, Žiga

Bid: 222
(7 reviews)

19 freelancers are bidding on average $209 for this job


Very nice job. Professional Algorithm, Artificial Intelligence, Machine Learning, Pattern Matching expert. Best result in time. I read your description very carefully. I am very interesting for you ...

Bid: 250
(58 reviews)

I have good hands-on experience with advanced R, Python, BI tools and technologies, AI and Big Data. I have good knowledge of DL/ML algorithms and have also developed dashboards and web applications using flask/ ...

Bid: 250
(20 reviews)

Hi, I am a very experienced statistician, data scientist and academic writer. I have completed several PhD-level thesis projects involving advanced statistical analysis of data. I have worked with data from several comp ...

Bid: 250
(18 reviews)

Hi there! I'm very interested in your project. I am a full-time developer and can work more than 50 hours a week. I am good at PHP and I'm a good software engineer. I have good experience in optimizing and ...

Bid: 200
(27 reviews)

Hello, greetings of the day! Your project attracted my attention at first glance because I have rich experience in machine learning and Python programming. I have 5+ years of experience in data science ...

Bid: 155
(34 reviews)

Feel free to contact me for String [login to view URL] me message to discuss further details. We provide the comments, images, videos, demos and live sessions in order to help the [login to view URL] payment only after ...

Bid: 150
(13 reviews)

I will be using Python and a long short-term memory (LSTM) network to learn character sequences. I am a data scientist and am proficient at implementing machine learning and deep learning models in both R and Python. I ...

Bid: 250
(16 reviews)

We have 11+ years of experience in software development. We have developed 400+ projects and research papers in the fields of Machine Learning, Artificial Intelligence and Image processing (GIS), Network, SEO based W ...

Bid: 500
(10 reviews)

Hi sir, how are you doing? I have good experience in this field and I can do your work in the best possible way. Kindly text me so that we can discuss the work in more detail. Thanks.

Bid: 155
(3 reviews)

Hello sir. I'm excited about your project because I have rich experience in string classification programming. I've developed many projects similar to yours and have excellent skills. If you award me, I'll provide ...

Bid: 155
(6 reviews)

Hello! Your problem can easily be solved with an artificial neural network, giving the strings as inputs and the ethnicity as labels. This is how I would solve your problem, and I will gladly help you with it, so ...

Bid: 160
(3 reviews)

Hi sir, I am a computer engineer as well as a certified LabVIEW developer, so I am interested in doing this in LabVIEW. In general I am interested in this type of work, so if you accept, we can cooperate.

Bid: 200
(1 review)

I WOULD TAKE 4 DAYS TO GIVE YOU THE DESIRED RESULT. As a machine learner/data miner, I would use every technique of supervised learning to train on your data, get the best algorithm for predicting the names and output ...

Bid: 233
(1 review)

I have completed my Bachelor's degree in Engineering in Electrical and Electronics. I have 3 years of work experience. I have worked with Infosys for a year and with various startups for two years. Because of my experi ...

Bid: 222
(0 reviews)

Bid: 222
(0 reviews)

Wow! Good challenge! I love it, so I will start on it now and, once it is complete, I will ask you for a dataset to evaluate [login to view URL] I will use an NN, but more important than that, I'm thinking about the inputs! It ...

Bid: 133
(0 reviews)

Hi, I can make it successfully. Ready to start to work right away to complete it asap. Thank you...

Bid: 155
(0 reviews)

Hello, my name is Alexey. I am a Python expert, although there are only architectural and design works in my profile. I can make this program for you, but unfortunately it will be strongly based on the database yo ...

Bid: 111
(0 reviews)