Data Analysis

  • Durum: Closed
  • Ödül: $100
  • Alınan Girdiler: 25
  • Kazanan: pshyvw

Yarışma Özeti

Taking the attached CSV file (or any other CSV in similar format) can you provide a solution that accepts a percentage parameter and have it available in executable fashion so that I can run it in the future.

The solution needs to do the following:

1. Delete all rows where COUNTRY = ANTEPOST

2. Delete all rows where FULL_DESCRIPTION contains 'Place Market' or 'Odds' or 'Forecast' or 'Specials' or 'Match' or 'Distance' or 'Without'

3. Sort by COUNTRY (column D), EVENT_ID, IN_PLAY DESC, LATEST_TAKEN

4. For each EVENT_ID do the following:

a. Delete all rows where LATEST_TAKEN is 2 or more minutes before the SCHEDULED_OFF (i.e. if SCHEDULED_OFF is 31/03/2014 13:30 then any times before 31/03/2014 13:29)
b. Determine the (first if more than one) SELECTION with the lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Arab Spring with ODDS of 1.7 in the race whose EVENT_ID is 113511428). This will be called the FAVOURITE.
c. Delete all other rows where IN_PLAY = PE
d. Determine the (first) SELECTION with the (same or) next lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Mr Greenspan in the race whose EVENT_ID is 113511428)
e. If there is less than a 5% difference (this is the supplied parameter) between the two ODDS for 4b and 4d, then delete all other rows apart from 4b and 4d and exit (i.e. move onto next EVENT_ID)
f. Delete all rows for the EVENT_ID where SELECTION FAVOURITE
g. Find the lowest value ODDS where IN_PLAY = IP. This will be called LOWEST_IP.
h. Delete all other rows where IN_PLAY = IP

5. Save a separate cut down version of the CSV after the above processing has taken place (i.e. each EVENT_ID should have 2 rows - one with a PE and one with an IP or two with PE if we exited at 4e above).

6. Split the cut down version of the CSV into two further files extracting the rows where EVENT = TO BE PLACED into a file of their own and saving the remaining rows in another file of their own.


03.07.2014 14:55:00 (UK)____AMENDED INSTRUCTIONS AS FOLLOWS (after swapping 4c and 4d points above and making further small tweaks)

4. For each EVENT_ID do the following:

a. Delete all rows where LATEST_TAKEN is 2 or more minutes before the SCHEDULED_OFF (i.e. if SCHEDULED_OFF is 31/03/2014 13:30 then any times before 31/03/2014 13:29)
b. Determine the (first if more than one) SELECTION with the lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Arab Spring with ODDS of 1.7 in the race whose EVENT_ID is 113511428). This will be called the FAVOURITE.
c. Determine the (first) SELECTION with the (same or) next lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Mr Greenspan in the race whose EVENT_ID is 113511428). This will be called the 2ND_FAVOURITE.
d. Delete all other rows where IN_PLAY = PE or IN_PLAY = NI (i.e. where EVENT_ID = 113511428)
e. If there is less than a 5% difference (this is the supplied parameter) between the two ODDS for 4b (1.7) and 4c (3.8), then delete all other rows apart from 4b and 4c and exit (i.e. move onto next EVENT_ID)
f. Sort by COUNTRY (column D), EVENT_ID, SELECTION and delete all rows for the EVENT_ID where SELECTION is NOT equal to FAVOURITE (determined in 4b) but keep the row where 2ND_FAVOURITE was determined
g. If COUNTRY is NOT equal to 'GB' or NOT equal to 'IRE', sort by COUNTRY (column D), EVENT_ID, ODDS and find the lowest value ODDS where IN_PLAY = IP. This will be called LOWEST_IP.
h. If COUNTRY is NOT equal to 'GB' or NOT equal to 'IRE', delete all other rows where IN_PLAY = IP (i.e. where EVENT_ID = 113511428)

5. Save a separate cut down version of the CSV after the above processing has taken place (i.e. each EVENT_ID should have 3 rows - two with a PE and one with an IP or two with PE if we exited at 4e above or COUNTRY is NOT equal to 'GB' or 'IRE').

6. Split the cut down version of the CSV into two further files extracting the rows where EVENT = TO BE PLACED into a file of their own and saving the remaining rows in another file of their own.

Aranan Beceriler

İşveren Geribildirimi

“Delighted with the end result. As the rating suggests, the quality, communication, expertise and professionalism was top notch. I\'m already thinking of the next task to give to this freelancer.”

Profil Görüntüsü Collie33, Ireland.

Genel Açıklama Panosu

  • pshyvw
    pshyvw
    • 9 yıl önce

    FYI. To see if there is a real need to sort based on multiple fields(could be time consuming for large datasets), I ran some checks. For a given EVENT_ID, every SELECTION belongs to only one COUNTRY. Multi-field sorting at 4f & 4g is not needed in my opinion(unnecessary performance hit), if we can get the records based on the other filters to arrive at that step.

    • 9 yıl önce
    1. FixTheTask
      FixTheTask
      • 9 yıl önce

      And the part appears in my proposed image as well. it shows how i made it. it does not take more than some seconds to sort them.

      • 9 yıl önce
    2. tosha5252
      tosha5252
      • 9 yıl önce

      If it does not take more than some seconds, you haven't done it correctly

      • 9 yıl önce
  • FixTheTask
    FixTheTask
    • 9 yıl önce

    I ran some checks. For a given EVENT_ID, every SELECTION belongs to only one COUNTRY.

    • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    Incorporated all changes. New findings are that for each EVENT_ID, output csv file has: (a) 3 records with the third rec for IN_PLAY=IP only when COUNTRY is "GB" or "IRE". OR (b) 2 records with IN_PLAY= PE or NI (c) Interestingly, has 161 entries with only one record(FAVORITE). Found that these events have very few records 'during the last minute' (sometimes only 1 record during the last minute as in say EVENT_ID=113530124) and even if more than 1 record in last minute they have only one SELECTION. For this third category, we get only FAVORITE; there is no SECOND_FAVORITE as per modified rule 4c

    • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    Oops: Message got posted when I hit Enter. Please ignore previous message

    • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    Incorporated all changes. New findings:

    • 9 yıl önce
  • haankrishan
    haankrishan
    • 9 yıl önce

    sir had u selected the winner

    • 9 yıl önce
  • Collie33
    Yarışma Sahibi
    • 9 yıl önce

    Folks, just so you know you're on the right track I've saved a CSV/XLSX file for most of the steps in the instructions (for EVENT_ID = 113511428).

    See files available from https://www.dropbox.com/sh/hh4dj7dtuhlz2r1/AAAEUCYMsV6L5msf8wLt8L36a.

    • 9 yıl önce
  • Collie33
    Yarışma Sahibi
    • 9 yıl önce

    Folks, I've just realised that only UK and Ireland have in play markets so 4d now becomes:

    d. Delete all other rows where IN_PLAY = PE or IN_PLAY = NI (i.e. where EVENT_ID = 113511428)

    g and h will not be applicable when COUNTRY is NOT equal to 'GB' or NOT egual to 'IRE'

    I'm updating the brief to reflect this and small typo in point e which I'm also updating.

    • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    4g. I believe the objective is to get the 'lowest ODDS' for a given EVENT_ID and IN_PLAY=IP? Was wondering what the sorting by COUNTRY, EVENT_ID and ODDS would result in. In my opinion, EVENT_ID is redundant at 4g as it is the same for all records for consideration at this stage. Further, we have deleted all records other than for FAVORITE SELECTION. A SELECTION belongs to only one COUNTRY and as such COUNTRY too would be same for all records selected for this step. Can we simply sort based on ODDS for the relevant 'subsetted' dataset at that step? Data 'subsetting' is based on EVENT_ID & IN_PLAY=IP. While this may sound like an implementation issue, need confirmation whether sorting by COUNTRY can lead to a different record being selected (not necessarily the lowest ODDS for the dataset filtered only by EVENT_ID & IN_PLAY=IP).

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      Yes, the objective is to find the lowest ODDS for the FAVOURITE. There is no need to sort if you don't think it is necessary provided you retain the correct record/row. However, a SELECTION does exist for multiple EVENT_IDs (e.g. 113511429 and 113511429). A SELECTION could also race in the UK (where COUNTRY = GB) one day and then race in Ireland (where COUNTRY = IRE) a day or two later.

      • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    4c. If we look at the dataset just before executing this step, the (first) SELECTION with the 'next lowest ODDS' for EVENT_ID=113511428 during the minute before SCHEDULED OFF is also Arab Spring with 1.71. Since you were expecting Mr Greenspan with 3.8, I think 4c wording (initial part) should be "Determine the 'next' (other than FAVORITE) SELECTION with the lowest ODDS during the ...". Please confirm.

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      Yes you are correct - good observation - I will amend brief to match.

      • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    4b. Though not mentioned, it is implied from point 5 that the FAVORITE SELECTION is from IN_PLAY=PE. For EVENT_ID=100933526, we have all IN_PLAY=NI, for which we can determine 4b & 4c though not 4g. This is because 4g clearly specifies "where IN_PLAY=IP". Can we add the "where IN_PLAY=PE" to the specification at 4b & 4c? Or alternately, how do we handle cases like 100933526 where we get the first 2 NIs (not PEs) but no IP.

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      I've just realised that only UK and Ireland have in play markets so 4d now becomes:

      d. Delete all other rows where IN_PLAY = PE or IN_PLAY = NI (i.e. where EVENT_ID = 113511428)

      g and h will not be applicable when COUNTRY is NOT equal to 'GB' or NOT egual to 'IRE'

      • 9 yıl önce
  • momo2619
    momo2619
    • 9 yıl önce

    That contest is nearly 1 a day, and if the fastest should win, so one of those freelancer could finish it since yesterday.
    Shall I continue or stop here? just one answer. Continue or not.

    • 9 yıl önce
    1. momo2619
      momo2619
      • 9 yıl önce

      Just to update you, the application will be fast and simple.
      if more details needed please let me know. like the interface or anything else.

      • 9 yıl önce
    2. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      I haven't given any thought to interface but may have some comments when I see it.

      • 9 yıl önce
  • bstoinev
    bstoinev
    • 9 yıl önce

    How about new button in the ribbon that allows you to do this?

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      I don't understand your comment - ribbon?

      • 9 yıl önce
  • artej11
    artej11
    • 9 yıl önce

    Is mathematica script fine?( you will need mathematica to run it)

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      Can I download mathematica for free? If so, then it should be okay.

      • 9 yıl önce
  • artej11
    artej11
    • 9 yıl önce

    also what does f. Delete all rows for the EVENT_ID where SELECTION FAVOURITE mean?

    • 9 yıl önce
    1. artej11
      artej11
      • 9 yıl önce

      Is it just delete all favourites?

      • 9 yıl önce
    2. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      Typo in original instructions. See amended brief where f is now:

      f. Sort by COUNTRY (column D), EVENT_ID, SELECTION and delete all rows for the EVENT_ID where SELECTION is NOT equal to FAVOURITE (determined in 4b) but keep the row where 2ND_FAVOURITE was determined

      • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    If we go by the specified sequence at 4c all PE records would get deleted and there is no way we can get another PE at 4d - would always be PE and IP

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      See amended brief as 4c and 4d were in the wrong order previously.

      • 9 yıl önce
  • pshyvw
    pshyvw
    • 9 yıl önce

    In 4b, how do we interpret "during the minute before the SCHEDULED_OFF" - does this mean that SCHEDULED_OFF - LATEST_TAKEN should be 0 or 1 as 2 and more are deleted?

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      Perform some date/time comparison and only choose the minute before the off. So of the off time was 13:30 then the minute before would be 13:29. I'm only interested in rows that contain 13:29.

      • 9 yıl önce
  • mfvonh
    mfvonh
    • 9 yıl önce

    Would you accept a Python script?

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      Assuming I can easily run it on a Windows environment then yes.

      • 9 yıl önce
  • Collie33
    Yarışma Sahibi
    • 9 yıl önce

    Contest brief updated to reflect new amendments to instructions.

    • 9 yıl önce
  • Collie33
    Yarışma Sahibi
    • 9 yıl önce

    5. Save a separate cut down version of the CSV after the above processing has taken place (i.e. each EVENT_ID should have 3 rows - two with a PE and one with an IP or two with PE if we exited at 4e above).

    6. Split the cut down version of the CSV into two further files extracting the rows where EVENT = TO BE PLACED into a file of their own and saving the remaining rows in another file of their own.

    • 9 yıl önce
  • Collie33
    Yarışma Sahibi
    • 9 yıl önce

    c. Determine the (first) SELECTION with the (same or) next lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Mr Greenspan in the race whose EVENT_ID is 113511428). This will be called the 2ND_FAVOURITE.
    d. Delete all other rows where IN_PLAY = PE (i.e. where EVENT_ID = 113511428)
    e. If there is less than a 5% difference (this is the supplied parameter) between the two ODDS for 4b (1.7) and 4c (3.8), then delete all other rows apart from 4b and 4d and exit (i.e. move onto next EVENT_ID)
    f. Sort by COUNTRY (column D), EVENT_ID, SELECTION and delete all rows for the EVENT_ID where SELECTION is NOT equal to FAVOURITE (determined in 4b) but keep the row where 2ND_FAVOURITE was determined
    g. Sort by COUNTRY (column D), EVENT_ID, ODDS and find the lowest value ODDS where IN_PLAY = IP. This will be called LOWEST_IP.
    h. Delete all other rows where IN_PLAY = IP (i.e. where EVENT_ID = 113511428)

    • 9 yıl önce
  • Collie33
    Yarışma Sahibi
    • 9 yıl önce

    Folks,

    I spotted an error in my instructions which may answer some of your questions. I've slightly amended the instructions which basically swap points 4c and 4d. There are some other tiny tweaks which should help explain the scenario. Amended instructions as follows (in separate messages due to 1000 character limit):

    4. For each EVENT_ID do the following:

    a. Delete all rows where LATEST_TAKEN is 2 or more minutes before the SCHEDULED_OFF (i.e. if SCHEDULED_OFF is 31/03/2014 13:30 then any times before 31/03/2014 13:29)
    b. Determine the (first if more than one) SELECTION with the lowest ODDS during the minute before the SCHEDULED_OFF (i.e. Arab Spring with ODDS of 1.7 in the race whose EVENT_ID is 113511428). This will be called the FAVOURITE.

    • 9 yıl önce
  • anujbatham
    anujbatham
    • 9 yıl önce

    wait for my entry

    • 9 yıl önce
  • ACMAuk
    ACMAuk
    • 9 yıl önce

    Stay tuned- Almost complete.........

    • 9 yıl önce
  • shenghuan
    shenghuan
    • 9 yıl önce

    1. Delete all rows where COUNTRY = ANTEPOST --- There were even no country named ANTEPOST ( or list of ANTEPOST provided).

    • 9 yıl önce
    1. shenghuan
      shenghuan
      • 9 yıl önce

      Thank you.

      • 9 yıl önce
    2. soniapathania
      soniapathania
      • 9 yıl önce

      hi..is these employers seeking data entryjobs real..because they are asking for registration fee.

      • 9 yıl önce
  • FixTheTask
    FixTheTask
    • 9 yıl önce

    Please wait for something

    • 9 yıl önce
  • momo2619
    momo2619
    • 9 yıl önce

    I am nearly finished the job itself. But if you need it in an executable,i am creating currently a c# application to do the task,

    • 9 yıl önce
  • ACMAuk
    ACMAuk
    • 9 yıl önce

    Please wait for something VERY EXCELLENT!

    • 9 yıl önce
  • shenghuan
    shenghuan
    • 9 yıl önce

    Does the data set only contain IN_PLAY = IP or IN_PLAY = PE records?

    • 9 yıl önce
    1. Collie33
      Yarışma Sahibi
      • 9 yıl önce

      For each EVENT_ID you should have one row where IN_PLAY = IP and one row where IN_PLAY = PE.

      The row where IN_PLAY = PE should be a minute before the scheduled off time and the row where IN_PLAY = IP should be the lowest odds figure for the where SELECTION_ID is the same as that for the PE row.

      • 9 yıl önce
  • shenghuan
    shenghuan
    • 9 yıl önce

    Your description is not clear. It does not mention if IN_PLAY = other should be kept.

    • 9 yıl önce
  • shenghuan
    shenghuan
    • 9 yıl önce

    My feeling is freelancer is only for logo design. So only images are allowed to be submitted.\

    • 9 yıl önce
  • shenghuan
    shenghuan
    • 9 yıl önce

    It only allow to submit figures. I change the .csv to .gif or ,png, but cannot submit.

    • 9 yıl önce
  • shenghuan
    shenghuan
    • 9 yıl önce

    An error occurred, please try again later.

    • 9 yıl önce

Daha fazla yorum göster

Yarışmalara nasıl başlanır

  • Projenizi ilan edin

    Yarışmanızı İlan Edin Hızlı ve kolay

  • Tonlarca girdi alın

    Tonlarca Girdi Alın Bütün dünyadan

  • En iyi girdiyi seçin

    En iyi girdiyi seçin Dosyaları indirin - Kolay!

Şimdi bir Yarışma İlan Et ya da Bugün Bize Katılın!