You are provided access to hand-scored essays so that you can build, train, and test scoring engines against a wide field of competitors. Your success depends upon how closely the scores you deliver match those of expert human graders (one common way to quantify that agreement is sketched after the goals below). Our goals are to:
· Compare the efficacy and cost of automated scoring with those of human grading.
· Reveal product capabilities.
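Since success is measured by agreement with expert human graders, it helps to make that agreement computable. The competition text does not name an official metric, so what follows is a minimal sketch of quadratic weighted kappa, a statistic commonly used to compare ordinal scores; treat the choice of metric itself as an assumption.

    import numpy as np

    def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
        # Agreement between two lists of integer scores:
        # 1.0 = perfect agreement, 0.0 = chance-level agreement.
        n = max_rating - min_rating + 1
        observed = np.zeros((n, n))
        for a, b in zip(rater_a, rater_b):
            observed[a - min_rating, b - min_rating] += 1
        # Expected counts under independence, scaled to the same total.
        expected = np.outer(observed.sum(axis=1),
                            observed.sum(axis=0)) / observed.sum()
        # Quadratic penalty: grows with the squared distance between scores.
        weights = ((np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2
                   ) / (n - 1) ** 2
        return 1.0 - (weights * observed).sum() / (weights * expected).sum()

The quadratic weighting penalizes a two-point disagreement four times as heavily as a one-point disagreement, which rewards engines that miss narrowly over those that miss badly.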
The graded essays are selected according to specific data characteristics. Essays range from approximately 150 to 550 words in length, and some are more dependent upon source materials than others. This range of essay types is provided so that we can better understand the strengths of your solution. Our intent is to showcase quality and reliability, based on how well you can match expert human graders on each essay.
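As a quick sanity check on any data release, the word-length range is easy to verify. A minimal sketch, assuming the essays arrive in a tab-separated file; the file name and column name below are hypothetical, not the official schema:

    import csv

    # "training_set.tsv" and the "essay" column are assumed names;
    # adjust to the actual data release.
    with open("training_set.tsv", encoding="ascii") as f:
        lengths = [len(row["essay"].split())
                   for row in csv.DictReader(f, delimiter="\t")]
    print(min(lengths), max(lengths))  # expect roughly the 150-550 word range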
Training data are provided for each essay prompt. The number of training essays varies by prompt; the smallest training set, for example, contains 1,190 essays, randomly selected from a total of 1,982. The data contain ASCII-formatted text for each essay, followed by one or more human scores and, where necessary, a final resolved human score. Where relevant, more than one human score is provided so that you may evaluate the reliability of the human scorers; keep in mind, however, that you will be predicting the resolved score. Also, please note that most essays are scored using a holistic scoring rubric, while one data set uses a trait scoring rubric. This variability is intended to test the limits of your scoring engine's capabilities.
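A minimal sketch of reading that layout and checking rater reliability with the kappa function sketched above; the file and column names are assumptions, not the official schema:

    import csv

    rater1, rater2, resolved = [], [], []
    with open("training_set.tsv", encoding="ascii") as f:  # assumed file name
        for row in csv.DictReader(f, delimiter="\t"):
            rater1.append(int(row["rater1_score"]))        # assumed column names
            rater2.append(int(row["rater2_score"]))
            resolved.append(int(row["resolved_score"]))

    # How well do the human raters agree with each other? This bounds what a
    # reasonable engine can hope to achieve. The engine itself should be fit
    # against `resolved`, since that is the column being predicted.
    lo, hi = min(rater1 + rater2), max(rater1 + rater2)
    print(quadratic_weighted_kappa(rater1, rater2, lo, hi))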
Following a period of 3 months to build and/or train your engine, you will be provided with test data containing new essays randomly selected for blind evaluation. In this set, the rater and resolved-score columns will be blank. You will be asked to fill the resolved-score column with your engine's prediction for each essay and then submit your completed data set on this site.
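A minimal sketch of producing that submission, assuming the same tab-separated layout as above; predict() stands in for whatever scoring function your engine exposes and is purely hypothetical:

    import csv

    def predict(essay_text):
        # Stub: replace with your trained engine's scoring function.
        return 3

    with open("test_set.tsv", encoding="ascii") as fin, \
         open("submission.tsv", "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames,
                                delimiter="\t")
        writer.writeheader()
        for row in reader:
            row["resolved_score"] = predict(row["essay"])  # fill the blank column
            writer.writerow(row)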
Also, please note that we intend to stage follow-on ASAP phases in the months ahead:
· Phase 1: Demonstration for long-form constructed response (essays);
· Phase 2: Demonstration for short-form constructed response (short answers);
· Phase 3: Demonstration for symbolic mathematical/logic reasoning (charts/graphs).
In every phase, we seek to drive innovation toward new solutions for automated student assessment. We hope that you will enjoy this process. May the best model win!