To establish the validity and reliability of the essay test I am constructing, I first need to consider the definitions of validity and reliability.
Validation is the process of accumulating evidence that supports the appropriateness of the inferences made from student responses to the test. To make sure that the essay test I give to students is valid, there are two things to do:
First, I clearly state the purpose and objectives of the test. For example, the objectives of my essay test are:
1/ students write an organized paragraph;
2/ students show logical development of ideas;
3/ students use correct grammar and mechanics;
4/ students demonstrate style and quality of expression.
By stating those objectives, I can ensure content validity, because the test clearly defines the achievement being measured.
Next, I develop scoring criteria that address each objective. If one of the objectives is not represented in the score categories, then the rubric does not provide the evidence needed to examine that objective. If some of the scoring criteria are unrelated to the objectives, then the appropriateness of both the assessment and the rubric is in question. A well-aligned scoring rubric also supports criterion validity, since I will know precisely the extent to which my students have actually met the test criteria. Here is my scoring rubric:
| Scoring criteria |
| --- |
| Essay has an effective introductory paragraph |
| Topic sentence/thesis statement is stated |
| Essay has apparent body paragraphs |
| Essay has a satisfactory concluding paragraph |
| Ideas are concrete and well developed |
| Supporting details are relevant and sufficient |
| Essay reflects complete thought (cohesiveness) |
| Essay demonstrates syntactic variety and rhetorical fluency |
| Essay uses correct English writing conventions (punctuation and spelling) |
| Essay uses a wide range of vocabulary |
| Essay shows good register and is concise |
| Essay is written in a neat and legible format |
Although the criteria in the rubric may seem overly detailed, all of them relate to the four objectives mentioned above. Using a detailed scoring rubric strengthens the validity of my assessment. Indeed, good grading practices can also increase the reliability of essay tests. And, as far as I am concerned, a valid assessment is by necessity reliable.
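One way to see the alignment between rubric and objectives is to record it explicitly. The following is a minimal sketch, not my actual grading procedure: the criterion labels, the 1-4 scale, and the grouping of criteria under objectives are all illustrative assumptions, but they show how each objective ends up backed by specific scoring evidence.

```python
# Hypothetical mapping of rubric criteria to the four objectives;
# each criterion is rated on an assumed 1-4 scale.
RUBRIC = {
    "organization": ["introductory paragraph", "thesis statement",
                     "body paragraphs", "concluding paragraph"],
    "idea development": ["well-developed ideas",
                         "relevant supporting details", "cohesiveness"],
    "grammar and mechanics": ["syntactic variety", "writing conventions"],
    "style and expression": ["vocabulary range", "register and concision",
                             "legible format"],
}

def score_essay(ratings):
    """ratings: {criterion: 1-4}. Returns per-objective subtotals and a total,
    so every objective's coverage is visible in the final score."""
    per_objective = {obj: sum(ratings[c] for c in criteria)
                     for obj, criteria in RUBRIC.items()}
    return per_objective, sum(per_objective.values())

# Example: an essay rated 3 on every criterion.
per_obj, total = score_essay({c: 3 for cs in RUBRIC.values() for c in cs})
```

Because every criterion is attached to exactly one objective, an objective with no criteria (or a criterion with no objective) would surface immediately as a gap in the mapping.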
Reliability refers to the consistency of assessment scores. If my test is reliable, a student will get the same score regardless of when he or she completed the test, when the response was scored, and who scored the response. Two forms of reliability in classroom assessment and rubric development involve rater (or scorer) reliability: the consistency of scores assigned by two independent raters (inter-rater reliability) and the consistency of scores assigned by the same rater at different points in time (intra-rater reliability).

Sometimes I check inter-rater reliability by asking my Teaching Assistant (TA) to score my students' essays using the scoring rubric I prepared, or by grading papers together with the TA for clarity of evaluation and time efficiency. This shows whether there is a great discrepancy between the TA's scoring and mine. If our scores do not diverge greatly, I can say that my test is reliable. Other times, I check intra-rater reliability by scoring the same work again several weeks later. If the earlier scores do not differ greatly from the second scoring, then my evaluation is reliable.
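The discrepancy check described above can be made concrete with simple arithmetic. This is a sketch under assumptions: the essay totals below are invented for illustration, and exact agreement plus mean absolute difference are just two easy summaries (more formal statistics, such as a correlation or Cohen's kappa, could replace them).

```python
# Compare two raters' scores on the same set of essays and report
# (a) the proportion of exact agreements and (b) the average gap.
def rater_agreement(scores_a, scores_b):
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    exact = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    mean_abs_diff = sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / n
    return exact, mean_abs_diff

# Hypothetical rubric totals for ten essays, mine versus the TA's:
mine = [34, 28, 41, 22, 36, 30, 25, 38, 29, 33]
ta   = [33, 28, 40, 24, 36, 31, 25, 37, 29, 34]
exact, diff = rater_agreement(mine, ta)
```

The same function serves the intra-rater check: pass in my first-round and second-round scores for the same papers instead of two raters' scores.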