How to Obtain the Validity and Reliability of an Essay Test?

Relax... there's a test

To obtain the validity and reliability of the essay test I am constructing, I have to start from the definitions of validation and reliability.

Validation is the process of accumulating evidence that supports the appropriateness of the inferences made from student responses to the test. To make sure that the essay test I give to students is valid, there are two things to do:

First, I clearly state the purpose and objectives of the test. For example, the objectives of my essay test are:

1/ students write an organized paragraph

2/ students show logical development of ideas

3/ students use correct grammar and mechanics

4/ students demonstrate style and quality of expression.


By writing down those objectives, I can ensure content validity, because the test clearly defines the achievement that I measure.

Next, I develop scoring criteria that address each objective. If one of the objectives is not represented in the score categories, then the rubric does not provide the evidence needed to examine that objective. If some of the scoring criteria are not related to the objectives, then the appropriateness of the assessment and the rubric is in question. The scoring rubric meets criterion validity, since I will know precisely the extent to which my students have actually met the test criteria. Here is my scoring rubric:





Essay has an effective introductory paragraph
Topic sentence/thesis statement is stated
Essay has apparent body paragraphs
Essay has a satisfactory concluding paragraph
Ideas are concrete and well developed
Supporting details are relevant and sufficient
Essay reflects complete thought (cohesiveness)
Essay demonstrates syntactic variety and rhetorical fluency
Essay uses correct English writing conventions (punctuation and spelling)
Essay uses a wide range of vocabulary
Essay shows good register and is concise
Essay is written in neat and legible format


Although the criteria in the rubric may seem too detailed, all of them are related to the four objectives mentioned above. Using a detailed scoring rubric helps ensure the validity of my assessment. Indeed, good grading practices can also increase the reliability of essay tests. Also, as far as I am concerned, a valid assessment is by necessity reliable.
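The two checks described above (every objective is represented in the rubric, and every criterion relates to an objective) can be sketched in a few lines of Python. This is only an illustration: the assignment of each criterion to an objective below is my own guess for the sake of the example, and the criterion names are shortened.

```python
# The four objectives of the essay test, numbered as in the list above.
OBJECTIVES = {
    1: "organized paragraph",
    2: "logical development of ideas",
    3: "correct grammar and mechanics",
    4: "style and quality of expression",
}

# ASSUMPTION: this criterion-to-objective mapping is an illustrative guess,
# not a mapping stated in the rubric itself.
CRITERION_TO_OBJECTIVE = {
    "effective introductory paragraph": 1,
    "topic sentence/thesis statement": 1,
    "apparent body paragraphs": 1,
    "satisfactory concluding paragraph": 1,
    "concrete, well-developed ideas": 2,
    "relevant, sufficient supporting details": 2,
    "complete thought (cohesiveness)": 2,
    "syntactic variety and rhetorical fluency": 4,
    "correct writing conventions": 3,
    "wide range of vocabulary": 4,
    "good register and concision": 4,
    "neat and legible format": 3,
}


def check_rubric(objectives, mapping):
    """Return (objectives with no criterion, criteria tied to no known objective)."""
    covered = set(mapping.values())
    uncovered = sorted(o for o in objectives if o not in covered)
    unrelated = sorted(c for c, o in mapping.items() if o not in objectives)
    return uncovered, unrelated


uncovered, unrelated = check_rubric(OBJECTIVES, CRITERION_TO_OBJECTIVE)
print("Objectives without a criterion:", uncovered)
print("Criteria without an objective:", unrelated)
```

If either list comes back non-empty, the rubric or the objectives need another look before the test is used.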


Reliability refers to the consistency of assessment scores. If my test is reliable, a student will get the same score regardless of when he/she completed the test, when the response was scored, and who scored the response. Two forms of reliability in classroom assessment and in rubric development involve rater (or scorer) reliability. Rater reliability generally refers to the consistency of scores assigned by two independent raters (interrater reliability) and of scores assigned by the same rater at different points in time (intrarater reliability).

Sometimes I check interrater reliability by asking my Teaching Assistant (TA) to score my students’ essays using the scoring rubric I prepared previously, or by grading papers together with the TA for clarity of evaluation and time efficiency. This shows whether there is a great discrepancy between the TA’s scoring and mine. If our scores do not show a great discrepancy, I can say that my test is reliable. Other times, I check intrarater reliability by scoring the essays again several weeks later. If the second scoring does not show big discrepancies from the first, then my evaluation is reliable.
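One simple way to put a number on "great discrepancy" is to compare the two sets of scores essay by essay. The sketch below is a minimal illustration, not a standard procedure: the function name `rater_agreement`, the `tolerance` of 1 point, and the sample scores are all made up for the example.

```python
def rater_agreement(scores_a, scores_b, tolerance=1):
    """Summarize how closely two sets of scores for the same essays agree."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    # Absolute score difference for each essay.
    diffs = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,  # share of identical scores
        "within_tolerance": sum(d <= tolerance for d in diffs) / n,
        "mean_abs_diff": sum(diffs) / n,
        "max_diff": max(diffs),
    }


# Hypothetical rubric totals for five essays (not real student data).
teacher_scores = [10, 8, 9, 7, 11]
ta_scores = [10, 9, 9, 5, 11]

summary = rater_agreement(teacher_scores, ta_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same function works for intrarater reliability: pass the first scoring and the later rescoring of the same essays instead of the teacher's and the TA's scores. How large a difference counts as a great discrepancy is still a judgment call for the teacher.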

Steps in Constructing My Language Test


I’d like to share some steps in constructing a language test. They are based on my experience teaching Diploma III at STAN Jakarta. Some strategies are taken from Brown’s principles, with certain modifications. Here they are:

First of all, I determine the objective of the test beforehand. For example, when I give a reading test, its purpose is to test the students’ reading skills, which can be seen from their ability to deal with the types of reading questions.

Second, I list the test specifications as the outline of the test. These include the time allocation, the skills tested, the item types, and the tasks.

Third, I create the test tasks. The test tasks must be in line with the objectives stated earlier. For example, I choose the multiple choice format for the Reading Test for the sake of practicality. I make a first draft of the test items and try them out in classroom teaching before administering them in the actual test. For the final version of the actual Reading Test, I make parallel forms of the sample test.

Finally, I make the scoring criteria and feedback. For example, Reading Test items are each worth 1 point, so 100% correct answers will be worth 20 points. Then each student’s score is converted to a letter grade of A, B, C, D, or E, with the corresponding comments Excellent, Very Good, Good, OK, and Try to be More Careful Next Time. I also have to give each student brief information on which reading skill he/she has to improve.


To make this concrete, the following is a summary example of my test construction for my students:
Type of test    :  Reading Test

Objectives      :  1/ Students recognize the main idea
                   2/ Students know the details
                   3/ Students know the meaning from context

Specifications  :  – 30 minutes, multiple choice format, 20 total items
                   – Tasks: main idea, details, vocabulary


Sample Items

Directions        :  This test contains several passages, each followed by a number of questions. Read the passages, and for each question, choose the best answer—A, B, C, or D—based on what is stated in the passage or on what can be inferred from the passage.


(Passage 1)

When the people of Boston threw the tea from the English ships into the ocean, few of them, if any, realized that this was the start of the War of Independence. This war was a long and bloody struggle exacerbated by politicians out for personal gain.



  1. What is the best title for this passage?
    A. The Boston Tea Party
    B. The War of Independence
    C. American Politicians
    D. English Ships at War


  2. The word exacerbated in line 3 can be best replaced by ….
    A. aggravated
    B. exaggerated
    C. stopped
    D. continued


  3. What is the impetus of the War of Independence?
    A. The Boston politicians started a fight with Englishmen
    B. The people of Boston dumped English tea into the sea
    C. There was a lack of tea in Boston
    D. People did not believe in the deal made by Boston politicians


Scoring Criteria: – Each item is worth 1 point; the maximum is 20 correct answers (= 100%)

– Grades for reading performance: 85-100% = A; 70-84% = B; 60-69% = C; 50-59% = D; below 50% = E

– Give comments on which skills need to be improved
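The scoring criteria above can be turned into a small grading helper. This is a minimal sketch under two assumptions: the comment attached to each letter follows the order in which the comments were listed earlier (Excellent for A down to Try to be More Careful Next Time for E), and grade E covers every score under 50%. The names `GRADE_BANDS` and `grade` are invented for the example.

```python
TOTAL_ITEMS = 20  # each item is worth 1 point

# (minimum percentage, letter grade, comment), highest band first.
# ASSUMPTION: comments are paired with letters in the order they are listed.
GRADE_BANDS = [
    (85, "A", "Excellent"),
    (70, "B", "Very Good"),
    (60, "C", "Good"),
    (50, "D", "OK"),
    (0, "E", "Try to be More Careful Next Time"),
]


def grade(correct, total=TOTAL_ITEMS):
    """Convert a raw number of correct answers into (percent, letter, comment)."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and the number of items")
    percent = 100 * correct / total
    # Return the first (highest) band whose minimum the score reaches.
    for minimum, letter, comment in GRADE_BANDS:
        if percent >= minimum:
            return percent, letter, comment


for raw in (17, 14, 13, 10, 9):
    percent, letter, comment = grade(raw)
    print(f"{raw}/{TOTAL_ITEMS} = {percent:.0f}% -> {letter} ({comment})")
```

The comment on which reading skill to improve is not covered here, because it depends on knowing which skill each item tests.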


Hopefully this post will give you some insight into language assessment. See ya! *=*