Reliability and Validity in Assessment

Reliability and validity are important concepts in assessment; however, the demands for reliability and validity in SLO assessment are not usually as rigorous as in research. You should take steps to improve the reliability and validity of your assessment, but do not become so focused on redeveloping your assessment instruments that you cannot draw conclusions from your results or use them to improve student learning. Instead, be mindful of your assessment’s limitations and go forward with implementing improvement plans.

Reliability

Reliability is the extent to which a measurement tool gives consistent results. The tool does not have to be right, just consistent. Measurements of student learning throughout the program should be relatively stable and should not depend on who conducts the assessment. Issues with reliability can occur when multiple people rate student work, even with a common rubric, or when different assignments across courses or course sections are used to assess program learning outcomes.

Reliability can be improved by:

More clearly defining SLOs
Agreeing on how SLO achievement will be measured
Providing guidelines for constructing assignments that will be used to measure SLO achievement
Developing better rubrics. Define scoring categories with clear differences in advance, and revisit them often while scoring to ensure consistency.
Creating or gathering examples that illustrate differences in scoring criteria, and referring to them while scoring
Testing rubrics and calculating an interrater reliability coefficient. Interrater reliability = number of agreements / number of possible agreements. Values > 0.8 are acceptable.
Conducting norming sessions to help raters use rubrics more consistently. Recalculate interrater reliability until consistency is achieved.
Increasing the number of questions on a multiple choice exam that address the same learning outcome
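
The interrater reliability calculation described above (number of agreements divided by number of possible agreements) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name, the 1-4 rubric scale, and the sample scores are all hypothetical, and it assumes two raters who each scored the same pieces of student work in the same order.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Return number of agreements / number of possible agreements."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same set of student work.")
    if not rater_a:
        raise ValueError("At least one scored item is required.")
    # An agreement is counted only when both raters gave the identical score.
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)


# Hypothetical rubric scores (1-4) from two raters on ten student papers.
scores_rater_a = [3, 4, 2, 3, 1, 4, 3, 2, 4, 3]
scores_rater_b = [3, 4, 2, 2, 1, 4, 3, 2, 4, 3]

reliability = percent_agreement(scores_rater_a, scores_rater_b)
print(f"Interrater reliability: {reliability:.2f}")
# The 0.8 threshold comes from the guideline above.
print("Acceptable" if reliability > 0.8 else "Consider another norming session")
```

Simple agreement of this kind does not correct for agreement that would occur by chance, so it may be best read as a quick check during rubric testing and norming rather than as a research-grade statistic.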

Validity

Validity is the extent to which a measurement tool measures what it is supposed to measure. More specifically, it refers to the extent to which inferences made from an assessment tool are appropriate, meaningful, and useful (American Psychological Association and the National Council on Measurement in Education). To be valid, a measurement must first be reliable.

Validity is often thought of as having different forms. Perhaps the most relevant to assessment is content validity, or the extent to which the content of the assessment instrument matches the SLOs. Content validity can be improved by:

Examining whether rubrics have extraneous content or whether important content is missing
Constructing a table of specifications prior to developing exams
Performing an item analysis of multiple choice questions
Constructing effective multiple choice questions using best practices (see below)
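
As a rough illustration of the item analysis mentioned above, the sketch below computes two commonly used statistics for each multiple choice question: a difficulty index (the proportion of students who answered correctly) and a simple discrimination index (the proportion correct in the higher-scoring half minus the proportion correct in the lower-scoring half). The choice of these two statistics, the function names, and the sample data are assumptions made for this example; the guidance above does not specify a particular method.

```python
from typing import List


def item_difficulty(responses: List[List[int]]) -> List[float]:
    """Proportion of students answering each item correctly (1 = correct, 0 = incorrect)."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


def item_discrimination(responses: List[List[int]]) -> List[float]:
    """Upper-half minus lower-half proportion correct for each item."""
    if len(responses) < 2:
        raise ValueError("At least two students are needed to form two groups.")
    # Rank students by total score, highest first, then split into halves.
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    n_items = len(responses[0])
    return [
        (sum(row[i] for row in upper) - sum(row[i] for row in lower)) / half
        for i in range(n_items)
    ]


# Hypothetical data: one row per student, one column per exam question.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
]

for number, (p, d) in enumerate(
    zip(item_difficulty(responses), item_discrimination(responses)), start=1
):
    print(f"Question {number}: difficulty = {p:.2f}, discrimination = {d:.2f}")
```

A question that nearly everyone answers correctly, or one that higher-scoring students miss as often as lower-scoring students, may deserve a second look against the SLO it is meant to measure.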

Haladyna, Downing, and Rodriguez (2002) provide a comprehensive set of multiple choice question writing guidelines based on evidence from the literature, which are aptly summarized with examples by the Center for Teaching at Vanderbilt University (Brame, 2013).

Writing Effective Multiple Choice Questions

The question’s stem should:

Be meaningful by itself
Contain only relevant material
Avoid being complex
Be a question or partial sentence that avoids the use of beginning or interior blanks
Avoid being negatively stated unless SLOs require it

Alternatives (answer choices) should be:

Plausible
Stated clearly and concisely
The same in content (have the same focus)
Mutually exclusive
Free of “none of the above” and “all of the above”
Presented in a logical order
Grammatically consistent with the stem
Parallel in form
Similar in length and language (e.g., don’t make the correct answer “too long to be wrong”)

Resources

Brame, C. (2013). Writing good multiple choice test questions. Vanderbilt University Center for Teaching.
DePaul University Center for Teaching & Learning. Methodology.
Florida Center for Instructional Technology. Classroom Assessment.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309–333.
Salkind, N. J. (2017). Tests & measurement for people who (think they) hate tests & measurement (3rd ed.). Thousand Oaks, CA: SAGE Publications. Chapters 3–4.
