Reliability and Validity in Assessment

Reliability and validity are important concepts in assessment. However, the demands for reliability and validity in SLO assessment are not usually as rigorous as in research. You should take steps to improve the reliability and validity of your assessment, but you should not become so focused on redeveloping your assessment instruments that you never draw conclusions from your results or use them to improve student learning. Instead, be mindful of your assessment’s limitations, but go forward with implementing improvement plans.


Reliability is the extent to which a measurement tool gives consistent results. The tool does not have to be right, just consistent. Measurements of student learning throughout the program should be relatively stable and should not depend on who conducts the assessment. Issues with reliability can occur when multiple people rate student work, even with a common rubric, or when different assignments across courses or course sections are used to assess program learning outcomes.

Reliability can be improved by:

  • More clearly defining SLOs
  • Agreeing on how SLO achievement will be measured
  • Providing guidelines for constructing assignments that will be used to measure SLO achievement
  • Developing better rubrics: define scoring categories with clear differences in advance, and revisit them often while scoring to ensure consistency
  • Creating or gathering examples that illustrate differences in scoring criteria, and referring to them while scoring
  • Testing rubrics and calculating an interrater reliability coefficient. Interrater reliability = number of agreements / number of possible agreements. Values above 0.8 are considered acceptable.
  • Conducting norming sessions to help raters use rubrics more consistently. Recalculate interrater reliability until consistency is achieved.
  • Increasing the number of questions on a multiple choice exam that address the same learning outcome
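
The interrater reliability calculation described in the list above (number of agreements divided by number of possible agreements) can be sketched in a few lines of Python. This is only an illustration: the function name percent_agreement and the sample rubric scores are hypothetical, and the 0.8 cutoff is the guideline stated above.

```python
def percent_agreement(rater_a, rater_b):
    """Return the share of student artifacts on which two raters gave the same score."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same set of artifacts.")
    if not rater_a:
        raise ValueError("At least one scored artifact is required.")
    # An "agreement" is an artifact that received identical scores from both raters.
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)


# Hypothetical rubric scores (1-4) from two raters on the same ten papers.
rater_a = [3, 4, 2, 3, 1, 4, 3, 2, 4, 3]
rater_b = [3, 4, 2, 2, 1, 4, 3, 2, 4, 4]

agreement = percent_agreement(rater_a, rater_b)
print(f"Interrater agreement: {agreement:.2f}")
if agreement > 0.8:
    print("Agreement is above the 0.8 guideline.")
else:
    print("Agreement is not above 0.8; consider another norming session.")
```

In this made-up example the raters agree on 8 of 10 papers, so the coefficient is 0.80 and does not exceed the 0.8 guideline. Note that simple percent agreement does not adjust for agreement that would occur by chance, which is usually tolerable given the less rigorous demands of SLO assessment.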


Validity is the extent to which a measurement tool measures what it is supposed to measure. More specifically, it refers to the extent to which inferences made from an assessment tool are appropriate, meaningful, and useful (American Psychological Association and the National Council on Measurement in Education). To be valid, a measurement must first be reliable.

Validity is often thought of as having different forms. Perhaps the most relevant to assessment is content validity, or the extent to which the content of the assessment instrument matches the SLOs. Content validity can be improved by:

  • Examining whether rubrics have extraneous content or whether important content is missing
  • Constructing a table of specifications prior to developing exams
  • Performing an item analysis of multiple choice questions
  • Constructing effective multiple choice questions using best practices (see below)
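
As an illustration of the item analysis mentioned in the list above, the sketch below computes two commonly used statistics for each multiple choice question: a difficulty index (the proportion of students answering correctly) and a discrimination index (the difference in that proportion between the highest and lowest scoring students). These particular statistics, the function name item_analysis, the 27% grouping fraction, and the sample data are all assumptions made for this example, not a method prescribed here.

```python
def item_analysis(responses, group_fraction=0.27):
    """Compute (difficulty, discrimination) for each exam item.

    responses: one list per student, with 1 = correct and 0 = incorrect per item.
    group_fraction: share of students placed in the upper and lower groups
                    (0.27 is a common convention, assumed here).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]

    results = []
    for i in range(n_items):
        # Difficulty: proportion of all students who answered item i correctly.
        difficulty = sum(student[i] for student in responses) / n_students
        # Discrimination: upper-group proportion minus lower-group proportion.
        p_upper = sum(student[i] for student in upper) / group_size
        p_lower = sum(student[i] for student in lower) / group_size
        results.append((difficulty, p_upper - p_lower))
    return results


# Hypothetical results for ten students on a four-question exam.
responses = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0],
    [1, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1],
]

for number, (difficulty, discrimination) in enumerate(item_analysis(responses), start=1):
    print(f"Question {number}: difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

In this made-up data set, the fourth question has a discrimination index of zero: high and low scorers answer it correctly at the same rate, which may signal that the question is not measuring the intended learning outcome and deserves a closer look.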

Haladyna, Downing, and Rodriguez (2002) provide a comprehensive set of multiple choice question writing guidelines based on evidence from the literature, which are aptly summarized with examples by the Center for Teaching at Vanderbilt University (Brame, 2013).

Writing Effective Multiple Choice Questions

The question’s stem should:

  • Be meaningful by itself
  • Contain only relevant material
  • Avoid being complex
  • Be a question or partial sentence that avoids the use of beginning or interior blanks
  • Avoid being negatively stated unless SLOs require it

Alternatives (answer choices) should be:

  • Plausible
  • Stated clearly and concisely
  • The same in content (have the same focus)
  • Mutually exclusive
  • Free of “none of the above” and “all of the above”
  • Presented in a logical order
  • Grammatically consistent with the stem
  • Parallel in form (e.g., don’t make the answer “too long to be wrong”)
  • Similar in length and language