OAERS

Assessment and Measurement

What is Assessment?

Assessment refers to a systematic process of collecting information from individuals in order to describe their status with respect to particular traits of interest. Within education and the social sciences, assessment is most often conducted using tests, exams, surveys, and inventories. For example, a teacher may use a test to assess students’ knowledge of algebra.

In educational settings where assessments are used to provide information about students’ knowledge, skills, and abilities, assessments can take many different forms, including paper-and-pencil tests, computer-based adaptive tests, performance-based rated tasks, and portfolios. In recent years, computer-based assessments have gained popularity, including computer adaptive tests, in which the assessment is tailored to the examinee’s current estimated level of proficiency as the examinee progresses through the assessment.
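The tailoring step of a computer adaptive test can be sketched in a few lines. This is a deliberately simplified illustration, not how operational adaptive tests work: operational programs typically select items and update proficiency estimates with an item response theory (IRT) model. The sketch assumes instead that each item has a known difficulty on the same scale as the proficiency estimate, picks the unused item whose difficulty is closest to the current estimate, and moves the estimate up or down by a shrinking step after each response. The `Item` class, the function names, the difficulty values, and the simulated examinee are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    difficulty: float  # hypothetical: assumed known, on the same scale as the estimate


def next_item(pool, administered, estimate):
    """Choose the unused item whose difficulty is closest to the current estimate."""
    remaining = [item for item in pool if item.name not in administered]
    return min(remaining, key=lambda item: abs(item.difficulty - estimate))


def run_adaptive_test(pool, answer, n_items, estimate=0.0, step=1.0):
    """Administer n_items adaptively; answer(item) returns True if correct.

    After each response the estimate moves up (correct) or down (incorrect)
    by the current step, and the step is halved so later items refine
    rather than overturn the estimate.
    """
    administered = []
    for _ in range(n_items):
        item = next_item(pool, administered, estimate)
        administered.append(item.name)
        estimate += step if answer(item) else -step
        step /= 2
    return estimate, administered


# Hypothetical item pool and a simulated examinee who answers correctly
# whenever an item's difficulty is at or below 0.7.
pool = [Item("i1", -2.0), Item("i2", -1.0), Item("i3", -0.5), Item("i4", 0.0),
        Item("i5", 0.5), Item("i6", 1.0), Item("i7", 2.0)]
final_estimate, order = run_adaptive_test(
    pool, lambda item: item.difficulty <= 0.7, n_items=3)
print(final_estimate, order)
```

With this pool the simulated examinee receives items i4, i6, and i5 in that order, and the estimate settles at 0.75, between the hardest item answered correctly (difficulty 0.5) and the easiest item missed (difficulty 1.0).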

Assessments used in educational contexts are often classified into two broad forms, formative and summative, depending on the intended purpose of the assessment process. Formative assessment refers to assessment processes intended to identify strengths and weaknesses in students’ knowledge, skills, and abilities, and it is typically used to guide instruction, remediation, and educational placement. In contrast, summative assessment refers to assessment processes used to evaluate a student’s status on the targeted trait at the completion of an instructional unit.

Although the term assessment has historically been applied primarily to student learning outcomes in Kindergarten through 12th grade (K-12) settings, the past several decades have seen growing use of the term in relation to student learning outcomes in colleges and universities. In higher education, assessment is used to provide evidence of the extent to which academic programs (e.g., a bachelor’s or master’s program) are leading to the intended student learning outcomes, and to generate formative information about how to improve those programs.

What is Measurement?

A notable component of the assessment process is measurement, which is the procedure used to assign numerical values to levels of the trait targeted by the assessment. The field of measurement is sometimes referred to as the measurement sciences, and encompasses the theory, models, and methods used to assign numeric values to the levels of the target trait, as well as the methods used to evaluate the quality of the measurement process.

Assigning numeric values to levels of a trait based on responses to a series of items or tasks involves numerous considerations, all of which fall under the umbrella of measurement. One issue concerns how best to estimate the respondent’s numeric value on the targeted trait. Another involves the scale or metric on which to place the numeric values (e.g., scores can be reported on a range of 0 to 50, 200 to 800, etc.). A common consideration for assessments that use multiple forms over time (e.g., the SAT) is how to align the numeric scale from one form to the next so that a given value has the same meaning across all forms, a process known as equating. Also important is the evaluation of the psychometric properties of the test items (e.g., difficulty and discrimination), because these properties play a fundamental role in determining the overall quality of the measurement process. Equally important is how well the resulting scores support appropriate inferences about the respondent’s level of the targeted trait, which concerns the issues of reliability and validity.
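Several of the quantities mentioned above can be computed directly from a table of scored responses. The sketch below uses classical (CTT) definitions: item difficulty as the proportion of examinees answering correctly, discrimination as the corrected item-total correlation, coefficient alpha as one common reliability estimate, and simple linear equating, which maps a score on one form onto another form's scale by matching means and standard deviations. The response data, the function names, and the assumption that the two equating samples come from equivalent groups are all hypothetical; operational programs often use IRT-based item statistics and more elaborate equating designs.

```python
import statistics as st

# Hypothetical scored responses: rows are examinees, columns are items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_difficulty(data):
    """CTT difficulty: proportion of examinees answering each item correctly."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def item_discrimination(data):
    """Corrected item-total correlation: item score vs. total on the other items."""
    result = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]
        result.append(pearson(item, rest))
    return result


def cronbach_alpha(data):
    """Coefficient alpha: an internal-consistency reliability estimate."""
    k = len(data[0])
    item_vars = [st.pvariance([row[j] for row in data]) for j in range(k)]
    total_var = st.pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def linear_equate(x, scores_x, scores_y):
    """Place a form-X score on the form-Y scale by matching means and SDs.

    Assumes scores_x and scores_y come from equivalent groups of examinees.
    """
    slope = st.pstdev(scores_y) / st.pstdev(scores_x)
    return st.mean(scores_y) + slope * (x - st.mean(scores_x))


print(item_difficulty(responses))
print(item_discrimination(responses))
print(cronbach_alpha(responses))
print(linear_equate(16, [10, 12, 14, 16, 18], [20, 24, 28, 32, 36]))
```

In this toy table the items run from easy to hard (difficulties 0.8, 0.6, 0.4, 0.2), every item discriminates positively, and a score of 16 on form X corresponds to 32 on form Y.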

The process of measurement is conducted using a measurement model. The measurement model serves many purposes, including estimating the respondent’s value on the target trait, evaluating the properties of items and tests, and evaluating the quality of the measurement process. Two measurement models are in common use: Classical Test Theory (CTT) and Item Response Theory (IRT). CTT is the older of the two approaches and is based on a relatively simple model in which an observed score is treated as the sum of a true score and random measurement error. IRT has gained popularity over the past 40 years and is now the dominant measurement framework used for large-scale testing programs. IRT is based on establishing a separate model for each item that expresses the probability of observing each outcome of an item as a function of the trait targeted by the assessment.
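To make the contrast concrete, the sketch below sets a CTT number-correct score beside one common IRT model, the two-parameter logistic (2PL) model for items scored right or wrong. In the 2PL model the probability of a correct response depends on the trait level theta, an item discrimination a, and an item difficulty b. The item parameters, the response pattern, and the function names are hypothetical, and the grid-search maximum-likelihood estimate is chosen for transparency; operational software typically uses numerical optimization or Bayesian estimators and estimates item parameters from data rather than treating them as known.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL item response function: P(correct | theta) for discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at trait level theta."""
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct_2pl(theta, a, b)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll


def estimate_theta(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Maximum-likelihood trait estimate found by searching a grid of theta values."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


items = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) per item
responses = [1, 1, 0]                           # correct, correct, incorrect

print(sum(responses))                                # CTT number-correct score
print(round(estimate_theta(responses, items), 2))    # IRT trait estimate
```

Because the grid is bounded, a pattern with every item correct (or every item incorrect) drives the estimate to the edge of the grid; this reflects the fact that the maximum-likelihood estimate is not finite for such patterns.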