13 How Do I Assess?
Kimberly had great confidence in her instructional skills. She loved teaching and felt that she was pretty darn good at it, or at least on her way to being good. Testing, though, mystified Kimberly. When writing, giving, scoring, and grading a test, Kimberly felt as if she were only going through the motions. She didn’t have a clear sense of why she tested, what she should test, or how she should test, let alone how she should score tests and assign a course grade. As a result, Kimberly sort of winged it, doing what she believed other teachers at her school teaching similar subjects did. It worked well enough. Kimberly wasn’t bad at testing. Yet she knew she had more to learn to make testing another one of her strong suits.
Assessment
Assessment is another one of a teacher’s critical functions. Assessment is the educator’s fancy word for testing, referring to a teacher’s formal evaluation of the student’s progress toward and attainment of benchmarks or standards. Schools and their constituents, including accreditors, licensing bodies, and employers, require rigorous assessment. They must know whether students are meeting standards. Schools don’t just teach; they also credential. A school’s diploma, degree, or certificate, or even just credit in a single course, certifies to the world that the student has gained the knowledge, skills, or ethics that the credential represents. Others rely heavily on school credentials. Rigorous assessment ensures the validity and reliability of the credential. Assessment, though, can also be a powerful teaching and learning tool, as this chapter discusses. The teacher with strong assessment skills is generally a very good teacher. Sharpen your assessment skills, and you’ll improve your instruction.
Forms
Educators divide assessment into two forms, summative and formative, even though the two forms can overlap. A summative assessment is one that the school uses to determine advancement. The classic summative assessment is a course’s final exam on which the teacher will base all, most, or some of the student’s grade. Any quiz, test, exam, paper, project, or other assignment whose score counts toward the student’s final grade is also a summative assessment. Formative assessments, by contrast, do not count toward advancement but instead help the teacher and student determine where the student stands and what additional instruction or study may be necessary for the student to meet the benchmark. Formative assessments can also have their own instructional impact, beyond their value for measurement. Even when relatively unprepared for testing, students learn from practice tests, in what educators call the powerful testing effect, as explained further below in a paragraph on practice tests. Don’t underestimate the value of formative assessment in addition to the summative assessments you implement to certify student knowledge and advancement.
Standards
As already indicated above, assessment measures student progress toward a benchmark or standard. The benchmarks or standards can vary as to their sources. Senior teachers may simply have in mind, from their long experience teaching their subject, what students should know and be able to do. They have, in essence, incorporated the standards into their own view of their subject. A teacher’s own view of what students should know and be able to do can have great validity, especially if the teacher has relevant experience in the related field, outside of the school. The career chemist who becomes a teacher, for instance, may have a keener view of what the practice of chemistry demands than an outdated text from which the chemist-turned-teacher teaches. Standardized texts and materials, though, have the purpose of ensuring that teachers teach and assess to the standards. When teaching and testing from a standardized text, you are teaching to the text’s recognized standards. Teachers can also generally find on their own the applicable standards to which schools should be teaching their students, to which to align their instruction and assessment. State departments of education publish benchmarks at the K-12 level, while accrediting bodies, licensing bodies, and professional associations generally publish the standards applicable to programs of higher education.
Alignment
Research the standards to which you should be teaching and assessing, to be sure that your instruction and testing align with those standards. Get help from your department, curriculum director, or curriculum committee in identifying the applicable standards. You may be surprised to find gaps in your texts, materials, instruction, and testing. Instruction and assessment are relatively meaningless unless they align to some benchmark or standard. You can teach and test whatever you want, and give all your students A’s in your course. But what’s the point, if your teaching and testing completely missed the mark? Your students would know the irrelevant matters that you taught them but not what they needed to know to meet the standards and benchmarks. Standards themselves are not something that education departments, accreditors, licensing bodies, and professional associations just dream up. Instead, those who draft standards align them to the knowledge, skills, and ethics that individuals need to succeed in their fields. Teach and test to the standards, and you’ll graduate competent physicians, lawyers, engineers, electricians, musicians, historians, or whatever other professionals your instructional program trains. If you teach at the K-12 level, your teaching and testing to the standards should prepare your students to succeed at the next level. Indeed, if your teaching and testing fail to address all the standards, the teacher who teaches your same students at the next level may readily recognize and rue your omission. Aligning your teaching and testing is critical to effective instruction.
Validity
When you align your assessments to your subject’s standards, you take the first step toward valid assessment. Validity is a key measure of a test’s usefulness. Validity means that your test measures what you intend it to measure, so that its results are meaningful to your instructional goal. If you teach economics, for instance, you shouldn’t be testing engineering. A low score on an engineering test wouldn’t tell you anything about whether your economics students had learned economics. Your test items should thus evaluate whether students have learned the instructional objectives that you taught them. You should also spread your test items among multiple instructional objectives, to ensure content validity. If, instead, you choose only one instructional objective to test out of a dozen potential objectives that your students believed the test was to cover, then you would advantage and disadvantage students unfairly, depending on whether they happened to study that one objective effectively. Distribute your test items relatively widely to ensure content validity and that you don’t test unfairly.
Timing
The timing of assessments can also have a lot to do with the effectiveness of your instruction. Educators document the procrastination curve. If you only give a final exam, students tend not to study, review, and rehearse their learning throughout the term, until shortly before the final exam. If, instead, you give periodic tests throughout the term, each counting at least twenty percent toward the final grade, students will generally study, review, and rehearse for each periodic test throughout the term. Cumulatively, students will study, review, and rehearse for substantially more time and with substantially greater effort, if you test periodically, with the tests counting substantially toward the final grade. If you also give a final exam that covers the whole term, students will study, review, and rehearse the whole term’s instruction, aiding their long-term memory. Three interim tests, each worth twenty percent toward the final grade, plus a final exam worth forty percent, together accounting for the full grade, generally elicit an optimal amount of study time and effort. Consider carefully how often you test and the weight you give to tests in calculating the final grade. Quizzes worth ten percent of the final grade or less don’t generally have a similar positive impact on student studies.
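The weighting schedule above is simple arithmetic, but it is easy to mis-weight a grade book, so a sanity check that the weights sum to one hundred percent helps. Here is a minimal sketch in Python; the function name and the example scores are hypothetical, for illustration only.

```python
# A minimal sketch of the weighting schedule described above: three
# interim tests at 20% each, plus a final exam at 40%. All scores are
# assumed to be on a 0-100 scale.
def weighted_final_grade(interim_scores, final_score,
                         interim_weight=0.20, final_weight=0.40):
    """Combine interim test scores and a final exam score into one
    weighted course grade, checking that the weights total 100%."""
    total_weight = interim_weight * len(interim_scores) + final_weight
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight:.2f}, not 1.00")
    return (sum(s * interim_weight for s in interim_scores)
            + final_score * final_weight)

# Hypothetical student: interim tests of 80, 90, and 70, final exam of 85.
# 0.20 * (80 + 90 + 70) + 0.40 * 85 = 48 + 34 = 82
print(weighted_final_grade([80, 90, 70], 85))
```

The weight check matters because adding a fourth interim test without rebalancing would silently inflate every grade.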
Formats
You have several options in the formats that you use for assessment. Common exam formats include multiple choice, fill in the blank, true/false, matching options, short answer, long answer, and essay formats. Each of those formats has an art and science to it, developed and confirmed by psychometricians. For instance, multiple-choice questions should generally have four options, one of which is clearly correct and the other three of which are clearly incorrect but equally reasonably attractive, with all options stated in the positive rather than the negative, and without an option for “none of the above” or “all of the above.” Each exam question should also be independent from every other question, with no questions depending on a prior question and answer. The reasons for these design principles are technical but rational. Investigate and follow sound psychometric principles for your favored question format. You don’t, though, have to assess using an exam format. Your assessment could instead be a paper, project, graphic design, presentation, or anything else that enables you to measure the student’s learning. Choose the format or formats that best fit your instructional objectives and overarching course goal.
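The multiple-choice design rules above lend themselves to an automated check before an exam goes out. The following Python sketch is illustrative only; the `Item` class, its field names, and the example question are hypothetical, not drawn from any testing package.

```python
# A minimal sketch checking a multiple-choice item against the design
# rules described above: four options, one keyed correct answer, and
# no "none of the above" / "all of the above" options.
from dataclasses import dataclass

@dataclass
class Item:
    stem: str           # the question text
    options: list       # option texts, in display order
    correct_index: int  # index of the keyed (correct) option

BANNED = {"none of the above", "all of the above"}

def check_item(item):
    """Return a list of rule violations; an empty list means the item passes."""
    problems = []
    if len(item.options) != 4:
        problems.append(f"expected 4 options, got {len(item.options)}")
    if not (0 <= item.correct_index < len(item.options)):
        problems.append("correct_index out of range")
    for opt in item.options:
        if opt.strip().lower() in BANNED:
            problems.append(f"banned option: {opt!r}")
    return problems

item = Item("Which gas do plants absorb during photosynthesis?",
            ["Carbon dioxide", "Oxygen", "Nitrogen", "All of the above"], 0)
print(check_item(item))  # flags the "All of the above" option
```

A check like this cannot judge whether distractors are equally attractive, which remains a matter of drafting skill, but it catches the mechanical rule violations.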
Reliability
The prior paragraph suggested the importance of drafting reliable exam questions. Reliability is a key marker of sound assessment. Your exams should produce results consistent with the competence of the students whom they test. Poorly drafted questions that invite student misunderstanding and introduce errors through reasonable misinterpretation won’t reward students who have studied effectively. You can, by applying a relatively simple formula, analyze your test items for their discriminative effect, meaning whether they truly separate stronger learners from weaker learners. Ideally, a test item should positively discriminate at a reasonably high level between high- and low-performing students. Test items that all students get right or all get wrong don’t help with assessment. Even more so, test items that low-performing students get right and high-performing students get wrong have introduced an interpretive error producing negative rather than positive discrimination. Item discrimination around +.40 makes a good target to shoot for. Analyze your test items for high reliability and positive discriminative effect. Lend integrity to your testing.
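The relatively simple formula mentioned above is commonly the upper-lower discrimination index: split students into top and bottom groups by total score, then subtract the lower group’s proportion correct on the item from the upper group’s. The Python sketch below assumes that convention; the function name and the twenty-seven percent split are conventional psychometric choices, not prescribed by this chapter.

```python
# A sketch of the upper-lower discrimination index: for one item,
# D = (proportion correct in the top-scoring group)
#   - (proportion correct in the bottom-scoring group).
# D near +1 means the item strongly separates strong from weak students;
# D near 0 means it doesn't discriminate; negative D signals a flawed item.
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """item_correct: 0/1 per student for one item.
    total_scores: each student's total test score, in the same order.
    fraction: share of students in each of the upper and lower groups."""
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank students by total score, best first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical class of ten: the top three all answered correctly,
# the bottom three all missed, so D = 1.0.
print(discrimination_index([1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
                           [95, 90, 88, 85, 80, 75, 70, 65, 60, 50]))
```

An item that every student answers correctly would score D = 0 here, matching the chapter’s point that such items don’t help with assessment.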
Objectivity
Sound assessments also need to have objectivity. Evaluation of student responses shouldn’t depend on the eye of the beholder. Whether you score the test items or have a qualified aide or assistant do so, the results should be the same. Indeed, whether you score the test items in a good mood or a bad mood, or rested and refreshed versus tired, the results should also be the same. You should be able to readily justify your scoring of test items to students, without claiming discretion to exercise your own subjective judgment. Objectifying some assessment formats can be more difficult than others. Multiple-choice, true/false, and matching questions are generally easy to objectify, to minimize ambiguities. But questions eliciting long open-ended answers, and essay questions requiring substantial analysis in an extended narrative, can trigger a much wider variety of responses, leaving much more room for ambiguity at the boundaries between an accurate and an inaccurate, or an adept and an inept, answer. Yet even as to essay questions, you should state the question in a form that clearly elicits the information, analysis, and evaluation that you seek, thus objectifying your scoring. See the next chapter on assessment scoring and grading. Avoid subjectivity in scoring for its unfairness and unreliability.
Levels
Your assessments should also evaluate multiple levels of knowledge. A common error that teachers make in objectifying their quizzes, tests, and exams is to only test factual recall and basic comprehension of terms. Yet your instructional objectives likely also include higher-level skills involving elaboration, generalization, application, analysis, and evaluation. Thus, your assessments should also test those higher-level reasoning skills. You can do so, even in an objective test format like multiple-choice questions. A multiple-choice question doesn’t have to test only basic factual recall. It can instead ask students to generalize and apply knowledge to new contexts, to reach reasoned conclusions or reasonable evaluations. Short and long-answer questions, and essay questions, can be even more amenable to testing higher-level reasoning skills, while also testing spelling, grammar, organization, and other writing skills. With each assessment, ensure that you ask a range of questions from basic factual recall and comprehension up through generalization and application, all the way to analysis and evaluation.
Practice
An earlier paragraph in this chapter mentioned the powerful testing effect of formative assessments. Consider adding practice tests throughout your instruction. You may, for instance, begin a unit with an ungraded pre-test that reminds students of related prior knowledge and alerts students to the nature, depth, and challenge of the new knowledge. A pre-test can show students that they don’t know as much as they think and that they need to pay attention to coming instruction. You may alternatively offer an interim practice test after a first presentation on the instructional objective, before significant additional study, for substantially the same purposes. You need not implement every practice test during class time. You may instead offer optional practice tests outside of class, or require practice tests outside of class as homework, especially if your school uses an electronic learning-management system that supports online self-administered testing. Those systems often also offer automated scoring and answer explanations. Generally, the more practice testing, the better, especially when your practice tests address the priority learning in a format similar to the summative assessment that will inevitably follow.
Reflection
On a scale from one to ten, how strong do you feel your assessment skills are? What research, investigation, collaboration, or training might increase your assessment skills? On a scale from one to ten, how rich an assessment program do you feel your instruction reflects? What additional or improved assessment would strengthen your assessment program? Can you identify the benchmarks or standards to which you teach? Have you aligned your teaching and testing to those standards? Do your test items evaluate what you intend to teach and actually teach? How many graded tests or assignments do you give that count toward the final grade, on what schedule spread across the term? Does your summative assessment schedule and its weighting spur student studies across the term, defeating the procrastination curve? Are you using the best test item formats for your instructional objectives? Can you vary the test item formats to improve assessment? Is your scoring of tests, papers, and other assignments objectified, minimizing grader subjectivity? Are you testing at every comprehension level from basic recall of facts up through generalization, application, analysis, and evaluation? Do you offer abundant practice testing for all instructional objectives?
Key Points
Assessment is a critical function and skill for effective teaching.
Assessment comes in summative and formative forms, each valuable.
Assess students according to applicable benchmarks and standards.
Align your test items to cover a wide range of the applicable standards.
Ensure that your test items evaluate the objectives that you pursue.
Spread summative assessment across the term to increase study effort.
Choose the better test formats for your subject matter and objectives.
Ensure the discriminative effect and reliability of your test items.
Objectify tests to reduce subjectivity in scoring and increase reliability.
Test at every level of reasoning from recall to analysis and evaluation.
Offer abundant practice testing, both required and voluntary.
Read Chapter 14.