Pass or Fail: Is Testing a Valid Way to Measure Student Progress?

In this multi-part series, I dissect the phenomenon of retention and social promotion. I also describe the many methods that, if combined effectively, would improve classroom instruction and eliminate the need for both retention and social promotion.

While reading this series, periodically ask yourself this question: Why are educators, parents and the American public complicit in a practice that does demonstrable harm to children and the competitive future of the country?

What if the measures we use to determine passing or failing grades are completely skewed? Is standardized testing, or any testing for that matter, the right way to determine student progress?

One of the first and most significant concerns about the use of standardized tests is that they are not consistent with the standards for fair and appropriate testing. Of course, those standards must first be defined and shown to be relevant. Here, we are referring to the standards for fair and appropriate testing as defined in the NRC Report, which frames them around three components: measurement validity, attribution of cause, and effectiveness of treatment. The report defines measurement validity as the extent to which evidence supports a proposed interpretation and use of test scores for a particular purpose.

For instance, the reading section of the SAT I would likely be judged to have reasonable validity as a measure of an individual's reading comprehension, knowledge of grammar rules, and ability to draw inferences from texts. Using its scores to judge an individual's preparedness for entry into a particular college program would also be reasonably valid. The question of appropriate testing usually overlaps with this second aspect of validity, which the NRC Report's standards also outline and which the findings of various other organizations support.

Returning to the more formal parameters, the general rule is that the internal structure of the test, its content, its relationship to other criteria, and the psychological processes and cognitive operations the examinee uses in responding to its items must all support the purpose of the test.

A test of knowledge and skill should target that knowledge and skill specifically, and it should also ensure that what it assesses has been acquired through appropriate instruction. In some instances, a score might reflect poor instruction or factors unrelated to the skills under review. For instance, a student might score poorly on the SAT reading test because their teachers did not impart the necessary knowledge and skills (the student may never have been taught proper grammar, for instance, or may have received inadequate instruction in how to read critically).

Similarly, an individual might score badly on the SAT reading test not because they lack the reading comprehension skills the test intends to assess, but because they face significant language barriers or cultural differences that bear on the test. For instance, a passage on American history that is read for comprehension but that presupposes knowledge of American history or customs might be problematic: it undermines the validity and fairness of test scores and, with them, the attribution of cause.

Disabilities can also complicate the attribution of cause. Several types of cognitive or even physical disabilities can undermine an individual's performance in a testing scenario if appropriate interventions are not provided to support the student's exceptionalities.

In the context of K-12 assessments, the cause component also concerns whether students have had adequate opportunity to learn the material on the test. The quality and quantity of instruction become important, as does the alignment between test content and curriculum.

Students also need adequate opportunity within the testing situation itself to demonstrate their knowledge. If a test contains irrelevant language or content, for instance, students may not have a fair chance to perform, and the test developers will have compromised the fairness and relevance of the test.

Furthermore, many of the fairness criteria in the testing standards overlap with attribution of cause. In the Standards, the overlapping elements include the investigation of bias and differential item functioning, determining whether construct-irrelevant variance affects different groups of examinees differently, and equal treatment during the testing process.

Curricular validity lies within the cause component in the sense that it relates to the alignment between test content and the curriculum taught in class. Chapter 13 of the Standards states that "There should be evidence that the test adequately covers only the specific or generalized content and skills that students have had an opportunity to learn."

This goes beyond the criteria outlined here and points to a broader interpretation of opportunity to learn: one that is not restricted to curricular validity but also considers instructional quality as a predictor of student test scores.

Certain policies within the K-12 setting make high-stakes decisions about students dependent upon evidence that the student has had the educational experience and opportunity to acquire relevant knowledge and skill. Where students have lacked sufficient opportunity to acquire the desired skills, they may not meet the criteria for grade promotion or graduation.

At the same time, though, it is hardly fair to hold the student accountable for the deficit in their learning. At what point do we say: this portion of education is the responsibility of the schools, the system, and the stakeholders, not just the individual student?

Effectiveness of treatment is the final component of the criteria for fair and appropriate testing; it concerns whether test scores lead to consequences that are educationally beneficial in a given context. Consequences could include placement in a particular ability-based academic grouping or advancement from one level of learning to a higher one based on test achievement. Accountability plays a part here, too, as the effective-treatment criterion holds that it is inappropriate to use tests to make placements that are not educationally beneficial.

When tests are used in placement decisions, they must be fair and appropriate. Students must be "better off in the setting in which they are placed than they would be in a different available setting." With all of these factors in mind, though, can testing ever truly be trusted as a basis for placing students?

Accountability: Just One Piece of the School Reform Puzzle

School reform can no longer rely mostly on inputs, that is, on giving schools more resources and more support. For schools to truly help the students they serve, reform must take into account both the past and the individual needs of each school.

Do standards and accountability work?

Time has shown that inputs have no real impact on student performance. Federal mandates, such as the No Child Left Behind Act (NCLB), have enforced protocols based on standards, testing, and accountability. Standards emphasize performance objectives and require high levels of accountability from educators.

Mandated reform and accountability measures, particularly those that impose sanctions similar to NCLB's, often create considerable stress and anxiety. This has certainly been the case since NCLB went into effect. Many educators ask whether it is fair to hold schools accountable for student achievement. And even if it is "fair," how are we to measure such achievement? What testing and evaluation formulas will be used? The answers to these questions are not easy. Achievement can only be confirmed if we assess it in some way; however, current assessment models are flawed.

Some research suggests that standards and accountability may improve learning for some disadvantaged students, particularly those with disabilities. When some schools implemented accountability guidelines, they promoted increased collaboration among educators and created an environment in which teachers expected students with disabilities to perform better, which in turn encouraged better learning outcomes.

Some countries have been able to show effective and useful outcomes based on their use of certain accountability policies. However, American policy-makers and researchers still do not have any real evidence that these latest accountability reforms are working to improve the performance of the vast majority of students.

What’s the argument surrounding accountability?

Conversations around school accountability have been polarized. Politicians and parents often want to hold schools and teachers completely responsible for student achievement. Teachers point to uninterested students and uninvolved parents, saying that there is only so much they can do. But studies have shown that if teachers and students work together, and schools hold themselves accountable, great strides can be made. All of this discussion of accountability and standards is intended to bring us to a place where schools are performing better and our children are learning.

Researchers at Sam Houston State University in Huntsville, Texas, observed positive strides toward improved learning outcomes in a variety of middle schools. The researchers believed that improvement strategies must not only improve learning but also develop responsiveness and social equity. While studying these middle schools, they found that teachers at high-performing schools used strategies that required students to think critically and to work with real-world problems.

These teachers were not simply teaching abstract ideas or teaching to the test. The researchers noted that student achievement can improve when students receive recognition for efforts such as note-taking and doing homework, and when they have opportunities to work collaboratively in groups and engage in active learning, such as testing hypotheses.

These findings show that the type of assessment and accountability that NCLB brings is not the be-all and end-all of the teaching equation. Rather, the quality of instruction is the biggest part of learning. It is paramount that we continue to work toward a more balanced solution, finding ways to encourage quality instruction while also monitoring results.

Inputs alone cannot properly reform a school or district; it takes constant monitoring and an understanding of the student population to bring about change that positively affects the students that change is meant to serve.