'Tis the Season to Be Testing

On the Value and Limits of Standardized Tests

‘Tis the season to be testing. The New York Times reported this past week that the state of New York was unveiling a new test, one of the first in the nation based on the Common Core Standards, which have been adopted by 45 states. North Carolina schools give End-of-Grade tests (EOGs) at the end of the school year. Here at Trinity we give the Educational Records Bureau’s Comprehensive Testing Program 4 (ERBs for short). Most independent schools give the ERBs, but some give other standardized tests like the Iowa or the CAT. By state law, all non-public schools (of which independent schools are one important subset) must give a nationally normed test to all third, fifth, seventh, ninth, and eleventh graders. We give the ERBs to all Trinity students in grades 3-8; our ninth graders take the Explore test, which is administered by the ACT; tenth graders take the PSAT; eleventh graders take the SAT and sometimes the ACT. Our seniors take various tests: some choose to take an AP exam in a subject where their (non-AP) Trinity curriculum has covered much of the AP material, and juniors and seniors sometimes take the SAT Subject Tests as part of their college application process. Most of these tests are given in the spring of the year.

Our seventh and eighth graders completed their ERBs recently.   This was the first time we have administered them online through laptops.  It seems that this is the way of the future, so it was helpful for us to pilot this mode of testing.  By most accounts, it went quite well.  Our Lower School students will take their ERBs in the middle of May. 

Several weeks ago, Lower School Director Robin Lemke and Middle School Director Mason Goss held a very informative meeting on standardized testing. They defined terms, explained different kinds of tests, walked parents through the norming and scoring of these tests, shared general trends in Trinity scores, discussed the ways we use these tests, and fielded quite a number of really good questions from the parents in attendance. Here is a summary of some of their presentation:

(Information from Dept. of Psychology, University of California, Davis)

Aptitude is the ability to learn or to develop proficiency in an area (if provided with appropriate education or training). It is like talent. Examples are various types of reasoning, artistic ability, motor coordination, musical talent. There are aptitude tests that measure mechanical and linguistic ability, as well as more specific skills, such as military flight and computer programming.
Intelligence is a general mental capability that involves the ability to reason, plan, solve problems, think abstractly, comprehend ideas and language, and learn. Intellectual ability involves comprehension, understanding, and learning from experience. Intelligence tests are aimed at assessing a person's underlying intellectual ability. They are used primarily for clinical (diagnostic) purposes. Examples are the Wechsler Adult Intelligence Scale and the Stanford-Binet Intelligence Scale.
Achievement tests measure the extent to which a person has "achieved" something, acquired certain information, or mastered certain skills, usually as a result of planned instruction or training. They are designed to measure efficiently the amount of knowledge and/or skill a person has acquired, usually as a result of classroom instruction. A common use is to determine a person's academic level.

What- The ERB CTP4 is the Comprehensive Testing Program, edition 4. The program assesses student achievement in English language arts and mathematics. Students are tested on vocabulary, verbal ability, reading comprehension, writing mechanics, writing concepts, mathematics, and algebra.

When- Middle School is April 2-5 / Lower School is May 13-16

Why-       Diagnostic tool
Middle School- curriculum assessment
Lower School- individual performance and curricular trends

How-       Four to five days of testing
Middle School- four days of online testing in 90-minute blocks; the WrAP (writing) is paper and pencil
Lower School- paper-and-pencil testing

Who-       All 3rd through 8th grade students

For more information, please visit the Educational Records Bureau (ERB) website.


This kind of testing almost always stirs up anxiety and stress among students and parents. This is all the more true when the test is administered in a year of curricular change, as in the states where the tests are measuring students against the new Common Core Standards. The Times piece highlights some of the stresses and challenges of high-stakes testing, and I heard today from a teacher at Frank Porter Graham Elementary School in the Chapel Hill district that the same kind of anxiety is attending this year’s testing there, where the tests are likewise measuring students against the new standards.

Let’s take a step back and ask why we give these tests. What is valuable about the feedback they give? At the same time, we want to be aware of the downsides and dangers of such testing. In its eighteen-year history, Trinity has tried to walk this fine line, milking the tests for all they can show us without allowing Trinity to become a place that is ruled and shaped primarily by these tests.

I have learned the value of that fine line again for myself in an avocation I’ve enjoyed greatly over the last several years: cycling.  I think it’s fair to say that I’ve learned to love cycling.  I do it for the exercise, but I do it more for the mental and spiritual health it affords me.  There are days on the bike (this past Saturday, for instance), where the weather is crisp and I have enough time really to get away and let my mind and spirit clear.  This side of cycling is like the love of learning in education: It’s the heart and soul of what I do and why I do it. 

But there is another side to the bike. I enjoy a good race now and then, one where they put a chip on your shoe and you see your time when you come across the finish line. And even on a more mundane ride, there are now all manner of devices to measure yourself and your performance. I wear a heart-rate monitor. I have a ridiculously expensive computer on my bike that shows me my heart rate, my average heart rate, my cadence, my average cadence, my speed, my average speed, and time elapsed.

Without the instruments, I enjoy the bike more. There are days when I don’t wear the monitor and don’t pay much attention to the computer, because I know I’m out there mainly for my mental health, or to spend my time working through and praying through a challenge or a problem. About a month ago, the battery in my computer went low and I had a week of riding without instruments. I was free, in a way, and I loved it.

But without instruments and without competition, I get lazy. I think I’m fast until I see my numbers. Showing up at a race with 800 riders from across the region is a reality check. Just because I can hang with the boys at home doesn’t mean much when the CAT-1 riders show up. The instruments and the races add stress, but they make me better too. And there is joy in getting better. There is also joy in hitting your goal. (I still have some unfinished business with a certain race around Valle Crucis.)

I offer this analogy to say that we want to measure ourselves and learn from those measures; but we also want to make sure that we enjoy the learning and never lose that fundamental joy.  Finding the balance is key.  Using the data wisely and with moderation is a key too.  And being willing, sometimes, to say, “Forget the data, let’s just ride!”

So with this in mind, let’s think about the values and limitations of standardized testing in schools:

The Values of Testing

1.     We measure ourselves against a group larger than our own class and school. Grades at school are a bit like the group that shows up at the four-way stop at Trinity to ride on Saturday. ERB Independent School norms are like showing up at a race with riders from six states. It will keep you humble, and it will bring out the best in you. The tests we take at Trinity are not measured against other students at Trinity; they are measured against national norms and against independent school norms. The latter is a really challenging group to go up against. Some of our students will perform at the highest levels (stanines) even against this norm; many of our students, whose nationally normed scores fall between the 80th and 99th percentiles, will spread out across stanines 4-9 on the independent school norms. The competition is stiffer. That may be hard for some of us, but it keeps us honest and it makes us better.

People want to know how we do on these tests.  Here is a very general but accurate answer: When we measure ourselves against the national norms, our median scores are Stanines 7 or 8.  (Stanines are a way of dividing a normal bell-curve distribution into nine segments, called "standard nines" or "stanines" for short.  I have put a diagram below.  Stanines 4-6 are average, and account for the largest number of scores in such a standard distribution.  Stanines 7-9, which comprise a smaller number of scores, are considered above average.)  When we measure Trinity students against the independent school norms, our median scores are 5 or 6, which is to say that our students are scoring at or slightly above the average for independent schools.  
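For readers who like to see the arithmetic, here is a minimal sketch of how a percentile rank maps onto a stanine. It assumes the conventional stanine shares of 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores; the function name and the handling of scores that fall exactly on a boundary are my own illustrative choices, not ERB's published scoring procedure.

```python
from bisect import bisect_right

# Conventional share of scores (in percent) in stanines 1 through 9.
STANINE_SHARES = [4, 7, 12, 17, 20, 17, 12, 7, 4]

# Cumulative cutoffs between adjacent stanines: 4, 11, 23, 40, 60, 77, 89, 96.
CUTOFFS = []
running_total = 0
for share in STANINE_SHARES[:-1]:
    running_total += share
    CUTOFFS.append(running_total)


def stanine_from_percentile(percentile):
    """Return the stanine (1-9) for a percentile rank between 0 and 100.

    Assumption: a percentile exactly on a cutoff goes to the higher
    stanine (for example, the 4th percentile is treated as stanine 2).
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    return bisect_right(CUTOFFS, percentile) + 1


if __name__ == "__main__":
    for p in (10, 50, 80, 99):
        print(f"percentile {p:>2} -> stanine {stanine_from_percentile(p)}")
```

Note that the mapping only works within a single norm group. A student's national percentile cannot be converted into an independent school stanine this way, because the comparison group itself is different.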

We are often asked what our goals are for these standardized tests, on the independent norms. Don’t we want all our students to be in the highest stanine [9]? Actually, no. There are really only two ways to accomplish this, if it is possible at all, and both of them contradict core values of Trinity. First, we would need to spend much more time teaching the test, practicing the test, and emphasizing the test, and this would mean time away from the classical, rich, and unhurried education that is our mission. The second thing we could do is to screen differently for admissions and accept only those students who test well. This is problematic on a number of fronts, not the least of which is that it runs counter to our goal to have Trinity be a family school, serving as many children in a given family as we can. Those of us with multiple children know that they are all different, all think and learn differently, and all test differently. I’d hate to think that we couldn’t serve Trinity families just because we wanted a certain profile on our standardized test scores.

2.     We can observe trends over time. We can see how one cohort (say, the Class of 2018) has performed over time, on various tests and sub-tests. We can also see how a particular grade at Trinity performs, over several years, in a given area. This gives teachers and administrators invaluable information for adjusting curricula, pacing our teaching, differentiating instruction according to a class’s strengths and weaknesses, ensuring smooth hand-offs from grade to grade, and identifying areas for school improvement.

3.     We have benchmarks for curriculum adjustments and curriculum changes. We never use these tests alone, but in conjunction with other assessments, they provide a rigorous external measure of students’ learning. They help us see where we might need to adjust or change curricula; and when we do make those adjustments, they give us one clear benchmark we can watch for impact.
(Note: When schools change curriculum, as Trinity has in its 6-8 math over the last few years, they often will see a dip in standardized scores.  Part of the brouhaha in New York these days is owing to the introduction of the new tests to measure the new Common Core standards; Kentucky, which led the nation in implementing these, saw a dip in scores in the first year or two.)

4.     These tests provide an additional set of data when we need to make decisions in Middle and Upper School about where to place students (in math classes, in honors vs. college prep classes in US).  The operative word here is additional.  We never make an important decision like this with only one data point. Teacher evaluation of students, grades, multiple assessments, and student motivation—these all play into our decisions about placement.  But the ERB scores do give us a point of reference, and especially where they corroborate grades or teacher evaluations, they can be very helpful. 
(Note: Being placed in an accelerated math class or in an honors class in Upper School will be the right thing for some of our students, but Trinity is a place that celebrates all of our students, and we want to find ways to affirm the student who, for instance, needs to take Algebra 1 in ninth grade.  This “slower” track may be a hard pill for some students and parents to swallow, but it is, I am convinced, the best placement for some students.  It does not mean a life of inferior mathematical literacy or of limited college choices.  It means that students will be well-placed and able to thrive at the learning they are best suited for.)

The Limits of Testing

1.     Testing creates stress, in some students so much stress that they actually underperform. Some students will do well on these tests only with lots of coaching and practice; some will always struggle with them. At Trinity, we are unwilling to devote enormous amounts of school hours to practicing these tests (for reasons I outlined above), so our weaker test-takers may struggle with them. For many students, the experience of taking these tests year after year is helpful, and we do see some scores go up over time, probably for this reason. Third grade is the first year we give these tests (in accordance with state law), and we do sometimes see some low scores that year. We will want to wait and see how fourth and fifth grades go before we draw strong conclusions. To deal with the stress, we recommend strongly that parents not over-emphasize these tests. By no means should you sit down with your third or fourth grader and go over the scores. Make sure students get a good night’s sleep the night before and get a good breakfast that morning.

2.     Testing measures something but misses so much.  These standardized tests measure a certain kind of thinking, but they do not measure some of the things we value most at Trinity: perseverance, creativity, innovation, grit.  These are things which employers value greatly—more than GPAs and SAT scores, to be sure.  What’s more, these tests are very limited in the sort of thinking skills they measure.  Jacques Barzun has written persuasively about the problems and limits of multiple choice tests.  And Cathy Davidson of Duke tells an amazing story of how in high school she, with a learning disability, had scored miserably on the ACT but had written lengthy essay answers on the back of the test, identifying ambiguous questions and explaining how the answers to some questions were all incorrect and why.  An angel in the form of a test reader took the time to read those answers and to write to Cathy’s principal, who sat her down and helped her find a way into college.  She is now the John Hope Franklin Humanities Institute Professor of Interdisciplinary Studies and Ruth F. Devarney Professor of English at Duke University.

3.     Measuring changes us. We become what we measure. We shift our behaviors, often subtly and imperceptibly but nevertheless truly, so that we are maximizing those efforts that produce good test results. This can sometimes be good. If I watch my cadence meter all the time, I will increase cadence on the bike. Well, I have to do a bit of pedaling as well as watching, but if I keep my computer dialed in to that metric, I am very likely to adjust my cadence upward. This is good if that is what I want. But if what I am after is wattage, acceleration, and power, that may not be the thing to do. And, more to the point, if prayer and meditation are what I am after, watching my cadence may be downright counterproductive. Schools that emphasize standardized tests become good at taking standardized tests. But when was the last time your boss gave you a standardized test? When was the last time life dealt you a standardized test? Is this really the principal way we want to shape and form the next generation?

4.     Measuring can rob us of the joy of learning and we can forget why we do what we do.  I spoke earlier this year (at the Headmaster’s Dinner) of the importance of wonder.  It is, I think, the sine qua non of a good education.  From Einstein’s happiest moment (when he realized his equation for general relativity actually explained a discrepancy in the orbit of Mercury) to the second graders watching the chicks hatch, this is the stuff of learning.  Too much measurement can sap the wonder out of any learning.  Augustine was right that there are things to be used and things to be enjoyed, and that the things to be enjoyed are better.  When we measure our learning, we are using it for an instrumental purpose, maybe a very good one.  But never one as good as the joy of pure learning.  Four years ago, I had the once-in-a-lifetime privilege of being part of a team of cyclists who participated in the Race Across America (RAAM).  Wouldn’t you know it—my computer broke a day into the race.  There I was riding across the country, with no idea really how fast I was going, certainly no idea of what my average speed was.  I found out, when we finished, what our group’s average speed was, but to this day, I have no idea of what I contributed to that—which stanine of our team I was in.  I think that glitch was a gift from God.  I don’t remember my average, but I remember the ride: coasting down the Arizona mountains into the sunrise, speeding down a Fourteener in Colorado at an indeterminate but chilling speed, driving hard and fast across the cornfields of Kansas, slogging through frog-thick Missouri marshes in the middle of the night, cruising through the German towns of Ohio at twilight, climbing the steep hills of West Virginia in a final push toward the finish.

Thus do we test them at Trinity, and thus do we take these tests with a grain of salt.  May our students test well enough to keep their best options open.  And may they forget about them when they are done, so that they can get on with the best work of learning. 

