'Tis the Season to Be Testing
On the Value and
Limits of Standardized Tests
‘Tis the season to be testing. The New York Times reported this
past week that the state of New York was unveiling a new test, one of the first
in the nation based on the Common Core Standards,
which have been adopted by 45 states. North
Carolina schools give End-of-Grade tests (EOGs) at the end of the school year. Here at Trinity we give the Educational
Records Bureau’s Comprehensive Testing Program IV (ERBs for short). Most independent schools give the ERBs, but
some give other standardized tests like the Iowa or the CAT. By state law, all non-public schools (of
which independent schools are one important subset) must give a nationally
normed test to all third, fifth, seventh, ninth, and eleventh graders. We give the ERBs to all Trinity students,
grades 3-8; our ninth graders take the Explore test, which is administered by
the ACT; tenth graders take the PSAT; eleventh graders take the SAT and
sometimes the ACT. Our seniors will
take various tests: some choose to take the AP exam in a subject where their
(non-AP) Trinity curriculum has covered a lot of the AP material; and juniors
and seniors also sometimes take the SAT subject tests as part of their college
application process. Most of these tests
are given in the spring of the year.
Our seventh and eighth graders completed their ERBs recently. This was the first time we have administered
them online through laptops. It seems
that this is the way of the future, so it was helpful for us to pilot this mode
of testing. By most accounts, it went
quite well. Our Lower School students
will take their ERBs in the middle of May.
Several weeks ago, Lower School Director Robin Lemke and
Middle School Director Mason Goss held a very informative meeting on
standardized testing. They defined
terms, explained different kinds of tests, walked parents through the norming
and scoring of these tests, shared general trends in Trinity scores, discussed
the ways we utilize these tests, and fielded quite a number of really good
questions from the parents in attendance. Here is a summary of some of their presentation:
BASICS in EDUCATIONAL TESTING
(Information from Dept. of Psychology, University of
California, Davis)
Aptitude is the
ability to learn or to develop proficiency in an area (if provided with
appropriate education or training). It is like talent. Examples are various
types of reasoning, artistic ability, motor coordination, musical talent. There
are aptitude tests that measure mechanical and linguistic ability, as well as
more specific skills, such as military flight and computer programming.
Intelligence is a
general mental capability that involves the ability to reason, plan, solve
problems, think abstractly, comprehend ideas and language, and learn.
Intellectual ability involves comprehension, understanding, and learning from
experience. Intelligence tests are aimed at assessing a person's underlying
intellectual ability. They are used primarily for clinical (diagnostic)
purposes. Examples are the Wechsler Adult Intelligence Scale and
the Stanford-Binet Intelligence Scale.
Achievement tests measure
the extent to which a person has "achieved" something, acquired
certain information, or mastered certain skills - usually as a result of
planned instruction or training. They are designed to efficiently measure the
amount of knowledge and/or skill a person has acquired, usually as a result of
classroom instruction. A common
use is to determine a person's academic level.
What- The ERB, CTP4 is the Comprehensive Testing Program, edition 4. The program assesses student achievement in English language arts and mathematics. Students are tested on vocabulary, verbal ability, reading comprehension, writing mechanics, writing concepts, mathematics and Algebra.
When- Middle School is April 2-5 / Lower School is May 13-16
Why- Diagnostic tool.
Middle School- curriculum assessment
Lower School- individual performance and curricular trends
How- Four to five days of testing
Middle School- 4 days of on-line testing in 90-minute blocks, WrAP (writing) is paper and pencil
Lower School- paper and pencil testing
Who- All 3rd through 8th grade students
For more information please visit the Educational Records Bureau (ERB) site at
http://erblearn.org/schools/achievement/ctp
This kind of testing almost always stirs up anxiety and
stress among students and parents. This
is all the more true when the test is administered in a year of curricular
change, as in the states where the tests are measuring students against the new
Common Core Standards. The
Times piece highlights some of
the stresses and challenges of high-stakes testing, and I heard today from a
teacher at Frank Porter Graham Elementary School in the Chapel Hill district
that the same kind of anxiety is attending this year’s testing there, where the
tests are measuring students against the new Common Core standards.
Let’s take a step back and ask why we give these tests. What is valuable about the feedback they
give? At the same time, we want to be
aware of the downsides and dangers of such testing. In its eighteen-year history, Trinity has
tried to walk this fine line, milking the tests for all they can show us
without allowing Trinity to become a place that is ruled by and shaped
primarily by these tests.
I have learned the value of that fine line again for myself
in an avocation I’ve enjoyed greatly over the last several years: cycling. I think it’s fair to say that I’ve learned to
love cycling. I do it for the exercise, but I do it more
for the mental and spiritual health it affords me. There are days on the bike (this past Saturday,
for instance), where the weather is crisp and I have enough time really to get
away and let my mind and spirit clear. This
side of cycling is like the love of learning in education: It’s the heart and
soul of what I do and why I do it.
But there is another side to the bike. I enjoy a good race now and then, one where
they put a chip on your shoe and you see your time when you come across the
finish line. And even on more mundane
rides, there are now all manner of devices to measure yourself and your
performance. I wear a heart-rate
monitor. I have a ridiculously expensive
computer on my bike that shows me my heart-rate, my average heart-rate, my
cadence, my average cadence, my speed, my average speed, and time elapsed.
Without the instruments, I enjoy the bike more. There
are days when I don’t wear the monitor and don’t pay much attention to the
computer, because I know I’m out there mainly for my mental health, or to spend
my time working through and praying through a challenge or a problem. About a month ago, the battery in my computer
went low and I had a week of riding without instruments. I was free, in a way, and I loved it.
But without instruments and without competition, I get
lazy. I think I’m fast until I see my
numbers. Showing up at a race with 800
riders from across the region is a reality check. Just because I can hang with the boys at home
doesn’t mean much when the CAT-1 riders show up. The instruments and the races add stress, but
they make me better too. And there is
joy in getting better. There is also joy
in hitting your goal. (I still have some
unfinished business with a certain race around Valle Crucis.)
I offer this analogy to say that we want to measure
ourselves and learn from those measures; but we also want to make sure that we
enjoy the learning and never lose that fundamental joy. Finding the balance is key. Using the data wisely and with moderation is
a key too. And being willing, sometimes,
to say, “Forget the data, let’s just ride!”
So with this in mind, let’s think about the values and
limitations of standardized testing in schools:
The Values of Testing
1.
We measure
ourselves against a group larger than our own class and school. Grades at school are a bit like the group
that shows up at the four-way stop at Trinity to ride on Saturday. ERB Independent School norms are like showing
up at a race with riders from six states.
It will keep you humble, and it will bring out the best in you. The tests we take at Trinity are not measured
against other students at Trinity; they are measured against national norms and
they are measured against independent school norms. The latter is a really challenging group to
go up against. Some of our students will
perform at the highest levels (stanines) even against this norm; many of our students,
whose nationally normed scores are in the 80th to 99th percentiles, will spread out
across stanines 4-9. The competition is
stiffer. That may be hard for some of
us, but it keeps us honest and it makes us better.
People want to know how we do on these tests. Here is a very general but accurate answer: When we measure ourselves against the national norms, our median scores are Stanines 7 or 8. (Stanines are a way of dividing a normal bell-curve distribution into nine segments, called "standard nines" or "stanines" for short. I have put a diagram below. Stanines 4-6 are average, and account for the largest number of scores in such a standard distribution. Stanines 7-9, which comprise a smaller number of scores, are considered above average.) When we measure Trinity students against the independent school norms, our median scores are 5 or 6, which is to say that our students are scoring at or slightly above the average for independent schools.
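For readers who want to see how stanines are assigned, here is a minimal sketch in Python. It assumes the conventional stanine shares of a normal distribution (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores). Those cutoffs and the function name `stanine_from_percentile` are illustrative assumptions, not the ERB's own scoring tables.

```python
from bisect import bisect_right

# Assumption: the conventional share of scores (in percent) in stanines 1..9.
STANINE_SHARES = [4, 7, 12, 17, 20, 17, 12, 7, 4]

# Lowest percentile rank that falls in stanines 2..9, built by running total:
# [4, 11, 23, 40, 60, 77, 89, 96]
LOWER_BOUNDS = []
running_total = 0
for share in STANINE_SHARES[:-1]:
    running_total += share
    LOWER_BOUNDS.append(running_total)


def stanine_from_percentile(percentile_rank):
    """Map a percentile rank (1-99) to a stanine (1-9). Hypothetical helper."""
    if not 1 <= percentile_rank <= 99:
        raise ValueError("percentile rank must be between 1 and 99")
    # Count how many lower bounds the rank has reached, then shift to 1..9.
    return bisect_right(LOWER_BOUNDS, percentile_rank) + 1


if __name__ == "__main__":
    for rank in (50, 80, 99):
        print(rank, stanine_from_percentile(rank))
    # Share of scores in the "average" band (stanines 4-6) and above it (7-9).
    print(sum(STANINE_SHARES[3:6]), sum(STANINE_SHARES[6:]))
```

Under these assumed cutoffs, stanines 4-6 hold 54 percent of scores and stanines 7-9 hold 23 percent, and a student at the 80th percentile or higher lands in stanines 7 through 9 of whichever norm group the percentile was computed against.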
We are often asked what our goals
are for these standardized tests, on the independent norms. Don’t we want all our students to be in the
highest stanine [9]? Not actually. For there are really only two ways to
accomplish this, if it is possible at all, and both of them contradict fundamental
and core values of Trinity. First, we would
need to spend much more time teaching the test, practicing the test,
emphasizing the test, and this would mean time away from the classical, rich,
and unhurried education that is our mission.
The second thing we could do is to screen differently for admissions and
accept only those students who test well.
This is problematic on a number of fronts, not the least of which is
that it runs counter to our goal to have Trinity be a family school, serving as many children in a given family as we
can. Those of us with multiple children
know that they are all different, all think and learn differently, and all test
differently. I’d hate to think that we
couldn’t serve Trinity families just because we wanted a certain profile on our
standardized test scores.
2.
We can
observe trends over time. We can see
how one cohort—say, the Class of 2018—has performed over time, on various tests
and sub-tests. We can also see how a
particular grade at Trinity performs, over several years, in a given area. This gives teachers and administrators
invaluable information for adjusting curricula, pacing our teaching,
differentiating instruction according to a class’s strengths and weaknesses,
ensuring smooth hand-offs from grade to grade, and identifying areas for school
improvement.
3.
We
have benchmarks for curriculum adjustments and curriculum changes. We never use these tests alone, but in
conjunction with other assessments, these provide an external, rigorous measure
of students’ learning. They help us see
where we might need to adjust or change curricula; and when we do make those
adjustments, they give us one clear benchmark we can watch for impact.
(Note: When schools change curriculum, as
Trinity has in its 6-8 math over the last few years, they often will see a dip
in standardized scores. Part of the
brouhaha in New York these days is owing to the introduction of the new tests
to measure the new Common Core standards; Kentucky, which led the nation in
implementing these, saw a dip in scores in the first year or two.)
4.
These
tests provide an additional set of data when we need to make decisions in
Middle and Upper School about where to place students (in math classes, in
honors vs. college prep classes in US).
The operative word here is additional. We never make an important decision like this
with only one data point. Teacher evaluation of students, grades, multiple
assessments, and student motivation—these all play into our decisions about
placement. But the ERB scores do give us
a point of reference, and especially where they corroborate grades or teacher
evaluations, they can be very helpful.
(Note: Being placed in an accelerated math
class or in an honors class in Upper School will be the right thing for some of
our students, but Trinity is a place that celebrates all of our students, and
we want to find ways to affirm the student who, for instance, needs to take
Algebra 1 in ninth grade. This “slower”
track may be a hard pill for some students and parents to swallow, but it is, I
am convinced, the best placement for some students. It does not mean a life of inferior
mathematical literacy or of limited college choices. It means that students will be well-placed
and able to thrive at the learning they are best suited for.)
The Limits of Testing
1.
Testing
creates stress, in some students so much stress that they actually
underperform. Some students will do
well on these tests only with lots of coaching and practice; some will always
struggle with them. At Trinity, we are
unwilling to devote enormous amounts of school hours to practicing these tests
(for reasons I outlined above), so that our poor test-takers may struggle with
them. For many students, the experience
of taking these tests year after year is helpful, and we do see some scores go
up over time, probably for this reason.
Third grade is the first year we give these tests (in accordance with
state law), and we do sometimes see some low scores that year. We will want to wait and see how fourth and
fifth grades go before we draw strong conclusions. To deal with the stress, we recommend
strongly that parents not over-emphasize these tests. By no means should you sit down with your
third or fourth grader and go over the scores.
Make sure students get a good night’s sleep the night before and get a
good breakfast that morning.
2.
Testing
measures something but misses so much.
These standardized tests measure a certain kind of thinking, but they do
not measure some of the things we value most at Trinity: perseverance, creativity,
innovation, grit. These are things which
employers value greatly—more than GPAs and SAT scores, to be sure. What’s more, these tests are very limited in
the sort of thinking skills they measure.
Jacques Barzun has written persuasively about the problems and limits of
multiple-choice tests. And Cathy
Davidson of Duke tells an amazing story of how in high school she, with a
learning disability, had scored miserably on the ACT but had written lengthy
essay answers on the back of the test, identifying ambiguous questions and
explaining how the answers to some questions were all incorrect and why. An angel in the form of a test reader took
the time to read those answers and to write to Cathy’s principal, who sat her
down and helped her find a way into college.
She is now the John Hope Franklin Humanities
Institute Professor of Interdisciplinary Studies and Ruth F. Devarney Professor
of English at Duke University.
3.
Measuring
changes us. We become what we
measure. We shift our behaviors, often
subtly and imperceptibly but nevertheless truly, so that we are maximizing those
efforts that produce good test results.
This can sometimes be good. If I
watch my cadence meter all the time, I will increase cadence on the bike—well,
I have to do a bit of pedaling as well as watching, but if I keep my computer
dialed in to that metric, I am very likely to adjust my cadence upwards. This is good if that is what I want. But if what I am after is wattage,
acceleration, and power, that may not be the thing to do. And—more to the point—if prayer and meditation
are what I am after, watching my cadence may be downright counterproductive. Schools that emphasize standardized tests
become good at taking standardized tests.
But when was the last time your boss gave you a standardized test? When was the last time life dealt you a
standardized test? Is this really the
principal way we want to shape and form the next generation?
4.
Measuring
can rob us of the joy of learning and we can forget why we do what we do. I spoke earlier this year (at the
Headmaster’s Dinner) of the importance of wonder. It is, I think, the sine qua non of a good
education. From Einstein’s happiest
moment (when he realized his equation for general relativity actually explained
a discrepancy in the orbit of Mercury) to the second graders watching the
chicks hatch, this is the stuff of learning.
Too much measurement can sap the wonder out of any learning. Augustine was right that there are things to
be used and things to be enjoyed, and that the things to be enjoyed are
better. When we measure our learning, we
are using it for an instrumental purpose, maybe a very good one. But never one as good as the joy of pure
learning. Four years ago, I had the
once-in-a-lifetime privilege of being part of a team of cyclists who
participated in the Race Across America (RAAM).
Wouldn’t you know it—my computer broke a day into the race. There I was riding across the country, with
no idea really how fast I was going, certainly no idea of what my average speed
was. I found out, when we finished, what
our group’s average speed was, but to this day, I have no idea of what I
contributed to that—which stanine of our team I was in. I think that glitch was a gift from God. I don’t remember my average, but I remember
the ride: coasting down the Arizona mountains into the sunrise, speeding down a
Fourteener in Colorado at an indeterminate but chilling speed, driving hard and
fast across the cornfields of Kansas, slogging through frog-thick Missouri
marshes in the middle of the night, cruising through the German towns of Ohio
at twilight, climbing the steep hills of West Virginia in a final push toward
the finish.
Thus do we test them at Trinity, and thus do we take these
tests with a grain of salt. May our
students test well enough to keep their best options open. And may they forget about them when they are
done, so that they can get on with the best work of learning.