Fellowship Paper
More Than MCAS:
Performance-Based Assessment
in the Days of Standardized Tests
by
Peter Garbus
Francis W. Parker Charter Essential
School
for the
Massachusetts Charter School
Fellowship Program
2000
It might feel as if MCAS is the first and last word on assessment in the
state of Massachusetts. However, there are more voices out here, and we
say that our schools need performance-based, authentic forms of assessment,
too.
At the Francis W. Parker Charter Essential School, we assess our students
using portfolios, exhibitions, performance-based assessments, and sometimes
tests. The state granted our school a charter to create a school where students
learn to use their minds well and where they demonstrate such learning through
public exhibitions. We believe that students will only learn to use their
minds well if they are asked to do so, and for us that requires performance-based
assessment. Ironically, our students have done very well on the English/Language
Arts and Mathematics sections of the MCAS for the past two years without
much emphasis from the school. For us to say that we need more than MCAS
is not sour grapes. We feel strongly that MCAS might very well do damage
to the good work that is going on here.
Yet the State and others have some important questions for us. How reliable
and valid are these exhibitions as a form of assessment? How can we show
that performance-based assessment works in ways that serve the students
in our schools and also serve the State's need for some kind of accountability?
At Parker, we know that performance-based assessment works for our students
and our school. It just might work for your school. It just might work for
the State.
Performance-based Assessment: What it is and Why do it
Let me begin with some definitions. R.J. Dietel, J.L. Herman, and R.A. Knuth
of the North Central Regional Educational Laboratory define performance
assessment as:
an evaluation in which students are asked to engage in a complex task,
often involving the creation of a product. Student performance is rated
based on the process the student engages in and/or based on the product
of his/her task. Many performance assessments emulate actual workplace
activities or real-life skill applications that require higher order processing
skills. (Dietel, Herman, Knuth, 1991)
Joan Herman, the associate director of the National Center for Research
on Evaluation, Standards and Student Testing at UCLA Graduate School of
Education, defines it this way:
The essence of performance assessments--whether in the form of open-ended
questions, essays, experiments or portfolios--is that they ask students
to create something of meaning. A good performance assessment taps complex
thinking and/or problem-solving, addresses important disciplinary content,
invokes authentic or real-world applications and uses tasks that are instructionally
meaningful. Furthermore, because they require students to construct a unique
answer, performance assessments typically are scored by humans, exercising
judgment, rather than by machines. (Herman, 1998)
With performance-based assessment, students engage, create, emulate real
world tasks, solve problems, construct unique answers, and they think.
This kind of assessment is not new. Providing us some historical context,
George Madaus and Laura O'Dwyer of Boston College's Department of Education
went all the way back to the Han Dynasty in China to identify essentially
four ways to assess what a person knows. They describe those four ways to
assess as follows:
First, you can ask the person to supply an oral or written answer to a
series of questions (e.g., essay questions, short-answer questions, oral
disputations). Second, you can ask a person to produce a product (e.g.,
a portfolio of artwork, a research paper, a chair, a piece of cut glass).
Third, you can require the person to perform an act to be evaluated against
certain criteria (e.g., conduct a chemistry experiment, read aloud from
a book, repair a carburetor, drive a car). Finally, and historically the
most recent, you can have an examinee select an answer to a question or
a posed problem from among several options (i.e., the multiple-choice or
true/false item). We define performance assessment broadly: performance
assessment requires examinees to construct/supply answers, perform, or
produce something for evaluation. Thus performance assessment embraces
the first three ways of assessing and excludes the fourth. (Madaus and
O'Dwyer, 1999)
In those first three methods, a student would have to create or construct
or do something in order to show their ability to create, construct, and
do. That's exactly what the craft guilds needed to produce master craftsmen
who could skillfully make the things they were supposed to make. The fourth
practice, a method of accountability emphasizing efficiency, uniformity,
scientific reliability, and norm-referenced comparability, came about with
the rise of industrialization and has dominated education ever since. Human
beings as segments of the assembly line. So what does our world need today,
cogs in the machine or master craftsmen?
Our world is changing and we need now more than
ever to help our students develop essential skills and habits of mind. As
technology speeds and widens our access to information and as the sheer
amount of information grows exponentially, we will need fewer memorizers
and more thinkers. We will need creative, resourceful problem-solvers who
can identify an issue, research it effectively, and sift through the piles
of information for insight and ideas. It matters less whether educated people
know thousands of bits of discrete information and matters more whether
they can analyze and synthesize the information they find. While it may
be more challenging and more expensive to design large scale assessment
systems around performance, is there really any alternative when standardized
tests don't come close to measuring whether students can think?
We also need to match our practices to what
recent science tells us about how people learn. Dietel, Herman, and Knuth
summarize these findings:
From today's cognitive perspective, meaningful learning is reflective,
constructive, and self-regulated. People are seen not as mere recorders
of factual information but as creators of their own unique knowledge structures.
To know something is not just to have received information but to have
interpreted it and related it to other knowledge one already has. In addition,
we now recognize the importance of knowing not just how to perform, but
also when to perform and how to adapt that performance to new situations.
Thus, the presence or absence of discrete bits of information -- which is
typically the focus of traditional multiple-choice tests -- is not of primary
importance in the assessment of meaningful learning. Rather, what is important
is how and whether students organize, structure, and use that information
in context to solve complex problems. (Dietel, Herman, Knuth, 1991)
Information in use, students constructing their own knowledge -- these are
the hallmarks of meaningful learning.
Not only does this learning theory suggest new goals for assessment, but
it also tells us that it is extremely difficult for students even to learn
"the basics" because of state requirements that emphasize discrete
facts over performed skills. The same authors explain:
Learning isolated facts and skills is more difficult without meaningful
ways to organize the information and make it easy to remember. Also, applying
those skills later to solve real-world problems becomes a separate and
more difficult task. Because some students have had such trouble mastering
decontextualized "basics," they are rarely given the opportunity
to use and develop higher-order thinking skills. (Dietel, Herman, and Knuth,
1991)
Indeed, students are more likely to learn the basics when their learning
is situated in a real world context and when the assessment is performance-based.
In light of this current research, we have to change the way we think about
learning; we have to change the way we teach children, and we have to change
the way we assess what our students know and can do.
Others think so too. The following organizations have argued against high-stakes
standardized testing and called for systems of assessment that include performance-based
assessment: the American Association of School Administrators, the American
Educational Research Association, the American Federation of Teachers, the
Association for Childhood Education International, the International Reading
Association, the National Association for the Education of Young Children,
the National Council of La Raza, the National Association of Secondary School
Principals, the National Council of Teachers of English, the National Education
Association, and the National Research Council, among others. More and more
educators are convinced that performance-based assessment does something
that standardized testing does not do -- they are all saying that we need
to do it.
What it might look like: Parker's Assessment System
Our school starts with grade 7, after which students move through three
non-graded Divisions until they graduate. To move from one division to the
next and ultimately to graduate, students must complete a portfolio and
an exhibition at each stage. We call the process of moving on to the next
level a Gateway. The curriculum is organized into Math/Science/Technology
(MST), Arts/Humanities (AH), Spanish, and Wellness (P.E. and Health). Students
gateway in MST, AH, and Spanish separately.
Moving from Division 1 to Division 2, students work mostly on gathering
a portfolio of their best work and then presenting the portfolio in a celebratory
exhibition of what they've learned with their family, friends, and teachers.
The Spanish Gateway does involve some performance; students have to describe
their learning in Spanish. Moving from Division 2 to Division 3, the stakes
of the exhibition rise. In both AH and MST, students have to design and
complete an independent project of original work, in addition to presenting
a completed portfolio. In the gateway exhibition, students present their project
work, demonstrating their readiness to move on to the next level. The final
Gateway is to graduate, and this step involves a juried presentation of
work completed on a Senior Project. A successful project along with a completed
Graduation Portfolio again means that the student is ready to move on. This
process that begins by looking mostly at school skills ends by looking at
the authentic, independent, original work.
For each level, we assess student work that
might go into their portfolio based on a set of standards defined by specific
criteria for excellence. The standards include Reading, Writing, Oral Presentation,
Research, Artistic Expression, Listening, Scientific Investigation, Mathematical
Problem Solving and Communication, Technology, Systems Thinking, and Spanish.
These are the skills that we want all students to be good at when they leave
our school. And we look at each piece of work they do in light of the criteria
the school community has described, indicating on rubrics whether students
are just beginning, approaching, meeting, or exceeding the expectations
for that level. The portfolios are collections of a student's work that
meets the standards and which thereby shows readiness for the next level.
Most of our assessment within courses is performance-based. With such an
emphasis on skills, we have to challenge, inspire, and test our students'
skills by asking them to use them. Those projects and assignments generate
the student work that we assess. Students may revise their work
almost as often as they want in order to improve it and meet the standards.
Most students are in each Division for about 2 years. They enter just beginning
to use the skills and they leave able to use them. Because this
is a system of promotion by performance, they don't move on until they've
done the work and they're ready. As our students and we have found, that
is no small task.
With our emphasis on performance and skill, we still ask our students to
learn content. However, we define essential content differently than many.
We believe strongly that "less is more"; students learn more and
they learn better when they explore limited topics in depth. So we do not
have long lists of topics that our students have to learn. Yet nearly every
skill piece requires students to know what they are talking about; they
must show understanding of the subject at hand. Indeed, they cannot show
thinking skills, for example, without knowing content, even though the content
is less prescribed. At present, the state frameworks for Math and Language
Arts have followed a similar approach of defining the skills while leaving
the content fairly open. However, the Science and the History/Social Studies
frameworks require specific and extensive topics to be covered. If those
frameworks remain unchanged, we might have to make significant changes to
our approach to teaching and learning.
Inside Portfolios
There are many portfolio systems out there, more today than 10 or 15 years
ago. While portfolios are not necessarily performance-based, ours have to
be in order for us to use them to see what a student can do. For us, a student's
portfolio is evidence that they have the skills we want to see for that
particular level.
What goes into a student's portfolio? While specific requirements differ
for each domain at each division, the basics are the same. We ask to see
evidence of consistent competence. That means a student has to include a
certain number of pieces of work that met the criteria for specific skills.
For example, the portfolio requirements for moving from Division 2 to Division
3 in AH include 4 pieces of writing, 2 pieces of research, 3 pieces of artistic
expression, 3 pieces of reading, and so on. The numbers reflect the number
of opportunities students have had to work on the particular skill balanced
with an effort to see consistency. Students do choose the pieces they put
in, but they are selecting only from their work that met the criteria. In
addition to the work, students write a cover letter for each portfolio,
reflecting on the work represented there.
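To make the counting concrete, here is a minimal sketch of how such a requirements check could be tallied. It is only an illustration, not a description of Parker's actual record-keeping: the function name, the data layout, and the sample student record are all assumptions, and the requirement counts cover only the four skills named in the example above (the real list continues).

```python
from collections import Counter

# Requirement counts from the Division 2 to Division 3 AH example in the text.
# Assumption: only the four skills named there are listed; the real list is longer.
AH_REQUIREMENTS = {
    "writing": 4,
    "research": 2,
    "artistic expression": 3,
    "reading": 3,
}


def portfolio_gaps(pieces, requirements):
    """Return how many more qualifying pieces are still needed for each skill.

    `pieces` is a list of (skill, met_criteria) pairs. Only pieces that met
    the criteria count toward the portfolio, mirroring the rule in the text.
    Skills whose requirement is already satisfied are left out of the result.
    """
    met = Counter(skill for skill, met_criteria in pieces if met_criteria)
    return {
        skill: needed - met[skill]
        for skill, needed in requirements.items()
        if met[skill] < needed
    }


# A made-up student record for illustration only (not real data).
example_pieces = (
    [("writing", True)] * 4
    + [("research", True), ("research", False)]
    + [("artistic expression", True)] * 3
    + [("reading", True)] * 2
)

gaps = portfolio_gaps(example_pieces, AH_REQUIREMENTS)
print(gaps)
```

A tally like this only answers the question of whether the numbers are there. The cover letter and the teachers' judgment about readiness are separate parts of the process that no count captures.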
This is a process that requires reflection and that encourages self-awareness.
It also works against the "one shot and you're done" orientation
of traditional tests. Matt Smith, a sixth-year teacher in Division 1, explains:
Helping students see their growth over time is important. A portfolio provides
students with the opportunity to look back at previous work, either to
improve it or to compare it with recent work. They must be reflective when
they decide what to include in the portfolio or what to revise to raise
a piece to portfolio standards. (September 2000)
Our students care about the work that goes into their portfolios. Students
must do high-quality work for it to be portfolio "worthy," and
they have to work hard to get there.
The practical aspects of this system require physical space to keep the
portfolios. We have file cabinets full of brown student folders; we keep
all portfolios until the student graduates or leaves the school. Students
need time to organize and review their portfolios. Every quarter or so,
most classes pause to work on them, either adding new pieces or counting
up the gaps that need to be worked on. And teachers need time around Gateways
to review portfolios with students and to meet with colleagues in order
to convey an official stamp of approval.
There is always a point in this process where we do a gut check with a student's
portfolio. We try to think beyond the specific requirements of the portfolio
to whether a student's skills really are ready for the next level. We check
our intuition, our experience, and what we know about the student. After
all, we know our students very well. Few teachers have assessment loads
above 60 students, and some teachers' loads are as low as 25. When we say
we know them and their work, we really do. So when it comes time to evaluate
portfolios, the group of teachers gathered checks the numbers against the
qualities we know we're looking for. Almost every time when a student's
portfolio is complete, our gut check confirms that student's readiness.
This should make sense. After all, we or some education expert or some committee
of adults writes the standards. We define and describe what the skills are
and what levels of performance are fair to expect at what points. For us,
we simply check what we know about our students against the work that they
have produced. If our assessments work well, what they produce matches the
skills and knowledge they've acquired.
Inside Exhibitions
Parker's exhibitions include the big Gateway Exhibitions when a student
completes a Division and the performance-based assessments that happen in
classrooms. Whether big or small, our exhibitions all involve some kind
of real or authentic work that students must create, they have an audience
beyond the teacher, they cast the teacher in the role of coach, and they
involve a balance of freedom and structure. Some have every student working
on the same project. Some have students working completely independently.
In every case, students have to think hard and create quality work.
Here are some examples. From Division 2 AH, students staged a trial of Emma
Goldman. Having done research on the Progressive Era, on the Women's Suffrage
Movement, and on Industrialization, students created characters and gave
performances that tested their knowledge and their ability to artistically
express themselves through drama. When studying Chinese philosophies, students
in small teams designed their own schools around the ideas of Taoism, Buddhism,
and Legalism, each with its own curriculum, discipline policies, and physical
environment. From Division 1 AH, students created a museum of exhibits on
early civilizations from around the world. From a Division 3 economics course,
students studied a current economic issue in order to understand the economics
of it and to propose a sound and viable solution to it. From Division 1
MST, students work on "Challenges of the Week" (C.O.W.s) to apply
their learning to new problems. From Gateway Projects, a student did a family
history project and created a montage of images in five frames. Another
student researched and created a resource book on vegetarianism. She presented
her findings along with a three-course meal. Each of these exhibitions worked
well for the students. As important, they clearly showed teachers and the
community students' skills and knowledge.
There are several keys to making these exhibitions
work. If students don't find real meaning in the work they do, we are not
going to see interesting, thoughtful work from them. Many use the word "authentic"
to describe tasks and problems and dilemmas that genuinely spark students
in exhibitions. While some of our students might be sparked by theoretical
intellectual work, probably more will be stimulated by being asked to grapple
with phenomena that they care about, situations dealing with the real world
around them, things they can see and feel and get involved with. Often,
they will need choice about what their work is.
Related to that, students care more when they
have real audiences and when their work might have a real impact on their
community, however they define that. Our Senior Projects, which are presented
in front of a jury, proved to our principal, Gregg Sinner, that there was
something different happening in these exhibitions. "To me," he
said, "a juried exhibition is far more significant than a demonstrative
one. The risk taken by the student is comparable to a graduate school qualifying
exam." Adding to that, a Division 1 AH teacher commented that, "the
Division 2 Gateway and the Senior Project Exhibitions prepare students to
communicate with a wider audience, going beyond the walls of the classroom.
This prepares them for life beyond Parker, as well as helps them connect
their learning to the world outside the school." Thus we have seen
that student work becomes powerful when there is some personal connection
and commitment to the work.
Not only must we find ways for students to find
their way into a real world problem, we need to ask them to use their brains
and all their powers to do something with that real world problem. They
need to solve a problem or answer a perplexing question or make a proposal
or solve a mystery or resolve a dilemma or argue a point or uncover something
that has not been seen before. For example, Jessica Jacobs, a first-year
teacher in MST, described the work her students do on Challenges of the
Week:
In mathematics, I think students best show
that they know how to use their minds well when they solve a problem using
their own creative methods, as opposed to a test in which students know
exactly which method they need to use. On the Challenges of the Week in
Division I, I gave students a great deal of guidance in solving the problems.
Most students then solved the problem using the methods I showed them.
However, because C.O.W.s require more than one solution, students then
had to use a more creative method to verify that their original answer
is correct. On one C.O.W., I remember a student proving that her equations
were correct by graphing them and by writing an analysis of the ways the
equations make sense with the problem's parameters. (September 2000)
Jessica's students could not simply rely on
doing what the teacher told them to do. These seventh and eighth graders
had to think of their own solutions. Not only do exhibitions ask students
to grapple in these ways, but they also ask students to research, analyze,
draw conclusions, make interpretations, design, create, and generally figure
something out. That is putting information and thinking to use.
When we send our students off into this kind
of work, it is so important that they know inside what they are being asked
to do and show. There are many, many sets of standards out there and rubrics
to reflect them. While they might all be valid, there is a stark difference
between standards that exist on paper and standards that live in students'
minds. When they have internalized the criteria, when they know what it
looks and feels like to practice the kinds of skills and habits that exhibitions
ask for, then students have a legitimate shot at developing those skills and
habits. This internalization is more likely to happen when the criteria
are simple and clear and when they capture core elements of the intellectual
work at hand. If students see exemplars, if they use the criteria often,
if they see them in their everyday work, and if they are assessed on those
very criteria, then the criteria matter and they begin to mean something.
This work is not fluffy. The criteria are harder than what's required
by standardized tests. The standards are higher. And at the same
time, real world criteria applied through real world tasks are more compelling
to students and therefore more possible for all to meet.
Exhibitions need to be open enough to allow
students to find their way to something that matters, to encourage them
to figure something out, and to help them demonstrate the important skills
that really matter. They also need to be structured enough to accomplish
the very same things. Implementing successful exhibitions requires some
or all of the following: student curiosity, time, choice, resources, accountability,
good coaching, guidelines, suggested steps, feedback, and flexibility. In
this balance of openness and structure, students seem to thrive.
Accountability, Reliability, and Validity
While teachers and students might feel good about these kinds of assessments,
do such alternatives stand up to the cold, hard measures of accountability,
reliability, and validity? Does Parker's system hold up under this kind
of analysis?
George F. Madaus and Laura M. O'Dwyer, researchers at Boston College, describe
why many people say large scale performance-based assessments don't measure
up:
From a purely technical/efficiency point of view, we have evidence that
performance assessments in high-stakes programs that generally involve
large numbers of pupils 1) are less efficient, more difficult to administer,
more disruptive to school organization, and more time-consuming than multiple-choice
testing programs; 2) are not as easily standardized in terms of conditions
of support for teachers within a school administering them and in terms
of the actual administration itself, leading to a lack of comparability
of results; 3) are as vulnerable to manipulation as are multiple-choice
tests in high-stakes situations; 4) are sometimes insufficiently reliable
for confident use; 5) sample a considerably smaller portion of pupil performance,
raising questions about the generalizability of results to the larger domain
of interest; and 6) are considerably more costly than traditional, commercially
available multiple-choice tests. (1999)
What we do at Parker is still small scale when compared to statewide or
nationwide systems, but these criticisms still matter. Do our practices
hold our students accountable? Are the results reliable and valid?
Parker's practices are definitely valid and mostly reliable if reliability
means that an assessment is free of measurement errors and if validity means
that it is appropriately matched to what it was intended to assess. Dietel,
Herman, and Knuth of the North Central Regional Educational Laboratory define
reliability as "the extent to which a test is dependable, stable, and
consistent when administered to the same individuals on different occasions"
(1991). While our students might not get the exact same assignment multiple
times, they are called upon to use the same skills in many different situations.
Performance assessment seems reliable, therefore, if a student consistently
demonstrates that they can read and write and think. That's why our portfolios
require multiple pieces of work to show the same skill and that's why we
emphasize skills as habits. Once they are able to demonstrate those habits,
it seems safe to say that they can demonstrate them again and again, even
if they don't always choose to. Parker's assessment system, then, is reliable
because the standards and the criteria for excellence are "dependable,
stable, and consistent."
The same authors define validity as "the extent to which a test measures
what it was intended to measure. Validity indicates the degree of accuracy
of either predictions or inferences based upon a test score." (1991)
Here performance assessment seems to be considerably more valid than many
multiple choice tests because it asks for exactly the skills that a student
needs to demonstrate. The tasks call for real world skills, and students need
those skills to complete them. As Grant Wiggins argues, "test validity
should depend in part upon whether the test simulates real-world 'tests'
of ability. Validity on most multiple-choice tests is determined merely
by matching items to the curriculum content (or through sophisticated correlations
with other test results)" (1990). And so by using exhibitions and portfolios
that lay out exactly the task at hand, the validity is built in.
If accountability means setting standards for student performance and holding
students to them, then our students are accountable. We tell them exactly what
"the test" is and what we hope they can show on it. That is exactly
as it should be. What kind of accountability is it when the test is secret?
Again, Wiggins explains that such secrecy is completely misplaced:
The best tests always teach students and teachers alike the kind of work
that most matters; they are enabling and forward-looking, not just reflective
of prior teaching. In many colleges and all professional settings the essential
challenges are known in advance--the upcoming report, recital, Board presentation,
legal case, book to write, etc. Traditional tests, by requiring complete
secrecy for their validity, make it difficult for teachers and students
to rehearse and gain the confidence that comes from knowing their performance
obligations. (A known challenge also makes it possible to hold all students
to higher standards). (1990)
There are no tricks here. There's no secrecy. Students know what is being
asked of them. We simply have to look to see whether they can do it or not.
At Parker, we have tried to strengthen our accountability by having outside
assessors evaluate our portfolios and exhibitions. We regularly ask experienced
educators to visit and examine student work. We have also conducted inter-rater
agreement studies to make sure we are setting performance standards in consistent
ways. And our school gets inspected by the State and by accrediting committees
to check whether our practices follow our principles. We now have several
years of reports that essentially confirm we are doing what we say we're
doing.
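For readers unfamiliar with inter-rater agreement studies, the sketch below shows two common ways such agreement can be summarized when two teachers score the same pieces on the four rubric levels: simple percent agreement and Cohen's kappa, which discounts the agreement expected by chance. This is an assumption for illustration; the studies mentioned above are not described in enough detail here to say which statistic was used, and the sample ratings are invented.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of pieces on which two raters assigned the same rubric level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of pieces.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same level at random,
    # given how often each rater used each level.
    levels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[level] * counts_b[level] for level in levels) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical level for every piece
    return (observed - expected) / (1 - expected)


# Invented ratings for six pieces of student work (illustration only).
teacher_1 = ["meeting", "meeting", "approaching", "exceeding", "meeting", "approaching"]
teacher_2 = ["meeting", "approaching", "approaching", "exceeding", "meeting", "meeting"]

print(round(percent_agreement(teacher_1, teacher_2), 2))
print(round(cohens_kappa(teacher_1, teacher_2), 2))
```

In this invented example the two teachers agree on four of six pieces, but kappa comes out noticeably lower than the raw percentage because some of that agreement would be expected by chance alone.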
Nonetheless, we have experienced the kind of difficulties described by Joan
Herman in her review of where performance-based assessment is today:
Research suggests that in the early years of a new assessment, achieving
reliable scoring can be a challenge because it occurs while the processes
of achieving consensus on standards and expectations and of encouraging
ownership and support for the new assessment are also incomplete. The technical
process of reaching high levels of agreement, however, is fairly well understood.
It requires good training and scoring procedures, including well-documented
scoring rubrics exemplified by benchmark or anchor papers; ample opportunities
for scorers to discuss and practice applying the rubric to student responses;
and systematic checks before and during the scoring itself to ensure that
evaluators are consistent, with retraining as necessary. (1998)
We know that what she says is correct. And we do work hard to get to the
kind of consensus on standards she describes using the practices she lays
out. Still, our efforts always could be more frequent and more precise.
Every institution needs to reflect on its practices and Parker is ripe for
a review of our assessment system. We've been running for five years now,
and it is probably time to evaluate what's been working and what hasn't.
The following set of criteria questions was developed by the National Center
for Research on Evaluation, Standards and Student Testing at UCLA to help
educators evaluate performance assessments:
Consequences. To what extent do the assessments model and encourage good
teaching practice? Are intended positive consequences achieved? What are
the unintended negative consequences?
Alignment. Does the assessment reflect content and performance standards
that have been established for students? Does the assessment measure important
curriculum goals?
Fairness. Does the assessment enable students, regardless of race, ethnicity,
gender or economic status, to show what they know and can do? Have students
had the opportunity to learn what's being assessed?
Transfer and Generalizability. Will the results of an assessment provide
accurate generalization about student achievement?
Content Quality. Is the assessment content consistent with the best current
understanding of the subject matter? Does it reflect the enduring themes
and/or priority principles, concepts and topics of the discipline?
Cognitive Complexity. Does the assessment require students to use complex
thinking and problem solving?
Content Coverage. To what extent does the assessment cover the key elements
of content standards and/or curriculum?
Linguistic Appropriateness. Does the assessment allow students to display
what they know and are able to do without being swamped by language demands
not required by the content?
Meaningfulness. Do students find the assessment tasks realistic and worthwhile?
Practicality and Cost. Is the information about students worth the cost
and time to obtain it? (What Constitutes a Quality Assessment, 1998)
So how would Parker come out if these were the criteria? We are probably
strongest in the areas of consequences, cognitive complexity, and meaningfulness.
Again and again, we ask our students to think in meaningful ways. Almost
all of our teaching goes to that aim. If we have weaknesses, they are in
the area of content coverage. We simply cover less ground than many want
us to. But the ground we cover, our students and we cover with intellectual
rigor and depth.
So Does it Work?
Performance-based assessment, portfolios, and exhibitions definitely work
at the classroom and school level. How can we prove it? Our test scores
are good -- so is that enough? I think we prove that performance-based assessment
works by the engagement and learning of our students at Parker on an almost
daily basis. That can be seen through students' own testimony and it has
been confirmed by external evaluations.
If one believes that standardized tests can be effective measures of what students know
and can do, Parker's test scores do show that our students can read and
write and compute and think. In a renewal inspection report completed in
February, 1999, the inspection team reported this success: "Student
achievement on external academic assessments -- Stanford 9, MCAS, and the
PSAT has been consistently strong. Standardized test scores at Francis Parker
are as strong as, and in some cases somewhat stronger than, those of the
many districts who send students to the school" (p. 3, Renewal Inspection
Report). Interestingly, Parker's annual reports, an accountability requirement
for charter schools, all point out that the school's "emphases do not
align with the subject and style of standardized tests." So Parker
students have achieved their scores without the school's specific attention
to tests.
More important to us, outside evaluators have looked beyond the numbers
to the quality of student work produced and to the quality of the learning
experiences Parker has achieved. These inspectors, visiting committee members,
and portfolio reviewers came to the school, looked at student work, talked
to as many people as they could, and described what they saw and heard.
While they have reported thoughtful and constructive concerns, their reviews
have been overwhelmingly positive in the first five years of the school,
especially in the area of assessment.
In the charter renewal inspection report, required by the State and conducted
by Schoolworks, an independent consulting firm, inspectors found that our
assessment system clearly and effectively helped students rise to high standards.
Here are some excerpts:
Ongoing classroom assessments and formal Gateway
assessments indicate that most students are developing sound "habits
of mind" and mastering "essential skills and areas of knowledge"
as defined by the Parker School Criteria for Excellence . . .. Assessment
using these criteria is frequent within a unit and both comprehensive and
rigorous at a Gateway. (p. 3)
The Criteria for Excellence are clearly and
coherently organized and stated, so much so that students themselves articulate
academic expectations, and often discuss and adapt them to particular assignments,
freely and frequently. (p. 10)
Parker measures academic progress through its
elaborate system of rubrics and benchmarks toward criteria in each division.
Every assignment is disseminated using these standards and related achievement
criteria and subsequently returned to students with clear feedback against
the standards, in both sequential and narrative terms. The sheer volume
and detail of assessment provides students with an almost momentary sense
of the progress being made through the curriculum toward the ultimate Coalition
goal of "the student-as-worker," principle number five, which
ultimately "provokes students to learn how to learn and thus teach
themselves." (p. 13)
Every assessment (in Arts and Humanities) takes
full advantage of the student's individual interests and creative potential.
Instructions for successful completion of the project and descriptions
of model work are clear and copious, suggesting timetables for completion
but allowing individual students and student groups to organize study and
investigation in the most individually appropriate ways. Of particular
note is the interdisciplinary thinking and research required of students
within the domain. Content and skills in literature, art, and history merge
seamlessly throughout the unit and in the culminating activities.
(p. 19)
Such comments match some of our highest goals
for assessment, and this independent group of external inspectors said we
were achieving them.
Noted educators have also described what we consider to be accomplishments.
Visiting Committee member Joseph McDonald, Professor of Education at NYU
and an authority on assessment, wrote after his second visit in 1998:
The student work I read and observed was very
good. In particular, the exhibitions were impressive -- signs of the strong
intellectual culture that has been established in the school. But the portfolios
of the students who had succeeded in passing from Division 1 to Division
2 were also impressive: good writing, good thinking, intellectually substantive
content. (p. 100, 1997-98 Parker Annual Report)
Vito Perrone, Professor of Education at the
Harvard Graduate School of Education, wrote the following in June 1998:
I was particularly pleased to see the various
"standards" posted in all of the rooms along with displays of
student work. The latter were displayed well, good demonstrations of what
I call 'challenging work,' essentially responses to powerful questions,
having within them room for invention, carried out in diverse ways, ending
in products that show care (something honored) and from which others can
learn. (p. 102, 1997-98 Parker Annual Report)
Thus, outside educators, experts, and state
investigators report that Parker students have been doing quality work at
high levels of achievement. What they've observed has been the result of
an assessment system that asks students to use their minds well and to demonstrate
their learning through exhibitions.
Even more important, our students know that they are learning in powerful
ways. When an eighth-grade student named Chris gatewayed from Division 1
AH to Division 2 AH, he described learning that never would have happened
if we didn't use performance-based assessment:
By far, oral presentation was my most improved
skill category. I have always been shy, so naturally I was a terrible speaker.
My first oral presentation at Parker was nothing like I had ever done before.
The speeches were more natural, without strict guidelines like you have
to use note cards, your cards should have the exact words that you are
going to say, and the speech must be at least 5 minutes and 30 seconds.
I like the "take your time" approach of my teacher. I was allowed
to start over when I screwed up and not to get stressed out, which helped
me to improve. Over time, I became more comfortable speaking and eventually
became a good speaker. I went from so-so performances like the Apartheid
Rally last year, to excellent pieces like the Constitutional Convention
Simulation debate. (p. 123, 1997-98 Parker Annual Report)
Was it too soft to allow this 13-year-old to
start again? Does it matter whether he thinks he did an excellent job? I
would argue that this student reached an amazing level of self-awareness
for a middle school student and developed confidence in an incredibly important
skill.
It seems that performance assessment also helps students find real meaning
in their work. Alexis, a ninth-grade student at the time she gatewayed into
Division 3 MST, described why she worked so hard to prepare her portfolio:
Most importantly, the drive behind these ambitions
and efforts was not a grade on the assessment or a check mark on the rubrics
of excellence. The work was done in an effort to dedicate myself to something
and try to understand it. It was done in order to become a better student
through solid research and writing and learning to understand hardcore
scientific processes, like nuclear fusion in my universe project.
(p. 130, 1997-98 Parker Annual Report)
In a conversation in September 2000, two seniors, Jenny and
Colin, described how different it feels to work and
learn for one's own personal goals instead of for meaningless external rewards.
Colin, who went back to a more traditional high school for only a few weeks,
said "it felt like the other school put tests in my way just as obstacles
and not to help me learn. When I came back to Parker, I felt an overwhelming
rush that my work would be for my own accomplishment." Jenny knew that
she could have been doing very well at a traditional high school that gave
tests and grades: "I could easily memorize whatever I need for a test.
But then I'd forget it. Exhibitions actually measure something more important,
because I really want to know how to do the things I'll need in life."
Colin added, "We develop more the skill of how to learn than just learning
information. Put me on Jeopardy and I'd get schooled. But put me in real
life where things are changing and where I need to be able to deal with
those changes and I'll never give that up." I respect these students
more than I can say. Their words are a part of our evidence.
Tenth-grader Annabel drove home the message we hear in many of these student
testimonials in a letter she wrote to the editor of The Boston Globe:
As opposed to focusing on dry, meaningless
tests and rote-learning, my school focuses on learning through experience,
compiling portfolios of meaningful work, and many major interdisciplinary
projects. Instead of perfunctorily covering lots of topics without any
real understanding, we go in-depth with a few important topics, so that
we have a real understanding of issues that are relevant to the society
in which we live. The test doesn't take any of this into account.
Are these students well-coached? Of course. But how they feel about their
own learning gives powerful testimony for what a performance-based assessment
system can achieve, especially when their reflection is backed up by test
scores and the reports of independent outside experts.
Can it work for all?
Parker's experience shows that performance-based assessment works for us in
our classrooms and in our school as a whole. Can performance-based assessment
or portfolios work on a larger scale? Could they be a better MCAS? There have
been a few attempts at large-scale performance assessment systems. While no research
has declared them an unmitigated success, they provide several lessons.
One attempt to implement large-scale performance-based assessment and portfolios
was the New Standards Project directed by Lauren Resnick and Marc Tucker
in the early 1990s. This project was an effort explicitly aimed at "creating
tests worth taking." Elizabeth Spalding, who helped work on this project,
describes:
NSP sponsored hundreds of meetings involving
thousands of educators in the crusade to build assessments toward which
teachers would want to teach. Thousands of students in grades 4, 8, and
10 completed New Standards performance assessment tasks and compiled portfolios
to demonstrate their ability to meet performance standards that were being
developed simultaneously. (2000)
The results of this project were mixed. According to Edward Haertel, these
assessments had all the reliability, generalizability, and non-standardized
scoring problems found in other large-scale performance assessments. And
yet according to Spalding, they effectively reflected constructivist views
of knowledge, thinking, and learning, they positively influenced curriculum
and instruction, and they provided more meaningful information about student
performance.
Vermont and Kentucky implemented statewide
portfolio systems in the 1990s, another significant attempt at implementing
performance-based assessment on a large scale. While research results from
both states have been mixed, they have shown that -- again -- such a system
significantly improved teaching and learning in classrooms. According to
an article in Improving America's Schools: A Newsletter on Issues in School
Reform, "preliminary observations of classroom instruction in Kentucky
and Vermont, two states with portfolio assessment, indicate that teachers
spend more time training students to think critically and solve complex
problems than they did previously." The authors go on to report:
After studying Vermont's portfolio assessment
program during the first two years of its implementation the RAND Corporation
concluded that the effects of portfolio assessment on instruction were
"substantial and positive." Half the teachers surveyed by RAND
reported an increase in the time students spent working in pairs or small
groups. Almost three-fourths of the principals interviewed said the program
produced positive changes in instruction at their schools. Between 70 percent
and 89 percent of the math teachers reported more discussion of math, explanation
of solutions, and writing about math in their classrooms since the advent
of the portfolios; three-fourths reported having students spend more time
applying math knowledge to new situations, and roughly 70 percent reported
devoting more class time to writing math reports. Principals in half the
sample schools reported expanding portfolio assessments to other grade
levels, an indication that they approved of portfolios' effect on instructional
practice in their schools. (1996)
These are encouraging results in many respects. The authors note, however,
that other states have not had success with such large-scale attempts and
several have abandoned the effort.
The issue is not simply whether one form of
assessment works and another does not. What is becoming clear is that different
assessments show different information and they should be used accordingly.
At this point, large-scale performance assessment has not overcome several
serious shortcomings, though that does not mean the efforts to implement
it successfully should be abandoned. William L. Sanders and Sandra P. Horn
of the University of Tennessee write:
Non-standardized alternative assessment has yet to demonstrate the ability
to provide generalizable information for comparison purposes over time
on a large-scale basis without proving more costly in time and resources
than standardized testing and without itself falling prey to the "teaching
to the test" syndrome so often cited as a major deleterious result
of standardized testing. The strengths of alternative assessment lie in
its ability to individualize assessment, to mimic good teaching practices,
and to involve teachers more deeply in the assessment process. Currently,
attempts are being made to improve the generalizability and reliability
of alternative assessments in order to use them for the evaluation of school
and school system efficacy.
They go on to say that:
The issue is not whether one form of assessment
is intrinsically better than another. No assessment model is suited for
every purpose. The real issue is choosing appropriately among indicator
variables and applying the most suitable model to render them. It is necessary
to determine what information is sufficient to each purpose before deciding
upon the form of assessment to be used. When a variety of valid and reliable
assessment methods exist, it is parochial and ineffectual to adhere to
only one, asserting that it is in all instances superior. (1995)
One size does not fit all. We need such a variety of assessments in our
schools. Parker's assessment system can coexist with MCAS, but only if MCAS
does not become so dominating and so prescriptive as to crush all other
means of seeing what our students know and can do.
Conclusions
Performance-based assessment has been successful
at Parker for several reasons. The first is the very thoughtful definition of
criteria for excellence. When the criteria are clear, teachers and students
and parents all know what we're working towards. The second is that we have a shared
vision for our school and for the way that teaching and learning should
happen. Thus there is mostly agreement on how assessment should work.
Parker's assessment system does hold students
accountable for their learning. There is coherence between the skills they
have to demonstrate and the specific performance criteria we have set for
these skills. The work that reveals what they know and can do goes into
a portfolio, and the portfolio qualifies a student for promotion and eventually
graduation. Such coherence is then bolstered and brought alive by exhibitions
where we really see what our students can do. Because of this experience,
we conclude that performance assessment leads to more thoughtful, more compelling,
and ultimately more skilled student work.
The larger system must not squash this kind
of work. Indeed, the 1993 Reform Act requires us in the State of Massachusetts
to use assessment practices that are more than MCAS. All of us know that
there are many challenges ahead before performance-based assessment becomes
a widespread practice. This is work that we have to do -- first school by
school, but inevitably for the wider system as well.
Bibliography
Annual Report of the Francis W. Parker Charter Essential School.
Devens, MA. 1996-97, 1997-98, 1998-99.
Dietel, R.J., Herman, J.L. and Knuth, R.A.
"What Does Research Say About Assessment?" North Central Regional
Educational Laboratory: Oak Brook, 1991. http://www.ncrel.org/sdrs/areas/stw_esys/4assess.htm
Herman, Joan L. "The State of Performance
Assessments." School Administrator, December, 1998. http://www.aasa.org/SA/dec9804.htm
___________"What Constitutes a Quality
Assessment?" School Administrator, December 1998. http://www.aasa.org/SA/dec9805.htm
Horn, Sandra P. and Sanders, William L. "Educational Assessment Reassessed."
Educational Policy Analysis, Number 3 Volume 6. March 3, 1995.
Madaus, George F. and O'Dwyer, Laura M. "A Short History of Performance
Assessment." Phi Delta Kappan; May 01, 1999.
SchoolWorks, "Renewal Inspection Report." Massachusetts Department
of Education, February, 1999.
Spalding, Elizabeth. "Performance Assessment and the New Standards
Project: A Story of Serendipitous Success." Phi Delta Kappan:
V. 81 N. 10, pp. 758-764. June 2000.
"What Constitutes a Quality Assessment?" The School Administrator
Web Edition (December 1998). Retrieved August 25, 2000 from American
Association of School Administrators web site: http://www.aasa.org/publications/sa/1998_12/herman_side_resource.html
"What the Research Says About Student Assessment." Improving America's Schools: A Newsletter on Issues in
School Reform - Spring 1996. http://www.ed.gov/pubs/IASA/newsletters/assess/pt4.html
Wiggins, Grant. "The Case for Authentic
Assessment." Practical Assessment,
Research & Evaluation, 2(2). 1990.
__________ Educative Assessment: Designing
Assessments to Inform and Improve Student Performance. Jossey-Bass Publishers:
San Francisco, 1998.
About the Author
Peter Garbus has been teaching high school History, English, and Humanities
for eleven years. A graduate of Brown University and the Harvard Graduate
School of Education, he received National Board Certification for History/Social
Science in 1999.