Fellowship Paper


 

 

More Than MCAS:

Performance-Based Assessment in the Days of Standardized Tests


by

Peter Garbus

Francis W. Parker Charter Essential School


for the

Massachusetts Charter School Fellowship Program

2000



It might feel as if MCAS is the first and last word on assessment in the state of Massachusetts. However, there are more voices out here, and we say that our schools need performance-based, authentic forms of assessment, too.


At the Francis W. Parker Charter Essential School, we assess our students using portfolios, exhibitions, performance-based assessments, and sometimes tests. The state granted our school a charter to create a school where students learn to use their minds well and where they demonstrate such learning through public exhibitions. We believe that students will only learn to use their minds well if they are asked to do so, and for us that requires performance-based assessment. Ironically, our students have done very well on the English/Language Arts and Mathematics sections of the MCAS for the past two years without much emphasis from the school. For us to say that we need more than MCAS is not sour grapes. We feel strongly that MCAS might very well do damage to the good work that is going on here.


Yet the State and others have some important questions for us. How reliable and valid are these exhibitions as a form of assessment? How can we show that performance-based assessment works in ways that serve the students in our schools and also serve the State's need for some kind of accountability?


At Parker, we know that performance-based assessment works for our students and our school. It just might work for your school. It just might work for the State.


Performance-based Assessment: What it is and Why do it


Let me begin with some definitions. R.J. Dietel, J.L. Herman, and R.A. Knuth of the North Central Regional Educational Laboratory define performance assessment as:


an evaluation in which students are asked to engage in a complex task, often involving the creation of a product. Student performance is rated based on the process the student engages in and/or based on the product of his/her task. Many performance assessments emulate actual workplace activities or real-life skill applications that require higher order processing skills. (Dietel, Herman, Knuth, 1991)


Joan Herman, the associate director of the National Center for Research on Evaluation, Standards and Student Testing at UCLA Graduate School of Education, defines it this way:


The essence of performance assessments--whether in the form of open-ended questions, essays, experiments or portfolios--is that they ask students to create something of meaning. A good performance assessment taps complex thinking and/or problem-solving, addresses important disciplinary content, invokes authentic or real-world applications and uses tasks that are instructionally meaningful. Furthermore, because they require students to construct a unique answer, performance assessments typically are scored by humans, exercising judgment, rather than by machines. (Herman, 1998)


With performance-based assessment, students engage, create, emulate real world tasks, solve problems, construct unique answers, and they think.


This kind of assessment is not new. Providing us some historical context, George Madaus and Laura O'Dwyer of Boston College's Department of Education went all the way back to the Han Dynasty in China to identify essentially four ways to assess what a person knows. They describe those four ways to assess as follows:


First, you can ask the person to supply an oral or written answer to a series of questions (e.g., essay questions, short-answer questions, oral disputations). Second, you can ask a person to produce a product (e.g., a portfolio of artwork, a research paper, a chair, a piece of cut glass). Third, you can require the person to perform an act to be evaluated against certain criteria (e.g., conduct a chemistry experiment, read aloud from a book, repair a carburetor, drive a car). Finally, and historically the most recent, you can have an examinee select an answer to a question or a posed problem from among several options (i.e., the multiple-choice or true/false item). We define performance assessment broadly: performance assessment requires examinees to construct/supply answers, perform, or produce something for evaluation. Thus performance assessment embraces the first three ways of assessing and excludes the fourth. (Madaus and O'Dwyer, 1999)


In those first three methods, a student would have to create or construct or do something in order to show their ability to create, construct, and do. That's exactly what the craft guilds needed to produce master craftsmen who could skillfully make the things they were supposed to make. The fourth practice, a method of accountability emphasizing efficiency, uniformity, scientific reliability, and norm-referenced comparability, came about with the rise of industrialization and has dominated education ever since. Human beings as segments of the assembly line. So what does our world need today, cogs in the machine or master craftsmen?

Our world is changing and we need now more than ever to help our students develop essential skills and habits of mind. As technology speeds and widens our access to information and as the sheer amount of information grows exponentially, we will need fewer memorizers and more thinkers. We will need creative, resourceful problem-solvers who can identify an issue, research it effectively, and sift through the piles of information for insight and ideas. It matters less whether educated people know thousands of bits of discrete information and matters more whether they can analyze and synthesize the information they find. While it may be more challenging and more expensive to design large-scale assessment systems around performance, is there really any other choice when standardized tests don't come close to measuring whether students can think?

We also need to match our practices to what recent science tells us about how people learn. Dietel, Herman, and Knuth summarize these findings:


From today's cognitive perspective, meaningful learning is reflective, constructive, and self-regulated. People are seen not as mere recorders of factual information but as creators of their own unique knowledge structures. To know something is not just to have received information but to have interpreted it and related it to other knowledge one already has. In addition, we now recognize the importance of knowing not just how to perform, but also when to perform and how to adapt that performance to new situations. Thus, the presence or absence of discrete bits of information--which is typically the focus of traditional multiple-choice tests--is not of primary importance in the assessment of meaningful learning. Rather, what is important is how and whether students organize, structure, and use that information in context to solve complex problems. (Dietel, Herman, Knuth, 1991)


Information in use, students constructing their own knowledge -- these are the hallmarks of meaningful learning.


Not only does this learning theory suggest new goals for assessment, but it also tells us that it is extremely difficult for students even to learn "the basics" because of state requirements that emphasize discrete facts over performed skills. The same authors explain:


Learning isolated facts and skills is more difficult without meaningful ways to organize the information and make it easy to remember. Also, applying those skills later to solve real-world problems becomes a separate and more difficult task. Because some students have had such trouble mastering decontextualized "basics," they are rarely given the opportunity to use and develop higher-order thinking skills. (Dietel, Herman, and Knuth, 1991)


Indeed, students are more likely to learn the basics when their learning is situated in a real world context and when the assessment is performance-based. In light of this current research, we have to change the way we think about learning; we have to change the way we teach children, and we have to change the way we assess what our students know and can do.


Others think so too. The following organizations have argued against high-stakes standardized testing and called for systems of assessment that include performance-based assessment: the American Association of School Administrators, the American Educational Research Association, the American Federation of Teachers, the Association for Childhood Education International, the International Reading Association, the National Association for the Education of Young Children, the National Council of La Raza, the National Association of Secondary School Principals, the National Council of Teachers of English, the National Education Association, and the National Research Council, among others. More and more educators are convinced that performance-based assessment does something that standardized testing does not do -- they are all saying that we need to do it.

What it might look like: Parker's Assessment System


Our school starts with grade 7, after which students move through three non-graded Divisions until they graduate. To move from one division to the next and ultimately to graduate, students must complete a portfolio and an exhibition at each stage. We call the process of moving on to the next level a Gateway. The curriculum is organized into Math/Science/Technology (MST), Arts/Humanities (AH), Spanish, and Wellness (P.E. and Health). Students gateway in MST, AH, and Spanish separately.


Moving from Division 1 to Division 2, students work mostly on gathering a portfolio of their best work and then presenting the portfolio in a celebratory exhibition of what they've learned with their family, friends, and teachers. The Spanish Gateway does involve some performance; students have to describe their learning in Spanish. Moving from Division 2 to Division 3, the stakes of the exhibition rise. In both AH and MST, students have to design and complete an independent project of original work, in addition to presenting a completed portfolio. In the gateway exhibition, students present their project work, demonstrating their readiness to move on to the next level. The final Gateway is to graduate, and this step involves a juried presentation of work completed on a Senior Project. A successful project along with a completed Graduation Portfolio again means that the student is ready to move on. This process, which begins by looking mostly at school skills, ends by looking at authentic, independent, original work.

For each level, we assess student work that might go into their portfolio based on a set of standards defined by specific criteria for excellence. The standards include Reading, Writing, Oral Presentation, Research, Artistic Expression, Listening, Scientific Investigation, Mathematical Problem Solving and Communication, Technology, Systems Thinking, and Spanish. These are the skills that we want all students to be good at when they leave our school. And we look at each piece of work they do in light of the criteria the school community has described, indicating on rubrics whether students are just beginning, approaching, meeting, or exceeding the expectations for that level. The portfolios are collections of a student's work that meets the standards and which thereby shows readiness for the next level.


Most of our assessment within courses is performance-based. With such an emphasis on skills, we have to challenge, inspire, and test our students' skills by asking them to use them. Those projects and assignments generate the student work that we assess. Students are able to revise their work almost as much as they want to improve it and to try to meet the standards. Most students are in each Division for about 2 years. They enter just beginning to use the skills and they leave able to use the skills. Because this is a system of promotion by performance, they don't move on until they've done the work and they're ready. As our students and we have found, that is no small task.


With our emphasis on performance and skill, we still ask our students to learn content. However, we define essential content differently than many. We believe strongly that "less is more"; students learn more and they learn better when they explore limited topics in depth. So we do not have long lists of topics that our students have to learn. Yet nearly every skill piece requires students to know what they are talking about; they must show understanding of the subject at hand. Indeed, they cannot show thinking skills, for example, without knowing content, even though the content is less prescribed. At present, the state frameworks for Math and Language Arts have followed a similar approach of defining the skills while leaving the content fairly open. However, the Science and the History/Social Studies frameworks require specific and extensive topics to be covered. If those frameworks remain unchanged, we might have to make significant changes to our approach to teaching and learning.

Inside Portfolios


There are many portfolio systems out there, more today than 10 or 15 years ago. While portfolios are not necessarily performance-based, ours have to be in order for us to use them to see what a student can do. For us, a student's portfolio is evidence that they have the skills we want to see for that particular level.


What goes into a student's portfolio? While specific requirements differ for each domain at each division, the basics are the same. We ask to see evidence of consistent competence. That means a student has to include a certain number of pieces of work that met the criteria for specific skills. For example, the portfolio requirements for moving from Division 2 to Division 3 in AH include 4 pieces of writing, 2 pieces of research, 3 pieces of artistic expression, 3 pieces of reading, and so on. The numbers reflect the number of opportunities students have had to work on the particular skill balanced with an effort to see consistency. Students do choose the pieces they put in, but they are selecting only from their work that met the criteria. In addition to the work, students write a cover letter for each portfolio, reflecting on the work represented there.
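For readers who track portfolios electronically, the counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of Parker's actual record-keeping: the function name `portfolio_gaps`, the data layout, and the sample student record are all hypothetical, and the requirement list contains only the four counts named in the text (the "and so on" is left unfilled). Treating "meeting" and "exceeding" as the rubric levels that make a piece eligible is also an assumption.

```python
from collections import Counter

# Counts named in the text for the Division 2 to Division 3 AH portfolio.
# The text says "and so on", so this list is deliberately incomplete.
AH_DIV2_REQUIREMENTS = {
    "writing": 4,
    "research": 2,
    "artistic expression": 3,
    "reading": 3,
}

# Assumption of this sketch: only work rated "meeting" or "exceeding"
# counts as having met the criteria.
ELIGIBLE_LEVELS = {"meeting", "exceeding"}


def portfolio_gaps(pieces, requirements):
    """Return {skill: pieces still needed} for every unmet requirement.

    `pieces` is a list of (skill, rubric_level) tuples, one per piece of work.
    """
    met = Counter(skill for skill, level in pieces if level in ELIGIBLE_LEVELS)
    return {
        skill: needed - met[skill]
        for skill, needed in requirements.items()
        if met[skill] < needed
    }


# Hypothetical student record, invented for illustration only.
sample_pieces = [
    ("writing", "meeting"),
    ("writing", "exceeding"),
    ("writing", "approaching"),  # does not count: criteria not yet met
    ("writing", "meeting"),
    ("research", "meeting"),
    ("research", "meeting"),
    ("artistic expression", "meeting"),
    ("reading", "meeting"),
    ("reading", "meeting"),
    ("reading", "exceeding"),
]

print(portfolio_gaps(sample_pieces, AH_DIV2_REQUIREMENTS))
```

For this invented record, the result shows one more piece of writing and two more pieces of artistic expression still needed. A count like this only flags gaps; it does not replace the teachers' judgment about the work itself.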


This is a process that requires reflection and that encourages self-awareness. It also works against the "one shot and you're done" orientation of traditional tests. Matt Smith, a sixth-year teacher in Division 1, explains:


Helping students see their growth over time is important. A portfolio provides students with the opportunity to look back at previous work, either to improve it or to compare it with recent work. They must be reflective when they decide what to include in the portfolio or what to revise to raise a piece to portfolio standards. (September 2000)


Our students care about the work that goes into their portfolios. Students must do high quality work for it to be portfolio "worthy," and they have to work hard to get there.


The practical aspects of this system require physical space to keep the portfolios. We have file cabinets full of brown student folders; we keep all portfolios until the student graduates or leaves the school. Students need time to organize and review their portfolios. Every quarter or so, most classes pause to work on them, either adding new pieces or counting up the gaps that need to be worked on. And teachers need time around Gateways to review portfolios with students and to meet with colleagues in order to convey an official stamp of approval.


There is always a point in this process where we do a gut check with a student's portfolio. We try to think beyond the specific requirements of the portfolio to whether a student's skills really are ready for the next level. We check our intuition, our experience, and what we know about the student. After all, we know our students very well. Few teachers have assessment loads above 60 students, and some teachers' loads are as low as 25. When we say we know them and their work, we really do. So when it comes time to evaluate portfolios, the group of teachers gathered checks the numbers against the qualities we know we're looking for. Almost every time when a student's portfolio is complete, our gut check confirms that student's readiness. This should make sense. After all, we or some education expert or some committee of adults writes the standards. We define and describe what the skills are and what levels of performance are fair to expect at what points. For us, we simply check what we know about our students with the work that they have produced. If our assessments work well, what they produce matches the skills and knowledge they've acquired.


Inside Exhibitions


Parker's exhibitions include the big Gateway Exhibitions when a student completes a Division and the performance-based assessments that happen in classrooms. Whether big or small, our exhibitions all involve some kind of real or authentic work that students must create; they have an audience beyond the teacher; they cast the teacher in the role of coach; and they involve a balance of freedom and structure. Some have every student working on the same project. Some have students working completely independently. In every case, students have to think hard and create quality work.


Here are some examples. From Division 2 AH, students staged a trial of Emma Goldman. Having done research on the Progressive Era, on the Women's Suffrage Movement, and on Industrialization, students created characters and gave performances that tested their knowledge and their ability to artistically express themselves through drama. When studying Chinese philosophies, students in small teams designed their own schools around the ideas of Taoism, Buddhism, and Legalism, each with its own curriculum, discipline policies, and physical environment. From Division 1 AH, students created a museum of exhibits on early civilizations from around the world. From a Division 3 economics course, students studied a current economic issue in order to understand the economics of it and to propose a sound and viable solution to it. From Division 1 MST, students work on "Challenges of the Week" (C.O.W.s) to apply their learning to new problems. From Gateway Projects, a student did a family history project and created a montage of images in five frames. Another student researched and created a resource book on vegetarianism. She presented her findings along with a three-course meal. Each of these exhibitions worked well for the students. Just as important, they clearly showed teachers and the community students' skills and knowledge.

There are several keys to making these exhibitions work. If students don't find real meaning in the work they do, we are not going to see interesting, thoughtful work from them. Many use the word "authentic" to describe tasks and problems and dilemmas that genuinely spark students in exhibitions. While some of our students might be sparked by theoretical intellectual work, probably more will be stimulated by being asked to grapple with phenomena that they care about, situations dealing with the real world around them, things they can see and feel and get involved with. Often, they will need choice about what their work is.

Related to that, students care more when they have real audiences and when their work might have a real impact on their community, however they define that. Our Senior Projects, which are presented in front of a jury, proved to our principal, Gregg Sinner, that there was something different happening in these exhibitions. "To me," he said, "a juried exhibition is far more significant than a demonstrative one. The risk taken by the student is comparable to a graduate school qualifying exam." Adding to that, a Division 1 AH teacher commented that "the Division 2 Gateway and the Senior Project Exhibitions prepare students to communicate with a wider audience, going beyond the walls of the classroom. This prepares them for life beyond Parker, as well as helps them connect their learning to the world outside the school." Thus we have seen that student work becomes powerful when there is some personal connection and commitment to the work.

Not only must we find ways for students to find their way into a real world problem, we need to ask them to use their brains and all their powers to do something with that real world problem. They need to solve a problem or answer a perplexing question or make a proposal or solve a mystery or resolve a dilemma or argue a point or uncover something that has not been seen before. For example, Jessica Jacobs, a first-year teacher in MST, described the work her students do on Challenges of the Week:

In mathematics, I think students best show that they know how to use their minds well when they solve a problem using their own creative methods, as opposed to a test in which students know exactly which method they need to use. On the Challenges of the Week in Division I, I gave students a great deal of guidance in solving the problems. Most students then solved the problem using the methods I showed them. However, because C.O.W.s require more than one solution, students then had to use a more creative method to verify that their original answer is correct. On one C.O.W., I remember a student proving that her equations were correct by graphing them and by writing an analysis of the ways the equations make sense with the problem's parameters. (September 2000)

Jessica's students could not simply rely on doing what the teacher told them to do. These seventh and eighth graders had to think of their own solutions. Not only do exhibitions ask students to grapple in these ways, but they also ask students to research, analyze, draw conclusions, make interpretations, design, create, and generally figure something out. That is putting information and thinking to use.

When we send our students off into this kind of work, it is so important that they know inside what they are being asked to do and show. There are many, many sets of standards out there and rubrics to reflect them. While they might all be valid, there is a stark difference between standards that exist on paper and standards that live in students' minds. When they have internalized the criteria, when they know what it looks and feels like to practice the kinds of skills and habits that exhibitions ask for, then students have a legitimate shot at developing those skills and habits. This internalization is more likely to happen when the criteria are simple, clear, and when they capture core elements of the intellectual work at hand. If students see exemplars, if they use the criteria often, if they see them in their everyday work, and if they are assessed on those very criteria, then the criteria matter and they begin to mean something. This work is not fluffy. The criteria are harder than what's required by standardized tests. The standards are higher. And at the same time, real world criteria applied through real world tasks are more compelling to students and therefore more possible for all to meet.

Exhibitions need to be open enough to allow students to find their way to something that matters, to encourage them to figure something out, and to help them demonstrate the important skills that really matter. They also need to be structured enough to accomplish the very same things. Implementing successful exhibitions requires some or all of the following: student curiosity, time, choice, resources, accountability, good coaching, guidelines, suggested steps, feedback, and flexibility. In this balance of openness and structure, students seem to thrive.


Accountability, Reliability, and Validity


While teachers and students might feel good about these kinds of assessments, do such alternatives stand up to the cold, hard measures of accountability, reliability, and validity? Does Parker's system hold up under this kind of analysis?


George F. Madaus and Laura M. O'Dwyer, researchers at Boston College, describe why many people say large-scale performance-based assessments don't measure up:


From a purely technical/efficiency point of view, we have evidence that performance assessments in high-stakes programs that generally involve large numbers of pupils 1) are less efficient, more difficult to administer, more disruptive to school organization, and more time-consuming than multiple-choice testing programs; 2) are not as easily standardized in terms of conditions of support for teachers within a school administering them and in terms of the actual administration itself, leading to a lack of comparability of results; 3) are as vulnerable to manipulation as are multiple-choice tests in high-stakes situations; 4) are sometimes insufficiently reliable for confident use; 5) sample a considerably smaller portion of pupil performance, raising questions about the generalizability of results to the larger domain of interest; and 6) are considerably more costly than traditional, commercially available multiple-choice tests. (1999)


What we do at Parker is still small scale when compared to statewide or nationwide systems, but these criticisms still matter. Do our practices hold our students accountable? Are the results reliable and valid?


Parker's practices are definitely valid and mostly reliable if reliability means that an assessment is free of measurement errors and if validity means that it is appropriately matched to what it was intended to assess. Dietel, Herman, and Knuth of the North Central Regional Educational Laboratory define reliability as "the extent to which a test is dependable, stable, and consistent when administered to the same individuals on different occasions" (1991). While our students might not get the exact same assignment multiple times, they are called upon to use the same skills in many different situations. Performance assessment seems reliable, therefore, if a student consistently demonstrates that they can read and write and think. That's why our portfolios require multiple pieces of work to show the same skill and that's why we emphasize skills as habits. Once they are able to demonstrate those habits, it seems safe to say that they can demonstrate them again and again, even if they don't always choose to. Parker's assessment system, then, is reliable because the standards and the criteria for excellence are "dependable, stable, and consistent."


The same authors define validity as "the extent to which a test measures what it was intended to measure. Validity indicates the degree of accuracy of either predictions or inferences based upon a test score" (1991). Here performance assessment seems to be considerably more valid than many multiple choice tests because it asks for exactly the skills that a student needs to demonstrate. Such assessments call for real world skills and they require real world skills to complete the task. As Grant Wiggins argues, "test validity should depend in part upon whether the test simulates real-world 'tests' of ability. Validity on most multiple-choice tests is determined merely by matching items to the curriculum content (or through sophisticated correlations with other test results)" (1990). And so by using exhibitions and portfolios that lay out exactly the task at hand, the validity is built in.


If accountability means setting standards for student performance and holding students to them, then our students are accountable. We tell them exactly what "the test" is and what we hope they can show on it. That is exactly as it should be. What kind of accountability is it when the test is secret? Again, Wiggins explains that such secrecy is completely misplaced:


The best tests always teach students and teachers alike the kind of work that most matters; they are enabling and forward-looking, not just reflective of prior teaching. In many colleges and all professional settings the essential challenges are known in advance--the upcoming report, recital, Board presentation, legal case, book to write, etc. Traditional tests, by requiring complete secrecy for their validity, make it difficult for teachers and students to rehearse and gain the confidence that comes from knowing their performance obligations. (A known challenge also makes it possible to hold all students to higher standards). (1990)


There are no tricks here. There's no secrecy. Students know what is being asked of them. We simply have to look to see whether they can do it or not.


At Parker, we have tried to strengthen our accountability by having outside assessors evaluate our portfolios and exhibitions. We regularly ask experienced educators to visit and examine student work. We have also conducted inter-rater agreement studies to make sure we are setting performance standards in consistent ways. And our school gets inspected by the State and by accrediting committees to check whether our practices follow our principles. We now have several years of reports that essentially confirm we are doing what we say we're doing.
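To make the idea of an inter-rater agreement study concrete, here is a minimal Python sketch of two common summary statistics: simple percent agreement and Cohen's kappa, which corrects for the agreement two raters would reach by chance. The paper does not say which statistic Parker's studies used, so the choice of these two is an assumption, and the rater data below are invented for illustration. The rubric levels are the four named earlier in this paper.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Fraction of pieces on which two raters chose the same rubric level."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: probability both raters pick the same level
    # if each rated independently at their own observed rates.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same eight pieces by two teachers.
rater_a = ["meeting", "meeting", "approaching", "exceeding",
           "meeting", "approaching", "meeting", "just beginning"]
rater_b = ["meeting", "approaching", "approaching", "exceeding",
           "meeting", "meeting", "meeting", "just beginning"]

print(percent_agreement(rater_a, rater_b))
print(cohens_kappa(rater_a, rater_b))
```

In this invented example the two raters agree on six of eight pieces. Kappa comes out lower than the raw agreement because some of that agreement would be expected by chance alone, which is why agreement studies often report both figures.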


Nonetheless, we have experienced the kind of difficulties described by Joan Herman in her review of where performance-based assessment is today:


Research suggests that in the early years of a new assessment, achieving reliable scoring can be a challenge because it occurs while the processes of achieving consensus on standards and expectations and of encouraging ownership and support for the new assessment are also incomplete. The technical process of reaching high levels of agreement, however, is fairly well understood. It requires good training and scoring procedures, including well-documented scoring rubrics exemplified by benchmark or anchor papers; ample opportunities for scorers to discuss and practice applying the rubric to student responses; and systematic checks before and during the scoring itself to ensure that evaluators are consistent, with retraining as necessary. (1998)


We know that what she says is correct. And we do work hard to get to the kind of consensus on standards she describes using the practices she lays out. Still, our efforts could always be more frequent and more precise.


Every institution needs to reflect on its practices, and Parker is ripe for a review of our assessment system. We've been running for five years now, and it is probably time to evaluate what's been working and what hasn't. The following set of criteria questions was developed by the National Center for Research on Evaluation, Standards and Student Testing at UCLA to help educators evaluate performance assessments:

    Consequences. To what extent do the assessments model and encourage good teaching practice? Are intended positive consequences achieved? What are the unintended negative consequences?

    Alignment. Does the assessment reflect content and performance standards that have been established for students? Does the assessment measure important curriculum goals?

    Fairness. Does the assessment enable students, regardless of race, ethnicity, gender or economic status, to show what they know and can do? Have students had the opportunity to learn what's being assessed?

    Transfer and Generalizability. Will the results of an assessment provide accurate generalization about student achievement?

    Content Quality. Is the assessment content consistent with the best current understanding of the subject matter? Does it reflect the enduring themes and/or priority principles, concepts and topics of the discipline?

    Cognitive Complexity. Does the assessment require students to use complex thinking and problem solving?

    Content Coverage. To what extent does the assessment cover the key elements of content standards and/or curriculum?

    Linguistic Appropriateness. Does the assessment allow students to display what they know and are able to do without being swamped by language demands not required by the content?

    Meaningfulness. Do students find the assessment tasks realistic and worthwhile?

    Practicality and Cost . Is the information about students worth the cost and time to obtain it (What Constitutes a Quality Assessment 1998)


So how would Parker come out if these were the criteria? We are probably strongest in the areas of consequences, cognitive complexity, and meaningfulness. Again and again, we ask our students to think in meaningful ways. Almost all of our teaching goes to that aim. If we have a weakness, it is in the area of content coverage. We simply cover less ground than many want us to. But the ground we do cover, we and our students cover with intellectual rigor and depth.


So Does it Work?


Performance-based assessment, portfolios, and exhibitions definitely work at the classroom and school level. How can we prove it? Our test scores are good -- so is that enough? I think we prove that performance-based assessment works by the engagement and learning of our students at Parker on an almost daily basis. That can be seen through students' own testimony and it has been confirmed by external evaluations.


If one believes that test scores can be effective measures of what students know and can do, Parker's scores do show that our students can read and write and compute and think. In a renewal inspection report completed in February 1999, the inspection team reported this success: "Student achievement on external academic assessments -- Stanford 9, MCAS, and the PSAT has been consistently strong. Standardized test scores at Francis Parker are as strong as, and in some cases somewhat stronger than, those of the many districts who send students to the school" (p. 3, Renewal Inspection Report). Interestingly, Parker's annual reports, an accountability requirement for charter schools, all point out that the school's "emphases do not align with the subject and style of standardized tests." So Parker students have achieved their scores without the school's specific attention to tests.


More important to us, outside evaluators have looked beyond the numbers to the quality of student work produced and to the quality of the learning experiences Parker has achieved. These inspectors, visiting committee members, and portfolio reviewers came to the school, looked at student work, talked to as many people as they could, and described what they saw and heard. While they have reported thoughtful and constructive concerns, their reviews have been overwhelmingly positive in the first five years of the school, especially in the area of assessment.


In the charter renewal inspection report, required by the State and conducted by SchoolWorks, an independent consulting firm, inspectors found that our assessment system clearly and effectively helped students rise to high standards. Here are some excerpts:

    Ongoing classroom assessments and formal Gateway assessments indicate that most students are developing sound "habits of mind" and mastering "essential skills and areas of knowledge" as defined by the Parker School Criteria for Excellence . . .. Assessment using these criteria is frequent within a unit and both comprehensive and rigorous at a Gateway. (p. 3)

    The Criteria for Excellence are clearly and coherently organized and stated, so much so that students themselves articulate academic expectations, and often discuss and adapt them to particular assignments, freely and frequently. (p. 10)

    Parker measures academic progress through its elaborate system of rubrics and benchmarks toward criteria in each division. Every assignment is disseminated using these standards and related achievement criteria and subsequently returned to students with clear feedback against the standards, in both sequential and narrative terms. The sheer volume and detail of assessment provides students with an almost momentary sense of the progress being made through the curriculum toward the ultimate Coalition goal of "the student-as-worker," principle number five, which ultimately "provokes students to learn how to learn and thus teach themselves." (p. 13)

    Every assessment (in Arts and Humanities) takes full advantage of the student's individual interests and creative potential. Instructions for successful completion of the project and descriptions of model work are clear and copious, suggesting timetables for completion but allowing individual students and student groups to organize study and investigation in the most individually appropriate ways. Of particular note is the interdisciplinary thinking and research required of students within the domain. Content and skills in literature, art, and history merge seamlessly throughout the unit and in the culminating activities. (p. 19)

Such comments match some of our highest goals for assessment, and this independent group of external inspectors said we were achieving them.


Noted educators have also described what we consider to be accomplishments. Visiting Committee member Joseph McDonald, Professor of Education at NYU and an authority on assessment, wrote after his second visit in 1998:

    The student work I read and observed was very good. In particular, the exhibitions were impressive -- signs of the strong intellectual culture that has been established in the school. But the portfolios of the students who had succeeded in passing from Division 1 to Division 2 were also impressive: good writing, good thinking, intellectually substantive content. (p. 100, 1997-98 Parker Annual Report)

Vito Perrone, Professor of Education at the Harvard Graduate School of Education, wrote the following in June 1998:

    I was particularly pleased to see the various "standards" posted in all of the rooms along with displays of student work. The latter were displayed well, good demonstrations of what I call 'challenging work,' essentially responses to powerful questions, having within them room for invention, carried out in diverse ways, ending in products that show care (something honored) and from which others can learn. (p. 102, 1997-98 Parker Annual Report)

Thus, outside educators, experts, and state investigators report that Parker students have been doing quality work at high levels of achievement. What they've observed has been the result of an assessment system that asks students to use their minds well and to demonstrate their learning through exhibitions.


Even more important, our students know that they are learning in powerful ways. When an eighth-grade student named Chris gatewayed from Division 1 AH to Division 2 AH, he described learning that never would have happened if we didn't use performance-based assessment:

    By far, oral presentation was my most improved skill category. I have always been shy, so naturally I was a terrible speaker. My first oral presentation at Parker was nothing like I had ever done before. The speeches were more natural, without strict guidelines like you have to use note cards, your cards should have the exact words that you are going to say, and the speech must be at least 5 minutes and 30 seconds. I like the "take your time" approach of my teacher. I was allowed to start over when I screwed up and not to get stressed out, which helped me to improve. Over time, I became more comfortable speaking and eventually became a good speaker. I went from so-so performances like the Apartheid Rally last year, to excellent pieces like the Constitutional Convention Simulation debate. (p. 123, 1997-98 Parker Annual Report)

Was it too soft to allow this 13-year-old to start again? Does it matter whether he thinks he did an excellent job? I would argue that this student reached an amazing level of self-awareness for a middle school student and developed confidence in an incredibly important skill.


It seems that performance assessment also helps students find real meaning in their work. Alexis, a ninth-grade student at the time she gatewayed into Division 3 MST, described why she worked so hard to prepare her portfolio:

    Most importantly, the drive behind these ambitions and efforts was not a grade on the assessment or a check mark on the rubrics of excellence. The work was done in an effort to dedicate myself to something and try to understand it. It was done in order to become a better student through solid research and writing and learning to understand hardcore scientific processes, like nuclear fusion in my universe project. (p. 130, 1997-98 Parker Annual Report)

In a conversation in September 2000, two seniors, Jenny and Colin, described how different it feels to work and learn for one's own personal goals instead of for meaningless external rewards. Colin, who went back to a more traditional high school for only a few weeks, said, "It felt like the other school put tests in my way just as obstacles and not to help me learn. When I came back to Parker, I felt an overwhelming rush that my work would be for my own accomplishment." Jenny knew that she could have been doing very well at a traditional high school that gave tests and grades: "I could easily memorize whatever I need for a test. But then I'd forget it. Exhibitions actually measure something more important, because I really want to know how to do the things I'll need in life." Colin added, "We develop more the skill of how to learn than just learning information. Put me on Jeopardy and I'd get schooled. But put me in real life where things are changing and where I need to be able to deal with those changes and I'll never give that up." I respect these students more than I can say. Their words are a part of our evidence.


Tenth-grader Annabel drove home the message we hear in many of these student testimonials in a letter she wrote to the editor of The Boston Globe:

    As opposed to focusing on dry, meaningless tests and rote-learning, my school focuses on learning through experience, compiling portfolios of meaningful work, and many major interdisciplinary projects. Instead of perfunctorily covering lots of topics without any real understanding, we go in-depth with a few important topics, so that we have a real understanding of issues that are relevant to the society in which we live. The test doesn't take any of this into account.


Are these students well-coached? Of course. But how they feel about their own learning gives powerful testimony for what a performance-based assessment system can achieve, especially when their reflection is backed up by test scores and the reports of independent outside experts.

Can it Work for All?


Parker's experience proves that performance-based assessment does work for us in our classrooms and in our school as a whole. Can performance-based assessment or portfolios work on a larger scale? Could they be a better MCAS? There have been a few attempts at large-scale performance assessment systems. While no research has declared them an unmitigated success, they provide several lessons.


One attempt to implement large-scale performance-based assessment and portfolios was the New Standards Project directed by Lauren Resnick and Marc Tucker in the early 1990s. This project was an effort explicitly aimed at "creating tests worth taking." Elizabeth Spalding, who helped work on this project, describes:

NSP sponsored hundreds of meetings involving thousands of educators in the crusade to build assessments toward which teachers would want to teach. Thousands of students in grades 4, 8, and 10 completed New Standards performance assessment tasks and compiled portfolios to demonstrate their ability to meet performance standards that were being developed simultaneously. (2000)


The results of this project were mixed. According to Edward Haertel, these assessments had all the reliability, generalizability, and non-standardized scoring problems found in other large-scale performance assessments. And yet according to Spalding, they effectively reflected constructivist views of knowledge, thinking, and learning, they positively influenced curriculum and instruction, and they provided more meaningful information about student performance.

Vermont and Kentucky implemented statewide portfolio systems in the 1990s, another significant attempt at performance-based assessment on a large scale. While research results from both states have been mixed, they have shown that -- again -- such a system significantly improved teaching and learning in classrooms. According to an article in Improving America's Schools: A Newsletter on Issues in School Reform, "preliminary observations of classroom instruction in Kentucky and Vermont, two states with portfolio assessment, indicate that teachers spend more time training students to think critically and solve complex problems than they did previously." The authors go on to report:

After studying Vermont's portfolio assessment program during the first two years of its implementation the RAND Corporation concluded that the effects of portfolio assessment on instruction were "substantial and positive." Half the teachers surveyed by RAND reported an increase in the time students spent working in pairs or small groups. Almost three-fourths of the principals interviewed said the program produced positive changes in instruction at their schools. Between 70 percent and 89 percent of the math teachers reported more discussion of math, explanation of solutions, and writing about math in their classrooms since the advent of the portfolios; three-fourths reported having students spend more time applying math knowledge to new situations, and roughly 70 percent reported devoting more class time to writing math reports. Principals in half the sample schools reported expanding portfolio assessments to other grade levels, an indication that they approved of portfolios' effect on instructional practice in their schools. (1996)


These are encouraging results in many respects. The authors note, however, that other states have not had success with such large-scale attempts and several have abandoned the effort.

The issue is not simply whether one form of assessment works and another does not. What is becoming clear is that different assessments show different information and they should be used accordingly. At this point, large-scale performance assessment has not overcome several serious shortcomings, though that does not mean the efforts to implement it successfully should be abandoned. William L. Sanders and Sandra P. Horn of the University of Tennessee write:


    Non-standardized alternative assessment has yet to demonstrate the ability to provide generalizable information for comparison purposes over time on a large-scale basis without proving more costly in time and resources than standardized testing and without itself falling prey to the "teaching to the test" syndrome so often cited as a major deleterious result of standardized testing. The strengths of alternative assessment lie in its ability to individualize assessment, to mimic good teaching practices, and to involve teachers more deeply in the assessment process. Currently, attempts are being made to improve the generalizability and reliability of alternative assessments in order to use them for the evaluation of school and school system efficacy.

They go on to say that:

    The issue is not whether one form of assessment is intrinsically better than another. No assessment model is suited for every purpose. The real issue is choosing appropriately among indicator variables and applying the most suitable model to render them. It is necessary to determine what information is sufficient to each purpose before deciding upon the form of assessment to be used. When a variety of valid and reliable assessment methods exist, it is parochial and ineffectual to adhere to only one, asserting that it is in all instances superior. (1995)


One size does not fit all. We need such a variety of assessments in our schools. Parker's assessment system can coexist with MCAS, but only if MCAS does not become so dominating and so prescriptive as to crush all other means of seeing what our students know and can do.

Conclusions

Performance-based assessment has been successful at Parker for several reasons. The first is the very thoughtful definition of criteria for excellence. When the criteria are clear, teachers, students, and parents all know what we're working towards. The second is that we have a shared vision for our school and for the way that teaching and learning should happen. Thus we mostly agree on how assessment should work.

Parker's assessment system does hold students accountable for their learning. There is coherence between the skills they have to demonstrate and the specific performance criteria we have set for these skills. The work that reveals what they know and can do goes into a portfolio, and the portfolio qualifies a student for promotion and eventually graduation. Such coherence is then bolstered and brought alive by exhibitions where we really see what our students can do. Because of this experience, we conclude that performance assessment leads to more thoughtful, more compelling, and ultimately more skilled student work.

The larger system must not squash this kind of work. Indeed, the 1993 Education Reform Act requires us in the State of Massachusetts to use assessment practices that are more than MCAS. All of us know that there are many challenges ahead before performance-based assessment becomes a widespread practice. This is work that we have to do -- first school by school, but inevitably for the wider system as well.

 



Bibliography



Annual Report of the Francis W. Parker Charter Essential School. Devens, MA. 1996-97, 1997-98, 1998-99.

Dietel, R.J., Herman, J.L. and Knuth, R.A. "What Does Research Say About Assessment?" North Central Regional Educational Laboratory: Oak Brook, 1991. http://www.ncrel.org/sdrs/areas/stw_esys/4assess.htm

Herman, Joan L. "The State of Performance Assessments." School Administrator, December, 1998. http://www.aasa.org/SA/dec9804.htm

__________ "What Constitutes a Quality Assessment?" School Administrator, December 1998. http://www.aasa.org/SA/dec9805.htm

Horn, Sandra P. and Sanders, William L. "Educational Assessment Reassessed." Educational Policy Analysis, Number 3 Volume 6. March 3, 1995.

Madaus, George F. and O'Dwyer, Laura M. "A Short History of Performance Assessment." Phi Delta Kappan; May 01, 1999.

SchoolWorks, "Renewal Inspection Report." Massachusetts Department of Education, February, 1999.

Spalding, Elizabeth. "Performance Assessment and the New Standards Project: A Story of Serendipitous Success." Phi Delta Kappan: V. 81 N. 10, pp. 758-764. June 2000.

"What Constitutes a Quality Assessment?" The School Administrator Web Edition (December 1998). Retrieved August 25, 2000 from American Association of School Administrators web site: http://www.aasa.org/publications/sa/1998_12/herman_side_resource.html

"What the Research Says About Student Assessment." Improving America's Schools: A Newsletter on Issues in School Reform - Spring 1996. http://www.ed.gov/pubs/IASA/newsletters/assess/pt4.html

Wiggins, Grant. "The Case for Authentic Assessment." Practical Assessment, Research & Evaluation, 2(2). 1990.

__________ Educative Assessment: Designing Assessments to Inform and Improve Student Performance. Jossey-Bass Publishers: San Francisco, 1998.





About the Author

Peter Garbus has been teaching high school History, English, and Humanities for eleven years. A graduate of Brown University and the Harvard Graduate School of Education, he received National Board Certification for History/Social Science in 1999.