Learning might be the wicked problem in higher education. It’s inescapable. So when a headline like “Can Artificial Intelligence Make Grading Fairer and Faster?” is published in a leading ed-tech publication, people notice. The article was about a platform called Gradescope, whose tagline is “Grade Faster. Teach better.” Anyone teaching large-enrollment courses would want to know about this!
The headline on page B8 from the April 13, 2018 issue of The Digital Campus from the Chronicle of Higher Education (only available to paying subscribers) grabbed the eyes of Ruth Pionke, who then struck up a conversation with Marco Bonizzonni and Nathan Loewen. Ruth was intrigued by the upshot of the article, “You scan papers into a system, set up a rubric and the system helps you grade the tests. It still requires a bit of work from faculty to input the data.”
According to the article, the creators of Gradescope designed it to “read students’ names off a form and recognize patterns in handwritten answers, such as chemical formulas for acids, as long as they were entered in a designated area on standardized-test forms.” Gradescope also has the ability to “identify and cluster those answers into groups.” As the article notes, Gradescope does not create the tests or the grading criteria. That remains to be done by someone, somewhere.
So can AI make grading fairer and faster? Marco and Nathan took some time to dig around to see just what might be the reason for the article to claim that 550 colleges use the tool. As university teachers, Marco and Nathan were concerned with pedagogy and utility, rather than security issues (such as the analysis performed by MIT).
Nathan came to some quick conclusions based on a comparison with a grading solution Marco and Diana Leung have developed at the University of Alabama. Marco and Diana created their system to improve the speed and security of returning personalized feedback to students. Their solution does not speed up grading itself.
Speed ∝ Quality ∝ Cost. It’s the unattainable triangle. Anyone who has studied project management will have come across Martin Barnes’ iron triangle (1969). It should be kept in mind when anyone considers pedagogy, too. In most industries, three factors are directly related: strength (or integrity), performance and cost. Increase any one factor, and the others increase with it. Deplete one, and the others decrease. Just like smartphones and other luxuries (e.g., cars), any augmentation of integrity and outcomes in higher education rarely comes without high-cost inputs. Fast, good or cheap. Pick the last and the other two are unlikely.
For some strange reason that is unproven in every other economic sector, technology is expected to break these dependencies in higher education. The break is now thought to come from adaptive learning and learning analytics (see pages 42-3). Steven Mintz puts across this “wicked problem” nicely.
Nathan’s observation is that Gradescope does not necessarily speed up grading. As with so many platform vendors in higher education, the promise of getting more done faster (and better) rests on the notion that adding up scores and grouping like responses is the most time-consuming part of grading. For most faculty, this is not the time-consuming part. Scoring is the quick part.
Grading well and grading efficiently are both born of learning how to see what a student has done, understand where they went wrong or right, and identify that for them. Grading moves faster with experience and accumulated wisdom about how students think. Two key points are worth repeating: 1) useful grading involves helping each student see where and how something went wrong, and 2) how students think is a variable, not a constant.
These vendors trot out the same solution for issues 1 and 2. They claim to solve grading with rubrics. Rubrics are attractive for several reasons. They provide the veneer of fairness, despite problems raised by the scholarship on teaching complex subjects. Rubrics are often attractive economically since they just might allow non-experts (i.e., non-faculty graders) to grade work. There is definitely something attractive about avoiding the protracted interactions required to help students learn where and how they went wrong. Any measure which promises to speed up that process will likely underdeliver.
Nathan’s opinion is that the “faster, better” ed-tech crowd misses the crucial element of handwritten formative feedback. Not only is formative interaction important in the sciences, but it is also the most crucial element of humanities pedagogy. It is difficult to see how Gradescope would allow a faculty member to pinpoint feedback. For example, when 2 points are deducted for “split infinitive,” the student (who doesn’t know what “split infinitive” means) will not be able to “see” what is being demonstrated. How effectively faculty grade is an issue likely best solved by peer review of teaching. But that process is not fast, either.
Marco had not yet heard of Gradescope, even though he is always on the lookout for solutions to streamline the hand-grading process. He is involved in the lab courses at UA, which require a lot of hand grading. An army of graduate teaching assistants carry out the grading, and these GTAs often require guidance to avoid uneven results and poor student outcomes. In other words, input costs are required to increase the performance of GTA graders.
Like Nathan, Marco found that there is a popular misconception that entering the grades is somehow the problem. Generating personalized feedback is the time-consuming part. Marco went through Gradescope’s online demo out of curiosity to see what they offer.
Marco found the execution plan troubling. Gradescope disingenuously shifts the burden from producing feedback to producing a rubric. Their main claim to time savings is the fact that, once you create a rubric for a particular question to include “named common mistakes,” you can reuse that rubric. That is certainly true, technically, but practically unrealistic. Nathan’s point is that simply telling students what they did wrong is typically insufficient (one also needs to point out where, and how they could fix it). In addition, Gradescope’s approach does not recognize that creating an effective rubric to include common mistakes in any sizable assignment system would be so monumentally time-consuming as to render the process non-viable for an instructor.
Indeed, this very type of rubric-plus-common-mistakes response system is the actual discriminator between “good” and “bad” online homework platforms. Almost all online homework platforms already recognize “good” answers. Most of the market offerings also recognize some incorrect answer patterns and provide feedback relevant to the mistake the student made.
The very same reusable rubric system that Gradescope wants the instructor to implement is what all systems currently on the market use to accomplish their outcome. And this comes at a cost. Many products offer rubrics that were painstakingly pre-generated by subject matter experts. These rubrics represent a very large fraction of the whole system’s value. Faculty are unlikely to do this at a reasonable cost of their time and effort (even if they work as a team!).
To cover even just a handful of the common mistakes for the 1000+ questions in a common question bank is a very significant burden for even the largest publishing companies. And they have organized entire divisions to do that for a living. The idea that Gradescope’s users will “simply generate a reusable rubric” vastly underestimates the problem. On these grounds, it may be questionable whether Gradescope has sufficient pedagogical expertise to understand what they are asking faculty to do.
Alongside these concerns about performance and integrity, Marco observed that the pricing structure likely puts Gradescope beyond the reach of public universities. The pricing for the multi-instructor version that allows for graduate teaching assistant participation is five dollars per student per course. And this is charged to the instructor, not the institution! As an example, Marco’s upcoming fall semester would cost: (320 + 65) students × $5 per student = $1,925. Perhaps the implicit assumption is that faculty will pass this particular cost to their students?
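For anyone who wants to run the same estimate for their own teaching load, the arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `course_cost` is made up for this example, and treating 320 and 65 as the enrollments of two separate courses is an assumption about how Marco's figures break down.

```python
def course_cost(enrollments, fee_per_student):
    """Total fee for one instructor: each enrolled student is billed once per course."""
    return sum(enrollments) * fee_per_student


# Figures quoted in the text; reading them as two course enrollments is an assumption.
fall_enrollments = [320, 65]
fee = 5  # dollars per student per course

total = course_cost(fall_enrollments, fee)
print(f"${total:,}")  # prints "$1,925"
```

Swapping in a different list of enrollments shows how quickly the bill grows for anyone teaching several large sections.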
Marco acknowledges that providing personalized feedback in large classes is definitely something every institution must constantly address. Few institutions have static enrollment projections. No institution has the exact same “type” of individual enrolling in its various majors and course offerings. The course offerings themselves are often changing, too.
Ruth, Marco, and Nathan enjoy any opportunity to connect with instructors, administrators or vendors who wish to discuss these issues.