S3xE2: Alternative grading, how?

January 31, 2022

In part two of our episode series with Kevin Lin and Brett Wortzman from the University of Washington, we dig into the details of how they implement their alternative grading systems. Brett outlines their ESNU system, which stands for exemplary, satisfactory, not yet, and unassessable, as well as the components of his grading system for his large CS1 course, and Kevin talks about his version from his CS2 and other data structures courses. We discuss trading off complexity for precision and how much differentiation in grades is actually feasible and necessary. We also talk about grading workload and balancing the convenience of autograding with the depth of feedback from manual grading. Kevin emphasizes how grading “bundles” can provide more clarity for students on what the expectations are for each grade. Finally, both Kevin and Brett emphasize considering constraints, priorities, and tradeoffs in choosing a grading system for your own class. If you haven’t yet, consider listening to the first episode of the series on why to consider alternative grading and the potential systems to choose from!

Edited by Brett Wortzman

Kevin also wrote a post on his blog going into more details of his specifications grading policies if you want to read more!

You can also download this episode directly.

Transcript

Kristin [00:09] Hello and welcome to the CSED Podcast, a podcast where we talk about teaching computer science with computer science educators. We’ve decided to start something new. Rather than a season with a clear number of episodes, we’ve picked a theme and we’ll run with it until we run out. The season’s theme is what’s next? We focus on how we’ve rethought our teaching since COVID-19 came and upended everything. I am your host Kristin Stephens-Martinez, an assistant professor of the practice at Duke University, and hopefully this is a welcome back to many members of the audience as we do part two of my discussion with Brett Wortzman and Kevin Lin. Brett and Kevin, how about you reintroduce yourself for the audience, assuming that potentially they haven’t listened to part one yet?

Brett [00:52] All right, hi, Kristin. Good to be back. My name is Brett Wortzman, I use he/him pronouns. I am an assistant teaching professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, Seattle. My primary focus is on teaching our large CS1 course CSE 142, which has anywhere between about 400 and about 1,000 students in any given quarter. And I’m also heavily involved in our CS pedagogy, CS teacher training and all of the practical sides of our CS education work that we’re beginning to ramp up here.

Kevin [01:27] And hi, my name is Kevin Lin, an assistant teaching professor in the Allen School just like Brett, and I often teach a follow-up course to Brett’s 142, known as 143 at UW. And I’ve also taught the follow-up courses to that. So you can think of programming, but also a little bit of data structure and maybe some data science work. I’m also interested in teaching design courses for more critical methods in computing and how do we teach computer science education? And those have also been a focus of mine working with a few undergraduate and graduate students.

Kristin [01:57] So our discussion is on mastery grading or alternative grading with a focus on large, lower-level computer science classes. And in the first episode, we kind of focused on the why and the more philosophical what’s going on and why do we teach these kinds of classes, as well as different things to think about or talk about to someone who is thinking about doing this kind of thing but isn’t quite convinced that it’s worth the work. So that was the focus of the first part. And so in this episode, we really wanted to get into the nuts and bolts in the details of how all of this was done. And the reason why I invited Kevin and Brett onto the podcast is because they have both taught large classes using at least elements, if not fully changed their syllabus, to be mastery grading or alternative grading. So I think since Brett is the one that has more experience teaching a large CS1 class, we can ask Brett to go first. And I believe then Kevin can kind of say, given what Brett said, this is how the slight tweaks that I do for my class. So, Brett, tell me more about CS1.

Brett [03:00] Sure! So I’ll preface this with: we’ll get into the differences that Kevin and I have in our systems, but we really developed the baseline for this together, and so there are a lot of similarities and a lot of similar philosophies here. I tend to think of the changes we made as having a few main components. The first and the most obvious one usually to students is that we allow resubmission of work and we allow resubmission of work without limits on how much credit can be earned back. The notion of credit gets a little weird because of other changes we’ve made. It’s not points. So we’ll get into that in a minute. But students can resubmit previous assignments after they have already submitted them and after they have received feedback from the course staff on the work they have done so far. And they can use that as an opportunity to demonstrate that they have improved or increased their mastery of the necessary or their targeted concepts. And then the new grades will replace the old grades, and the newest version of the grade for each assignment is what will be used in our final grade computation.

Kristin [04:07] I just wanted to interrupt and ask a clarifying question, so when you say they get back the grading or the feedback or whatever before they can do that resubmission, you mean they get back their hand graded piece to the assignment and then they get to submit again?

Brett [04:22] Correct.

Kristin [04:24] OK, that sounds like a lot of work because if they get to resubmit, we have to do it again. But that’s the part I wanted to make sure was clear.

Brett [04:31] Yes, and we can talk a little bit about things we’ve done to mitigate that workload. We can go through the mechanics first. So, resubmissions are the most obvious piece. And that’s really the mastery piece of this where we’re focused on, like we talked about in the first episode, evaluating students’ mastery of the concepts at the end of the class rather than at whatever arbitrary point in the middle of the course we decided to give them the assessment on those particular concepts. The second piece is that we changed the way we assign grades on individual assignments. We moved away from a points-based system to a more qualitative, more coarse-grained system. We call it ESNU, which we sometimes pronounce es-new, where E is exemplary, S is satisfactory, N is not yet, and U is unassessable. And we chose those names very carefully because those words really mean what we’re trying to have each of those grades mean. And so, for example, the lowest grade, that U, is unassessable. It’s not failing, it’s not incomplete, it’s not poor work, it is the work that is there does not allow us to properly assess your mastery. And then the other one on the lower side, the other what we would consider maybe not passing and we don’t even talk about it that way is the N, which is not yet, it’s not needs improvement, it’s not unsatisfactory, it’s not yet. Your mastery is still developing.

Brett [06:03] So we went to this ESNU system and we tried as much as possible to get points and numbers out of our system entirely so that students will stop treating it as an optimization problem and trying to do the complicated math of which points can I get to bring up my grade and how much does each point matter? And where is the most valuable thing for me to spend my time on? And instead, get them just focused on mastering the learning objectives as we present them and as we work with them. So that’s the second big piece.

Brett [06:37] And then the third big piece related to the ESNU change is the way we compute final grades because we don’t have points, we can’t do averages, which is one of the goals. You know, when you read Feldman, or listen to your episode, Kristin, with Feldman, he talks about points being fungible and all the inequities in weighted averages, the traditional way of computing final grades. We wanted to step away from that. And instead, what we do is we create what we have termed bundles, which are collections of evidence of work that we then assign to various grades. So at UW we give grades on a full 4.0 system in 0.1 increments, which has its own set of challenges that we can save for another day, but this would apply even if you’re in a more traditional kind of ABC, with or without plus or minus systems. We define a particular bundle of work that we say represents B-level work, or for us it would be 3.0 level work. And we say if your collection of work over the course of the entire quarter meets or exceeds this collection of work that we have defined as a 3.0, you will receive a final grade of at least a 3.0. And we release those not for every possible 0.1 but for a set of them, and then we have ways of interpolating in between as necessary. So the one neat side effect of that is that doing work never causes a student’s grade to go down. So in a traditional weighted average points-based system, if I am currently carrying a weighted average of, say, an 87 and I submit some work and that work, I didn’t do as well on, and I get what amounts to a 70% of the possible points on that particular assignment, my overall grade is now worse off than if that assignment never existed. If that assignment were removed entirely from the course, I would have been better off. So I am incentivized potentially to not do this work, modulo things like zeros and such like that.
And we really wanted to be in a place where students are always incentivized to do more work, to put in more effort to continue working towards their mastery. So because we have these bundles and they’re made up of counting things, every time you submit more work, you’re going to increase your count of something. And that is only potentially going to have your final grade calculation go up. So you’re always improving over the course of the quarter. There are a lot more details, but at least from my perspective, those are the three high-level points of our implementation.
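
To make the contrast Brett describes concrete, here is a minimal Python sketch. It assumes equal-weight averaging and a hypothetical list of earlier scores; only the 87 average and the 70% submission come from the discussion. It is an illustration of the idea, not the course's actual grade computation.

```python
# Hypothetical data; only the 87 average and the 70% score come from the discussion.
LEVELS = ["U", "N", "S", "E"]  # ESNU levels, lowest to highest


def average(scores):
    """Equal-weight average of percentage scores (a simplifying assumption)."""
    return sum(scores) / len(scores)


def count_at_least(grades, level):
    """Count how many grades are at or above the given ESNU level."""
    floor = LEVELS.index(level)
    return sum(1 for g in grades if LEVELS.index(g) >= floor)


# Points-based: a student averaging 87 submits work that earns 70.
points_before = [87, 87, 87, 87]
points_after = points_before + [70]

# Counting-based: the same student adds a weaker (N) submission.
esnu_before = ["E", "S", "S", "E"]
esnu_after = esnu_before + ["N"]

if __name__ == "__main__":
    # The average drops after the extra submission.
    print(average(points_before), "->", average(points_after))
    # The count of S-or-better grades does not drop.
    print(count_at_least(esnu_before, "S"), "->", count_at_least(esnu_after, "S"))
```

Because a bundle is defined by counts, adding a submission can leave a count unchanged or raise it, but cannot lower it.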

Kristin [09:10] All right, so given that system, Kevin, how is your system different and what class is this for?

Kevin [09:18] Yeah, so I’ve been experimenting and changing things quite a bit. I’m not sure if, Brett, you’ve been changing things as much as I have been changing things every quarter, even within my own kind of pedagogy and grading practices. I feel like one of the major differences between our grading systems is the amount of precision that’s encoded in the way Brett does, like grading for the assignments for example. He mentioned ESNU, that four-level scale: exemplary, satisfactory, not yet, unassessable. But there’s an interesting tradeoff there of you can imagine having that grading system, there’s a question there of how many levels do you want, how much precision do you actually want in your assignment as an instrument of measuring mastery or proficiency or understanding of that knowledge? And I think there’s an interesting kind of fine line there about what actually matters for this and what do I actually care about for determining grades. The philosophy I’ve taken is to try to be as minimal as possible. And this is especially interesting because, you know, at UW we have a 4.0 grading scale, but that means that we need to give out, it sounds crazy when I say it, 33 different possible grade levels.

Kevin [10:23] I think if you think about it from that perspective, then that gives you an upper bound for the amount of complexity that you need in your system. In theory, you don’t necessarily need more than that amount of complexity in order to assign grades using the 0.1 increments. And that’s an assumption even that I need to assign grades in 0.1 increments, which unfortunately the school says is pretty important, so I will try to figure out how to finagle that conversation. And I think the best I’ve come up with is like I should think about it in terms of what level of complexity do I need to assign final grades where my goal really is to minimize the stress of getting grades, where even I feel ESNU might expand that stress, in the sense that, like, oh, I have to get an E because the bundle for a 4.0 says I need to get a lot of Es. So I think my perspective is what can I do to make that stress level lower through the ways in which I simplify my grading system, even when we’re talking about this bundle-based or specifications-based grading system? I think it’s one of the biggest differences that I have in general, and I can also dive deeper into particular differences if you’re curious.

Kristin [11:24] Could you be a little bit more concrete about what you mean by making something simpler than Brett’s version?

Kevin [11:31] Yeah, well, Brett I think is being a little bit disingenuous when he’s saying that it’s just ESNU, but in Brett’s version for each assignment, there’s ESNU, but it’s across 4 different dimensions. So in theory it’s almost like 16 points of delineation for an assignment. And there it’s like you’re very much in the territory of, could this really be points? And what are the implications of having points? One of the embedded assumptions about a numerical quantitative point system is that 2 is twice as good as 1 and 3 is 3 times as good as 1. And there’s all sorts of interesting conversations or ideas, philosophies around does it make sense to have points versus ESNU? And one of the big differences that I feel is that ESNU means that you cannot read it as 2 is twice as good as 1 and have that kind of inbuilt comparison. So for me, that was one of the thoughts, was thinking about all this conversation that I’ve just mentioned about the complexity that’s embedded in grading on essentially 16 points of difference on an assignment. And then what could we do to reduce the complexity of that?

Kevin [12:28] So for my own assignments, the way I see it is that I feel like I have a lot of standardized work in my course. But I also want to have some creative work and some work that allows students to expand their knowledge. And my expectation really is that I want everyone to be able to deliver high-quality work. So with all those three assumptions, I have most of my work graded just satisfactory or not yet, and that just simplifies everything. So even for my large projects, my 2-week long programming projects that involve analysis and extra design, that’s still satisfactory or not yet because it’s standardized work. And I expect students will be able to get to the satisfactory bar with resubmissions. So I’m setting a very high expectation by saying that because everyone, even if you want a 2.0 or 1.0, you have to get a satisfactory on that project. You cannot glide by with, like some S’s here on a couple of dimensions on the assignments. You have to get an S on the entire project, and that’s a pretty high bar. But I think the resubmissions encourage that development. And again, the purpose of that change is to say that the standardized work that I have in this course, the projects where everyone’s doing the same thing, I want everyone to be able to get that satisfactory requirement.

Kevin [13:32] It’s really, and this is, I think, the contentious part and the part that I’m still trying to wrap my head around, what do I do then to actually differentiate grades? And this is the part that, you know, going back to my conversation from part 1, it really hurts my soul to talk about it in this framing, right? Like, what have I done with this? Because I think when you come down to it if a school says you have to have some kind of differentiation in your grades, in some mechanism, basically just don’t give out all 4.0s, what do you have to do to get that differentiation? And philosophically, where do you want to draw that difference from? In the past, before we started this project, it was exams. We would say most people will do well on assessments. This was true even before we changed the grading system. I did the math, actually. Most students in the class, whether you got 2.0 or 4.0, you got 18 out of 20 on the assignments on average, for example. And it’s really the exams that gave you all the differentiation. And then my question there is, is that fair? Is that what we want from the course? Is that how we want to get a differentiation? Because that’s how things are being done currently. And so there’s a big conversation we can have about that.

Kristin [14:44] Quick concrete nuts and bolts question. When you say satisfactory on an auto-graded assignment, what is that? Is that a pass all the tests or is it a pass X% of the tests? Is it pass these but you don’t have to pass those?

Kevin [14:50] For myself, when I teach the intro programming course, like Brett mentioned, we have components that are autograded and manually graded. A satisfactory requirement for that might look like: pass all the tests and meet our code quality or review guidelines. For my more basic data structure course, when I have students do analysis, I actually don’t have them submit any code. I have them submit a video presentation of walking through those different aspects, so that presentation should include all the components and meet the requirements that we set for satisfactory.

Kristin [15:20] So Brett, Kevin hinted at the fact that you have 4 pieces to your assignment that are each ESNU. Can you go deeper into what is autograded, what is not, like how much work is all of that?

Brett [15:33] When I do the ESNU, Kevin’s right. I have these 4 dimensions and there’s a few different reasons we went that way. Part of it is to create more differentiation, Kevin’s absolutely right, we should have a discussion about whether differentiation is important and the ways in which we are achieving that. But we work at a place that offers 33 different final grades at the end and has an administration that says that grades should be differentiated to a certain extent. And so this was the way I went about trying to create some differentiation.

Brett [16:06] But the other thing for me was, and this gets back to the purpose of grades, grades are overloaded in general. But one of the specific ways that they are overloaded is they are both a signal to external actors about a student’s achievement or mastery or whatever, depending on the particular system you’re using. But they are also a mechanism for feedback for students. And so creating these four separate dimensions allowed me to give more nuanced feedback to students, but still in a structured way that they would be ideally more likely to actually read and process, than the comments we leave on their code. When we hand grade, we leave comments on their code just like everybody does. But also, like most instructors, we have anecdotal and some concrete evidence that not all students actually read that feedback. So by going to the four dimensions, we could at least give them a very quick, very easy to grok way of saying you’re doing great on these pieces of it and you’re not doing so great on these other pieces of it. And those four dimensions are roughly aligned with my overarching learning objectives for the course, and they are consistent from assignment to assignment. I have the same four dimensions from assignment to assignment, so they’re not shifting, they’re broad and high level, and then the specific interpretation of them varies as we move from, say, the loops assignment to the arrays assignment or whatever else.

Brett [17:35] In terms of how we grade it, a goal in working through these systems was to minimize the impact on the TAs’ workload as we made these changes. So they are grading the assignments in very much the same way that they were under the old points-based system. But as TAs are grading, they’re working one row, one dimension at a time and they just have to decide which box in each row the student’s work is going to end up in. And it’s a little bit more holistic, but it also is simpler in that they don’t necessarily have to track every single mistake. If they find something that we qualify as, this particular thing in a student’s code means it’s S level work on this dimension, it’s S level work. It’s not going to become E level work because they did something better later. It might become N level work if we later find that they made some other error or mistake that is more egregious. But it’s just once they’re in a box, they’re in a box. And they don’t have these complex interactions. We do still have a few interactions between dimensions that we are trying to figure out how to get rid of, but we’re much closer than we were when we were at 20 points.
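
The "once they're in a box" rule can be read as taking the lowest level implied by anything the grader finds on a dimension. The Python sketch below assumes that reading; the function name and the idea of recording each finding as a level "cap" are hypothetical, not the actual rubric tooling.

```python
LEVELS = ["U", "N", "S", "E"]  # ESNU levels, lowest to highest


def dimension_grade(finding_caps):
    """Return the grade for one dimension.

    finding_caps lists, for each issue a grader noticed, the highest level
    still possible given that issue (for example "S" or "N"). With no
    findings the work stays at E. Later findings can only lower the grade.
    """
    grade = "E"
    for cap in finding_caps:
        if LEVELS.index(cap) < LEVELS.index(grade):
            grade = cap
    return grade


if __name__ == "__main__":
    print(dimension_grade([]))          # no issues found
    print(dimension_grade(["S"]))       # one S-level issue
    print(dimension_grade(["S", "N"]))  # a more serious issue found later
```

Under this reading the order of findings does not matter, and good work elsewhere in the submission never raises a box that an issue has already lowered.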

Kristin [18:50] I realize you haven’t told us what those four dimensions are yet.

Brett [18:54] You’re absolutely right, I should have done that. For the CS1 course, our four dimensions are behavior, functional decomposition, because that is one of our key learning objectives throughout our CS1 course, we’re very methods-first and so one of our key learning objectives from beginning to end of that quarter is students learning to define good methods and break their code up into well-structured methods and code. So functional decomposition is number two. The current name for this third one is “use of language features,” which is very broad. This kind of captures any versions of the fact that it works isn’t sufficient, you should be making good choices in which constructs you use. So this is things like, are you using an “if if” versus an “if else if” versus an “if else.” This is things like, are you using a for loop versus a while loop, this is things like have you created an array that is actually a meaningful array versus an array that you’ve just shoved a bunch of values in so that you can treat them as one thing rather than a bunch of separate things. And that third one, use of language features, is the most variable for us from assignment to assignment. Although once we add an expectation that, say, you’re using the correct conditional construct, it never goes away. We always want you to be using the correct conditional construct. And then our fourth one is documentation and readability. Documentation and readability is things like comments. It’s things like variable and method names. It’s things like alignment and indentation and whitespace and just the general readability and presentation of the code, which is something that I think almost all CS1 courses assess. Not all do a great job of teaching and that includes us. We want to be doing a better job of explicitly teaching those skills, but we do assess on them.

Kristin [20:45] So I have a question about the first one. What is behavior? Because my brain goes to student behavior.

Brett [20:53] No, you’re right. It is program behavior. It is basically does your program do the things we want it to do. This is the most auto-graded of our four dimensions. It is still not entirely auto-graded, but we do have a large suite of tests. We expose most, if not all of them to students. But we also do a few manual checks. Sometimes it’s because there are things that it’s just hard to write automated tests for. Sometimes it’s because there are things that the automated test might detect as incorrect in terms of the output, but there is some ambiguity in the spec and we want to respect that, we found that ambiguity and we want to have different interpretations of it. There are lots of different reasons we might do manual checks on that, but the behavior is mostly but not entirely auto-graded. And then we have some scripts to help with the other dimensions. We have what essentially amounts to a linter to check for some of our common readability and use of language features things. But we still have TAs do manual checks on all of that because it turns out linters are hard and I have not seen or heard of one that perfectly detects exactly the sorts of things you wanted to use and never gets it wrong. So we still want that human oversight there.

Kristin [22:09] So you’ve told us what the four pieces are. So, I think another nuts and bolts question. Is excellent work, like, passed all the autograded tests and like S level work 80%, 75%, like what is that number?

Brett [22:26] So the way we do it at this point, and this is something we’re continuing to refine as we develop the system, we get it in front of real students and we see what kind of turns up. The way it stands right now, E is passing everything or almost everything. We might make some very small exceptions, but by and large, you can think of it as passing everything. S, rather than setting a percentage of like pass 80% of the tests or something like that, we identify certain tests or certain behaviors that we consider to be more edge cases or more challenging or for whatever reason, we decide that if you fail this test, you still demonstrated satisfactory mastery, but not exemplary mastery. As opposed to there are other tests you would say if you fail this test, you did not demonstrate mastery yet. And so you could imagine it’s something like, our for loops assignment has a lot of nested loops, we’re producing ASCII art, that’s a very kind of structured set of output. And you could imagine that, like if a student produced output that does not have any of the repetitive or nesting structure that we were looking for, that one test is probably going to require an N grade because they did nothing with loops, which was the crux of the assignment. So they have not demonstrated mastery of the concepts for that assignment. Whereas if they have more or less the correct structure, but like there’s some nesting that’s weird, or they have a couple of characters wrong in their output at one end or the other, we might consider that S where they’ve demonstrated some mastery, or demonstrated a decent amount of mastery in being able to produce something that looks more or less like what we want it to look like. But it’s not exemplary mastery because they have some errors around the edges. So we don’t do it just in terms of number of tests. We look at the specific tests or the specific cases and decide whether that’s S level work or N level work.
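
One way to picture grading by which tests fail rather than how many is to tag each test with a tier. The Python sketch below assumes two hypothetical tiers, "core" and "edge", and made-up test names. It simplifies Brett's description: it treats E as passing everything and does not model U or the manual checks.

```python
def behavior_grade(results, tiers):
    """Grade the behavior dimension from autograder results.

    results maps a test name to True (passed) or False (failed).
    tiers maps a test name to "core" or "edge" (hypothetical labels).
    Failing any core test gives N, failing only edge tests gives S,
    and passing everything gives E.
    """
    failed = [name for name, passed in results.items() if not passed]
    if any(tiers[name] == "core" for name in failed):
        return "N"
    if failed:
        return "S"
    return "E"


# Hypothetical tests for a nested-loops ASCII art assignment.
TIERS = {
    "uses_nested_structure": "core",
    "row_count": "core",
    "trailing_characters": "edge",
}

if __name__ == "__main__":
    print(behavior_grade(
        {"uses_nested_structure": True, "row_count": True, "trailing_characters": False},
        TIERS,
    ))
```

A fixed percentage threshold would treat a missing loop structure and a stray trailing character as equal failures, which is the outcome this tiering is meant to avoid.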

Kristin [24:34] Could one of you describe what a bundle is? Like, what is a B bundle versus an A letter grade bundle? Just to kind of help me wrap my head around it.

Kevin [24:44] Yeah, I think this varies, again, between our courses quite a bit. A bundle could include things like, if you have all that ESNU counting in Brett’s version, if you have eight assignments, then each of those has four dimensions, so that’s 32 dimensions. And so out of those 32 dimensions, you might ask, well, maybe a 4.0 requires like 30 out of 32 being an E, and that might be that part of the bundle. You might have other parts that you care about. Maybe you want students to do a midterm exam or final exam, which this alternative grading system is also compatible with. You might say, hey, you should also get like a 60 percent on or 70 percent or 80 percent whatever grade cutoff that you feel is appropriate for that exam component. You could say, in my bundle for 4.0, you should get an 80 percent on this exam at the end of the quarter, or in some sense, that assessment of that knowledge. So that bundle for a certain grade could involve counting a number of E’s or S’s or N’s or U’s, counting the number of those that appear that you would think is appropriate for that level of understanding or that final grade.
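
Kevin's counting example can be sketched in Python as a lookup over bundle definitions. Only the 4.0 row (30 of 32 E's plus 80 percent on the exam) echoes his example; the other rows, the `Bundle` class, and the `final_grade` function are hypothetical, and the interpolation between published bundles that Brett mentions is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    grade: float         # final grade this bundle guarantees
    min_e_count: int     # E's required across the 32 dimension grades
    min_exam_pct: float  # exam percentage required


# Only the 4.0 row follows the example in the discussion; the rest are invented.
BUNDLES = [
    Bundle(grade=4.0, min_e_count=30, min_exam_pct=80.0),
    Bundle(grade=3.0, min_e_count=20, min_exam_pct=70.0),
    Bundle(grade=2.0, min_e_count=10, min_exam_pct=60.0),
]


def final_grade(dimension_grades, exam_pct, bundles=BUNDLES):
    """Return the highest bundle grade whose requirements are all met, else 0.0."""
    e_count = dimension_grades.count("E")
    met = [
        b.grade
        for b in bundles
        if e_count >= b.min_e_count and exam_pct >= b.min_exam_pct
    ]
    return max(met, default=0.0)


if __name__ == "__main__":
    # 8 assignments x 4 dimensions = 32 grades, 30 of them E.
    grades = ["E"] * 30 + ["S"] * 2
    print(final_grade(grades, exam_pct=85.0))
```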

Kevin [25:45] It can also differ a little bit. Something I’ve been experimenting with now is trying to make the bundles a clear abstraction because one of the challenges that we came up with is when you go through that process as a student, there’s a lot of stuff to keep track of. For you to count all these different things and probably you need your own spreadsheet to manage it, or your own notes. And so I’ve been thinking like, can we make the bundles themselves an abstraction that we present to students, so they can actually see, have I completed this bundle? So the way I’m organizing my courses nowadays is I’m using what we call modules, which is basically a couple of weeks of a course and I say, to get a 2.0, you have to complete the first two or first three modules of the course. And that just involves getting check marks or satisfactory grades on all the components and the nice thing is that I just have it synced in our learning management system. So once they get something done in the learning management system, it’ll show up. And then once they finish the entire module, the learning management system says you’re done with the module, too. So they can actually just very clearly see that, let’s make that clear to them.
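
The module-based version can be sketched as counting how many leading modules are fully satisfactory. In the Python below, the rule that a 2.0 needs the first two modules follows one of the options Kevin mentions; the other cutoffs, the component names, and the function names are hypothetical, and the learning management system sync is not modeled.

```python
# Hypothetical cutoffs: grade -> number of leading modules required.
# Only the 2.0 entry follows the discussion.
REQUIRED_MODULES = {2.0: 2, 3.0: 4, 4.0: 6}


def completed_prefix(modules):
    """Count leading modules in which every component is satisfactory.

    modules is an ordered list of dicts mapping a component name to
    True (satisfactory) or False (not yet).
    """
    count = 0
    for components in modules:
        if not all(components.values()):
            break
        count += 1
    return count


def grade_from_modules(modules, required=REQUIRED_MODULES):
    """Return the highest grade whose module requirement is met, else 0.0."""
    done = completed_prefix(modules)
    met = [grade for grade, needed in required.items() if done >= needed]
    return max(met, default=0.0)


if __name__ == "__main__":
    progress = [
        {"quiz": True, "project": True},
        {"quiz": True, "project": True},
        {"quiz": True, "project": False},  # project is still "not yet"
        {"quiz": True, "project": True},
    ]
    print(completed_prefix(progress), grade_from_modules(progress))
```

A student only has to answer one question per module, whether it is done, which is the simplification Kevin is after.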

Kevin [26:43] So there’s definitely different approaches to thinking about bundles and depends a little bit on the complexity of your system and what you want to count, basically. You can also be more specific and say, I think it is really important that students master or understand or show proficiency on this particular assignment, so I want everyone to have done this one. So you can definitely be more specific about that. And I think that’s one area that we still haven’t, I may be just starting to think about now with the module-based system, because if you’re counting, those E’s can come from any assignments on any dimensions. Like maybe you actually want everyone to get an S on behavior because you think it’s really important that they write correct programs by the end of the quarter. And so you can actually say that’s an important value to my bundle.

Brett [27:24] As another kind of concrete example of that. And Kevin’s absolutely right that one of the nice things with these bundles is you can be very explicit about what you want students to achieve or complete for each grade, as opposed to being more fungible in a points-based system, again to steal Feldman’s term. And so, like, one of the things that I do is I have these four dimensions. For the highest levels of grades in my class, I care not only that you get a certain number of E’s across all of your assessments, but I also care that those Es are relatively evenly distributed across my four dimensions. So the idea there is that if you’re going to get a very top grade in the course, I want you to be well-rounded across all my learning objectives. I don’t want a student to be able to, for example, always get the behavior exactly correct and not care at all about documentation and readability. So that’s a priority that I have made in my bundles. Doesn’t have to be that way. You could also decide, like Kevin said, that I don’t care about all your other three, but you’d better get an E on behavior every time because if you’re going to get a really high grade in this course, you better be able to produce functionally correct programs. You could decide that that’s your priority and any number of other things. I think it’s really unique what Kevin just said. I hadn’t thought about this before, about you could pick an assignment and say, like, the seventh programming assignment is a really big one that kind of puts all the things together. If you want to get a really high grade in this course, you had better do a really good job on that assignment and you could decide to prioritize that. So there’s a lot of flexibility here.
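
Brett's "well-rounded" requirement can be expressed as two checks: a total number of E's and a minimum number of E's in each dimension. The Python sketch below uses the four dimension names from the discussion, but the thresholds and the function name are hypothetical, not the actual bundle definition.

```python
DIMENSIONS = [
    "behavior",
    "functional decomposition",
    "use of language features",
    "documentation and readability",
]


def is_well_rounded(assignments, min_total_e, min_e_per_dimension):
    """Check a top-grade bundle that requires E's spread across dimensions.

    assignments is a list of dicts mapping each dimension name to an
    ESNU letter. Both thresholds are hypothetical parameters.
    """
    e_counts = {dim: 0 for dim in DIMENSIONS}
    for grades in assignments:
        for dim in DIMENSIONS:
            if grades[dim] == "E":
                e_counts[dim] += 1
    enough_overall = sum(e_counts.values()) >= min_total_e
    enough_everywhere = min(e_counts.values()) >= min_e_per_dimension
    return enough_overall and enough_everywhere


if __name__ == "__main__":
    # Always correct behavior, never an E on documentation and readability.
    lopsided = [
        {
            "behavior": "E",
            "functional decomposition": "E",
            "use of language features": "E",
            "documentation and readability": "S",
        }
    ] * 8
    print(is_well_rounded(lopsided, min_total_e=24, min_e_per_dimension=6))
```

Swapping the per-dimension check for a rule such as "E on behavior every time" would encode the alternative priority Brett describes.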

Kristin [28:59] Yeah, this is reminding me of for our intermediate data science class, one of the things I’m dabbling with is I’m going to convert each week of things into like a module. And then I’ll tell the students that, like, if you do X number of modules or the first X number, you will get a B. And if you want to get an A, you got to do a group project, like if you don’t want to do a group project and you don’t need an A, you can skip that part. And the other thing is that I’d have those X required modules done by a certain point of the semester. And then for students who don’t get it done by that point, I was thinking, OK, you don’t get to do the project, but now you can stretch your deadlines all the way to the end of the semester. So you can get some kind of B and now you have an extra three or four weeks to get it all done because you’re not going to do the project.

Kristin [29:51] I had a thought that I wanted to articulate that I was learning when I was reading about specifications grading, that I really liked how she described it in the book where instead of thinking in terms of bundles, when it comes to assigning letter grades, one way to think about it is either the number of hurdles the students have to achieve or the height of the hurdles that the students have to jump over, though I don’t like the word jump over because it makes it sound like just grunt work rather than mentally achieving some kind of learning, but that was kind of a metaphor that she kept using. So if a student wanted to get an A, they’d have to do X number of hurdles. And if they want a B, they can do X minus two number of hurdles, that kind of thing. That was one way to do it. The other way to do it would be if you want an A, you have to be able to jump over, air quotes, this height of every hurdle to get an A. But you can jump over this height of every hurdle, which is slightly shorter, to get a B. That was how she kind of framed it. And I really liked that framing because it helped me think about how that means I could have a different number of hurdles per bundle or different heights per bundle and also mix and match as necessary. So I want to get that metaphor out there and see, does that resonate for you all?

Brett [31:08] I think it does for me, and I think it kind of gets at, I keep coming back to this idea of points being fungible or interchangeable. And that’s not an inherent problem with points, but it is an issue with the way we tend, as educators, to use points. The way I do my system with the different dimensions, or the way Kevin does his system with just the one grade on each assignment, or what you’re describing here, Kristin, with these different types of assignments that mean different things, it’s all different versions of communicating to students what our priorities are for them in terms of the work we’re asking them to do. So we are able to signal to students that this is what we think it means to achieve this certain grade or this is what we need to see you demonstrate to achieve this certain grade. And by being very clear about that up front, this is where we get increased student agency. We are telling them, if you want a B, I’ll use your example, Kristin. If you want a B, you need to complete this many modules. I am telling you that up front, I am making you that promise, complete this many modules, you will get at least a B. If you want an A, you also need to do a group project. If you don’t want to do a group project, that’s your choice to make. But that means that you are not going to get an A. And now it’s in the students’ hands, what choice they make there. We are giving them the ability to make those trade-offs for themselves when they have a really busy week or something like that.

Brett [32:36] And I’ll also say that, I think none of these things are incompatible with, say, a points-based grading system. None of these things rely on any of these other implementations we’ve made. You don’t need resubmissions. You don’t need ESNU or something like it. You could take, you know, a series of assignments that have existed in your course forever and are graded on a points-based scale and that points-based scale has existed forever, and you could just change the way final grades are computed. Instead of saying your final grade is this weighted average of all of these different things and 90% is an A and 80% is a B or whatever, you could say, final grades are computed by, to get an A, you must get at least 18 out of 20 on these assignments and nine out of ten on this other assignment. And to get a B you need to get 18 out of 20 on fewer assignments or you need to only get 16 out of 20 on the same number of assignments, to use your number of hurdles versus height of hurdles thing. You can do all of that and keep your assignments and the way you grade your individual assignments exactly the same as you’ve always had before. So for you, Kristin, or anyone else who’s listening, who’s like, I love this idea of bundles, but this whole mastery thing scares me. You don’t have to go all the way in on all of these other things. You could just go to a bundled version of final grade computation and do it based on a number of points earned instead of an ESNU sort of thing.
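
As a rough illustration of the points-based bundles Brett describes, here is a minimal Python sketch. The 18-out-of-20, 16-out-of-20, and 9-out-of-10 thresholds come from his example. The "two fewer assignments" allowance, the `"below B"` label, and the choice not to require the ten-point assignment for a B are assumptions made only for this sketch.

```python
def count_at_least(scores, threshold):
    """Number of scores that clear the given threshold."""
    return sum(1 for s in scores if s >= threshold)


def final_grade(assignment_scores, other_score):
    """Bundle-style final grade from ordinary point scores.

    assignment_scores: scores out of 20 on the regular assignments.
    other_score: score out of 10 on the one other assignment.
    """
    n = len(assignment_scores)

    # A bundle: clear the high bar on every assignment, plus 9/10.
    if count_at_least(assignment_scores, 18) == n and other_score >= 9:
        return "A"

    # B bundle, option 1 (fewer hurdles): high bar on all but two.
    # "All but two" is an assumed number for illustration.
    fewer_hurdles = count_at_least(assignment_scores, 18) >= n - 2
    # B bundle, option 2 (lower hurdles): 16/20 on every assignment.
    lower_hurdles = count_at_least(assignment_scores, 16) == n

    if fewer_hurdles or lower_hurdles:
        return "B"
    return "below B"
```

Individual assignments are still scored in points here; only the final step changes, from a weighted average to a check against each bundle's requirements.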

Kevin [33:59] Yeah, even for the bundle system, without going to full resubmissions, you could even say, well, you know, maybe if I want to have grades capture or represent knowledge at the end of the course, then you can say, well, maybe I want to somehow say that this grade you get on the last assignment, because it also includes skills from earlier assignments, can also override those earlier grades, or because you have this kind of opportunity to say that, you know, this last assignment, it meets the same learning objectives or covers all the learning objectives of other ones. That could be a good representation. And so that final assignment kind of enables it to be that automatic resubmission without introducing more work.
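
One way to read Kevin's "automatic resubmission" idea is sketched below in Python. It assumes a single ESNU mark per assignment, and assumes that the instructor lists which earlier assignments the final one covers. The function and parameter names are hypothetical, and the rule of only ever raising a mark is an assumption of this sketch.

```python
# Rank the ESNU marks so they can be compared.
ORDER = {"U": 0, "N": 1, "S": 2, "E": 3}


def apply_final_override(marks, final_name, covered):
    """Return a copy of marks in which the final assignment's mark
    replaces any lower mark on the earlier assignments it covers.

    marks: dict mapping assignment name -> "E", "S", "N", or "U".
    final_name: key of the cumulative final assignment.
    covered: names of earlier assignments whose learning objectives
             the final assignment also assesses.
    """
    final_mark = marks[final_name]
    updated = dict(marks)
    for name in covered:
        # Only raise marks; a weaker final never lowers earlier work.
        if ORDER[final_mark] > ORDER[updated[name]]:
            updated[name] = final_mark
    return updated
```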

Kristin [34:35] I wonder if it would be a useful exercise to just, like, come up with the new grade calculation and just run it on, like, the prior semester’s gradebook just to see what it does. Though I think if I ever did that, I would first have to ask myself, like, what would I expect to see? What would I want to see and what would I be upset by, before I do that, because it’s so easy to kind of rationalize the results once you see them.
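
The exercise Kristin describes could look something like the following Python sketch, which tallies how many students would move from each old letter grade to each new one. The gradebook layout, both toy policies, and all cutoffs are assumptions for illustration only.

```python
from collections import Counter


def old_policy(scores):
    """Toy weighted-average policy on scores out of 20 (assumed cutoffs)."""
    percent = 100 * sum(scores) / (20 * len(scores))
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    return "C"


def new_policy(scores):
    """Toy bundle policy: every score must clear the bar (assumed cutoffs)."""
    if all(s >= 18 for s in scores):
        return "A"
    if all(s >= 16 for s in scores):
        return "B"
    return "C"


def compare_policies(gradebook, old, new):
    """Count students by (old grade, new grade) pair.

    gradebook: dict mapping a student identifier -> list of scores.
    """
    changes = Counter()
    for scores in gradebook.values():
        changes[(old(scores), new(scores))] += 1
    return changes
```

Following Kristin's caution, it may help to write down which `(old, new)` pairs you expect and which would worry you before looking at the tally.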

Kristin [34:57] All right. So I want to be careful of time, and normally I would have asked you, like, as we transition out of the pandemic, what are you changing and keeping the same in your teaching and why? But I feel like we’ve covered a lot of that already. So instead, I think I’ll just do TL;DL, too long, didn’t listen. What would you say is the most important thing from part two of our two-part episode that you want our listeners to get out of our conversation?

Kevin [35:21] Yeah, so for my too long didn’t listen, I think what is most important to me is to think about what are your constraints and have that conversation about like, it might be a very different story at your university, honestly. Like at our university, UW, we have the 4.0 scale. And then based on that constraint, then I have to, and the reasoning being that you have to design on 0.1 increments, then I have to design, to some extent, my grading system around that assumption. But if you’re on a system with ABCDE and there are also even, you know, some universities, some schools that don’t even have plus minuses, that gives you a lot more freedom to say, well, maybe I don’t actually want that many grades in my assignments, or maybe I want to have students focus on feedback. What are mechanisms for enabling that? You know, the mechanisms could involve grades. I like how Brett was mentioning earlier that you could have grades as a mechanism for helping summarize the feedback that’s in your system. But you have to be thoughtful about how do I communicate that still, especially when I have multiple grades in an assignment. But it is an opportunity to rethink, you know, what do I actually want my different parts of my grade to represent and what do I actually need in terms of determining the final grade? I would think about what are your constraints? And then based on those constraints, what could I work with inside that design space, whether you’re thinking about a number of hurdles, the height of the hurdles or some combination of those questions.

Brett [36:40] Kevin talked about some of the things I was going to talk about, so I will talk about what I expected Kevin to talk about, and it’s related to what Kevin said, which is there are varying levels of complexity you can have in grades. And I think it’s really important when you’re designing any grading system, be it a very traditional weighted average points-based system or something that’s mastery-based or broader or coarser like we’ve done or anything anywhere on those various spectra. Think about how much complexity you want to have for yourself as an instructor, for your course staff and TAs, if you have them, and for your students, and think about what value that complexity is adding. Kevin is absolutely right that having those four dimensions, like I have, adds complexity. His system is much simpler, having a single grade per assignment. I have made the choice that that complexity is worth it for me because it allows me to communicate a little bit more signal to students. But it also absolutely makes it harder for students to keep track of what’s going on. And it adds some complexity to the grading load for my TAs. And so I have thus far decided that trade-off is worth it. I will be constantly revisiting that choice in light of new information as we run this more and more. And for any individual instructor, it’s about thinking about what their priorities are and how that works at whatever scale you happen to be working at. It’s a little interesting that I tend to teach at the larger scale, but I’ve also chosen the slightly more complex grading system. Because you would normally think that it would be the other way around, like the larger scale would drive me to a simpler system. And so, it can be done at scale, we know because we’ve done it. It’s not trivial and it absolutely requires effort there, but it is doable.

Kristin [38:29] All right, well, thank you so much for joining us, Kevin and Brett.

Brett [38:34] Thanks, Kristin.

Kevin [38:36] Thanks, Kristin.

Kristin [38:37] And this was the CS Ed Podcast hosted by me, Kristin Stephens-Martinez, and produced by Amarachi Anakaraonye. And remember, teaching computer science is more than just knowing computer science, and I hope you found something useful for your teaching today.