S1xE2: Designing Exams with Dan Garcia

December 23, 2019

In this episode, we talk with Dan Garcia, a teaching professor at UC Berkeley in the EECS Department. He was selected as an ACM Distinguished Educator in 2012 and ACM Distinguished Speaker in 2019. He has won all four of his department’s computer science teaching awards.

Our conversation focused on designing exams, which he boiled down to his five-finger rule: (1) material coverage, (2) reasonable time, (3) range of difficulty, (4) variety of question types, and (5) ease of grading.

His “something awesome in computer science” highlighted his mentors Mike Clancy and Brian Harvey, who are both emeritus teaching professors at UC Berkeley. Mike taught him to use a variety of question types on his exams, while Brian taught Dan his philosophy about grades and grading in general.

Dan’s Too Long; Didn’t Listen (TL;DL) distilled this five-finger rule into an excellent short sound bite.


Transcript

Kristin: Hello, and welcome to the CS-Ed podcast, a podcast where we talk about teaching computer science, with computer science educators, to learn how they teach and manage their classrooms. I am your host, Kristin Stephens-Martinez, an assistant professor of the practice at Duke University. And joining me today is Dan, a teaching professor at UC Berkeley. Dan, tell us about yourself. What do you teach? How many students do you have?

Dan: Sure, hi! So happy to be on your podcast! As teaching professors at UC Berkeley, we typically teach the lower-division classes, plus maybe the teaching techniques class and the ethics class. That’s kind of my rotation: usually one big class and one small class. I’m mostly in the rotation for CS0, The Beauty and Joy of Computing, a course that we invented. That’s around 200 or 300 students a semester. These are semesters, not quarters, by the way: 15-week semesters. I also teach our CS1 class, although we call it CS1.5 because there’s just so much in one class. This is CS61A, The Structure and Interpretation of Computer Programs. I taught it in the spring to 1,300 students, and in the fall it’s going to be 2,000, so I believe it’s one of the largest classes in the country.

Kristin: Wow… wow!

Dan: And I’m in awe of people like John DeNero, who teaches it in the fall, because that’s when it’s 2,000. I teach it in the spring. I just recently taught it for the first time in 20 years or so, and in the spring it was only 1,300, so I’m thankful it wasn’t 2,000. And by the way, our largest classroom on campus holds 800, so we’re challenged. We could talk about that; most people actually just watch it on YouTube. I also teach our architecture class, which we’ll call CS3, following CS1 and CS2, with Data Structures being CS2. Our CS3 architecture class is called Great Ideas in Computer Architecture! That name was coined by David Patterson, a Turing Award winner, and that’s around 1,000 students. I also teach an animation course to exactly 24 students. We make two teams of 12, and we teach for a full year, so it’s actually two back-to-back semesters.

Dan: I teach it for a full year, every other year, to try to keep the animation spirit and community going on our campus. I’ve also been in the rotation for the ethics course, The Social Implications of Computing, which has about 250 students. It’s a lot of fun! And as for the small classes, you know, all of us are experiencing these enrollment pressures. A small class used to be, say, a teaching techniques class of 25 to 50 students. Now the small class is the ethics class with 250. So the big class is 1,000 and the small one is 250. But we’ve been team-teaching that one, and it only meets once a week, so it actually isn’t that bad.

Kristin: Could you talk a little bit about what a teaching professor is? Because I know the UC system is kind of unique in that respect compared to a lot of universities.

Dan: That’s a great question. A teaching professor is a faculty rank where research isn’t the main thing used for promotion. The criteria for promotion are teaching quality, curriculum development, innovative curriculum development on the local and national scale, outreach, and scholarship as well. But scholarship isn’t necessarily papers. It’s about impact; Charles Isbell of Georgia Tech always talks about impact as the key measure of a faculty member. So the impact could be that you’re promoting a lot of best practices, or maybe you’re sharing software. The nice thing about being a teaching professor is that it’s not the standard model, where research is 90 percent of it, plus a little bit of service and a little bit of teaching. You know, maybe that’s not exactly true, but that’s the idea.

Kristin: Let’s talk about exam creation. This is something I’m always interested in talking to people about, because my dissertation focused so much on how students predict the output of code. I looked at a lot of wrong answers, so I feel like I’m pretty good at creating multiple-choice distractors at this point.

Dan: Oh, okay.

Kristin: But I’d love to hear your process for creating an exam.

Dan: I think this has been a recent movement. I’ve certainly been bitten by the bug to move toward multiple-choice exams rather than open-ended exams. I spent five years on the CS Principles Development Committee, and there were many, many cases where I felt like, you know what? This works if you have good enough distractors: you take the common misconceptions that students put in as answers and use those as the distractors, obviously. People who’ve been doing this long enough know that students will swap an “and” for an “or,” or get a true or false wrong, or switch the order or not.

Dan: So we get pretty good at writing distractors, hopefully. And if not, you know, the students win. It means it’s not the perfect assessment, but the students win if the distractors aren’t perfect. You were asking about the process, so I’ll give you some sense of scale. I’ll start by saying I really enjoy writing exams. I don’t know what it is. Maybe it’s the puzzle guy in me: I get to be the person thinking about how to create the puzzle. I think it’s an art form: how to create a really good assessment question that gets to exactly the essence of what you want to test. You know when you’ve created a bad question, because either everyone gets it right or everyone gets it wrong, or it’s confusing and you have to throw it out. There are 100 ways to create bad questions, but a really well-crafted question has some beauty to it. So I spend a lot of time on that, and when I know I have an exam coming up to write, I’ll use every spare moment thinking about questions. It’s really fun! This is why I love my job so much: my idle thoughts are about fun things. “How would I ask... ooh, okay, what if I did...” In the shower, on my commute. I usually take public transportation to work, but if I happen to drive one day, by the time I get there an hour has gone by and I didn’t notice it. You know how, when you’re having a good time and your thoughts are so pleasurable, you don’t even realize the time is passing? That’s what I’m saying. But if I didn’t have enough time to prep, and now I’ve got an exam in a couple of days, it’s crunch time. That’s when it’s not as fun, right?

It’s a lot more fun to have a barbecue cookout and eat all you want than to be told, “Now, in 20 minutes, eat 25 brisket sandwiches.” Time constraints really do take the fun out of it. So I really try to give myself a lot of lead time, so I have that spare time to think. But if I have to crunch on it, here’s how it works. Typically I give two midterms and a final: my first midterm is an hour long, my second midterm is two hours long, and my final is three hours long, so it goes one-two-three, and the point values follow that same one-two-three pattern. Each hour of assessment takes me about a full day, like a solid twelve-hour day, and that’s when you have to borrow time from the family, so it’s hard. During the semester I’m fortunate to be able to arrange my schedule so that Tuesdays and Thursdays are my workdays. I say “workdays”; I mean that’s when I get to write exams. I’m in meetings and teaching on Mondays, Wednesdays, and Fridays. So on a Tuesday or Thursday I’ll put in a full eight- or nine-hour day, spend the evening with the family, and then go back to work. That’s about 12 solid hours to create a one-hour assessment. So if I know my first exam is coming up, I need one full day for it; I know I need a Tuesday and a Thursday for my second midterm, and so on. That’s a fair bit of work. Even though I’ve been doing this for 27 years, it still takes me that much time. Maybe it’s because I’m slow? I’ll always grant that. Or maybe it’s because I put a lot of effort into it.
I usually try to write a new exam every time. I try not to borrow from older exams, because you never know when copies got out, and you never know how unfair that’s going to be if somebody has a copy. I do lean on them every once in a while, though. The nice thing is that I now have enough old exams that if push comes to shove and I get sick (say I get mono during my prep week and can’t do anything), I can always lean on old exams: grab a piece from this year, grab a piece from that year. It’s nice having those resources, and no student is going to go through 25 years of exams. I can also grab pieces from other folks. There are a lot of exams online, so I can grab my own, or grab someone else’s question that I really liked and put that in.

And maybe modify it in a small way. So the point is, it takes a long time. That’s a long answer to a short question: building these exams is a fair investment of my time.

Kristin: Yeah, when I think about how long it takes me to write an exam: both of my midterms are 75 minutes, and it takes me about the same ratio. I can’t make an exam in one 12-hour day; I have to spread it out over the course of a week or two. But it takes me 12 to 15 hours to make an exam.

Dan: Yeah. It takes a while, especially because to make a really good exam, if there’s any code in it, you want to write it yourself, play with it, and make sure it works. As I said, there are a lot of ways to make a bad exam, and one way is to never test it, right?

Kristin: Yes.

Dan: “Oh, here’s some basic code.” “Well, it had nine errors, and I didn’t realize it.” Yeah, you’ve got to test it, and that’s often how you make exams: you make an interesting question, pull pieces out, and put those in there. But you mentioned earlier that sometimes you ask students to predict the output. That’s something I want to talk about, because that’s just one of 15 ways you can approach questions on coding exams.

It’s funny: we spent a fair bit of time in our teaching techniques course (actually the advanced one, where I’m preparing future instructors) talking about exam writing, and how easy it is to fall into the trap of only writing “predict the output” questions or “write the function” questions. I’m not saying you do this. “Write a function that tests whether a list is in order.” It’s just really easy to fall into the trap of “write a function, write a function, write a function,” or “predict the output, predict the output, predict the output.” There are so many other kinds of questions that I think are really important to add. So the other piece is that as I’m writing the exam, all these parts of my brain are firing: “Well, that question was a predict-the-output. So that means I owe the students a write-a-function, and I also owe them a debugging question.” There are a lot of other elements I try to hit, and that’s where my five-finger rule comes in: as I’m writing, I’m asking, “Am I checking all five boxes of my five-finger rule?” I’ll talk about that in a moment.

Kristin: So, what is your five-finger rule?

Dan: Ah!

Kristin: You can go straight into that.

Dan: Yeah, I’ll jump right in.

Kristin: If you’d like to talk about it.

Dan: Sure, I’ll jump right in. This evolved over the course of years. I haven’t seen it anywhere else, so I’m happy to share it, and I’m also happy to learn that I’m missing a finger and there should be a sixth. These aren’t in any particular order; number one isn’t the most important. Some of these are going to be obvious to most seasoned instructors, so I don’t claim to have cornered the market on these ideas, but taken all together, maybe this is a unique way to look at it. The first, and most obvious, is coverage of the material between the last exam and this exam. You’ve covered some material, and you obviously have to make it very clear what is in scope for the exam. Then you have to look at how much time you’ve put into each of these topics, and make the percentage of both exam time and points reflect that. As I said, there are 100 ways to make a terrible exam, and another one would be to heavily emphasize, and heavily weight in points, something you talked about in passing in the last lecture, right? That would be crazy.
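The coverage finger ties each topic’s share of the exam to the time spent on it in class. Here is a minimal Python sketch of that arithmetic, assuming you know the lecture hours per topic; the helper name `allocate_points` and the largest-remainder rounding (which keeps the total exact) are illustrative choices, not something Dan describes:

```python
def allocate_points(lecture_hours, total_points):
    """Split total_points across topics in proportion to lecture hours.

    Hypothetical helper: floors each topic's exact share, then hands the
    leftover points to the topics with the largest fractional remainders
    so the allocation always sums to total_points.
    """
    total_hours = sum(lecture_hours.values())
    exact = {topic: total_points * hours / total_hours
             for topic, hours in lecture_hours.items()}
    points = {topic: int(share) for topic, share in exact.items()}  # floor (shares are non-negative)
    leftover = total_points - sum(points.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - points[t], reverse=True)
    for topic in by_remainder[:leftover]:
        points[topic] += 1
    return points


# Three topics covered for 3, 2, and 1 hours on a 60-point midterm.
print(allocate_points({"recursion": 3, "trees": 2, "iterators": 1}, 60))
# {'recursion': 30, 'trees': 20, 'iterators': 10}
```

For a cumulative exam weighted toward recent material, one option is to scale up the hours of the newer topics before calling the helper.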

So you make it balanced between the time given to the material in class and its share of the exam. The other question you have to ask about coverage is, let’s say you’re writing midterm two: are students only being tested on the material between midterms one and two, or is it cumulative? You need to think about that and make it clear to the students. Do they not need to worry about midterm one material, because they were already tested on it? Or will they see it again? I love doing that, by the way. Not because I like having students struggle; I certainly want all of my students to ace every exam. In fact, I tell my students, “If at the end of the year my histogram isn’t all of you getting a perfect score, I’ve failed, because it means I didn’t set up the ecosystem, I didn’t teach you well, or I didn’t set up a learning environment for you all to succeed.” But the idea is that you make it clear whether the material is cumulative or not. And it gets interesting if it’s not. If you say, “No, it’s only since midterm one,” well, the easier answer is to say it’s always cumulative, because you never know when an element from midterm one is needed for a question on midterm two–

Even though it’s not explicit. “No, you said we didn’t need it!”

“Well, but you kind of still–” So I think the default is to always say it’s cumulative. You weight the new material more heavily, but it’s cumulative overall. But I love doing–

Kristin: Yeah, that’s the–that’s what I use.

Dan: Yeah. I–

Kristin: Heavily weighted towards the more recent stuff.

Dan: Exactly. But the other thing is, if students missed a question on midterm one, I really like the idea of putting a small twist on it and asking them again. Not exactly the same question, but you ask them again. So the first finger is coverage. The second finger is time. When you beta test these exams, you should expect your TAs to finish in one-sixth of the time. That’s really hard, and–

Kristin: That’s really short.

Dan: Yeah, that’s really short. It’s like: hope for the best, but plan for the worst. You hope for a sixth, but you live with a fourth. You’re really trying to lock it down, but no worse than a third. And I really mean that the slowest TA gets it done in a third of the time. I want to credit Julie Zelenski and Nick Parlante, who used to share best practices for making exams in some teaching-track sessions at SIGCSE. They talked about a lesson they’d learned, and when I heard it I said, “Oh man, that’s so right!” I hadn’t been doing it, so I was really thankful they shared it: cut out all the stories. I used to love setting up a question with, “Bob lives in a dungeon, and there are dragons, and you want to fight the dragon!” It’s this complicated story that has nothing to do with the question, other than letting me set a context and motivate it. No. If you want them to sort a list, make them sort a list! You don’t have to tell a story about the list, the dragon, and the knights who all got eaten. Just have them jump straight into it. Take all the air out. One way to think about abstraction is as detail removal. Jerry Seinfeld talks about taking all the air out of a joke: you remove every extraneous word and every extraneous pause, so you have the fewest words that get the point across. I think of exactly that with exam and question creation. You want to pump all the air out of the system, so there isn’t any flowery story around the question. It’s exactly the minimum needed to get into the meat of the question and work with it. You certainly need to add constraints and boundaries around these problems sometimes, but it’s about taking all the air out and keeping the wording minimal.
If there’s more than a paragraph to read before you’re ready to go, it’s too much.

Kristin: Yeah, I think I’m a little guilty of having a bit too much backstory–

Dan: Yeah, no, no–

Kristin: in my exam questions. And I’m getting–

Dan: And–

Kristin: And I’m trying to get better at weeding it out.

Dan: That’s exactly right. And I was bad at that when I was young as well. I think it’s one of the things you gain with experience. Part two of time is that the ratio between points and time should be roughly consistent.

So yes, as I said, you can make a bad exam where one question, worth the whole thing, takes no time, and you either get it or you don’t. So you want to make sure the ratio of points to where students spend their time in the exam is appropriate. This is why it’s useful to beta test these exams with a lot of your staff beforehand, and ask them to look through that lens. When they’re giving you their TA feedback, first they should be a student and take it as a student. But then they should be a TA and tell you: does the ratio of points to time feel right? And one last piece: I’ve found it useful sometimes to put estimated times on exams. If there’s a really heavy question, students might not realize that you expect it to take half the exam, right? It might not be half the pages, but it’s half the exam in terms of where you want them to focus their time. So if you say, “This is a two-hour exam, and this is a 60-minute question,” they think, “Oh, I see! 60 minutes!” and plan accordingly. That gives them a heads-up, so they don’t get drowned. A lot of poor performance comes from people getting stuck on the first question and not being able to get out, right?

They go down a rabbit hole, and all of a sudden, “Oh boy, there’s no time left,” because question one was so juicy or so interesting, or they didn’t take time to look up; they just got into it, and now they can’t finish the exam. That’s terrible. So this is why–

Kristin: Yeah.

Dan: This is why you go for the sixth, or the fifth, or the fourth. I try to stress to young teachers, “Try to have everyone done with your exam before the bell goes off.” If the classroom is empty by then, you’ve done great on the time-management part.
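Dan’s second point about time, keeping the points-to-time ratio roughly consistent across questions, is easy to check once beta testers report per-question times. A minimal sketch, assuming each question is recorded as a `(name, points, estimated_minutes)` tuple; the function name and the 50 percent tolerance are arbitrary illustrative choices, not rules from the episode:

```python
def flag_ratio_outliers(questions, tolerance=0.5):
    """Return names of questions whose points-per-minute strays from the exam-wide ratio.

    questions: list of (name, points, estimated_minutes) tuples.
    tolerance: allowed relative deviation from the overall ratio (0.5 = 50%).
    """
    total_points = sum(points for _, points, _ in questions)
    total_minutes = sum(minutes for _, _, minutes in questions)
    overall = total_points / total_minutes
    flagged = []
    for name, points, minutes in questions:
        ratio = points / minutes
        if abs(ratio - overall) > tolerance * overall:
            flagged.append(name)
    return flagged


exam = [("Q1", 10, 10), ("Q2", 20, 20), ("Q3", 30, 5)]
print(flag_ratio_outliers(exam))  # ['Q3']: 30 points for only 5 minutes of work
```

A flagged question is a prompt to rebalance, either by changing its point value or by trimming or extending the work it asks for.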

Kristin: Yeah, I also have my TAs beta test my exams. I actually do two rounds: there’s one round–

Dan: Sure, sure, sure, sure.

Kristin: And then I fix it and there’s another round.

Dan: Yeah, we go through–we go through a couple of rounds as well.

Kristin: Yeah. And I have them fill out a spreadsheet telling me how many minutes it took them to do every single question. Then I do some extra math; usually I multiply by three. I probably should multiply by four, because I’m definitely seeing from my past year of teaching that my exams are a little too long.
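The spreadsheet arithmetic Kristin and Dan describe can be scripted. A minimal sketch, assuming one row of per-question minutes per TA; the function name is hypothetical. The `multiplier` encodes Dan’s targets: the slowest TA should finish in at most a third of the exam time (a multiplier of 3, which matches Kristin’s current practice), and ideally in a quarter to a sixth:

```python
def time_budget(ta_minutes, exam_minutes, multiplier=3):
    """Estimate student time from TA beta-test times.

    ta_minutes: list of rows, one per TA, each a list of minutes per question.
    multiplier: how much longer students are assumed to take than TAs.
    Returns (per-question estimates, estimated total, fits_in_exam).
    """
    # Per question, plan around the slowest TA, scaled up for students.
    per_question = [max(column) * multiplier for column in zip(*ta_minutes)]
    # For the whole exam, use the slowest TA's overall time.
    slowest_total = max(sum(row) for row in ta_minutes)
    estimated_total = slowest_total * multiplier
    return per_question, estimated_total, estimated_total <= exam_minutes


ta_rows = [[5, 10, 8],   # TA 1: minutes for Q1, Q2, Q3
           [6, 12, 7]]   # TA 2
print(time_budget(ta_rows, exam_minutes=75))                # ([18, 36, 24], 75, True)
print(time_budget(ta_rows, exam_minutes=75, multiplier=4))  # ([24, 48, 32], 100, False)
```

The per-question numbers are the kind of estimated times Dan suggests printing on the exam. They can sum to more than the total, because a different TA may be slowest on each question.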

Dan: And that’s actually my next topic, which is range of difficulty. The first point in my notes, to remind me to mention it, is that TAs write questions that are too difficult.

And when I say TAs, I mean TAs and young faculty. When I was a young professor, my exams were too hard; I was famous for having the hard exams. That’s not good. I took a little pride in it: “Ooh, I write the hard exams!” No, that’s not good. You want an exam that works for everyone from the A student to the C student. I’m not going to really focus on the D student; if you’re really a D student, maybe you shouldn’t be in my class. But A to C is kind of the safe range. So you want to have things that the C student can grab hold of. There should be questions that only B students and above can grab hold of, and there should be questions that only A students can grab hold of. Within an exam, you should even have the questions go from easier to harder, so that people don’t get stuck on a hard one early and think, “Oh God, if I can’t answer this question, I can’t do anything,” and their morale is just done, when it actually got easier after that. It’s much better to have that hard question at the end. I think it’s great to let people know that it’s a hard question. I used to write, “This is a hard question. We don’t expect everyone to finish it.” People always got upset when I said we don’t expect everybody to finish it, so I’ve taken that last part out.

But I do still say, “This is a hard question.” Then if students think, “Ooh, I’m having trouble with this,” they can also think, “Maybe I’m not the only one.” A lot of times students, especially students from underrepresented groups, feel like if they’re having trouble, they’re the only ones, and everyone else must be doing well.

Kristin: Yeah.

Dan: No, everyone else is having trouble too. Don’t think it’s only you. So I internally label each question A through C, and I try to put my Cs first. I also want to talk about partial credit. Partial credit isn’t one of my five, but it kind of goes with difficulty: if you give a hard question, it’s nice if it’s broken up into parts, so that even a B or C student can get some of the earlier parts of the hard question.

So maybe it’s a three- or four-part question. It’s not that a B or C student can’t answer anything on an A question; you want to have enough parts that they can grab pieces of it and feel like, “Okay, I got this question a little bit. I couldn’t do that extra part, because that was the hard part where you’re combining three things, but I got the idea of the early parts.”

So I try to do that. And that also leads to a sub-conversation: you don’t want cascading questions. It’s hard when a student comes back and says, “Look, I messed up A, but see how B, C, and D are consistent with it?” So here’s a really clever idea I learned. If B makes use of A’s result, then at the beginning of question B, say, “Assume from this point on that whatever you wrote in question A is called the variable A.” Then in B, if the answer is an expression, it’s written as a function of A. So whatever you wrote in the slot for A, I don’t look at anymore when grading B, because the expression for B should have the variable A in it, and it can be right. So even if you got A wrong, say it’s twice too big, or too small, or off by one, B can still be right. It’s a way to counter that cascading effect.

Kristin: Yeah, I do that with helper functions. Like, “Assume the helper function from Part A works.”

Dan: Exactly.

Kristin: So, I use it in part B.

Dan: Exactly. Exactly. Exactly. That’s great.
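Kristin’s helper-function version of the “assume Part A works” trick can be sketched in Python. Assume a hypothetical two-part question: Part A asks for `is_sorted`, and Part B asks for a function that counts sorted lists and may assume Part A is correct. When grading Part B, graders can run the student’s Part B against the reference Part A, so a mistake in A does not cascade into B. All names here are illustrative, and passing the helper in as a parameter is only a grading-harness device:

```python
def reference_is_sorted(lst):
    """Part A reference solution: True if lst is in non-decreasing order."""
    return all(a <= b for a, b in zip(lst, lst[1:]))


def student_is_sorted(lst):
    """A hypothetical student's Part A with a bug: < rejects equal neighbors."""
    return all(a < b for a, b in zip(lst, lst[1:]))


def student_count_sorted(lists, is_sorted):
    """A hypothetical student's Part B: count how many lists are sorted.

    The helper is passed in so graders can choose which Part A to plug in.
    """
    return sum(1 for lst in lists if is_sorted(lst))


cases = [[1, 2, 3], [3, 1], [2, 2], []]
# With the student's own (buggy) Part A, Part B looks wrong...
print(student_count_sorted(cases, student_is_sorted))    # 2
# ...but against the reference Part A, Part B is correct.
print(student_count_sorted(cases, reference_is_sorted))  # 3
```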

Kristin: I want to be careful of time.

Dan: Oh, yeah. Sure, sure, sure.

Kristin: So, we’ve gone through three. What’s your number four?

Dan: Four, and I learned this one from Mike Clancy, is types of questions. We talked a little earlier about making sure it’s not just predict the output or write the function. If you’re writing a coding exam, think about: What does this function do? Write this function. Find the case that reveals the bug in the function. Fix the bug. Then what happens? What input triggers the bug? Given this output, what was the function’s input? How about that? Kind of reversing it.

Kristin: Oh! I hadn’t thought of that.

Dan: Right? Ah! You learned something today. And feel free to come up with your own. I really like buggy questions where you can ask, “What input doesn’t trigger the bug?” Or, what class of inputs? How about that? Thinking about it, it’s almost like a language thing: What class of inputs doesn’t trigger the bug? What class of inputs does?

Now make the fix, and show me how it works. Or maybe there are two bugs. You say, “With the first bug, the numbers come out negative for some reason. Why is that?” So you give a hint about what the first bug is. Then, once they fix that, you say, “Now there’s still one remaining bug!” And maybe it doesn’t work for lists of length one, or some other special degenerate case.
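To illustrate the “what class of inputs triggers the bug?” style, here is a hypothetical buggy function of the kind such a question might show, its fix, and a brute-force check an instructor could run while testing the question. The function and its bug are invented for illustration, not taken from Dan’s exams:

```python
from itertools import product


def buggy_largest(nums):
    """Intended to return the largest number in a non-empty list."""
    best = 0  # BUG: should start from nums[0], not 0
    for n in nums:
        if n > best:
            best = n
    return best


def fixed_largest(nums):
    """The fix: start from the first element instead of 0."""
    best = nums[0]
    for n in nums[1:]:
        if n > best:
            best = n
    return best


def triggers_bug(nums):
    return buggy_largest(nums) != max(nums)


# Claimed answer to "what class of inputs triggers the bug?": exactly the
# lists in which every number is negative. Check it exhaustively on small inputs.
for nums in product(range(-2, 3), repeat=3):
    assert triggers_bug(list(nums)) == all(n < 0 for n in nums)
    assert fixed_largest(list(nums)) == max(nums)
print(triggers_bug([-3, -1]), triggers_bug([2, -5]))  # True False
```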

I love the idea of Parsons problems. As I said, I’ve moved toward multiple-choice exams, and Parsons problems fit that: you’re given chunks of code, and you put them in the right order. That works great for blocks-based languages, where I can have a Snap! file open with all the blocks there, and you have to reorder them the right way. It’s harder to do that on exams, so we’ve been creative about ordering things and setting them up. I really want students to have full creativity, but in the olden days we used to write questions that just said, “Write a function,” with a big huge box, and it was impossible to grade. That’s my last finger, but I’ll jump ahead to it. Because we don’t want just one big box, where you can’t grade it and there are a million different possibilities, you might provide a template with lines, like “if blank, else blank.” There are some blanks in there to help control the variety of answers you’d see. You can also say, “Here are these five things; just order them so the code does the right thing.” That’s kind of a fun thing to do.
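One way to grade a text-based Parsons problem, sketched under assumptions: assemble the student’s chosen ordering of the given fragments and test the resulting function’s behavior, so any ordering that works earns credit. The fragments, the function name `total`, and the test cases are hypothetical, and `exec` is acceptable here only because the fragments are instructor-written:

```python
FRAGMENTS = [
    "def total(nums):",
    "    result = 0",
    "    for n in nums:",
    "        result += n",
    "    return result",
]


def grade_parsons(order):
    """Return True if the fragments, arranged in `order`, define a working total()."""
    source = "\n".join(FRAGMENTS[i] for i in order)
    namespace = {}
    try:
        exec(source, namespace)  # instructor-side only: never exec untrusted code
        total = namespace["total"]
        return total([1, 2, 3]) == 6 and total([]) == 0
    except Exception:  # syntax errors, missing names, and runtime errors all mean "wrong"
        return False


print(grade_parsons([0, 1, 2, 3, 4]))  # True
print(grade_parsons([0, 2, 3, 1, 4]))  # False: result is used before it is initialized
```

Grading by behavior rather than by exact order matters once a problem has more than one valid arrangement.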

So, we were talking about cascading. The other thing I want to mention (I didn’t put this on my list) is one of my favorite kinds of questions: what could the function ever do? We had something like an L-system, a very small recursive function, and it was a dance. It said something like, “If day is less than two, return stop.” Otherwise, you make a sentence by joining pieces together, like “go left, and then the dance for day minus two.” It’s almost Fibonacci-like, the way you call yourself on day minus two and day minus one.

So you join the outputs: “left, the dance for day minus two, right, the reverse of the dance for day minus one, and stop,” or something like that. This is a dance where you move left, or right, or stop, okay? And the question you ask is: could you ever have three lefts in a row?

Kristin: Oh!

Dan: In a way, you’re asking them to see a pattern as it unfolds, and they realize, “You know what? You can have two lefts in a row, but the output of this will never have three lefts in a row!” Some of the questions I’ve asked are almost at the junior level, but I’ve asked this one to non-majors, and people can get it! People can say, “You know, it’s the pattern, because...” I’m not asking for a proof, right? That would be for the discrete math and theory-type classes. I’m asking whether they can see that this output would never have three lefts in a row. That’s a really interesting way to ask a question that goes above and beyond “What does this function do?”
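Dan’s description of the dance function is only partly recoverable from the audio, so the sketch below is a hypothetical function in the same spirit, not his exam question. The base cases and the exact order of the pieces are assumptions, chosen so that two lefts in a row can occur but three cannot. It also shows how an instructor might brute-force check such a claim while testing the question:

```python
def dance(day):
    """Hypothetical recursive dance made of 'left', 'right', and 'stop' moves."""
    if day == 0:
        return ["stop"]
    if day == 1:
        return ["left"]
    return (["left"] + list(reversed(dance(day - 2))) + ["right"]
            + dance(day - 1) + ["stop"])


def longest_run(moves, move):
    """Length of the longest streak of consecutive occurrences of `move`."""
    best = current = 0
    for m in moves:
        current = current + 1 if m == move else 0
        best = max(best, current)
    return best


print(dance(3))  # ['left', 'left', 'right', 'left', 'stop', 'right', 'left', 'stop', 'stop']
print(max(longest_run(dance(day), "left") for day in range(15)))  # 2
```

The brute-force check is evidence, not proof; the reasoning students would give is that each recursive piece is fenced off by a “right” or “stop,” so a run of lefts can only merge across a boundary once.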

Kristin: Mhmm.

Dan: There. That’s finger number four. Finally, finger five, and this is your thumb, by the way, the last finger: easy to grade!

Kristin: Yes.

Dan: And this is why I’ve moved toward multiple choice. I did not know this early on. I have stories, I’m not joking, of teaching a CS1 class, starting grading at six in the evening and finishing at 6 a.m., with every TA in the room, and we were all dead the next day. The TAs didn’t have the wherewithal to say, “You can’t make me miss my classes the next day; I’m a student too.”

But we just did it; that was what everyone was doing! Right? No one knew better.

Kristin: Mhmm.

Dan: So we learned better. One: use Gradescope. Gradescope is an amazing tool that happened to come out of Berkeley, but that’s not the only reason I’m extolling its virtues. It lets you scan your exams into PDFs, and then you can grade them from anywhere! And so now you–

Kristin: Yep.

Dan: Everyone can grade in parallel! And you can update your rubrics! There are like a hundred reasons Gradescope is amazing. And–

Kristin: I love Gradescope.

Dan: And they were bought by Turnitin, so now they’re part of a larger company and they’re stable. It’s an amazing, amazing tool. Please use it. And this argues, for me at least, for multiple-choice questions, because Gradescope can auto-grade multiple-choice questions! I’m telling you, we give an exam, and an hour after scanning, it’s done. Like, it’s–

Kristin: Yep.

Dan: I have one TA who coordinates the boxes, makes sure the boxes look good, and pushes go, and it’s all done. I mean, it’s amazing. So, use Gradescope, and think about multiple choice exams. Multiple choice exams are great, except that there are some questions you cannot ask. I would say that for 95 percent of the questions you want to ask, you can write great distractors and make multiple choice exams work. And not just multiple choice, single select: also make sure you have multiple choice, multiple select, where they’re not just guessing one out of four or five choices, it’s “check all that are true.” That’s actually harder for them to guess, and you have to think about how to grade those. What we usually do is an XOR. So, the questions you can’t ask as multiple choice are ones where they have to come up with an algorithm design. If I showed you five solutions, you could just run each solution through and say, “Well, that one doesn’t work, that one doesn’t, oh, that one works!” Right?

Kristin: Yeah.

Dan: It doesn’t make any sense to have that question be multiple choice. That’s the space where you want their full creativity on display, and that’s when you have to have a big old white box they put their stuff into. So, multiple choice handles the 95 percent of cases that aren’t like that, but every once in a while, you really want them to do some algorithm design, and that’s when you want to eschew multiple choice. And that’s it. Those are my five tips.
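Dan mentions grading “check all that are true” questions with an XOR but doesn’t spell out the rubric. One plausible reading, sketched below in Python, compares the student’s selection with the key option by option: any option where the two disagree (where the XOR is true) costs points. The equal per-option weighting and the floor at zero are assumptions, and `grade_multiple_select` is a hypothetical helper, not a Gradescope feature.

```python
def grade_multiple_select(selected: set[str], key: set[str],
                          options: list[str], points: float) -> float:
    """Score a multiple-select question by XOR-ing each option against the key.

    An option counts against the student when their choice differs from the
    key (checked but should be unchecked, or vice versa). The set symmetric
    difference collects exactly those options. Each option carries an equal
    share of the points, and the score never goes below zero.
    """
    if not options:
        raise ValueError("a question needs at least one option")
    mismatches = len(selected ^ key)  # per-option XOR of selected vs. key
    per_option = points / len(options)
    return max(0.0, points - mismatches * per_option)


options = ["A", "B", "C", "D"]
key = {"A", "C"}
print(grade_multiple_select({"A", "C"}, key, options, 4.0))  # → 4.0 (full credit)
print(grade_multiple_select({"A", "B"}, key, options, 4.0))  # → 2.0 (B wrongly checked, C missed)
```

Scoring every option, rather than all-or-nothing, is one way to give partial credit; a course could just as well award points only when there are zero mismatches.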

Kristin: So, as we wrap up, I want to make sure we get to our next segment, which is Something Awesome in CS, where you get to share something–or someone–from computer science you think is interesting, though maybe not necessarily well known.

Dan: Sure. So, I want to pay homage to the people who were my mentors when I was growing up. I think these folks have fundamentally and foundationally influenced who I am as an instructor, within the topic of exam design, but also, just in general, as a professional. These are my mentors: Mike Clancy and Brian Harvey at Berkeley. They’re both Emeritus Teaching Professors. What Mike taught me was don’t just ask them to write a function; his exams had questions that were all over the map. So, really, the finger that talks about question types is all Mike. I got that from him, thinking about, “Could you ask a question where there are two bugs and they have to fix them?” I just love those questions, and I appreciated seeing examples of how to write great questions from Mike. Mike is one of the best I’ve ever seen at writing these kinds of questions; he’s got really creative questions. I also want to honor Emeritus Teaching Professor Brian Harvey, another of my mentors, who taught me, number one, to eschew grades. Get grades out of the equation. Grades are going to be an impediment to learning. So, if you have a choice, don’t have grades at all; if you have to have them, try to be creative about them. When he’s doing assessment, he always tries to ask, “Does this student have the idea? Is the student perfect? On a five-point question, five points is perfect, and four is: do they have the idea?” And he’s really soft about what “has the idea” means. What is the essence of that question? Yeah, you got this wrong and this wrong, but you got the idea that “recursion is calling yourself,” or whatever. So, maybe the details are wrong, and you lose a couple of points here, but you have the idea, so you get the bulk of the points for having the idea.

Kristin: Hmm.

Dan: Or you might have had some idea. They were not fully clueful, but they were not clueless. They had some idea.

Kristin: Yeah.

Dan: So, you want to make sure you sprinkle those in there, so you have this rich, nice range. He also had this wonderful idea of group exams! We tried it together, and it was harder to do.

Kristin: Oh.

Dan: Group exams: the idea is that you have an exam, you launch it, and then you collect it. The students have already been working in groups, because we have group teaching and learning communities, and we have our students work in teams on some of their projects. You have the students sit next to their teammates, so they’re the same people they already work with. Then, after everyone has taken the exam individually, you either hand it out again and have the whole team take it together, or you give them a new, hard question that wasn’t on the exam to take as a team.

And the point is, you’re allowed to talk! You look back and they’re all talking and learning, and they’re not cheating. They’re not listening in on their neighbors; they don’t care about that, they’re working together. And it’s worth a smaller fraction of the whole grade, so the whole grade might be 80 percent individual and 20 percent group. But it’s really great to see people, one, work collaboratively on exams. I just love that idea. Two, see that the smartest kid in a group isn’t always the person who talks. Sometimes they’ll be listening and learning. Also, the score of the group was higher than any individual person’s score, which is really great.
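The 80/20 split Dan describes is a simple weighted average of the two parts. A minimal Python sketch, assuming both parts are scored on the same 0 to 100 scale (an assumption; the episode doesn’t say how each part is scored); `exam_score` is a hypothetical helper:

```python
def exam_score(individual: float, group: float,
               individual_weight: float = 0.8) -> float:
    """Blend an individual exam score with a group exam score.

    Both scores are assumed to be on the same 0-100 scale. The default
    weight gives the 80 percent individual / 20 percent group split.
    """
    if not 0.0 <= individual_weight <= 1.0:
        raise ValueError("individual_weight must be between 0 and 1")
    return individual_weight * individual + (1.0 - individual_weight) * group


# A student who scores 70 alone and whose team scores 90:
# 0.8 * 70 + 0.2 * 90 = 56 + 18 = 74.
print(round(exam_score(70, 90), 2))
```

Keeping the weight as a parameter makes it easy to try other splits without changing the formula.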

So, he was also amazing at writing: really, really cleanly written solutions. He’s a wonderful writer. You would look at his written solutions, and there was so much teaching in them; reading them, you felt as if he had sat down with you and explained every point. He explained the purpose of the question, what the question was trying to ask, how to think about it, two or three possible solutions, what the rubric was, and why the rubric was what it was in terms of partial credit. I mean, he was so good about writing these tomes, these written-solution documents. I really, really appreciate that.

I also want to mention something about Mike that I forgot to say. Mike also said students have trouble if you give them experience in lab always with a computer, always with an interpreter, always with the compiler, always with computer support, but then ask them, in this strange paper-and-pencil setting, to do something they’ve never done all year.

So, he would advocate for with-computer exams, and so we do that in our BJC course. We have a with-computer part of the exam where I take–

Kristin: Oh!

Dan: Almost like the harder question I would have used for that group exam, I take it off the actual paper exam and put it on the computer. It’s still solo, not teamed, but they get to work with our Snap! programming environment, and it’s great! I can either give them a starting point, a half-written program, and say, “Debug it” or “Fix it,” or say, “Go from scratch and write this thing,” and it’s something they can do within the time limit, but it’s only a fraction of their grade. It’s hard to coordinate, but I’ve appreciated the spirit of what I’ve learned from both of them in thinking about exams writ large. It took a lot of coordination to set up our first with-computer exams.

And there’s a whole other conversation to have with Craig Zilles of UIUC, who’s trying to build a computer-based testing facility where there are no exams in the traditional sense. You have this lab that’s staffed by professional staff, with cameras all around, and you push a button and an exam is generated for you, and you have an exam every two weeks. So, there is no final exam in this model.

An exam is generated and you take it. You have to write exam generators, of course. And here’s the great thing: the next week, if you got something wrong, you can go back. It knows what you got right and what you got wrong, and you’re given new questions, again generated anew, but only for the ones you got wrong the week before.

So, imagine you do this all semester. It can basically prove mastery, in some sense. At some point you’ve shown mastery of everything, and there’s no final. These exams are smaller, lighter-weight exams. There’s a whole–

Kristin: Yeah.

Dan: There’s a whole conversation, which maybe you should bring up at another time, about having one or two big midterms versus an exam-quizzie every week and no midterms.

You know, that makes you keep up. That’s kind of the model for the lab-centric curriculum that Mike Clancy has been advocating for. It’s–

You never fall behind, because in a lab-centric course you’re on that week’s lab. Skipping lecture and then cramming for the midterm doesn’t work in a lab-centric experience, because you’re in lab with the TA and you’ve got to do that week’s lab. It forces you to keep up with the material.

So, in a way, these mini quizzes every week, with no final, are amazing. That’s actually one of my research projects: “Could I build a generator for my Beauty and Joy of Computing class that would let me track my students’ mastery along the way?” At any point I could look at all my students’ work and see how they’ve done, and by the end, I’d know that my students have mastered all the elements. And maybe the last four weeks would be for taking repeated exams, where you keep retaking the questions you haven’t mastered yet. During that last month or so, the students are working on projects anyway, with not a lot of new material other than kind of fun lectures.

So, could I have that be a time where the students are taking mini-finals? Like one-hour finals, and the moment you ace something, you’ve shown me you know that material, and I don’t need to test you on it again.

So, you keep doing this, and by the end, there’d be no final exam. There’d be no exams! It’d be this thing you just do in lab, and it’s kind of fun, and if you mess it up, it’s like Khan Academy: you drop back, and nobody cares. Imagine that model for all the assessment in the class. So, I’m working on writing auto-generators for questions, and that might be the future. It’s kind of exciting.
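The retake model described above, where each new exam regenerates questions only for topics a student hasn’t yet shown mastery of, can be sketched in a few lines of Python. This is not the code behind UIUC’s testing facility or Dan’s generators: the topic names and `generate_question` are hypothetical stand-ins for a real question generator, and treating one correct answer as mastery is an assumption.

```python
import random


def generate_question(topic: str, rng: random.Random) -> str:
    """Hypothetical generator: returns a fresh variant of a question on `topic`."""
    return f"{topic} question, variant {rng.randint(1, 10_000)}"


def next_exam(topics: list[str], mastered: set[str],
              rng: random.Random) -> list[str]:
    """Build the next exam from new variants of every not-yet-mastered topic."""
    return [generate_question(t, rng) for t in topics if t not in mastered]


def record_results(mastered: set[str], results: dict[str, bool]) -> set[str]:
    """Add every topic answered correctly; mastered topics are never retested."""
    return mastered | {topic for topic, correct in results.items() if correct}


topics = ["loops", "recursion", "lists"]
rng = random.Random(0)  # fixed seed so the generated variants are reproducible
mastered: set[str] = set()

week1 = next_exam(topics, mastered, rng)  # one question per topic
mastered = record_results(mastered, {"loops": True, "recursion": False, "lists": True})
week2 = next_exam(topics, mastered, rng)  # only recursion comes back, as a new variant
print(week1)
print(week2)
```

Because questions are regenerated rather than reused, a student who retakes a topic sees a new variant instead of the same question again.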

Kristin: So, with our last segment, TL; DL, or Too Long, Didn’t Listen, what would you say is the most important thing you would want our listeners to get out of our conversation?

Dan: If there’s only one thing I can say, it would be to have a rubric for yourself as you approach exam authoring. I’ve come up with what I call the five-finger rule; feel free to adopt it, or take a finger off if it doesn’t matter to you, blah-blah-blah. My five fingers are: Material coverage: make sure the exam covers the material in the appropriate way, in proportion to what is emphasized in class. Reasonable time: have your TAs or beta testers take it, and they should finish in a quarter of the actual student time or less. Or just give your students more time; there are ways to deal with that. Difficulty range: have a range of questions, from A questions to C questions, label them, be clear about them, and put the C questions, the easier ones, first, to give people early confidence and keep them from getting stuck. Question types: think about different question types. It’s obviously like Bloom’s Taxonomy, but with an angle toward how you ask about coding, and try to be creative with it. And ease of grading: at the end of the day, you’re going to have a staff member grading it. By the way, I remember visiting Australia for a while, and they didn’t have TAs, so they had a two-week grading period where every single exam was graded by the faculty member. Solo, final exam.

Kristin: Wow.

Dan: Yeah. So these folks have obviously perfected the idea of easy grading, because it’s their own time, not their staff’s time. But thinking about that, and thinking about Gradescope, can help you think about multiple choice. And I’ve now been, you know, swung around; I’m a multiple-choice fan now. Although there are times when I pull back and say, “You know what, this has to not be multiple choice,” and I reserve the right to take some questions out of multiple choice, even though most of them are multiple choice.

So, five fingers: coverage, time, difficulty, question types, and easy grading.
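The “reasonable time” finger can be turned into a quick check on a draft exam. A minimal Python sketch, using the factor of four Dan mentions and assuming the rule is applied to the slowest beta tester (our assumption; he doesn’t say which tester’s time to use); `fits_time_limit` is a hypothetical helper:

```python
def fits_time_limit(tester_minutes: list[float], student_minutes: float,
                    ratio: float = 4.0) -> bool:
    """Return True if a draft exam passes the 'reasonable time' check.

    TAs and beta testers know the material well, so the draft passes only
    if even the slowest tester finishes within 1/ratio of the student time.
    """
    if not tester_minutes:
        raise ValueError("need at least one beta tester's time")
    return max(tester_minutes) * ratio <= student_minutes


# For a 120-minute exam, every tester should finish in 30 minutes or less.
print(fits_time_limit([22, 28, 30], 120))  # True
print(fits_time_limit([22, 35], 120))      # False: trim the exam or add time
```

When the check fails, the two remedies from the conversation apply: cut the exam down, or give students more time.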

Kristin: Awesome. Well, thank you so much for joining us, Dan.

Dan: My pleasure! What fun this was, this is great.

Kristin: This was the CS-Ed podcast, hosted by me, Kristin Stephens-Martinez at Duke University, edited by Susannah Roberson, and funded by a SIGCSE special project grant. And remember: teaching computer science is more than just knowing computer science, and I hope you found something useful for your teaching today.