S1xE1: CS50 Tools with David Malan
In this episode, we talk with David Malan from Harvard University, Professor of the Practice of Computer Science in the School of Engineering and Applied Sciences. He teaches Computer Science 50, Harvard University’s largest course, with our conversation focusing on CS50 tools.
An overview of the tools is in a YouTube video David provided. We spent most of our time talking about help50 and style50. Help50 is a tool that, when fed error output, returns a suggestion or question a student should focus on to help interpret the error output. Style50 is a tool to help students fix the style of their code by highlighting what to change. However, David emphasized that he wanted the tool to require the student to change the code themselves.
When asked about something awesome in CS he’d like to share, David talked about containerization, especially tools like Docker. In CS50, they use containers on both the server and client-side. He finds they are a great way to package up everything for students.
His Too Long, Didn’t Listen (TL;DL) focused on encouraging fellow teachers to see if someone else has already created an educational tool that would fit their needs rather than reinventing the wheel.
Kristin: Hello, and welcome to the CS-Ed podcast, a podcast where we talk about teaching computer science, with computer science educators, to learn how they teach and manage their classrooms. I am your host, Kristin Stephens-Martinez, an Assistant Professor of the practice at Duke University. And joining me today is David Malan, from Harvard University, and he is also a professor of the practice of computer science. David, tell us about yourself, and what do you teach? How many students do you have?
David: Sure. My name is David Malan and for the past 12 years I’ve been teaching, among other courses, a course called Computer Science 50, otherwise known as CS50 here at Harvard. Happens to be our largest class on campus, but it itself is an introduction to the intellectual enterprises of computer science and the art of programming for majors and non-majors. So, we have about 800 students on campus each fall, as well as 200 or so students taking the class through Harvard’s Extension School or continuing ed program.
Kristin: And do you have–is CS50 the only thing you teach?
David: No, there’s a few variations thereof. The course itself is freely available as open courseware and also available to students around the world on edX. And in addition to the CS50 itself, do I also teach a variation thereof that we unofficially call CS50 for MBA’s, which is more of a concept focused course as opposed to a software driven course. So, whereas CS50 itself is very much bottom up learning, from nuts to bolts all the way up to higher levels of abstraction, how computer science works, how programming works; CS50 for MBA’s is more of a discussion-oriented course focused on concepts that pertain to the intersection of technology and business, like cloud computing and security and so forth. And then I teach another variation of the class as of this past year at Harvard’s law school called CS50 for JD’s, which is for aspiring lawyers, where we focus a bit more on the intersection of computer science with policy and the law. And then through Harvard’s Extension School, there’s been a number of courses over the years, most recently there’s been an introduction to technology, which is a broad-based introduction to concepts that folks encounter in their everyday lives. And another one for business professionals, a variation of the class at Harvard Business School.
Kristin: Wow that is a lot. So, our main conversation today is talking about the CS50 tools. I was at your last SIGCSE (Special Interest Group for Computer Science Education), Birds of a Feather, and that was just like a never-ending laundry list of tools. I thought I would first give you the opportunity to talk about some of the CS50 tools, and then we’d dive into some of the ones that I found more intriguing, for potentially adapting it to my own class.
David: Yeah, absolutely. So, over the years, thanks to CS50’s team of TFs and full-time colleagues, we’ve developed really a whole ecosystem of software-based tools. Even though we have as best we can–as best we could–tried to resist the tendency of reinventing the wheel when we can avoid it. Indeed, we try to use off-the-shelf, or even commercial software, as often as we can for problems in the classroom that we want to solve, whether it’s administrative or pedagogical. However, over the years a few themes have emerged for which we haven’t found comparable solutions, and so we’ve developed a few ourselves. So, for instance, within the class to provide students with a uniform programming environment, without them having to struggle in the first weeks of the class with any technical difficulties on their own Macs and PCs, we have what we call CS50 Sandbox, which is essentially a lightweight IDE, or integrated development environment, in the cloud, complete with code editors and most importantly, a terminal window so that students have full-fledged control over a cloud-based server environment of their own. On top of that, have we built what we call CS50 Lab, which essentially adds to that sandbox environment markdown-based instructions, so that there are very simple HTML-based instructions along the side of the lab, integrated into the environment. Similar in spirit to Codecademy, but as opposed to it being a proprietary platform, per se, the goal of CS50 Lab is to enable any teacher at Harvard, or well beyond, to create their own Codecademy-style lessons using just a free GitHub account and a free GitHub repo in which their markdown is stored. And then lastly in the cloud environment, we have CS50 IDE, which is a more general purpose, integrated development environment; still cloud-based, but that does not hold students’ hands as much.
It’s meant to be used in the latter portion of the semester, so that they have a full-fledged programming environment that is not tailored to a specific problem. And within all three of those environments have we deployed a number of command line tools, some of which might be of interest for your own class, or your own classes, among them a tool called style50. This is a command line tool, open source, written in Python, that anyone can install, even on their own Macs, or PCs, or any server environment. They are not in any way tied to CS50’s own infrastructure, and this is a tool that essentially lints students’ code, so to speak, does static analysis of the style, the syntax of their code, how well commented is it, how well indented is it, and so forth. And style50 is a command that does not actually fix their code for them, as many of these command line tools do, but rather points out, using color coding and syntax highlighting, where it is they should add or remove whitespace, or add comments, and so forth. The pedagogical goal of which is not to solve the problem for students, but to help them develop the muscle memory for actually improving, instinctively, the style of their own code. And then another tool that’s baked into all of CS50’s cloud-based environments, that can also be installed via pip at the command line on anyone’s Mac or PC, is a tool called help50. The design of this tool is to try to translate what are, too often, very arcane error messages that you might see from Clang, or Python, or even javac, or any number of other compilers or interpreters, into more TA-like rhetorical feedback. So, if you see an error message on the screen that’s referring to some undefined symbol, in the first week or two of the class it might be completely non-obvious to a student what a symbol even is, let alone what it means to be undefined.
But we have not–we have not pedagogically wanted to hide those messages, or sort of simplify them for the student, to the exclusion of them seeing those real-world arcane messages.
David: So, what help50 does is essentially pipe the output–the standard output or standard error–of standard programs into the standard input of help50, at which point we use some regular expressions, essentially, and search over a corpus of helpers that we’ve created, which then highlight the arcane output of a standard tool and then provide TA-like feedback. “Oh, did you perhaps forget to include this file at the top of your own file?” Or the like, the kind of feedback that you would expect a human, a good teacher, would give in person. And that’s completely extensible; we’ve written a bunch of these helpers for C and Python, but it can be generalized to any language as well.
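A minimal sketch of the helper idea David describes, assuming each helper is just a regular expression paired with a TA-like question. The `HELPERS` list, the `explain` function, and the sample error text are all hypothetical illustrations and are not taken from help50’s actual corpus or code.

```python
import re

# Hypothetical helpers: each pairs a regular expression with TA-like feedback.
# These patterns and messages are illustrative, not help50's real corpus.
HELPERS = [
    (
        re.compile(r"implicit declaration of function '(\w+)'"),
        "Did you forget to #include the header file that declares {0}?",
    ),
    (
        re.compile(r"NameError: name '(\w+)' is not defined"),
        "Did you misspell {0}, or use it before assigning a value to it?",
    ),
]


def explain(error_output):
    """Return feedback from the first helper that matches, else None."""
    for line in error_output.splitlines():
        for pattern, advice in HELPERS:
            match = pattern.search(line)
            if match:
                # Fill the captured name (e.g. the function) into the advice.
                return advice.format(*match.groups())
    return None


# A simplified, made-up compiler message used only for demonstration.
clang_error = "hello.c:5:5: error: implicit declaration of function 'printf'"
print(explain(clang_error))
```

Adding support for another language in this sketch would only mean appending more pattern and message pairs, which is one way to read the extensibility David mentions.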
Kristin: I have so many questions, I don’t even know where to start. Let’s go with the problem that I face the most, which is very much along the lines of help50. So, I teach–also–intro computer science, and we try very hard to make sure that the class is based on no prior coding experience, and if students have coding experience, we often encourage them to take the classes afterwards. And so, we’re starting with students that are very novice, very know-nothing, and I try to make it a like, “We are here to learn together, we will be gentle with each other, and we will not think that we are all stupid that we don’t know something.”
Kristin: And so, the class is only in Python. And CS50, what was it, help50? Sounds like something that would be very useful for the students to really start, when they’re first trying to understand what an error message means. My–our–coding environment is a little different, though. So, currently we’re in Eclipse. I do plan to change that eventually. I just–it is not–was not the highest thing on my priority list when I changed the class–the last time I did. Which means that we’re not really on a command line when we’re running Python.
Kristin: So, my plan is most likely to transition to an IDE, like PyCharm or something like that. Could help50 still work in that context? Because it sounds like you have to basically do something along the lines of either run help50 in the command to run–to run–the python code, or somehow pipe it into the tool. But that doesn’t quite work when they’re not really running it from the command line. They’re like hitting some button that says run the code.
David: Correct. So, you can also go to help.cs50.io, which is a web-based variation thereof, works exactly the same fundamentally, but allows you to copy-paste the error message into a simple web UI, at which point you see exactly the same output as you would at the command line.
David: We ourselves don’t really use the web UI. We simply make it available, partly for testing, partly for this use case, but you could certainly do that. So long as the modals or the error messages that the students are seeing are–can indeed be–highlighted and copied and pasted.
Kristin: Okay, yeah. They totally have the ability to go into the–the command line output and just copy, paste it, and stick it somewhere.
Kristin: So, I think that could work in my class. So, I guess the other question I have is, how much do students actually use the tool? Because there’s a difference, from the literature that I know of, between a student being given the hint or the information without doing anything, versus having to opt in to ask for it.
David: Yeah, I know this was probably one of our most impactful tools a few years ago, when we first rolled it out. Used on the order of thousands of times during the fall semester by hundreds of students.
David: And so much so, that we actually did, the first semester, see a marked downturn in the number of questions that were being asked on the course’s discussion forum.
David: Presumably, we think because the tool was preempting what those questions would have been, because, indeed, the types of helpers we’ve written really are for the most common error messages, the ones that effectively become FAQs. And the fact that students can answer those questions themselves just by running a quick command, really put a downward pressure on the inclination to ask those same questions.
Kristin: Hmm. So how do you introduce the tool to the students? And do you–how do you encourage them to use it? Especially, if you keep in mind, that like in my context I might be like, “Go to this web browser, and then copy-paste your error in there and see what it says.”
David: We did two things. One, I demonstrated it in lecture, as really the recommended first-pass solution to an error message that a student doesn’t understand. And we then bake it into the problem set, or the homework assignment specification, reminding students that they can get help by running the command in this way and the command too, is meant–there’s a couple of design goals of it. One, we as best we can, try not to teach students how to use CS50-specific tools, because of course a few months later they’re not going to use them, nor will those tools exist in the real world for them. And so, it’s very deliberate that every command that is CS50-specific ends with the number 50, to make clear that this is not a standard Linux tool that you’re gonna encounter in the real world. But two, the usability is hopefully quite straightforward. If students are in the habit, as they are for us, of building their code with Make, which in turn uses Clang, they’ll often just run make foo. And if that triggers some arcane Clang error message that they don’t understand, they can then change the command to help50 make foo, which will then wrap the command’s output, send it to our server, parse it, and actually then translate the message for them. So, it’s pretty low impact to actually run the command, certainly from the command line, and it’s not all that big a deal to highlight and copy-paste into the web UI, either. So, I don’t think there’s ever been a sense among our students that using help50 is a sign of weakness; indeed, it is the officially recommended way to wrestle with an error message you don’t understand.
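A rough sketch of the “wrap the command” step, assuming a wrapper that runs whatever command follows it and captures standard output and standard error together. The function name `run_and_capture` is hypothetical, and the sketch stops before the step of sending the output to a server, which is not shown here.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
        text=True,
    )
    return result.returncode, result.stdout


# Stand-in for a failing build: a child Python process that writes an
# error message to stderr and exits with a non-zero status.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('error: something broke\\n'); sys.exit(1)"]

code, output = run_and_capture(failing)
print(output, end="")
if code != 0:
    print("(the captured text would now be handed to the helpers)")
```

Because the wrapper is just a prefix on the student’s usual command, dropping the prefix returns them to the unassisted workflow.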
Kristin: Cool. If I wanted to add to the tool in some way, would I be able to do that relatively easily?
David: Yes. There is a–it is entirely open source. We are literally working this summer on increasing the documentation, and the samples that are available online, so that everything will be robustly documented, well in time for SIGCSE 2020, for instance.
David: And you have a couple of approaches. Either, one, you can clone the GitHub repository in which it’s hosted, and then simply submit a pull request, if you’d like, to make it part of the official corpus of helpers, particularly for other languages that we might not have written as many helpers for. Or, via command line flag, you can also run it locally, or you could even run the server locally, if you don’t want to contribute to the publicly available corpus, but you just want to run it locally on your own machine or server. So, with a command line argument, you can say, use these helpers instead of the default ones.
Kristin: Awesome. This definitely sounds like a tool that I might adopt into my class, just cause, there are definitely times where I think–I think there’s a barrier in my class of students asking each other for help, because I wish that I saw more activity on Piazza, which is the forum that we use.
David: Yeah, no, I think all the more of these self-service tools have been pretty empowering for students, and the fact that they can just stop using them by no longer running that command in front of their own command, has been a nice way to just take the training wheels off oneself. Rather than our presumptuously, for instance, translating all arcane error messages to a human-friendly format, we want there to be some opt-in to it, so that students can then take the training wheels off themselves.
Kristin: Yeah. So, let’s talk about style50 a little bit. Why did you invent–well, this one seems a little bit more like reinvention of the wheel, because there are linters already out there. So, why did you decide to make your own?
David: There are. I mean many of them, for many languages, actually presume to do the restylization for you; and certainly IDE’s do that, if you click the button it fixes all of your braces, and your indentation, and your whitespace, and so forth. And we very specifically did not want to do that.
David: So, style50 exists solely for the purpose of telling students where we think their style could be improved. But the onus is on them to actually delete those characters, or add those characters, based on its output. We also wanted it to be a little more parameterizable, and we did not want to reinvent the process of linting itself, that would be quite the endeavor for any number of languages and their grammars. So, what style50 instead does is it just runs on top of existing linters. So, in a C world we use AStyle, which is a popular, age-old tool for doing the linting. And essentially what we do is–we actually do lint students’ code behind the scenes. We then do a character-for-character diff between their code and what the linter suggests that they do.
David: And then we interpret that difference and present it to these students in a way that helps them understand, “Oh, I should move this over here, I should add some whitespace here, I should fix my indentation here,” and so forth; and none of the tools that are out there, to my knowledge, provide that level of visual feedback for code.
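The diff step can be sketched with Python’s standard `difflib` module. This is only an illustration under stated assumptions: the formatted text is hard-coded rather than produced by a real formatter, the `char_diff` function is hypothetical, and plain-text markers stand in for the red and green highlighting.

```python
import difflib


def char_diff(student, formatted):
    """Mark characters to add as {+x+} and characters to delete as [-x-]."""
    pieces = []
    matcher = difflib.SequenceMatcher(None, student, formatted, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(student[i1:i2])
        if tag in ("delete", "replace"):
            pieces.append("[-" + student[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + formatted[j1:j2] + "+}")
    return "".join(pieces)


student_code = "int x=1;"
formatted_code = "int x = 1;"  # what a formatter might suggest (assumed)
print(char_diff(student_code, formatted_code))
```

The output shows the student’s own line with the suggested changes marked in place, so the student still has to type the fix rather than having the file rewritten for them.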
Kristin: I feel like I must be old-school, because I haven’t used a linter in a while, and so like my memories of linters is very much: you run it on the command line, and it spits out like all the lines supposed to change, and how to change them.
David: So, AStyle will actually change your code for you. Checkstyle in the Java world might work a little differently, because that’s another very popular one. But the ones baked into IDEs typically presume to do the formatting for you, but there too, even the line-for-line wasn’t exactly what we wanted.
David: We really wanted to show the students their code, with some red and green highlighting, some spaces, some signaling of deletions for them. And there too, we didn’t want them to have to wrestle with, “Well, on line twenty-three and character six you should do this.”
David: It really is getting in the way, cognitively, of what the problem actually is.
Kristin: Yeah. I definitely agree with that. Would–is that tool available in a non-command line-like form, just like help50 is?
David: That one is not, at the moment. Though, it would not be terribly hard to do so. So, I can take that question as a feature request.
Kristin: Yes. Take it as a feature request.
David: Yes. Does not exist yet, but could soon, so let me see if we can slide that into our to-do’s.
Kristin: So, one quick–one curiosity I have is, I guess there’s two questions here. One is: why emphasize style? And I–you’ll be preaching to the choir, because I agree that style and teaching students good style is important. But I would like to hear your rationale for that. And, so that’s the first question. And the second question more has to do with detecting cheating. Because knowing how Moss works, if you get students to be more standardized in their style, you kind of lose a little bit of the power of cheating detection.
David: That is fair. However, I certainly don’t think we should be not teaching students good form, just so that we can detect instances of plagiarism more easily.
Kristin: Yes, I definitely agree with that.
David: That seems to be reversing the pedagogical goals in the classroom.
David: I think the more positive affirmation of it would be the readability of code.
David: Not only for the students themselves, but certainly for any future colleagues they might have in the software engineering world, and in the near-term certainly for the TAs who need to read their code or provide help.
David: Really, we’re–we at least are trying to help students develop good muscle memory, good instincts, and to be able to distinguish visually what good code is from bad, which they might not have those initial instincts for. And, indeed, I can think of innumerable instances in our own in-person office hours, one-on-one opportunities for help; where we, myself included, have walked over to the student’s screen, they have a question, because their code isn’t compiling, or if it’s not–or it’s not running logically–correctly, and almost every line of code is sort of left-aligned along the edge of the screen, and that is certainly not helping the situation. Seeing the indentation, and seeing the curly braces line up, as mundane as some of those details are, it just allows you to focus on the ideas, and not on the syntax. And so, both for the TA’s sake, and also for the student’s longer-term sake, are we trying to teach them good habits early on. Much like, I presume, I was taught years ago, in grade school, how to write an English paper. The capitalization of letters doesn’t technically get in the way of semantics, and I suppose I don’t need all those periods and commas; but that same good form allows you to communicate more effectively, and I think that’s no less true in the software world.
Kristin: Yeah, I completely agree, though, I’m having a funny moment, because I think I’ve been coding in Python so long that I’ve–I’ve kind of forgotten that tabs and indentation don’t necessarily mean anything in other languages.
David: I’m sure folks teaching Java as well run into this–this, too. Yes, Python helps this problem go away in part.
Kristin: Alright. So, we talked about help50, we talked about style50, and now I–since we still have a little bit of time–I want to talk about IDE’s a little bit. Since you did have this–you have this whole infrastructure that you’ve built. More on the kind of philosophical side of–in the pedagogy side of it, just because I’m curious. So, the reason why I haven’t adopted an online IDE–and I know that there is, you know, there is C9, I think, and a couple other ones that are available through the web browsers–is because in my mind, I want students to walk out of my class at the end of the semester to be able to work on their own project, without being necessarily stuck in a web browser, or having an account that might disappear when the class is over.
Kristin: And I’m wondering what your thought process was to still decide to go with the build my own thing that is in the browser approach.
David: Yeah, so, CS50 IDE itself is actually based on C9, or Cloud9, that you mention.
David: And ours is essentially a set of pedagogical simplifications thereof. It’s primarily motivated by those first several weeks, where the reality is, among eight hundred students, just probabilistically, there is gonna be a non-trivial number of technical support headaches that can only serve to turn students off and to frustrate them, when we want them to be focusing on computational thinking and principles of software design, not on how to get some stupid command working on their own Mac or PC because of some annoying technical support hurdle. Like, that’s a valuable practical skill, and the diagnostics thereof are compelling, ultimately, but not in those first several crucial weeks.
David: So, we do try to off-board students by semester’s end, and especially by way of their final project in the class–
David: Which is the capstone experience in the class. We don’t strictly require that they transition to their own Macs or PCs, but the reality is most students end up doing Python-based final projects,
David: Or even mobile-oriented final projects for which they have to use Xcode, or Eclipse, or some Java-based IDE–
David: To implement those final projects. So, it does tend to happen naturally. Though, this coming year, will we actually, more deliberately try to off-board students, and even introduce them to a bit of Git. Long story short, for our own submission process, we have our own command, called submit50, that uses Git, but underneath the hood, so that students needn’t worry in the first weeks about git adding and committing and pushing and, God forbid, merge conflicts. But we do want to give them a little bit of hands-on exposure to that, but it will happen at term’s end, and so pretty much all of the scaffolding, all of the training wheels that we provide to students, almost all of that do we take off by the end of our twelve-plus weeks with students.
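To picture what “uses Git, but underneath the hood” can mean, here is a small sketch of a one-step submit command that hides the add, commit, and push steps. It is an assumption-laden illustration, not how submit50 is actually implemented; the function names, the default remote `origin`, and the default branch `main` are all hypothetical.

```python
import shlex
import subprocess


def build_submit_commands(message, remote="origin", branch="main"):
    """Return the Git commands a one-step submit could run, in order."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", remote, branch],
    ]


def submit(message, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for command in build_submit_commands(message):
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


# Dry run: shows the three steps without touching any repository.
submit("Submit problem set 1")
```

Showing students the printed commands late in the term is one possible way to introduce the Git steps that the wrapper has been hiding.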
Kristin: Huh. I also like how you–you take off the training wheels using the motivation of that final project, because that’s a very natural transition of like, this final project that you’re defining yourself, also as a time for you to start getting rid of the training wheels and really being able to do your own thing by the end of the semester. I like that a lot. What do you transition them to? Do you have like, a standard set of suggestions, like, “If your project is in Python, we suggest that you use this IDE?” Or anything like that?
David: To some extent. We typically have end of semester seminars, so to speak, that are led by the teaching assistants who introduce students to ancillary tools, that aren’t strictly necessary for the course, but are very real-world helpful.
David: Lately we’ve been pointing students at Atom and VS Code, and the integrated terminal windows that they provide. And then installing whatever software tool chain you would need on top of that.
David: We’ve typically never recommended full-fledged IDEs like Eclipse and NetBeans and the like, largely because I just find them to be so heavyweight, and students then really learn how to use the tool, as opposed to the language. The one exception frankly is Xcode, where for iOS development you pretty much need to use that ecosystem. But for the most part we’ve taken the more Silicon Valley-style approach of having students use an extensible editor, and Atom, and VS Code especially, are probably the front runners.
David: The latter gaining all the more steam of late, and pointing them at various plugins and such to extend the capabilities of those editors.
Kristin: So, do they have–do they use the command line within those? I don’t really know these tools, so I’m wondering, do they have access to the command line inside the tool, or do you also give them a terminal tool of some kind?
David: The former. Both Atom and VS Code have a plug-in ecosystem, and one of the first things we have students always add is a terminal window.
Kristin: Got it.
David: So we have–with Xcode aside, which we don’t officially use in the class, but some students do opt to use for their final projects, for iOS applications–we always provide students with a terminal-based environment in some form, even though they now have graphical editors in which to write the actual code.
Kristin: Okay. This is now making my decision of what IDE to transfer out of, after Eclipse, more–much harder. Now I have to think this through.
David: Well, for what it’s worth, probably, the cool way to do it these days would be VS Code with a built-in terminal window and then any settings that you would want to distribute to the class to help students configure their environment in at least some familiar or standardized way.
Kristin: Alright, I’ll keep that in mind.
David: The point we’re hoping to get to is to actually have students, some number of semesters from now, actually use a client-side editor like Atom or VS Code, but have the requisite software tool chain installed locally. In our case that would be Clang and GDB and a few others. The problem, of course, is that while some operating systems make this relatively straightforward, certainly a Linux distribution, or even macOS these days, and Windows has been getting better, especially with its Windows 10 Bash subsystem, there are just, invariably, technical support challenges. Docker and containerization have made this a little easier, so we fancy a world where students will download the equivalent of a virtual machine or container, run that behind the scenes on their computer, and then use whatever their preferred text editor is. However, that too is not without its technical support challenges now. So, I think it’s a little too bleeding edge to use at scale, unfortunately.
Kristin: I definitely liked the model that you’re currently using, though, of using a browser-based IDE, and then–so the barrier to entry is very low, and slowly ramp them up by–more framing it as slowly taking off–taking off the training wheels, so that by the end of the semester they are ready to go as if they never needed that in-browser thing before. So, I–I like this model and it’s definitely giving me something to think about for the next time that I feel it’s time to update 101 in the class.
David: Makes sense.
Kristin: Alright. So, let’s transition to our next segment, where our guests share something or someone from computer science they think is interesting, though maybe not necessarily as well-known as it should be.
David: So, containerization, as incarnated by tools like Docker, I think is the best thing since sliced bread when it comes to architectures and platforms. We, for instance, use containers these days to run all of CS50’s server-side web applications. Containers, similar in spirit to virtual machines, allow you to sandbox your environment, so to speak, so that you can install all of your own software, and dependencies, and libraries without affecting some other application. But we also use containers on the client side. So, for instance, as an alternative to CS50 Sandbox, Lab, and IDE, we also have a tool we call cli50, for command line interface 50.
David: And this is simply a tool that provides students with their own local copy of a Docker image, called cs50cli, which is exactly the same software as all of our cloud-based environments. This, therefore, allows students to run all of CS50’s tools and all standard Linux utilities on their own Macs or PCs, while still using their own text editor or even IDE, but having a standard build or execution environment. I have not seen this being used in terribly many CS courses out there, just yet. Virtual machines are perhaps in use a decent amount, but those tend to be fairly heavyweight, slow to start, annoying to maintain. Docker is amazing, and it has been game changing, I think, for us both technologically and pedagogically.
Kristin: So, if you’re gonna do Docker for your students, what do the students need to actually install in their machine, then?
David: They would install Docker itself, which is free, open-source software that can run on their own Macs and PCs.
David: It’s sort of the equivalent of what a hypervisor is in the virtual machine world.
David: Then they would, quite simply, run a command like “pip install cli50” and that would automatically download all of the requisite dependencies, including the requisite Docker image. And then by running cli50, they would suddenly get on their Mac or PC, a Linux Bash prompt, inside of which is all of our own standard software. But a teacher could also create his or her own Docker image, post that to some repository online for free and students could pull that down as well. So, I’ve actually been encouraging some colleagues to consider Docker for some of our follow-on classes, especially systems level classes, where you really want to provide students with a certain set of tools, maybe a specific operating system and the like.
David: But it’s a lot easier just to package it up for students, rather than have them run a dozen independent commands, which is invariably going to lead them to have slightly different version numbers from each other over time. So, it’s a great way of packaging up course-specific software for client-side use.
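For a teacher who packages a course environment as a Docker image, a tiny launcher might look like the sketch below. It is a hedged illustration, not cli50’s implementation: the function `docker_run_command`, the image name `example/course-env`, and the `/workdir` mount point are hypothetical, and the sketch assumes Docker is installed and that the image contains Bash. It only builds and prints the command rather than running it.

```python
import os
import shlex


def docker_run_command(image, workdir="/workdir", host_dir=None):
    """Build a `docker run` command that opens Bash with host files mounted."""
    host_dir = host_dir or os.getcwd()
    return [
        "docker", "run",
        "--rm",                               # remove the container on exit
        "--interactive", "--tty",             # attach an interactive terminal
        "--volume", f"{host_dir}:{workdir}",  # share the student's files
        "--workdir", workdir,                 # start inside the shared folder
        image,
        "bash",
    ]


# "example/course-env" is a hypothetical image name, not a CS50 image.
command = docker_run_command("example/course-env", host_dir="/home/student/pset1")
print(shlex.join(command))
```

Since every student pulls the same image, the tool versions inside the container stay identical across the class, which is the packaging benefit David describes.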
Kristin: Huh. Definitely something else to–to start thinking about. Alright, so. Let’s close out with Too Long, Didn’t Listen, or TLDL. What would you say is the most important thing you’d want our listeners to get out of our conversation today?
David: There are a lot of educational tools out there that are open-source and are solutions to problems that a lot of us in CS education might have. And before inventing your own, I would encourage teachers to look for someone else who’s done it already. Despite the number of tools that we have developed for CS50 and others, those are all in response to our having engaged in that process and not quite having found the fit that we need. But there’s a lot of ways to stand on others’ shoulders just as we–we do here.
Kristin: So, where do you think people could go to learn about all the possible tools?
David: The best place to go would be to cs50.readthedocs.io, where everything is documented.
Kristin: And if I recall, there was also that SIGCSE Birds of a Feather, where we just sat down and talked about every tool that we all were using as a collective.
David: Yes, a few of CS50’s tools got mentioned there, but the value of that session was that there were so many tools from others that were being introduced as well. That was a great session.
Kristin: Yeah, I liked that session. Alright, so, thank you so much for joining us today, David.
David: Sure. Thanks for having me.
Kristin: And this was the CS-Ed podcast hosted by me, Kristin Stephens-Martinez, at Duke University, edited by Susanna Roberson, and funded by a SIGCSE special project grant. And remember, teaching computer science is more than just knowing computer science, and I hope you find something useful for your teaching today.