The Marmoset Project
Marmoset demo server
A demo marmoset server, currently available at
Anyone can sign up, and try to complete the provided exercises. More projects will be coming shortly,
and we'll be allowing people to set up their own demo courses and define their own projects on our
demo server. There is a web page and a Piazza page for
using the Marmoset demo server and demo course.
Marmoset open source project, now on Google Code
with a Git repository
The Marmoset source code repository. Marmoset is open
source under the Apache 2.0 license.
Survey on collection and grading of student programming projects
A survey to assess current practices in the collection and grading of student programming projects.
Results as of May 7th
Links to Marmoset papers and bibtex entries.
What is Marmoset?
Marmoset is a system for handling student programming project submission, testing and code review. We've been developing it at the University of Maryland for over 5 years. Last fall, we handled 74,000 submissions for a total of 172 different projects from 1,700 students in 27 different courses.
It works with projects in any programming language, and is designed to work well with both very small and very large projects, such as those in our OS course, in which a submission consists of tens of thousands of lines of code.
Marmoset is described as part of a talk given by Professor William Pugh
on Innovation in teaching software development skills, Dec 2nd, 2011, at SUNY Stony Brook.
How most CS departments handle programming projects
The faculty/instructor posts a project description, along with some sample input and the expected output,
and students are told that the project is due in 1-2 weeks (depending on the size of the project). Students start work on the project, ask questions
and get help in class and office hours. When they are ready, they submit the project, where it goes into some kind of electronic
drop box. No one looks at the submissions until after the project deadline, at which point a TA is given all the submissions and runs
them against both the test data the students were provided with and some additional secret test cases, which sometimes aren't made up until after the project
deadline. Some submissions won't compile or run at all when the TA tests them, and the TA will have to determine if the submission was
completely broken, or if it had some kind of platform or system dependence which keeps the code from working on the test machine.
Perhaps one or two weeks after the project deadline, students get their grades back.
How programming projects work with Marmoset
The faculty/instructor posts a project description, along with some sample input and the expected output (typically formulated as
unit tests). Students work on the project, and whenever they want, they submit the project. Within minutes after submitting the project,
they can go to a web page where they can see the results of testing their code against the tests they were provided and any tests they wrote.
The results shouldn't be surprising, but the server testing will catch problems such as platform dependencies. For some languages
we also run tools such as static analysis and code coverage.
If the submission passes all of the public test cases, the student is given an option
to perform a release test of the submission. Suppose this is the poker game project, and the student performs a release test.
They might be told:
There are 12 release tests. This submission passed 7 release tests, and failed 5. The names of the first two failed tests are "full_house" and "4_of_a_kind"
(the names of only the first two failed release tests are revealed).
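As a rough illustration of this kind of limited feedback (not Marmoset's actual code), the message could be assembled as follows. The function name `release_test_summary`, the `reveal` parameter, and the dictionary of results are all assumptions made for this sketch.

```python
def release_test_summary(results, reveal=2):
    """Summarize release test outcomes, naming only the first few failures.

    `results` maps test name -> True (passed) / False (failed), in the
    order the tests were run. Hypothetical helper, not Marmoset's API.
    """
    failed = [name for name, passed in results.items() if not passed]
    passed_count = len(results) - len(failed)
    summary = (
        f"There are {len(results)} release tests. "
        f"This submission passed {passed_count} release tests, "
        f"and failed {len(failed)}."
    )
    if failed:
        # Only the first `reveal` failed test names are disclosed.
        shown = " and ".join(f'"{name}"' for name in failed[:reveal])
        summary += f" The first failed tests are {shown}."
    return summary


# Example with made-up test names and outcomes.
results = {
    "pair": True,
    "two_pair": True,
    "full_house": False,
    "4_of_a_kind": False,
    "flush": False,
}
print(release_test_summary(results))
```

The remaining failures are counted but not named, which is what keeps the instructor's test data mostly hidden.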
Now, a student can think "oh, I think I know what I did wrong," change their code and resubmit. But performing a release test
requires using a release token. Students are given some number of tokens (typically 2 or 3), and each token regenerates 24 hours after being used.
This has many repercussions.
- Students have an incentive to start early: the earlier they start, the more opportunities they have to
perform release tests.
- Students are told that when they learn that they failed a release test, they shouldn't first try to fix their code.
Instead, they should try to write a test case that replicates that failure, so that when they next perform
a release test they have some confidence that they will actually pass the release test.
- If students make an incorrect assumption that causes them to fail many of the instructor's test cases, they find out
before the project deadline and have a chance to ask questions and try to fix their code.
- When it gets down to the last day and the last two release tokens, it replicates much of the pressure that real software
developers feel to ensure the quality of their code, and helps them develop good software development skills.
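The token rule described above (a small number of tokens, each regenerating 24 hours after it is used) can be modeled with a few lines of code. This is a minimal sketch under stated assumptions: the class name `ReleaseTokens`, its defaults, and its methods are hypothetical and are not taken from the Marmoset source.

```python
from datetime import datetime, timedelta


class ReleaseTokens:
    """Tracks release tokens that regenerate a fixed time after use.

    Hypothetical model: names and defaults are illustrative assumptions,
    not Marmoset's implementation.
    """

    def __init__(self, total=3, regen=timedelta(hours=24)):
        self.total = total
        self.regen = regen
        self.used_at = []  # timestamps of tokens that are still regenerating

    def available(self, now):
        # Forget tokens whose regeneration period has fully elapsed.
        self.used_at = [t for t in self.used_at if now - t < self.regen]
        return self.total - len(self.used_at)

    def use(self, now):
        """Spend one token; return False if none are available."""
        if self.available(now) == 0:
            return False
        self.used_at.append(now)
        return True


start = datetime(2024, 1, 1, 9, 0)
tokens = ReleaseTokens(total=2)
tokens.use(start)
tokens.use(start + timedelta(hours=1))
print(tokens.available(start + timedelta(hours=2)))   # 0
print(tokens.available(start + timedelta(hours=24)))  # 1 (first token is back)
print(tokens.available(start + timedelta(hours=25)))  # 2
```

Because each token has its own timer, a student who spends both tokens on the last evening cannot perform another release test before the deadline, which is the source of the incentive to start early.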
All tests are run as soon as the project is submitted, so instructors can see if students are having particular problems with a test
case. That might be because the project specification is unclear,
the test case doesn't completely match the project specification,
the test case is particularly challenging, or the material required to handle that case hasn't
been covered in lecture yet.
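One simple way to spot a troublesome test case is to compute, for each test, the fraction of students whose latest submission fails it. The sketch below assumes the results are available as one dictionary per student; the function name `failure_rates` and that data layout are assumptions for illustration, not how Marmoset stores or reports results.

```python
from collections import Counter


def failure_rates(latest_results):
    """Return (test name, failure rate) pairs, worst test first.

    `latest_results` holds one dict per student, mapping test name ->
    True (passed) / False (failed). Hypothetical data layout.
    """
    attempts = Counter()
    failures = Counter()
    for results in latest_results:
        for test, passed in results.items():
            attempts[test] += 1
            if not passed:
                failures[test] += 1
    rates = {test: failures[test] / attempts[test] for test in attempts}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Made-up results for three students.
class_results = [
    {"pair": True, "full_house": False},
    {"pair": True, "full_house": False},
    {"pair": False, "full_house": True},
]
for test, rate in failure_rates(class_results):
    print(f"{test}: {rate:.0%} failing")
```

A test that most of the class fails well before the deadline is a hint to recheck the specification or the test itself.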
All the details of the test case can be revealed immediately after the project deadline, so students get full feedback on the project
before moving on to the next assignment.
Marmoset also supports code reviews in the browser. There are several kinds of code reviews:
- In-progress reviews, done by an instructor or TA before the project deadline. This can
either be initiated by a student request for help (via a submission help-desk system handled
by Marmoset) or by a TA.
- An instructional code review after the project deadline. In this case, student submissions
are assigned to instructional staff.
- Peer reviews, where each student might be assigned to review the submissions of two other students.
All code reviews share the same properties. By clicking on a line of code, you start a comment thread.
The author can either acknowledge the comment, or respond with a request for a response (e.g., "I don't understand"
or "I disagree, ..."). Such a response would then be seen by the original commenter as a code review comment
they need to respond to. A thread is open if the last comment in the thread requests a response.
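The open/closed rule for comment threads can be expressed compactly. The following is a sketch only: the `Comment` and `Thread` classes and their fields are hypothetical, and treating the reviewer's initial comment as one that requests a response is an assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    requests_response: bool


@dataclass
class Thread:
    """A comment thread attached to one line of code (hypothetical model)."""

    line: int
    comments: list = field(default_factory=list)

    def add(self, author, text, requests_response=False):
        self.comments.append(Comment(author, text, requests_response))

    def is_open(self):
        # Open exactly when the last comment asks for a response.
        return bool(self.comments) and self.comments[-1].requests_response


thread = Thread(line=42)
thread.add("ta", "This loop never terminates.", requests_response=True)
thread.add("student", "I don't understand.", requests_response=True)
print(thread.is_open())  # True: the TA still owes a reply
thread.add("ta", "The counter is never incremented.", requests_response=True)
thread.add("student", "Acknowledged.")
print(thread.is_open())  # False: the last comment is an acknowledgement
```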
An instructional or peer code review assignment can also have a set of rubrics: things the reviewer
is requested to look for and evaluate in the code. These can have check boxes, numeric scores or drop downs associated
with them, and also create a comment thread for discussion about the rubric evaluation.
The submit server consists of several components:
- A J2EE webserver (we've used Tomcat in our deployments), using servlets and JSP.
- An SQL database (we've used MySQL in our deployments).
- One or more buildservers. For various reasons, we never run student code on the machine that
hosts the submit server and the database. Instead, build servers connect to the submit server,
and present credentials for the courses they are authorized to build projects for.
If there are any appropriate submissions, the build server downloads them, disconnects
from the submit server, builds and tests the submission and then uploads the results to the submit server.
We can run multiple build servers for a course to provide both redundancy and quick turn around
time for project testing.
Because of the decentralized nature of the build servers, we can set up a single shared submit server,
perhaps even shared by multiple institutions, and let instructors set up their own build servers for their own courses.
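The pull-based protocol described above (the build server connects, fetches a submission for a course it is authorized for, tests it while disconnected, then uploads the results) can be sketched as follows. Everything here is a simplified stand-in: `SubmitServer`, `next_submission`, `upload_result`, and `run_build_server` are hypothetical names, and the in-memory queue replaces the real network connection and credential check.

```python
from collections import deque


class SubmitServer:
    """In-memory stand-in for the submit server (hypothetical interface)."""

    def __init__(self):
        self.pending = deque()  # (course, submission_id) awaiting testing
        self.results = {}       # submission_id -> test outcome

    def next_submission(self, authorized_courses):
        # Hand out the oldest pending submission from an authorized course.
        for item in list(self.pending):
            if item[0] in authorized_courses:
                self.pending.remove(item)
                return item
        return None

    def upload_result(self, submission_id, outcome):
        self.results[submission_id] = outcome


def run_build_server(server, authorized_courses, build_and_test):
    """Process submissions until none remain for the authorized courses.

    A long-running build server would sleep and poll again instead of
    returning; this sketch stops so that it terminates.
    """
    while True:
        item = server.next_submission(authorized_courses)  # connect, download
        if item is None:
            break
        _course, submission_id = item
        outcome = build_and_test(submission_id)        # runs while disconnected
        server.upload_result(submission_id, outcome)   # reconnect, upload


server = SubmitServer()
server.pending.extend([("cs101", 1), ("cs412", 2), ("cs101", 3)])
run_build_server(server, {"cs101"}, lambda submission_id: "passed")
print(server.results)        # results for submissions 1 and 3
print(list(server.pending))  # the cs412 submission is left for another build server
```

Since the build server initiates every connection, student code never runs on the machine hosting the submit server, and several build servers can drain the same queue in parallel.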
Marmoset was originally developed by Jaime Spacco
as part of his 2006 Ph.D. thesis under the direction of Bill Pugh.
For 5 years, various graduate students and lab staff members
at UMD worked on further enhancements. Starting in 2011, Bill Pugh and Ryan Sims began a major
revision of Marmoset, the biggest component of which was in-browser code review using GWT (Google Web Toolkit).
Marmoset compared to Web-CAT
Marmoset is somewhat similar to Web-CAT, another
tool you should take a look at if you are considering web-based programming project submission tools.
Web-CAT has been around for a while, and has a lot of nice features and capabilities.
Below, we've tried to summarize the key differences between Marmoset and Web-CAT as we've observed them.
Marmoset
- Scalable, more secure build architecture
  - Separate build servers compile and test submissions
  - Avoids problems with student submission code DOS-ing the web server (intentionally or not)
  - Can use lots of build servers (we typically use 4 build server instances running on each of several machines)
  - Allows use of specialized build machines for courses that need them
- Provides release testing to limit access to instructor test data, and to provide incentives for starting early, working in bursts, and developing good software quality skills
- Provides instructors with a better overview of how students are doing on instructor tests, which instructor tests are giving students the most problems, etc.
- Submissions are automatically retested by build servers when no new submissions need to be tested; inconsistent test results are flagged
- Student tests are supported, encouraged and useful, but not required
- Submissions can be edited and submitted in the web browser
- Code review system is much more capable, with many features inspired by professional code review tools (e.g., Mondrian at Google)
  - Click on a line of code to create a comment
  - Each comment begins a conversation. Students can acknowledge or respond to comments with questions. If a student responds with a question, the TA sees it as something they have to respond to
  - Supports pre-defined rubrics. Each rubric can have a title, a kind (checkbox, dropdown, numeric score or flag) and a comment
  - Supports peer code reviews, in which students review submissions by other students. Peer reviews can optionally be anonymous
- Provides both Java and makefile based unit testing frameworks
- Apache 2 license
Web-CAT
- More mature, stable infrastructure
- Lots of plugins and configuration options
- More emphasis on TDD, urging/requiring students to write their own test cases that achieve high code coverage
- More capabilities in assigning point deductions based on style and static analysis tools
- Provides testing frameworks for Java and other programming languages
- GNU Affero license