Code Treadmill
June 2026
This spring, I took over a popular classroom coding website, https://code-treadmill.com/. In this post, I'll document how I took it over, the enhancements I've added, and how this all became an awesome lesson for my students. At the end, I reflect on how an LLM assisted me with many of the enhancements.
Background - Inheriting Code Treadmill
Code Treadmill (formerly "way-to-code") is an awesome website that helps students practice coding problems in a fun, gamified setting. I've been a fan of Code Treadmill for a long time and have had great success using it with my students. It's specifically popular because it randomly generates practice problems written in AP Pseudocode[1].
I tried to use Code Treadmill in class this year and noticed that it was gone! After a bit of searching, I found this video by the website's creator, Brad Wray, explaining that he was proud of this website but had retired from teaching and had decided to take it down. I reached out offering to help, and Brad was excited to work together. Within a few days, Brad had transferred the domain and added me as a contributor to the repo.
This was an awesome practical lesson for my students in the power of collaboration on open-source software. I had lamented the loss of code-treadmill in class one day, and had already found the repo and set it up to run from a server in my classroom. Then I got the response from Brad during class and talked to my students about the process as we transferred the domain and set up a hosting provider.
Hosting, and how that got complicated quickly
As an immediate first step, I set up hosting on https://railway.com/, which is fast, easy, and cheap. You basically just point Railway to the repo on GitHub and it handles the rest. This worked pretty well, and Brad posted an announcement on a CS Teachers Facebook group that code-treadmill.com was back!
I shared this story with a colleague, who asked how hard it would be to add more programming languages. Code Treadmill was basically just a React app which only supported JavaScript (and AP Pseudocode, which it translated into JavaScript for execution), so all code execution was handled client-side in the browser. Our school does use a little bit of JavaScript but mostly emphasizes Python and C++. There is no simple way to support both of those in the browser[2], so I needed to find a way to send code to a server for execution. After a bit of searching, I discovered Judge0, an open-source and self-hostable code execution sandbox that fit my needs perfectly. Judge0 supports tons of languages and is easy to work with - you just run it in a Docker container, make a web request with the code, and it responds with the stdout/stderr.
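As a rough sketch of what that web request can look like: Judge0's REST API accepts a POST to /submissions with source_code and language_id fields, and with wait=true the response includes stdout and stderr. The base URL, the helper names, and the language id used in the usage note are assumptions for illustration, not Code Treadmill's actual code.

```javascript
// Hypothetical base URL for a self-hosted Judge0 container.
const JUDGE0_URL = "http://localhost:2358";

// Build the fetch() arguments for a synchronous Judge0 submission.
function buildSubmission(sourceCode, languageId, stdin = "") {
  return {
    url: `${JUDGE0_URL}/submissions?base64_encoded=false&wait=true`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        source_code: sourceCode,
        language_id: languageId,
        stdin,
      }),
    },
  };
}

// Send the code and return just the output streams.
// fetchImpl is injectable so the function can be exercised without a server.
async function runCode(sourceCode, languageId, fetchImpl = fetch) {
  const { url, options } = buildSubmission(sourceCode, languageId);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Judge0 request failed with status ${response.status}`);
  }
  const result = await response.json();
  return { stdout: result.stdout, stderr: result.stderr };
}
```

For example, `runCode("print(1 + 1)", 71)` would submit a Python snippet, assuming id 71 maps to Python 3 on your instance; the /languages endpoint lists the ids a given Judge0 install actually supports.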
Railway doesn't support Docker containers, so at this point I migrated to a virtual private server ("VPS") hosted on Linode. I spun up the VPS[3], deployed Judge0 and code-treadmill via a single docker-compose file, and tested it out - it worked great! But when I tried to run a race live in class, it couldn't handle the load. I had 30 students in my classroom all trying to answer questions as fast as they could, and every time a question loaded they were making a code execution request to the Judge0 container. This was too much for my tiny Linode to handle.
Once again I turned this into a live learning opportunity for students - we ssh'd into the server and ran htop to inspect the load on the CPU and RAM. Linode makes it super easy to scale the server up and down, and it only takes a few minutes to apply the changes. My initial $5/month server had 1 CPU core and 1GB of RAM, which was way too small for the load. I switched to the next plan up ($12/month for 2GB of RAM and 1 core), waited a few minutes for the server to reboot, and tried again - it wasn't much better. We tried a few more times until we got up to a plan that costs $48/month - that seemed to work better, but was more than I was willing to pay, so I dropped back to the cheapest option and went back to the drawing board. The nice thing about VPS pricing is that they only charge for the amount of time you use - so experimenting with different server configurations for an hour cost me almost nothing.
In the end, I decided to host from the media server that I keep at home. This server has plenty of CPU and RAM to spare, and I already had it set up to serve to the internet via Tailscale connected to a lightweight VPS running Caddy. I wrote more about that setup in a dedicated blog post here:
Caching
I also realized that I could cut way down on the server load by caching. One of the cool things about code-treadmill is that problems are randomly generated. There are hand-made 'templates' which are filled in with random values. If a student gets a question wrong, it gives them the same question with new values and lets them try again. This makes sure that students don't just memorize answers - they actually have to understand the code to get the new problem right. To do this, code-treadmill defines its own template language. Here's an example:
let a = ##;
let b = ##;
let result = a > 0 && b > 0;
console.log(result);
The ## will get filled in with random integers between 1 and 7. There are tons of placeholders for different types of values - documented here.
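To make the substitution concrete, here is a minimal sketch of filling in the ## placeholder. It only handles ## (random integers from 1 to 7, as described above), and the function names are hypothetical rather than taken from the Code Treadmill source.

```javascript
// Return a random integer in [min, max], inclusive.
// The random source is injectable so the behavior can be checked deterministically.
function randomInt(min, max, random = Math.random) {
  return min + Math.floor(random() * (max - min + 1));
}

// Replace every "##" placeholder with its own random integer from 1 to 7.
function fillTemplate(template, random = Math.random) {
  return template.replace(/##/g, () => String(randomInt(1, 7, random)));
}

const template = [
  "let a = ##;",
  "let b = ##;",
  "let result = a > 0 && b > 0;",
  "console.log(result);",
].join("\n");

// Each call produces a new version of the same problem.
console.log(fillTemplate(template));
```

Because each placeholder is filled independently, a and b will usually get different values, so a retry shows the same logic with different numbers.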
To preserve this effect, I added a system that generates random versions of each problem and caches them. The client browser keeps track of which version of each problem the student has already seen and will make sure to serve from the cache a version of the problem that the student hasn't seen yet. If they've already seen every problem in the cache (or the cache is empty), it'll generate and cache a new version.
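The selection logic described above could be sketched roughly like this. The data shapes (a Map of versions per problem, a Set of seen version ids) and all names are assumptions for illustration; the real implementation splits this work between the browser and the server.

```javascript
// cache: Map from problem id to an array of generated versions (held by the server).
// seenIds: Set of version ids this student has already seen (tracked by the client).
// generate: hypothetical function that builds a fresh version of a problem.
function nextVersion(cache, problemId, seenIds, generate) {
  const versions = cache.get(problemId) ?? [];
  const unseen = versions.find((version) => !seenIds.has(version.id));
  if (unseen) {
    return unseen; // served from the cache, no new generation needed
  }
  // Cache is empty or exhausted for this student: generate and cache a new version.
  const fresh = { id: `${problemId}-${versions.length}`, code: generate(problemId) };
  cache.set(problemId, [...versions, fresh]);
  return fresh;
}
```

A new version is only generated when a particular student has exhausted the cache, so most requests during a race can be answered from versions that already exist.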
This works well as long as the cache is already pretty big. I wrote some code that automatically "warms" the cache when the server first runs - it generates and caches 3 versions of every problem. The cache is stored in memory on the server, so this warming script needs to re-run every time the server restarts.
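A warming step along these lines might look like the sketch below. The problem list and the generate function are hypothetical stand-ins, and the sketch leaves out whatever work is needed to compute each version's expected answer.

```javascript
const VERSIONS_PER_PROBLEM = 3; // from the post: 3 versions of every problem

// Build an in-memory cache holding a few pre-generated versions of each problem.
// Because the cache lives in memory, this has to run again on every server start.
function warmCache(problemIds, generate, versionsPerProblem = VERSIONS_PER_PROBLEM) {
  const cache = new Map();
  for (const problemId of problemIds) {
    const versions = [];
    for (let i = 0; i < versionsPerProblem; i += 1) {
      versions.push({ id: `${problemId}-${i}`, code: generate(problemId) });
    }
    cache.set(problemId, versions);
  }
  return cache;
}
```

Calling `warmCache(allProblemIds, generate)` once at startup means the first students to load a problem are served from the cache rather than triggering generation during a race.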
More Features
At this point everything was working great. We were doing a ton of review in my class as we were just a few weeks out from our end-of-year certification exam, so my students made for enthusiastic testers. This was an awesome experience for them - we kept a running list of ideas to add, and every class I would give them an update on which ideas I'd implemented and which were still in the queue - then they would test it out and give me a new round of feedback and ideas. See the bottom of this post for an extensive list of new features we added. It was fun for students to help design the tool that we were using to learn, and unlocked a lot of metacognition - we discussed which coding concepts they were struggling with the most, and the strategies we could bake into the website to help them practice.
Success
Thanks in large part to the review we did on Code Treadmill, my students did better than I expected on their end-of-year certification exams. I'm really excited about how well this worked and excited to keep using it in the future. I'd love for other educators, programmers, and learners to check it out and share feedback!
Reflection - LLM-assisted Coding
This was my first major project where I used Claude Code. I'd previously used LLMs here and there to answer specific questions, but this was a totally different interaction model, and it's incredible how well it worked. There is no way that I could have had this much success this quickly on my own.
I was transparent with my students about my LLM use throughout this project, and it sparked a lot of meaningful conversations.
This is a common topic in my CS classes. I wrote about my AI-informed assessment strategy in a recent post here:
And I'm working on a more complete blog post about LLM-assisted coding. This project informed a lot of it and inspired a lot of questions about what skills I should be emphasizing for my students.
One question that I kept coming back to was what it means to say that I'm proud of code that I didn't write. I certainly worked hard on this project (I was waking up at 5am for weeks to work on this every day before school) and feel that I met the standard I set for my students (I didn't write all the code myself, but I can explain it all). And although the LLM was great at generating code, I feel that I can claim all the ideas. I spent a lot of time making architecture decisions (e.g. thinking through the caching system), suggesting refactors to make code more maintainable, and writing the documentation. And I wrote this blog post myself - I still never use LLMs for writing. When I've tried, I've ended up spending more time revising than if I had just written it myself from the start.
Next Steps and Contributions
This is an open source project, so I'd love to have some collaborators. You can find the repo here: https://codeberg.org/code-treadmill/code-treadmill. I'm also working on transferring ownership of the domain to Social Justice Computing, a nonprofit that I'm on the board of and passionate about. My hope is that SJC can help fundraise for a more permanent hosting solution that isn't in my house. If you'd like to support that effort, you can make a (tax deductible!) donation here.
One immediate next step is to re-architect the way I write and maintain problems. Right now, they're all stored in text files in the repo. If someone wants to add new problems, they would need to make a PR on the repo. My plan is to switch this to a database separate from the repo and add a way for any user to write and save their own exercises. The classic trade-off here is that I can't version-control the contents of the database - I'm still thinking through the best solution.
Features
Here's a quick summary of the features I've added:
- Code writing problems: Our curriculum heavily emphasizes code writing problems, and we write our assignments using Doctests[4], so we definitely wanted to support that. To make sure that students don't just "slime" the tests (cheat by writing the output that the test expects instead of actually computing it), I also added hidden tests.
- More problem types: multiple-choice problems and Parsons problems. My students were specifically struggling with multi-select problems (like multiple choice but there might be multiple correct answers), so I added those, too. I'm excited about including multiple-choice problems because this will allow me to replace Kahoot, Juicemind, and similar live review quiz type websites that I've used in the past but never loved.
- Race enhancements: Code Treadmill already included a race mode - the teacher could select a workout and everyone in the class would race to finish it as fast as they could. I made a few enhancements that made it work a lot better in my classroom:
- Synchronous Mode: I want to be able to move the entire class through one problem at a time - give students a chance to solve the problem and then discuss it as a class. Students still get more points for getting the right answer faster!
- Bonuses to occupy the fast students: The issue with any in-class race situation is that some students finish faster than others. Inevitably, the students who finish early get bored, navigate to some other distracting website, and never come back. To preempt this, I added bonus questions to the synchronous races. When a student finishes a question and is waiting for their peers, they have the chance to solve as many bonus problems as they can to boost their score. These bonuses are each only worth a tiny score bump, but I found that my top students can fly through tons of them. I'm so happy with how well this worked at keeping my students engaged. I also learned that it can get out of hand - so I added a button that cuts off the bonus questions so that everyone would pay attention to our whole-class review.
- Security: My students are clever! The socket.io server that handled all of the race communication was handling everything in plaintext and without any authentication. When a student gets a question right, their browser sends a message to the server letting it know their username and their new score. Students quickly realized that they could intercept, repeat, and modify this message to change their own score, and even to change other students' scores. I added some simple encryption to make this harder - each socket client gets a unique id which is used to hash their score. We discussed this new system in class, talked about the vulnerabilities it still has, and I let them know that if they manage to get past it, they deserve any extra points they manage to give themselves =)
- Lots of little UI tweaks: Not quite as curricular, but the students really got into this part, especially the themes.
- Brad had already set up a configurable theme system that the students loved, so I had to add my favorite theme, Catppuccin (my personal blog you're probably reading this post on also uses Catppuccin's colors!)
- For the writing problems I used CodeMirror, an easily embeddable code editor. Our students write most of their code in Vim, so I was excited to find this plugin, which made it easy to add Vim keybindings to the editor.
- Lots of tooltips, modal explanation windows, and a thorough documentation page, which you can read here
- More languages, and more workouts: Judge0 makes it easy to add languages, so I added all of the languages that we use in our program. I also added a few unique ones:
- Intel 8080 Assembly: One cool feature of our program is that students in their senior year spend a few months learning about computer architecture using the Altair 8800, one of the first-ever personal computers, which was coded by flipping a set of toggle switches to represent binary values one register at a time, using the Intel 8080 Assembly language. My colleague who teaches this class has a few Altair clones in his classroom, and students can also use this Altair simulator to practice in a web browser. That simulator uses https://github.com/maly/8080js under the hood, an open source emulator that executes 8080 Assembly in JavaScript - so I had to add that!
- Number systems: As part of our curriculum and especially in that architecture class, students spend a lot of time practicing with conversion and basic math in binary, octal, and hex. I added some basic math problems to practice these skills.
- HTML and CSS: Most of our students start out in our web design class. To help practice basic web design skills, I added HTML/CSS writing problems. These are a bit different. It shows them a screenshot (automatically captured via Playwright) and runs their code in an iFrame. When their iFrame matches the screenshot, they pass.
- Exercise Creator: Brad had started working on an exercise editor, which makes it way easier to author exercises and test out the template filling. I added a workout builder which combines exercises into workouts and makes it easier to re-use exercises between workouts. Especially as we got closer to our exam and we were doing comprehensive review, I found that this was super useful for building custom workouts out of problems from various exercise sets.
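To illustrate the idea behind the Security item above (using each client's unique id to hash their score), here is a minimal sketch. The post doesn't specify the exact scheme, so the choice of HMAC-SHA256, the message shape, and every name here are assumptions rather than the site's actual code.

```javascript
const crypto = require("node:crypto");

// Compute a keyed hash (HMAC-SHA256) of a score update,
// using the client's unique id as the key.
function signScore(clientId, username, score) {
  return crypto
    .createHmac("sha256", clientId)
    .update(`${username}:${score}`)
    .digest("hex");
}

// Server-side check: recompute the hash and compare in constant time.
function isValidScoreUpdate(clientId, message) {
  const expected = Buffer.from(signScore(clientId, message.username, message.score), "hex");
  const received = Buffer.from(String(message.signature), "hex");
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

A replayed message with an edited score, or one sent on behalf of another student's id, fails the check. As discussed in class, this only raises the bar: a student who digs their own id out of the browser can still sign whatever score they like.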
[1] I think I first heard of Code Treadmill from an instructor during my AP Summer Institute (a crash-course orientation for new AP class teachers). The AP Computer Science Principles course is unique in that it allows teachers to select what programming language they use to teach. Because there is no single shared language that every student learns, the required final exam is based on a made-up language, "AP Pseudocode." Among other funky things, this made-up language has 1-indexed arrays. It was tricky to teach my students to use 0-indexed arrays in Python all year and then train them to use 1-indexed arrays just for the exam! The College Board has official study material based on AP Pseudocode, and there are some community-generated tools like this awesome ASCII-to-AP Block Generator, but it's hard to find much else. Under the hood, Code Treadmill's problems are actually JavaScript, and it uses this clever code to convert them to AP Pseudocode for display. I took over this website about a week before the AP CS Principles exam, and I know that a lot of teachers were planning to use it for final review, so I worked hard with the original author to get the website back up as quickly as we could!
[2] I did briefly look into WebAssembly for Python execution and it looks like https://github.com/pyodide/pyodide would work well. Once I decided to add languages beyond Python, I abandoned this idea and moved on to server-side execution.
[3] I used to think that configuring a VPS was a daunting task. In the past year I've started using Ansible, which makes this so easy!
[4] I inherited my appreciation for Doctest from my colleague Jeff Elkner. We find that it's a great way to introduce students to Test-Driven Development, and eventually we have students write their own doctests as part of each assignment. Here is our documentation about how to set up Python Doctests, and here is the same documentation for C++. For Java, I wrote a custom testing system that works similarly to C++ doctests.

