
Code Treadmill

June 2026


This spring, I took over a popular classroom coding website, https://code-treadmill.com/. In this post, I'll document how I took it over, the enhancements I've added, and how this all became an awesome lesson for my students. At the end, I reflect on how an LLM assisted me with many of the enhancements.

Background - Inheriting Code Treadmill

Code Treadmill (formerly "way-to-code") is an awesome website that helps students practice coding problems in a fun, gamified setting. I've been a fan of Code Treadmill for a long time and have had great success using it with my students. It's especially popular because it randomly generates practice problems written in AP Pseudocode[1].

I tried to use Code Treadmill in class this year and noticed that it was gone! After a bit of searching, I found this video by the website's creator, Brad Wray, explaining that he was proud of this website but had retired from teaching and had decided to take it down. I reached out offering to help, and Brad was excited to work together. Within a few days, Brad had transferred the domain and added me as a contributor to the repo.

This was an awesome practical lesson for my students in the power of collaboration on open-source software. I had lamented the loss of code-treadmill in class one day, and had already found the repo and set it up to run from a server in my classroom. Then I got the response from Brad during class and talked to my students about the process as we transferred the domain and set up a hosting provider.

Hosting, and how that got complicated quickly

As an immediate first step, I set up hosting on https://railway.com/, which is fast, easy, and cheap. You basically just point Railway to the repo on GitHub and it handles the rest. This worked pretty well, and Brad posted an announcement on a CS Teachers Facebook group that code-treadmill.com was back!

I shared this story with a colleague, who asked how hard it would be to add more programming languages. Code Treadmill was basically just a React app which only supported JavaScript (and AP Pseudocode, which it translated into JavaScript for execution), so all code execution was handled client-side in the browser. Our school does use a little bit of JavaScript but mostly emphasizes Python and C++. There is no straightforward way to support both of those in the browser[2], so I needed to find a way to send code to a server for execution. After a bit of searching, I discovered Judge0, an open-source and self-hostable code execution sandbox that fit my needs perfectly. Judge0 supports tons of languages and is easy to work with - you just run it in a Docker container, make a web request with the code, and it responds with the stdout/stderr.
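To make the "web request in, stdout/stderr out" flow concrete, here is a minimal sketch of what a call to a self-hosted Judge0 instance might look like. The base URL, the `runCode` helper, and the injectable `fetchImpl` parameter are assumptions for illustration, not code from the Code Treadmill repo. Language IDs also depend on the Judge0 version you run; the instance lists them at `/languages`.

```javascript
// Sketch only: send source code to Judge0 and read back stdout/stderr.
// ASSUMPTION: Judge0 is reachable at this address (2358 is its default port).
const JUDGE0_URL = "http://localhost:2358";

// `runCode` is a hypothetical helper name. `fetchImpl` is injectable so the
// function can be exercised without a live Judge0 server.
async function runCode(sourceCode, languageId, fetchImpl = fetch) {
  // wait=true asks Judge0 to keep the request open until execution finishes.
  const response = await fetchImpl(
    `${JUDGE0_URL}/submissions?base64_encoded=false&wait=true`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        source_code: sourceCode,
        language_id: languageId,
      }),
    }
  );
  if (!response.ok) {
    throw new Error(`Judge0 request failed with HTTP ${response.status}`);
  }
  const result = await response.json();
  // Judge0 returns null for empty streams, so normalize to empty strings.
  return { stdout: result.stdout ?? "", stderr: result.stderr ?? "" };
}
```

A call might look like `runCode('print("hi")', 71)`, where 71 is assumed to be the ID of a Python 3 runtime on your instance. Waiting synchronously keeps the client simple, but it also means every question load ties up a request until the sandbox finishes.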

Railway doesn't support Docker containers, so at this point I migrated to a virtual private server ("VPS") hosted on Linode. I spun up the VPS[3], deployed Judge0 and code-treadmill via a single docker-compose file, and tested it out - it worked great! But when I tried to run a race live in class, it couldn't handle the load. I had 30 students in my classroom all trying to answer questions as fast as they could, and every time a question loaded they were making a code execution request to the Judge0 container. This was too much for my tiny Linode to handle.

Once again I turned this into a live learning opportunity for students - we ssh'd into the server and ran htop to inspect the load on the CPU and RAM. Linode makes it super easy to scale the server up and down, and it only takes a few minutes to apply the changes. My initial $5/month server had 1 CPU core and 1GB of RAM, which was way too small for the load. I switched to the next plan up ($12/month for 2GB of RAM and 1 core), waited a few minutes for the server to reboot, and tried again - it wasn't much better. We tried a few more times until we got up to a plan that costs $48/month - that seemed to work better, but was more than I was willing to pay, so I dropped back to the cheapest option and went back to the drawing board. The nice thing about VPS pricing is that they only charge for the amount of time you use - so experimenting with different server configurations for an hour cost me almost nothing.

In the end, I decided to host from the media server that I keep at home. This server has plenty of CPU and RAM to spare, and I already had it set up to serve to the internet via Tailscale, connected to a lightweight VPS running Caddy. I wrote more about that setup in a dedicated blog post here:

Self-hosted Media Server

Caching

I also realized that I could cut way down on the server load by caching. One of the cool things about code-treadmill is that problems are randomly generated. There are hand-made 'templates' which are filled in with random values. If a student gets a question wrong, it gives them the same question with new values and lets them try again. This makes sure that students don't just memorize answers - they actually have to understand the code to get the new problem right. To do this, code-treadmill defines its own template language. Here's an example:

let a = ##;
let b = ##;
let result = a > 0 && b > 0;
console.log(result);

The ## will get filled in with random integers between 1 and 7. There are tons of placeholders for different types of values - documented here.
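As a rough illustration of how a placeholder like `##` could be expanded, here is a small sketch. The function names are hypothetical and this is not the project's actual template engine, which supports many more placeholder types.

```javascript
// Sketch only: fill each `##` placeholder with a random integer from 1 to 7.
// `randomInt` and `fillTemplate` are hypothetical names, not the project's API.
function randomInt(min, max) {
  // Inclusive on both ends.
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

function fillTemplate(template) {
  // The callback runs once per match, so every `##` gets its own value.
  return template.replace(/##/g, () => String(randomInt(1, 7)));
}

const template = [
  "let a = ##;",
  "let b = ##;",
  "let result = a > 0 && b > 0;",
  "console.log(result);",
].join("\n");

console.log(fillTemplate(template));
```

Passing a function as the replacement matters here: a plain string replacement would reuse one value for every placeholder, so `a` and `b` would always match.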

To preserve this effect, I added a system that generates random versions of each problem and caches them. The client browser keeps track of which version of each problem the student has already seen and will make sure to serve from the cache a version of the problem that the student hasn't seen yet. If they've already seen every problem in the cache (or the cache is empty), it'll generate and cache a new version.
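The selection rule described above can be sketched in a few lines. This is a simplified, single-process illustration with hypothetical names. In the real setup the "seen" set lives in the student's browser and the cache lives on the server.

```javascript
// Sketch only: serve a cached version the student has not seen, else make one.
// `versionCache`, `nextVersion`, and `generate` are hypothetical names.
const versionCache = new Map(); // problemId -> array of generated versions

function nextVersion(problemId, seenIndexes, generate) {
  const versions = versionCache.get(problemId) ?? [];
  // Look for a cached version this student has not been shown yet.
  let index = versions.findIndex((_, i) => !seenIndexes.has(i));
  if (index === -1) {
    // Cache is empty or exhausted for this student: generate and cache a new one.
    versions.push(generate());
    versionCache.set(problemId, versions);
    index = versions.length - 1;
  }
  seenIndexes.add(index);
  return versions[index];
}
```

With this shape, the expensive `generate` call only runs when a student has exhausted the cache, and every later student benefits from the versions earlier students triggered.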

This works well as long as the cache is already pretty big. I wrote some code that automatically "warms" the cache when the server first runs - it generates and caches 3 versions of every problem. The cache is stored in memory on the server, so this warming script needs to re-run every time the server restarts.
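A warming step like this might look roughly as follows. The names and the shape of the `templates` object are assumptions for illustration; the actual warming script may be organized differently.

```javascript
// Sketch only: pre-generate a few versions of every problem at startup.
// `warmCache`, `templates`, and `generateVersion` are hypothetical names.
const VERSIONS_PER_PROBLEM = 3;

function warmCache(templates, generateVersion, versionsPerProblem = VERSIONS_PER_PROBLEM) {
  const warmed = new Map(); // problemId -> array of generated versions
  for (const [problemId, template] of Object.entries(templates)) {
    const versions = [];
    for (let i = 0; i < versionsPerProblem; i++) {
      versions.push(generateVersion(template));
    }
    warmed.set(problemId, versions);
  }
  return warmed;
}
```

Because the result is an in-memory `Map`, it disappears when the process exits, which is why the warming has to run again on every restart.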

More Features

At this point everything was working great. We were doing a ton of review in my class as we were just a few weeks out from our end-of-year certification exam, so my students made for enthusiastic testers. This was an awesome experience for them - we kept a running list of ideas to add, and every class I would give them an update on which ideas I'd implemented and which were still in the queue - then they would test it out and give me a new round of feedback and ideas. See the bottom of this post for an extensive list of new features we added. It was fun for students to help design the tool that we were using to learn, and unlocked a lot of metacognition - we discussed which coding concepts they were struggling with the most, and the strategies we could bake into the website to help them practice.

Success

Thanks in large part to the review we did on Code Treadmill, my students did better than I expected on their end-of-year certification exams. I'm really excited about how well this worked and look forward to using it in the future. I'd love for other educators, programmers, and learners to check it out and share feedback!

Reflection - LLM-assisted Coding

This was my first major project where I used Claude Code. I'd previously used LLMs here and there to answer specific questions, but this was a totally different interaction model, and it's incredible how well it worked. There is no way that I could have had this much success this quickly on my own.

I was transparent with my students about my LLM use throughout this project, and it sparked a lot of meaningful conversations.

This is a common topic in my CS classes. I wrote about my AI-informed assessment strategy in a recent post here:

Project Based Learning in the AI Age

And I'm working on a more complete blog post about LLM-assisted coding. This project informed a lot of it and inspired a lot of questions about what skills I should be emphasizing for my students.

One question that I kept coming back to was what it means to say that I'm proud of code that I didn't write. I certainly worked hard on this project (I was waking up at 5am for weeks to work on this every day before school) and feel that I met the standard I set for my students (I didn't write all the code myself, but I can explain it all). And although the LLM was great at generating code, I feel that I can claim all the ideas. I spent a lot of time making architecture decisions (e.g., thinking through the caching system), suggesting refactors to make code more maintainable, and writing the documentation. And I wrote this blog post myself - I still never use LLMs for writing. When I've tried, I've ended up spending more time revising than if I had just written it myself from the start.

Next Steps and Contributions

This is an open-source project, so I'd love to have some collaborators. You can find the repo here: https://codeberg.org/code-treadmill/code-treadmill. I'm also working on transferring ownership of the domain to Social Justice Computing, a nonprofit that I'm on the board of and passionate about. My hope is that SJC can help fundraise for a more permanent hosting solution that isn't in my house. If you'd like to support that effort, you can make a (tax deductible!) donation here.

One immediate next step is to re-architect the way I write and maintain problems. Right now, they're all stored in text files in the repo. If someone wants to add new problems, they would need to make a PR on the repo. My plan is to switch this to a database separate from the repo and add a way for any user to write and save their own exercises. The classic trade-off here is that I can't version-control the contents of the database - I'm still thinking through the best solution.

Features

Here's a quick summary of the features I've added:


  1. I think I first heard of Code Treadmill from an instructor during my AP Summer Institute (a crash-course orientation for new AP class teachers). The AP Computer Science Principles course is unique in that it allows teachers to select what programming language they use to teach. Because there is no single shared language that every student learns, the required final exam is based on a made-up language, "AP Pseudocode." Among other funky things, this made-up language has 1-indexed arrays. It was tricky to teach my students to use 0-indexed arrays in Python all year and then train them to use 1-indexed arrays just for the exam! The College Board has official study material based on AP Pseudocode, and there are some community-generated tools like this awesome ASCII-to-AP Block Generator, but it's hard to find much else. Under the hood, Code Treadmill's problems are actually JavaScript, and it uses this clever code to convert them to AP Pseudocode for display. I took over this website about a week before the AP CS Principles exam, and I know that a lot of teachers were planning to use it for final review, so I worked hard with the original author to get the website back up as quickly as we could! ↩︎

  2. I did briefly look into WebAssembly for Python execution and it looks like https://github.com/pyodide/pyodide would work well. Once I decided to add languages beyond Python, I abandoned this idea and moved on to server-side execution. ↩︎

  3. I used to think that configuring a VPS was a daunting task. In the past year I've started using Ansible, which makes this so easy! ↩︎

  4. I inherited my appreciation for Doctest from my colleague Jeff Elkner. We find that it's a great way to introduce students to Test-Driven Development, and eventually we have students write their own doctests as part of each assignment. Here is our documentation about how to set up Python Doctests, and here is the same documentation for C++. For Java, I wrote a custom testing system that works similarly to C++ doctests. ↩︎