openedx · timmc-edx · May 21, 2025 · May 19, 2025 · May 21, 2025 · May 21, 2025
diff --git a/README.rst b/README.rst
@@ -1,7 +1,7 @@
 codejail_service
 ################
 
-|ci-badge| |codecov-badge| |doc-badge| |pyversions-badge|
+|ci-badge| |codecov-badge| |doc-badge|
 |license-badge| |status-badge|
 
 Purpose
@@ -12,14 +12,6 @@ Run codejail (sandboxed Python execution) as a service. This implements the cust
 Warnings
 ********
 
-Developers
-==========
-
-This service is configured with an in-memory database simply to make Django happy. The service itself **effectively does not have a database**, and should not rely on any database-dependent features such as waffle-based toggles.
-
-Operators
-=========
-
 **It is critical to configure this service securely**, as a misconfigured codejail-service will almost certainly allow an attacker to compromise not just this service, but possibly the rest of your infrastructure. See configuration and deployment docs for details.
 
 This is intended to be run as a fully internal service with no database or admin frontend, with the LMS and CMS making calls to it unauthenticated. It should not be callable directly from the internet.
@@ -37,7 +29,9 @@ Getting Help
 Documentation
 =============
 
-TODO (`<https://github.com/openedx/codejail-service/issues/3>`__)
+See docs directory.
+
+TODO: `Set up ReadTheDocs site <https://github.com/openedx/codejail-service/issues/3>`__.
 
 More Help
 =========
@@ -102,24 +96,21 @@ Reporting Security Issues
 
 Please do not report security issues in public. Please email security@openedx.org.
 
-.. |ci-badge| image:: https://github.com/openedx/codejail-service/workflows/Python%20CI/badge.svg?branch=main
+.. |ci-badge| image:: https://github.com/openedx/codejail-service/workflows/Python%20CI/badge.svg
     :target: https://github.com/openedx/codejail-service/actions
-    :alt: CI
+    :alt: Python CI
 
 .. |codecov-badge| image:: https://codecov.io/github/openedx/codejail-service/coverage.svg?branch=main
     :target: https://codecov.io/github/openedx/codejail-service?branch=main
     :alt: Codecov
 
 .. |doc-badge| image:: https://readthedocs.org/projects/codejail-service/badge/?version=latest
     :target: https://docs.openedx.org/projects/codejail-service
-    :alt: Documentation
-
-.. |pyversions-badge| image:: https://img.shields.io/pypi/pyversions/codejail-service.svg
-    :target: https://pypi.python.org/pypi/codejail-service/
-    :alt: Supported Python versions
+    :alt: Docs
 
 .. |license-badge| image:: https://img.shields.io/github/license/openedx/codejail-service.svg
     :target: https://github.com/openedx/codejail-service/blob/main/LICENSE.txt
     :alt: License
 
-.. |status-badge| image:: https://img.shields.io/badge/Status-Experimental-yellow
+.. |status-badge| image:: https://img.shields.io/badge/Status-Maintained-brightgreen
+    :alt: Status: Maintained
diff --git a/docs/decisions/0001-purpose-of-this-repo.rst b/docs/decisions/0001-purpose-of-this-repo.rst
@@ -1,57 +1,59 @@
-0001 Purpose of This Repo
-#########################
+1. Purpose of This Repo
+#######################
 
 Status
 ******
 
-**Draft**
+**Accepted** *2025-01-13*
 
-.. TODO: When ready, update the status from Draft to Provisional or Accepted.
+Context
+*******
 
-.. Standard statuses
-    - **Draft** if the decision is newly proposed and in active discussion
-    - **Provisional** if the decision is still preliminary and in experimental phase
-    - **Accepted** *(date)* once it is agreed upon
-    - **Superseded** *(date)* with a reference to its replacement if a later ADR changes or reverses the decision
+(Written in hindsight May 2025, but generally accurate as to the state of things in January.)
 
-    If an ADR has Draft status and the PR is under review, you can either use the intended final status (e.g. Provisional, Accepted, etc.), or you can clarify both the current and intended status using something like the following: "Draft (=> Provisional)". Either of these options is especially useful if the merged status is not intended to be Accepted.
+Risks of codejail
+=================
 
-Context
-*******
+Codejail is an inherently hazardous feature, as by design it executes untrusted code. By default, this code is executed on the same host as the LMS and CMS, core services that generally hold the most critical data in the deployment—any confinement failure would be disastrous. Historically, this was less of a concern, as the original uses of codejail were for trusted partners. In fact, before February 2013, submitted code was simply run in-process via Python's ``exec`` call, with no confinement whatsoever. There is still a platform feature allowing operators to use this unsafe execution method for specified courses, likely a relic of this period. Over time, the feature was opened up to all course creators, but the architecture remained the same.
+
+On top of the concern of colocating execution with sensitive data, the codejail library is also difficult to configure to be safe. If it is not configured, it defaults to running all code with no confinement. Even if codejail is configured correctly, it still relies on AppArmor to be configured properly; if the wrong file path is specified in the AppArmor profile, code will again run with minimal confinement (just what's provided by user permissions on Linux).
 
-TODO: Add context of what led to the creation of this repo.
+Besides these dangers, codejail is also simply difficult to configure, and it is not amenable to the same sort of containerization that other services enjoy. When deploying to a Docker-based environment, codejail requires AppArmor to be configured on the edxapp hosts themselves, and the profile must be applied to the service container. This further complicates deployment.
 
-.. This section describes the forces at play, including technological, political, social, and project local. These forces are probably in tension, and should be called out as such. The language in this section is value-neutral. It is simply describing facts.
+Movement towards remote codejail
+================================
+
+In 2021, eduNEXT implemented a Flask-based remote codejail service at `eduNEXT/codejailservice`_ and a ``remote_exec.py`` interface in edx-platform to call it. This enabled deploying codejail on Tutor.
+
+In 2025, 2U made a push to move its own deployment of edx-platform from the legacy Ansible- and EC2-based build system to a Docker- and Kubernetes-based system. In the process, 2U wanted to move to a remote codejail for both security and ease of deployment reasons.
 
 Decision
 ********
 
-We will create a repository...
+We will create a repository at ``openedx/codejail-service``, implementing the ``xmodule.capa.safe_exec.remote_exec.send_safe_exec_request_v0`` remote exec API (same as eduNEXT/codejailservice). This is intended as the standard Open edX remote codejail option going forward.
 
-TODO: Clearly state how the context above led to the creation of this repo.
+The new service will be implemented as a Django service (rather than using the current Flask-based service), allowing for the reuse of existing monitoring and configuration code and patterns that are already standard across the Open edX ecosystem. The total amount of code to write is small, since this is largely a wrapper around the codejail library itself, so a rewrite is acceptable instead of forking or updating the eduNEXT code.
 
-.. This section describes our response to these forces. It is stated in full sentences, with active voice. "We will …"
+The new code will come with an API test suite for evaluating security and functionality of a running instance.
 
 Consequences
 ************
 
-TODO: Add what other things will change as a result of creating this repo.
-
-.. This section describes the resulting context, after applying the decision. All consequences should be listed here, not just the "positive" ones. A particular decision may have positive, negative, and neutral consequences, but all of them affect the team and project in the future.
+- It will be possible to run codejail with additional protections that are not possible in the current default configuration. This includes locking down disk and outbound network access at the container level so that an AppArmor confinement failure can be mitigated.
+- There will be one more service to maintain, although one with fairly minimal requirements.
+- The remote codejail option in edx-platform will have an official implementation.
+- eduNEXT/codejailservice can eventually be deprecated, and `eduNEXT/tutor-contrib-codejail <https://github.com/eduNEXT/tutor-contrib-codejail/>`__ updated to use the new repo.
 
 Rejected Alternatives
 *********************
 
-TODO: If applicable, list viable alternatives to creating this new repo and give reasons for why they were rejected. If not applicable, remove section.
-
-.. This section lists alternate options considered, described briefly, with pros and cons.
+Use eduNEXT implementation
+==========================
 
-References
-**********
+We considered using the existing eduNEXT/codejailservice implementation. The main issue was that it used Flask rather than Django, which meant losing out on various bits of Django-based tooling that we've built up over the years (toggles, telemetry, etc.). Rewriting it to use Django would not have been particularly onerous, but also not much less work than simply creating a new Django-based repo. (The core of the application is simply a translation layer between HTTP and a call to ``codejail.safe_exec.safe_exec``.)
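
As a rough illustration of how thin that translation layer is, the sketch below parses a JSON request body, hands the code to a sandbox function, and serializes the result. It is not the service's actual code: ``handle_exec_request`` and ``fake_safe_exec`` are hypothetical names, and the payload and response field names (``code``, ``globals_dict``, ``emsg``) are assumptions about the v0 API. The sandbox function is injected so the example runs without codejail installed.

```python
import json


def handle_exec_request(payload_json, safe_exec):
    """Translate a JSON request body into a sandbox call and a JSON response.

    `safe_exec` is injected (hypothetical design) and is expected to mutate
    `globals_dict` in place, as codejail's safe_exec does.
    """
    payload = json.loads(payload_json)
    globals_dict = payload.get("globals_dict") or {}
    try:
        safe_exec(payload["code"], globals_dict)
    except Exception as exc:  # the real library raises SafeExecException
        return json.dumps({"emsg": str(exc)})
    return json.dumps({"globals_dict": globals_dict})


def fake_safe_exec(code, globals_dict):
    """Stand-in for the sandbox: runs code in-process, so it is NOT safe."""
    exec(code, globals_dict)
    # Drop entries such as __builtins__ that cannot be serialized to JSON.
    for key in [k for k in globals_dict if k.startswith("__")]:
        del globals_dict[key]


response = json.loads(handle_exec_request(
    json.dumps({"code": "x = a + 1", "globals_dict": {"a": 1}}),
    fake_safe_exec,
))
print(response)
```

In a real deployment the injected function would be ``codejail.safe_exec.safe_exec``, and the stand-in would only be appropriate in tests.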
 
-TODO: If applicable, add any references. If not applicable, remove section.
+Because the two services implement the same API, there should be a smooth migration path for users of the existing repo.
 
-.. (Optional) List any additional references here that would be useful to the future reader. See `Documenting Architecture Decisions`_ and `OEP-19 on ADRs`_ for further input.
+Creating a new repo also allows us to make breaking changes (including tightening down security) without interfering with users of the existing repo.
 
-.. _Documenting Architecture Decisions: https://cognitect.com/blog/2011/11/15/documenting-architecture-decisions
-.. _OEP-19 on ADRs: https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0019-bp-developer-documentation.html#adrs
+.. _eduNEXT/codejailservice: https://github.com/eduNEXT/codejailservice
diff --git a/docs/decisions/0002-no-database.rst b/docs/decisions/0002-no-database.rst
@@ -0,0 +1,32 @@
+.. _adr2-no-db:
+
+2. No database
+##############
+
+Status
+******
+
+**Accepted** *2025-01-13*
+
+Context
+*******
+
+An early decision we had to make in the design of a remote codejail service was whether to include a database. A database would provide several possible benefits, such as enabling an admin dashboard for configuration (e.g. Waffle flags) or audit logging of submitted code. Django and its ecosystem also expect there to be a database.
+
+On the other hand, supporting a database would complicate isolation. Configuration flags would also be of limited value, as they could not control anything related to security; a partial confinement failure could lead to malicious code changing the application's settings, escalating its own privileges.
+
+Decision
+********
+
+codejail-service will not have any database connection.
+
+Consequences
+************
+
+Django will be configured to use an in-memory SQLite DB. There will be no persistence mechanism.
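
A minimal sketch of what such a configuration might look like in a Django settings module is shown below. This is an illustrative fragment, not a copy of the service's actual settings; it uses Django's standard SQLite backend with the special ``:memory:`` database name.

```python
# Hypothetical settings fragment: Django expects a DATABASES entry, so point
# it at an in-memory SQLite database that vanishes when the process exits.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": ":memory:",
    }
}
```

Because nothing is written to disk or sent over the network, this choice needs no credentials and no outbound connectivity.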
+
+It will be possible to lock down outbound networking entirely on the service, and the service will not be required to hold any connection-related secrets.
+
+As a downside, this means there will be no admin dashboard, including no possibility of Waffle flags.
+
+Any audit logging will be performed on the edxapp side. That would be a better option for logging anyhow, as edxapp has more context on the submitted student code, and what codejail-service receives has already been wrapped in a standard template along with compatibility shims and other support code (more than we want to include in the audit log).
diff --git a/docs/decisions/0003-no-authentication.rst b/docs/decisions/0003-no-authentication.rst
@@ -0,0 +1,35 @@
+3. Unauthenticated API
+######################
+
+Status
+******
+
+**Accepted** *2025-01-13*
+
+Context
+*******
+
+While codejail-service does not store or process particularly sensitive data, it does provide computation resources. This indicates that authentication or other API call limitations may be warranted.
+
+For its intended purpose (implementing remote codejail, for LMS and CMS), the service is not required to be directly exposed to the public internet. Codejail execution in edxapp is only available to logged-in users, although those users may not need to have verified accounts (depending on deployment settings).
+
+There is currently no provision in the API client code for passing authentication.
+
+Decision
+********
+
+codejail-service will not require authentication for its API calls, but will instead document that the service should not be exposed to the public internet.
+
+Consequences
+************
+
+Deployments that inadvertently expose the service risk abuse of the compute resources. Any vulnerabilities in the webapp's communication with codejail (as opposed to inside the sandbox itself) may also be exposed in such a situation.
+
+Deferred Alternatives
+*********************
+
+While codejail-service doesn't have a database (by design, see ADR 2), it could still authenticate the caller as another service in the deployment. For example, asymmetric JWT tokens could be used to authenticate the caller as edxapp. (Symmetric auth would be unsuitable, as a partial confinement failure could then expose the locally stored keys to an attacker.)
+
+It may still be worth adding the option of authentication at some point, but the risk is relatively low (as the service is already designed to be locked down) and exploitation would require the existence of an API vulnerability and an attacker who can reach the service directly (via SSRF, a misconfigured deployment, or pivoting from the internal network after an unrelated compromise).
+
+Adding authentication would require modifications to ``remote_exec.py`` in edxapp and additional configuration work in both services.
diff --git a/docs/deployment.rst b/docs/deployment.rst
@@ -98,3 +98,15 @@ API tests
 After the first setup, and after any significant change to security settings, run the tests in ``./api_tests/`` (see README in that directory). This will probe the service for a variety of possible vulnerabilities.
 
 These tests can also be incorporated into your deployment pipeline.
+
+Monitoring
+**********
+
+codejail-service provides telemetry in the form of ``set_custom_attribute`` calls. If telemetry is configured (see `edx-django-utils monitoring docs <https://github.com/openedx/edx-django-utils/blob/master/edx_django_utils/monitoring/README.rst>`__), these can be used to monitor for unexpected API call failures or an unexpectedly high rate of errors returned from codejail executions.
+
+It is also recommended to ingest AppArmor logs from the host, such as the output of ``SYSTEMD_COLORS=false journalctl -k --grep='apparmor.*<PROFILE_NAME>' -f`` (where ``<PROFILE_NAME>`` is the name of the AppArmor profile in effect). This will help you debug failures due to overly restrictive policy.
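
Once those logs are ingested, it can help to split each line into fields for filtering or alerting. The following is a minimal sketch, assuming the usual ``key="value"`` layout of kernel AppArmor audit messages; the sample line, the profile name ``codejail_service``, and the function name are illustrative, and the exact fields can vary by kernel and distribution.

```python
import re

# Matches key=value pairs, with or without double quotes around the value.
_FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_apparmor_line(line):
    """Return the key/value fields of one AppArmor audit line as a dict."""
    fields = {}
    for match in _FIELD.finditer(line):
        key, quoted, bare = match.groups()
        fields[key] = quoted if quoted is not None else bare
    return fields


# Illustrative sample only; not taken from a real deployment.
sample = (
    'audit: type=1400 audit(1747820000.123:456): apparmor="DENIED" '
    'operation="open" profile="codejail_service" name="/etc/passwd" '
    'pid=1234 comm="python" requested_mask="r" denied_mask="r"'
)
fields = parse_apparmor_line(sample)
if fields.get("apparmor") == "DENIED":
    print(fields["profile"], fields["operation"], fields["name"])
```

All values are kept as strings; a monitoring pipeline would typically group denials by ``profile`` and ``name`` to spot policy that is too strict.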
+
+Migration from local codejail
+*****************************
+
+Starting in Teak, there is code in edx-platform for migrating codejail execution from local to a remote codejail-service with a darklaunch mechanism. See the `Teak release notes <https://openedx.atlassian.net/wiki/spaces/COMM/pages/4570710024/Next+Release+Teak+-+Operator+Dev+Notes>`__ for details.
diff --git a/docs/debugging.rst → docs/developing.rst b/docs/debugging.rst → docs/developing.rst
@@ -1,8 +1,26 @@
+Developing
+##########
+
+Setup
+*****
+
+Due to the complex needs of codejail itself, running a local instance of codejail-service for debugging can be difficult. One of the following is recommended:
+
+- Write unit tests for the behavior of interest, mocking out calls to ``safe_exec``.
+- Run codejail-service using the same Docker image you would use for deployment, and install the corresponding AppArmor profile on your development machine.
+
+Special notes
+*************
+
+The service does not have a database (see ADR :ref:`adr2-no-db`), and so cannot use DB-dependent features such as waffle-based toggles.
+
+(The base settings configure Django to use an ephemeral in-memory database, since Django demands *some* kind of DB. But it isn't used for anything.)
+
 Debugging
-#########
+*********
 
 Segfaults and "resource temporarily unavailable"
-************************************************
+================================================
 
 In some cases, you might get the error message ``Couldn't execute jailed code: stdout: b'', stderr: b'' with status code: -11`` from a code execution. Status code ``-11`` is POSIX signal ``SIGSEGV``. If you encounter this, you're most likely running into process resource limits. Experiment with the ``CODE_JAIL.limits`` values until you discover which ``rlimit`` feature is the issue.
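
The mapping from a negative status code to a signal name can be checked with Python's standard ``signal`` module. This is a small sketch for interpreting such codes while debugging; ``describe_exit_status`` is a hypothetical helper, not part of codejail or this service.

```python
import signal


def describe_exit_status(status):
    """Describe a subprocess-style exit status.

    By the convention of Python's subprocess module on POSIX, a negative
    value -N means the child process was terminated by signal N.
    """
    if status < 0:
        try:
            return f"killed by {signal.Signals(-status).name}"
        except ValueError:
            return f"killed by unknown signal {-status}"
    return f"exited with code {status}"


print(describe_exit_status(-11))
```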
 
@@ -14,7 +32,7 @@ Examples where this has come up:
 rlimit-related failures can also present as "Resource temporarily unavailable" rather than a segfault.
 
 AppArmor
-********
+========
 
 Generally, code-exec failures related to AppArmor will be reported as "permission denied" exceptions in the application, but only if the original exception is allowed to propagate unchanged. If you're unsure whether AppArmor is at fault in an unexpected failure, watching the kernel logs for the profile name may help identify whether it was involved::
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -14,11 +14,11 @@ Contents:
    :maxdepth: 2
 
    readme
-   testing
    deployment
-   debugging
-   changelog
+   developing
+   testing
    decisions
+   changelog
 
 
 Indices and tables