An interpreter inside an interpreter

A few months into development, I decided my north star for Memphis would be to run a Flask server entirely within my interpreter. I had no idea how much work this would entail, only that it sounded cool and would probably teach me a lot along the way. If I were setting this goal today, I might pick FastAPI, or nothing at all, because that was silly of me.

Python stdlib

A big decision I encountered was how to deal with the Python standard lib. As you likely know, the standard lib of a language is not technically part of the language definition or runtime. It is included with releases in order to make the language and runtime more useful. Imagine Python without threading or async support. You would still be able to evaluate expressions and instantiate classes, but most production-ready programs need some sort of concurrency support.

One option would be to rewrite the entire standard lib myself. I’m building an interpreter, aren’t I? I believe this is the approach taken by RustPython, which is an admirable path. I figured I had enough on my plate getting the runtime to work, was looking for any and all corners to cut, and decided against this.

The Python standard lib consists of two main parts: the parts implemented in Python and the parts implemented in C. Conveniently enough, I had my own Python interpreter. Could I just interpret the Python source files from the host machine to satisfy the former? Yes, I could. I’d need to support every piece of syntax and every feature they used, but after that, it would Just Work.
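To make the "interpret the source files from the host machine" idea concrete, here is a minimal sketch of the lookup step, assuming a list of search directories is already known. The function name `resolve_module` and the lookup order are my own illustration, not Memphis’s actual resolver, and real Python import resolution has many more cases.

```rust
use std::path::PathBuf;

/// Hypothetical helper: find the source file for `name` in the given
/// search directories. Dotted names map to nested directories.
fn resolve_module(name: &str, search_paths: &[PathBuf]) -> Option<PathBuf> {
    // "os.path" becomes the relative path "os/path".
    let relative: PathBuf = name.split('.').collect();
    for dir in search_paths {
        // A package: <dir>/os/path/__init__.py
        let package = dir.join(&relative).join("__init__.py");
        if package.is_file() {
            return Some(package);
        }
        // A plain module: <dir>/os/path.py
        let file = dir.join(&relative).with_extension("py");
        if file.is_file() {
            return Some(file);
        }
    }
    // Not found as Python source; it may be one of the C parts.
    None
}
```

Once a path comes back, the file’s contents can be handed to the interpreter like any other Python source. A `None` is the signal that the module may live on the C side.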

The C part is where it gets interesting. Way back yonder in 2023, I made a decision to embed a Python interpreter inside my Python interpreter without fully understanding what that meant. Now it was time to wrap my head around this and decide if I wanted to stay with this approach or choose another path.

The interop shop for Rust and Python is Pyo3, and it is pretty much the only game in town. Pyo3 uses the Foreign Function Interface (FFI) to let your Rust code make calls into the CPython binary. This works because both sides agree on the Application Binary Interface (ABI), a concept I used during my career at AMD. Core software ftw!
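For a feel of what FFI looks like without Pyo3 in the way, here is a minimal sketch that calls `strlen` from the C standard library. It is not how Pyo3 is implemented, just the same mechanism in miniature; the wrapper name `c_strlen` is hypothetical. The `unsafe extern` spelling assumes Rust 1.82 or newer; on older toolchains, drop the leading `unsafe`.

```rust
use std::ffi::{c_char, CString};

// Declare a function that lives in the C standard library. The declaration
// is a promise that both sides follow the platform's C ABI for this call.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: build a NUL-terminated C string and hand a
/// pointer to it across the language boundary.
fn c_strlen(text: &str) -> usize {
    let c_text = CString::new(text).expect("text must not contain NUL bytes");
    // SAFETY: `c_text` is a valid NUL-terminated string that outlives the call.
    unsafe { strlen(c_text.as_ptr()) }
}
```

Calling `c_strlen("memphis")` returns 7. Rust never sees the C source; it only trusts that the compiled function takes a pointer and returns a length in the agreed-upon way.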

Importing modules

My initial use-case was to run import sys and have it give me an object on which I could perform a member access operation. I’m getting into interpreter-speak here, but this is the type of REPL session I’m talking about.

Python 3.12.5 (main, Aug  6 2024, 19:08:49) [Clang 15.0.0 (clang-1500.3.9.4)]
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys
<module 'sys' (built-in)>
>>> type(sys.modules)
<class 'dict'>

Getting this functionality using Pyo3 was straightforward.

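use pyo3::prelude::*;

/// Wraps a module object owned by the embedded CPython runtime.
pub struct CPythonModule(PyObject);

impl CPythonModule {
    pub fn new(name: &str) -> Self {
        // Start the embedded CPython runtime if it is not already running.
        pyo3::prepare_freethreaded_python();
        // Hold the GIL while importing, then keep an owned reference.
        let pymodule = Python::with_gil(|py|
            PyModule::import(py, name).expect("Failed to import module").into()
        );

        Self(pymodule)
    }
}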

And we can use this to drive a similar REPL session in Memphis, assuming you remember the cocktail of feature flags to get this to run.

memphis 0.1.0 REPL (Type 'exit()' to quit)
>>> import sys
>>> sys
<module 'sys' (built-in)>
>>> type(sys.modules)
<class 'dict' (built-in)>

If you’re asking yourself whether you could just use this approach to import the entire standard lib (both the parts written in Python and the parts written in C) and make your life, liberty, and pursuit of happiness easier, the answer is yes. That would be a valid approach! However, that would make my interpreter more of a shell around CPython than I would like. This is a learning exercise, so I’m all for arbitrary decisions. For the purists out there who say loading any piece of CPython inside Memphis makes Memphis not a real interpreter, I would just say: please show me your interpreter.

I conducted a quick test with htop by running import sys inside a REPL session using both Memphis and CPython. On Memphis, because this loads the CPython libraries into memory, it increased the RAM usage (Resident Set Size in htop) by about 5 MB. For comparison, the Memphis REPL after loading the sys module uses about 9 MB of RAM, while the Python REPL before and after loading the sys module uses about the same. I’m sure this isn’t an apples-to-apples comparison, but it at least told me that Memphis wasn’t gonna slowly choke my computer to death.

Converting objects and getting existential

The next complexity with this setup involves converting my Memphis object representation into CPython representations and vice versa. This is a work-in-progress and my primary directive was, initially, “do not fail” and, more recently, “show warnings when you do a lossy conversion.”

Here is my conversion from a PyObject, which is the object representation on the Pyo3 side, into an ExprResult, my Memphis representation.

pub mod utils {
    pub fn from_pyobject(py: Python, py_obj: &PyAny) -> ExprResult {
        if let Ok(value) = py_obj.extract::<i64>() {
            ExprResult::Integer(Container::new(value))
        } else if let Ok(value) = py_obj.extract::<f64>() {
            ExprResult::FloatingPoint(value)
        } else if let Ok(value) = py_obj.extract::<&str>() {
            ExprResult::String(Str::new(value.to_string()))
        } else if let Ok(py_tuple) = py_obj.extract::<&PyTuple>() {
            let elements = py_tuple
                .iter()
                .map(|item| from_pyobject(py, item))
                .collect();
            ExprResult::Tuple(Container::new(Tuple::new(elements)))
        } else if let Ok(py_module) = py_obj.extract::<&PyModule>() {
            let mut module = Module::default();

            // Get the module's __dict__ to iterate over all attributes
            for (key, value) in py_module.dict() {
                let key_str: String =
                  key.extract().expect("Key is not a string");
                let expr_value = from_pyobject(py, value);
                module.insert(&key_str, expr_value);
            }

            ExprResult::Module(Container::new(module))
        } else if let Ok(py_set) = py_obj.extract::<&PySet>() {
            let elements = py_set
                .iter()
                .map(|item| from_pyobject(py, item))
                .collect();
            ExprResult::Set(Container::new(Set::new(elements)))
        } else if let Ok(py_list) = py_obj.extract::<&PyList>() {
            let elements = py_list
                .iter()
                .map(|item| from_pyobject(py, item))
                .collect();
            ExprResult::List(Container::new(List::new(elements)))
        } else {
            // TODO think of a way to detect whether this is an object we can
            // convert or not
            // log(LogLevel::Warn, || {
            //     "Potentially ambiguous CPythonObject instance.".to_string()
            // });
            ExprResult::CPythonObject(CPythonObject::new(py_obj.into_py(py)))
        }
    }
}
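
The commented-out warning in the fallback branch is where the "show warnings when you do a lossy conversion" goal would plug in. Below is a minimal sketch of one way to make lossiness explicit, using stand-in types rather than the real Pyo3 or Memphis ones. `Foreign`, `Native`, `Converted`, and `convert` are all hypothetical names. The integer case is modeled on what I’d expect to happen when a Python int is too large for i64 and may fall through to the f64 branch.

```rust
/// Hypothetical stand-in for a value arriving from the CPython side.
#[derive(Debug, Clone, PartialEq)]
enum Foreign {
    Int(i128),      // wider than i64, to mimic Python's unbounded ints
    Float(f64),
    Opaque(String), // anything with no native equivalent
}

/// Hypothetical stand-in for the Memphis-side representation.
#[derive(Debug, Clone, PartialEq)]
enum Native {
    Integer(i64),
    FloatingPoint(f64),
    Opaque(String),
}

/// A converted value plus an optional note about what was lost.
#[derive(Debug, PartialEq)]
struct Converted {
    value: Native,
    warning: Option<String>,
}

fn convert(input: &Foreign) -> Converted {
    match input {
        Foreign::Int(i) => match i64::try_from(*i) {
            Ok(v) => Converted { value: Native::Integer(v), warning: None },
            // Too big for i64: fall back to a float and say so.
            Err(_) => Converted {
                value: Native::FloatingPoint(*i as f64),
                warning: Some(format!("integer {i} does not fit in i64, stored as float")),
            },
        },
        Foreign::Float(f) => Converted { value: Native::FloatingPoint(*f), warning: None },
        // Nothing to convert to: keep a handle and flag it.
        Foreign::Opaque(name) => Converted {
            value: Native::Opaque(name.clone()),
            warning: Some(format!("no native conversion for {name}, kept opaque")),
        },
    }
}
```

Returning the warning alongside the value, instead of logging it inside the function, leaves the caller free to decide whether a lossy conversion should be logged, counted, or treated as an error.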

And here is the reverse conversion. Note that for both of these we must pass in a Python value (the py argument), which represents our hold on the CPython GIL (global interpreter lock).

impl ToPyObject for ExprResult {
    fn to_object(&self, py: Python) -> PyObject {
        match self {
            ExprResult::None => py.None(),
            ExprResult::Boolean(b) => b.to_object(py),
            ExprResult::Integer(i) => i.borrow().to_object(py),
            ExprResult::String(s) => s.as_str().to_object(py),
            ExprResult::List(l) => {
                let list = PyList::empty(py);
                for item in l.clone().into_iter() {
                    list.append(item).expect("Failed to append to PyList");
                }
                list.to_object(py)
            }
            ExprResult::Function(_) => {
                // TODO our PyCFunction implementation is a no-op, we need to find a way to pass
                // the interpreter into here.
                let callback = |_args: &PyTuple, _kwargs: Option<&PyDict>| -> PyResult<bool> {
                    log(LogLevel::Warn, || {
                        "Potentially lossy PyCFunction invocation.".to_string()
                    });
                    Ok(true)
                };
                // TODO use real function name
                let py_cfunc = PyCFunction::new_closure(
                    py,
                    Some("memphis_func"),
                    None,
                    callback
                ).unwrap();
                py_cfunc.to_object(py)
            }
            ExprResult::Class(_) => {
                // TODO same here, our PyClass implementation does not bring real fields
                Py::new(py, TestClass {}).unwrap().to_object(py)
            }
            ExprResult::Module(module) => {
                let py_module = PyModule::new(py, &module.borrow().name()).unwrap();

                // Flatten all key-value pairs from scope into the module
                for (key, value) in module.borrow().dict() {
                    py_module.add(key, value.to_object(py)).unwrap();
                }

                py_module.to_object(py)
            }
            ExprResult::CPythonModule(module) => module.borrow().0.to_object(py),
            ExprResult::CPythonObject(object) => object.0.to_object(py),
            _ =>
                unimplemented!(
                    "Attempting to convert {} to a PyObject, but {} conversion is not implemented!",
                    self,
                    self.get_type()
                ),
        }
    }
}

This is a rich area that I’d like to explore further. Here are some of the directions I’ve considered:

  1. Convert each time an object crosses the FFI interface. (And yes, I realize that acronym expands to foreign function interface interface.) That’s roughly what I’m already doing; I would just need to own it and not feel like an imposter. This could be simple but inefficient.
  2. Keep a registry so that each object exists at most once on each side. This would be more efficient than (1), but it’d require a stable value which you could use to look up and link these objects.
  3. Aim for a single representation on the Rust side and use Pyo3 to proxy and lazily convert fields as needed. I believe this would still leverage the functionality of (1), but in a more efficient manner.
  4. Make the memory layout of a Memphis object match that of a PyObject. Much like #[repr(C)] in Rust, this would do for an object what an ABI does for a function call. I’m not even sure if this one is possible given the difference in what each side needs to do its evaluation, but this intrigues me.
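
Option (2) is the easiest of these to sketch. Here is a minimal version, assuming each Memphis object can hand out a stable numeric id. `MemphisId`, `ForeignHandle`, and `Registry` are hypothetical names, and the "conversion" is a stand-in counter rather than a real trip across the FFI boundary.

```rust
use std::collections::HashMap;

/// Hypothetical stable identity for a Memphis object.
type MemphisId = u64;

/// Hypothetical handle to the CPython-side twin of an object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ForeignHandle(u64);

#[derive(Default)]
struct Registry {
    by_id: HashMap<MemphisId, ForeignHandle>,
    next_handle: u64,
    conversions: usize, // how many real conversions we had to do
}

impl Registry {
    /// Return the existing twin for `id`, converting only on first sight.
    fn get_or_convert(&mut self, id: MemphisId) -> ForeignHandle {
        if let Some(handle) = self.by_id.get(&id) {
            return *handle;
        }
        // Stand-in for the real (expensive) conversion across the boundary.
        let handle = ForeignHandle(self.next_handle);
        self.next_handle += 1;
        self.conversions += 1;
        self.by_id.insert(id, handle);
        handle
    }
}
```

Asking for the same id twice yields the same handle and only one conversion. The hard part this sketch skips is the one named in (2): finding an id that stays stable for the life of the object, and removing entries when either side drops its copy.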

I’m getting ahead of myself because I can barely load a C module right now, but there’s truly no end to where my curiosity could take me in this area.

The End

I continue to poke at this whenever I hit a new conversion failure while plodding along towards getting Flask to boot. This exercise is a good reminder that all objects (or classes, modules, etc.) are a set of attributes that exist in a known format in memory. If we understand that format well enough, we should be able to do incredible things, regardless of whether it is on the Memphis or CPython side.

This philosophy drives my work with From Scratch Code as well. If you are tired of being unable to get a library to work in your code, I encourage you to step back and ask: what is the library actually doing? Do you need it, or could a simpler solution work? I believe in cultivating this curiosity about software, and I’d be happy to help you incorporate this mindset into your toolbox.


Elsewhere

In addition to mentoring software engineers, I also write about my experience as an adult-diagnosed autistic person. Less code and the same number of jokes.