Here are some of my current and recent projects and activities:

  • As part of Caltech's Digital Library Development team, I have worked on projects including:
    • DIBS (Digital Borrowing System): an implementation of a basic controlled digital lending (CDL) system using IIIF to make scanned books available for time-limited viewing to students and faculty.
    • Boffo (Barcodes from FOLIO): a high-performance add-on for Google Sheets that lets library staff retrieve data from our FOLIO library system.
    • IGA (InvenioRDM GitHub Archiver): a standalone program as well as a GitHub Actions workflow that lets you automatically archive GitHub software releases in an institutional repository such as CaltechDATA.
    • Handprint (Handwritten Page Recognition Test): runs handwriting recognition services from Amazon, Google, and Microsoft on images of archival documents, and produces annotated overlays.
  • SBML (Systems Biology Markup Language) is a community standard format for exchanging computational models in biology. In addition to cocreating the original SBML specifications and leading the international SBML effort for over a decade, I codeveloped notable open-source software including the following:
    • MOCCASIN (Model ODE Converter for Creating Automated SBML INteroperability), a program to convert certain classes of MATLAB ODE-based models into SBML. It does not require MATLAB. Its innovations include a Python-based parser for MATLAB.
    • The SBML Test Suite, including a test runner written in Java with the SWT GUI widgets and bundled as a self-contained desktop app.
    • libSBML, an API library for reading, writing, manipulating, and validating SBML files and data streams in many languages including C++, Java, MATLAB, Python, R and others.
  • CASICS (Comprehensive and Automated Software Inventory Creation System) was an effort to create a system to catalog software by leveraging ontologies and ML. Software modules include:
    • Spiral, a package of functions (including a novel heuristic algorithm trained on real data) for splitting function names and other identifiers in source code files.
    • Nostril, a Python module that infers whether a given short string of characters is likely to be random gibberish or something meaningful.
    • Dassie, a database of the subject terms found in the Library of Congress Subject Headings (LCSH).
  • COMBINE (COmputational Modeling in BIology NEtwork) is an international community organization that I cofounded in 2010 to coordinate development of standards and resources for computational biology.