Here are some of my current and recent projects and activities:
As part of Caltech's Digital Library Development team, some projects I have worked on include:
- DIBS (Digital Borrowing System): an implementation of a basic controlled digital lending (CDL) system using IIIF to make scanned books available for time-limited viewing to students and faculty.
- Boffo (Barcodes from FOLIO): a high-performance add-on for Google Sheets that lets library staff retrieve data from our FOLIO library system.
- IGA (InvenioRDM GitHub Archiver): a standalone program as well as a GitHub Actions workflow that lets you automatically archive GitHub software releases in an institutional repository such as CaltechDATA.
- Handprint (Handwritten Page Recognition Test): a tool that runs handwriting recognition services from Amazon, Google, and Microsoft on images of archival documents and produces annotated overlays.
SBML (Systems Biology Markup Language) is a community standard format for exchanging computational models in biology. In addition to cocreating the original SBML specifications and leading the international SBML effort for over a decade, I codeveloped notable open-source software that includes the following:
- MOCCASIN (Model ODE Converter for Creating Automated SBML INteroperability): a program that converts certain classes of MATLAB ODE-based models into SBML without requiring MATLAB. Its innovations include a MATLAB parser written in Python.
- The SBML Test Suite, including a test runner written in Java using the SWT GUI toolkit and bundled as a self-contained desktop app.
- libSBML, an API library for reading, writing, manipulating, and validating SBML files and data streams in many languages, including C++, Java, MATLAB, Python, R, and others.
CASICS (Comprehensive and Automated Software Inventory Creation System) was an effort to create a system for cataloging software by leveraging ontologies and machine learning. Its software modules include:
- Spiral, a package of functions (including a novel heuristic algorithm trained on real data) for splitting function names and other identifiers in source code files.
- Nostril, a Python module that infers whether a given short string of characters is likely to be random gibberish or something meaningful.
- Dassie, a database of the subject terms found in the Library of Congress Subject Headings (LCSH).

COMBINE (COmputational Modeling in BIology NEtwork) is an international community organization that I cofounded in 2010 to coordinate the development of standards and resources for computational biology.