Here are some current and recent projects and activities:
Preservation of knowledge – I'm in Caltech's Digital Library Development team and have been active in community efforts such as Data Together. Some software I developed:
- Handprint runs handwriting recognition services from Google, Microsoft and others on images of archival documents, and produces annotated overlays.
- Holdit is a program for the Caltech Library Circulation desk to generate a printable "on hold" book list by scraping the Caltech TIND server.
- eprints2bags downloads contents from an EPrints server and puts them in BagIt format.
CASICS – Comprehensive and Automated Software Inventory Creation System, to catalog software by leveraging ontologies and ML. Software modules include:
- Spiral provides functions (including a novel heuristic algorithm trained on real data) for splitting function names and other identifiers in source code files.
- Nostril is a Python module that infers whether a given short string of characters is likely to be random gibberish or something meaningful.
- Dassie is a database of the subject terms found in the Library of Congress Subject Headings (LCSH).
SBML – A community standard format for exchanging computational models in biology. In addition to the actual SBML specifications, notable open-source software I co-developed includes the following:
- MOCCASIN, a program to convert certain classes of MATLAB ODE-based models into SBML. It does not require MATLAB. Its innovations include a Python-based parser for MATLAB.
- SBML Test Suite, which includes a test runner written in Java using SWT GUI widgets and bundled as a self-contained desktop app.
- libSBML, an API library for reading, writing, manipulating, and validating SBML files and data streams in many languages including C++, Java, MATLAB, Python, R and others.
COMBINE – The COmputational Modeling in BIology NEtwork, a community helping coordinate development of standards and resources for computational biology.