Here are some current and recent projects and activities:

  • Preservation of knowledge – I'm in Caltech's Digital Library Development team and have been active in community efforts such as Data Together. Some software I developed:
    • Handprint runs handwriting recognition services from Google, Microsoft and others on images of archival documents, and produces annotated overlays.
    • Holdit is a program for the Caltech Library Circulation desk to generate a printable "on hold" book list by scraping the Caltech TIND server.
    • eprints2bags downloads contents from an EPrints server and puts them in BagIt format.
  • CASICS – The Comprehensive and Automated Software Inventory Creation System, to catalog software by leveraging ontologies and ML. Software modules include:
    • Spiral includes functions (including a novel heuristic algorithm trained on real data) for splitting function names and other identifiers in source code files.
    • Nostril is a Python module that infers whether a given short string of characters is likely to be random gibberish or something meaningful.
    • Dassie is a database of the subject terms found in the Library of Congress Subject Headings (LCSH).
  • SBML – A community standard format for exchanging computational models in biology. In addition to the actual SBML specifications, the notable open-source software I co-developed includes the following:
    • MOCCASIN, a program to convert certain classes of MATLAB ODE-based models into SBML. It does not require MATLAB. Its innovations include a Python-based parser for MATLAB.
    • SBML Test Suite. This includes a test runner written in Java using SWT GUI widgets and bundled as a self-contained desktop app.
    • libSBML, an API library for reading, writing, manipulating, and validating SBML files and data streams in many languages including C++, Java, MATLAB, Python, R and others.
  • COMBINE – The COmputational Modeling in BIology NEtwork, a community helping coordinate development of standards and resources for computational biology.