OCHRE Architecture

An efficient two-tier architecture supports all stages of research in a scalable and sustainable way

by J. David Schloen and Sandra R. Schloen (last updated November 2025)

Platform Design

The OCHRE platform has a back end and a front end, each consisting of interconnected components (see the diagram below). The back end is where the data of many different projects, publications, and collections is entered, edited, and securely stored within a core graph database and resource-file servers for images and other resources. The front end contains data that has been exported from the core database into read-only publication databases, together with a Web API and JavaScript Web apps for viewing and searching the published data.

Back End

On the back end of the OCHRE platform are: (1) the core database, which stores the alphanumeric data of all projects, publications, and collections; (2) resource servers for accessing images, maps, audio, video, etc., whose descriptions and URLs are stored in the core database; (3) a mechanism for retrieving data on demand from external databases that have suitable APIs; and (4) an R server for advanced data analysis and visualization.
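Component (3) in the list above implies that the core database need only store an identifier for an external record, and can fetch the record itself when it is needed. The Python sketch below illustrates that on-demand pattern under stated assumptions. The class `ExternalLookup` and the caller-supplied `fetch` function are hypothetical. A real implementation would call the external database's own API (for example with `urllib.request`) and would also need error handling and cache expiry.

```python
from typing import Any, Callable, Dict


class ExternalLookup:
    """Hypothetical on-demand fetcher for records kept in an external database.

    Only the external identifier is stored locally. The full record is fetched
    the first time it is requested and then served from an in-memory cache.
    """

    def __init__(self, fetch: Callable[[str], Dict[str, Any]]) -> None:
        # `fetch` is supplied by the caller and would wrap the external API call.
        self._fetch = fetch
        self._cache: Dict[str, Dict[str, Any]] = {}

    def get(self, external_id: str) -> Dict[str, Any]:
        # Query the external source only on a cache miss.
        if external_id not in self._cache:
            self._cache[external_id] = self._fetch(external_id)
        return self._cache[external_id]
```

Because the fetch function is injected rather than hard-coded, the lookup can be exercised with a stand-in function and no network access.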

A Java client application manages all the data and provides an intuitive graphical user interface for researchers to acquire, integrate, analyze, and publish their data without having to write any code or enter cryptic commands. Only authorized members of a research or publication project who have been given a user account and password by the project director can view and edit the project’s data on the back end of the platform.
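The access rule described here can be modeled as a small permission check: only members to whom the project director has granted an account may view or edit a project's data. This is a hypothetical sketch, not OCHRE's implementation. The `Project` and `Privilege` names are invented for illustration, and password authentication is omitted.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import Dict


class Privilege(Flag):
    # Hypothetical privilege flags; they can be combined, e.g. VIEW | EDIT.
    NONE = 0
    VIEW = auto()
    EDIT = auto()


@dataclass
class Project:
    """Hypothetical project whose director controls member privileges."""
    name: str
    director: str
    members: Dict[str, Privilege] = field(default_factory=dict)

    def grant(self, granted_by: str, user: str, privileges: Privilege) -> None:
        # Only the project director may add members or change their privileges.
        if granted_by != self.director:
            raise PermissionError(f"{granted_by} is not the director of {self.name}")
        self.members[user] = privileges

    def can(self, user: str, privilege: Privilege) -> bool:
        # The director holds every privilege; anyone else needs an explicit grant.
        if user == self.director:
            return True
        return privilege in self.members.get(user, Privilege.NONE)
```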

Front End

On the front end of the OCHRE platform are read-only publication databases running on a MarkLogic database server. These publication databases contain data that has been exported from the back-end core database in a form that is easy for Web apps to use. The highly atomized and multi-dimensional graph structure of the back-end core database is flattened in a publication database into “denormalized” XML documents with a simpler structure and greater redundancy of information. The authors of the data decide which, if any, of their data is made public in this way.
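The following sketch illustrates what "denormalized" means here. It flattens one node of a small in-memory graph into a self-contained XML document using Python's standard `xml.etree.ElementTree`. The `Item` structure and the element names (`item`, `label`, `related`) are assumptions for illustration only and do not reflect the actual XML schema of OCHRE publication databases.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, TypedDict


class Item(TypedDict):
    # Hypothetical node in a normalized graph: related items are referenced by id only.
    label: str
    category: str
    links: List[str]


def denormalize(items: Dict[str, Item], item_id: str) -> str:
    """Flatten one graph node into a self-contained XML document.

    Instead of storing only the ids of related items, the document copies each
    related item's label and category inline. The same copy appears in every
    document that links to that item, which is the redundancy accepted here.
    """
    item = items[item_id]
    doc = ET.Element("item", id=item_id, category=item["category"])
    ET.SubElement(doc, "label").text = item["label"]
    related = ET.SubElement(doc, "related")
    for target_id in item["links"]:
        target = items[target_id]
        ET.SubElement(related, "item", id=target_id,
                      category=target["category"]).text = target["label"]
    return ET.tostring(doc, encoding="unicode")
```

Because each related item's label is copied into the document, a Web app can display the item without further lookups. The cost is that a changed label must be rewritten in many documents, which is tolerable for read-only data generated by export from the core database.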

Access to a publication database is provided via the OCHRE Web API (application programming interface), which automatically converts the published XML data to JSON for use by Web apps. A JavaScript SDK (software development kit) with pre-built functions is provided to make it easy to fetch and display published data, which can be viewed and searched but not edited.
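The XML-to-JSON conversion performed by the Web API can be pictured with a generic converter like the one below. The mapping convention (attributes as `@name` keys, element text as `#text`, repeated child elements collected into lists) is an assumption borrowed from common XML-to-JSON tools. The JSON actually returned by the OCHRE Web API may be structured differently.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(elem: ET.Element) -> Dict[str, Any]:
    """Convert an XML element to a JSON-ready dict (assumed convention).

    Attributes become "@name" keys, element text becomes "#text", and child
    elements are keyed by tag, collected into a list when a tag repeats.
    Tail text following child elements (mixed content) is ignored here.
    """
    result: Dict[str, Any] = {f"@{k}": v for k, v in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        result["#text"] = text
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # Second or later occurrence of this tag: switch to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})
```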

The CORPUS staff maintain an innovative JavaScript app that automatically generates customized websites for online publications, whose layout and style are specified in the back-end core database. Other developers can use the Web API and JavaScript SDK to write their own apps for viewing published data.
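The principle of a website whose layout and style are stored as data, rather than hard-coded, can be sketched as follows. The CORPUS app itself is written in JavaScript. This Python version is only an illustration, and the specification fields (`title`, `accent_color`, `pages`, `slug`) are hypothetical.

```python
from html import escape
from typing import Dict, List, TypedDict


class PageSpec(TypedDict):
    slug: str    # hypothetical: file name stem for the page
    title: str


class SiteSpec(TypedDict):
    title: str
    accent_color: str   # hypothetical style setting stored with the data
    pages: List[PageSpec]


def render_site(spec: SiteSpec) -> Dict[str, str]:
    """Generate one HTML page per entry in a hypothetical site specification.

    Layout and style come from the specification record, so editing the
    record changes the generated website without changing this code.
    """
    nav = "".join(
        f'<a href="{escape(p["slug"])}.html">{escape(p["title"])}</a>'
        for p in spec["pages"]
    )
    style = f"<style>h1 {{ color: {escape(spec['accent_color'])}; }}</style>"
    site: Dict[str, str] = {}
    for page in spec["pages"]:
        site[f"{page['slug']}.html"] = (
            f"<html><head><title>{escape(spec['title'])}</title>{style}</head>"
            f"<body><nav>{nav}</nav><h1>{escape(page['title'])}</h1></body></html>"
        )
    return site
```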

The back-end user interface makes it easy to acquire data from different sources, in any digital format. Permission to add and modify data is controlled via password-protected user accounts given to project members, who log in with varying privileges for viewing and editing data as specified by the project director. After the data has been acquired, the back-end user interface is used to integrate the data in accordance with the project’s own taxonomy and relational structures, then to analyze the data, and possibly also to publish the data on the Web.

There is no requirement to publish one’s data, but once it has been published, it will remain permanently accessible and citable on the front end of the platform on an open-access basis, even if updated versions of the data are published subsequently (i.e., multiple editions will be kept). Finally, the University of Chicago Library will preserve the data in the core database and in the University of Chicago’s publication database and external resource server. Other institutions may also host their own publication databases and/or resource servers. Data in the OCHRE core database can also be exported to an RDF triplestore, preserving it in a standards-compliant way that is not dependent on the back-end Java application.
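As an illustration of exporting graph data to RDF, the sketch below serializes simple item statements as N-Triples, a line-based W3C RDF format that triplestores can load. The base IRIs under `example.org`, the property names, and the tuple layout are hypothetical. A production export would more likely use an RDF library such as rdflib together with a published vocabulary.

```python
from typing import Iterable, Tuple
from urllib.parse import quote

# Hypothetical base IRIs; OCHRE's actual RDF vocabulary is not reproduced here.
ITEM_BASE = "https://example.org/item/"
PROP_BASE = "https://example.org/property/"


def escape_literal(value: str) -> str:
    """Apply the N-Triples escapes for backslash, double quote, and line breaks."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(statements: Iterable[Tuple[str, str, str, bool]]) -> str:
    """Serialize (subject_id, property, value, value_is_item) tuples as N-Triples.

    If value_is_item is True, the object is another item's IRI (a graph edge);
    otherwise it is a plain string literal. Ids are percent-encoded so that
    they are safe inside an IRI.
    """
    lines = []
    for subject, prop, value, is_item in statements:
        subj = f"<{ITEM_BASE}{quote(subject, safe='')}>"
        pred = f"<{PROP_BASE}{quote(prop, safe='')}>"
        obj = (f"<{ITEM_BASE}{quote(value, safe='')}>" if is_item
               else f'"{escape_literal(value)}"')
        lines.append(f"{subj} {pred} {obj} .")
    return "\n".join(lines) + "\n"
```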

A Comprehensive View of the Data

OCHRE is a comprehensive computational platform for all stages of a research project, supporting the acquisition, integration, analysis, publication, and preservation of research data. The various components of the platform interact seamlessly to make it easy to work with the data and move it from one stage to the next.

The alternative is to employ an ad hoc collection of separate software tools for data management, statistics, image processing, geospatial mapping, online publication, and so on. But that approach requires cumbersome transfers of data from one piece of software to another using intermediate file formats and ad hoc scripts or cryptic command-line instructions. The result is a series of time-consuming and error-prone tasks in which it is easy to lose track of the many pieces of information accumulated in a typical project.

By contrast, OCHRE users have a comprehensive view of all their data in all stages of the project and an intuitive user interface with which to view, edit, analyze, and publish the data, without having to code their own scripts or manually transfer data from one piece of software to another. This comprehensive view of any and all kinds of data via a single piece of software is possible because the core database on the back end of the OCHRE platform implements a graph of knowledge that conforms to a foundational ontology which is universal in scope.
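One benefit of a single foundational ontology is that generic operations can be written once and applied to every kind of data. In this minimal sketch, every item is typed by one of a small set of upper-level classes, so the same `related` query works for places, periods, people, or images. The class names and the `KnowledgeGraph` structure are illustrative assumptions, not OCHRE's actual ontological classes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class UpperClass(Enum):
    # Illustrative upper-level classes only; not OCHRE's actual class list.
    SPATIAL = "spatial entity"
    TEMPORAL = "period"
    AGENT = "person or organization"
    CONCEPT = "concept"
    RESOURCE = "resource"


@dataclass
class KnowledgeGraph:
    """Hypothetical graph in which every item is typed by a shared upper-level class."""
    classes: Dict[str, UpperClass] = field(default_factory=dict)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)  # (source, relation, target)

    def add_item(self, item_id: str, cls: UpperClass) -> None:
        self.classes[item_id] = cls

    def link(self, source: str, relation: str, target: str) -> None:
        # Both endpoints must already be typed by the shared ontology.
        if source not in self.classes or target not in self.classes:
            raise KeyError("both items must be added before linking")
        self.edges.add((source, relation, target))

    def related(self, item_id: str, cls: UpperClass) -> List[str]:
        """Items of a given upper-level class linked from item_id, whatever the relation."""
        return sorted(t for s, _, t in self.edges
                      if s == item_id and self.classes[t] is cls)
```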

A Scalable and Sustainable Platform

The universal scope of the OCHRE ontology implemented in the core database makes it possible to accommodate any number of project-specific or domain-specific ontologies while faithfully preserving each project’s own terminology. This database runs on a high-performance database management system that provides extensive indexing and fast querying of the data using the XQuery query language. The OCHRE platform is therefore highly scalable and able to accommodate any number of projects, publications, and collections.
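The sketch below shows one way project-specific terminology can coexist with shared classes, and why indexing helps query speed. Each item is added to an index under its shared class when it is written, so a cross-project query reads the index instead of scanning every item. The `ProjectVocabularies` class, the example terms, and the dictionary-based index are hypothetical. The OCHRE core database is queried with XQuery and relies on the indexes of its database management system, not on this code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ProjectVocabularies:
    """Hypothetical registry of project-specific terms mapped onto shared classes.

    Each project keeps its own labels (e.g. "locus" vs. "context"); the mapping
    to a shared class is stored alongside them rather than replacing them.
    """

    def __init__(self) -> None:
        self._term_class: Dict[Tuple[str, str], str] = {}   # (project, term) -> shared class
        self._item_term: Dict[Tuple[str, str], str] = {}    # (project, item) -> project term
        self._index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)  # class -> items

    def define_term(self, project: str, term: str, shared_class: str) -> None:
        self._term_class[(project, term)] = shared_class

    def add_item(self, project: str, item_id: str, term: str) -> None:
        # Raises KeyError if the project has not defined this term.
        shared_class = self._term_class[(project, term)]
        self._item_term[(project, item_id)] = term
        # The index is maintained at write time so that reads need no full scan.
        self._index[shared_class].add((project, item_id))

    def find(self, shared_class: str) -> List[Tuple[str, str, str]]:
        """All items of a shared class across projects, each with its project's own term."""
        hits = self._index.get(shared_class, set())
        return sorted((p, i, self._item_term[(p, i)]) for p, i in hits)
```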

The platform is not only scalable but also sustainable over the long term. It is institutionally maintained by the CORPUS staff in close collaboration with the University of Chicago Library, which provides system administration, servers, and data storage for the core database, publication database, and binary resources (e.g., images, audio, and video).

Other institutions may choose to host their own publication databases and resource servers. The core database can also be replicated in multiple locations. Alternatively, researchers from other institutions are welcome to store their data at the University of Chicago, with the caveat that an additional fee may be required for storing more than one terabyte of data.
