OCHRE Analytics

Querying, analyzing, and visualizing data while tracking data provenance and ensuring reproducibility

Querying the Database

The user interface for the core database on the back end of the OCHRE platform provides an intuitive mechanism for building and executing powerful queries without requiring any coding. Researchers can construct queries with complex search criteria that consider both the intrinsic properties of database items and their extrinsic relations to other items. A query’s criteria and scope can be saved for repeated use in a Query item in the core database (see the discussion of Query items in the document on “Ontological Classes of OCHRE Database Items”).

A database query can use a thesaurus for automatic query expansion, retrieving items whose descriptive terms (variables and values) semantically match the taxonomic terms used in the query criteria. This enables semantic integration across projects and is discussed in more detail on the OCHRE Integration page of this website.
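The idea of thesaurus-based query expansion can be illustrated with a minimal Python sketch. This is not OCHRE's implementation: the thesaurus contents, the item records, and the function names `expand_terms` and `run_query` are all hypothetical and serve only to show how a query on one term can retrieve items described with semantically matching terms.

```python
# Hypothetical thesaurus: maps project-specific terms to a shared concept.
THESAURUS = {
    "pottery": "ceramic",
    "ceramics": "ceramic",
    "sherd": "ceramic",
    "bone": "faunal remains",
}


def expand_terms(query_terms, thesaurus):
    """Return the query terms plus every thesaurus term sharing their concept."""
    # A term missing from the thesaurus stands for itself.
    concepts = {thesaurus.get(term, term) for term in query_terms}
    expanded = set(query_terms)
    for term, concept in thesaurus.items():
        if concept in concepts:
            expanded.add(term)
    return expanded


def run_query(items, variable, query_terms, thesaurus):
    """Return the items whose value for `variable` matches an expanded term."""
    terms = expand_terms(query_terms, thesaurus)
    return [item for item in items if item.get(variable) in terms]


# Hypothetical items from two projects that use different vocabularies.
ITEMS = [
    {"id": "A1", "material": "pottery"},
    {"id": "B7", "material": "sherd"},
    {"id": "C3", "material": "bone"},
]

matches = run_query(ITEMS, "material", ["ceramics"], THESAURUS)
print([m["id"] for m in matches])  # prints ['A1', 'B7']
```

A query for "ceramics" retrieves both items here even though neither uses that exact term, because all three terms map to the same concept in the sketch's thesaurus.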

The results of a query can be saved. The list of database items retrieved by executing a query can be stored in the core database as a Set item that can be named, annotated, and time-stamped. The database items in a set, whether it is constructed manually or generated automatically by a query, may be displayed in a tabular view (data frame) and subjected to statistical methods for analyzing and visualizing the data, or displayed in a map view or network-graph view.

In addition to the basic statistical and plotting methods built into the back-end user interface of the OCHRE platform, there is the option to send data sets to an external R server for more advanced analysis and visualization, as discussed below.

Geospatial Mapping

Database items can be visualized in geospatial map views using OCHRE’s GIS capabilities, which employ ArcGIS Online and the ArcGIS Maps SDK for mapping and spatial analysis. A project can specify its own symbology so that database items (e.g., Spatial items representing buildings or artifacts) are displayed in map views using different colors, styles, and icons based on the item’s properties. OCHRE map views are interactive; clicking on a map feature will pop up a window to display information about the database item being shown in that location on the map.

Network Analysis

OCHRE database items can also be displayed in network-graph diagrams to visualize the relationships among entities belonging to any item class. Each node in a network corresponds to a database item and is “live” in the user interface, so that clicking on the node will display information about that item. Nodes can be displayed using different colors and styles depending on the class or properties of the items they represent.

OCHRE can calculate standard graph metrics such as distance and centrality to identify clusters and hubs in a network. This is useful for social network analysis, for example, when researchers want to analyze the relationships among Agent items that represent individual persons, collective social groups, or organizations (see the discussion of Agent items in the document on “Ontological Classes of OCHRE Database Items”).
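To make the two metrics concrete, the following Python sketch computes shortest-path distance (by breadth-first search) and degree centrality on a small undirected graph. It is only an illustration of the general technique; the graph, the agent names, and the function names are hypothetical, and nothing here describes how OCHRE itself computes these values.

```python
from collections import deque


def shortest_distances(graph, source):
    """Breadth-first search: number of links from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def degree_centrality(graph):
    """Fraction of the other nodes to which each node is directly linked."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical undirected network of Agent items (adjacency sets).
GRAPH = {
    "Agent A": {"Agent B", "Agent C", "Agent D"},
    "Agent B": {"Agent A"},
    "Agent C": {"Agent A"},
    "Agent D": {"Agent A", "Agent E"},
    "Agent E": {"Agent D"},
}

distances = shortest_distances(GRAPH, "Agent B")
centrality = degree_centrality(GRAPH)
hub = max(centrality, key=centrality.get)

print(distances["Agent E"])  # prints 3
print(hub, centrality[hub])  # prints Agent A 0.75
```

In this sketch the node with the highest degree centrality is treated as the hub; other centrality measures (betweenness, closeness) would require different calculations.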

Data Analysis and Visualization Using R

In addition to these built-in features, OCHRE can interact with an external R server to do more advanced analysis and visualization. This feature is currently under development.

R is a widely used programming language for statistical computing and data visualization. OCHRE query results can be formatted as R data frames and sent to the R server together with R commands that will execute code on the R server to perform the desired analytical procedures. The numerical and graphical results of the analysis are then sent back from the R server to OCHRE, where the user can save them in the database as named and time-stamped Resource items for later use, e.g., for publication on the Web.

In addition to the built-in R functions, there are many pre-written R packages available to perform a wide variety of procedures, ranging from simple univariate and bivariate statistics to complex multivariate statistics, as well as specialized kinds of data analysis, such as natural language processing (NLP), social network analysis (SNA), spatial analysis, and machine learning. R packages can make use of code libraries written in other languages such as FORTRAN, C/C++, Java, or Python. Thus, R also provides a mechanism for running Python code (e.g., the NumPy and SciPy libraries) if a project wishes to do so.

Users who know R can enter commands directly into a data-aware R console in the back-end user interface of the OCHRE platform. They can save the R commands they have entered for repeated use. Commands in the console allow them to submit data to the R server from external CSV and Excel (XLSX) files or from dynamic OCHRE queries. Outputs from the R server are then displayed to the user in a separate window. These outputs can be named and saved in the database as Resource items.

In addition to (or instead of) entering commands in the R console, a project can use JSON to script multi-step analytical workflow jobs that (1) perform OCHRE queries, (2) execute R functions to analyze the query results, and (3) specify the outputs to be returned from the R server (PDFs, images, etc.). These workflows can be named and saved by a project for use by people who do not know R or do not want to write their own scripts.
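The three-part structure of such a workflow can be sketched as follows. OCHRE's actual JSON schema is not given in this document, so every key name below ("name", "steps", "type", "query", "function", "input", "format") and the query name are hypothetical; the Python code simply parses the JSON and checks that the steps follow the query, analysis, output order described above.

```python
import json

# Hypothetical workflow script; the key names are NOT OCHRE's real schema.
WORKFLOW_JSON = """
{
  "name": "Sherd weight summary",
  "steps": [
    {"type": "query", "query": "Sherds by locus", "as": "sherds"},
    {"type": "r", "function": "summary", "input": "sherds"},
    {"type": "output", "format": "pdf"}
  ]
}
"""

# Steps must appear in this order: (1) query, (2) R analysis, (3) output.
STEP_ORDER = ("query", "r", "output")


def load_workflow(text):
    """Parse a workflow script and check that its steps are known and ordered."""
    workflow = json.loads(text)
    positions = []
    for step in workflow["steps"]:
        if step["type"] not in STEP_ORDER:
            raise ValueError(f"unknown step type: {step['type']!r}")
        positions.append(STEP_ORDER.index(step["type"]))
    if positions != sorted(positions):
        raise ValueError("steps must run in query, r, output order")
    return workflow


def describe(workflow):
    """Return one human-readable line per step."""
    lines = []
    for number, step in enumerate(workflow["steps"], start=1):
        details = ", ".join(f"{k}={v}" for k, v in step.items() if k != "type")
        lines.append(f"{number}. {step['type']}: {details}")
    return lines


workflow = load_workflow(WORKFLOW_JSON)
for line in describe(workflow):
    print(line)
```

Keeping the workflow as declarative JSON, rather than as R code, is what would let a saved script be rerun by people who do not know R.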

When a workflow script is executed, the user is prompted to specify any external files to be used in the analysis and to supply run-time arguments to pass to the parameters of the chosen queries and R functions, in order to customize them for the current job. The progress of the job is echoed in the R console window. Scripted workflow jobs can be chained, such that the output of one job is the in-memory input (data frame) for the next. Both the workflow scripts and the outputs can be named and saved in the database as Resource items for repeated use.
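The combination of run-time arguments and job chaining can be illustrated with a small Python sketch in which a list of dictionaries stands in for a data frame. The job functions, the argument names, and the sample rows are hypothetical; the sketch shows only the general pattern of passing each job's in-memory output to the next job.

```python
def filter_by_min_weight(rows, min_weight):
    """Job 1: keep only the rows at or above a weight supplied at run time."""
    return [row for row in rows if row["weight_g"] >= min_weight]


def mean_weight(rows):
    """Job 2: reduce the filtered rows to a single summary value."""
    if not rows:
        return None
    return sum(row["weight_g"] for row in rows) / len(rows)


def run_chain(jobs, data, runtime_args):
    """Run the jobs in order, feeding each job's output to the next one.

    `runtime_args` maps a job's function name to the keyword arguments
    the user supplied for that job when the chain was started.
    """
    for job in jobs:
        kwargs = runtime_args.get(job.__name__, {})
        print(f"running {job.__name__} with {kwargs}")  # progress echo
        data = job(data, **kwargs)
    return data


# Hypothetical query result standing in for a data frame.
ROWS = [
    {"id": "S1", "weight_g": 12.0},
    {"id": "S2", "weight_g": 30.0},
    {"id": "S3", "weight_g": 18.0},
]

result = run_chain(
    [filter_by_min_weight, mean_weight],
    ROWS,
    {"filter_by_min_weight": {"min_weight": 15.0}},
)
print(result)  # prints 24.0
```

Changing only the run-time argument (for example, `min_weight`) customizes the chain for a new job without editing the saved steps.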

Data Provenance and Reproducibility

Tracking the provenance of data in a granular way and reproducing the results of queries and statistical analyses are important considerations in research data management. OCHRE provides both capabilities.

Data can be imported or edited only by people who have a password-protected user account for accessing the back-end user interface (see the platform diagram below). Project directors specify which users can see their data. Different view/edit/delete privileges can be assigned to each user based on the item category. User accounts are linked to Agent items, so the users are themselves represented by database items that can be linked to other items to indicate that they are the creators of those items. Because many users can edit data simultaneously, records are locked automatically at the item level: if an item is being edited by another user, a “try again later” message is displayed. Thus, data cannot be unknowingly overwritten in cases of contention.
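Item-level locking of this kind can be sketched in a few lines of Python. This is an in-memory illustration under simple assumptions, not OCHRE's mechanism: the class `ItemLocks`, its methods, and the user and item names are hypothetical.

```python
import threading


class ItemLocks:
    """Tracks which user, if any, currently holds each item for editing."""

    def __init__(self):
        self._holders = {}              # item id -> user name
        self._guard = threading.Lock()  # protects the dictionary itself

    def acquire(self, item_id, user):
        """Return True if `user` now holds the item, False if someone else does."""
        with self._guard:
            holder = self._holders.setdefault(item_id, user)
            return holder == user

    def release(self, item_id, user):
        """Free the item, but only if `user` is the one holding it."""
        with self._guard:
            if self._holders.get(item_id) == user:
                del self._holders[item_id]


def start_edit(locks, item_id, user):
    """Return the message an editor would see when opening an item."""
    if locks.acquire(item_id, user):
        return f"{user} is editing {item_id}"
    return f"{item_id} is being edited by another user; try again later"


locks = ItemLocks()
print(start_edit(locks, "item-42", "alice"))  # alice gets the lock
print(start_edit(locks, "item-42", "bob"))    # bob is told to try again later
print(start_edit(locks, "item-99", "bob"))    # a different item is unaffected
locks.release("item-42", "alice")
print(start_edit(locks, "item-42", "bob"))    # bob can edit after the release
```

Locking per item rather than per table or per project means that contention arises only when two users open the very same item.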

Each item in the database is attributed to its creator via a link to an Agent item (see the document on “Ontological Classes of OCHRE Database Items”). If a creator is not specified, the research project as a whole (represented by a Project item) is considered to be the creator. Spatial items (i.e., database items that represent spatially situated units of observation) can record multiple observations made by different agents at different times. Each observation has its own set of variable-value properties and is stamped with the date and time of the observation.
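The attribution and observation rules in this paragraph can be modeled with a short Python sketch. The class and field names below are hypothetical and do not reflect OCHRE's internal data model; the sketch shows only the fallback from creator to project and the time-stamping of each observation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Observation:
    observer: str          # the agent who made the observation
    observed_at: datetime  # date and time of the observation
    properties: dict       # variable -> value pairs for this observation


@dataclass
class SpatialItem:
    name: str
    project: str
    creator: Optional[str] = None
    observations: list = field(default_factory=list)

    @property
    def attributed_to(self):
        """The creator if one is specified, otherwise the project as a whole."""
        return self.creator or self.project

    def record(self, observer, properties, when=None):
        """Append an observation, stamped with `when` or the current UTC time."""
        stamp = when or datetime.now(timezone.utc)
        self.observations.append(Observation(observer, stamp, dict(properties)))

    def latest(self):
        """Return the most recent observation, or None if there are none."""
        return max(self.observations, key=lambda o: o.observed_at, default=None)


# Hypothetical item with no explicit creator and two observers.
locus = SpatialItem(name="Locus 12", project="Example Project")
locus.record("Agent A", {"soil color": "brown"},
             when=datetime(2023, 7, 1, 9, 30, tzinfo=timezone.utc))
locus.record("Agent B", {"soil color": "dark brown"},
             when=datetime(2023, 7, 2, 14, 0, tzinfo=timezone.utc))

print(locus.attributed_to)                # prints Example Project
print(locus.latest().observer)            # prints Agent B
print(len(locus.observations))            # prints 2
```

Because each observation carries its own properties and time-stamp, later observations add to the record rather than overwriting earlier ones.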

A database query can be stored in the database as a Query item, which contains the query’s search criteria and scope and is attributed to the user who created it or to the project as a whole. Query criteria, which can be quite complex, are entered using intuitive drop-down menus via the back-end user interface. This automatically generates XQuery code (not seen by the user) to execute the query and return a set of database items that match the query criteria. The results of each execution of a query can be saved in the database in a Set item with a name, description, and time-stamp. Analytical workflows to analyze and visualize query results can also be saved (this feature is still under development).

To sum up: the users who can edit data, every edit they make, the query criteria they devise, the results of queries, and the analyses performed on query results are all saved in the database. Thus, OCHRE not only tracks all changes to the data but can also track and reproduce what is done with the data after it has been entered.
