ICAT Server User Manual

Introduction

The ICAT server is a layer on top of a relational DBMS. The database is wrapped both as a SOAP web service and as a RESTful web service. Originally ICAT only exposed a SOAP interface but now REST interfaces are more common than SOAP. The intention is not to develop the SOAP interface further however there is no current timescale to stop supporting it.

SOAP interface

For the SOAP web service each table in the database is mapped onto a data structure exposed by the web service. When the web service interface definition (WSDL) is processed by a client then it has sufficient information to allow an object to be built on the client side matching rows of each table in the database. The convenience of using the SOAP interface depends critically upon the level of support provided by the libraries available.

In Java support is good and tools such as wsimport can be used to generate client stubs to make it very easy to use and to provide type safety. Java client libraries have been - see icat.client.

In the case of Python, Suds (or preferably the jurko fork of suds) can be used. This uses the wsdl to allow factory methods to create the objects in the python client corresponding to the rows of the database tables. In addition there is the python-icat library which s a Python package that provides a collection of modules for writing programs that access an ICAT service using the SOAP interface. It is based on Suds and extends it with ICAT specific features and also provides a level of protection from server version dependency.

SOAP is basically RPC and the client must match the server. The protocol is XML based and so is rather verbose and because you can only transfer predefined objects you cannot issue a query which returns a projection of a join as you would expect to be able to do in an RDBMS. This means that more data are transferred than needed. This means that you cannot get back a table of Facility names and Investigation names for all investigations with a DOI attached. Instead you would have to get all the matching Investigations along with the the related Facility. Not only would you get back all the columns of the Investigation and Facility tables that you don't want but all the Facility information would typically be sent many times - each time it is referenced by an Investigation. In addition the WSDL (though generated automatically from the Java code of the server) contains no code derived documentation. The calls are listed in the SOAP interface.

RESTful interface

This service acts upon HTTP methods (GET, PUT, DELETE etc.) and processes JSON input and returns JSON. The client programmer gets no help in producing these JSON messages to represent sets of database rows to send to the server nor in dealing with the JSON which is returned. Some help is provided by the rest part of the icat.client with those calls that don't pass sets of entities around. Otherwise for manipulating JSON from Java the javax.json package is recommended and from Python the json package is recommended. The javax.json package provides streaming and non-streaming libraries and the Python json library allows for conversion between JSON and Python object trees made up of lists and dictionaries.The calls are listed in the REST interface.

Session Management

The icat server is configured with a number of authenticator plugins. To login to a server requires that you identify the plugin and pass it suitable credentials. These credentials are sent as key value pairs. Mostly the keywords are username and password but this is not necessarily the case and in particular the anonymous authenticator requires no credentials.

A session will time out after a server determined period of time which reduces the risk of losing a sessionId. There are calls to logout, to refresh a session and to find out how much time is left and to find the username associated with a session.

The schema

To understand exactly how the data manipulation calls work requires an understanding of the schema. Please take a look now to make sense of the following explanation.

Each table in the database, representing a set of entities, is mapped onto a class in the API so terminology mixes OO and database concepts. Each class has uniqueness constraints, relationships and other fields. Each object is identified by a field "id" which is managed by ICAT and is returned when you create an object. This is common to all objects and is not described in the schema. The "id" field is used as the primary key in the database. There will normally be some combinations of fields, some of which may be relationships, which must be unique across all entries in the table. This is marked as "Uniqueness constraint". For Dataset this is investigation, name where investigation represents a relationship. No more than one one Dataset may exist with those two fields having the same value. These constraints are enforced by ICAT.

The relationship table is shown next. The first column shows the minimum and maximum cardinality of the relationships. A Dataset may be related to any number of DataCollectionDatasets, to at most one Sample and to exactly one DatasetType. The next column shows the name of the related class and this is followed by the name of the field which is used to represent the relationship. The basic field name is normally the name of the related class unless it is ambiguous or unnecessarily long. The field name is in the plural for "to many" relationships. The next column, "cascaded", is marked yes to show that create and delete operations are cascaded. For simplicity all "to many" relatsionships are cascaded and all to one" relationships are not so the "cascaded" column is not really needed. If a Dataset is deleted then all its DataCollectionDatasets, DatasetParameters and Datafiles are deleted at the same time by one call to ICAT. In a similar manner a tree, created in memory with a Dataset having a a set of Datafiles and Datasetparameters, can be persisted to ICAT in a single call.

Note that:

  • relationships are all "one to many" in one direction and "many to one" in the opposite direction
  • all "one to many" relationships are cascaded but no "many to one" relationships
  • there are no other kinds of relationship in the ICAT model
  • all relationships are navigable in both directions

In addition each entity has four special attributes: createId, createTime, modId and modTime which are maintained by ICAT though they may be queried. For each each entity they record who created it and when, and who modified it and when.

Logging

The server can be configured to send PubSub JMS messages to allow for monitoring its behaviour. It will send one message for each enabled successful call. The message has a text body and, in addition to the normal JMS headers has properties of:
operation
the name of the call
ip
the ip address of the client machine making the request
millis
the duration of the call in ms
start
the time of the start of the call im ms since the epoch
The text of the body contains nothing that might reveal private information. The properties make it easy to write a consumer that only selects the data it is interested in. The logging is enabled by groups of operations. Calls are classified as one of READ, WRITE, SESSION or INFO.

Notifications

Notifications may also be sent when an entity is created or updated. The JMS message is always PubSub rather than point to point. This means that there can be multiple listeners for a message. The message is an object message holding the id of the entity that has been created or updated. In addition to the normal JMS headers it has properties of:
entity
the name of the type of entity
operation
C for create or U for update

This mechanism does not leak information because all the user receives is an entity id. To read the entity with that id the user must have read access to that entity instance.

Notifications must be requested for individual entity types and for each one whether you want to know about creates, updates or both. As this is a server setting if you have users wanting to receive notifications of different entity types you might take the OR of the requests. If you are writing an automated job processing system then you may be interested to watch for creations or updates of entities of type Dataset, Datafile and Job.

There is an example MDB available which should make it easy to understand how to use the messages.