ICAT SOAP Manual

Conventions

The behaviour of the web service is generally described in terms of calls on a Java client. Most examples are in Java, with Python equivalents given where they differ sufficiently.

Setting up

The web service is accessed via a proxy (conventionally known in Java as a port). The proxy (here given a variable name of icat) may be obtained by the following:

URL hostUrl = new URL("https://<hostname>:8181");
URL icatUrl = new URL(hostUrl, "ICATService/ICAT?wsdl");
QName qName = new QName("http://icatproject.org", "ICATService");
ICATService service = new ICATService(icatUrl, qName);
ICAT icat = service.getICATPort();

where <hostname> should be the full name of the ICAT server. For a secure installation, just specifying localhost will not work: the name must match what is on the host certificate.

The corresponding Python suds code is:

from suds.client import Client
url = "https://<hostname>:8181/ICATService/ICAT?wsdl"  # WSDL location, as in the Java example
client = Client(url)
icat = client.service
factory = client.factory

Methods may then be invoked on the icat object and entities created using the factory.
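The WSDL location passed to the suds Client can be assembled in the same way as in the Java example. The following is a minimal sketch: the helper name wsdl_url is hypothetical, and the default port 8181 and the path ICATService/ICAT?wsdl are simply taken from the Java code above and may differ on your installation.

```python
def wsdl_url(hostname, port=8181):
    """Build the WSDL URL for an ICAT server.

    The port and path are assumptions copied from the Java example;
    hostname must match the name on the host certificate.
    """
    return "https://{0}:{1}/ICATService/ICAT?wsdl".format(hostname, port)
```

The result would then be passed to Client, for example Client(wsdl_url("icat.example.org")), where the host name is a placeholder.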

Session management

When you log in to ICAT you will be given back a string, the sessionId, which must be used as the first argument of almost all ICAT calls. The only exceptions are the login call itself, getEntityInfo and getApiVersion, none of which requires authentication.

String login(String plugin, Credentials credentials)

where the plugin is the mnemonic defined in the ICAT installation for the authentication plugin you wish to use and credentials is essentially a map. The names of the keys and their meanings are defined by the plugin.

The sessionId returned will be valid for a period determined by the ICAT server.

The example below shows how it works for the authn_db plugin at the time of writing, where the plugin has been given the mnemonic "db".

Credentials credentials = new Credentials();
List<Entry> entries = credentials.getEntry();
Entry e;

e = new Entry();
e.setKey("username");
e.setValue("root");
entries.add(e);
e = new Entry();
e.setKey("password");
e.setValue("secret");
entries.add(e);

String sessionId = icat.login("db", credentials);

The corresponding Python code, using the factory and icat service defined above is:

credentials = factory.create("login.credentials")

entry = factory.create("login.credentials.entry")
entry.key = "username"
entry.value = "root"
credentials.entry.append(entry)
entry = factory.create("login.credentials.entry")
entry.key = "password"
entry.value = "secret"
credentials.entry.append(entry)

sessionId = icat.login("db", credentials)
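Because the credentials are essentially a map, the repetitive entry construction can be wrapped in a small function. This is only a sketch: make_credentials is a hypothetical helper, and it assumes that factory is the suds client.factory obtained earlier and that the type names "login.credentials" and "login.credentials.entry" are as shown above.

```python
def make_credentials(factory, pairs):
    """Build a login credentials object from (key, value) pairs.

    `factory` is assumed to be the suds client.factory shown earlier.
    The key names themselves are defined by the authentication plugin.
    """
    credentials = factory.create("login.credentials")
    for key, value in pairs:
        entry = factory.create("login.credentials.entry")
        entry.key = key
        entry.value = value
        credentials.entry.append(entry)
    return credentials
```

With this in place the login above could be written as icat.login("db", make_credentials(factory, [("username", "root"), ("password", "secret")])).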

double getRemainingMinutes(String sessionId)

This returns the number of minutes left in the session. A user may have more than one session at once.

String getUserName(String sessionId)

This returns the string identifying the user of the session as provided by the authentication plugin.

void refresh(String sessionId)

This resets the time-to-live of the session to the value it had when the session was first obtained.

void logout(String sessionId)

This invalidates the sessionId.
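A long-running client might combine getRemainingMinutes and refresh to keep a session alive. The sketch below is one possible approach rather than a recommended pattern: keep_alive is a hypothetical helper and the 10 minute threshold is an arbitrary assumption.

```python
def keep_alive(icat, session_id, min_minutes=10.0):
    """Refresh the session if fewer than `min_minutes` remain.

    Returns True if refresh was called, False otherwise.
    The default threshold is an arbitrary choice for illustration.
    """
    if icat.getRemainingMinutes(session_id) < min_minutes:
        icat.refresh(session_id)
        return True
    return False
```

If the session has already expired, the call to getRemainingMinutes would be expected to fail with a SESSION exception (see the next section), in which case a fresh login is needed.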

Exceptions

ICAT throws only one kind of exception. In Java this is the IcatException_Exception, a wrapper around the real exception, which in turn carries an enumerated code identifying the kind of exception and the usual message. The codes and their meanings are:

BAD_PARAMETER
generally indicates a problem with the arguments passed to a call.
INTERNAL
may be caused by network problems, database problems, glassfish problems or bugs in ICAT.
INSUFFICIENT_PRIVILEGES
indicates that the authorization rules have not matched your request.
NO_SUCH_OBJECT_FOUND
is thrown when something is not found.
OBJECT_ALREADY_EXISTS
is thrown when you try to create something but there is already one with the same values of the constraint fields.
SESSION
is used when the sessionId you have passed into a call is not valid or if you are unable to authenticate.
VALIDATION
marks an exception which was thrown instead of placing the database in an invalid state.

For example to print what has happened you might use the following (in Java):

String sessionId;
try {
    sessionId = icat.login("db", credentials);
} catch (IcatException_Exception e) {
    IcatException ue = e.getFaultInfo();
    System.out.println("IcatException " + ue.getType() + " " + ue.getMessage()
    + (ue.getOffset() >= 0 ? " at offset " + ue.getOffset() : ""));
}

The corresponding Python code looks like:

from suds import WebFault

try:
    sessionId = icat.login("db", credentials)
except WebFault as e:
    etype = e.fault.detail.IcatException.type
    message = e.fault.detail.IcatException.message
    offset = int(e.fault.detail.IcatException.offset)
    # The conditional is parenthesised so that only the offset text is optional
    print("IcatException " + etype + " " + message + (" at offset " + str(offset) if offset >= 0 else ""))

Operations which work on a list of objects, such as createMany, may fail because of failure to process one of the objects. In this case the state of the database will be rolled back and the offset within the list of the entry causing the error will be stored in the IcatException. For other calls the offset will be negative, as it is with certain internal exceptions which are not associated with any specific object in a list.
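The handling of the offset can be isolated in a small function so that it is reported only when it is meaningful. This is a minimal sketch; describe_fault is a hypothetical helper that takes the three values extracted from the exception as shown above.

```python
def describe_fault(etype, message, offset):
    """Format the fields of an IcatException for display.

    A negative offset means the error is not tied to a list entry,
    so the offset is only reported when it is zero or greater.
    """
    text = "IcatException {0} {1}".format(etype, message)
    if offset >= 0:
        text += " at offset {0}".format(offset)
    return text
```

For a failing createMany call the reported offset identifies the position in the submitted list of the entry that caused the error.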

Data Manipulation

Creating an Object

long create(String sessionId, EntityBaseBean bean)

To create an object in ICAT, first instantiate the object of interest, for example a Dataset, and then call the setters to set its attributes and finally make a call to create the object in ICAT.

So typical code in Java might look like:

Dataset ds = new Dataset();
ds.setName("Name of dataset");
ds.set ...
Long dsid = icat.create(sessionId, ds);

You will see that no convenient constructors are generated, rather each field of the object must be set individually.

The corresponding code in Python is shown below. Note that factory.create takes the lower case name "dataset" rather than "Dataset".

ds = factory.create("dataset")
ds.name = "Name of dataset"
ds. ...
dsid = icat.create(sessionId, ds)

Most fields are optional and may be left with null values; however some are compulsory and the call to create will fail if they are not set. Each object has a primary key that identifies it in the database - this is a value of type "long" that is generated by ICAT and is used to represent relationships in a regular manner.

Some fields represent attributes of the object but others are used to represent relationships. The relationships are represented in the class definitions by a variable which either holds a reference to a single object or a list of objects. In the case of a list it may be "cascaded". Consider creating a dataset with a set of datafiles. Because the relationship from dataset to datafile is cascaded they may be created in one call as outlined by the Java code below:

Dataset ds = new Dataset();
ds.setName(dsName);
ds.setType(type);
Datafile datafile = new Datafile();
datafile.setDatafileFormat(format);
datafile.setName(dfName);
ds.getDatafiles().add(datafile); // Add the datafile to the dataset
icat.create(sessionId, ds);

The call to create returns the key of the created object. If you choose to write the Java code

ds.setId(icat.create(sessionId, ds));

then the client copy of the Dataset will be updated to have the correct key value - however the keys in any other objects "within" the Dataset will still be null on the client side. In this case datafile.getId() will remain null.

It might help to understand what is happening when you call create. The client side object you pass in, and everything it refers to, is encoded as XML and sent to the server. There it is unpacked into the same set of objects and persisted in the database. The structure passed in must be a tree structure (or more correctly a DAG). If, for example, you modify the code above to create a dataset with one datafile and add datafile.setDataset(ds), attempting to put in the reverse link which will physically appear in the ICAT database, the call will be rejected by the client because you have a loop in the graph: you can then go backwards and forwards between datafile and dataset. If you pass in a proper DAG the database will create one row in the dataset table and one in the datafile table, where the datafile entry includes the field datafile.dataset_id holding the dataset.id of the dataset entry just created. Relationships are represented in the database by holding the id value of the related object.
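The DAG requirement can be illustrated with a generic cycle check over an object graph. The sketch below is not how the client library detects loops; has_cycle is a hypothetical function and `related` is an assumed callback that returns the objects directly referenced by a given object.

```python
def has_cycle(root, related):
    """Return True if following `related(obj)` from root leads back
    to an object that is already on the current path.

    Shared objects reached by two different routes are allowed,
    which is what distinguishes a DAG from a tree.
    """
    on_path = set()

    def visit(obj):
        if id(obj) in on_path:
            return True  # we came back to an ancestor: a loop
        on_path.add(id(obj))
        found = any(visit(child) for child in related(obj))
        on_path.discard(id(obj))
        return found

    return visit(root)
```

A dataset holding a datafile gives no cycle, but once the datafile also refers back to the dataset the function returns True, which corresponds to the case that is rejected.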

We now have a Java example of adding a datafile to an existing dataset, ds:

Datafile datafile = new Datafile();
datafile.setDatafileFormat(format);
datafile.setName(name);
datafile.setDataset(ds); // Relate the datafile to an existing dataset
datafile.setId(icat.create(sessionId, datafile)); // Create datafile and store id on client side

This is the only way to create a datafile and add it to an existing dataset - you cannot do it by any operation upon the dataset. A datafile cannot be created without a dataset. After the call is made a new entry will have been added to the datafile table in the database, where again the field datafile.dataset_id will have been set to the dataset.id of the related dataset. In this case that may be more intuitive, as it corresponds to the line of user code datafile.setDataset(ds).

It is worth noting that the dataset object you have in memory is not affected by the call to create a new datafile. More datafiles can be created referencing the same dataset without retrieving an updated copy of the dataset.

List <Long> createMany(String sessionId, List <EntityBaseBean> beans)

This call, as its name suggests, creates many objects. It takes the list of objects to create and returns a list of ids. If any of the individual operations fail the whole call fails and the database will be unchanged. The objects to be created need not be of the same type. For an example (where they are of the same type) consider adding many Datafiles to an existing Dataset, ds, in Java:

List<EntityBaseBean> dfs = new ArrayList<EntityBaseBean>(); // element type matches the createMany signature
for (int i = 0; i < n; i++) {
    Datafile datafile = new Datafile();
    datafile.setDatafileFormat(dfmt);
    datafile.setName("bill" + i);
    datafile.setDataset(ds);
    dfs.add(datafile);
}
icat.createMany(sessionId, dfs); // many datafiles are stored in one call

The equivalent Python code is:

dfs = []
for i in range(n):
    datafile = factory.create("datafile")
    datafile.datafileFormat = dfmt
    datafile.name = "bill" + str(i)
    datafile.dataset = ds
    dfs.append(datafile)
icat.createMany(sessionId, dfs)

The only reason that this call exists is to minimize calls to ICAT, as it is faster to call createMany than to make many calls to create.

Retrieving an object when you know its id

EntityBaseBean get(String sessionId, String query, long id)

The first parameter is the sessionId, the second is a query string which in the simplest case is just the name of the type of object and the last parameter is the id of the object. You may know the id of an object because you have created it or it was returned from a search call. For example in Java:

icat.get(sessionId, "Dataset", 75L)

will return the Dataset with an id of 75 or throw an exception if it is not found. The dataset is returned on its own with no related objects; the set of related datafiles will be empty and the type (which is a related object) will be null. Often however you will want to return related objects as well. To get a Dataset with all its Datafiles you can use the query:

Dataset ds INCLUDE ds.datafiles

This uses the variable "ds" to identify the selected dataset. The Dataset field "datafiles" will be followed to include all those "Datafiles" related to the selected dataset. Those datafiles which the user is not allowed to read are silently ignored.

To get a Dataset with all its Datafiles, DatasetParameters and DatafileParameters you can use the query:

Dataset ds INCLUDE ds.datafiles df, df.parameters, ds.parameters

which has introduced another variable df to represent the set of (readable) datafiles related to the initial dataset ds. The variable could be preceded by the keyword "AS" if you feel this makes it easier to read, as in:

Dataset ds INCLUDE ds.datafiles AS df, df.parameters, ds.parameters

To save typing you could write the equivalent:

Dataset ds INCLUDE ds.datafiles.parameters, ds.parameters

This slightly shortened form may make it less obvious that the datafiles are also included in the set of returned objects. In fact they must be present as the returned objects always form a DAG again and the DatafileParameters need the Datafiles to connect in to the structure.

It is permissible to visit an entity type more than once in an INCLUDE - for example following a provenance chain or including the datasets with the same type as a particular dataset which would be:

Dataset ds INCLUDE ds.type t, t.datasets

There is also a special notation "INCLUDE 1" which is described in the next section.
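When the set of related objects to return is decided at run time, the query string for get can be assembled from its parts. This is a minimal sketch of plain string building; include_query is a hypothetical helper and performs no validation of the paths.

```python
def include_query(entity, variable, paths):
    """Return a query string for get with an optional INCLUDE clause.

    `paths` are INCLUDE items such as "ds.datafiles df"; with no
    paths the bare entity name is returned.
    """
    if not paths:
        return entity
    return "{0} {1} INCLUDE {2}".format(entity, variable, ", ".join(paths))
```

For example include_query("Dataset", "ds", ["ds.datafiles df", "df.parameters", "ds.parameters"]) produces the second query shown in this section, which could then be passed as the query argument of get.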

Updating an Object

void update(String sessionId, EntityBaseBean bean)

To update an object simply retrieve it, update the fields you want to change and call update. For example to change the name of the dataset to "Fred":

Dataset ds = (Dataset) icat.get(sessionId, "Dataset INCLUDE 1", dsid);
ds.setName("Fred");
icat.update(sessionId, ds);

All "many to one" relationships, such as the relationship to the Investigation, will be updated as will any simple field values. Consequently it is essential to get the existing values for any "many to one" relationships otherwise they will be set to null. This is most reliably achieved by the notation INCLUDE 1 as shown here. The effect of the "1" is to include all "many to one" related types. "One to many" relationships, such as the set of Datafiles, are ignored by the update mechanism. If your attempted change violates a uniqueness constraint an exception will be thrown.

In addition "many to one" relationships can be modified. For example to make the Dataset part of a different Investigation:

Dataset ds = (Dataset) icat.get(sessionId, "Dataset INCLUDE 1", dsid);
ds.setInvestigation(anotherInvestigation);
icat.update(sessionId, ds);

The equivalent Python code is:

ds = icat.get(sessionId, "Dataset INCLUDE 1", dsid)
ds.investigation = anotherInvestigation
icat.update(sessionId, ds)

It is sufficient for the relation object to just have the id value correct as the rest of the fields are ignored. For example if you know the id of the Investigation then you could write in Python:

ds = icat.get(sessionId, "Dataset INCLUDE 1", dsid)
inv = factory.create("investigation")
inv.id = anotherInvestigationId
ds.investigation = inv
icat.update(sessionId, ds)

Deleting an Object

void delete(String sessionId, EntityBaseBean bean)

The following Java code will get a dataset and delete it.

Dataset ds = (Dataset) icat.get(sessionId, "Dataset", dsid);
icat.delete(sessionId, ds);

All "one to many" related objects will also be deleted. In the extreme case, if you delete a facility, you lose everything associated with that facility. This privilege should not be given to many - see the authorization section later. When you get a local copy of the object to delete there is no need to use INCLUDE 1 to populate the "one to many" related objects as all cascades will be followed. In fact the only part of the object that is used by the delete call is the id. So the following Java code will have the same effect and avoids one ICAT call:

Dataset ds = new Dataset();
ds.setId(dsid);
icat.delete(sessionId, ds);
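A Python version of the same shortcut might look like the sketch below. It assumes that icat and factory are the suds objects obtained earlier and that, as with create, the factory takes the type name with a lower case initial; delete_by_id is a hypothetical helper.

```python
def delete_by_id(icat, factory, session_id, type_name, obj_id):
    """Delete an object given only its type and id, avoiding a get call.

    `type_name` is the factory name, e.g. "dataset" (assumed to follow
    the lower case convention noted for create).
    """
    bean = factory.create(type_name)
    bean.id = obj_id  # the id is the only part used by delete
    icat.delete(session_id, bean)
```

As with the Java code, all "one to many" related objects will be deleted as well, so this should be used with care.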

Authorization

The mechanism is rule based. Rules allow groupings of users to do things. There are four things that can be done: Create, Read, Update and Delete. It makes use of five entity types: Rule, User, Grouping, UserGroup and PublicStep. The name "Grouping" has been introduced as the more natural "Group" is a reserved word in JPQL. The authentication mechanism authenticates a person with a certain name and this name identifies the User in the ICAT User table. Groupings have names and the UserGroup performs the function of a "many to many" relationship between Users and Groupings. Rules are applied to Groupings. There are special "root users" with full access to all entities. The set of "root users" is a configuration parameter of the ICAT installation. Only a root user can set up the initial set of authorization rules, though these rules can then allow others to manipulate rules.

Though the authorization rules are handled as efficiently as possible, to get the best performance define a rule to make entity types world readable where this makes sense. Such rules have no conditions in the "what" field and the grouping field is left null. Such rules are recognised and are cached in memory. Typically those entities which are used to implement a many to many relationship can be world readable, as can those which represent types. The PublicStep mechanism is also available to cover the case where access to one entity implies that another may be accessed. For example you might define a PublicStep from Datafile to its Dataset (via the dataset field) and from a Dataset to its Investigation (via the investigation field). For other cases you will need to write rules.

Rules

By default access is denied to all objects; rules allow access. Being permitted by one rule is sufficient, and rules are applied only to the object referenced directly in the API call. The Rule table has two exposed fields: crudFlags and what. The field crudFlags contains letters from the set "CRUD" to indicate which types of operation are being allowed (Create, Read, Update and/or Delete). The other field, what, is the rule itself. There is also a "many to one" relationship to Grouping which may be absent.

Consider the Java code:

Rule rule = new Rule();
rule.setGrouping(userOffice);
rule.setCrudFlags("CRUD");
rule.setWhat("Investigation");
icat.create(sessionId, rule);

This allows members of the userOffice Grouping full access to all Investigations. The following rule:

Rule rule = new Rule();
rule.setGrouping(null); // Not necessary as it will be null on a newly created rule
rule.setCrudFlags("R");
rule.setWhat("ParameterType");
icat.create(sessionId, rule);

allows any authenticated user (with a sessionId) to read ParameterTypes. Consider a Grouping of users: fredReaders. To allow fredReaders to read a datafile with a name of "fred" we could have:

Rule rule = new Rule();
rule.setGrouping(fredReaders);
rule.setCrudFlags("R");
rule.setWhat("SELECT Datafile df WHERE df.name='fred'");
icat.create(sessionId, rule);

The what field may contain almost anything which may appear as a search query. The query, if evaluated, would return the set of objects which can be read, updated or deleted in the case of R, U and D crudFlags. For the C crudFlag which controls create operations, the call is allowed if after creation of the object it would be in the set defined by the what field. Control of update operations is more complex if the update includes modification of any field that identifies the object. In this case the user must have delete access to the object before the change and create access after the change.

To control access to the complete object the search query in the what field must return a set of objects rather than a set of fields or an aggregate function. The query must not contain INCLUDE, LIMIT or ORDER BY clauses.
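A tool that creates rules could check the two fields before calling create. The following sketch is a rough client-side sanity check only, based on the constraints stated above; check_rule is a hypothetical helper, the server remains the authority, and the plain text matching would also flag a keyword that merely appears inside a quoted string.

```python
import re

# Clauses that must not appear in the what field of a rule
_FORBIDDEN = [
    ("INCLUDE", re.compile(r"\bINCLUDE\b", re.IGNORECASE)),
    ("LIMIT", re.compile(r"\bLIMIT\b", re.IGNORECASE)),
    ("ORDER BY", re.compile(r"\bORDER\s+BY\b", re.IGNORECASE)),
]


def check_rule(crud_flags, what):
    """Return a list of problems found in a rule's crudFlags and what fields.

    An empty list means no problem was spotted by this simple check.
    """
    problems = []
    if not crud_flags or any(c not in "CRUD" for c in crud_flags):
        problems.append("crudFlags must be letters from CRUD")
    for name, pattern in _FORBIDDEN:
        if pattern.search(what):
            problems.append("what must not contain " + name)
    return problems
```

An empty result does not guarantee that the server will accept the rule; it only rules out the mistakes listed here.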

More complex restrictions can be added using other related objects. For example to allow read access to Datasets belonging to an Investigation which includes an InvestigationUser which has a user with a name matching the currently authenticated user (from the sessionId) we can have:

Rule rule = new Rule();
rule.setGrouping(null);
rule.setCrudFlags("R");
rule.setWhat("select Dataset ds, ds.investigation i, i.investigationUser iu WHERE iu.name = :user");
icat.create(sessionId, rule);

The rules shown so far permit access to all the fields of the object. However for READ and UPDATE it makes sense to control access to an individual attribute. To express this a rule may reference a specific attribute. For example the following Python:

rule = factory.create("rule")
rule.crudFlags = "U"
rule.grouping = invHackers
rule.what = "select Investigation inv.doi"
icat.create(sessionId, rule)

allows members of the invHackers grouping to modify the doi field of all Investigations. Currently only the UPDATE operation can be controlled this way. It would be possible to also support READ; however, as yet no compelling use case has been encountered.

Rules which allow everyone to read a table are cached in memory and are good for performance. For example to give universal read access to DatasetType the following Java code could be used:

Rule rule = new Rule();
rule.setGrouping(null);
rule.setCrudFlags("R");
rule.setWhat("DatasetType");
icat.create(sessionId, rule);

PublicStep

This table has two columns (origin and field). An entry in this table affects the way in which INCLUDE authorization is carried out. Each entry permits all users to make a step from the origin entity by the specified relationship field without any further checking. This information is held in memory for speed. For those INCLUDEs that are not mentioned in the PublicStep table a full read authorization check must be made before including an object to be returned - which can be expensive. The following Python code permits the jobs relationship to be followed from an Application.

ps = factory.create("publicStep")
ps.origin = "Application"
ps.field = "jobs"
icat.create(sessionId, ps)

This will mean that if you have a query "SELECT Application a INCLUDE a.jobs" then the system will only return applications you are authorized to read. It will also provide all the jobs which are related to those applications provided you are authorized to do so. In the absence of a PublicStep it will check each one individually, which can be time consuming, but with this PublicStep in place the system will simply follow the relationship without further checks.

Checking accessibility

boolean isAccessAllowed(String sessionId, EntityBaseBean bean, AccessType accessType)

This call returns true if the access to the bean specified by the accessType is permitted. For example:

Dataset ds = new Dataset();
ds.setName("Name of dataset");
ds.set ...
System.out.println(icat.isAccessAllowed(sessionId, ds, AccessType.CREATE));

This code sets up a Dataset and then prints whether or not it would be allowed to create it.

This call is expected to be made from GUIs so that they can avoid offering operations that will fail. As such, though READ access may be queried it is unlikely to be useful as the GUI user will not have found out about the object to be checked. If READ, DELETE or UPDATE access is queried for an object that does not exist it will return false.

In the case of CREATE, the entity is created within a database transaction, the check is made and the transaction is rolled back. Note that if a create operation would result in a duplicate this will cause an exception to be thrown.

System Information

String getApiVersion()

Returns the version of the server.

List<String> getEntityNames()

Returns an alphabetic list of all the entity names known to ICAT. This is of most value for tools.

EntityInfo getEntityInfo(String beanName)

Returns full information about a table given its name. For example:

EntityInfo ei = icat.getEntityInfo("Investigation");
System.out.println(ei.getClassComment());
for (Constraint c : ei.getConstraints()) {
    System.out.println("Constraint columns: " + c.getFieldNames());
}
for (EntityField f : ei.getFields()) {
    System.out.println("Field names: " + f.getName());
}

This prints out some information about the Investigation table. For a list of all available fields in EntityInfo and the objects it references please consult the javadoc for EntityInfo.

Administration Calls

To be authorized to use administration calls you must be authenticated with a name listed in the rootUserNames in the run.properties file.

List<String> getProperties(String sessionId)

Lists the active contents of the run.properties file. It does this by examining the properties after they have been read in, so any superfluous definitions in the original properties file will not be seen. The current physical file is not re-examined.

Concise Syntax

Originally ICAT did not support JPQL but instead had its own concise query syntax. The concise syntax however is not as general as JPQL and so cannot express some queries. At some time in the future support for the concise syntax may be removed - you have been warned.

If the query is simply Dataset this will return all Datasets. If the query is Dataset.name this will return all Dataset names. To have related objects returned, use INCLUDE followed by a list of object types:

Dataset INCLUDE Datafile,DatasetParameter,DatafileParameter

The related types must all be related to the original type or to some other type in the list. This means that you could not have "Dataset INCLUDE DatafileParameter". In addition there must be only one route from the original type to each of the included types - i.e. you can only construct one DAG from the starting object.

You can specify an order (which may precede or follow an INCLUDE clause):

Dataset.id ORDER BY id

Restrictions can be placed on the data returned. For example:

Dataset.id [type.name IN ('GS', 'GQ')]

which could also be written:

Dataset.id [type.name = 'GS' OR type.name = 'GQ']

The restriction in the square brackets can be as complex as required - but must only refer to attributes of the object being restricted - in this case the Dataset. Expressions may use parentheses, AND, OR, <, <=, >, >=, =, <>, !=, NOT, IN, LIKE and BETWEEN. Currently the BETWEEN operator does not work on strings. This appears to be a JPA bug.

Functions: MAX, MIN, COUNT, AVG and SUM may also be used such as:

MAX (Dataset.id)

Selection may involve more than one related object. To show the relationship a "<->" token is used. For example:

Dataset.id <-> DatasetParameter[type.name = 'TIMESTAMP']

Note also here the use of the JPQL style path: type.name. This expression means the ids of Datasets which have a DatasetParameter which has a type with a name of TIMESTAMP. Multiple "<->" tokens may appear but all the objects involved, including the first one, must be connectable in only one way. The next example shows how restrictions may be applied to both the Dataset and the DatasetParameter:

Dataset.id [name like 'fred%'] <-> DatasetParameter[type.name = 'TIMESTAMP']

It is also possible to restrict the number of results returned by specifying a pair of numbers at the beginning of the query string. This construct would normally be used with an ORDER BY clause. The first number is the offset from within the full list of available results from which to start returning data and the second is the maximum number of results to return. These numbers if specified must be positive. If the offset is greater than or equal to the number of internal results then no data will be returned. The default values are 0 and "infinity". The numbers must be separated by a comma though either may be omitted. The following are all valid:

3,5 Dataset.id ORDER BY id
3, Dataset.id ORDER BY id
,5 Dataset.id ORDER BY id
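The meaning of the two numbers can be sketched in Python. Both functions below are hypothetical illustrations: limit_prefix builds the prefix with either number omitted, and page mimics on a local list what the prefix is described as selecting on the server (defaults of 0 and "infinity").

```python
def limit_prefix(offset=None, count=None):
    """Return the "offset,count" prefix; either number may be omitted."""
    for value in (offset, count):
        if value is not None and value < 0:
            raise ValueError("offset and count must not be negative")
    return "{0},{1}".format("" if offset is None else offset,
                            "" if count is None else count)


def page(results, offset=0, count=None):
    """Select from a local list the entries the prefix would select.

    An offset at or beyond the end of the list gives an empty result.
    """
    end = None if count is None else offset + count
    return results[offset:end]
```

A query string could then be formed as limit_prefix(3, 5) + " Dataset.id ORDER BY id", which gives the first of the three examples above.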

To see the investigations with which you are associated you can make use of the special value :user as in:

Investigation <-> InvestigationUser [name = :user]

Time literals (which are implementation dependent in JPQL) should be expressed as shown in the next example:

Investigation [createTime > {ts 2011-01-15 00:00:00}]

The timestamp format must be exactly as shown. Literal boolean values are TRUE and FALSE as in:

Dataset [complete = TRUE]

and finally enums are expressed as shown below:

ParameterType [valueType = DATE_AND_TIME]

which is selecting those ParameterTypes which have a valueType of DATE_AND_TIME from ParameterTypeValue.
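Since the timestamp format must be exact, a client may prefer to generate the time and boolean literals described above from native values rather than type them by hand. This is a minimal sketch covering only those two cases; literal is a hypothetical helper and is not part of any ICAT client library.

```python
from datetime import datetime


def literal(value):
    """Render a Python bool or datetime as a concise syntax literal."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, datetime):
        # Format copied from the example: {ts YYYY-MM-DD HH:MM:SS}
        return "{ts " + value.strftime("%Y-%m-%d %H:%M:%S") + "}"
    raise TypeError("unsupported literal type: " + type(value).__name__)
```

For example "Investigation [createTime > " + literal(datetime(2011, 1, 15)) + "]" reproduces the time query shown earlier.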