The interface

IDS plugins have three interfaces to implement; please see the Javadoc for details. The IDS has a two-level storage model with a main storage and an optional archive storage. The main storage deals with individual datafiles, whereas the archive storage deals with zip files holding all the datafiles of a dataset. In addition, the ZipMapperInterface must be implemented to define the structure of the zip file, which is used both for downloads and for archive storage.
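
For illustration, here is a sketch of a ZipMapper in the spirit of the file storage plugin. The method names follow the ZipMapperInterface, but the exact signatures and parameter types should be checked against the Javadoc; the entry layout chosen below is just an assumption:

    import org.icatproject.ids.plugin.DfInfo;
    import org.icatproject.ids.plugin.DsInfo;
    import org.icatproject.ids.plugin.ZipMapperInterface;

    public class MyZipMapper implements ZipMapperInterface {

        @Override
        public String getFullEntryName(DsInfo dsInfo, DfInfo dfInfo) {
            // Assumed layout: ids/<investigation>/<dataset>/<datafile name>.
            return "ids/" + dsInfo.getInvName() + "/" + dsInfo.getDsName() + "/"
                    + dfInfo.getDfName();
        }

        @Override
        public String getFileName(String fullEntryName) {
            // Invert getFullEntryName: drop the three leading path components.
            String[] parts = fullEntryName.split("/", 4);
            return parts[3];
        }
    }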

IDS deployment options

Depending on the properties defined during ids.server deployment, only a subset of the methods defined in the storage interfaces may need a real implementation. In a given deployment, some of the calls will never be made and may therefore be provided by a dummy implementation, which in general should just throw an error. The following table shows which methods are needed in which deployments:

readOnly                                                    false     true      false     true      false     true
archive storage                                             yes       yes       yes       yes       no        no
storageUnit                                                 dataset   dataset   datafile  datafile  N/A       N/A
mainStorage.delete(DsInfo)
mainStorage.delete(String, String, String)
mainStorage.exists(DsInfo)
mainStorage.exists(String)
mainStorage.get(String, String, String)
mainStorage.getDatafilesToArchive(long, long)
mainStorage.getDatasetsToArchive(long, long)
mainStorage.getPath(String, String, String)                 if link enabled
mainStorage.put(DsInfo, String, InputStream)
mainStorage.put(InputStream, String)
archiveStorage.delete(DsInfo)                                                                       N/A       N/A
archiveStorage.delete(String)                                                                       N/A       N/A
archiveStorage.get(DsInfo, Path)                                                                    N/A       N/A
archiveStorage.put(DsInfo, InputStream)                                                             N/A       N/A
archiveStorage.put(InputStream, String)                                                             N/A       N/A
archiveStorage.restore(MainStorageInterface, List<DfInfo>)                                          N/A       N/A

The first three rows list the relevant options for the deployment; see the documentation on the run.properties file in the ids.server installation guide. Here, the meaning is:

  • readOnly: the corresponding value in run.properties.
  • archive storage: whether plugin.archive.class is set.
  • storageUnit: the corresponding value in run.properties. This is only relevant if archive storage is used.
  • link enabled: whether linkLifetimeSeconds has a value greater than zero. This is a special case as it is independent of all other options and only relevant for the mainStorage.getPath(String, String, String) method.

The remaining rows list the methods defined in the main storage and in the archive storage interfaces. Each column indicates whether a method will be called in a deployment having the options shown in the first three rows of that column.

In the table, it was assumed that the restore(MainStorageInterface, List<DfInfo>) method in the archive storage plugin, if present, only calls put(InputStream, String) on the main storage. If it also makes other calls, these need to be implemented as well, of course.
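
To make that concrete, a restore for a datafile-based archive might look like the following sketch. The Set<DfInfo> return value (the files that could not be restored) and the flat, location-keyed archive layout are assumptions to be checked against the Javadoc:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.icatproject.ids.plugin.DfInfo;
    import org.icatproject.ids.plugin.MainStorageInterface;

    public class RestoreSketch {

        // Hypothetical root under which this archive plugin keeps one file
        // per datafile, named by the datafile's location.
        private final Path baseDir = Paths.get("/archive");

        public Set<DfInfo> restore(MainStorageInterface mainStorage, List<DfInfo> dfInfos) {
            Set<DfInfo> failures = new HashSet<>();
            for (DfInfo dfInfo : dfInfos) {
                Path archived = baseDir.resolve(dfInfo.getDfLocation());
                try (InputStream in = Files.newInputStream(archived)) {
                    // The only main storage call made here, matching the
                    // assumption stated above.
                    mainStorage.put(in, dfInfo.getDfLocation());
                } catch (IOException e) {
                    failures.add(dfInfo);
                }
            }
            return failures;
        }
    }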

There is another optional method mainStorage.lock(DsInfo, boolean) that is independent of the ids.server configuration. This method may be implemented if the storage plugin wants to support file system locking in the storage. Otherwise, a dummy implementation must be provided that simply returns null.
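
On a POSIX file system, a real implementation could be based on java.nio file locking, as in this sketch. The AutoCloseable return type is taken from the Javadoc; getDirPath is a hypothetical helper mapping a dataset to its directory in main storage, and the layout it assumes is illustrative:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    import org.icatproject.ids.plugin.DsInfo;

    public class LockSketch {

        public AutoCloseable lock(DsInfo dsInfo, boolean shared) throws IOException {
            Path lockFile = getDirPath(dsInfo).resolve(".lock");
            FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
            // shared == true requests a read lock, false an exclusive write lock.
            FileLock fileLock = channel.lock(0L, Long.MAX_VALUE, shared);
            return () -> {
                fileLock.release();
                channel.close();
            };
        }

        // Hypothetical helper: main storage laid out as /mainstorage/<invId>/<dsId>.
        private Path getDirPath(DsInfo dsInfo) {
            return Paths.get("/mainstorage").resolve(Long.toString(dsInfo.getInvId()))
                    .resolve(Long.toString(dsInfo.getDsId()));
        }
    }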

There are two abstract implementations provided, AbstractMainStorage and AbstractArchiveStorage. These provide dummy implementations for all optional methods. It is recommended to extend these abstract classes and to implement those methods that are needed for the deployments that the plugin is intended to support.
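
A minimal main storage plugin might then look like the following sketch, overriding only what its deployment needs and inheriting the dummies for the rest. The /mainstorage root and the invId/dsId directory layout are assumptions:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    import org.icatproject.ids.plugin.AbstractMainStorage;
    import org.icatproject.ids.plugin.DsInfo;

    public class SimpleMainStorage extends AbstractMainStorage {

        private final Path baseDir = Paths.get("/mainstorage"); // assumed storage root

        @Override
        public String put(DsInfo dsInfo, String name, InputStream inputStream) throws IOException {
            String location = dsInfo.getInvId() + "/" + dsInfo.getDsId() + "/" + name;
            Path path = baseDir.resolve(location);
            Files.createDirectories(path.getParent());
            Files.copy(inputStream, path);
            return location;
        }

        @Override
        public InputStream get(String location, String createId, String modId) throws IOException {
            return Files.newInputStream(baseDir.resolve(location));
        }

        @Override
        public void delete(String location, String createId, String modId) throws IOException {
            Files.deleteIfExists(baseDir.resolve(location));
        }

        @Override
        public boolean exists(String location) {
            return Files.exists(baseDir.resolve(location));
        }

        // Everything else keeps the dummy implementation inherited from
        // AbstractMainStorage.
    }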

Security considerations

When writing a plugin you have to provide methods that operate on your data: read it, check for its existence, write it, delete it and find information about it.

The call put(DsInfo dsInfo, String name, InputStream inputStream) in the MainStorageInterface is where you decide, using information about the dataset that will hold the datafile and the name to be given to the datafile, on a "location" value which is unique within your IDS instance and which can be used as an identifier to store and retrieve files. It will be some kind of path relative to the storage system you are using. The main danger is that someone will create a datafile object whose location value is not consistent with the policy you have defined in this "put" method. Such a location may cause data to be read that should be private, or it could cause other people's data to be written or deleted.
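
Refining the put of the earlier sketch, the location can be made both unique and unguessable by ending it in a random UUID rather than the client-supplied name; the layout is again an illustrative assumption:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.UUID;

    import org.icatproject.ids.plugin.DsInfo;

    public class PutSketch {

        private final Path baseDir = Paths.get("/mainstorage"); // assumed storage root

        public String put(DsInfo dsInfo, String name, InputStream inputStream) throws IOException {
            // Encode where the file belongs and end in a UUID, making the
            // location unique and hard to guess; the datafile name itself is
            // deliberately not trusted as part of the path.
            String location = dsInfo.getInvId() + "/" + dsInfo.getDsId() + "/" + UUID.randomUUID();
            Path path = baseDir.resolve(location);
            Files.createDirectories(path.getParent());
            Files.copy(inputStream, path); // fails if the file already exists
            return location;
        }
    }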

The safe simple solution

The simplest way to protect against all such dangers is to enable the generation of a cryptographic hash in ids.server. This results in the computation of a cryptographic hash which is appended to the location you return, before the value is stored in ICAT. The hash is based on the "id" of the datafile, the "location" value as seen by the plugin and a key known only to the IDS, which is defined in the run.properties file. Each time the location from ICAT is passed to a plugin, it is first checked that the hash has the expected value, and then the first part of the location field is passed to the plugin, which is guaranteed to be exactly what the original call to the put method returned. The plugin is not exposed to the hash at all. The value stored in ICAT is the location followed by the hash, separated by a space. The hash value goes at the end to help with indexing in some cases.
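
The following sketch illustrates the idea. The actual computation happens inside ids.server, so the digest algorithm and formatting shown here are assumptions for illustration only:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class LocationHashSketch {

        // Illustration only: ids.server does this internally. SHA-256 and the
        // concatenation order are assumptions for the sketch.
        static String protect(long datafileId, String location, String key)
                throws NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update((datafileId + location + key).getBytes(StandardCharsets.UTF_8));
            StringBuilder hash = new StringBuilder();
            for (byte b : md.digest()) {
                hash.append(String.format("%02x", b));
            }
            // Stored in ICAT as "<location> <hash>"; the hash goes at the end.
            return location + " " + hash;
        }

        static String verify(long datafileId, String storedValue, String key)
                throws NoSuchAlgorithmException {
            String location = storedValue.substring(0, storedValue.indexOf(' '));
            if (!protect(datafileId, location, key).equals(storedValue)) {
                throw new IllegalArgumentException("location has been tampered with");
            }
            return location; // only this part is ever passed to the plugin
        }
    }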

Safe but restrictive solution

If for some reason you don't want to follow the solution described above, there are two approaches which are also safe but restrictive:

  • You can make sure that only a few trusted people are authorized to create or update any datafile in ICAT.
  • If you have a system without archive storage then only three plugin calls accept the location: get, delete and getPath, each with the (String, String, String) signature. In each case, in addition to the location, the createId and the modId of the datafile are passed in so that the plugin can see if this was a trusted person; see the sketch after this list.
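
For example, a get implementation could reject any datafile that was not created and last modified by one of the few trusted accounts; the user ids below are hypothetical:

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Set;

    public class TrustedGetSketch {

        // Hypothetical set of the few users allowed to create or update
        // datafiles in ICAT.
        private static final Set<String> TRUSTED = Set.of("ingestd", "root");

        private final Path baseDir = Paths.get("/mainstorage"); // assumed storage root

        public InputStream get(String location, String createId, String modId) throws IOException {
            // A location not written by a trusted account cannot be relied
            // upon to conform to the plugin's own policy, so refuse it.
            if (!TRUSTED.contains(createId) || !TRUSTED.contains(modId)) {
                throw new IOException("datafile was not created by a trusted user");
            }
            return Files.newInputStream(baseDir.resolve(location));
        }
    }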

Parts of a solution

If you don't want the safe solution and in addition cannot work with the restrictions of the previous section, then there are some things that you can do to help make things safer:

  • You can check that the location does not try to get outside the storage area and that it conforms to your rules by matching the location against a suitable regular expression, as in the sketch after this list.
  • You can make sure that write operations will not overwrite existing files (for main storage).
  • You can use UUIDs in the location that the “put” call generates to make the value unguessable.
  • You can add a uniqueness constraint on Datafile.location in the database, using the following SQL statement: ALTER TABLE DATAFILE ADD CONSTRAINT UNQ_DATAFILE_LOCATION UNIQUE (LOCATION), with a further similar statement if you wish to avoid duplicate Dataset.location values. Note however that since these constraints are only enforced at the database level and not known to ICAT, you will get an INTERNAL error "Unexpected DB response" rather than an OBJECT_ALREADY_EXISTS error in the case of a violation.
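
A sketch combining the checks of the first point, assuming the UUID-based locations generated by the put sketch shown earlier:

    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.regex.Pattern;

    public class LocationCheckSketch {

        private static final Path BASE_DIR = Paths.get("/mainstorage"); // assumed root

        // Example policy (an assumption): numeric invId/dsId directories
        // followed by a UUID-style file name.
        private static final Pattern LOCATION = Pattern
                .compile("\\d+/\\d+/[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}");

        static Path check(String location) {
            if (!LOCATION.matcher(location).matches()) {
                throw new IllegalArgumentException("location " + location + " does not match policy");
            }
            Path path = BASE_DIR.resolve(location).normalize();
            // Belt and braces: even a location matching the pattern must still
            // resolve to somewhere inside the storage area.
            if (!path.startsWith(BASE_DIR)) {
                throw new IllegalArgumentException("location " + location + " escapes the storage area");
            }
            return path;
        }
    }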

Packaging

For Glassfish deployment the IDS expects to find its plugins in lib/applibs below the domain directory. The plugins must be packaged with all their dependencies, because of the way the Glassfish classloader works, and after installing a plugin Glassfish must be restarted. Please see the file storage plugin, with source code, as an example of how to do it.