Overview
This document is made up of four sections, describing installation, post-installation
work, performance and the admin interface.
Installation
Prerequisites
- The icat distribution: icat.server-6.1.0-SNAPSHOT-distro.zip
- Java 11+
- A suitable deployed application server that supports Jakarta EE 10+. Testing has been carried out with Payara Server Community 6. Other application servers such as GlassFish and WildFly may work but have not been tested.
Glassfish/Payara installation instructions are available.
- A database, as described in the Database installation instructions, installed on the server.
- Deployed ICAT authenticators.
- A deployed icat.lucene server of at least version 3.0.0, or an OpenSearch/Elasticsearch cluster, if you plan to use free-text search.
- Python 3.6+ and the suds-community package installed on the server.
Installing a group of ICATs
If your facility depends upon a single ICAT instance then
ingestion of data can be held up by a user making an expensive query.
To avoid this it is suggested that you install multiple servers
each running a Glassfish with an ICAT but all sharing one database and
one icat.lucene server. It is recommended that you install the
machines individually first and make sure that each works before tying
them together. Finally ingestion can be directed to one node and the
other nodes can be load balanced for user access by, for example, an
Apache web server.
All machines must use the same database and icat.lucene server.
Authentication can either be carried out on one machine, which
has the advantage that authenticator logs only build up on
that machine, or each machine can host its own authenticators. If an
icat.server is configured with the new style restful authenticators
then more than one equivalent authenticator server may be specified to
make the system more robust. If you use the old style where JNDI
settings must be specified it is recommended for each icat.server to
have its own authenticators.
The icat.servers are linked by specifying a cluster parameter in
the run.properties file for each machine. This parameter may be the
same for each machine as the software tries to avoid sending messages
to itself.
You could then set up an Apache front end to do load balancing. This
will probably just connect to the satellites leaving the central
machine to handle ingestion of data. See Apache
front end for one way of doing this.
Schema upgrade
Lucene indices
Any existing lucene indices should be removed. The location of
these would have been specified in the previous icat.properties file.
Ensure that the directory specified there is empty. Indices generated by
icat.lucene versions before 3 are no longer compatible.
An additional consequence of these changes to icat.lucene is that the Rules and
PublicSteps set in ICAT directly affect what metadata is returned from searches
against the search component. While the normal authorization process is applied
to the Investigation, Dataset and Datafiles that are returned as results, it is
also possible to include metadata from related entities with each result, for
example the Instrument(s) used in an Investigation. In order to provide results
in a reasonable amount of time, the full authorization process cannot be
followed for these related entities. Instead, a related field will only be
returned if:
- The entity in question is a "public table", that is, there is a Rule
providing READ access to all users for that entity.
- There are one or more PublicSteps from the
Investigation/Dataset/Datafile entity to the entity of interest.
It is entirely reasonable to decide that a PublicStep or public table Rule is
not appropriate for the entity in question; however, be aware that this will
limit the metadata returned with the search results.
The same principle applies to the post-search faceting enabled in this release.
Only fields that are allowed to all users via one of the above methods will be
returned, to avoid exposing any unauthorized metadata.
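The two conditions above can be pictured as a simple membership test. The following Python sketch is an illustration only, not ICAT's implementation: the function name, the set of public tables and the modelling of a PublicStep as an (origin entity, related entity) pair are all assumptions made for this example.

```python
# Illustrative sketch only: ICAT performs this check internally.
# A PublicStep is simplified here to an (origin, related) pair of entity names.


def related_field_visible(origin, related, public_tables, public_steps):
    """Return True if metadata from `related` may be returned with an `origin` result."""
    if related in public_tables:
        # A Rule gives READ access to all users for this entity.
        return True
    # Otherwise a PublicStep from the origin entity is required.
    return (origin, related) in public_steps


# Hypothetical configuration: Instrument is readable by everyone,
# and there is a PublicStep from Dataset to DatasetType.
public_tables = {"Instrument"}
public_steps = {("Dataset", "DatasetType")}

for origin, related in [("Investigation", "Instrument"),
                        ("Dataset", "DatasetType"),
                        ("Investigation", "Sample")]:
    allowed = related_field_visible(origin, related, public_tables, public_steps)
    print(origin, "->", related, "returned" if allowed else "omitted")
```

With this hypothetical configuration the Sample metadata would be left out of Investigation results until a suitable Rule or PublicStep is added.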
Database schema
The direct upgrade of the database schema is supported for icat.server
4.7.0 and newer. This is done automatically: the initialization of
icat.server will create all missing tables, indices, and columns when
the new version is deployed for the first time.
For older ICAT installations, you need to upgrade to 4.7.0 first,
following the install instructions of that version. In a second step,
you can make use of the implied schema upgrade while installing the
current version.
Database triggers (optional)
If you want the new attributes fileSize and fileCount in Dataset and
Investigation to be updated automatically whenever a datafile or
dataset is added/modified/deleted, you can also set up triggers in your
database. This can be done for MySQL or MariaDB by running
mysql -u icat -p icat < create_triggers_mysql_5_0.sql
or for Oracle by running
sqlplus icat @create_triggers_oracle_5_0.sql
where in both cases it is assumed that the tables are owned by user
"icat". This automatically initializes the fileSize and fileCount of all
existing datasets and investigations. Note that ICAT should not be
running while this is done to avoid inconsistencies.
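As a rough picture of the values the triggers keep up to date, the Python sketch below recomputes counts and sizes from a list of datafiles. It assumes that fileCount is the number of datafiles and fileSize the sum of their sizes; the tuple layout and the function name are invented for this illustration, and the real work is done by the SQL scripts named above.

```python
from collections import defaultdict


def aggregate(datafiles):
    """Sum file counts and sizes per dataset and per investigation.

    `datafiles` is an iterable of (investigation_id, dataset_id, file_size)
    tuples; this layout is an assumption made for the example.
    """
    datasets = defaultdict(lambda: {"fileCount": 0, "fileSize": 0})
    investigations = defaultdict(lambda: {"fileCount": 0, "fileSize": 0})
    for investigation_id, dataset_id, file_size in datafiles:
        # Every datafile contributes to its dataset and to the parent investigation.
        for totals in (datasets[dataset_id], investigations[investigation_id]):
            totals["fileCount"] += 1
            totals["fileSize"] += file_size
    return dict(datasets), dict(investigations)


datasets, investigations = aggregate([(1, 10, 100), (1, 10, 50), (1, 11, 25)])
print(datasets)
print(investigations)
```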
The setup.properties file
- container
- Values must be chosen from: TargetServer, though only Glassfish is
working properly at the moment.
- home
- is the top level of the container installation. For Glassfish
it must contain "glassfish/domains" and for JBoss (wildfly) it must
contain jboss-modules.jar.
- port
- is the administration port of the container which is
typically 4848 for Glassfish and 9990 for JBoss.
- secure
- must be set to true or false. If true then only https and not
http connections will be allowed.
- db.driver
- is the name of the jdbc driver which must match the jar file
installed in the container and matching your database.
- db.url
- url to connect to your database. For example:
jdbc:mysql://localhost:3306/icat
- db.username
- username to connect to your database.
- db.password
- password to connect to your database.
- db.target
- This is optional and may be used to control the SQL generated by the
JPA. Values must be chosen from: TargetDatabase
- db.logging
- This is optional and, if set to one of the values in EclipseLink
logging.level, controls the logging of JPA generated SQL
statements.
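Before running the installer it can help to check a setup.properties file against the constraints listed above. The following is a minimal Python sketch, not part of the ICAT distribution: the parser only understands simple "key = value" lines, the list of keys treated as required is an assumption drawn from the descriptions above, and the sample values (home path, driver class, credentials) are placeholders.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Keys assumed to be mandatory; db.target and db.logging are optional.
REQUIRED = ("container", "home", "port", "secure",
            "db.driver", "db.url", "db.username", "db.password")


def check_setup(props):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = ["missing " + key for key in REQUIRED if key not in props]
    if "port" in props and not props["port"].isdigit():
        problems.append("port must be a number")
    if "secure" in props and props["secure"] not in ("true", "false"):
        problems.append("secure must be true or false")
    return problems


# Placeholder values for illustration only.
SAMPLE = """
container   = Glassfish
home        = /opt/payara6
port        = 4848
secure      = true
db.driver   = com.example.jdbc.Driver
db.url      = jdbc:mysql://localhost:3306/icat
db.username = icat
db.password = secret
"""

print(check_setup(parse_properties(SAMPLE)) or "no problems found")
```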
The logback.xml file
If you wish to modify the provided logging levels then rename
logback.xml.example to logback.xml and edit to suit your needs.
The run.properties file
- lifetimeMinutes
- Defines the lifetime of an ICAT sessionid. You should avoid
making it have a long duration as this increases the risk if it is
intercepted, lost or stolen.
- rootUserNames
- Is a space separated list of user identifiers having full
access to all tables. The format of the user identifier is determined
by the chosen authentication plugin. The authn_db and authn_ldap
plugins may be configured to either return the simple user name or to
prepend it with a name identifying the mechanism. For example, if
there is an entry "root" in the database and the authn_db
authenticator is configured without a mechanism, then the user name to
consider will be just "root"; however, if it has been configured with
a mechanism of "db" then the string "db/root" must be specified.
- maxEntities
- Restricts the total number of entities returned by a search or
get call. This should be set as small as possible to protect the
server from running out of memory. However if you set it too small it
may prevent users from doing reasonable things.
- maxIdsInQuery
- For handling INCLUDEs, ICAT may generate queries which are
not acceptable to the database system. To avoid this problem such
queries are broken down. This is the maximum size of each chunk which
must not exceed 1000 for Oracle.
- importCacheSize
- the size of a cache used during import to avoid an excessive
number of calls to the database. The cache is dropped after each call
to import to ensure that authorization rules are enforced. As the
cache is short-lived, modifications to ICAT are unlikely to result in
stale information being used from the cache.
- exportCacheSize
- the size of a cache used during export to avoid an excessive
number of calls to the database. The cache is dropped after each call
to export to ensure that authorization rules are enforced. As the
cache is short-lived, modifications to ICAT are unlikely to result in
stale information being used from the cache.
- authn.list
- is a space separated set of mnemonics for a user to select
the authenticator in the login call. Authenticators are separate
applications which may be "Remote EJBs", identified by jndi, if
installed in the same container or they may be restful services
identified by url. The list must not reference Remote EJBs which are
not installed as these are checked when ICAT performs its
initialisation; their absence will cause ICAT to not start.
- authn.<mnemonic>.url
- is a space separated list of the urls of machines with a
restful authenticator service. Each url will take the form
https://example.com:443. There must be one such entry for each restful
authenticator. If more than one url is provided the services
referenced must be functionally equivalent. An invalid URL syntax
will cause the ICAT server not to start. This must not be set for an
EJB Authenticator.
- authn.<mnemonic>.jndi
- is the jndi name to locate an EJB authenticator. When you installed
the authenticator a message would have appeared in the server.log
stating the JNDI names. The name will start:
java:global/
There must be one such entry for each EJB authenticator. This must
not be set for a restful Authenticator.
- authn.<mnemonic>.friendly
- is optional. It gives a name that a tool might use to label
the plugin.
- authn.<mnemonic>.admin
- is optional. Set to true if you wish to indicate that this
authenticator should only be advertised to administration tools.
- notification.list
- is optional. It is a space separated set of Entity names for
which you wish to generate notifications. For each one there must be
another line saying under what conditions you wish to generate a
notification for the entity.
- notification.<entity name>
- a string of letters taken from the set "C" and "U" indicating
for which operations (create and update) on the entity you wish to be
notified.
- log.list
- is optional. If present it specifies a set of call types to
log via JMS calls. The types are specified by a space separated list
of values taken from READ, WRITE, SESSION, INFO.
- search.engine
- This is optional. Specifies the engine used for free-text searches.
Value should be one of LUCENE, OPENSEARCH and ELASTICSEARCH.
- search.urls
- This is optional. It is the machine url of the search engine
server if needed.
- search.populateBlockSize
- This is ignored if search.engine and search.urls are not set. The number of
entries to batch off to the search engine when populating the index.
- search.searchBlockSize
- This is ignored if search.engine and search.urls are not set. It is
recommended to set search.searchBlockSize equal to maxIdsInQuery, so that all
results can be authorised at once. If search.searchBlockSize > maxIdsInQuery, then
multiple auth checks may be needed for a single search. The optimal value
depends on how likely a user's auth request is to fail: larger values are more
efficient when rejection is more likely.
- search.directory
- This is ignored if search.engine and search.urls are not set. Path of a
directory holding files for requests that are queued to go to the search engine.
- search.backlogHandlerIntervalSeconds
- This is ignored if search.engine and search.urls are not set. How often to
check the backlog file.
- search.enqueuedRequestIntervalSeconds
- This is ignored if search.engine and search.urls are not set. How often to
transmit requests to the search engine.
- search.aggregateFilesIntervalSeconds
- This is ignored if search.engine and search.urls are not set. How often to
update file size and counts for Datasets and Investigations containing
recently modified Datafiles. If 0, the parent documents are updated in real
time rather than on a timer. Note that this can have a
significant performance impact.
- search.maxSearchTimeSeconds
- This is ignored if search.engine and search.urls are not set. How long to
wait before cancelling a long-running search. This can prevent badly formed
queries from blocking other searches from completing.
- search.entitiesToIndex = Datafile Dataset Investigation InvestigationUser DatafileParameter DatasetParameter InvestigationParameter Sample
- The entities to index with the search engine.
- search.units
- This is optional. Recognised unit names/symbols. Each symbol recognised by
indriya's SimpleUnitFormat should be followed by a colon, and then a comma
separated list of units measuring the same property. If the unit is simply
an alias (e.g. "K: kelvin") this is sufficient. If a conversion is required,
it should be followed by this factor (e.g. "J: eV 1.602176634e-19").
Different units can be separated by a semi-colon.
- jms.topicConnectionFactory
- This is optional and may be used to override the default
value of java:comp/DefaultJMSConnectionFactory
- key
- This is optional but if there is an IDS server in use and it
has a key for digest protection of Datafile.location then this key
value must be identical.
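Several of the properties above depend on one another, for example each mnemonic in authn.list needs either a url or a jndi entry but not both. The Python sketch below checks a few of these rules on a dictionary of property values. It is an illustration written for this document, not a tool shipped with ICAT; the function name and the mnemonics "db" and "simple" are examples, and the JNDI name is deliberately left elided.

```python
# Sketch of some consistency rules described above; not part of ICAT itself.
LOG_TYPES = {"READ", "WRITE", "SESSION", "INFO"}
SEARCH_ENGINES = {"LUCENE", "OPENSEARCH", "ELASTICSEARCH"}


def check_run_properties(props):
    """Return a list of problems found in a dict of run.properties values."""
    problems = []
    # Each mnemonic needs a url (restful) or a jndi name (EJB), never both.
    for mnemonic in props.get("authn.list", "").split():
        has_url = f"authn.{mnemonic}.url" in props
        has_jndi = f"authn.{mnemonic}.jndi" in props
        if has_url == has_jndi:
            problems.append(f"authn.{mnemonic}: set exactly one of url or jndi")
    # Each entity in notification.list needs its own line using only C and U.
    for entity in props.get("notification.list", "").split():
        letters = props.get(f"notification.{entity}", "")
        if not letters or set(letters) - set("CU"):
            problems.append(f"notification.{entity}: expected letters from C and U")
    # log.list values must come from the four documented call types.
    for call_type in props.get("log.list", "").split():
        if call_type not in LOG_TYPES:
            problems.append(f"log.list: unknown call type {call_type}")
    engine = props.get("search.engine")
    if engine is not None and engine not in SEARCH_ENGINES:
        problems.append(f"search.engine: unknown engine {engine}")
    return problems


example = {
    "authn.list": "db simple",
    "authn.db.url": "https://example.com:443",
    "authn.simple.jndi": "java:global/...",  # elided: copy the name from server.log
    "notification.list": "Dataset Datafile",
    "notification.Dataset": "CU",
    "notification.Datafile": "C",
    "log.list": "SESSION WRITE",
    "search.engine": "LUCENE",
}
print(check_run_properties(example) or "no problems found")
```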
Check that ICAT works
A small test program, testicat, will have been installed for you.
This is a python script which requires that the suds client is
available. It connects as one of the root users you defined as
'rootUserNames' in the run.properties file. Invoke the script
specifying the url of the machine on which the
ICAT service is deployed (something like https://example.com:8181),
the mnemonic for the chosen authentication plugin followed by the
credentials for one of the root user names supported by that plugin.
These credentials should be passed in as pairs of parameters with key
followed by value. For example:
testicat https://example.com:8181 db username root password secret
It should report:
Logged in as ... with 119.9... minutes to go
Login, search, create, delete and logout operations were all successful.
This script can be run at any time as it is almost harmless - it
simply creates a "Group" with an unlikely name and removes it again.
In case of problems, first erase the directory /tmp/suds and try
the testicat again. If it still fails, look at the log files:
server.log and icat.log which can both be found in the logs directory
below your domain. Look also at the relevant authenticator log.
Post-installation work
Fresh Install
If this is a fresh install then you can use the import facility
to do the initial icat population or you could use the icat manager to
create rules, a Facility and other high level entities.
If you are using Oracle the type NUMBER(38, 19) will have been used
for all floating point numbers. This constrains the values that can be
stored - they may be truncated or rejected. To fix this please execute
the SQL statements in fix_floats_oracle.sql.
In all cases
Populate the lucene index by using the icatadmin tool.
Performance
To improve performance:
- Consider creating the indices defined in indices.sql. Indices
can make a huge difference to the database performance but there is
also a small cost for each index.
- Make entities readable by anyone if they contain no sensitive
information. This is generally the case for those entities that
implement a many-to-many relationship. For example, InvestigationUser
relates Investigation to User but has no attributes. Making it
world readable grants no access to Investigation or User. An
in-memory cache of world readable entities is maintained by ICAT.
- Add entries to PublicStep to allow the INCLUDE mechanism to
be less costly. PublicStep is explained in the ICAT Java Client User
Manual. Its contents are also held in an in-memory cache for
performance.
The icatadmin tool
Administration operations have been added to the ICAT API and are
accessible via the icatadmin tool which will have been installed by
the setup.py script. It should be invoked as:
icatadmin <url> <plugin> <credentials>... -- <command> <args>...
to run a single command or
icatadmin <url> <plugin> <credentials>...
to be prompted for a series of commands as shown below. In
either case if you specify '-' as the password you will be prompted
for it. Note that in the single command case the "--" marker is needed
to terminate the list of credentials. For example:
icatadmin https://example.com:8181 db username root password secret -- properties
Only users mentioned in the rootUserNames of the run.properties file
are authorized to use this command.
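When icatadmin is driven from another script, the position of the "--" marker is the easiest thing to get wrong. The helper below is a small Python sketch, with a hypothetical function name, that assembles the argument list in the order described above; it only builds the list and does not run anything.

```python
def icatadmin_args(url, plugin, credentials, command=None):
    """Build the icatadmin argument list.

    `credentials` is a sequence of (key, value) pairs, passed as
    alternating key and value arguments. If `command` is given, the
    "--" marker is inserted to terminate the list of credentials.
    """
    args = ["icatadmin", url, plugin]
    for key, value in credentials:
        args += [key, value]
    if command:
        args += ["--"] + list(command)
    return args


creds = [("username", "root"), ("password", "secret")]

# Single command form.
print(" ".join(icatadmin_args("https://example.com:8181", "db", creds, ["properties"])))
# Interactive form: no "--" and no command.
print(" ".join(icatadmin_args("https://example.com:8181", "db", creds)))
```

The resulting list could be handed to subprocess.run on a machine where icatadmin is installed.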
- populate [--min-id 0 --max-id 1 --delete] [<entity names>...]
- re-populates lucene for the specified entity names. This is
useful if the database has been modified directly rather than by
using the ICAT API, or to backpopulate from the database after a breaking
change to the search engine. This call is asynchronous and simply places the
request in a set of entity types to be populated. By default it runs over all
relevant entities, or names can be provided as arguments. It also has the
options "min-id" to specify a non-inclusive lower limit, and "max-id" for an
inclusive upper limit on the operation. If documents are found in this
range, then the operation will not proceed unless "delete" is also
specified, in which case all existing documents are cleared first.
To find what it is doing please use the "populating" operation described
below. The new lucene index will not be seen until it is completely rebuilt.
While the index is being rebuilt ICAT can be used as normal as any lucene
updates are stored to be applied later.
- populating
- returns a list of entity types to be processed for populating
lucene. Normally the first item returned is the one currently being
processed.
- commit
- instructs lucene to update indices. Normally this is not
needed as it will be done periodically according to the value of
lucene.commitSeconds.
- clear
- stops any population and clears all the lucene indices.
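The "min-id" and "max-id" options of the populate command use different conventions at each end of the range, which is easy to misread. The Python sketch below expresses the rule as stated above (lower limit excluded, upper limit included); the function name is hypothetical and the code is only a reading aid, not how icatadmin is implemented.

```python
def in_populate_range(entity_id, min_id=None, max_id=None):
    """Return True if entity_id lies in the half-open range (min_id, max_id]."""
    if min_id is not None and entity_id <= min_id:
        return False  # min-id itself is excluded
    if max_id is not None and entity_id > max_id:
        return False  # max-id itself is included
    return True


ids = [0, 1, 2, 3, 4, 5]
# With --min-id 1 --max-id 4 the ids 2, 3 and 4 are selected.
print([i for i in ids if in_populate_range(i, min_id=1, max_id=4)])
```

A consequence of this convention is that consecutive batches can reuse the previous "max-id" as the next "min-id" without processing any id twice.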