Batch module

iKnowBase comes with a batch module for processing certain offline and near-line tasks, such as email processing and file format conversion. The batch server is implemented as a Java module in the iknowbase-webapp application.

Currently, the batch server handles these services: ContentIndexer, EmailReader, EmailSender, FileConverter, ImageEditor and PageEngine.

The batch server is enabled by default, but may be disabled as needed. In particular, a larger site with multiple servers might want to disable batch processing on the servers handling public traffic. Disabling the batch module will implicitly also disable all services described below.

The BatchServerConfiguration accepts these configuration properties:

Property name Description
com.iknowbase.batch.enabled Toggles whether the batch server modules are available.
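As an illustration of the multi-server scenario described above, a server handling public traffic could switch off the batch module with a single property. This is a minimal sketch that assumes the properties are supplied in Java properties syntax; where they are actually set depends on your installation.

```properties
# Public-facing server: disable the batch module and, implicitly, all of its services.
com.iknowbase.batch.enabled=false
```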

ContentIndexer

When using the Solr search engine for content search, the iKnowBase database repository will send indexing requests to the batch server, which will then submit content to the Solr server for actual indexing.

The ContentIndexerConfiguration accepts these configuration properties:

Property name Description
com.iknowbase.batch.contentIndexer.enabled Toggles whether to start the contentIndexer. The legal values are either “true” or “false”.
com.iknowbase.batch.contentIndexer.dequeTimeoutSeconds Number of seconds each dequeue() shall wait before recycling.
com.iknowbase.batch.contentIndexer.spawnPolicy Decides when the contentIndexer starts listening for a new message. Use “immediate” for parallel processing, or “delayed” for serial processing.

iKnowBase can submit content to multiple search engines for indexing. The administrative interface uses a logical “search engine name” when mapping content for searching; for each such name there is a corresponding SearchEngineConfiguration, which accepts these configuration properties:

Property name Description
com.iknowbase.searchEngine.<searchEngineName>.index.type Type of index server. Currently “SOLR” is the only supported value.
com.iknowbase.searchEngine.<searchEngineName>.index.URL URL to index server, e.g. http://solr.example.com/solr/CoreName.
com.iknowbase.searchEngine.<searchEngineName>.index.connectionTimeoutMillis Max number of milliseconds to wait for the connection.
com.iknowbase.searchEngine.<searchEngineName>.index.operationTimeoutMillis Max number of milliseconds to wait for the operation.
com.iknowbase.searchEngine.<searchEngineName>.index.commitWithinSeconds Max number of seconds before the index server commits an update to the search index.
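To show how the two groups of properties fit together, here is a sketch of a configuration for a single Solr index. The search engine name “mainSearch” is hypothetical, and the timeout and commit values are illustrative choices rather than documented defaults; the fragment also assumes Java properties syntax.

```properties
# Let this batch server process indexing requests, one message at a time.
com.iknowbase.batch.contentIndexer.enabled=true
com.iknowbase.batch.contentIndexer.dequeTimeoutSeconds=60
com.iknowbase.batch.contentIndexer.spawnPolicy=delayed

# "mainSearch" is a hypothetical search engine name; use the logical name
# defined in the administrative interface.
com.iknowbase.searchEngine.mainSearch.index.type=SOLR
com.iknowbase.searchEngine.mainSearch.index.URL=http://solr.example.com/solr/CoreName
com.iknowbase.searchEngine.mainSearch.index.connectionTimeoutMillis=5000
com.iknowbase.searchEngine.mainSearch.index.operationTimeoutMillis=30000
com.iknowbase.searchEngine.mainSearch.index.commitWithinSeconds=10
```

A second search engine would repeat the last five properties with a different name in place of “mainSearch”.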

EmailReader

Most of the EmailReader configuration is performed in the Development Studio, where you define the various email accounts that you want to process, along with the PL/SQL packages you want to use for processing the actual messages.

By default, any running iKnowBase batch server will process email messages. This works well unless you have multiple batch servers installed and running in parallel, in which case several of them might access the same email account at the same time. This is a potential source of trouble, so we recommend that you configure the EmailReader so that only one instance is active at any time.

The EmailReaderConfiguration accepts these configuration properties:

Property name Description
com.iknowbase.batch.emailReader.enabled Toggles whether the emailReader is enabled or not.
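The recommendation above could be implemented as follows in a two-server installation. This is a sketch assuming each server has its own configuration in Java properties syntax; the server roles are hypothetical.

On the one server designated to read email:

```properties
# This instance is the only one that processes email accounts.
com.iknowbase.batch.emailReader.enabled=true
```

On every other batch server:

```properties
# Leave the other batch services running, but never read email here.
com.iknowbase.batch.emailReader.enabled=false
```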

EmailSender

EmailSender is the preferred method for sending emails and is set as default for new installations. An alternative method is available through the iKnowBase repository, see IKB_GLOBAL_PREFS.

The EmailSender configuration is performed using configuration properties, where you define profiles and settings used for sending emails. Whether emails are sent using this service or sent using iKnowBase Repository is controlled by iKnowBase Global Preferences in the iKnowBase Repository.

The EmailSenderConfiguration accepts these configuration properties:

Property name Description
com.iknowbase.batch.emailSender.enabled Toggles whether to start the emailSender. The legal values are either “true” or “false”.
com.iknowbase.batch.emailSender.dequeTimeoutSeconds Number of seconds each dequeue() shall wait before recycling.
com.iknowbase.batch.emailSender.spawnPolicy Decides when the emailSender starts listening for a new message. Use “immediate” for parallel processing, or “delayed” for serial processing.

FileConverter

The FileConverter is a service that converts documents from a number of file formats to PDF, HTML or a number of image formats.

Note that the FileConverter service is licensed separately from the core iKnowBase product.

Understanding the FileConverter

The FileConverter installs as a service

Usage of the FileConverter works like this:

The process above implies that for the FileConverter to work, you also need to install a separate Outside In program on the server.

Installing Outside In technology

The Outside In programs are delivered separately from iKnowBase, in a zip-file that will typically be named something like fileConverter-linux-x86-64-outsidein-835.zip. Install this file using the following steps:

$ cd /opt/iknowbase
$ unzip fileConverter-linux-x86-64-outsidein-835.zip

Configuration properties

After installing the Outside In technology, you must configure the FileConverter. The FileConverterConfiguration accepts these configuration properties:

Property name Description
com.iknowbase.batch.fileConverter.enabled Toggles whether to start the fileConverter. The legal values are either “true” or “false”.
com.iknowbase.batch.fileConverter.dequeTimeoutSeconds Number of seconds each dequeue() shall wait before recycling.
com.iknowbase.batch.fileConverter.spawnPolicy Decides when the fileConverter starts listening for a new message. Use “immediate” for parallel processing, or “delayed” for serial processing.
com.iknowbase.batch.fileConverter.outsideInDirectory Location of the Outside In installation. The FileConverter is disabled when this is not set.
com.iknowbase.batch.fileConverter.replyMessageExpirationSeconds Number of seconds each reply message shall be valid, before expiring.
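A complete FileConverter configuration might look like the sketch below. The directory is an assumption based on unpacking the zip-file under /opt/iknowbase; use whatever directory the archive actually creates on your server. The numeric values are illustrative, not documented defaults, and the fragment assumes Java properties syntax.

```properties
com.iknowbase.batch.fileConverter.enabled=true
com.iknowbase.batch.fileConverter.dequeTimeoutSeconds=60
# "immediate" lets several conversions run in parallel.
com.iknowbase.batch.fileConverter.spawnPolicy=immediate
# Assumed path; without this property the FileConverter stays disabled.
com.iknowbase.batch.fileConverter.outsideInDirectory=/opt/iknowbase/fileConverter
com.iknowbase.batch.fileConverter.replyMessageExpirationSeconds=300
```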

Testing and troubleshooting

Running tests

The first step is to verify that the conversion program itself runs. Go to the installation directory, and verify that you can run document conversion from the command line:

$ ./exsimple Test.docx Test.pdf pdf.cfg
EX_CALLBACK_ID_PAGECOUNT: The File had 5 pages.
Export successful: 1 output file(s) created.

The second step is to run a “local” conversion from the web application. Using a browser, open the “/ikbBatch” application. In the tab named “fileconverter”, you will find a number of links for test conversions. They will convert from a Microsoft Word document and a Microsoft PowerPoint presentation to a number of export formats. Clicking on these will run the server-side conversion, and return the converted document. Using the tests named “Test.docx (local)” and “Test.pptx (local)” will run the test locally, without any database involvement.

The third step is to run a “queue based” conversion. The procedure is the same as above. Using the tests named “Test.docx (queue)” and “Test.pptx (queue)” will send the document through the database for conversion, the same way as most production usage will work.

Missing libraries

A common problem is for conversion to image formats to fail under Linux, due to missing libraries:

$ ./exsimple Test.docx Test.pdf pdf.cfg
./exsimple: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory
./exsimple: error while loading shared libraries: libXm.so.3: cannot open shared object file: No such file or directory

Search for the missing file using the “locate” command, for example “locate libXm.so.3”. If the file is missing, or only available as a stub, the proper library must be installed.

Missing fonts

Another common problem is missing fonts:

[root@ip-10-53-107-93 fileConverter]# ./exsimple Test.docx Test.pdf pdf.cfg
EX_CALLBACK_ID_PAGECOUNT: The File had 1 page.
EXRunExport() failed: The font directory does not contain any font files or the directory is invalid (0x0B02)

This can often be fixed by installing the liberation fonts:

$ yum install liberation-fonts-common liberation-mono-fonts liberation-sans-fonts liberation-serif-fonts libreoffice-opensymbol-fonts

ImageEditor

The ImageEditor service performs image operations such as resize, rotate, flip and crop. It is the preferred method for image operations and is set as default for new installations. An alternative method is available through the iKnowBase repository, see IKB_GLOBAL_PREFS.

The ImageEditorConfiguration accepts these configuration properties:

Property name Description
com.iknowbase.batch.pageEngine.enabled Toggles whether to start the imageEditor. The legal values are either “true” or “false”.
com.iknowbase.batch.pageEngine.dequeTimeoutSeconds Number of seconds each dequeue() shall wait before recycling.
com.iknowbase.batch.pageEngine.spawnPolicy Decides when the imageEditor starts listening for a new message. Use “immediate” for parallel processing, or “delayed” for serial processing.
com.iknowbase.batch.pageEngine.replyMessageExpirationSeconds Number of seconds each reply message shall be valid, before expiring.

Note that most image editing features are currently managed from the database, which can also use Oracle ORDSYS for image manipulation. Use the Database Global Preferences to enable the use of the batch ImageEditor for image editing.

PageEngine

The ikbBatch module contains a page engine server, which can be configured to listen for page rendering requests. This is used by, for example, the newsletter module, which asks the batch module to render a page to be used as the content.

The BatchPageEngineConfiguration accepts these configuration properties:

Property name Description
com.iknowbase.batch.pageEngine.enabled Toggles whether to start the pageEngine. The legal values are either “true” or “false”.
com.iknowbase.batch.pageEngine.dequeTimeoutSeconds Number of seconds each dequeue() shall wait before recycling.
com.iknowbase.batch.pageEngine.spawnPolicy Decides when the pageEngine starts listening for a new message. Use “immediate” for parallel processing, or “delayed” for serial processing.
com.iknowbase.batch.pageEngine.replyMessageExpirationSeconds Number of seconds each reply message shall be valid, before expiring.
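As a sketch of how the spawn policy might be used, the fragment below enables the page engine and renders one page at a time, which could suit a server where rendering should not compete with other batch work. The numeric values are illustrative, not documented defaults, and the fragment assumes Java properties syntax.

```properties
com.iknowbase.batch.pageEngine.enabled=true
com.iknowbase.batch.pageEngine.dequeTimeoutSeconds=60
# "delayed" processes rendering requests serially.
com.iknowbase.batch.pageEngine.spawnPolicy=delayed
com.iknowbase.batch.pageEngine.replyMessageExpirationSeconds=300
```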