Using Apache Solr for Content Search

Concept

iKnowBase comes with ready-to-use components for integration with Apache Solr, the open source enterprise search platform (http://lucene.apache.org/solr/).

When using iKnowBase together with Apache Solr, the following components will be in use:

The Apache Solr server is installed on one or more computers. Inside Apache Solr, an iKnowBase search component is installed to handle security.
The database repository and the ikbBatch application cooperate to send index updates from iKnowBase to Apache Solr.
The ikbViewer application has user interface components for generating search requests and navigating the search result.

The process for indexing works somewhat like this:

The content is inserted, updated or deleted in iKnowBase (1). This can be done through Forms, the Service API or any other user interface made to update content in iKnowBase.
A Solr event may trigger on the document if the event condition matches the document (2).
If triggered, an “index request” is queued in the Oracle Database Advanced Queuing (AQ) system (3). The operation is either an update or a delete.
The ikbBatch application listens to index requests (4).
If the document is new or updated, ikbBatch will retrieve the document (5), as described in the Solr Configuration. The Solr Configuration defines how the document should be represented in Apache Solr and maps the document/attribute values to Solr fields.
Finally, ikbBatch creates and sends “update” or “delete” messages to Apache Solr for updating the Solr index (6).
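
The two message types in step 6 can be illustrated with Solr's JSON update format. This is a minimal sketch, not the ikbBatch implementation: the class IndexMessageSketch, its methods and the sample values are hypothetical, and it assumes that the Solr unique key is addressed through "id" in the delete command. Only the "add" and "delete" command shapes come from Solr itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of Solr "update" and "delete" messages in JSON form.
public class IndexMessageSketch {

    /** Quotes a string for JSON, escaping quotes, backslashes and control characters. */
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds an "add" command; adding a document with an existing unique key replaces it. */
    public static String updateMessage(Map<String, String> fields) {
        StringBuilder doc = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (doc.length() > 0) {
                doc.append(',');
            }
            doc.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return "{\"add\":{\"doc\":{" + doc + "}}}";
    }

    /** Builds a "delete" command that removes one document by its unique key. */
    public static String deleteMessage(String id) {
        return "{\"delete\":{\"id\":" + quote(id) + "}}";
    }

    public static void main(String[] args) {
        // Field names and values are sample data, not a real Solr configuration.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("document_id", "4711");
        fields.put("title", "Annual report");
        System.out.println(updateMessage(fields));
        System.out.println(deleteMessage("4711"));
    }
}
```

In a real installation, the Solr Configuration decides which attributes end up as fields in the "add" document.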

The process for search works somewhat like this:

Whenever a user initiates a search operation (7), the user interface component creates a SolrQuery and submits it to an iKnowBase SolrSearchClient component for execution.
The search client adds security-related information to the request, signs the request using a secure Message Authentication Code, and sends the request to Apache Solr.
Inside Apache Solr, the iKnowBase search component verifies the security information, and adds the required search conditions (8).
Apache Solr returns a normal search response which is rendered by the user interface components in iKnowBase.
A result set from Solr can be merged with an iKnowBase viewer. The viewer supplies additional data for each document in the result, so you don’t have to store everything in Solr.
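
The signing and verification steps can be sketched with the standard Java Mac class. This is an illustration of the general technique only: the source does not state which algorithm, key handling or request layout iKnowBase uses, so HMAC-SHA256, the shared secret and the payload string below are all assumptions, and the class SignedRequestSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of signing a request with a Message Authentication Code.
public class SignedRequestSketch {

    /** Computes a hex-encoded HMAC-SHA256 over the payload using the shared secret. */
    public static String sign(String payload, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 is not available", e);
        }
    }

    /** Recomputes the MAC on the receiving side and compares it in constant time. */
    public static boolean verify(String payload, String signature, String secret) {
        byte[] expected = sign(payload, secret).getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, signature.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String secret = "shared-secret";             // assumed to be known to both sides
        String payload = "user=jdoe&q=title:report"; // assumed layout of the security information
        String mac = sign(payload, secret);
        System.out.println(verify(payload, mac, secret));                     // unchanged request
        System.out.println(verify("user=admin&q=title:report", mac, secret)); // tampered request
    }
}
```

Because the receiver recomputes the code from the request and the shared secret, a request whose security information was altered in transit no longer verifies.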

Installation and setup

See the Installation Guide for installation guidelines.
After you’re done with the installation you are ready to index your content.

iKnowBase security

The Apache Solr Search Server, which is distributed together with iKnowBase, includes iKnowBase Solr components that handle security in terms of authorized access to documents. The iKnowBase security search component is defined in the configuration file “solrconfig.xml” and is set up for use by the search handler “/select”. Search handlers that use this component load security information from iKnowBase and filter the result set by iKnowBase access control lists.

Note: If you configure new search handlers, include the iKnowBase security search component to ensure authorized access to iKnowBase data. For information regarding security in autocomplete operations, see Configuring search suggestions below.

Configuring the indexing process

Before you start indexing your content you need to decide the following:

What kind of documents/information should be indexed? You need to investigate your content and define what to index. When you have defined all indexable content, you need to represent the set in one or more Indexing events in Development Studio (available under the Advanced tab); see the Development Reference. An Indexing event triggers on all documents that match the given condition and issues an update statement to the Content Indexer.
What should be indexed within each document? When you have decided which documents to index, you must create a Solr configuration in Development Studio (available under the Advanced tab), where you define all the attributes to be indexed and how they should be represented in Apache Solr; see the Development Reference. Should the attribute itself be indexed? Should a freetext search match the attribute's value text? Should it be possible to display the value in the result set? The Solr configuration screen helps you by presenting the most common attributes in your database, which are normally good candidates for indexing.

Building a search page

To build a search page, you will use a Groovy-based template (HtmlViewer, ScriptViewer or ScriptAction), where the iKnowBase SolrSearchClient component provides access to the Solrj library.

The basic flow for a search page is as follows (examples are given below):

Acquire a search client from the available beans
Acquire a SolrQuery using the search client. The search client can either provide a new SolrQuery every time, or provide a SolrQuery which is stored in the user session and can be reused. Using a session-based SolrQuery avoids having to send all query configuration in the URL, as the current state is already present.
Use the search client to apply URL-parameters to the SolrQuery, or set up the required parameters manually

Create unique package and class names in every template to avoid conflicts with other Solr templates. Use a descriptive name and avoid using com.iknowbase.

The package declaration defines where the objects declared in the file are stored, e.g. “facets”, “sortFields” and “SolrjSearchClass”.
The “searchClients.default” reference selects the Solr search instance to be used (defined in the Solr configuration).

package no.customer.intranet;
import org.apache.solr.client.solrj.SolrQuery;
// Facet and sort fields; language-specific field names are resolved from the current context
def facets = [ "type_${context.language}", "status_${context.language}", "title" ];
def sortFields = [ "document_id", "type_${context.language}", "updated_index_store_date" ];
// SolrjSearchClass is the search class declared in this template
new SolrjSearchClass(searchClients.default, html, context, facets, sortFields).run();

To merge a result set with an iKnowBase SOLR Viewer, you first need to define a SOLR viewer with a SOLR presentation style. Then, in the Groovy-based template, you combine them like this:

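// Fetch the viewer rows that correspond to the Solr response
def rowset = this.searchClient.getRowSet(response, "<External key to the SOLR Viewer>");
// Render each Solr document together with the viewer row at the same position
for (int i = 0; i < response.results.size(); i++) {
    renderDocument(response.results[i], rowset.rows[i]);
}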

In renderDocument, you have access to data from both Solr and the viewer, for example:

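def renderDocument(document, ikbRow) {
    ul {
        // Value from the Solr document
        li("description=${document.description}");
        // Value from the iKnowBase viewer row
        li("ikb.title=${ikbRow.document.title}");
        // All items delivered by the viewer for this row
        ikbRow.items.each { key, item ->
            li {
                mkp.yield(key + ": ");
                mkp.yieldUnescaped(item.getAsString() ?: "")
            }
        }
    }
}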

The following documentation is relevant:

The Solrj javadoc
The iKnowBase PresentationServices API, in particular the javadoc for SolrSearchClient

Sample search page structure

A transport set is provided as an example. It includes a basic search page with faceting, ordering and autocomplete based on a facet search. Import the file etc/EXP-IKB_MASTER_67-F250F353F65159F3E040000A18007A88-iKnowBase-Demo-SolrExample.dmp from /ikbStudio/advanced/importjobs. It contains the “essentials” for using a search client in the page /demo/solr.

Configuring search suggestions

Search suggestions (autocomplete) can be implemented in several ways.

On the provided sample page (see Sample search page structure), autocomplete is implemented using a faceted search. The search page has an Ext JS combobox which loads data through an iKnowBase script action. The script action forwards the autocomplete request, which is actually a faceted search, to Solr. Note: The property facetMinCount is set to 1 to prevent unauthorized access to data, i.e. a facet value is only returned if it is in use in at least one of the documents available to the user.
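
The request parameters behind such a facet-based suggestion can be sketched as follows. This is not the code from the sample page: the class FacetSuggestSketch, the choice of the title field and the use of facet.prefix to match the typed text are assumptions. The parameter names themselves are standard Solr request parameters, and facet.mincount is the parameter that the facetMinCount property corresponds to.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the Solr parameters for a facet-based autocomplete request.
public class FacetSuggestSketch {

    /** Builds the query string for suggestions from one facet field. */
    public static String suggestParams(String facetField, String typedPrefix) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");                  // match all documents; security filtering is left to the search handler
        p.put("rows", "0");                 // only facet counts are needed, no documents
        p.put("facet", "true");
        p.put("facet.field", facetField);
        p.put("facet.prefix", typedPrefix); // keep only facet values that start with the typed text
        p.put("facet.mincount", "1");       // drop values that occur in none of the matching documents

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : p.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(suggestParams("title", "ann"));
    }
}
```

The security benefit depends on the request going through a search handler that includes the iKnowBase security search component, so that facet counts are computed only over documents the user may see.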

The Solr Suggester component provides an alternative way to implement autocomplete. Note: The Solr Suggester provides no mechanism for adding a security filter, so the user may get access to unauthorized data. For example, if the completion comes from the title index, the user may see a document title even though they are not authorized to view the document.

Monitoring the Solr solution

A few key items to check:

On the ikbBatch page, check that the content indexing queue is emptied.
Monitor the Solr instance using the console at http://<hostname>:<solr-port>/solr/#/
Optimize the Solr index using the console at http://<hostname>:<solr-port>/solr/#/~cores/<corename>
