eZ Platform Document Field Mappers

by Mario Blažek - September 28, 2020

eZ Publish/Platform and Solr search have had a long and established partnership since the early days of eZ. In the legacy days, we used the old and proven eZ Find extension. With the release of eZ Platform, we switched to the all-new Solr search engine. Back then, this new search engine provided only a minimal set of features, but new ones have been added regularly. One of the most exciting is document field mappers, which we are going to explore in more detail.

Demystifying the Content and Location search

Before going into more detail, we need to demystify the content and location search. One could say that content search looks for Content and location search looks for Locations. That is true, but it is not quite that simple.

There is a whole range of differences from the developer’s standpoint. First of all, to build the list of search criteria we use the Criterion value objects. While most of them can be used for either Content or Location search, some are Location specific. Criteria like Priority, IsMainLocation and Depth can only be used in the context of Location search. When you think about it, there is no point in searching for Content by main location, because there is only one Content, while it can have multiple Locations or none at all. The same goes for the Sort clauses: Id, Depth, IsMainLocation, Path, Priority and Visibility can only be used with Location search.

The next obvious difference is the usage of the Query value objects. The Criterion objects for Content search are attached to an instance of the eZ\Publish\API\Repository\Values\Content\Query value object, while for Location search we need to use eZ\Publish\API\Repository\Values\Content\LocationQuery.

The SearchService

The main service responsible for performing search operations is the SearchService. It exposes a very powerful search API, allowing full-text search and querying of the content Repository using the mentioned Criterion value objects and Sort clauses. It handles those objects across different search engine backends: Apache Solr, Elastic (available in the enterprise package), and the so-called Legacy search engine, which uses a database-powered search for basic needs.

The most important methods of the SearchService are findContent() and findLocations(). findContent() accepts a Query instance and performs the content search, while findLocations() accepts a LocationQuery instance and performs the so-called location search.
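
As a rough sketch of how the two query types differ in practice, the snippet below builds one Query and one LocationQuery and passes them to the matching SearchService methods. The searchArticles() helper, the depth limit and the result limit are assumptions made up for this illustration; ng_article is the article content type identifier used later in this post. It is meant to run inside an eZ Platform project where the SearchService can be injected.

```php
<?php

declare(strict_types=1);

use eZ\Publish\API\Repository\SearchService;
use eZ\Publish\API\Repository\Values\Content\LocationQuery;
use eZ\Publish\API\Repository\Values\Content\Query;
use eZ\Publish\API\Repository\Values\Content\Query\Criterion;
use eZ\Publish\API\Repository\Values\Content\Query\SortClause;

// Hypothetical helper; $searchService is the injected Repository SearchService.
function searchArticles(SearchService $searchService, string $text): array
{
    // Content search: only criteria and sort clauses that make sense for Content.
    $contentQuery = new Query();
    $contentQuery->query = new Criterion\FullText($text);
    $contentQuery->filter = new Criterion\ContentTypeIdentifier('ng_article');
    $contentQuery->sortClauses = [new SortClause\ContentName()];
    $contentQuery->limit = 10;

    // Location search: Location-specific criteria and sort clauses are allowed here.
    $locationQuery = new LocationQuery();
    $locationQuery->query = new Criterion\FullText($text);
    $locationQuery->filter = new Criterion\LogicalAnd([
        new Criterion\ContentTypeIdentifier('ng_article'),
        new Criterion\Location\Depth(Criterion\Operator::LTE, 4),
    ]);
    $locationQuery->sortClauses = [new SortClause\Location\Priority(Query::SORT_DESC)];
    $locationQuery->limit = 10;

    return [
        'content' => $searchService->findContent($contentQuery),
        'locations' => $searchService->findLocations($locationQuery),
    ];
}
```

Both calls return a SearchResult; the difference is that the hits of the first one wrap Content value objects, while the hits of the second one wrap Locations.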

Document field mappers

As mentioned above, the Solr search engine goes hand in hand with the eZ Platform and as a developer, you will often find the need to index some additional data in the Solr search engine.

To use the document mappers effectively, we first need to understand how data is indexed in Solr. Solr stores data as documents, which are the basic building blocks of the Solr index. Documents by themselves don’t have a defined hierarchical structure; that’s where blocks come in, allowing us to define one. Some search engines provide such a structure out of the box, but Solr is not one of them. In eZ Platform, documents are indexed per translation as Content blocks, and in Solr a block is a nested document structure. In a Content block, the parent document represents the Content, and Locations are indexed as child documents of the Content. To avoid duplication, full-text data is indexed on the Content document only.

With this in mind, the Solr search engine provides these extension points to index additional data on the following:
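
To make the block structure a bit more tangible, here is a rough sketch of how an article with two Locations and two translations could be laid out as nested documents. All keys (document_type, content_id, language_code, location_id, fulltext, children) and all IDs are illustrative placeholders, not the real field names from the eZ Platform Solr schema.

```php
<?php

declare(strict_types=1);

// Illustrative only: builds one block for a single Content translation.
// Every key below is a hypothetical stand-in for the real schema fields.
function buildContentBlock(int $contentId, string $languageCode, array $locationIds, string $fullText): array
{
    $locationDocuments = [];
    foreach ($locationIds as $locationId) {
        // Child documents describe Locations and carry no full-text data.
        $locationDocuments[] = [
            'document_type' => 'location',
            'location_id' => $locationId,
            'content_id' => $contentId,
            'language_code' => $languageCode,
        ];
    }

    // The parent document represents the Content and owns the full-text data.
    return [
        'document_type' => 'content',
        'content_id' => $contentId,
        'language_code' => $languageCode,
        'fulltext' => $fullText,
        'children' => $locationDocuments,
    ];
}

// One block per translation of the same Content.
$blocks = [
    buildContentBlock(52, 'eng-GB', [54, 71], 'Article title and body'),
    buildContentBlock(52, 'ger-DE', [54, 71], 'Artikeltitel und Text'),
];

echo count($blocks), ' blocks, ', count($blocks[0]['children']), ' Location documents each', PHP_EOL;
```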

all block documents - Content and nested Locations for all translations
all block documents for a specific translation
Content documents
Content documents for a specific translation
Location documents

To see in more detail which classes must be extended and which Symfony dependency injection tags must be used, please consult the official documentation.

A practical example - a child indexer

Let’s imagine that we have received a request from the client to index an article’s child content items of type image onto the article document itself. The idea is to improve search for article content items by indexing the images’ name and caption fields into the article documents in Solr.

On article view, we are using child images to build a slider. So it would be appropriate for child images to participate in enriching the article’s Solr documents. This example is based on the Netgen Media site installation, which is free and open-source.

In the previous chapter, we have described all the available extension points for document field mappers and now we need to decide which one is going to be implemented in our case. As we only have a single language site, we can skip the translation based mappers, so that's two out. Next, the plan is only to use the content search, which means that we are targeting only content documents, thus our mapper of choice is the ContentFieldMapper, tagged with ezpublish.search.solr.field_mapper.content tag.

Article children mapper

First, let’s create the ArticleChildrenMapper class that extends the ContentFieldMapper class from EzSystems\EzPlatformSolrSearchEngine\FieldMapper namespace. Our job is to implement two methods. The accept() method, which decides if our mapper is going to be used for a given Content, and mapFields() method, which is responsible for mapping our data into Solr fields.

If you look more closely, you will notice that those methods do not receive the Content value objects from the Repository API, as we are used to, but rather the ones from the SPI layer. Placing the document mappers in the SPI layer is a design choice made by the eZ Platform developers. From the standpoint of the Onion architecture, the SPI is a lower layer than the Repository API and should not be aware of the API’s existence. Also, the SPI does not handle permissions, so we don’t have to take care of them when implementing custom document mappers. Furthermore, the SPI layer is not meant to be used in userland, so working with it is not as straightforward as working with the Repository API.

Because we are in the SPI layer, we can’t just inject Repository services like ContentService and LocationService into our mapper and work with them; we need to use their counterparts from the SPI layer. In our mapper, we will work with Content, Locations, and ContentTypes, so in the Repository API we would use:

eZ\Publish\API\Repository\ContentService
eZ\Publish\API\Repository\LocationService
eZ\Publish\API\Repository\ContentTypeService

But in the SPI we should use the following:

eZ\Publish\SPI\Persistence\Content\Handler as ContentHandler
eZ\Publish\SPI\Persistence\Content\Location\Handler as LocationHandler
eZ\Publish\SPI\Persistence\Content\Type\Handler as ContentTypeHandler

If you compare those services, you will notice that their methods differ; they are not mapped 1:1.

Now, when we know all this, let’s work on our implementation. Obviously, the first step is to activate our mapper when the article is being processed. For that purpose, the accept() method is going to return true when the article's content type id is matched. For the sake of this example, I have put the article content type ID as a constant. A better approach would be to inject the ConfigResolver and fetch the parameter from there.

So far so good: this mapper is going to be used when an article is being processed. Now let’s jump to the more complicated stuff, mapping our data to Solr fields. In the mapFields() method we need to load all child images of the current article and process their name and caption fields. The loadSubtreeIds() method from LocationHandler will help us here. It returns the IDs of all Locations in that subtree, not the Location value objects. For these IDs, we need to load the Content and Location by using the load() methods of the respective handlers. Again, keep in mind that those value objects come from the SPI layer, not the Repository API. Now we come to the part where the SPI layer is a bit cumbersome: retrieving the field values. The fields on the SPI Content value object reference their field definitions by ID, not by identifier. That’s why we first need to load the respective ContentType, find the field definitions by their identifiers on the ContentType, and then finally retrieve the required fields from the Content. For this purpose, I have created the extractField() and getFieldDefinitionId() helper methods.
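
The two-step lookup is easier to follow on a tiny standalone example. The sketch below uses FieldDefinitionStub and FieldStub, which are hypothetical stand-ins reduced to the properties the lookup needs; they are not the real SPI classes, and the IDs and values are made up.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for an SPI field definition (not the real class).
final class FieldDefinitionStub
{
    public $id;
    public $identifier;

    public function __construct(int $id, string $identifier)
    {
        $this->id = $id;
        $this->identifier = $identifier;
    }
}

// Hypothetical stand-in for an SPI content field (not the real class).
final class FieldStub
{
    public $fieldDefinitionId;
    public $data;

    public function __construct(int $fieldDefinitionId, string $data)
    {
        $this->fieldDefinitionId = $fieldDefinitionId;
        $this->data = $data;
    }
}

function findFieldData(array $fieldDefinitions, array $fields, string $identifier): string
{
    // Step 1: resolve the identifier to a field definition ID via the content type.
    $definitionId = null;
    foreach ($fieldDefinitions as $fieldDefinition) {
        if ($fieldDefinition->identifier === $identifier) {
            $definitionId = $fieldDefinition->id;
            break;
        }
    }
    if ($definitionId === null) {
        throw new RuntimeException("Unknown field identifier '{$identifier}'");
    }

    // Step 2: find the content field that points at that definition ID.
    foreach ($fields as $field) {
        if ($field->fieldDefinitionId === $definitionId) {
            return $field->data;
        }
    }

    throw new RuntimeException("Content has no field '{$identifier}'");
}

$definitions = [new FieldDefinitionStub(185, 'name'), new FieldDefinitionStub(186, 'caption')];
$fields = [new FieldStub(186, 'Sunset over the bay'), new FieldStub(185, 'Sunset')];

echo findFieldData($definitions, $fields, 'caption'), PHP_EOL; // prints "Sunset over the bay"
```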

The name field is a standard text line, which does not require any additional processing, but the caption field is an XML Text field. It must be translated from the XML format into the standard text string. This is done with the help of the process() and extractText() helper methods.
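
To show what this conversion does to a concrete value, here is a minimal standalone sketch that only relies on PHP’s DOM extension. The function names xmlToPlainText() and collectText() and the sample markup are made up for this illustration; real XML Text values use a richer format.

```php
<?php

declare(strict_types=1);

// Converts an XML fragment into plain text, one space between text nodes.
function xmlToPlainText(string $xml): string
{
    $document = new DOMDocument();
    if ($xml === '' || !$document->loadXML($xml)) {
        return '';
    }

    $parts = [];
    collectText($document->documentElement, $parts);

    return implode(' ', $parts);
}

// Walks the tree depth-first and collects every non-empty text node.
function collectText(DOMNode $node, array &$parts): void
{
    if ($node instanceof DOMText) {
        $text = trim((string) $node->nodeValue);
        if ($text !== '') {
            $parts[] = $text;
        }

        return;
    }

    if (!$node->hasChildNodes()) {
        return;
    }

    foreach ($node->childNodes as $child) {
        collectText($child, $parts);
    }
}

// Simplified caption markup, invented for this example.
$caption = '<section><paragraph>Sunset over <strong>the bay</strong></paragraph>'
    . '<paragraph>Photo: N.N.</paragraph></section>';

echo xmlToPlainText($caption), PHP_EOL; // prints "Sunset over the bay Photo: N.N."
```

Joining the parts with a space keeps words from neighbouring elements apart, which matters for full-text indexing.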

In the end, our responsibility is to map this data into eZ\Publish\SPI\Search\Field value objects. The Field value object requires three things. The first is the name of the field in the Solr schema; in our case we map data directly to the full-text field meta_content__text. The second is the data we prepared to be indexed. The third is the type of the Solr field, which is identified by another value object, in our case eZ\Publish\SPI\Search\FieldType\TextField. You can check the full list of available Solr field types here.

The last step in our document mapper can be improved a bit. Right now we are directly targeting the full-text field in the Solr schema, which is fine for this case, but if the Solr schema is changed sometime in the future by the core developers of eZ Platform, it would break our mapper. For this reason, it is recommended to use eZ\Publish\SPI\Search\FieldType\FullTextField, which handles the name of the full-text field automatically. Just for reference, in our case it would look like this:

$fields[] = new Search\Field(
    'fulltext',
    $name->value->data,
    new Search\FieldType\FullTextField()
);

$fields[] = new Search\Field(
    'fulltext',
    $this->process($caption),
    new Search\FieldType\FullTextField()
);

The ‘fulltext’ identifier is irrelevant in this case, and basically, we can put whatever we want here as this value is not used. Here is the complete implementation of our field mapper:

<?php

declare(strict_types=1);

namespace AppBundle\Query\Index;

use eZ\Publish\SPI\Persistence\Content as SPIContent;
use EzSystems\EzPlatformSolrSearchEngine\FieldMapper\ContentFieldMapper;
use eZ\Publish\SPI\Persistence\Content\Handler as ContentHandler;
use eZ\Publish\SPI\Persistence\Content\Type\Handler as ContentTypeHandler;
use eZ\Publish\SPI\Persistence\Content\Location\Handler as LocationHandler;
use eZ\Publish\SPI\Persistence\Content;
use eZ\Publish\SPI\Persistence\Content\Type as ContentType;
use eZ\Publish\SPI\Persistence\Content\Field as ContentField;
use eZ\Publish\SPI\Search;
use RuntimeException;
use DOMDocument;
use DOMNode;

class ArticleChildrenMapper extends ContentFieldMapper
{
    private const NG_ARTICLE_CONTENT_TYPE_ID = 44;
    private const IMAGE_CONTENT_TYPE = 'image';

    /**
    * @var \eZ\Publish\SPI\Persistence\Content\Handler
    */
    protected $contentHandler;

    /**
    * @var \eZ\Publish\SPI\Persistence\Content\Location\Handler
    */
    protected $locationHandler;

    /**
    * @var \eZ\Publish\SPI\Persistence\Content\Type\Handler
    */
    protected $contentTypeHandler;

    public function __construct(
        ContentHandler $contentHandler,
        LocationHandler $locationHandler,
        ContentTypeHandler $contentTypeHandler
    )
    {
        $this->contentHandler = $contentHandler;
        $this->locationHandler = $locationHandler;
        $this->contentTypeHandler = $contentTypeHandler;
    }

    public function accept(SPIContent $content)
    {
        return $content->versionInfo->contentInfo->contentTypeId == self::NG_ARTICLE_CONTENT_TYPE_ID;
    }

    public function mapFields(SPIContent $content)
    {
        $mainLocationId = $content->versionInfo->contentInfo->mainLocationId;
        $mainContentId = $content->versionInfo->contentInfo->id;

        $result = $this->locationHandler->loadSubtreeIds($mainLocationId);

        $fields = [];
        foreach ($result as $locationId => $contentId) {

            if ((int)$contentId === (int)$mainContentId) {
                continue;
            }

            $location = $this->locationHandler->load($locationId);
            $content = $this->contentHandler->load($location->contentId);
            $contentType = $this->contentTypeHandler->load($content->versionInfo->contentInfo->contentTypeId);

            if ($contentType->identifier !== self::IMAGE_CONTENT_TYPE) {
                continue;
            }

            $name = $this->extractField($content, $contentType, 'name');
            $caption = $this->extractField($content, $contentType, 'caption');

            $fields[] = new Search\Field(
                'meta_content__text',
                $name->value->data,
                new Search\FieldType\TextField()
            );

            $fields[] = new Search\Field(
                'meta_content__text',
                $this->process($caption),
                new Search\FieldType\TextField()
            );
        }

        return $fields;
    }

    private function extractField(Content $content, ContentType $contentType, $identifier): ContentField
    {
        $fieldDefinitionId = $this->getFieldDefinitionId($contentType, $identifier);
        foreach ($content->fields as $field) {
            if ($field->fieldDefinitionId === $fieldDefinitionId) {
                return $field;
            }
        }
        throw new RuntimeException(
            "Could not extract field '{$identifier}'"
        );
    }

    private function getFieldDefinitionId(ContentType $contentType, $identifier): int
    {
        foreach ($contentType->fieldDefinitions as $fieldDefinition) {
            if ($fieldDefinition->identifier === $identifier) {
                return $fieldDefinition->id;
            }
        }
        throw new RuntimeException(
            "Could not extract field definition '{$identifier}'"
        );
    }

    private function process(ContentField $field): string
    {
        $document = new DOMDocument();
        $document->loadXML($field->value->data);

        return $this->extractText($document->documentElement);
    }

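    private function extractText(DOMNode $node): string
    {
        $text = '';

        if ($node->hasChildNodes()) {
            foreach ($node->childNodes as $child) {
                // Keep a space between sibling nodes so their words are not glued together
                $text .= $this->extractText($child) . ' ';
            }
        } else {
            // Leaf node (text or empty element)
            $text .= $node->nodeValue;
        }

        return trim($text);
    }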
}

And of course the service definition:

services:
    app.index.article.content_field:
        class: AppBundle\Query\Index\ArticleChildrenMapper
        arguments:
            - '@ezpublish.spi.persistence.content_handler'
            - '@ezpublish.spi.persistence.location_handler'
            - '@ezpublish.spi.persistence.content_type_handler'
        tags:
            - { name: ezpublish.search.solr.field_mapper.content }

Now, it is the proper time to reindex the content in Solr:

php bin/console ezplatform:reindex --siteaccess=ngadminui

If everything is running smoothly we can proceed to the implementation of the search controller.

TranslationFieldMapper

Handling translations in document mappers is out of the scope of this blog post, but I want to make a small digression. If we extend the ContentTranslationFieldMapper, for example, the only difference in the signatures is the additional language parameter in both the accept() and mapFields() methods. Another difference is that this translation-based mapper runs for each translation of the provided content. So if we have an article with English and German translations, our mapper runs twice, once for English and once for German, with the language parameter representing the translation currently being processed.

In this case, the base mapper is going to prepare the proper content translations based on the language parameter. For similar purposes in the context of the Repository API, we would use the TranslationHelper service, which takes into account the current siteaccess and its languages. The SPI layer is not aware of the concept of siteaccesses. Everything else is the same as in the case of our article children mapper.
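
Just to illustrate the extra parameter, a translation-based mapper could be sketched roughly like this. The ArticleTranslationMapper class and the indexed_language field name are hypothetical and exist only for this example; the service also needs the tag for translation-based content mappers, so check the official documentation for its exact name.

```php
<?php

declare(strict_types=1);

namespace AppBundle\Query\Index;

use eZ\Publish\SPI\Persistence\Content as SPIContent;
use eZ\Publish\SPI\Search;
use EzSystems\EzPlatformSolrSearchEngine\FieldMapper\ContentTranslationFieldMapper;

// Hypothetical mapper, shown only to illustrate the language parameter.
class ArticleTranslationMapper extends ContentTranslationFieldMapper
{
    private const NG_ARTICLE_CONTENT_TYPE_ID = 44;

    public function accept(SPIContent $content, $languageCode)
    {
        // Same check as in the article children mapper; the language is not needed here.
        return $content->versionInfo->contentInfo->contentTypeId == self::NG_ARTICLE_CONTENT_TYPE_ID;
    }

    public function mapFields(SPIContent $content, $languageCode)
    {
        // Called once per translation; $languageCode is for example 'eng-GB' or 'ger-DE'.
        return [
            new Search\Field(
                'indexed_language', // hypothetical field name
                $languageCode,
                new Search\FieldType\StringField()
            ),
        ];
    }
}
```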

Search page implementation

Now let’s create a simple search controller for the article search page. For the query type, we are going to reuse the existing one provided by the NetgenSiteBundle, the NetgenSite:Search query type. Results are going to be paginated via Pagerfanta, but that is all very familiar for eZ developers. So, here is the search controller implementation:

<?php

declare(strict_types=1);

namespace AppBundle\Controller;

use Netgen\Bundle\EzPlatformSiteApiBundle\Controller\Controller;
use Netgen\EzPlatformSiteApi\Core\Site\Pagination\Pagerfanta\FindAdapter;
use Pagerfanta\Pagerfanta;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

class SearchController extends Controller
{
    public function __invoke(Request $request)
    {
        $queryType = $this->getQueryTypeRegistry()->getQueryType('NetgenSite:Search');

        $searchText = trim($request->query->get('searchText', ''));

        if ($searchText === '') {
            return $this->render(
                "@ezdesign/search/article_search.html.twig",
                [
                    'search_text' => '',
                    'pager' => null,
                ]
            );
        }

        $query = $queryType->getQuery([
            'search_text' => $searchText,
            'content_types' => ['ng_article'],
        ]);

        $pager = new Pagerfanta(
            new FindAdapter(
                $query,
                $this->getSite()->getFindService()
            )
        );

        $pager->setNormalizeOutOfRangePages(true);
        $pager->setMaxPerPage(12);

        $currentPage = $request->query->getInt('page', 1);
        $pager->setCurrentPage($currentPage > 0 ? $currentPage : 1);

        return $this->render(
            "@ezdesign/search/article_search.html.twig",
            [
                'search_text' => $searchText,
                'pager' => $pager,
            ]
        );
    }
}

It is defined as a service, of course:

services:
    app.controller.search:
        class: AppBundle\Controller\SearchController
        parent: netgen.ezplatform_site.controller.base

Let’s map our controller to the /article/search route:

app.route.article.search:
    path: /article/search
    defaults:
        _controller: "app.controller.search"
    methods: [GET]

And finally, the search page template:

{% extends nglayouts.layoutTemplate %}

{% block pre_content %}
    <form action="{{ path('app.route.article.search') }}" method="get" class="form-search">
        <header class="full-page-header full-search-header">
            <div class="container">
                <div class="search-inputs">
                    <div class="input-group"> 
                        <input type="text" value="{{ search_text }}" name="searchText" id="Search" class="form-control" placeholder="{{ 'ngsite.search.placeholder'|trans }}" />
                        <button type="submit" class="btn btn-dark">{{ 'ngsite.search.button'|trans }}</button>
                    </div>

                    {% if search_text is not empty %}
                        {% if pager.nbResults == 0 %}
                            <div class="result-message result-message-error">
                                <h2>{{ 'ngsite.search.no_results'|trans({'%searchText%': search_text}) }}</h2>
                            </div>
                        {% else %}
                            <div class="result-message result-message-success">
                                <h2>{{ 'ngsite.search.results'|trans({'%searchText%': search_text, '%searchCount%': pager.nbResults}) }}</h2>
                            </div>
                        {% endif %}
                    {% endif %}
                </div>
            </div>
        </header>

        <div class="full-search-results">
            <div class="container">
                {% if search_text is not empty %}
                    <div class="row">
                        <div class="col-xs-12">
                            {% if pager.nbResults > 0 %}
                                {% if pager.haveToPaginate() %}
                                    {{ pagerfanta(pager, 'ngsite') }}
                                {% endif %}

                                <div id="search-result">
                                    {% for search_hit in pager.currentPageResults.searchHits %}
                                        {% set score = null %}

                                        {% if search_hit.score is not null %}
                                            {% set score = (search_hit.score * 100)|round %}
                                        {% endif %}

                                        {% include '@ezdesign/parts/ng_view_content.html.twig' with {
                                            content: search_hit.valueObject.content,
                                            location: search_hit.valueObject,
                                            view_type: 'search',
                                            params: { 'score_percent': score }
                                        } only %}
                                    {% endfor %}
                                </div>

                                {% if pager.haveToPaginate() %}
                                    {{ pagerfanta(pager, 'ngsite') }}
                                {% endif %}
                            {% else %}
                                <ul class="full-no-results-list">
                                    <li>{{ 'ngsite.search.no_results.check_spelling'|trans }}</li>
                                    <li>{{ 'ngsite.search.no_results.change_keywords'|trans }}</li>
                                    <li>{{ 'ngsite.search.no_results.less_specific_keywords'|trans }}</li>
                                    <li>{{ 'ngsite.search.no_results.reduce_keywords'|trans }}</li>
                                </ul>
                            {% endif %}
                        </div>
                    </div>
                {% endif %}
            </div>
        </div>
    </form>
{% endblock %}

You can find the complete code example in this repository.

Conclusion

The document field mappers are a very cool feature of the eZ Platform Solr search engine, allowing developers to customize and improve the search on eZ Platform based websites. Hopefully, the document mapper implemented as an example in this blog post managed to demystify the design choice of document mappers and everything related to them. It takes a bit of time to get used to the SPI layer, but we can master it with a bit of practice.
