Unified Index Comparison
The Search and List from Unified Index feature supports multiple engines. All of them offer the same core functionality and plug into features such as content search, PluginList, and PluginCustomSearch, but they have different performance characteristics, and some offer additional features.
As a general rule, the engine can simply be switched and the index rebuilt without any additional change to the configuration.
Overview
The unified index engines are:
- MariaDB/MySQL Full Text Search (the default from 12.x onwards)
  - Introduced in Tiki12
  - Runs within MariaDB/MySQL; additional memory required
  - Fast indexing (can be 10 times faster than Zend_Search_Lucene), slower/unstable query speed
  - No configuration required
  - Not customizable
- Elasticsearch
  - Introduced in Tiki12
  - Independent Java server(s), horizontally scalable
  - Feature-rich
  - Fast indexing, fast/stable query speed, decent/good results
  - Typically, Elasticsearch is set up as a cluster on different servers than Tiki (or using a third-party service), but it is also possible to install it on the same server.
  - Customizable
- Manticore Search (new in Tiki25)
  - Feature-rich
  - Very fast
  - Written in C++
  - Customizable
  - Can be set up as a cluster
  - Requires small amounts of RAM (compared to Elasticsearch)
  - This is the default setup for WikiSuite once Tiki25 is released, and it is an option of the installer.
- Zend_Search_Lucene (PHP implementation; removed, last available in Tiki21)
  - Introduced in Tiki7
  - "This component is no longer maintained; the last PHP version it was tested against is 5.3."
  - Complete PHP implementation
  - CPU and disk I/O bound
  - Slowest indexing, stable query speed, decent results
  - Customizable
  - Requires file permission configuration (Apache needs to be able to write to the index files)
The system is designed to remain independent of any particular engine, so more engines can be added later. No long-term data is stored in the indexes, and it is fairly easy to switch from one engine to another. The next logical addition is OpenSearch. Please contact Marc Laporte if you have specific needs.
Limitations
The Zend_Search_Lucene implementation's primary limitation is indexing performance.
- It is heavily bound by disk I/O, and indexing consumes significant amounts of PHP memory.
- Query caching may cause large amounts of disk space to be used.
- When the search query is too broad and matches more than 1000 documents (configurable), the engine does not fully explore all possibilities, in order to keep ranking time reasonable.
The MySQL implementation has several limitations:
- Words with 3 or fewer characters will not be indexed unless the MySQL server configuration is modified.
- MySQL comes with an extensive list of English stop words, preventing many queries from working.
- MySQL can use a single index at a time. Depending on the query, performance can vary significantly.
- MySQL limits the number of columns and indexes a table can contain. Complex sites with many different query patterns may hit those limits.
- No support for field boosting, such as providing more relevance for hits on the title.
- There is a limitation on the number of tracker fields. The limitation is quite high (2000+), but when you hit it, you need to move to Zend_Search_Lucene because MySQL/MariaDB has a hard limit. It is not possible to know in advance the precise number of maximum fields because some tracker field types require more than one MySQL column.
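The first two limitations above are easier to see with a small example. The following is only a sketch of the idea, not how MySQL tokenizes text: the function name `dropped_terms`, the default threshold of 4 (taken from the "3 or fewer characters" statement above) and the sample stop words are assumptions for illustration. The real minimum length and stop-word list depend on the server configuration.

```python
def dropped_terms(query, min_word_len=4, stop_words=frozenset()):
    """Return the query terms that a full-text index would likely ignore.

    A term is reported when it is shorter than min_word_len or when it
    appears in stop_words. Tokenization is a plain whitespace split,
    which is simpler than what a real database engine does.
    """
    dropped = []
    for term in query.lower().split():
        if len(term) < min_word_len or term in stop_words:
            dropped.append(term)
    return dropped


# Illustrative sample only; the actual list ships with the MySQL server.
SAMPLE_STOP_WORDS = frozenset({"about", "from", "what", "when", "where", "will", "with"})

print(dropped_terms("how to set up php with tiki", stop_words=SAMPLE_STOP_WORDS))
# prints ['how', 'to', 'set', 'up', 'php', 'with']
```

In this example only "tiki" would remain searchable, which shows why queries for short technical terms such as "php" can return nothing with the default server settings.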
Elasticsearch requires a dedicated environment and works better with multiple instances running. It has no known limitations.
Extra features
- Stored Search
  - Only supported by Elasticsearch and Manticore
- Faceted search (dynamic filters applicable on search results)
  - Only supported by Elasticsearch and Manticore
- Module More Like This
  - Only supported by Elasticsearch
- Federated Search
  - Only supported by Elasticsearch (Manticore on roadmap)
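The feature list above can be read as a support matrix. The sketch below encodes it as a lookup table so that a set of required features can be checked against each engine. The engine and feature keys are hypothetical labels chosen for this example, not Tiki preference names, and Zend_Search_Lucene is left out because it has been removed.

```python
# Support matrix transcribed from the list above (keys are illustrative labels).
ENGINE_FEATURES = {
    "mysql": set(),
    "elasticsearch": {
        "stored_search",
        "faceted_search",
        "more_like_this",
        "federated_search",
    },
    "manticore": {"stored_search", "faceted_search"},
}


def engines_supporting(required):
    """Return the engines (sorted by name) that support every required feature."""
    required = set(required)
    return sorted(
        name for name, features in ENGINE_FEATURES.items() if required <= features
    )


print(engines_supporting(["faceted_search"]))
# prints ['elasticsearch', 'manticore']
print(engines_supporting(["stored_search", "more_like_this"]))
# prints ['elasticsearch']
```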
Selection guidelines
Tiki 24 and before
- Small sites, simple functionality: MySQL Full Text Search
- Medium or large sites, advanced functionality: Elasticsearch
Tiki 25 and up
- Small sites, simple functionality: MySQL Full Text Search
- If you are already using Elasticsearch and are happy with it: Elasticsearch
- Medium or large sites, advanced functionality: Manticore
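The guidelines above amount to a short decision procedure. The following sketch applies them in the order they are listed, first match wins. The function and parameter names are hypothetical, and a small site that needs advanced functionality is treated like a medium or large one, which is an assumption because the guidelines do not cover that case explicitly.

```python
def recommend_engine(tiki_version, site_size, advanced=False,
                     happy_with_elasticsearch=False):
    """Suggest a unified index engine following the guidelines above.

    tiki_version: major version number, for example 24 or 25.
    site_size: "small", "medium" or "large".
    advanced: True when advanced functionality is needed.
    happy_with_elasticsearch: True when Elasticsearch is already in use
        and working well (only considered for Tiki 25 and up).
    """
    if site_size not in ("small", "medium", "large"):
        raise ValueError("site_size must be 'small', 'medium' or 'large'")

    # Small sites with simple functionality: same advice for every version.
    if site_size == "small" and not advanced:
        return "MySQL Full Text Search"

    # Tiki 24 and before: Elasticsearch is the only advanced option.
    if tiki_version < 25:
        return "Elasticsearch"

    # Tiki 25 and up: keep a working Elasticsearch, otherwise use Manticore.
    if happy_with_elasticsearch:
        return "Elasticsearch"
    return "Manticore"


print(recommend_engine(24, "large", advanced=True))
# prints Elasticsearch
print(recommend_engine(25, "medium", advanced=True))
# prints Manticore
```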
Engine-specific notes
Zend_Search_Lucene (PHP Implementation)
Before MySQL Full Text Search became the default in 12.x, the default implementation was based on Zend_Search_Lucene, a PHP implementation of the Java Lucene index engine. The engine has no external dependencies and can run on all hosts. However, some configuration may be required to reach acceptable performance.
The time required to build a complete index varies with the content of the site. For reference:
- doc.tiki.org (this site), with over 1400 pages, reindexes in around 3 minutes
- themes.tiki.org, with some pages and several hundred forum posts, reindexes in around 15 seconds
- corporate intranets with several gigabytes of data in file galleries can take over an hour
Alexander Veremy, the author of the component, provided some insight on how to adjust the parameters.
MySQL Full Text Search
No notes.
Elasticsearch
No notes.
Differences of results between engines
A tool has been created to compare the results returned by different engines and check that they are consistent: https://gitlab.com/tikiwiki/tiki/-/merge_requests/940/
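To illustrate what such a comparison involves, here is a minimal sketch. It is not the tool from the merge request above, whose implementation is not described here. It assumes each engine returns a list of document identifiers for the same query; the function name `compare_results` and the identifiers in the example are made up.

```python
def compare_results(results_a, results_b):
    """Compare two result lists for the same query, ignoring ranking order.

    Returns the documents found by only one engine and the overlap ratio
    (shared documents divided by all distinct documents, 1.0 if both empty).
    """
    set_a, set_b = set(results_a), set(results_b)
    union = set_a | set_b
    return {
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
        "overlap": len(set_a & set_b) / len(union) if union else 1.0,
    }


# Hypothetical identifiers returned by two engines for the same query.
mysql_hits = ["wiki page:HomePage", "wiki page:Search", "forum post:12"]
manticore_hits = ["wiki page:Search", "wiki page:HomePage", "trackeritem:7"]

print(compare_results(mysql_hits, manticore_hits))
# prints {'only_in_a': ['forum post:12'], 'only_in_b': ['trackeritem:7'], 'overlap': 0.5}
```

An overlap below 1.0 is not necessarily an error, since engines differ in tokenization and stop-word handling, but the documents listed under `only_in_a` and `only_in_b` are a good starting point for investigation.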