Loading...
 
Skip to main content

History: Search and List from Unified Index

Source of version: 118

Copy to clipboard
            ! Unified Index
This applies to the ((search)) and capabilities in [https://www.tiki.org|Tiki] to filter (search filters, permissions, etc) content. It is important if not critical to keep a [#Rebuild_search_index|fresh Unified-Index] as Tiki rely on the freshness of the Unified-Index to display and filter content.

{maketoc}

Tiki can support multiple search engines internally. Each of those will have different capabilities and limitations. The default engine should provide capabilities good enough for small and medium sites. Larger sites may need additional infrastructure to get the most performance. Please see: ((Unified Index Comparison))

!! Fields

Below is a matrix between the fields and the object types.

Legend:
^X - Available
/ - Static value
? - Depends on the data^

{FANCYTABLE(head=" Field | Type  | Tokenized* | Sortable | wiki page | forum post | blog post | article | file | trackeritem | sheet | comment | user | Available in Tiki version" sortable="type:reset" sortList="[0,y],[1,0]" tsortcolumns="type:text;group:letter|type:word;group:word;0sortList" tsfilteroptions="type:reset" tspaginate="max:75")}
__object_type__			| Generic | | X|X|X|X|X|X|X|X|X|X|7
__object_id__			| Generic | | X|X|X|X|X|X|X|X|X|X|7
__title__			| Generic | X|X|X|X|X|X|X|X|X|?|X|7
__title_initial__		| Generic | | |X|X|X|X|X|X|?|?| |?
__title_firstword__		| Generic | | |X|X|X|X|X|X|?|?| |?
__language__			| Generic | | X|X|/|/|X|/|?|/|/| |7
__creation_date__		| Generic | | X|X|X|X|X|X|X| |X|X|7/15
__modification_date__	| Generic | | X|X|X|X|X|X|X|X|X| |7
__contributors__		| Generic | |  | X|X|X|X|X|X|X|X| |7
__description__			| Generic |X| | X| |  | X|X| | X| | |7
__contents__			| Generic |X| | X|X|X|X|X|X|X|X|X|7
__wiki_content__		| Specific | |  | X| |  |  |  |  |  |  |X|7
__wiki_uptodateness__	| Specific | | X|X| |  |  |  |  |  |  | |7
__wiki_approval_state__	| Specific | |  |X| |  |  |  |  |  |  | |11
__post_content__		| Specific | |  |  | X| |  |  |  |  | | |7
__post_snippet__		| Specific | |  |  | X| |  |  |  |  | | |14
__parent_thread_id__%%%''(not to be confused with parent_object_id)''| Specific | |  |  | X| |  |  |  |  |  | |8
__root_thread_id__		| Specific | |  |  | X| |  |  |  |  | | |14
__parent_contributors__	| Specific | |  |  | X| |  |  |  |  | | |14
__blog_id__				| Specific | | X| |  | X| |  |  |  |  | |7
__blog_excerpt__		| Specific | |  |  |  | X| |  |  |  |  | |7
__blog_content__		| Specific | |  |  |  | X| |  |  |  |  | |7
__topic_id__			| Specific | | X| |  |  | X| |  |  |  | |7
__article_content__		| Specific | |  |  |  |  | X| |  |  |  | |7
__article_topline__		| Specific | |  |  |  |  | X| |  |  |  | |7
__article_subtitle__	| Specific | |  |  |  |  | X| |  |  |  | |7
__article_author__		| Specific | |  |  |  |  | X| |  |  |  | |9
__article_type__		| Specific | |  |  |  |  | X| |  |  |  | |9
__article_heading__%%%''available as description''| Specific | |  |  |  |  | X| |  |  |  | |9
__published__			| Specific | |  |  |  |  | X| |  |  |  | |13
__sitetitle__			| Specific | |  |  |  |  | X| |  |  |  | |13
__siteurl__				| Specific | |  |  |  |  | X| |  |  |  | |13
__gallery_id__			| Specific | | X| |  |  |  | X| |  |  | |7
__filename_id__			| Specific | | X| |  |  |  | X| |  |  | |7
__filetype__			| Specific |X|X| |  |  |  | X| |  |  | |7
__filesize__			| Specific |X|X| |  |  |  | X| |  |  | |15
__file_comment__		| Specific | |  |  |  |  |  | X| |  |  | |7
__file_content__		| Specific | |  |  |  |  |  | X| |  |  | |7
__tracker_id__			| Specific | | X| |  |  |  |  | X| |  | |7
__tracker_status__		| Specific | | X| |  |  |  |  | X| |  | |7
__tracker_field_''PERMNAME/ID''__%%%''(see below for more details)''| Specific |X|?| |  |  |  |  | X| |  | |7
__sheet_content__		| Specific | |  |  |  |  |  |  |  | X| | |7
__comment_content__		| Specific |X| |  |  |  |  |  |  |  | X| |7
__user_country__		| Specific |X|X|  |  |  |  |  |  |  |  |X|10
__groups__				| Specific |X|X|  |  |  |  |  |  |  |  |X|?
__hits__				| Specific | | |  |X| |  |  |  |  |  | |15
__lastpost_title__		| Specific | | |  |X| |  |  |  |  |  | |15
__lastpost_modification_date__| Specific | | |  |X| |  |  |  |  |  | |15
__lastpost_contributors__| Specific | | |  |X| |  |  |  |  |  | |15
__lastpost_post_content__| Specific | | |  |X| |  |  |  |  |  | |15
__lastpost_post_snippet__| Specific | | |  |X| |  |  |  |  |  | |15
__lastpost_hits__		| Specific | | |  |X| |  |  |  |  |  | |15
__lastpost_thread_id__	| Specific | | |  |X| |  |  |  |  |  | |15
__view_permission__		| Internal | |  | X| |  | X| |  | X| | |7
__parent_object_type__	| Internal | | X| | X|X|X|X|X| | X| |7
__parent_object_id__	| Internal | | X| | X|X|X|X|X| | X| |7
__parent_view_permission__| Internal | |  |  | X|X|X|X|X| | X| |7
__global_view_permission__| Internal | |  |  |  |  |  |  |  |  | X| |7
__hash__				| Internal | |  | X| |  |  |  |  |  |  | |7
__url__					| Internal | |  | X| |  |  |  |  |  | X| |7
__categories__			| Global | |  | X|X|X|X|X|X|X|X| |7
__deep_categories__		| Global | |  | X|X|X|X|X|X|X|X| |7
__allowed_groups__		| Global | |  | X|X|X|X|X|X|X|X| |7
__freetags__			| Global | |  | X|X|X|X|X|X|X|X| |7
__freetags_text__		| Global | |  | X|X|X|X|X|X|X|X| |7
__adv_rating_''ID''__	| Global | | X|X|X|X|X|X|X|X|X|X|7
__comment_count__		| Global | | X|X|X|X|X|X|X|X|X| |8
__relations__			| Global | |  | X|X|X|X|X|X|X|X|X|8
__attachments__			| Global | |  | X|X|X|X|X|X|X|X| |7
__attachment_contents__	| Global |X| | X|X|X|X|X|X|X|X| |7
__geo_located__			| Global | |  | X|X|X|X|X|X|X|X|X|9
__geo_location__		| Global | |  | X|X|X|X|X|X|X|X|X|9
__visits__				| Global | | X|X| |  |  | X| |  |  | |9.2
{FANCYTABLE}
~np~*)~/np~ Tokenized - as in decomposed in words for full text search

!! Searchable and not searchable
Some fields are indexed and are searchable and other are not searchable (more explanation is required). It will influence results that will be outputted when using the plugins List, CustomSearch and ListExecute (probably).

For exemple, deepcategories (child from a parent category) are indexed but not searchable. When you set one of those plugins you need to specify the -+searchable_only+- parameter as follow, -+searchable_only="0"+-

The plugin should start as follow.
{CODE()}
{LIST(searchable_only="0")}
{CODE}

!! Tracker Fields
^In general, tracker fields are indexed as __tracker_field_''PERMNAME/ID''__. ''PERMNAME/ID'' is your tracker field permanent name or ID. However, many tracker field types have additional useful variants of the main field (see below) that are indexed for each field.^
The indexing for tracker fields will vary depending on the field type. As a general rule, tracker_field_''PERMNAME/ID'' will be used as the field and will be sortable. However, there are a few exceptions:
* Image and File fields are not indexed
* TextArea is not sortable

!!! Multilingual fields are indexed as multiple fields
*The main one (tracker_field_''PERMNAME/ID'') contains all languages
* tracker_field_''PERMNAME/ID''_''lang'' contains one language only (tracker_field_12_fr for example)

!!! Rating and related fields store as multiple fields
* tracker_field_''PERMNAME/ID'' contains the average
* tracker_field_''PERMNAME/ID''_sum contains the vote totals
* tracker_field_''PERMNAME/ID''_count contains the number of votes

!!! Items List and Item Link fields
* tracker_field_''PERMNAME/ID''_text contains the text instead of the IDs of the linked/listed items

!!! Language of the tracker item
*If a language field is set for the tracker item, that language is indexed as the item language, i.e. the __language__ field.

__Some used in buildQuery/tiki-searchindex.php (need explanation on whether these are real fields or just helpers):__
__type__: refers to __object_type__
__deep__: if this is set, __categories__ will be considered __deep-categories__
__autocomplete__: Will search for items with title starting with this

!! Rebuild search index
Tiki is applying filters (and partially permissions to view) and in some case reading directly information from the Unified-Index. This allow very good performances and a lot of flexibility. Therefor, it is important to keep a fresh Unified-Index. ((Features)) but also ((plugins)) and ((modules)) will also relies on the freshness of the Unified-Index to display updated content (pages content, items, user informations, etc).

You can see when the index was last fully rebuilt usually by start a rebuild process from he Tiki interface (see below). 

In recent Tiki version the index is stored in the database and you can rebuild (refresh) the Unified-Index with different method. 

{REMARKSBOX(type="warning" title="Outdated information; In older Tiki version")}The unified-index is stored in your server disk and accessible in your Tiki directory -+temp/unified-index/+-. While the rebuild is occuring, a directory -+temp/unified-index-new/+- will appear (This is to permit the existing index to be used until the new one is ready). If -+temp/unified-index-new/+- doesn't disappear after the indexing, something must have gone wrong. You can delete it and try the re-indexing again. You may want to run ''sh setup.sh'' to make sure the permissions are OK.{REMARKSBOX}

!!! From the Tiki interface
It is possible to rebuild the Unified-Index from a browser. However due to possible timeout during long operation it is recommended to do this ONLY on relatively small Tiki (based on the number and the size of your Tiki objects). For medium to high load sites, it is strongly recommended do that operation from the command line (much less limited), see the next paragraph.

* From the admin dashboard, on the main tiki menu click on the item Control Panels, then click on the Search panel. At the top of the page you will find a __Rebuild Index__ button.
* From your URL address bar tiki-admin.php?page=search&rebuild=now

!!! From the command line
The search index can be rebuilt manually from the command line or you can use the Tiki ((Scheduler)) since ((Tiki17))  - see [Scheduler#Rebuild_the_Unified-Index_with_log|Rebuild the Unified Index with log sample] or a ((Cron)) job since ((Tiki9)) to runs the command automatically - see ((Cron Job to Rebuild Search Index)).

Below are the commands that may be used to rebuild the index. Check ((Console)) documentation.
{REMARKSBOX(type=note title=Note)}These commands need to be run as the web server user when using the Lucene implementation, often ''www-data'' or ''nobody'' depending on your server setup, otherwise the web server cannot update and maintain the index. This means prepending the following commands with "''sudo -u www-data ''".{REMARKSBOX}

{REMARKSBOX(type=note title=Note)}''All commands below assume you are already in the Tiki root directory.''{REMARKSBOX}

You can also rebuild it using the unified {DIV(type=span class=inline_syntax)}console.php{DIV} command, with the appropriate parameters. For example:

!!!! Basic command
{CODE()}
php console.php index:rebuild

<-- or -->

php console.php i:r

<-- or add visual verbose, to follow what is going on -->

php console.php i:r -p

{CODE}
{REMARKSBOX(type=note title=Note)}''The syntax "i:r" is equivalent to "index:rebuild".''{REMARKSBOX}

!!!!Multitiki sites
For multitiki sites, you can rebuild with commands like:
{CODE(colors="perl")}
php console.php index:rebuild --site=site1.example.com  
php console.php index:rebuild --site=site2.example.com  
...
{CODE}

!!!!Successful rebuild
If the rebuild is successful a message like the following will be produced (for cron jobs, this can usually be sent to you via email as part of automatically running the command):
{CODE()}
 Started rebuilding index...

Unified search
--------------

Engine: MySQL, version 5.7.34
Indexed
  trackeritem: 210
  tracker: 13
  trackerfield: 104
  article: 67
  file: 285
  file gallery: 20
  comment: 0
  user: 8
  group: 7
  wiki page: 55
  category: 18
Rebuilding index done
Index: index_6291d4b25591f

Execution Statistics
---------------------

Execution time: 16 secs
Current Memory usage: 51.0 MiB
Memory peak usage before indexing: 22.5 MiB
Memory peak usage after indexing: 51.2 MiB
Number of queries: 12383
{CODE}

!!!!Troubleshooting
If the rebuild is unsuccessful, instead of the above message you may get a message that indicates there has been an internal server error, or it may say "Rebuild in progress." This may be because the rebuild process uses more memory or takes more time than allowed by the server's php settings. Such settings can be changed on the fly as part of the rebuild command - examples of how to do this are shown below. 

!!!!! Not enough memory
__Increase memory limit__
One way to increase memory is to change the {DIV(type=span class=inline_syntax)}memory_limit{DIV} php setting as follows (this example changes the memory limit to 4 gigabytes while the rebuild process is running):
{CODE()}
php -dmemory_limit=4G console.php i:r --log
{CODE}
You could also direct php to use a specific {DIV(type=span class=inline_syntax)}php.ini{DIV} file, where there may be a higher memory limit setting or no limit. In this case you would use the {DIV(type=span class=inline_syntax)}-c{DIV} parameter followed by the path to the {DIV(type=span class=inline_syntax)}php.ini{DIV} file, as in the example below:
{CODE()}
php -c /etc/php5/cli/php.ini console.php i:r --log
{CODE}

!!!!! Timeout error
__Increase maximum execution time__
Getting an internal server error may indicate the rebuild process takes longer than the {DIV(type=span class=inline_syntax)}max_execution_time{DIV} php setting. That can be increased as part of the command as shown below where the max execution time is set to 300 seconds, or 5 minutes. (This command is also increasing the memory limit as described above):
{CODE()}
php -dmemory_limit=4G -dmax_execution_time=300 console.php i:r --log
{CODE}

!!!!! Force rebuild
When the rebuild is unsuccessful with a "Rebuild in progress" message, this usually means that the rebuild failed previously in the middle of the process, leaving a temporary folder called {DIV(type=span class=inline_syntax)}temp/unified-index-new{DIV} on the server. When a new rebuild is started and the program sees this folder, it thinks there is a rebuild already in progress and will stop. You can either delete this folder before rebuilding again or include the {DIV(type=span class=inline_syntax)}--force{DIV} parameter in the rebuild command as follows:
{CODE()}
php -dmemory_limit=4G -dmax_execution_time=300 console.php i:r --force --log
{CODE}

!!!!! Log to review possible problems with content
{CODE()}
php console.php i:r -p --log
{CODE}

!!!!! Several index_ files in the database
Typically happening when using the same database over and over you may end with several indexes in your database.
We provide a developer tool to clean properly this issue.
You need to be the web user and use bash to run the script.
{CODE()}
bash doc/devtools/clear_mysql_search_indexes.bash
{CODE}

You need to rebuild your index after the operation.

!!!!! MySQL limits for very big numbers of tracker fields

!!!!!! How to detect
If ''Unified Search'' is configured with the ''MySQL Full Text Search'' engine, you might encounter a failure to reindex with the following symptoms:
* -+ php console index:rebuild +- stops without displaying the list of indexing statistics:
{CODE()}php console index:rebuild
[14 mai 2020 09:26 EDT] Started rebuilding index...
Unified search engine: MySQL, version 10.1.45-MariaDB
(it takes a very long time and in the end nothing shows up)
{CODE}

If you investigate with producing a log it ends this way:
{CODE()}php console.php index:rebuild --log 
[14 mai 2020 16:10 EDT] Started rebuilding index... logging to file: temp/Search_Indexer_mysql_database_name_console.log
Unified search engine: MySQL, version 10.1.45-MariaDB
(it takes a very long time and in the end nothing shows up)

tail temp/Search_Indexer_mysql_database_name_console.log 
…
2020-05-14T16:21:47-04:00 ERR (3): Indexing failed while processing "2512" (type trackeritem) with the error "Could not perform index modification: Too many columns"
2020-05-14T16:21:47-04:00 ERR (3): WARNING: PDO::query(): SQLSTATE[HY000]: General error: 1117 Too many columns {"code":null,"file":"/path/to/tikiroot/lib/core/TikiDb/Pdo.php","line":104}
2020-05-14T16:21:47-04:00 INFO (6): addDocument trackeritem 2513 {"memoryUsage":"46.5 MiB"}
2020-05-14T16:21:47-04:00 ERR (3): Indexing failed while processing "2513" (type trackeritem) with the error "Could not perform index modification: Too many columns"
2020-05-14T16:21:47-04:00 ERR (3): WARNING: PDO::query(): SQLSTATE[HY000]: General error: 1117 Too many columns {"code":null,"file":"/path/to/tikiroot/lib/core/TikiDb/Pdo.php","line":104}
2020-05-14T16:21:47-04:00 INFO (6): addDocument trackeritem 2514 {"memoryUsage":"46.5 MiB"}
{CODE}

!!!!!!- Technical explanation (thanks Victor)
What does matter is this: https://dev.mysql.com/doc/refman/5.7/en/create-table-files.html#limits-frm-file

According to those calculations, our lengthy tracker field names each prefixed with 'tracker_field_' are the reason for the early limit hit. frm files impose a 65K bytes of limit for the table definition. For example, 1500 fields with the formula from the page results in around 1500*17 + 1500 * (avg length of the column name in bytes + 1). Considering 'tracker_field_' is already 14 bytes, there is not much space left for actual column names.

…
However, now seeing this:
https://dev.mysql.com/doc/refman/8.0/en/data-dictionary-file-removal.html

I previously saw that version 8 did not have the limits section and decided to see where is MySQL going to... so it seems, they removed this 64KB frm file limit in version 8. Not sure about mariadb but it must be going in the same direction? I couldn't find relevant docs on their site.

!!!!!! Solutions

The following options in -+ Control panels → Search +- help for this situation:
* __MySQL use short field names__
+ ''Due to frm file constraints, number of search fields that one index can hold is usually limited to about 1500. This can be exceeded if you have numerous tracker fields. Enabling this option will try to shorten the field names internally that should allow you to use 300-500 more fields. Switching this option requires full index rebuild.''
+ Summary: This makes no difference in how you use Tiki.
* __Don't index non searchable fields__
+ ''Indexing will skip adding all tracker fields that are not marked as "searchable". This will free index space but also make it impossible to use those fields in search index queries.''
+ Summary: You need to review the ''Searchable'' property of you tracker fields. A full index rebuild will be necessary after changes in the tracker fields ''Searchable'' properties.

Hint: If the first option is enough for your site, you may ignore the second one.

!!!!! Indexing log common error troubleshooting

When rebuilding your index use the following command to create a log file.
Note you need to point to the PHP for the PHP version your Tiki uses.
{CODE()}
php console.php i:r -p --log
{CODE}
 In the log you may see some error, this is an attempt to help and diagnose the possible issues.

|| Error | Possible fix
addDocument article 275 %%% ERR (3): NOTICE: Trying to access array offset on value of type bool.../...lib/core/Search/ContentSource/ArticleSource.php","line":53 | Edit article275, Advanced tab check the URL validity in the "Source" field.||


!! Unified Index storage in the database (MySQL full-text search)
By design the unified-index store data in the MySQL database using MyISAM no matter if your Tiki uses innoDB or MyISAM (not all versions of InnoDB offer FULLTEXT).

!!! Duplicate unified-index table
For some reasons you may found that your database has several index (tables like index_...alphanumeric...).
This happen when they were issues rebuilding the index or when moving and upgrading the database (Tiki upgrade).

You can delete those tables (always consider saving a backup of your data prior any work) and rebuild the unified-index. Tiki will recreate and relink everything properly. You can find information about the unified-index in use at the Search control panel, General settings under Unified search index. 

!! Related
* ((Cron Job to Rebuild Search Index))
* ((PluginList))
* ((PluginCustomSearch))
* ((dev:Unified Search)) 

!! Developer Notes
See [https://dev.tiki.org/Unified-Index#Developer_Notes].

-=alias names for this page=-
(alias(Unified Search)) | (alias(UnifiedSearch)) | (alias(Enterprise search)) | (alias(Search Index)) | (alias(SearchIndex)) | (alias(UnifiedIndex)) | (alias(IndexRebuild)) | (alias(Index Rebuild))
~tc~ (alias(Unified Index)) (alias(Check-file-indexing)) ~/tc~
        

History

Advanced
Information Version
King David KAMBALE 133
Jonny Bradley remove mouseover from inside the fancy table heading param which seems to now break in 26.x (and replaced badly worded * note) 132
Bruno Kambere 131
Bruno Kambere 130
Marc Laporte 129
Marc Laporte 128
Marc Laporte Put in PluginMouseover 127
Marc Laporte 126
Marc Laporte 125
Marc Laporte Moved to dedicated page 124
Marc Laporte Move to own page 123
Marc Laporte Thank you Victor 122
Marc Laporte 121
Marc Laporte Index is being rebuilt at the moment and cannot start another rebuild process. 120
Bernard Sfez / Tiki Specialist 119
Bernard Sfez / Tiki Specialist Adding information about searchable and not searchable fields 118
Bernard Sfez / Tiki Specialist Fixing typos and links 117
Bernard Sfez / Tiki Specialist Improved the information especially related to rebuilding the index 116
Bernard Sfez / Tiki Specialist Removing fancy table and table sorter... not working (due to quotes in a cell ?) 115
Bernard Sfez / Tiki Specialist 114
Bernard Sfez / Tiki Specialist 113
Bernard Sfez / Tiki Specialist Link to Tiki main page (SEO) and adding an alias of a "lost" page (see gardeners blog) 112
Bernard Sfez / Tiki Specialist 111
Bernard Sfez / Tiki Specialist Adding troubleshooting from log output 110
Bernard Sfez / Tiki Specialist 109
Bernard Sfez / Tiki Specialist Adding information about clear_mysql_search_indexes.bash 108
Bernard Sfez / Tiki Specialist Added information about storage of the unified index in the database 107
Marc Laporte 106
luciash d' being 🧙 Page renamed from Unified Index to Search and List from Unified Index. Semantic alias redirect created 105
Bernard Sfez / Tiki Specialist adding title_initial and title_firstword 104
luciash d' being 🧙 103
Jean-Marc Libs 102
Jean-Marc Libs 101
Jean-Marc Libs 100
luciash d' being 🧙 formatting 99
luciash d' being 🧙 formatting 98
luciash d' being 🧙 moved to dev 97
luciash d' being 🧙 96
luciash d' being 🧙 95
Jonny Bradley autotoc off 94
Marc Laporte AutoTOC 93
drsassafras removed tiki 9 & 10 specific info 92
Marc Laporte 91
Philippe Cloutier typos 90
Xavier de Pedro 89
Xavier de Pedro 88
Xavier de Pedro 87
Xavier de Pedro 86
Xavier de Pedro 85
Xavier de Pedro 84