Ignore character accents when indexing text - Endeca Record Search Best Practice

How to ignore character accents when indexing text? 
By default, café is indexed separately from cafe.
Setting the --diacritic-folding flag on Dgidx indexes café as cafe, and setting it on the Dgraph lets a search for either term (cafe or café) match.

How to configure it at both the Dgidx and Dgraph levels?
Dgidx:
Using the --diacritic-folding flag on Dgidx causes accented characters to be mapped to simple ASCII equivalents.
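
To make the idea of mapping accented characters to ASCII equivalents concrete, here is a minimal Python sketch of diacritic folding. It is not Endeca's implementation: the function name fold_diacritics and the approach (Unicode decomposition, then dropping combining marks) are assumptions chosen only for illustration, and the actual mapping table used by Dgidx may differ.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return text with accents removed (illustrative, not Endeca's code).

    NFD decomposition splits a character such as 'é' into the base letter
    'e' plus a combining acute accent; the combining marks are then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # With folding, both spellings reduce to the same indexed term.
    print(fold_diacritics("café"))
    print(fold_diacritics("cafe"))
```

Characters that have no decomposition (for example ø or ß) pass through this sketch unchanged, so a real folding table needs explicit entries for them.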

Dgraph:
Using the --diacritic-folding flag on the Dgraph allows Anglicized search queries such as cafe to match against result text containing international characters (accented) such as café.
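
The interaction between index-time and query-time folding can be sketched with a toy inverted index in Python. Everything here is hypothetical (the helpers fold, build_index and search, and the sample records); it models the behavior described above in the simplest possible way and says nothing about how the MDEX Engine works internally.

```python
import unicodedata
from collections import defaultdict


def fold(text):
    """Strip accents via NFD decomposition (illustrative only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(records, fold_terms):
    """Map each lowercased term to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.lower().split():
            key = fold(term) if fold_terms else term
            index[key].add(record_id)
    return index


def search(index, query, fold_query):
    """Look up a single-term query, optionally folding it first."""
    term = query.lower()
    if fold_query:
        term = fold(term)
    return index.get(term, set())


if __name__ == "__main__":
    records = {1: "Café Central", 2: "Cafe Roma"}  # made-up sample data
    plain = build_index(records, fold_terms=False)
    folded = build_index(records, fold_terms=True)

    print(search(plain, "cafe", fold_query=False))   # only the unaccented record
    print(search(folded, "cafe", fold_query=False))  # both records
    print(search(folded, "café", fold_query=True))   # both records
```

In this toy model, an accented query against a folded index finds nothing unless the query is folded too, which is one way to see why folding is applied on both the indexing side and the query side.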

1 comment:

  1. In MDEX Engine 6.2.0, a number of flags have been removed from the Dgraph.

    The --diacritic-folding flag has been removed from the Dgraph. Passing it to the Dgraph is no longer necessary to match Anglicized search queries such as cafe against result text containing international (accented) characters such as café. You must still specify the --diacritic-folding flag to Dgidx to map the international characters to their simple ASCII equivalents.

    Source: https://docs.oracle.com/cd/E48636_01/Mdex.6412/pdf/MDEXMigrationGuide.pdf
