Showing posts with label Endeca Design and Best Practices. Show all posts

What can a partial update do, and what can't it do? - Endeca partial update

What can a partial update do, and what can it not do?

Endeca property:
 - Can change, remove, or update property values.
 - Cannot add, remove, or change a property itself.

Dimension:
 - Can change, remove, or update dimension values.
 - Cannot add, remove, or change a dimension itself.
 - Cannot add or remove anything in the hierarchy of dimension values.

Oracle recommendations - Multi-site Operations setup

Here are the standard recommendations from Oracle on multi-site operations setup:


SEO URL optimization, Endeca sitemap generation, SEO meta data best practices - Endeca SEO capabilities

Customers should use the SEO capabilities of both Oracle Commerce Platform and Guided Search with Experience Manager as complements to each other in order to manage SEO for the entire site. There are multiple elements of SEO (URL optimization, site map generation, and SEO meta data), which are outlined below.

URL Optimization
   1) Pages that have an MDEX navigation state should use the URL Optimization API included with the Assembler.
   2) Use Oracle Commerce Platform URL Recoding for all other Oracle Commerce pages.

Site Map Generation
Customers should use components included in both Oracle Commerce Platform and Guided Search for site map generation.
    1) Use the Guided Search Site Map Generator to generate the links to pages for navigation states that can include search terms or dimension intersections. This has a significant SEO benefit because the long-tail will be included in the site map, and natural search results can bring site visitors deep into the catalog (and closer to conversion).
    2) Use the Oracle Commerce Platform SitemapGeneratorService to generate the sitemap for all other Oracle Commerce pages

SEO Meta Data
   1) For pages managed in Experience Manager, set keywords in the pages and cartridges.
   2) For other pages, set up keywords in the BCC. With properly coded pages, the Oracle Commerce Platform will automatically add these keywords to pages.

Oracle Endeca UI design pattern Library - Search and Browse features

Link to the Oracle Endeca UI design pattern library for search and browse features:
http://www.oracle.com/webfolder/ux/applications/uxd/endeca/content/library/en/home/patterns.html

Endeca Experience Manager - Preview & Audit Workflow

Endeca OOTB preview workflow steps:
     1) A BCC merchandiser changes the content (category, product, SKU, etc.) and pushes it to Stage through the workflow.
     2) The BCC merchandiser uses the workflow to push the content from Stage to Production.
     3) Run the Endeca index. It can be kicked off in two ways:
                       - On BCC project deployment
                       - Scheduled indexing (http://ravihonakamble.blogspot.com/2015/07/how-to-schedule-endeca-index-using-atg.html)
     4) Workbench users preview and audit their Experience Manager content, thesaurus, etc. against an Authoring MDEX.
     5) Validate the content on a Stage Preview ATG Store instance pointing at the Authoring MDEX.
     6) Promote the content from the Authoring MDEX to the Live/Production MDEX by running the /opt/app/endeca/apps/APPNAME/control/promote_content.sh shell script.
     7) The Endeca XM content becomes available to live users through the ATG Live Store instances.

Here is the Endeca Content promotion flow diagram from Oracle:

Using Load balancer with Endeca - Best Practices

For all deployment architectures, Oracle recommends the following best practices:

1) Use load balancers with the MDEX Engine to increase throughput and ensure availability in the event of hardware failure.

2) Oracle recommends including two hardware-based load-balancing switches configured redundantly in your configuration. Having two load balancers ensures their availability in the event of a load balancer hardware failure.

3) Use http://[host]:[port]/admin?op=ping in the load balancer script to check Dgraph availability. If a Dgraph is not running, the load balancer directs queries to an MDEX that is currently available.

4) Use the "least connections" algorithm for balancing traffic to the Dgraphs.

5) Make sure the response from the Dgraphs is sent directly to the client tier and does not pass back through the load balancer hardware.
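
The availability check in recommendation 3 could be sketched as below. This is only an illustration using the JDK HTTP client, not a load balancer script from Oracle: the class and method names are hypothetical, and treating an HTTP 200 response as "available" is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Health-check sketch for a single Dgraph; all names here are hypothetical. */
class DgraphPing {

    /** Builds the ping URL from recommendation 3 for the given host and port. */
    static URI pingUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/admin?op=ping");
    }

    /**
     * Returns true if the Dgraph answers the ping within the timeout.
     * Assumption: an HTTP 200 status means the Dgraph is available.
     */
    static boolean isAlive(String host, int port, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(pingUri(host, port))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused or timed out: treat the Dgraph as unavailable
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A monitoring job could call DgraphPing.isAlive(host, port, Duration.ofSeconds(2)) for each Dgraph and take the ones that return false out of rotation.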

Endeca Dimension search Performance Tuning - Best Practices

The performance of dimension search depends directly on the number of dimension values and on the size of the resulting set of matching dimension values. If more than 1000 dimension values are returned, expect a significant performance hit.
Compound dimension search is more expensive than a non-compound request. If you are not using the compound dimension search feature, disable it. Here is the link on how to disable it:
http://ravihonakamble.blogspot.com/2015/07/endeca-few-items-on-dgrpah-side-to.html
 
In both cases, you can limit the results by using advanced dimension search parameters.
Example: use the Di parameter to restrict the search to a specific dimension or to a list of dimension value IDs.

D=avengers&Di=11378
D = dimension search query term
Di = one or more IDs of dimensions to search against
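
As a small illustration, the query string above could be assembled as follows. This is plain string building, not the Endeca Presentation API. The class and method names are hypothetical, and joining several IDs with "+" is an assumption about the URL syntax that should be checked against the MDEX documentation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Builds a dimension search query string; names are hypothetical. */
class DimSearchUrl {

    /**
     * @param term         the dimension search term (the D parameter)
     * @param dimensionIds IDs to scope the search to (the Di parameter); may be empty
     */
    static String build(String term, List<Long> dimensionIds) {
        StringBuilder query = new StringBuilder("D=")
                .append(URLEncoder.encode(term, StandardCharsets.UTF_8));
        if (!dimensionIds.isEmpty()) {
            // Assumption: multiple IDs are separated by '+'
            String ids = dimensionIds.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining("+"));
            query.append("&Di=").append(ids);
        }
        return query.toString();
    }
}
```

For example, DimSearchUrl.build("avengers", List.of(11378L)) produces the query string shown above.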

Endeca Search and performance Impact - Record Search|Wildcard Search|Boolean Search|Phrase Search

Record Search:
Record search is an indexed feature: each property enabled for record search increases the size of the Dgraph process. Configure only the properties that the application needs for record searching.
Use the Endeca set selection feature: http://ravihonakamble.blogspot.com/2015/06/endeca-select-feature-aka-set-selection.html

Wildcard Search:
If wildcard search is enabled in the MDEX Engine (even if it is not used by the users), it increases the time and disk space required for indexing. Therefore, consider first the business requirements for your Endeca application to decide whether you need to use wildcard search. 
Recommendations:
1) Avoid wildcard searches with one non-wildcarded character, such as a*, since they are more expensive to process.
2) Parse the queries to calculate their search term length and avoid very low-information queries, such as "a*". Avoid sending the MDEX Engine wildcard queries that contain fewer than 3 non-wildcarded characters.
3) Remove all non-searchable characters from each wildcard query before issuing it to the MDEX Engine.
4) Do not combine wildcard queries with quoted phrase searches.
Note:
If a search query contains only wildcards and punctuation, such as *.*, the MDEX Engine rejects it for performance reasons and returns no results.
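
Recommendations 2 to 4 can be applied in the application before a query reaches the MDEX Engine. The following is a minimal sketch, not Endeca API code: the class and method names are hypothetical, and treating only letters and digits as searchable characters is an assumption made for illustration.

```java
import java.util.Optional;

/** Pre-filters wildcard queries before they are sent to the MDEX Engine (hypothetical helper). */
class WildcardQueryGuard {

    /** Minimum number of non-wildcarded characters, per recommendation 2. */
    static final int MIN_NON_WILDCARD_CHARS = 3;

    /**
     * Returns the cleaned query to send, or an empty Optional if the query
     * should not be issued as a wildcard search.
     */
    static Optional<String> prepare(String rawQuery) {
        if (rawQuery == null) {
            return Optional.empty();
        }
        // Recommendation 4: do not mix wildcards with quoted phrases
        if (rawQuery.indexOf('"') >= 0) {
            return Optional.empty();
        }
        // Recommendation 3: drop everything except letters, digits, whitespace and '*'
        // (assumption: only letters and digits are searchable)
        String cleaned = rawQuery.replaceAll("[^\\p{L}\\p{N}\\s*]", "").trim();
        // Recommendations 1 and 2: require enough non-wildcarded characters
        long informative = cleaned.chars().filter(Character::isLetterOrDigit).count();
        if (informative < MIN_NON_WILDCARD_CHARS) {
            return Optional.empty();
        }
        return Optional.of(cleaned);
    }
}
```

The guard is meant to be applied only to queries that contain a wildcard. Queries such as a* or *.* are rejected on the application side, so they never cost the MDEX Engine any work.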

Boolean Search:
The performance of Boolean search is a function of the number of terms and operators in the query and of the number of records associated with each term in the query. As the number of records increases and as the number of terms and operators increases, queries become more expensive. If you notice unexpected behavior while using Boolean search, use the Dgraph -v flag when starting the Dgraph. This flag prints detailed output to stderr describing the running Boolean query process.

Phrase Search:
The cost of phrase search operations depends mostly on how frequently the query words appear in the data and on the number of words in the phrase. You can improve the performance of phrase search by limiting the number of words in a phrase with the --phrase_max <num> flag for the Dgraph.
The default number is 10. If the maximum number of words in a phrase is exceeded, the phrase is truncated to the maximum word count and a warning is logged.
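
To make the truncation behavior concrete, here is a small sketch of the same cap applied on the application side. It is an illustration only and not Dgraph code: the class and method names are hypothetical, and splitting words on whitespace is an assumption.

```java
import java.util.Arrays;

/** Caps a phrase at a maximum number of words (hypothetical helper). */
class PhraseLimit {

    /** Returns the phrase unchanged if it fits, otherwise its first maxWords words. */
    static String truncate(String phrase, int maxWords) {
        String trimmed = phrase.trim();
        // Assumption: words are separated by whitespace
        String[] words = trimmed.split("\\s+");
        if (words.length <= maxWords) {
            return trimmed;
        }
        return String.join(" ", Arrays.copyOfRange(words, 0, maxWords));
    }
}
```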

Endeca Select feature aka set Selection - Controlling Endeca Record Values

The Select feature allows you to select specific keys (Endeca properties and/or dimensions) from the data so that only a subset of values will be transferred for Endeca records in a query result set.

This functionality prevents the transferring of unneeded properties and dimension values when they will not be used by the front-end Web application.

It therefore makes the application more efficient because the unneeded data does not take up network bandwidth and memory on the application server.

There are two possible scenarios:
1) If you are making independent Endeca queries, you can use the API below:

import com.endeca.navigation.ENEQuery;
import com.endeca.navigation.ENEQueryResults;
import com.endeca.navigation.FieldList;
import com.endeca.navigation.UrlENEQuery;

// Create a query from the incoming request's query string
ENEQuery usq = new UrlENEQuery(request.getQueryString(), "UTF-8");
// Create an empty selection list
FieldList fieldList = new FieldList();
// Add an Endeca property to the list
fieldList.addField("P_DisplayName");
// Add an Endeca dimension to the list
fieldList.addField("P_RepositroyId");
// Add the selection list to the query
usq.setSelection(fieldList);
// Make the MDEX Engine query (nec is an existing ENEConnection)
ENEQueryResults qr = nec.query(usq);

2) If you are using the ATG-Endeca integration, use the component below and add only the required fields.
 /atg/endeca/assembler/cartridge/handler/config/ResultsListConfig.properties
 fieldNames=\
  P_DisplayName,\
  P_RepositroyId


Endeca Derived properties to get Min and Max price for Sale/Clearance/List/MSRP prices - Endeca Features

A derived property is a property that is calculated by applying a function to properties or Dimension values from each member record of an aggregated record. Derived properties are created by Forge, based on the configuration settings in the Derived_props.xml file. After a derived property is created, the resultant derived property is assigned to the aggregated record.

Problem Statement:
Consider one product with 5 SKUs, where each SKU has a sale, list, clearance, and MSRP price. The requirement is to traverse all the prices on the SKUs and get the MIN and MAX for the sale, clearance, list, and MSRP prices.


Solution:
1) Change Derived_props.xml file in pipeline

/opt/app/endeca/apps/CRS/config/pipeline/Derived_props.xml
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE DERIVED_PROPS SYSTEM "derived_props.dtd">
<DERIVED_PROPS>
  <DERIVED_PROP DERIVE_FROM="sku.listPrice" FCN="MAX" NAME="P_SKU_ListPrice_Max"/>
  <DERIVED_PROP DERIVE_FROM="sku.listPrice" FCN="MIN" NAME="P_SKU_ListPrice_Min"/>
  <DERIVED_PROP DERIVE_FROM="sku.salePrice" FCN="MAX" NAME="P_SKU_SalePrice_Max"/>
  <DERIVED_PROP DERIVE_FROM="sku.salePrice" FCN="MIN" NAME="P_SKU_SalePrice_Min"/>
  <DERIVED_PROP DERIVE_FROM="sku.MSRP" FCN="MAX" NAME="P_SKU_MSRP_Max"/>
  <DERIVED_PROP DERIVE_FROM="sku.MSRP" FCN="MIN" NAME="P_SKU_MSRP_Min"/>
  <DERIVED_PROP DERIVE_FROM="sku.clearencePrice" FCN="MAX" NAME="P_SKU_ClearencePrice_Max"/>
  <DERIVED_PROP DERIVE_FROM="sku.clearencePrice" FCN="MIN" NAME="P_SKU_ClearencePrice_Min"/>
</DERIVED_PROPS>

2) Run a baseline index.
3) Check the derived properties in the Endeca reference application.
Endeca reference application:

8056678  (5 Records)
DERIVED PROPERTIES:
P_SKU_ListPrice_Max:  39.990000
P_SKU_ListPrice_Min:  38.990000
P_SKU_SalePrice_Max:  39.990000
P_SKU_SalePrice_Min:  38.990000
P_SKU_MSRP_Max:  55.000000
P_SKU_MSRP_Min:  54.000000
P_SKU_ClearencePrice_Max: 
P_SKU_ClearencePrice_Min: 
REPRESENTATIVE REC PROPERTIES:
allAncestors.repositoryId:  cat130021
allAncestors.repositoryId:  cat130052
allAncestors.repositoryId:  cat130155
allAncestors.repositoryId:  rootCategory
Note: derived properties can only be created on aggregated records.
Valid functions are MIN, MAX, AVG, and SUM.
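
To illustrate what these functions compute over the member records of one aggregated record, here is a plain Java sketch. It does not reproduce how Forge evaluates Derived_props.xml. The class name is hypothetical and the five SKU prices used in the usage note are made-up sample values.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

/** Mimics MIN, MAX, AVG and SUM over the SKU prices of one product (hypothetical helper). */
class DerivedPriceSketch {

    /**
     * Summarizes one price property across all SKUs of a product.
     * For an empty list, getCount() is 0 and getMin() is Double.POSITIVE_INFINITY,
     * so callers should check the count before using min or max.
     */
    static DoubleSummaryStatistics summarize(List<Double> skuPrices) {
        return skuPrices.stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
    }
}
```

With hypothetical list prices of 38.99, 39.99, 39.49, 38.99 and 39.99 for the five SKUs, summarize(...).getMin() is 38.99 and getMax() is 39.99. These are the values a MIN and a MAX derived property on sku.listPrice would be expected to carry.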

Endeca data model design - Pros/Cons

Problem Statement:
Consider ~30,000 SKUs that are each represented uniquely across up to 8,000 stores. Each store could have up to 20 unique fields for each product. These include various prices, sale prices, coupon codes, and on-hand inventory, all of which are specific to each store.

This totals up to:

30,000 SKUs * 8,000 stores * 20 unique fields = 4.8 billion cells of data that need to be stored in Endeca.

Solution:

There are two viable data models available:

1) “Wide” model that consists of adding store-specific attributes to each base product record. This equates to 30,000 rows of data (one for each product), where each row has 160,000 columns of data (8,000 stores * 20 fields).

2) “Multi-Record” model that consists of a full copy of each product record with one store’s data attached. This equates to 240 million rows of data (one for each product-store combination), where each row has 20 columns of data.
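
The sizes quoted for the two models follow directly from the problem statement. The sketch below only spells out the arithmetic. The class and method names are hypothetical.

```java
/** Sizing arithmetic for the two data models (hypothetical helper). */
class DataModelSizing {

    /** Total number of store-specific cells: SKUs * stores * fields per store. */
    static long totalCells(long skus, long stores, long fieldsPerStore) {
        return skus * stores * fieldsPerStore;
    }

    /** Wide model: one row per SKU, one column per store-field pair. */
    static long wideColumnsPerRow(long stores, long fieldsPerStore) {
        return stores * fieldsPerStore;
    }

    /** Multi-record model: one row per SKU-store pair. */
    static long multiRecordRows(long skus, long stores) {
        return skus * stores;
    }
}
```

With 30,000 SKUs, 8,000 stores and 20 fields, this gives 4,800,000,000 cells, 160,000 columns per wide row, and 240,000,000 multi-record rows.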

Wide record pros and cons:

Pros:
1) Simple, performant queries

Cons:
1) High indexing time
2) Complex process to dynamically create attributes
3) Complex dimension mapping for precise values like price
4) Indexing scales poorly for >100k properties
5) Complex display logic

Multi-Record pros and cons:

Pros:
1) Simple, Performant Queries
2) Simple Indexing Logic
3) Simple Dimension Mapping for Precise Values
4) Simple Attribute Display Logic

Cons:
1) Large Index Size:  Memory and Disk Footprint
2) Possible Run-time Performance Issues From Inadequate Memory

Tips to Improve performance of ATG-Endeca integration environment (Version 3.1.1 and 3.1.2)

1) Make sure the patch below is applied

 Patch 17342677 - It reduces the number of supplemental objects that are returned with queries and fixes an XML parser locking problem.

2) Check the properties being returned by Endeca (apply the Endeca set selection feature)
     In the Assembler, you can select which properties are returned with the search results.
     Include only the properties that the application requires. Here is the ATG component path:
  /atg/endeca/assembler/cartridge/handler/config/ResultsListConfig.properties
 Refer to http://ravihonakamble.blogspot.com/2015/06/endeca-select-feature-aka-set-selection.html

3) Disable Endeca preview on Production
Use the /dyn/admin/nucleus/atg/endeca/assembler/cartridge/manager/AssemblerSettings/ component and set previewEnabled=false.

4) Set the number of sub-records per aggregate record to one
 /atg/endeca/assembler/cartridge/handler/config/ResultsListConfig
   # For aggregate records, sets the number of sub-records that should be included in the results
 subRecordsPerAggregateRecord=ONE


5) Ensure non-Endeca URLs don't hit the Assembler
Set the URL patterns to ignore through the ignoreRequestURIPattern property of the /atg/endeca/assembler/AssemblerPipelineServlet component.
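
Before changing ignoreRequestURIPattern, it can help to try a candidate pattern against sample URIs. The sketch below assumes the property is interpreted as a Java regular expression that must match the whole request URI. Both that assumption and the example pattern are hypothetical and should be verified against your ATG version.

```java
import java.util.regex.Pattern;

/** Tries a candidate ignore pattern against request URIs (hypothetical helper). */
class IgnoreUriSketch {

    // Hypothetical pattern: static assets and REST calls should bypass the Assembler
    static final Pattern IGNORE =
            Pattern.compile(".*\\.(css|js|png|jpg|gif|ico)|/rest/.*");

    /** Returns true if the URI matches the ignore pattern in full. */
    static boolean bypassesAssembler(String requestUri) {
        return IGNORE.matcher(requestUri).matches();
    }
}
```

With this pattern, /css/site.css and /rest/model/cart would bypass the Assembler, while /browse/shoes would still be processed.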