Endeca data model design - Pros/Cons

Problem Statement:
Consider ~30,000 SKUs, each represented uniquely across up to 8,000 stores. Each store can have up to 20 unique fields for each product, including various prices, sale prices, coupon codes, and on-hand inventory, all of which are specific to that store.

This totals up to:

30,000 SKUs * 8,000 stores * 20 unique fields = 4.8 billion cells of data that need to be stored in Endeca.

Solution:

There are two viable data models available:

1) “Wide” model that consists of adding store-specific attributes to each base product record. This equates to 30,000 rows of data (one for each product), where each row has 160,000 columns of data (8,000 stores * 20 fields).

2) “Multi-Record” model that consists of a full copy of each product record with one store’s data attached. This equates to 240 million rows of data (one for each product-store combination), where each row has 20 columns of store-specific data.
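
As a rough illustration of the two shapes, here is a minimal Python sketch. It is not Endeca pipeline code: the sample SKUs, store IDs, field names, and the `field_store` attribute naming scheme are all hypothetical and only show how the same store data lands in each model.

```python
# Hypothetical sample data; names and values are made up for illustration.
products = {"sku-1": {"name": "Widget"}, "sku-2": {"name": "Gadget"}}
store_data = {
    ("sku-1", "store-0001"): {"price": 9.99, "on_hand": 4},
    ("sku-1", "store-0002"): {"price": 10.49, "on_hand": 0},
    ("sku-2", "store-0001"): {"price": 24.00, "on_hand": 12},
}


def build_wide(products, store_data):
    """Wide model: one record per SKU, one attribute per store field."""
    records = {sku: dict(base, sku=sku) for sku, base in products.items()}
    for (sku, store), fields in store_data.items():
        for field, value in fields.items():
            # Attribute names must be generated dynamically per store
            # (assumed naming scheme: "<field>_<store>").
            records[sku][f"{field}_{store}"] = value
    return list(records.values())


def build_multi_record(products, store_data):
    """Multi-record model: one record per SKU-store pair, base fields copied."""
    return [
        dict(products[sku], sku=sku, store=store, **fields)
        for (sku, store), fields in store_data.items()
    ]


wide = build_wide(products, store_data)
multi = build_multi_record(products, store_data)

# Shapes at the scale described in the problem statement: (rows, columns).
SKUS, STORES, FIELDS = 30_000, 8_000, 20
wide_shape = (SKUS, STORES * FIELDS)
multi_shape = (SKUS * STORES, FIELDS)
```

Both shapes hold the same 4.8 billion store-specific cells. The wide model pays for them in attribute count per record, while the multi-record model pays in record count and in the repeated copies of the base product fields.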

Wide record pros and cons:

Pros:
1) Simple, performant queries

Cons:
1) High indexing time
2) Complex process to dynamically create attributes
3) Complex dimension mapping for precise values like price
4) Indexing scales poorly for >100k properties
5) Complex display logic

Multi-Record pros and cons:

Pros:
1) Simple, performant queries
2) Simple indexing logic
3) Simple dimension mapping for precise values
4) Simple attribute display logic

Cons:
1) Large index size: memory and disk footprint
2) Possible run-time performance issues from inadequate memory

4 comments:

  1. Hi Ravi,

    It's a really nice post. In the context of ATG-Endeca integration, I have to index around 4.5 million SKUs. After a lot of googling, I found that one way to handle indexing this many records is a shard indexing approach (partitioning the data set on the basis of a unique property such as the repository ID).
    The issue is that I haven't found any reference document or Oracle support post on this yet. If you know of any docs or posts, could you please share them?
    Are there any pros/cons to following the shard approach?


    Thanks,
    swapnil

    1. Hi Swapnil,

      I recommend involving an Oracle Endeca consultant or an experienced Endeca architect. This needs a detailed analysis of the data model along with proper hardware design. For your requirement, 4.5 million SKUs is not a huge number unless there is a multi-store requirement with unique prices and inventory. It needs proper record design, indexing optimization, and query-level optimization.

      Regards,
      Ravi Honakamble
