Endeca CAS | Multiple Record Store Merge - Oracle Commerce 11.1

Multiple record stores can be used to merge data from different sources into one feed. The merge works as a switch join, i.e. a union of records rather than a field-level join. Follow the steps below to use multiple record stores:

 1. Open the <Endeca App Path>\APPNAME\config\cas\last-mile-crawl.xml file and add all the record stores. A sample is shown below; the record stores being merged are the <value> entries under the dataRecordStores key, and any number of record stores can be added to the merge.
    <sourceConfig>
      <moduleId>
        <id>com.endeca.cas.source.RecordStoreMerger</id>
      </moduleId>
      <moduleProperties>
        <moduleProperty>
          <key>dataRecordStores</key>
          <value>APPNAME-data</value>
          <value>APPNAME-web-crawl</value>
        </moduleProperty>
        <moduleProperty>
          <key>dimensionValueRecordStores</key>
          <value>APPNAME-dimvals</value>
        </moduleProperty>
      </moduleProperties>
      <excludeFilters />
      <includeFilters />
    </sourceConfig>
  2. Run the following command to update the crawl configuration in CAS:
     <Installpath>\CAS\11.1.0\bin>cas-cmd.bat updateCrawls -f <Endeca App Path>\APPNAME\config\cas\last-mile-crawl.xml
  3. Run a baseline update.

Note :- Endeca CAS uses record.id as the unique record identifier; we can also define our own record spec. If two records with the same record ID appear in record stores A and B, CAS will discard one of them.
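To make the note concrete, here is a minimal Python sketch of the switch-join semantics (illustrative only — this is not CAS code, and which duplicate record CAS actually discards should not be relied upon; record stores are modeled as plain lists of dicts):

```python
# Illustrative model of a switch join over record stores (NOT CAS internals).
# Each record store is a list of dicts; "record.id" is the unique identifier.
def switch_join(*record_stores):
    """Union the stores; on a duplicate record.id, keep only the first record."""
    merged = {}
    for store in record_stores:
        for record in store:
            rid = record["record.id"]
            if rid in merged:
                continue  # same record.id seen already: this record is discarded
            merged[rid] = record
    return list(merged.values())

data_store = [{"record.id": "P100", "name": "Phone"}]
web_crawl_store = [
    {"record.id": "P100", "title": "Phone landing page"},  # duplicate id: dropped
    {"record.id": "D200", "title": "User guide"},
]

merged = switch_join(data_store, web_crawl_store)
# Two records survive: P100 (from the first store only) and D200.
```

Note that the surviving P100 record carries only the fields from one store; a switch join never combines fields across stores.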

37 comments:

  1. Hi Ravi,

    In the context of the thread above, I have a few queries.

    I succeeded in adding both record stores (using the CAS-based app deployment model), and I used the Workbench data source approach to achieve my configuration, with the help of manipulators (modifying scripts) and filter scripts on the data source.

    Where do I specify the other properties from the records? For example:

    1. I have to map my "record.id" property to the Endeca property name "Product.RepositoryId" (as I am using Product.RepositoryId as a rollup key).
    2. I need to define some properties as searchable, as dimensions, etc. Where can I configure them?
    3. Suppose I want to rename a source input property to a new Endeca property name.

    In short, in the older Developer Studio pipeline approach we used to do this in the Dimensions, Properties, and Property Mapper sections. Where should we do all of that now?

    Could you please provide your valuable inputs.

    Thanks,
    Swapnil

  2. Hi Swapnil,

    You can find details about how to create dimensions and properties in the Forge-less approach here:

    http://ravihonakamble.blogspot.com/2015/07/oracle-commerce-11x-how-to-define.html

    Let me know if you have any doubts during setup.

    Regards,
    Ravi

  3. Hi Ravi,

    Thanks for your reply. I have defined the below mappings:

    "document.text" : {
      "propertyDataType" : "ALPHA",
      "mergeAction" : "ADD",
      "sourcePropertyNames" : ["Endeca.Document.Text"],
      "jcr:primaryType" : "endeca:property",
      "isRecordSearchEnabled" : true,
      "isRecordFilterable" : true
    },
    "document.name" : {
      "propertyDataType" : "ALPHA",
      "mergeAction" : "ADD",
      "sourcePropertyNames" : ["Endeca.FileSystem.Name"],
      "jcr:primaryType" : "endeca:property",
      "isRecordSearchEnabled" : true,
      "isRecordFilterable" : true
    }


    I tried the above mappings and was able to index the data. But I am a bit confused when trying to search for contents in the JSP reference application on the "document.text" property (this holds the full crawled content of the PDF file).
    I am facing the following issues:

    1. Even though my record contains the rollup key, when I search for content with (Property=All, match_mode=All, term=Contents) in the JSP reference application, no results are returned.
    But when I use (Property=document.text, match_mode=All, term=Contents), it gives me the proper result.

    So do you know why this misbehaves at the MDEX level? Any solution to overcome this situation?

    2. The PDF files I crawl are larger than 2 MB each. Is this happening because the search data is large?

    3. Do I need to specify extra parameters when defining the property (document.text), apart from the above?

    Thanks,
    Swapnil

  5. Hi Swapnil,

    You need to add the above two properties to the "All" search interface in order to search across those fields.

    Here are the steps for the Forge-less approach:

    1) Go to the MDEX folder:
    /opt/app/endeca/apps/CRS/config/mdex

    2) Add the newly created properties to the below two files:
    CRS.recsearch_config.xml
    CRS.recsearch_indexes.xml

    3) Run indexing.

    I suggest creating a separate search interface for the crawled content and using it rather than the All search interface.

    Let me know if you are still facing issues.

    Regards,
    Ravi

  6. Hi Ravi,

    Thank you very much! I really appreciate your help; it worked very well for me. I just wanted to know, for the CAS-based approach, which Oracle Guided Search (11.1) document specifies all these configuration details (which we previously configured through the Developer Studio pipeline).

    Thanks,
    Swapnil

  7. Hi Swapnil,

    Glad that it worked for you :).

    You can find more details in the Endeca Developer Guide. Here is a documentation-related blog post where you will find version-specific links:

    http://ravihonakamble.blogspot.com/search/label/MDEX%20documentation

    Regards,
    Ravi

  8. Hi Ravi,

    Your blog is really helpful.

    I am working with the above approach to show content from WebCenter Sites in CRS, on a virtual machine (demo machine).

    I have configured everything properly.

    Records from WCS are in StoreContentRepository and indexing completes properly.

    My "endeca_jspref" application shows results for WCS content very well. I have removed filters from CRS.
    But on hitting search in the OOTB CRS, I am not getting a proper result list.

    It gives me 3 records for a single article (not consolidated); in short, the rollup key is not applied during search.

    Other records (like OOTB products and PDF records) are working fine.

    Do you have any suggestions or checks for me?

    Regards,

    Shailesh Mane

    Replies
    1. Hi,

      Found a solution for that.

      Adding a new rollup key named 'product.repositoryId' in store-content-article-config.xml solved my problem.

      We need to do this because CRS consolidates results according to product.repositoryId (correct me if I'm wrong).

      Thanks,

      Shailesh Mane

    2. Hi Shailesh,

      For article records CRS defines a separate rollup key, and you can use it to fetch results. CRS has a common key between all three records that you were seeing and uses it as the rollup key.

      Regards,
      Ravi

  9. Hi Ravi,

    Is there any way to apply one or more rollup keys when querying the MDEX? Is there any OOTB CRS-supported component available? I am asking this because I have seen around 4 rollup keys applied in the "Index-Config.json" file in CRS. But in Shailesh's case he is configuring one by tweaking the respective SCAC, SCIC, etc. files, as described above.

    Do we have any other workaround for achieving this?

    Regards,
    Swapnil

    Replies
    1. Hi Swapnil,

      It always depends on what requirements you have and how you are going to design it. You can definitely define multiple rollup keys. Here is one example of records from various sources:

      http://www.walgreens.com/search/results.jsp?recType=product&Ntt=soap

      All three tabs make separate calls to Endeca to get results based on the record type defined, and if needed the results can be rolled up on a common key.

      Regards,
      Ravi

  10. Hi Ravi,

    The situation is, I'm trying to add derived properties according to your post
    http://ravihonakamble.blogspot.com/2015/07/oracle-commerce-11x-how-to-define.html
    The .derived_props.xml file is added into /apps/appname/config/mdex, according to my last-mile-crawl configuration.
    And I've added a roll-up key in endeca_jspref.

    Next, I ran the baseline index, but the CAS output log shows:
    INFO [cas] [cas-appname-last-mile-crawl-worker-3] com.endeca.itl.executor.output.mdex.MdexConfigurationTransformer.[appname-last-mile-crawl]: skipping copying of non-MDEX config file: .derived_props.xml
    So in the folder /apps/appname/data/cas_output, appname.derived_props.xml isn't updated. (I changed another file in the mdex folder and it did get copied to cas_output.)
    The same thing also happens with:
    CRS.recsearch_config.xml
    CRS.recsearch_indexes.xml

    So now, sadly, when I click on a rollup key, the "DERIVED PROPERTIES:" label shows an empty result.
    Would you suggest anything that I can do?
    Thanks,
    Leung

    Replies
    1. I've also tried putting derived_props.xml in the pipeline folder only and then running the baseline index. It doesn't help display the derived prop either.

    2. Are you using the Forge-based approach or the Forge-less approach?

      The thread above is for the Forge-based approach; if you are doing Forge-less, then use the doc below:

      http://docs.oracle.com/cd/E55323_01/Common.111/pdf/CommerceAdminGuide.pdf#page=144

      Regards,
      Ravi

  11. Hi Ravi,

    I need some help.
    I am trying to combine the web crawler output with the ATG product catalog output in the Oracle Guided Search CAS-based indexing approach.
    But I have seen that the web crawler output doesn't contain a property like "record.id";
    instead it creates "Endeca.Id".
    In my case I need to map the "Endeca.Id" (from the web crawler output) to "record.id",
    OR
    add a new property at crawl time.

    Could you please suggest the best way to achieve this?

    Regards,
    Shailesh

    Replies
    1. Hi Shailesh,

      Were you able to resolve this issue? I am having the same issue now. Any help is appreciated!

  12. Hi Ravi,

    Very nice content; it helped me a lot in learning. I have a query regarding migration: how do I migrate dimension data from 11.0 to 11.1 (Forge-less)? Is there any utility to generate the input file for 11.1 CAS from an 11.0 dimension export?

  13. Hi Ravi,

    How are record manipulators used in the CAS-based approach? Please let me know the steps.

    Thanks
    Sandeep Dandin

  14. This comment has been removed by the author.

  15. Hi Ravi,

    It has been nice going through your blogs, although I am stuck on one basic question.

    In Forge we can create record manipulators where we write tags. Do we have a similar feature in the Forge-less approach? If not, what is the workaround? A quick response will be much appreciated.

    Sumit

    Replies
    1. Hi Sumit,

      There are many ways to handle it. Here are two different ones:

      1) On the ATG side, using components injected into /atg/commerce/endeca/index/ProductCatalogSimpleIndexingAdmin/

      2) For Forge-less, use CAS-based record manipulators

      Regards,
      Ravi

    2. Hi Ravi,

      I was wondering if we have something similar to the record assembler in the CAS-based approach. I came to know that the CAS record merger only uses a switch join, which means the two record stores should have the same property names. In the record assembler we use a left join. Can you help me with this?

      Sumit

      Note that the only thing missing compared to Forge is the ability to join multiple record store instances. The record store merger can only perform a switch join, which is a union of records; it cannot perform a left or right join that combines the source records into one record. If such a join is required between data from multiple sources, it should be accomplished externally, on the ATG source side, before loading the data into the record stores.
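      The distinction above can be sketched in a few lines of Python (illustrative only, not Endeca code; record stores are modeled as lists of dicts keyed on record.id):

```python
# Switch join (what the record store merger does): a union of whole records.
def switch_join(left, right):
    seen, out = set(), []
    for record in left + right:
        if record["record.id"] in seen:
            continue  # duplicate key: record discarded, fields are NOT combined
        seen.add(record["record.id"])
        out.append(record)
    return out

# Left join (what it does NOT do): enrich left records with right-store fields.
def left_join(left, right):
    by_id = {r["record.id"]: r for r in right}
    return [{**by_id.get(r["record.id"], {}), **r} for r in left]

catalog = [{"record.id": "P1", "price": "9.99"}]
crawl = [{"record.id": "P1", "document.text": "manual text"}]
# switch_join(catalog, crawl) keeps a single P1 record with only the catalog fields;
# left_join(catalog, crawl) would yield one P1 record carrying both price and document.text.
```

      This is why field-level enrichment across sources has to happen upstream, before the records reach the record stores.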


      Regards,
      Ravi

  16. Hi Ravi,

    I have seen this note:
    Endeca CAS uses record.id as unique identifier. We can define our own.

    Please tell me, how and where can I do this and where can I overwrite existing configurations?

    Kindest regards,
    Heiko

  17. Hi Ravi,

    I have a question. I am using the web crawler to crawl data from some internal sites. Records are generated with Endeca.Id as the unique key. I have added my crawler's record store instance to the last-mile crawl of my Endeca application, but these web crawler records are getting skipped during the baseline. The other record store instances in the last-mile crawl have record.id as the unique key, and those are getting indexed. Can you suggest what to do?

    Replies
    1. Adding a modifying script manipulator in the crawl and adding a new property as record.id resolved my issue:

      idPropertyValue = record.getPropertySingleValue("Endeca.Id");
      record.addPropertyValue(new PropertyValue("record.id", idPropertyValue.value));
      logger.info("Processed Record: " + idPropertyValue.value);

      Maybe it works for you as well.

    2. Hey, thank you so much. It worked well. I'm able to index both data sources now.

  18. This comment has been removed by the author.

  19. Hi Ravi,

    I am trying to migrate web crawling from CAS + Forge to CAS only. In the previous setup, I had a record manipulator to ensure it removes error pages etc. Can I achieve the same thing directly in CAS? Currently, I am writing the web crawl output directly to a record store. Please let me know if it can be done.

    Regards,
    Sumit

    Replies
    1. Problem resolved. Added manipulators in last-mile-crawl.

  20. Hi Ravi,
    Can you tell me how I can create a delta pipeline in conjunction with my existing baseline pipeline? I want to do an incremental crawl from the DB but am not sure how to proceed. Please help.

    thanks

  21. Hi Ravi

    Could you help me out here, please?
    Can you tell me which two queries can be fired together: record and aggregation, record and navigation, record and dimension, or navigation and dimension?

  22. Hi Ravi,

    Can you please help me in the below content?

    I am trying to merge the file system record stores (generated with the ID Endeca.Id) and the OOTB data record stores (generated with common.id).

    While indexing the data, I am getting an issue like "missing source key common.id".

    I tried multiple ways to resolve this. I modified data-recordstore.xml under APPNAME/config/cas/ by changing the record spec ID to 'Endeca.Id', but then it fails for the other data source.

    Could you please suggest? Should I write a manipulator to change the record ID to 'common.id' for both?

    Thanks,
    Prasanthi b

    Replies
    1. Resolved the issue using CAS's ModifyScriptManipulator.

  23. Hi Ravi,

    I have an issue regarding my indexing: my baseline indexing completes successfully, but I am not able to see any products in endeca_jspref.
    Can you please guide me on how to resolve this issue?

    thanks in advance

  24. Hi Ravi,

    I have an issue with Endeca indexing. I created a new InterestOutputConfig component and added it to the ProductCatalogSimpleIndexingAdmin component. After indexing, I can see the generated IDs correctly in dyn/admin on the InterestOutputConfig component.

    But the new record type is not shown in jspref. I also indexed a list of interests with products, which shows up on the ProductCatalogOutputConfig component, but not in jspref.

    Can you please guide me or suggest anything that is missing?

    Thanks in advance
