BODL, Massload, DataLoad : WCS Dataload Options

In previous releases of WCS, the mass load utility was the primary out-of-the-box tool for loading data into the WCS database. With the beta release of WCS 7, IBM introduced a new option known as the Data Load utility. A third name is worth knowing as well: BODL is an asset developed by IBM Software Services for WebSphere to address some of the shortcomings of ID resolver/massload. The Data Load utility is in line with the BODL architecture, and technically the two are not very different.

Officially, IBM still recommends the massload-based approach for less commonly used data, and the Data Load utility for loading product, price, and inventory data efficiently.


Overview of Dataload Utility



1. DataReader: a Java component that reads the source file. For the CSV reader, you describe the structure of the source data in a configuration file, and the DataReader uses this information to load the data.

The DataReader component implements a next() method that returns one chunk of data read from the data source. This does not scale well for high volumes of data, because every row becomes a new object on the JVM heap.
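
The row-at-a-time pattern can be sketched as follows. This is only an illustration written in Python, not the actual DataReader interface: the class name CsvRowReader, its constructor arguments, and the column names are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List, Optional


class CsvRowReader:
    """Hypothetical row-at-a-time reader; NOT the IBM DataReader API."""

    def __init__(self, source: io.TextIOBase, columns: List[str]) -> None:
        # csv.reader pulls lines lazily from the underlying text stream.
        self._rows: Iterator[List[str]] = csv.reader(source)
        self._columns = columns

    def next(self) -> Optional[Dict[str, str]]:
        """Return the next record as a newly allocated dict, or None at end of input."""
        row = next(self._rows, None)
        if row is None:
            return None
        # One new object per source row: this is the per-record cost discussed above.
        return dict(zip(self._columns, row))


def count_records(reader: CsvRowReader) -> int:
    """Drain the reader, counting how many record objects were created."""
    count = 0
    while reader.next() is not None:
        count += 1
    return count
```

A caller loops on next() until it returns None, so a file with ten million rows means ten million short-lived record objects passing through the heap.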

In my experience, parsing a source file with a Java component is a poor choice for high-volume data loads. By comparison, SQL*Loader is a high-speed utility that loads data from external files directly into tables in an Oracle database.

2. Mediators: mediators are available out of the box. CatalogMediator, for instance, is used when you load catalog data: it populates the physical object for the CATALOG table from the catalog logical object.
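
The logical-to-physical mapping that a mediator performs might look roughly like this. It is a sketch under assumptions, not the CatalogMediator implementation: CatalogLogical, catalog_to_physical, and the CATALOG column names used here are illustrative and should be checked against your schema.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class CatalogLogical:
    """Hypothetical logical (business object) view of a catalog."""
    identifier: str
    owner_id: int
    description: str


def catalog_to_physical(catalog: CatalogLogical, catalog_id: int) -> Dict[str, Dict[str, Any]]:
    """Map one logical catalog to physical rows, keyed by table name.

    The column names are assumptions made for this sketch.
    """
    return {
        "CATALOG": {
            "CATALOG_ID": catalog_id,          # resolved primary key
            "MEMBER_ID": catalog.owner_id,     # owning store or organization
            "IDENTIFIER": catalog.identifier,  # external name of the catalog
            "DESCRIPTION": catalog.description,
        }
    }
```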

3. In my opinion, the Data Load utility is still not a serious contender for data loading: it ships with very few readers and supports very few business objects. I also could not figure out how to write custom logic, as it appears to map source fields one-to-one to destination fields.

Overview of BODL
  • Mostly an asset of the IBM services group; it is available free of charge, but only to WCS customers.
  • BODL is a set of Java files that works much like the Data Load utility. It has more readers, can be customized, and, surprisingly, can handle more components than the Data Load utility (I would imagine IBM will fold all of the BODL features into the Data Load utility in future versions).
  • IBM does not publish any documentation or sample code for BODL; it is available only on request to WCS customers.

Overview of Massload

  • Source data must be converted to XML (based on the DTD generated with the WCS DTDgen utility).
  • The generated XML must be ID-resolved using the idresgen utility.
  • The ID-resolved XML file is then loaded using the massload utility.
  • Can have serious performance issues with large record sets; this is not the most efficient data load option.
  • Debugging errors is very difficult with this data load option.
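
The first step, turning source rows into XML for the ID resolver, could be sketched like this. The layout assumed here (an import root element, one element per table row, columns as attributes, and an "@..." placeholder for a key that is still to be resolved) is an assumption to verify against the DTD generated for your schema; rows_to_massload_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_to_massload_xml(table: str, rows: List[Dict[str, str]]) -> str:
    """Build an XML document with one element per row and columns as attributes.

    The root element name and the per-row layout are assumptions for this sketch.
    """
    root = ET.Element("import")
    for row in rows:
        # Each row becomes <table col1="..." col2="..."/>.
        ET.SubElement(root, table, attrib=row)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        # "@catentry_id_1" stands in for an ID the resolver step would fill in.
        {"catentry_id": "@catentry_id_1", "partnumber": "P100"},
        {"catentry_id": "@catentry_id_2", "partnumber": "P200"},
    ]
    print(rows_to_massload_xml("catentry", sample))
```

The resulting file would then go through ID resolution and the mass loader, which is where the three-pass cost of this approach comes from.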

Custom Dataload Options

  • Some commercial ETL tools may not be a good fit, as WCS primary key generation and translation logic can get very messy.
  • In my experience, Java is not the best language for high-volume data processing and data translation. If you understand the WCS data model well, you can write your own data processing tools using SQL*Loader, PL/SQL, or Python scripts. Whichever technology you use, at the end of the day you are inserting records into WCS tables using SQL.
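
As a rough sketch of what a custom loader has to handle, the example below reserves a block of primary keys from a KEYS-style counter table and then inserts rows with plain SQL. It uses an in-memory SQLite database as a stand-in. The simplified KEYS and CATENTRY definitions, the meaning given to COUNTER (last key handed out), and the function names are assumptions for illustration; a real loader would also need row locking on the counter and the full column set.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in schema; the real WCS tables have many more columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE KEYS (TABLENAME TEXT PRIMARY KEY, COUNTER INTEGER NOT NULL)")
    conn.execute("CREATE TABLE CATENTRY (CATENTRY_ID INTEGER PRIMARY KEY, PARTNUMBER TEXT NOT NULL)")
    conn.execute("INSERT INTO KEYS (TABLENAME, COUNTER) VALUES ('catentry', 10000)")
    conn.commit()
    return conn


def reserve_keys(conn: sqlite3.Connection, table: str, count: int) -> int:
    """Advance the counter by `count` and return the first reserved key.

    Assumes COUNTER holds the last key already handed out.
    """
    row = conn.execute("SELECT COUNTER FROM KEYS WHERE TABLENAME = ?", (table,)).fetchone()
    if row is None:
        raise KeyError(f"no key counter registered for table {table!r}")
    last_used = row[0]
    conn.execute("UPDATE KEYS SET COUNTER = ? WHERE TABLENAME = ?", (last_used + count, table))
    return last_used + 1


def load_parts(conn: sqlite3.Connection, partnumbers: List[str]) -> List[Tuple[int, str]]:
    """Reserve keys and insert all rows in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        first = reserve_keys(conn, "catentry", len(partnumbers))
        rows = [(first + i, pn) for i, pn in enumerate(partnumbers)]
        conn.executemany("INSERT INTO CATENTRY (CATENTRY_ID, PARTNUMBER) VALUES (?, ?)", rows)
    return rows
```

Reserving the whole block of keys in one update, rather than one key per row, keeps the counter table from becoming the bottleneck on large loads.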




17 comments:

  1. Hi,
    From your evaluation, is it an easy migration from BODL to DataLoad? I'm currently supposed to migrate some WCS batch processes to BODL, and our environment is WCS 6, so I wanted to get a feel for how close it is to DataLoad.

  2. Hi,

    I regularly read your blog for WebSphere Commerce. I am working on WCS 6. One of the major issues we face regularly is the upload of images and the EAR update. The ScheduledContentManagedFileEARUpdate job simply fails to run sometimes.

    It would be very helpful if you could publish something about the EAR update, i.e. which tables are updated while updating the EAR, which classes are involved, and so on: the data model for EARUpdate.

    Thanks

  3. I'm glad to see that you find this blog interesting. I have added your request to my queue of TODOs for next month.

  4. Mr Anonymous, I don't think it is going to be an easy task to migrate your BODL code to Dataload. Architecturally they are similar, but I still have not been able to figure out how WCS 7 Dataload supports transformation logic.

  5. Hi Hari, I read through your blog recently; many topics are very informative, especially for people like me who work on WCS. I am currently working on WCS V7. I would like to know whether there is any OOB option in WCS to create multiple wish lists for a registered customer.
    Looking at the DB schema, it seems possible to have multiple wish lists. IITEMLIST, IITEM and CITEMLIST are the tables.
    Any help is much appreciated.

  6. I am currently working on WCS V7. I am trying to load data into two tables (CATENTRY and CATENTDESC) by running BODL (data load utility) with batch SQL enabled. The load into both tables reports success.

    However, when I check CATENTDESC, the SHORTDESCRIPTION and LONGDESCRIPTION columns have not been populated. Any help is much appreciated.

    Thanks
    karthik

  7. Hi, I had ten days of training on WCS basics and need to take the WCS 6 certification. How should I prepare for this exam?

  8. Thanks for sharing your post; it was superb. I would like to hear more from you in future too.

  9. Funny how this gets to be a thread dump on a bunch of different topics. For wider distribution I would put your post into the WebSphere newsgroup. BODL, Dataload, and Massload are all flavors of the same loading dilemma that has been plaguing people for years. BODL is actually better in some ways, but it still has the main Achilles heel of requiring you to program a solution. This is why I wrote DynaLoad years ago, and it still keeps paying for itself on our services projects. DynaLoad uses properties files and placeholders to let you load data into specific fields in the stock schema as well as custom tables.

    The key to most of this is that BODL is completely unsupported by IBM. So if you use BODL without IBM services, when/if you accidentally nuke your data, you're on your own.

    Dataload is the wrap that was done for the catalog upload feature using CSV. But Dataload, massload, and BODL all have to be used in a way that handles special characters, and that ties into the display JSP you are outputting: you need to make sure that characters that are supposed to appear in the output are not HTML-encoded, and vice versa.

    Converting from Dataload, massload, or BODL is a time-intensive process in every case. Another major shortcoming of most of these tools is that they do not use the database's native loading utilities, which is another reason to consider what it is you have to load. If you have low volume and simple datasets, any of the three will work, but BODL will be the most complicated to implement (Java coding). If you are loading high-volume datasets, then you probably want to use a combination, and maybe even something else, such as writing to the Oracle Data Pump format or the DB2 load format.

  10. Well said, George. I have seen your DynaLoad logic in a previous implementation. I think one thing that just doesn't work with BODL, Massload, Data Load, etc. is that they are all Java-based.

    I hope IBM some day comes up with other OOB alternatives, as they do for SiteMaps, e.g. a scripting-based data load utility. It should also have some sort of orchestration built in; to my knowledge we end up doing this with MQ, custom logic, and what not.

  11. Hi,

    I used the data load utility, and the catalog shows up in Management Center but not on my site. Any solution?

    Replies
    1. You can check whether those items are in a published state and mapped to a navigation category on your site.

  12. The query below hangs during data load.

    select USERS_ID from WCSRT.USERS a where ((FIELD3=:1 ) OR ((:"SYS_B_0" = :2) AND (FIELD3 IS NULL)));

  13. I want to do a post-processing audit and notification after data load. Once the OOTB data load is done, we have to check the data, for example the price of each product: if it is empty, add the product to a list, then send a mail to the user with that list of items. How can we do this? Any ideas?

  14. Nicely synthesized overview of the debate thus far. Looking forward to seeing how the conversation evolves further from here. Some keen observations made and food for thought offered up. Not definitive by any means but an insightful contribution.
