DBLookup Optimization - Now 700% Faster (version 3.2.90k)
By Michael Bock, Founder & CEO - October 13, 2014
We have been working hard to optimize our DBLookup formulas. They provide really good functionality and flexibility when you need to include data from related tables without much work, but they did slow down the execution of transfers to some extend.
DBLookup formulas were all executed in the central service (actually all formulas were only executed in the central service). If a RapidiConnector is used, then the central would ask the connector to retrieve the data from the database. It would do that for each field you retrieve from the database using a DBLookup formula (it would send a message to the RapidiConnector, and the connector would ask the database for the value, and send it back to the central).
This can be quite time consuming and makes these transfers relatively slow. Specifically with MS Dynamics AX2012 Standard transfers we need to use a lot of DBLookups because a lot of data resides in sub-tables (e.g. address fields are no more available on the Customer record itself, but you need to go through the Party table and get the data from related tables or views).
DBLookup formulas are cached (and have been all the time - nothing new there). The cache works so, that every time a field value is retrieved for some combination of table and key field/value, then this result is stored in cache and returned next time the same DBLookup is performed. Cache is on the Transfer level - i.e. when the Transfer has finished, the cache is deleted. The cache is good especially when you retrieve the same field values again and again - for example using DBLookup to get the Id for a salesperson/user and using that Customers or Invoices (asuming that you have many customers with the same Salesperson). But when retrieving for example several address fields for each Customer, a lot of time was spend in communication - we needed to do better.
Goals for the Optimization
We had some discussion on what we could do - there where a number of different suggestions ranging from manually (or automatically) building new Views in the database to read all data needed at the same time (which is a perfect solution for execution speed, but is time consuming for each installation and hard to maintain) or creating new ways to setup which fields to read and from where to read them (a new kind of Gather transfer). These solutions were all good, but they would also imply changes to all existing configurations for them to leverage the new features and implementers would have to learn new ways of doing things etc.
So in order to leverage our existing configurations and provide full value to both existing and new customers, we chose to do all the work in the background and just make our current flexible DBLookup formula much faster.
Our goals were:
Make DBLookup able to retrieve multiple field values in one request to the database (for each table and using the same keys/values). This will save a lot of communication between the Connector and the Customers database system when more fields are retrieved from the same table.
Make DBLookup execute on the RapidiConnector and return the result to the central server for use on the central server (saving a lot of communication between central and connector).
Everything should work without changing existing Transfers ! So that we get the benefits in an existing service by just upgrading the central service and the connector.
What has been changed
A number of things have been changed in the code to accomplish this optimization of the DBLookup formulas.
1) DBLookup now works more on a record level, than on the field level. It learns from the DBLookups that it executes and dynamically builds a list of fields that it needs to retrieve from a given table and with a given set of key fields. This means that if you have:
DBLookup(‘SOURCEDS’,’Contact’,’CustomerNo’,”No”,’FirstName’) and also
in you transfer, then DBLookup will learn (when the second formula is executed) that you are really interested in retrieving both FirstName and LastName from the Contact table. For all subseqvent lookups to the Contact table, it will start by retrieving both fields and keep them both in cache. When the second DBLookup then executes, it can get the LastName from cache and a request to the database is saved.
2) Formulas can now execute on the RapidiConnector and the result is sent to the central service along with the rest of the data retrieved from the Source database.
The RapidiConnector will execute all formulas specified in the Field List and Link List if it is told to do so (this is the default setting, but it can be disabled if needed). The result of the formula execution is stored as part of the record that is read from the Source and will just follow the record back to the central service without any extra communication. The central service will then use these formula result values instead of executing the formulas itself (as it did before).
Some formulas cannot (by nature) execute on the RapidiConnector. This would for example be the case for a DBLookup using DESTDS to retrieve a value from the Destination database/system. The RapidiConnector reading from the Source database/system, would not know how and is not connected to the Destination system and hence it cannot execute this formula. When this is the case, the formula result is marked as "not executed" and it is left to the central service to actually execute this formula at a later stage.
More than 700% Performance Gain in Real Life Scenarios
The two first significant changes to how formulas are executed and how DBLookup formulas cache their results are together giving some very significant performance gains. When you use DBLookup to retrieve 4-5 fields from a related table, we have for example seen an initial upload of a large number of records get approximately 7 times faster.
The performance gain that you will see in your service and with your setup depends of course on your transfer setup. But if you are using DBLookup to get additional data from your SOURCE, then you should see performance gains. If you use DBLookup to get more than one field from the same table and are reading data through the RapidiConnector, then the performance gains will be significant !
You need to upgrade to 3.2.90k on both central service and on RapidiConnector. Please contact our support to help you analyse the benefits and do the upgrade.
About the author
Michael Bock, Founder & CEO
Michael founded Rapidi on technological excellence, fantastic customer service and continuous improvement. A data integration specialist since 1987, he remains focused on creating technology that solves real business problems.