When tuning a Dataflow, many developers are looking for that “Silver Bullet” – a quick and easy solution. My experience is that there are few “Silver Bullets” in Data Services and redesigning a Dataflow usually provides the best results. Effective tuning requires a data set and hardware that actually mimics production. Tuning on a development server with 5% of the production data, 8 GB of RAM, one CPU and maybe two cores won’t show you how a Dataflow that uses Degrees of Parallelism greater than one will actually run on a production server with eight CPUs. Developers have to take into account not only the hardware the job server is running on but the hardware the database server is running on. Spawning multiple threads on single CPU database server won’t have the same effects as on a multi-CPU database server.
If we can’t benchmark on data and hardware similar to production then we’ll need to have a better understanding of what the Dataflow is really doing and when certain features are going to be effective.
Number of Loaders
The Number of Loaders feature creates buffers for the database operations. The default is one, meaning that you have a single buffer for the rows going to the target table. The Dataflow writes to these buffers in a round robin manner. If you set up 5 loaders and the commit row size is 1,000 then 5,000 rows will be buffered up. Once the buffer is full it is flushed to the database. Because of the round robin allocation, all five buffers are sent to the database at almost the same time.
The benefit of five buffers is that five processes are writing rows to the database instead of one. The downside is that five processes are writing rows to the database. If your database can’t handle that much activity then your overall processing time could increase instead.
The buffers are being written to the database in parallel, as long as the Rows per commit setting is high enough.
To test this I created a table that has an insert trigger on it that sets an UPDATED_DT (timestamp(9) data type) and UPDATED_BY column from a trigger using SYSTIMESTAMP and SYS_CONTEXT(‘USER_ENV’,’SID’), respectively. The Dataflow target table has Number of Loaders set to 5 and the Rows per commit set to 1,000. I then processed 30,000 rows through the Dataflow.
After loading the data I queried the table, sorting by UPDATED_DT DESC. The UPDATED_BY column value was mixed. See the screenshot below. The results tell me that each of the five buffers wrote rows to the database in parallel. If the five buffers were running in series then the UPDATED_BY values would be the same for each set of 1,000 rows in the result set.
Interestingly enough, I found that the Rows per commit setting affected how the buffers were written to the database. If the setting was only 100 rows then the buffers were written in series. The same setup used above was used to test the lower Rows per commit value.
Some tables are just slow to insert. This could be due to triggers, a large number of indexes or very wide rows. Using multiple parallel processes to perform the inserts may improve the overall execution time in these cases.
Number of Loaders will work for all three row operation types: insert, update and delete.
Certain table designs may not lend themselves to multiple loaders. Clustered indexes in Microsoft SQL Server and Index Organized Tables (IOT) in Oracle are one example. Within the table the rows are physically sorted in the order of the primary key. If the data is inserted out of order then the database has to bounce between pages/blocks and may perform a number of page/block splits to insert data. On the other hand, if the data is sorted in the Dataflow to match the primary key of the target table and a single loader is used then the database won’t have to do anything special with the data. It will be inserted in the same order it is received.
Certain index types could prevent you from using Number of Loaders greater than one. Oracle bitmap indexes can be very finicky about being updated by parallel processes. In a high transaction load environment I saw deadlocks on bitmap indexes when the Number of Loaders was more than one. Regular b-tree indexes worked fine.
The Number of Loaders feature is turned off if the Bulk Loader option is turned on. It also doesn’t work if the full operation of the Dataflow is pushed down to the database.
When the Number of Loaders is increased from the default of one remember that each additional buffer increases the amount of RAM used by the Dataflow. If a single loader uses a commit setting of 5,000 then consider reducing the Rows per commit setting by the Number of Loaders. If five loaders are used then consider changing the commit setting to 1,000. The Rows per commit setting can have a significant impact on performance. Be sure to test with both higher and lower values.
Number of Loaders is one of my “last resort” tuning features. Once I’ve done all I can to tune a Dataflow and it isn’t as fast as I need it to be I’ll begin testing different Number of Loader values. Because of the amount of test setup time required to run multiple tests this feature often isn’t worth the bang for the buck. There are many tuning areas that are more effective and I’ll try those first. It’s very rare that I’ll have to resort to setting Number of Loaders greater than one.
If I could redesign the Number of Loaders feature I would eliminate the round robin allocation. The Dataflow would fill buffer #1 and then flush the buffer to the database and then begin filling buffer #2. I suspect that the way it currently works, while the buffers are being flushed to the database the rest of the Dataflow is taking a nap. If buffers were filled one at a time then it might be possible to be writing to the database and keep processing incoming data at the same time. Only when all buffers are still flushing to the database would the Dataflow take a nap.