
Product Export

Overview

The Catalog Service provides out-of-box product export facilities that can produce a flat file such as a CSV. This can be triggered through the Product Export Endpoint.

After the export is initiated, an event is produced on the processExportRequestOutput message channel and can then be consumed on the processExportRequestInput channel. ProcessExportRequestListener typically handles consuming these messages and defers processing of the requests to any configured ExportProcessors. In the case of a product export, this is the ProductExportProcessor.

ProductExportProcessor

The ProductExportProcessor handles converting a Product into a flat-file representation. It reads all of the entities to be converted and then defers the actual conversion into row maps to the ProductExportRowProducer. Each row is converted by an export row converter.

Export Row Converters

The export row converters handle converting a single entity instance into a map structure that represents a single row in a flat file. The mapping of entity fields to file headers is defined by ExportSpecifications. Thus, each entity that should be included in the flat file should have both an ExportSpecification and an export row converter (a sketch of a custom converter follows the list below).

Note

There are several out-of-the-box converters that have been implemented in order to transform a Java POJO into a Map<String, String> that can be written as a row for a file. These are all found in the com.broadleafcommerce.catalog.dataexport.converter package, and they implement Converter<[POJO type], Map<String, String>>.

The out-of-box converters, each with a corresponding ExportSpecification, are:

  • AttributeChoiceValueExportRowConverter

  • CategoryProductExportRowConverter

  • DimensionsExportRowConverter

  • IncludedProductExportRowConverter

  • ProductAssetExportRowConverter

  • ProductExportRowConverter

  • ProductOptionExportRowConverter

  • SpecificItemChoiceExportRowConverter

  • VariantExportRowConverter

  • WeightExportRowConverter
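
As an illustration, a custom row converter for a new entity might look roughly like the sketch below. The Warranty POJO, the header names, and the use of Spring's Converter interface are assumptions made for this example only; consult the out-of-box converters in com.broadleafcommerce.catalog.dataexport.converter for the exact contract and for how the produced headers must line up with the corresponding ExportSpecification.

    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.springframework.core.convert.converter.Converter;

    /**
     * Illustrative sketch only: converts a hypothetical Warranty POJO into the
     * Map<String, String> structure that represents a single flat-file row.
     * The map keys are the file headers; a matching ExportSpecification would
     * declare the same headers.
     */
    public class WarrantyExportRowConverter implements Converter<Warranty, Map<String, String>> {

        @Override
        public Map<String, String> convert(Warranty source) {
            Map<String, String> row = new LinkedHashMap<>();
            row.put("warrantyProvider", source.getProvider());
            row.put("warrantyDurationMonths", String.valueOf(source.getDurationMonths()));
            return row;
        }
    }

    /** Hypothetical POJO used only for this example. */
    class Warranty {
        private final String provider;
        private final int durationMonths;

        Warranty(String provider, int durationMonths) {
            this.provider = provider;
            this.durationMonths = durationMonths;
        }

        public String getProvider() { return provider; }

        public int getDurationMonths() { return durationMonths; }
    }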

Implementation Details

Special Cases

While most fields follow a single-value-per-cell format, certain "embedded collection" fields are represented as a multi-valued cell. Generally speaking, a multi-valued cell contains a single string in which special delimiters mark the boundaries between collection elements.

Attributes

Product attributes are an example of a special-case field that is exported into a multi-valued cell. Out of the box, the ProductExportRowConverter exports the entire Map<String, Attribute> into a single cell. Within each entry, the key and value are separated by a :, and entries are separated by a |. For example, a simple output could be: attrKey1:attrVal1|attrKey2:attrVal2.

If the value inside the Attribute is a Collection or an Array, the elements are joined together with , as the delimiter. For example, a potential output could be: singularAttr:singularVal|collectionAttr:collectionElem1,collectionElem2,collectionElem3|arrayAttr:arrayElem1,arrayElem2,arrayElem3.
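
To make the delimiter scheme concrete, the following is a minimal, self-contained sketch that reproduces the documented format (: between key and value, | between entries, , between collection elements). It does not use the actual Broadleaf converter classes and is only meant to illustrate the shape of the output cell.

    import java.util.Collection;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class AttributeCellExample {

        public static void main(String[] args) {
            // Attribute values may be single values or collections (arrays would be handled analogously).
            Map<String, Object> attributes = new LinkedHashMap<>();
            attributes.put("singularAttr", "singularVal");
            attributes.put("collectionAttr", List.of("collectionElem1", "collectionElem2", "collectionElem3"));

            // Key and value are separated by ":", entries are separated by "|".
            String cell = attributes.entrySet().stream()
                    .map(entry -> entry.getKey() + ":" + toCellValue(entry.getValue()))
                    .collect(Collectors.joining("|"));

            // Prints: singularAttr:singularVal|collectionAttr:collectionElem1,collectionElem2,collectionElem3
            System.out.println(cell);
        }

        // Collection elements are joined with "," as the delimiter; single values are written as-is.
        private static String toCellValue(Object value) {
            if (value instanceof Collection) {
                return ((Collection<?>) value).stream()
                        .map(String::valueOf)
                        .collect(Collectors.joining(","));
            }
            return String.valueOf(value);
        }
    }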

Note

Out of the box, only these two "multi-element" types, Collection and Array, are supported. Customizing this behavior is straightforward: extend ProductExportRowConverter and override the getMultiValRepresentationOfAttributes() method to change how the transformation occurs.

Tip
If you plan to import files that you export, the import logic should also be adjusted to support parsing your custom syntax/format.

Performance

We have done some performance testing on the product export process.

  • These tests have been performed on a 2017 15" MacBook Pro with 16GB RAM and a 2.8GHz Intel Core i7 with 4 cores.

  • Each product in the trials had 20 variants, 4 category-product relationships, and 10 product assets (34 total dependents per product).

  • The batch size used in all trials was 100, meaning 100 products would be processed at a time.

Individual export

Summary

For a 500-product export run via Docker, the average time across 5 export attempts was 30.8 seconds. When run directly from the command line via java -jar, the average time across 5 export attempts was 39.2 seconds.

Detailed Results

Varying numbers of products
Note
These trials were run with the debugger active.
Table 1. Trials with Varying Product Counts (run with debugger)
Name | Products | Variants per Product | Category Products per Product | ProductAssets per Product | Time Taken | Heap Usage (ignore before dashed vertical line) | Note
Export with 20 Products | 20 | 20 | 4 | 10 | 3 seconds | ExportOf20ProductsHeapUsage | Data insert of the items occurred at the beginning of this run.
Export with 100 Products | 100 | 20 | 4 | 10 | 13 seconds | ExportOf100ProductsHeapUsage | Data insert of the items occurred at the beginning of this run.
Export with 500 Products Attempt 1 | 500 | 20 | 4 | 10 | 84 seconds | ExportOf500ProductsAttempt1HeapUsage | Data insert of the items occurred at the beginning of this run.
Export with 500 Products Attempt 2 | 500 | 20 | 4 | 10 | 81 seconds | ExportOf500ProductsAttempt2HeapUsage | Uses the data already inserted in the first 500 product run.
Export with 500 Products Attempt 3 (ran outside of debugger) | 500 | 20 | 4 | 10 | 65 seconds | ExportOf500ProductsAttempt3HeapUsage | Uses the data already inserted in the first 500 product run. The dashed line in this graph indicates when the export finished, not when it started.

500 product export
Note
These trials were run without any debugger active, and were all executed on the exact same data set.
Table 2. Individual 500 Product Exports (run without debugger)
Name | Products | Variants per Product | Category Products per Product | ProductAssets per Product | Time Taken | Note
Export of 500 Products, Run in Docker, Attempt 1 | 500 | 20 | 4 | 10 | 47 seconds | Attempts 1-5 of "run in docker" were all run back to back.
Export of 500 Products, Run in Docker, Attempt 2 | 500 | 20 | 4 | 10 | 30 seconds | Attempts 1-5 of "run in docker" were all run back to back.
Export of 500 Products, Run in Docker, Attempt 3 | 500 | 20 | 4 | 10 | 25 seconds | Attempts 1-5 of "run in docker" were all run back to back.
Export of 500 Products, Run in Docker, Attempt 4 | 500 | 20 | 4 | 10 | 27 seconds | Attempts 1-5 of "run in docker" were all run back to back.
Export of 500 Products, Run in Docker, Attempt 5 | 500 | 20 | 4 | 10 | 25 seconds | Attempts 1-5 of "run in docker" were all run back to back.
Export of 500 Products, Run with "java -jar" from the command line, Attempt 1 | 500 | 20 | 4 | 10 | 45 seconds | Attempts 1-5 of "run with java -jar" were all run back to back.
Export of 500 Products, Run with "java -jar" from the command line, Attempt 2 | 500 | 20 | 4 | 10 | 38 seconds | Attempts 1-5 of "run with java -jar" were all run back to back.
Export of 500 Products, Run with "java -jar" from the command line, Attempt 3 | 500 | 20 | 4 | 10 | 41 seconds | Attempts 1-5 of "run with java -jar" were all run back to back.
Export of 500 Products, Run with "java -jar" from the command line, Attempt 4 | 500 | 20 | 4 | 10 | 36 seconds | Attempts 1-5 of "run with java -jar" were all run back to back.
Export of 500 Products, Run with "java -jar" from the command line, Attempt 5 | 500 | 20 | 4 | 10 | 36 seconds | Attempts 1-5 of "run with java -jar" were all run back to back.

Concurrent exports

We ran tests in which 10 exports of 500 products each were initiated at the same time. The variable in these trials was the spring.cloud.stream.bindings.process-export-request-input.consumer.concurrency property, which defines how many exports are processed simultaneously.
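
For reference, a consumer concurrency of 5 (the best-performing setting in the trials below) could be configured in application.yml roughly as follows. The property name is taken directly from the trials; verify the binding name against your own Spring Cloud Stream configuration.

    spring:
      cloud:
        stream:
          bindings:
            process-export-request-input:
              consumer:
                # number of export request messages processed simultaneously
                concurrency: 5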

Note
These trials were run without any debugger active, and were all executed on the exact same data set.
Note
The MaxHeapSize limit was set to -Xmx1536m for all trials.

Summary

The optimal concurrency setting appears to be 5.

With a concurrency of 5, all 10 exports completed in only 72% of the time taken with a concurrency of 2 when run in Docker (133 seconds vs. 184 seconds), at the cost of 25% more peak memory usage. When run directly from the command line, they completed in only 62% of the time taken with a concurrency of 2 (155 seconds vs. 251 seconds), at the cost of 27% more peak memory usage.

Increasing the concurrency to 10 had diminishing returns: all 10 exports completed in 98% (in Docker, 131 seconds vs. 133 seconds) and 90% (from the command line, 140 seconds vs. 155 seconds) of the time taken with a concurrency of 5. This was at the cost of 20% and 9% more memory usage, respectively.

Detailed Results

Table 3. Concurrent 500 Product Exports (run without debugger)
Trial Name | Total Time Taken | GCEasy Report | JVM Memory Size | Heap Usage | Products per Export | Variants per Product | Category Products per Product | ProductAssets per Product
10 Exports Run in Docker, Concurrency of 2 | 184 seconds | Link | 10ExportsRunInDockerConcurrency2JVMMem | 10ExportsRunInDockerConcurrency2HeapUsage | 500 | 20 | 4 | 10
10 Exports Run in Docker, Concurrency of 5 | 133 seconds | Link | 10ExportsRunInDockerConcurrency5JVMMem | 10ExportsRunInDockerConcurrency5HeapUsage | 500 | 20 | 4 | 10
10 Exports Run in Docker, Concurrency of 10 | 131 seconds | Link | 10ExportsRunInDockerConcurrency10JVMMem | 10ExportsRunInDockerConcurrency10HeapUsage | 500 | 20 | 4 | 10
10 Exports Run with "java -jar" from the command line, Concurrency of 2 | 251 seconds | Link | 10ExportsRunWithJavaJarConcurrency2JVMMem | 10ExportsRunWithJavaJarConcurrency2HeapUsage | 500 | 20 | 4 | 10
10 Exports Run with "java -jar" from the command line, Concurrency of 5 | 155 seconds | Link | 10ExportsRunWithJavaJarConcurrency5JVMMem | 10ExportsRunWithJavaJarConcurrency5HeapUsage | 500 | 20 | 4 | 10
10 Exports Run with "java -jar" from the command line, Concurrency of 10 | 140 seconds | Link | 10ExportsRunWithJavaJarConcurrency10JVMMem | 10ExportsRunWithJavaJarConcurrency10HeapUsage | 500 | 20 | 4 | 10