Broadleaf Microservices
  • v1.0.0-latest-prod

Import

Overview

This service includes out-of-the-box import implementations.

Complete Product Import

(since 1.8.1)

High Level Overview

The import type is PRODUCT, and the corresponding specification in ImportServices is CompleteProductImportSpecification.

  • This is intended to be used for creating and updating products in the data store.

    • This also includes creates/updates of related entities such as variants, category-product relationships, assets, and translations.

  • The syntax is simple and opinionated.

    • This import does not capture the complete product domain - only the most commonly useful fields.

    • The import only supports the creation of 'standard' and 'variant-based' products.

    • This import does not require 'correlation IDs' or 'operation types'.

  • Each product row must provide either an 'external ID' or a 'resource tier ID' (the context ID). This identifier uniquely identifies the product record and is used to check whether it already exists in the data store, which ensures the row is accurately treated as a 'create' or an 'update'. Similarly, each variant row must provide a 'SKU', an 'external ID', or a 'resource tier ID'.

Key Components

  • CompleteProductImportBatchHandler is the central component responsible for processing batches and persisting the necessary products, variants, category-products, assets, and translations.

  • ProductRowConverter is responsible for converting product rows into Product instances.

  • VariantRowConverter is responsible for converting variant rows into Variant instances.

  • CategoryProductColumnConverter is responsible for converting category-related columns from the product rows into CategoryProduct instances.

  • ProductImageColumnConverter is responsible for converting asset-related columns from the product rows into ProductAsset instances.

  • CatalogTranslationColumnConverter is responsible for converting translation-related columns from the product and variant rows into Translation instances.

Field Syntax and Processing

Simple Fields

Most fields are simple, and can fit into the 'one cell, one value' pattern. For these, the row converter components typically rely on Jackson’s ObjectMapper to reflectively set values on the POJO instance.

The expectation is for the specification to map the user-friendly header names to field names that match the POJO's property names (ex: Name → name, to match the property on Product). This allows the reflective mapping to work correctly.
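The real converters delegate the reflective step to Jackson's ObjectMapper. As a dependency-free illustration of the two steps (header-to-field renaming, then reflective assignment), the sketch below swaps in plain java.lang.reflect field access instead of Jackson. ProductStub and ReflectiveRowMapper are hypothetical names, and only String-typed fields are handled, whereas Jackson also performs type conversion.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical POJO standing in for Product; String fields only, for simplicity. */
class ProductStub {
    String externalId;
    String name;
}

/** Hypothetical sketch of header mapping plus reflective population (not Jackson). */
final class ReflectiveRowMapper {

    private ReflectiveRowMapper() {}

    /** Step 1: rename user-facing headers to field names; unmapped headers are dropped. */
    static Map<String, String> mapHeaders(Map<String, String> row,
            Map<String, String> headerToField) {
        Map<String, String> valuesByField = new LinkedHashMap<>();
        row.forEach((header, value) -> {
            String field = headerToField.get(header);
            if (field != null) {
                valuesByField.put(field, value);
            }
        });
        return valuesByField;
    }

    /** Step 2: set each value on the declared field whose name matches the mapped key. */
    static <T> T populate(T target, Map<String, String> valuesByField) {
        for (Map.Entry<String, String> entry : valuesByField.entrySet()) {
            try {
                Field field = target.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(target, entry.getValue());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot set field " + entry.getKey(), e);
            }
        }
        return target;
    }
}
```

If a header were mapped to a name with no matching field, populate would fail loudly, which is why the specification's field names must match the POJO exactly.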

Complex Fields

There are a handful of fields that are too complex to fit neatly into the 'one cell, one value' pattern. For such fields, the row converter components expect a certain structure with special delimiters to define complex values in a cell.

Note
See complex collections update semantics for more on updates.

Collections

  • Product Keywords

    • All keywords should be provided in the cell, with | used as the separator between keywords.

    • Ex: keyword 1|keyword 2

  • Product and Variant Attributes

    • Attributes should be provided in the cell, with :: used as the value assignment separator, and | used as the separator between attribute entries.

    • The keys are the keys in the attributes map, and the values are the value field on Attribute.

    • Ex: exampleAttribute1::Attribute 1 Value|exampleAttribute2::Attribute 2 Value

  • Product and Variant Fulfillment Flat Rates

    • All fulfillment flat rates should be provided in the cell, with :: used as the value assignment separator, and | used as the separator between fulfillment flat rate entries.

    • The keys are the fulfillment flat rate options, and the value should be the corresponding monetary value of the option.

    • Ex: FIXED_STANDARD::1.11|FIXED_PRIORITY::2.12|FIXED_EXPRESS::3.13

  • Product Options and Variant Option Values

    • The product options column value is different for product rows and variant rows, but their values are ultimately tightly correlated.

      Note
      This tight coupling results in update restrictions.

      The options column on product rows defines all the ProductOption instances on the parent product. Each one is created as a 'variant distinguishing' option with an AttributeChoice whose allowed values are collected from all the matching entries in the variant rows' options column values. The options column on variant rows contributes to the parent product option’s allowed values, and also defines the option values on the variant itself.

    • In product rows, each individual 'option token' should be separated by |.

    • Each individual 'option token' provided in product rows supports a handful of properties, and some are optional with defaults.

      • Within a particular individual option, :: is used as the value assignment separator for properties and their values, and ; is used as the property separator between properties.

      • Allowed properties:

        • label → this is the label used for the product option itself.

        • required → whether the attribute choice should be marked required

        • type → the type to set on the attribute choice

        • (since 1.8.5) attributeName → the name to set on the attribute choice

          Note
          Before 1.8.5, the attribute choice name was taken directly from label
      • Example individual option values:

        • label::Shirt Size;required::true;type::Size;attributeName::size

          • All properties explicitly provided

        • label::Shirt Size;required::true;type::Size

          • Implies attributeName from label

        • label::Shirt Size;type::Size;attributeName::size

          • Implies required is false

        • label::Size

          • Implies required is false, and attributeName will be implied from label. Furthermore, since the effective attributeName matches one of DefaultAttributeChoiceType, the value of type will also be set to match it.

        • Size

          • Implies required is false, and label and attributeName will be set to the given value. Furthermore, type will be set to match the effective attributeName since it matches one of DefaultAttributeChoiceType.

      • Note that type is only optional in situations where the given/effective attributeName value matches (ignoring case) one of DefaultAttributeChoiceType. This prevents inadvertent creation/usage of random types. If custom types need to be used, they should be explicitly provided.

    • In variant rows, each individual 'option value token' should be separated by |.

    • In variant rows, a few different syntaxes are accepted for each individual 'option value token'.

      • The simple syntax just has a single {key}::{value} pair. The key should match an attribute choice name from the corresponding product option, and the value should be set to the value for that attribute appropriate for the variant. In this case, the display label for the value will be the same as the value itself.

        • Example of an individual option value token with simple syntax: Shirt Color::Red

          • The referenced attribute choice name is Shirt Color, the choice value is Red, and the display label for that value will also be set to Red.

      • (since 1.8.3) The complex syntax requires supplying option value properties explicitly, with :: as the value assignment separator and ; as the separator between properties.

        • The following properties are supported:

          • attributeName → (always required) this should refer to an attribute choice name from the parent product option

          • value → (always required) this should be the actual value for that attribute appropriate for the variant

          • valueLabel → (optional, defaulted to value) this should be the display label to use for the value

        • Example of an individual option value token with complex syntax: attributeName::Shirt Color;value::Red;valueLabel::Red (Deep Red)

      • Regardless of which syntax is used to define the option value token, the 'allowed values' on the parent product option will be built using the details supplied in the 'first encounter' of a particular variant option value.

        • These semantics have the advantage of not requiring every single variant to provide all the same details for a choice value. The details just need to be in the first use of that value.

        • For example, suppose you had the following variant rows (in order):

          • variant 1: attributeName::Size;value::Small;valueLabel::Small (Ages 5-6)|attributeName::Color;value::Red;valueLabel::Red (Deep Red)

          • variant 2: Size::Small|Color::Black

          • variant 3: attributeName::Size;value::Medium;valueLabel::Medium (Ages 7-10)|Color::Red

          • variant 4: Size::Medium|Color::Black

        • This would result in the following allowed values on the parent option:

          • Allowed values of attribute Size

            • value: Small, label: Small (Ages 5-6) (first defined by variant 1)

            • value: Medium, label: Medium (Ages 7-10) (first defined by variant 3)

          • Allowed values of attribute Color

            • value: Red, label: Red (Deep Red) (first defined by variant 1)

            • value: Black, label: Black (first defined by variant 2)

    • Example 1

      • Product Row: label::Shirt Color;type::Color;required::true|label::Shirt Size;type::Size;required::true

      • Variant Row 1: attributeName::Shirt Color;value::Red;valueLabel::Red (Deep Red)|Shirt Size::Small

      • Variant Row 2: Shirt Color::Black|attributeName::Shirt Size;value::Medium;valueLabel::Medium (Ages 7-10)

    • Example 2

      • Product Row: label::Shirt Color;type::Color;required::true;attributeName::color|label::Shirt Size;type::Size;required::true;attributeName::size

      • Variant Row 1: attributeName::color;value::red;valueLabel::Red (Deep Red)|Shirt Size::Small

      • Variant Row 2: Shirt Color::Black|attributeName::size;value::medium;valueLabel::Medium (Ages 7-10)

  • Product and Variant Translations

    • Translations are defined as columns matching the following syntax in the input file: {headerName} TN:{locale}. For example: Name TN:es.

    • After the header-field mapping is completed, the field names will look like translation::{fieldName}::{locale}. For example: translation::name::es.

    • The cell value should just contain the translated value for that particular field

    • See TranslationDynamicHeaderFieldMapping in Import Services, and CatalogTranslationColumnConverter in Catalog Services

  • Category Product Relationships

    • Category relationships are defined under the category::1, category::2, category::3 columns

    • The category::1 value will be set as the primary category-product relationship

    • Cell values should just contain the names of the categories with which to establish relationships

  • Product Images

    • The assumption is that the values provided here reference an image-type asset that has been uploaded to Broadleaf’s Asset Service in a tenant/application matching the importing context.

    • Product images are defined under the image::1, image::2, image::3 columns

    • The image::1 value will be set as the primary image

    • The cell value can contain various properties, where :: is used as the value assignment separator, and ; is used as the separator between properties.

      • Allowed properties:

        • url → this is the url of the referenced asset

        • tags → tags to set on the product asset

          • The value should be a comma-separated list of tags (ex: tag1,tag2).

        • title

        • altText

      • Examples

        • /my-asset-url.jpg

          • url will be set to the given value, no other properties set

        • title::Bottle of Sweet Death;url::/Sweet-Death-Sauce-Bottle.jpg;altText::Sweet Death Sauce;tags::foo,bar,baz

          • All provided properties will be set to the provided values
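The simpler syntaxes above (keywords, key/value entries for attributes and fulfillment flat rates, image cells, and translation headers) can be illustrated with a small parser. This is a hypothetical sketch, not the converters' actual code: the ComplexCellParser name, the whitespace trimming, and the error handling are assumptions, and the results are raw strings rather than Attribute, ProductAsset, or Translation instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the cell and header syntaxes; not Broadleaf's API. */
final class ComplexCellParser {

    private ComplexCellParser() {}

    /** "keyword 1|keyword 2" -> [keyword 1, keyword 2]. */
    static List<String> parseKeywords(String cell) {
        List<String> keywords = new ArrayList<>();
        // '|' is a regex metacharacter, so it must be escaped for String.split
        for (String token : cell.split("\\|")) {
            if (!token.isBlank()) {
                keywords.add(token.trim());
            }
        }
        return keywords;
    }

    /** "key1::value1|key2::value2", as used for attributes and fulfillment flat rates. */
    static Map<String, String> parseEntries(String cell) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String entry : cell.split("\\|")) {
            putProperty(entries, entry);
        }
        return entries;
    }

    /** Image cell: a bare value is the URL; otherwise ';'-separated "name::value" properties. */
    static Map<String, String> parseImageCell(String cell) {
        Map<String, String> properties = new LinkedHashMap<>();
        if (!cell.contains("::")) {
            properties.put("url", cell.trim());
            return properties;
        }
        for (String property : cell.split(";")) {
            putProperty(properties, property);
        }
        return properties;
    }

    /** "Name TN:es" -> "translation::name::es", given the header-to-field mapping. */
    static String translationFieldName(String header, Map<String, String> headerToField) {
        int marker = header.lastIndexOf(" TN:");
        if (marker < 0) {
            throw new IllegalArgumentException("Not a translation header: " + header);
        }
        String field = headerToField.get(header.substring(0, marker));
        if (field == null) {
            throw new IllegalArgumentException("Unmapped header: " + header);
        }
        String locale = header.substring(marker + " TN:".length());
        return "translation::" + field + "::" + locale;
    }

    /** Splits on the first "::" only, so a value may itself contain "::". */
    private static void putProperty(Map<String, String> target, String assignment) {
        String[] parts = assignment.split("::", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected name::value but got: " + assignment);
        }
        target.put(parts[0].trim(), parts[1].trim());
    }
}
```

In a fuller version, the tags value would be split again on ',' and flat rate values parsed into monetary amounts (for example with BigDecimal) before being set on the domain objects.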

Creates and Updates

The import implementation allows creates and updates of products and their related entities. It does not delete any entities.

By default, the CompleteProductImportSpecification designates all fields required for new entity creation as required. This means even for 'updates', those fields must be provided.

Note

See the update implementation example for an alternative.

As discussed elsewhere, identifier fields are expected to be provided on each product/variant row. CompleteProductImportBatchHandler eagerly bulk-fetches entities by these identifiers, and the converters subsequently use the results to determine whether a particular row should be treated as a 'create' (entity was not found in the datastore by any of the provided identifiers) or an 'update' (entity was found in the datastore by provided identifiers). To prevent inadvertent creates/updates, it is critical to ensure any provided identifier values are accurate.
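The eager bulk-fetch and create/update classification described above can be sketched as follows. This is a simplified, hypothetical illustration rather than CompleteProductImportBatchHandler's code: the IdentifiedRow type is a stand-in for a product row, and the two lookup functions stand in for bulk queries against the data store.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of identifier-based create/update determination. */
final class CreateOrUpdateResolver {

    enum OperationType { CREATE, UPDATE }

    /** Stand-in for a product row; either identifier may be null, but not both. */
    static final class IdentifiedRow {
        final String externalId;
        final String contextId;

        IdentifiedRow(String externalId, String contextId) {
            if (externalId == null && contextId == null) {
                throw new IllegalArgumentException("Row needs an external ID or a context ID");
            }
            this.externalId = externalId;
            this.contextId = contextId;
        }
    }

    private CreateOrUpdateResolver() {}

    /**
     * Each lookup receives every identifier of its kind in the batch and returns the subset
     * that already exists. A row found by any provided identifier is an UPDATE.
     */
    static List<OperationType> resolve(List<IdentifiedRow> rows,
            Function<Set<String>, Set<String>> findExistingExternalIds,
            Function<Set<String>, Set<String>> findExistingContextIds) {
        Set<String> externalIds = new HashSet<>();
        Set<String> contextIds = new HashSet<>();
        for (IdentifiedRow row : rows) {
            if (row.externalId != null) {
                externalIds.add(row.externalId);
            }
            if (row.contextId != null) {
                contextIds.add(row.contextId);
            }
        }
        // One lookup per identifier type for the whole batch, rather than one query per row
        Set<String> existingExternal = findExistingExternalIds.apply(externalIds);
        Set<String> existingContext = findExistingContextIds.apply(contextIds);

        List<OperationType> result = new ArrayList<>();
        for (IdentifiedRow row : rows) {
            boolean found = (row.externalId != null && existingExternal.contains(row.externalId))
                    || (row.contextId != null && existingContext.contains(row.contextId));
            result.add(found ? OperationType.UPDATE : OperationType.CREATE);
        }
        return result;
    }
}
```

The sketch also shows why accurate identifiers matter: a mistyped external ID simply fails the lookup, and the row silently becomes a create.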

Update Considerations

The 'Complete Product Import' specification and handler implementation are intended to support simple update cases, and come with restrictions. If a more complex update use-case is required, it’s recommended to create a dedicated new import specification/handler for that purpose.

Update Semantics
Complex Collections

Product and variant both have complex collection fields, and update semantics for them vary in the import mechanism.

  • Product Keywords

    • If supplied, all existing keywords will be replaced with the input.

    • If not supplied, existing keywords will not be changed.

  • Product and Variant Attributes

    • Attributes follow granular update semantics, and are not fully replaced as a whole. For each input attribute, if the key matches an existing attribute, the value will be updated in-place for that specific attribute. If an input attribute key does not match an existing attribute, it will be newly added to the attributes.

    • If no values are supplied, existing data will not be changed.

  • Product and Variant Fulfillment Flat Rates

    • If supplied, all existing fulfillment flat rates will be replaced with the input.

    • If not supplied, existing fulfillment flat rates will not be changed.

  • Product Options and Variant Option Values

    • If supplied, all existing product options and variant option values will be replaced with the input.

    • If not supplied, existing product options and variant option values will not be changed.

  • Product and Variant Translations

    • Translations are uniquely identified for an entity by field and locale.

  • Category Product Relationships

    • Uniquely identified by category ID and product ID.

    • If primary is determined to have changed based on the category::1 column, the previous primary will be unset.

  • Product Images

    • Uniquely identified by URL.

    • If primary is determined to have changed based on the image::1 column, the previous primary will be unset.
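A minimal sketch of the two collection update semantics above (wholesale replacement when supplied versus granular attribute merging), using plain Java collections in place of the catalog domain types. The class and method names are hypothetical, and the sketch assumes an empty cell arrives as null or an empty collection.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the update semantics; not the handler's actual code. */
final class CollectionUpdateSemantics {

    private CollectionUpdateSemantics() {}

    /** Keywords, product options: replace wholesale if supplied, otherwise keep existing. */
    static <T> List<T> replaceListIfSupplied(List<T> existing, List<T> input) {
        return (input == null || input.isEmpty()) ? existing : List.copyOf(input);
    }

    /** Fulfillment flat rates: same replace-if-supplied rule, for map-shaped data. */
    static <K, V> Map<K, V> replaceMapIfSupplied(Map<K, V> existing, Map<K, V> input) {
        return (input == null || input.isEmpty()) ? existing : new LinkedHashMap<>(input);
    }

    /**
     * Attributes: granular merge. Matching keys are updated in place, new keys are added,
     * and existing keys absent from the input are kept (the import never deletes).
     */
    static <V> Map<String, V> mergeAttributes(Map<String, V> existing, Map<String, V> input) {
        Map<String, V> merged = new LinkedHashMap<>(existing);
        if (input != null) {
            merged.putAll(input);
        }
        return merged;
    }
}
```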

Update Restrictions
  • Product Options and Variant Option Values are closely related and the converter/handler implementations have auto-parsing and generation logic to process them.

    • For each product option, the allowed values for that option are set only from the variant rows provided in the input file. This means if a variant row is omitted during an update, its option values will not be present, and the parent product option’s allowed values will not include them.

      • Thus, ensure all variant rows are included in updates when values are provided in the product’s product options column.

      • If just intending to update a handful of fields on an existing product/variant, ensure the product options column is empty on the parent product. The product options column is always required on variant rows (since that’s the majority use-case), but setting it to match existing values should suffice. This way, existing option settings are not changed, and other fields can be freely updated.
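The restriction follows directly from the 'first encounter' semantics described under Product Options and Variant Option Values: allowed values are accumulated only from the variant rows present in the file. The hypothetical sketch below (not the VariantRowConverter implementation) builds allowed values from variant option cells in both the simple and complex syntaxes, assuming a token uses the complex syntax exactly when it starts with attributeName::.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of allowed-value accumulation from variant option cells. */
final class AllowedValuesBuilder {

    private AllowedValuesBuilder() {}

    /**
     * Builds attributeName -> (value -> label) from variant option cells in row order.
     * Only the first encounter of a value defines its label; later encounters are ignored.
     */
    static Map<String, Map<String, String>> build(List<String> variantOptionCells) {
        Map<String, Map<String, String>> allowed = new LinkedHashMap<>();
        for (String cell : variantOptionCells) {
            for (String token : cell.split("\\|")) {
                String attributeName;
                String value;
                String label;
                if (token.startsWith("attributeName::")) {
                    // Complex syntax: ';'-separated properties, each assigned with '::'
                    Map<String, String> props = new LinkedHashMap<>();
                    for (String property : token.split(";")) {
                        String[] parts = property.split("::", 2);
                        if (parts.length == 2) {
                            props.put(parts[0], parts[1]);
                        }
                    }
                    attributeName = props.get("attributeName");
                    value = props.get("value");
                    label = props.getOrDefault("valueLabel", value);
                } else {
                    // Simple syntax: one {key}::{value}; the label is the value itself
                    String[] parts = token.split("::", 2);
                    if (parts.length != 2) {
                        throw new IllegalArgumentException("Bad option value token: " + token);
                    }
                    attributeName = parts[0];
                    value = parts[1];
                    label = value;
                }
                if (value == null) {
                    throw new IllegalArgumentException("Missing value in token: " + token);
                }
                allowed.computeIfAbsent(attributeName, name -> new LinkedHashMap<>())
                        .putIfAbsent(value, label); // first encounter wins
            }
        }
        return allowed;
    }
}
```

Fed the four variant rows from the earlier example, this yields Small/Medium and Red/Black with the labels listed there. Omitting variants 2 and 4 from the input removes Black from the allowed values entirely, which is why all variant rows must accompany an update that supplies the product options column.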

Example Update Implementation with Shared Handler

Implementations that want to support a granular update use-case (ex: update only two fields) can consider introducing a new, dedicated import type and specification requiring just those specific fields (and identifier fields), and then extending/overriding CompleteProductImportBatchHandler to additionally declare support for that new import type. This eliminates the need to supply many otherwise unchanged fields, while leaving the handler implementation largely unchanged. In many cases, it can also help bypass the update restrictions.

Note
In more complex cases, it may be preferable to introduce a new handler implementation altogether for this purpose.

In this example, we’ll create a simple new import that allows updating product 'name' and 'is online'.

Define a new specification

In ImportServices, define your new import specification.

Specification
import org.apache.commons.lang3.StringUtils;

import com.broadleafcommerce.dataimport.domain.ImportFieldConfig;
import com.broadleafcommerce.dataimport.service.normalizer.ImportDataNormalizer;
import com.broadleafcommerce.dataimport.service.validation.BooleanValidator;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import lombok.Getter;

public class UpdateProductNameAndOnlineSpecification extends DefaultSpecification
        implements GlobalImportSpecification {

    private static final String UPDATE_PRODUCT_NAME_AND_ONLINE_IMPORT_TYPE =
            "UPDATE_PRODUCT_NAME_AND_ONLINE";
    private static final String UPDATE_PRODUCT_NAME_AND_ONLINE_SPECIFICATION_NAME =
            "Update Product Name and Online";
    private static final String PRODUCT_ROW_TYPE = "PRODUCT";

    @Getter(onMethod_ = @Override)
    private final List<ImportDataNormalizer> importDataNormalizers;

    public UpdateProductNameAndOnlineSpecification(List<ImportDataNormalizer> normalizers,
            List<String> requiredAuthorities,
            List<String> requiredScopes) {
        super(UPDATE_PRODUCT_NAME_AND_ONLINE_IMPORT_TYPE,
                requiredAuthorities,
                requiredScopes,
                UPDATE_PRODUCT_NAME_AND_ONLINE_SPECIFICATION_NAME);
        this.importDataNormalizers = normalizers;
    }

    @Override
    public boolean canHandle(String importType) {
        return StringUtils.equals(importType, UPDATE_PRODUCT_NAME_AND_ONLINE_IMPORT_TYPE);
    }

    @Override
    public String getMainRecordType() {
        /*
         * This defaults to match the import type, but in our case, the row type is different from
         * the import type, so we must override this value.
         */
        return PRODUCT_ROW_TYPE;
    }

    @Override
    public boolean isCatalogDiscriminated() {
        /*
         * The entities we're dealing with are catalog-discriminated, so we indicate this to ensure
         * correct context information is available.
         */
        return true;
    }

    @Override
    public boolean isSandboxDiscriminated() {
        /*
         * The entities we're dealing with are sandbox-discriminated, so we indicate this to ensure
         * correct context information is available.
         */
        return true;
    }

    @Override
    public boolean shouldAutoGenerateOperationTypeForEachRecord(String rowType) {
        /*
         * Since we're requiring an external ID to be provided, the resource tier handler will be
         * able to check for each record's existence in the datastore and determine whether it needs
         * to be created or updated. The import service doesn't have to do anything eagerly.
         */
        return false;
    }

    @Override
    public boolean shouldAutoGenerateResourceTierIdForEachRecord(String rowType) {
        /*
         * Since we're requiring an external ID to be provided, 'resource tier ID' becomes
         * irrelevant and unnecessary to deal with. We already have a mechanism to uniquely identify
         * a record, so the import service does not need to eagerly generate or deal with resource
         * tier IDs.
         */
        return false;
    }

    @Override
    public boolean shouldAllowUnmappedHeaders(String rowType) {
        /*
         * We only want to honor headers that we've explicitly defined mappings for, and ignore any
         * columns we don't recognize.
         */
        return false;
    }

    @Override
    protected void populateHeaderFieldConfigsByRowType(
            Map<String, Map<String, ImportFieldConfig>> headerFieldConfigsByRowType) {
        /*
         * Since we don't have more than one row type in this import, we just place all field
         * configurations under the 'main' record type, which in this case is product. This is
         * particularly important in the case where no row type column is provided in the input
         * file.
         */
        headerFieldConfigsByRowType.put(getMainRecordType(), fieldConfigurationsForProductRow());
    }

    private Map<String, ImportFieldConfig> fieldConfigurationsForProductRow() {
        /*
         * Using a LinkedHashMap provides some consistency in ordering semantics, which can be
         * convenient in some cases.
         */
        Map<String, ImportFieldConfig> fieldConfigurationByHeader = new LinkedHashMap<>();
        fieldConfigurationByHeader.put("External ID", new ImportFieldConfig("externalId", true));
        fieldConfigurationByHeader.put("Name", new ImportFieldConfig("name", false));
        fieldConfigurationByHeader.put("Is Online", new ImportFieldConfig("online", false)
                .withValidator(new BooleanValidator()));
        return fieldConfigurationByHeader;
    }
}
Tip
Remember to register this as a bean!
Define a new ImportBatchIndexableTypeMapping

(since ImportServices 1.8.2)

In ImportServices, to ensure re-indexing occurs as expected, you will need to register a new ImportBatchIndexableTypeMapping from your new import type (UPDATE_PRODUCT_NAME_AND_ONLINE) to the PRODUCT batch indexable type as discussed in this documentation.

Example
import com.broadleafcommerce.data.tracking.core.messaging.search.BatchIndexableType;
import com.broadleafcommerce.dataimport.domain.Import;

import org.springframework.lang.Nullable;


public class UpdateProductNameAndOnlineBatchIndexableTypeMapping
        implements ImportBatchIndexableTypeMapping {

    @Override
    @Nullable
    public String getBatchIndexableType(Import importEntity) {
        if ("UPDATE_PRODUCT_NAME_AND_ONLINE".equals(importEntity.getType())) {
            return BatchIndexableType.PRODUCT.name();
        }
        return null;
    }
}
Tip
Remember to register this as a bean!
Define an example for the new import

In ImportServices, introduce an example import file for users to reference when using the new import.

In this example, the file will look something like this.

Table 1. Example File
External ID | Name          | Is Online
prodExtId1  | Update Name 1 | false
prodExtId2  | Update Name 2 | true

Extend the existing handler implementation

In the resource-tier service, introduce an extension of CompleteProductImportBatchHandler that declares support for the new type.

Extended Import Batch Handler
import static com.broadleafcommerce.catalog.provider.RouteConstants.Persistence.CATALOG_ROUTE_KEY;

import com.broadleafcommerce.catalog.dataimport.converter.CatalogTranslationColumnConverter;
import com.broadleafcommerce.catalog.dataimport.converter.CategoryProductColumnConverter;
import com.broadleafcommerce.catalog.dataimport.converter.ProductImageColumnConverter;
import com.broadleafcommerce.catalog.dataimport.converter.ProductRowConverter;
import com.broadleafcommerce.catalog.dataimport.converter.VariantRowConverter;
import com.broadleafcommerce.catalog.domain.CategoryProduct;
import com.broadleafcommerce.catalog.domain.asset.ProductAsset;
import com.broadleafcommerce.catalog.domain.category.Category;
import com.broadleafcommerce.catalog.domain.product.Product;
import com.broadleafcommerce.catalog.domain.product.Variant;
import com.broadleafcommerce.catalog.service.CategoryProductService;
import com.broadleafcommerce.catalog.service.CategoryService;
import com.broadleafcommerce.catalog.service.asset.ProductAssetService;
import com.broadleafcommerce.catalog.service.product.ProductService;
import com.broadleafcommerce.catalog.service.product.VariantService;
import com.broadleafcommerce.common.extension.data.DataRouteByKey;
import com.broadleafcommerce.data.tracking.core.web.ContextRequestHydrator;
import com.broadleafcommerce.translation.domain.Translation;
import com.broadleafcommerce.translation.service.TranslationEntityService;

import java.util.HashSet;
import java.util.Set;


@DataRouteByKey(CATALOG_ROUTE_KEY)
public class ExtendedCompleteProductImportBatchHandler extends CompleteProductImportBatchHandler {

    public ExtendedCompleteProductImportBatchHandler(
            ProductRowConverter productConverter,
            CategoryProductColumnConverter categoryProductColumnConverter,
            ProductImageColumnConverter productImageColumnConverter,
            VariantRowConverter variantConverter,
            CatalogTranslationColumnConverter catalogTranslationColumnConverter,
            ProductService<Product> productService,
            VariantService<Variant> variantService,
            ProductAssetService<ProductAsset> productAssetService,
            CategoryService<Category> categoryService,
            CategoryProductService<CategoryProduct> categoryProductService,
            TranslationEntityService<Translation> translationEntityService,
            int batchSize,
            ContextRequestHydrator hydrator) {
        super(productConverter, categoryProductColumnConverter, productImageColumnConverter,
                variantConverter, catalogTranslationColumnConverter, productService,
                variantService, productAssetService, categoryService, categoryProductService,
                translationEntityService, batchSize, hydrator);
    }

    @Override
    protected Set<String> getSupportedImportTypes() {
        Set<String> supported = new HashSet<>(super.getSupportedImportTypes());
        supported.add("UPDATE_PRODUCT_NAME_AND_ONLINE");
        return supported;
    }
}
Tip
Remember to register this as a bean!

(Deprecated) Product Import

Warning
The documentation in this section as well as the corresponding functionality have been deprecated as of 1.8.1 in favor of the Complete Product Import.

Conversion to Java POJOs

There are several out-of-the-box converters that have been implemented in order to transform a BatchRecord containing a Map<String, String> from a file into a Java POJO. These are all found in the com.broadleafcommerce.catalog.dataimport.converter package, and they implement Converter<BatchRecord, [Java POJO]>.

Note

Provided out-of-the-box are the following converters for their respective entities:

  • AttributeChoiceValueConverter

  • CategoryProductConverter

  • IncludedProductConverter

  • ProductConverter

  • ProductAssetConverter

  • ProductOptionConverter

  • SpecificItemChoiceConverter

  • VariantConverter

Implementation Details

Attributes

Product attributes are an example of a special-case field that is imported from a multi-valued cell. Out of the box, the ProductConverter expects the entire Map<String, Attribute> to be found in a single cell under the attributes header. Each key must be separated from its value by a :, and each entry must be separated by a |. For example, a simple input could be this: attrKey1:attrVal1|attrKey2:attrVal2. That translates to a Map<String, Attribute> with two entries, one with a key attrKey1 and value attrVal1, and one with a key attrKey2 and value attrVal2.

Out of the box, the converter only interprets values directly as strings; there is no support for importing attribute values as any other type. The only exception to this rule is a comma-separated value, which is imported as a List.

For example, this could be a potential input:

singularAttr:singularVal|collectionAttr:collectionElem1,collectionElem2,collectionElem3|arrayAttr:arrayElem1,arrayElem2,arrayElem3
  • For the singularAttr attribute, the converter will create a String attribute value equalling singularVal.

  • For the collectionAttr attribute, the converter will create a List<Object> attribute value containing the elements collectionElem1, collectionElem2, and collectionElem3.

  • For the arrayAttr attribute, the converter will create a List<Object> attribute value containing the elements arrayElem1, arrayElem2, and arrayElem3.

Note

In all cases, the collection elements themselves will each still just be interpreted as a String. The converter will always create a List for any collection values: It will not create any other type like Set. As a result of this syntax, commas are treated as a special character, so the values supplied for attributes cannot contain commas.
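The parsing rules above can be sketched as follows. This is a hypothetical stand-in, not ProductConverter#removeAndParseAttributes(): it returns raw Object values rather than Attribute instances, and it assumes each entry's key ends at the first ':'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the deprecated attribute-cell syntax. */
final class LegacyAttributeParser {

    private LegacyAttributeParser() {}

    static Map<String, Object> parse(String cell) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        for (String entry : cell.split("\\|")) {
            int separator = entry.indexOf(':');
            if (separator < 0) {
                throw new IllegalArgumentException("Missing ':' in attribute entry: " + entry);
            }
            String key = entry.substring(0, separator);
            String rawValue = entry.substring(separator + 1);
            if (rawValue.contains(",")) {
                // Any comma-separated value becomes a List of Strings (never a Set or array)
                List<Object> elements = new ArrayList<>(Arrays.asList(rawValue.split(",")));
                attributes.put(key, elements);
            } else {
                attributes.put(key, rawValue);
            }
        }
        return attributes;
    }
}
```

Because any comma triggers list handling, a value such as Red, Deep would become a two-element list, which is why scalar attribute values cannot contain commas under this syntax.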

Customizing this behavior is quite simple. You may extend ProductConverter and override the #removeAndParseAttributes() method to change how the attributes are parsed.

Tip
If you plan to import files that you export, the export logic should also be adjusted to support producing your custom syntax/format.

Performance

  • These tests have been performed on a 2017 15" MacBook Pro with 16GB RAM and a 2.8GHz Intel Core i7 with 4 cores.

  • The data store used was MongoDB version 3.6.5.

  • The message broker used was Kafka version 5.3.1.

  • The batch size used in all trials was 100, meaning the import was broken into batches consisting of 100 products and all of their category relationships, variants, assets, options and included products. Each batch was parsed and sent to Kafka as a single message. As a result, for each message it consumed, the resource-tier service (Catalog Services) would process 100 products and all of their category relationships, variants, assets, options and included products.

    • In order to support sending and receiving all of these records within a single message, the message size properties were adjusted to allow messages up to 5MB in size.

      • Kafka Properties

        KAFKA_REPLICA_FETCH_MAX_BYTES: 5242880 # 5 megabytes
        KAFKA_MESSAGE_MAX_BYTES: 5242880 # 5 megabytes
        KAFKA_CONSUMER_MAX_PARTITION_FETCH_BYTES: 5242880 # 5 megabytes
        CONNECT_CONSUMER_MAX_PARTITION_FETCH_BYTES: 5242880 # 5 megabytes
      • Import Services Properties

        spring:
          cloud:
            stream:
              kafka:
                bindings:
                  batchRequestOutput:
                    producer:
                      configuration:
                        max:
                          request:
                            size: 5242880 # 5 megabytes
                        compression:
                          type: gzip
      • Catalog Services Properties

        spring:
          cloud:
            stream:
              kafka:
                bindings:
                  batchCompletionOutput:
                    producer:
                      configuration:
                        max:
                          request:
                            size: 5242880 # 5 megabytes
                        compression:
                          type: gzip
Isolated Imports with Varying Concurrency and Varying Number of Variants, Assets, and Category Relationships
  • These are trials of 500-product imports, with varying numbers of variants, assets, and category relationships (aka "dependent items") per product. For each trial, the import was run in isolation, meaning no other imports were run at the same time.

  • Another variable in these trials was the message consumer concurrency property. More specifically, these properties:

    • spring.cloud.stream.bindings.batchRequestInput.consumer.concurrency in CatalogServices

    • spring.cloud.stream.bindings.batchCompletionInput.consumer.concurrency in ImportServices

  • For each concurrency and dependent-count combination, an import consisting entirely of creates was run, followed by an import updating all of those records, and then finally an import consisting of half-updates and half-creates.

  • All trials were run directly from the command line with mvn spring-boot:run, using these JVM arguments:

    <boot.jvm.args>-Xdebug
      -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=${debug.port}
      -Xmx1536m <!--(1)-->
      -verbose:gc <!--(2)-->
    </boot.jvm.args>
    1. Limits maximum heap size to 1.5GB

    2. Enables verbose garbage collection logging
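When reproducing these trials, one way to confirm that `-Xmx1536m` (1536 MiB, or 1.5 GB) actually took effect is to ask the running JVM for its heap ceiling through the standard `Runtime.maxMemory()` method. The class `HeapCeiling` below is a hypothetical illustration, not part of Broadleaf. Depending on the garbage collector, the reported value may be slightly below the `-Xmx` setting.

```java
/** Hypothetical check (not part of Broadleaf): prints the heap ceiling the running JVM reports. */
public class HeapCeiling {
    static final long BYTES_PER_MIB = 1024L * 1024L;

    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() returns the maximum amount of memory the JVM will attempt to use.
        // With -Xmx1536m this should be close to 1536 MiB; some collectors report a little less.
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(max) + " MiB");
    }
}
```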

Summary

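Increasing concurrency from 1 to 5 greatly improves performance. Of the values tested, 5 performed best. A concurrency of 10 was also tested for the imports of 500 products with 34 dependents each, but it showed no consistent improvement over 5. It was slower for the all-creates and all-updates imports and faster only for the half-creates half-updates import.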

For the imports of 500 products with 34 "dependent items" each, at a concurrency of 5, the pace averaged 500 products in 168 seconds (the mean of 156, 173, and 175 seconds). Each product plus its 34 dependents counts as 35 records. This comes out to roughly 10,714 products, or 375,000 total records, per hour.
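The throughput figures in the summary follow from simple arithmetic on the concurrency-5 rows of Table 2. The hypothetical class `ImportThroughput` below reproduces that calculation. It assumes that each product counts as one record plus its 20 variants, 4 category products, and 10 product assets.

```java
/** Hypothetical helper reproducing the throughput arithmetic from the summary. */
public class ImportThroughput {
    static final int SECONDS_PER_HOUR = 3600;

    /** Arithmetic mean of a set of trial durations, in seconds. */
    static double meanSeconds(int... trialSeconds) {
        int sum = 0;
        for (int s : trialSeconds) {
            sum += s;
        }
        return (double) sum / trialSeconds.length;
    }

    /** Products imported per hour when `products` take `seconds` to import. */
    static double productsPerHour(int products, double seconds) {
        return products * SECONDS_PER_HOUR / seconds;
    }

    public static void main(String[] args) {
        // Concurrency-5 timings from Table 2: all creates, all updates, half-creates half-updates
        double avg = meanSeconds(156, 173, 175);        // 168.0 seconds
        double products = productsPerHour(500, avg);    // about 10,714.3 products per hour
        int recordsPerProduct = 1 + 20 + 4 + 10;        // product + variants + category products + assets = 35
        double records = products * recordsPerProduct;  // about 375,000 records per hour
        System.out.printf("avg=%.1f s, products/h=%.1f, records/h=%.0f%n", avg, products, records);
    }
}
```

Computed exactly, 500 products in 168 seconds is 10,714.29 products per hour, and multiplying by 35 records per product gives 375,000 records per hour.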

Detailed Results
Table 2. Isolated Imports of 500 Products with 34 Dependents Each (run without debugger)
| Trial Name | Concurrency | Total Time Taken | Number of Products per Import | Number of Variants per Product | Number of Category Products per Product | Number of ProductAssets per Product |
|---|---|---|---|---|---|---|
| 500 Products, 34 Dependents Each, All Creates | 1 | 438 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, All Creates | 5 | 156 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, All Creates | 10 | 176 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, All Updates | 1 | 510 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, All Updates | 5 | 173 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, All Updates | 10 | 177 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, Half-Creates Half-Updates | 1 | 443 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, Half-Creates Half-Updates | 5 | 175 seconds | 500 | 20 | 4 | 10 |
| 500 Products, 34 Dependents Each, Half-Creates Half-Updates | 10 | 160 seconds | 500 | 20 | 4 | 10 |

Table 3. Isolated Imports of 500 Products with 17 Dependents Each (run without debugger)
| Trial Name | Concurrency | Total Time Taken | Number of Products per Import | Number of Variants per Product | Number of Category Products per Product | Number of ProductAssets per Product |
|---|---|---|---|---|---|---|
| 500 Products, 17 Dependents Each, All Creates | 1 | 247 seconds | 500 | 10 | 2 | 5 |
| 500 Products, 17 Dependents Each, All Creates | 5 | 81 seconds | 500 | 10 | 2 | 5 |
| 500 Products, 17 Dependents Each, All Updates | 1 | 259 seconds | 500 | 10 | 2 | 5 |
| 500 Products, 17 Dependents Each, All Updates | 5 | 90 seconds | 500 | 10 | 2 | 5 |
| 500 Products, 17 Dependents Each, Half-Creates Half-Updates | 1 | 245 seconds | 500 | 10 | 2 | 5 |
| 500 Products, 17 Dependents Each, Half-Creates Half-Updates | 5 | 91 seconds | 500 | 10 | 2 | 5 |

Table 4. Isolated Imports of 500 Products with 8 Dependents Each (run without debugger)
| Trial Name | Concurrency | Total Time Taken | Number of Products per Import | Number of Variants per Product | Number of Category Products per Product | Number of ProductAssets per Product |
|---|---|---|---|---|---|---|
| 500 Products, 8 Dependents Each, All Creates | 1 | 148 seconds | 500 | 5 | 1 | 2 |
| 500 Products, 8 Dependents Each, All Creates | 5 | 37 seconds | 500 | 5 | 1 | 2 |
| 500 Products, 8 Dependents Each, All Updates | 1 | 134 seconds | 500 | 5 | 1 | 2 |
| 500 Products, 8 Dependents Each, All Updates | 5 | 48 seconds | 500 | 5 | 1 | 2 |
| 500 Products, 8 Dependents Each, Half-Creates Half-Updates | 1 | 129 seconds | 500 | 5 | 1 | 2 |
| 500 Products, 8 Dependents Each, Half-Creates Half-Updates | 5 | 52 seconds | 500 | 5 | 1 | 2 |

Table 5. Isolated Imports of 500 Products with 0 Dependents Each (run without debugger)
| Trial Name | Concurrency | Total Time Taken | Number of Products per Import | Number of Variants per Product | Number of Category Products per Product | Number of ProductAssets per Product |
|---|---|---|---|---|---|---|
| 500 Products, 0 Dependents Each, All Creates | 1 | 14 seconds | 500 | 0 | 0 | 0 |
| 500 Products, 0 Dependents Each, All Creates | 5 | 8 seconds | 500 | 0 | 0 | 0 |
| 500 Products, 0 Dependents Each, All Updates | 1 | 16 seconds | 500 | 0 | 0 | 0 |
| 500 Products, 0 Dependents Each, All Updates | 5 | 8 seconds | 500 | 0 | 0 | 0 |
| 500 Products, 0 Dependents Each, Half-Creates Half-Updates | 1 | 16 seconds | 500 | 0 | 0 | 0 |
| 500 Products, 0 Dependents Each, Half-Creates Half-Updates | 5 | 8 seconds | 500 | 0 | 0 | 0 |
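To compare the effect of concurrency across Tables 2 through 5, you can express each concurrency-5 timing as a speedup over the matching concurrency-1 timing. The hypothetical class `ConcurrencySpeedup` below shows this calculation for the All Creates rows. It is only arithmetic on the table values, not a Broadleaf tool, and the same approach applies to the All Updates and Half-Creates Half-Updates rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: speedup from concurrency 1 to concurrency 5, using the table timings. */
public class ConcurrencySpeedup {
    /** Ratio of the concurrency-1 time to the concurrency-5 time (higher means a bigger gain). */
    static double speedup(int secondsAtConcurrency1, int secondsAtConcurrency5) {
        return (double) secondsAtConcurrency1 / secondsAtConcurrency5;
    }

    public static void main(String[] args) {
        // {concurrency-1 seconds, concurrency-5 seconds} for the All Creates trials in Tables 2-5
        Map<String, int[]> allCreates = new LinkedHashMap<>();
        allCreates.put("34 dependents", new int[] {438, 156});
        allCreates.put("17 dependents", new int[] {247, 81});
        allCreates.put("8 dependents", new int[] {148, 37});
        allCreates.put("0 dependents", new int[] {14, 8});
        for (Map.Entry<String, int[]> entry : allCreates.entrySet()) {
            int[] t = entry.getValue();
            System.out.printf("%s: %.2fx faster at concurrency 5%n", entry.getKey(), speedup(t[0], t[1]));
        }
    }
}
```

For the All Creates rows, the speedups are roughly 2.81x (34 dependents), 3.05x (17), 4.00x (8), and 1.75x (0).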