Broadleaf Data Feed Services

Important
Data Feeds functionality is in a pre-release stage and not generally available. The functionality as well as this documentation is subject to change.

Overview

Conceptually, data feeds are a mechanism through which data from one system is sent to another system.

In eCommerce applications, the primary use-case is for 'product data feeds', where the focus is on transferring product information. Common systems accepting product data feeds include price comparison engines and product ad platforms. For example, Google Merchant Center ingests product data feeds to list products across Google services.

Broadleaf Data Feed Services contains all key functionality required to successfully gather data from Broadleaf systems and publish it using data feeds to external systems.

Key Concepts and Components

Data Feed Process Types

There are two main types of data feed processes by default: FULL_AUTHORITATIVE and PARTIAL_UPDATE.

  • FULL_AUTHORITATIVE processes are responsible for uploading the full authoritative dataset of all entries that belong in a feed. Third parties typically refer to this type of feed as a 'full' or 'primary' feed, where the end-result of submission is a full replacement of the dataset such that it exactly matches the contents of the upload.

    For example, with a feed file approach, this typically means the destination will create any items from the file that it hasn’t seen before. If the destination currently has an item that is not present in the file, it will be deleted.

  • PARTIAL_UPDATE processes are exclusively responsible for updating existing entries in a data feed rather than creating or removing entries. Third parties typically refer to this type of feed as a 'supplementary' feed or an 'update' process. Typically, destinations allow higher frequencies of these types of updates as compared to FULL_AUTHORITATIVE.

    Different feed destinations have different limitations on what can be achieved with a partial update - some support 'upserts' and 'deletes' in supplementary feeds, while others don’t. However, almost all destinations do support updating fields on existing entities (ex: stock availability, price changes) with this mechanism.

    Note
    See the shared concept documentation for more on partial processes.

In Broadleaf, the overall feed process flow remains highly similar between full and partial jobs. The main difference lies in how partial update jobs determine which entities to include in the collection and publication process.
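
To make the distinction concrete, here is a minimal sketch of how the two process types could be modeled. The enum is a simplified stand-in for illustration only, not the actual Broadleaf type.

    // Simplified stand-in for illustration - not the actual Broadleaf type.
    public enum DataFeedProcessType {
        // Full replacement: the destination ends up exactly matching the uploaded dataset
        FULL_AUTHORITATIVE,
        // Updates to existing entries only; entities to include come from queued update records
        PARTIAL_UPDATE
    }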

Job Management Concepts

Data feed jobs are composed of a few key concepts.

Tip
See the data model documentation for how these concepts are persisted in the data store.

DataFeedProcess

  • This is an admin-manageable 'template' configuration for a job that will perform a single query for data and then publish that data to one or more data feeds.

DataFeedProcessExecution

  • This tracks the lifecycle and state of a specific execution of a DataFeedProcess.

  • The system (not admin users) manages the entire lifecycle of these entities.

DataFeedPublication

  • This represents a 'sub-execution' within a DataFeedProcessExecution that is specifically responsible for transforming data (which was already fetched in an earlier step of the execution) and publishing it for one specific DataFeedPublicationConfiguration.

  • The system (not admin users) manages the entire lifecycle of these entities.

DataFeedPublicationConfiguration

  • To publish data to a particular data feed (regardless of the destination), there are unique secret values (ex: API keys) and configuration that are required but cannot be persisted in a datastore. These need to be managed separately via application properties and environment variables by system administrators, and thus cannot be directly part of the admin-user configured/persisted DataFeedProcess entity.

    DataFeedPublicationConfiguration represents a non-persisted, identifiable abstraction of an individual data feed publication configuration. DataFeedProcess can subsequently reference specific configurations that will be included in its flow. The knowledge of how to use a particular DataFeedPublicationConfiguration requires a compatible DataFeedPublisher implementation.

  • Details on how these instances are constructed are provided in the publication configuration design docs.

  • It is important to note that despite the name, this is not related in any way to the Spring @Configuration concept.

IncrementalDataFeedEntityUpdateRecord

  • Each IncrementalDataFeedEntityUpdateRecord contains key reference details about an entity that was updated elsewhere in Broadleaf, and now deserves re-processing to keep data feed(s) up to date.

  • For each IncrementalDataFeedEntityUpdateEvent that is received from other Broadleaf microservices, an IncrementalDataFeedEntityUpdateRecord will be created for each DataFeedProcess of type PARTIAL_UPDATE that may be interested in processing it. This means each PARTIAL_UPDATE DataFeedProcess will effectively have its own 'queue' of update events that it can process whenever it is executed.

    Note
    See the shared concept documentation for a detailed look at how IncrementalDataFeedEntityUpdateRecord instances are created.
  • The system (not admin users) manages the entire lifecycle of these entities.

Data Feed Publication Configuration Design

The mechanism to establish DataFeedPublicationConfiguration instances is outlined below.

(Diagram: feed publication configuration design)
  • Configuration properties under a known prefix are gathered to define behavior for a particular publication process

    • Example:

      • MerchantCenterSFTPConfigurationProperties

  • DataFeedPublicationConfigurationContributor beans transform gathered property values into DataFeedPublicationConfiguration instances

    • Example:

      • MerchantCenterSFTPPublicationConfigurationContributor, which produces MerchantCenterSFTPPublicationConfiguration instances

  • At startup, DataFeedPublicationConfigurationService injects and invokes all DataFeedPublicationConfigurationContributor components to collect all DataFeedPublicationConfiguration instances in one place for centralized access by other components
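
As a rough sketch of this contract, consider the following hypothetical contributor. The contributor method name, the properties class, and the concrete configuration type are illustrative assumptions rather than the actual Broadleaf signatures.

    import java.util.List;
    import java.util.stream.Collectors;

    import org.springframework.stereotype.Component;

    // Hypothetical sketch: turns property-bound values into identifiable,
    // non-persisted DataFeedPublicationConfiguration instances.
    @Component
    public class MySftpPublicationConfigurationContributor
            implements DataFeedPublicationConfigurationContributor {

        // Bound from properties under a known prefix (analogous to MerchantCenterSFTPConfigurationProperties)
        private final MySftpConfigurationProperties properties;

        public MySftpPublicationConfigurationContributor(MySftpConfigurationProperties properties) {
            this.properties = properties;
        }

        @Override
        public List<DataFeedPublicationConfiguration> contribute() {
            // One configuration instance per configured publication configuration ID
            return properties.getConfigs().entrySet().stream()
                    .map(entry -> new MySftpPublicationConfiguration(entry.getKey(), entry.getValue()))
                    .collect(Collectors.toList());
        }
    }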

Processing Flow

High Level Flow Design

(Diagram: high-level data feeds flow design)

Initiation

Creating Jobs

In the Broadleaf Admin, admin users can create new 'full' or 'partial' data feed processes under Processes > Data Feeds > Data Feed Processes.

Tip
If possible, try to maximize the number of publication configurations on each DataFeedProcess. For example, if you need to publish a primary feed for the same data from 'Application X' to both Google Merchant Center and Meta Commerce Manager, try to put both of those publication configurations on the same DataFeedProcess rather than creating separate DataFeedProcess instances for each. This way, the data collection process (potentially expensive) is only run once and the collected data can be shared by both publication steps.
Scheduling Executions

Executing any given DataFeedProcess is done through scheduled jobs owned by Scheduled-Job Services.

In the Broadleaf Admin, managing scheduled jobs for data feeds is made easy under Processes > Data Feeds > Data Feed Process Execution Schedules. Besides standard recurring execution, admins can also manually run a process by clicking the 3-dot icon in the top right corner and selecting 'Run Now'.

Starting Executions

  • Scheduled job trigger events are received by DataFeedExecutionRequestScheduledJobListener, which delegates to DataFeedProcessExecutionManager to request an execution for the specified DataFeedProcess.

    • For 'full' processes, DataFeedProcessExecutionManager will create a new DataFeedProcessExecution and emit a StartDataFeedProcessExecutionRequest message

    • For 'partial' processes, DataFeedProcessExecutionManager will create a new DataFeedProcessExecution. Then, it will gather all unacked IncrementalDataFeedEntityUpdateRecord entries for the specified process and mark them as part of the newly created execution (effectively 'acking' them). If at least one IncrementalDataFeedEntityUpdateRecord was successfully found and acked for this process, the DataFeedProcessExecution is advanced to the next stage and a StartDataFeedProcessExecutionRequest message is emitted

  • StartDataFeedProcessExecutionRequestListener in Data Feed Services itself will consume the emitted StartDataFeedProcessExecutionRequest message and delegate to DataFeedProcessExecutionManager to actually run the specified execution
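
For illustration, the 'partial' branch of this initiation flow might look roughly like the following method. The repository and messaging helpers shown here are hypothetical stand-ins, not the actual DataFeedProcessExecutionManager internals.

    // Hypothetical sketch of the 'partial' initiation flow; helper names are illustrative.
    void requestPartialExecution(DataFeedProcess process) {
        DataFeedProcessExecution execution = executionRepository.createFor(process);

        // Claim ('ack') every currently-unacked update record for this process by
        // marking it as part of the newly created execution
        int ackedCount = updateRecordRepository.assignUnackedRecords(process.getId(), execution.getId());

        if (ackedCount > 0) {
            // There is something to process: advance the execution and request that it be run
            executionRepository.advanceToNextStage(execution);
            messagePublisher.publish(new StartDataFeedProcessExecutionRequest(execution.getId()));
        }
        // Otherwise no start request is emitted, since there were no updates to process
    }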

Data Collection

Collecting data is the first step in running a DataFeedProcessExecution. The DataFeedProcessExecutionManager delegates to the first compatible DataFeedCollector component that can handle collection for the execution.

The goal of the DataFeedCollector is to gather data from Broadleaf systems, ultimately producing a file containing the collected data. Besides gathering entities of the base collected type, implementations of DataFeedCollector may further delegate to CollectedEntityContributor implementations to hydrate additional data onto each collected entity.

For example, ProductDataFeedCollector gathers products from CatalogServices, and then additionally hydrates inventory, pricing, and reviews information on each CollectedProduct through InventoryEntityContributor, PricingEntityContributor, and RatingsReviewsEntityContributor. Furthermore, since ProductDataFeedCollector is an extension of AbstractJsonArrayDataFeedCollector, the file it builds is ultimately just an array of JSON objects, with each element representing a CollectedProduct.
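
As a rough illustration of the contributor pattern, here is a hypothetical CollectedEntityContributor that hydrates one extra attribute onto each collected product. The interface's method signature, the CollectedProduct accessors, and the client it calls are all assumptions for illustration, not the actual Broadleaf API.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    import org.springframework.stereotype.Component;

    // Hypothetical contributor: hydrates supplemental data onto each collected product
    // so that later transformation steps never need to make their own requests.
    @Component
    public class WarehouseLocationEntityContributor implements CollectedEntityContributor {

        private final WarehouseClient warehouseClient; // hypothetical external client

        public WarehouseLocationEntityContributor(WarehouseClient warehouseClient) {
            this.warehouseClient = warehouseClient;
        }

        @Override
        public void contribute(List<CollectedProduct> batch) {
            // Fetch supplemental data for the whole batch in a single request
            Map<String, String> locationsBySku = warehouseClient.findLocationsBySku(
                    batch.stream().map(CollectedProduct::getSku).collect(Collectors.toSet()));

            // Set the hydrated value onto each collected entity
            batch.forEach(product -> product.getAttributes()
                    .put("warehouseLocation", locationsBySku.get(product.getSku())));
        }
    }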

Note
The data collection step is the only time any querying or API requests should occur. Any data required by the transformation steps should be gathered and set onto each collected entity in this stage. The expectation is that transformation of any specific collected entity to an output representation only requires data already present on the collected entity.
Collection Differences in 'Full' and 'Partial' DataFeedProcesses

Between the 'full' and 'partial' process types, the overall collection/contribution mechanism is the same. The main difference lies in the determination of which base collected entities need to be fetched.

  • Full process executions are typically interested in gathering 'all data' within a particular application.

    • This is typically achieved by performing sorted, paginated requests for batches of the collected entity, then applying CollectedEntityContributors to those batches.

    • Each final hydrated batch is then serialized to the file.

  • Partial process executions are only interested in collecting specific entities referenced by the IncrementalDataFeedEntityUpdateRecords currently being processed.

    • This first requires gathering all the different IncrementalDataFeedEntityUpdateRecord references and building up a distinct set of entity IDs to process. For example, given a mix of SKUs, external IDs, and IDs, DataFeedServices should make requests as needed to ultimately end up with a final set of IDs that it will process.

    • Then, typically that ID set is broken into batches for collection. For example, DataFeedServices would send CatalogServices a request to get 25 products by ID. The received batch of collected products would then go through the CollectedEntityContributors for additional hydration.

    • Each final hydrated batch is then serialized to the file.
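
Putting the partial steps together, here is a hedged sketch of the collection loop. The client calls, helper methods, and batch size mirror the example above but are otherwise assumptions, not the actual Broadleaf implementation.

    // Hypothetical sketch of partial collection: resolve mixed references to a distinct
    // ID set, then collect, hydrate, and serialize in batches.
    void collectForPartialExecution(List<IncrementalDataFeedEntityUpdateRecord> records,
                                    List<CollectedEntityContributor> contributors,
                                    CollectedDataFileWriter fileWriter) {
        // 1. Resolve the mixed references (IDs, SKUs, external IDs) into a distinct set of product IDs
        Set<String> productIds = resolveToProductIds(records);

        // 2. Collect the referenced products in batches (ex: 25 at a time)
        for (List<String> idBatch : partition(productIds, 25)) {
            List<CollectedProduct> batch = catalogClient.readProductsByIds(idBatch);

            // 3. Hydrate additional data (inventory, pricing, reviews, ...) onto each product
            contributors.forEach(contributor -> contributor.contribute(batch));

            // 4. Serialize the final hydrated batch to the collected data file
            fileWriter.appendAll(batch);
        }
    }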

Collected Data Storage

The DataFeedCollector will usually prepare its output file in a temporary local filesystem directory.

After collection is complete, the DataFeedProcessExecutionManager copies the data from the file to a more long-lived, commonly accessible location via the StorageService. The default implementation of StorageService is JpaDataFeedFileLobStorageService, which leverages LOB storage in the database to hold the file data.

The 'location' of this data (by default, just the unique filename used by the StorageService for the file) is updated on the DataFeedProcessExecution for downstream consumption.

Data Publication

Publication encompasses the end-to-end flow of iterating over the collected entities, transforming them into the representation needed by the feed destination, and submitting the final result.

Requesting Publications

After collection is complete and the data file is properly stored, DataFeedProcessExecutionManager will engage the publication step. For better performance and status tracking, publication for each destination is broken out into its own sub-execution called a DataFeedPublication. DataFeedProcessExecutionManager creates a new DataFeedPublication and emits a StartDataFeedPublicationRequest for each DataFeedPublicationConfiguration referenced by the DataFeedProcessExecution. StartDataFeedPublicationRequestListener consumes each of these messages and delegates to DataFeedPublicationManager to run each publication. This has good workload distribution characteristics, as different instances of DataFeedServices can process the publications in parallel.

Running a Publication

DataFeedPublicationManager first reaches out to StorageService and reads the contents of the collected data file into a temporary file on the local filesystem. Afterwards, it delegates to the first compatible DataFeedPublisher implementation to actually execute the publication steps.

The goal of the DataFeedPublisher is to actually perform the end-to-end read-transform-submit flow.

Note
In some cases, publication for 'partial' and 'full' processes may be similar enough that the same DataFeedPublisher can handle both.
File Based Feeds

There are a variety of ways a feed destination may accept data, but one of the most common is through feed file uploads. With file-based approaches, the destination expects data to be serialized to a file (typically standard formats like CSV or XML) and then uploaded either through an API call or an SFTP transfer.

By default, Broadleaf DataFeedServices provides AbstractFileFeedPublisher as a robust starting point for file-based implementations. Internally, AbstractFileFeedPublisher delegates to a few other components:

  • DataFeedFileTransformer

    • Responsible for reading through the collected data file, transforming collected entities to the output representation, and serializing the result to a final output file.

    • By default, AbstractJsonArrayToCSVFileTransformer and AbstractJsonArrayToXMLFileTransformer provide useful starting points for transforming a JSON array collected data file into CSV and XML output, respectively.

      • Internally, each of these delegates to a few other components:

        • StreamingJsonArrayFileReader

          • A utility for streaming over the collected data file, deserializing the collected elements into POJOs in batches, and performing arbitrary work on each batch.

        • DataFeedElementTransformer

          Note
          In almost all cases, this is the component that will involve the most customization to match unique business requirements.
          • Responsible for converting a batch of collected entity POJOs into POJO elements of output data.

          • The expectation is for the output POJO to be defined and configured in a way that can easily be serialized to a file via utilities such as Jackson’s CsvMapper or XmlMapper. For example, this could mean using special Jackson annotations and mapper configuration to define property names and serialization behavior.

            • See MerchantCenterXMLProduct as an example of what’s done for Google Merchant Center; an illustrative sketch of this approach follows this list.

      • After each batch of elements is successfully converted, it is immediately serialized to the output file.

  • DataFeedFileSubmitter

    • Responsible for taking a transformed output file and submitting it to the destination.

    • This typically involves gathering configuration details such as authentication credentials from the DataFeedPublicationConfiguration to prepare and execute the submission.

    • By default, AbstractSFTPSubmitter is provided as a useful starting point for implementations that need to use SFTP for file submission.
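
To illustrate the annotation-driven output described above for DataFeedElementTransformer, here is a hypothetical output element and batch conversion. This is not the actual MerchantCenterXMLProduct; the element class, its XML names, and the CollectedProduct accessors are illustrative assumptions.

    import java.util.List;
    import java.util.stream.Collectors;

    import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlProperty;
    import com.fasterxml.jackson.dataformat.xml.annotation.JacksonXmlRootElement;

    // Hypothetical output element, shaped so Jackson's XmlMapper can serialize it directly
    @JacksonXmlRootElement(localName = "item")
    public class ExampleFeedItem {

        @JacksonXmlProperty(localName = "id")
        public final String id;

        @JacksonXmlProperty(localName = "title")
        public final String title;

        @JacksonXmlProperty(localName = "price")
        public final String price;

        public ExampleFeedItem(String id, String title, String price) {
            this.id = id;
            this.title = title;
            this.price = price;
        }

        // Hypothetical batch conversion, mirroring the DataFeedElementTransformer
        // responsibility; the CollectedProduct accessors are assumptions.
        public static List<ExampleFeedItem> fromCollectedBatch(List<CollectedProduct> batch) {
            return batch.stream()
                    .map(product -> new ExampleFeedItem(product.getSku(), product.getName(), product.getPrice()))
                    .collect(Collectors.toList());
        }
    }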

Completing a Publication

After finishing the transformation and submission steps, DataFeedPublisher returns control to DataFeedPublicationManager, which records the status and marks the corresponding DataFeedPublication as completed.

Execution Completion

After completing a DataFeedPublication, DataFeedPublicationManager checks to see if all publications for the associated DataFeedProcessExecution are also completed. If so, it delegates to DataFeedProcessExecutionManager to engage completion of the DataFeedProcessExecution.

First, the execution is marked as completed with any necessary status details. Then, if the execution was for a 'partial' process, any IncrementalDataFeedEntityUpdateRecords for that execution are hard-deleted to free space in the data store.

Default Integrations

Out of the box, Data Feed Services comes with integrations for a few different platforms.

Google Merchant Center Product Feeds

DataFeedServices supports submitting XML-file-based product feeds over SFTP to Google Merchant Center for both primary and supplementary feeds.

Configuration Properties

  • broadleaf.datafeed.feedprovider.google.sftp.enabled

    Whether to enable Merchant Center related functionality. Default: false

  • broadleaf.datafeed.feedprovider.google.sftp.configs.{PUBLICATION_CONFIG_ID}.*

    Properties configuring a specific Merchant Center publication configuration. Note that the publication configuration ID should always be all-caps (ex: 'MY_ID') to ensure Spring relaxed property binding sees the same value no matter where the properties are being sourced from. See MerchantCenterSFTPPublicationConfigurationProperties for more details about supported properties.

Feed Setup

  • Create your 'Upload' type primary feed (for 'full' processes) and/or supplementary feed (for 'partial' processes) by following the Google feed creation documentation

    Note
    At this time, the official Google 'Regional product inventory feeds' functionality is not automatically supported out of the box. (Customizations should be able to adjust functionality as needed.) By default, the expectation is for each Merchant Center 'sub account' to be for a specific locale, with its primary and supplementary feeds only having translations/pricing for that locale. For example, if you had data for Egypt and wanted to support English and Arabic, you would create 2 separate Merchant Center sub accounts - Egypt English and Egypt Arabic, and then primary and supplementary feeds for each sub account (4 total feeds).
    • Note down the unique filename you specified for this feed.

  • Obtain SFTP credentials as described in Google feed SFTP documentation

  • Establish a new feed publication configuration for each feed (ex: MY_MERCHANT_CENTER_FULL_CONFIG_ID) by setting properties under broadleaf.datafeed.feedprovider.google.sftp.configs.MY_MERCHANT_CENTER_FULL_CONFIG_ID.*

    • Provide the unique filename and SFTP credentials from previous steps (see the property sketch after this list)

  • Create a new DataFeedProcess that targets your new feed publication configuration and execute it
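
As a rough example of what this configuration could look like in application properties (the sub-key below is a placeholder; the actual keys supported under the config prefix are defined by MerchantCenterSFTPPublicationConfigurationProperties):

    # Illustrative sketch only - consult MerchantCenterSFTPPublicationConfigurationProperties
    # for the real sub-keys supported under the config prefix.
    broadleaf.datafeed.feedprovider.google.sftp.enabled=true
    broadleaf.datafeed.feedprovider.google.sftp.configs.MY_MERCHANT_CENTER_FULL_CONFIG_ID.<sub-key>=<feed filename, SFTP credentials, etc. from the steps above>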

Customization

If the default implementation doesn’t quite satisfy your business needs, you can customize the behavior.

In almost all cases, the only customization needed will be in the behavior of CollectedProductToMerchantCenterXMLProductTransformer, where Broadleaf collected entities are transformed into output elements. Please examine the logic of this component and override its behavior as needed.
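
As a hedged illustration of that approach, a customization might look roughly like the following. The overridden method and the accessors used below are assumptions - check the actual transformer for its real extension points before customizing.

    import org.springframework.stereotype.Component;

    // Hypothetical customization sketch; the extension point and accessors are illustrative.
    @Component
    public class MyMerchantCenterProductTransformer
            extends CollectedProductToMerchantCenterXMLProductTransformer {

        // If the parent class requires constructor arguments, pass them through here.

        @Override
        protected MerchantCenterXMLProduct transform(CollectedProduct collected) { // assumed extension point
            MerchantCenterXMLProduct output = super.transform(collected);
            // Business-specific adjustment, ex: append branding to the title
            output.setTitle(output.getTitle() + " | My Brand"); // assumed accessors
            return output;
        }
    }

Depending on how your service registers beans, a subclass like this would typically be configured to take the place of the default component.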

Meta (Facebook) Commerce Manager Product Feeds

DataFeedServices supports submitting CSV-file-based product feeds through API requests to Meta Commerce Manager for both primary and supplementary feeds.

Configuration Properties

  • broadleaf.datafeed.feedprovider.meta.api.enabled

    Whether to enable Commerce Manager related functionality. Default: false

  • broadleaf.datafeed.feedprovider.meta.api.configs.{PUBLICATION_CONFIG_ID}.*

    Properties configuring a specific Commerce Manager publication configuration. Note that the publication configuration ID should always be all-caps (ex: 'MY_ID') to ensure Spring relaxed property binding sees the same value no matter where the properties are being sourced from. See CommerceManagerFeedAPIPublicationConfigurationProperties for more details about supported properties.

Feed Setup

  • Create a new Facebook business account, following Facebook documentation

  • Create a new Facebook Commerce Manager account, following Facebook documentation

  • Create a new product catalog and corresponding feed configuration

    • Facebook catalog creation documentation

    • Facebook feed creation documentation

      Note
      Choose the 'Upload from computer' approach for creating feeds. DataFeedServices will be pushing data to Facebook with direct API upload requests - it will not be hosting feed files at a URL.
      Note
      At this time, the official Facebook 'Country and language feeds' functionality is not automatically supported out of the box. (Customizations should be able to adjust functionality as needed.) By default, the expectation is for each Facebook product catalog to be for a specific locale, with its primary and supplementary feeds only having translations/pricing for that locale. For example, if you had data for Egypt and wanted to support English and Arabic, you would create 2 separate Facebook product catalogs - Egypt English and Egypt Arabic, and then primary and supplementary feeds for each catalog (4 total feeds).
    • After creating a feed, note down the data feed ID. This is available in the Commerce Manager UI under 'Catalog > Data Sources > Data feeds'.

  • Create a permanent, non-expiring access token with permissions capable of controlling the commerce catalog associated with the data feed. This is necessary for DataFeedServices to make API requests to upload feed files.

    • To create such an access token, first create a Meta developer application and associate it with the business account. Then, in business settings, create a System User. Finally, on the system user page, grant an access token (choosing the permanent option) with the required permissions - in this case, the catalog_management scope.

      Tip
      You can debug your access token and verify it is correct with the Facebook Access Token Debugger.
  • Establish a new feed publication configuration for each feed (ex: MY_COMMERCE_MANAGER_FULL_CONFIG_ID) by setting properties under broadleaf.datafeed.feedprovider.meta.api.configs.MY_COMMERCE_MANAGER_FULL_CONFIG_ID.*

    • Provide the data feed ID and access token from previous steps (see the property sketch after this list)

  • Create a new DataFeedProcess that targets your new feed publication configuration and execute it
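
As with the Merchant Center setup, a rough sketch of the resulting application properties (the sub-key below is a placeholder; the actual keys are defined by CommerceManagerFeedAPIPublicationConfigurationProperties):

    # Illustrative sketch only - consult CommerceManagerFeedAPIPublicationConfigurationProperties
    # for the real sub-keys supported under the config prefix.
    broadleaf.datafeed.feedprovider.meta.api.enabled=true
    broadleaf.datafeed.feedprovider.meta.api.configs.MY_COMMERCE_MANAGER_FULL_CONFIG_ID.<sub-key>=<data feed ID, access token, etc. from the steps above>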

Customization

If the default implementation doesn’t quite satisfy your business needs, you can customize the behavior.

In almost all cases, the only customization needed will be in the behavior of CollectedProductToCommerceManagerFeedCSVProductTransformer, where Broadleaf collected entities are transformed into output elements. Please examine the logic of this component and override its behavior as needed.