
Key Components

ContentResolverService

This is the base service API for fetching the actual digital content of an Asset by its URL.

This service should be used when the actual digital content related to an Asset needs to be retrieved. Implementations should rely on a StorageProvider to retrieve the location of the resource, and then return a handle to stream that resource.

StorageService

This interface exposes methods required to perform basic tasks involving processing of uploaded files. It is responsible for validation and pre-processing of files (such as unzipping and URL creation), but interacts with and relies on the StorageProvider to abstract away the details of file storage/management.

This service is also responsible for creating the Folders and Assets for the files it processes.

StorageProvider

This interface exposes methods required for interacting with an asset storage provider, that is, to store, retrieve, and delete digital content. In other words, this service is responsible for supporting DefaultAssetStorageType#INTERNAL assets that Broadleaf has access and responsibility to store. It should be implemented for each type of storage provider to be used.

By default, there should only be a single implementation of this component in use at a time, meaning that there should only be one active storage provider used by the Asset service.

Note that use of this service can be restricted by a MIME type whitelist (see isWhitelistedMimeType(String)), which lets implementors restrict the kinds of files that are supported. This method is linked to a system property of the form broadleaf.asset.internal.storageProvider.mimeTypeWhitelist.
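As an illustration only, a whitelist check of this kind might look like the sketch below. The class name MimeTypeWhitelist, the case-insensitive comparison, and the rule that an empty whitelist allows everything are all assumptions made for the example. They are not taken from the Broadleaf implementation.

```java
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of the Broadleaf API. */
final class MimeTypeWhitelist {
    private final Set<String> allowed;

    MimeTypeWhitelist(Set<String> allowed) {
        // Normalize once so that lookups are case-insensitive.
        this.allowed = allowed.stream()
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    /** Assumption: an empty whitelist places no restriction on MIME types. */
    boolean isWhitelistedMimeType(String mimeType) {
        if (allowed.isEmpty()) {
            return true;
        }
        return mimeType != null
                && allowed.contains(mimeType.trim().toLowerCase(Locale.ROOT));
    }
}
```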

The method contract supports creating new resources from either source InputStream or File instances.

Also note that, in the case of failures while processing multiple resources in either the addResourcesFromStreams(Map) or deleteResources(Iterable) methods, the default implementation will try to process as many resources as possible before returning a composite exception with all files that failed and their causes (see BulkStorageException).
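The "process as many as possible, then report every failure" pattern described above can be sketched roughly as follows. BulkFailureException and BulkProcessor are hypothetical stand-ins written for this example. The real BulkStorageException and the default implementation may be structured differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for a composite exception such as BulkStorageException. */
final class BulkFailureException extends RuntimeException {
    private final Map<String, Exception> failures;

    BulkFailureException(Map<String, Exception> failures) {
        super(failures.size() + " resource(s) failed: " + failures.keySet());
        this.failures = failures;
    }

    /** Failed paths mapped to the exception that caused each failure. */
    Map<String, Exception> getFailures() {
        return failures;
    }
}

/** Hypothetical helper; shows the control flow only. */
final class BulkProcessor {
    /** Applies the action to every path, collecting failures instead of stopping at the first. */
    static void processAll(Iterable<String> paths, Consumer<String> action) {
        Map<String, Exception> failures = new LinkedHashMap<>();
        for (String path : paths) {
            try {
                action.accept(path);
            } catch (Exception e) {
                failures.put(path, e); // remember the failure and keep going
            }
        }
        if (!failures.isEmpty()) {
            throw new BulkFailureException(failures);
        }
    }
}
```

A caller can catch the composite exception and inspect getFailures() to decide which individual resources to retry.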

StorageProvider Paths

Overview

Notably, the 'URLs' or 'paths' provided to StorageProvider are expected to already be globally unique. For example, they will look something like /tenants/{tenant-id}/my-asset.jpg, or /applications/{application-id}/my-asset.jpg.

The StorageProvider component itself is not intended to carry the burden of DataTracking context discrimination or other complex processes. It is merely intended to be a data storage mechanism, and as such, it is the responsibility of the calling component to ensure the path/URL values are unique before submitting them to the StorageProvider. For example, when the StorageService is processing assets, it does the following:

  • For a newly uploaded Asset, if the Asset.url would be a match for an existing asset in the same context, it will apply an incrementing suffix to the end of the URL (ex: /my-asset.jpg becomes /my-asset-1.jpg). This ensures that at least within a particular tenant/application, the Asset.url itself is unique.

  • Before calling the StorageProvider, Asset.url is prepended with the context-aware prefix (ex: /tenants/{tenant-id} for tenant-level assets, or /applications/{application-id} for application-level assets) based on the Asset entity’s context. This allows different tenants/applications to have similarly named assets without clashing.
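The two steps above can be sketched as follows. AssetPaths and its methods are hypothetical names used only for illustration, and the set of existing URLs stands in for whatever lookup the StorageService actually performs within a context.

```java
import java.util.Set;

/** Hypothetical helper illustrating the two steps; not Broadleaf's implementation. */
final class AssetPaths {
    /** Appends -1, -2, ... before the extension until the URL is unused in the context. */
    static String uniqueUrl(String url, Set<String> existingUrlsInContext) {
        if (!existingUrlsInContext.contains(url)) {
            return url;
        }
        int dot = url.lastIndexOf('.');
        int slash = url.lastIndexOf('/');
        boolean hasExtension = dot > slash + 1;
        String base = hasExtension ? url.substring(0, dot) : url;
        String ext = hasExtension ? url.substring(dot) : "";
        for (int i = 1; ; i++) {
            String candidate = base + "-" + i + ext;
            if (!existingUrlsInContext.contains(candidate)) {
                return candidate;
            }
        }
    }

    /** Prefix for tenant-level assets, e.g. /tenants/{tenant-id}/my-asset.jpg. */
    static String tenantPath(String tenantId, String assetUrl) {
        return "/tenants/" + tenantId + assetUrl;
    }

    /** Prefix for application-level assets, e.g. /applications/{application-id}/my-asset.jpg. */
    static String applicationPath(String applicationId, String assetUrl) {
        return "/applications/" + applicationId + assetUrl;
    }
}
```

For example, if /my-asset.jpg already exists in tenant t1, uniqueUrl yields /my-asset-1.jpg and tenantPath then yields /tenants/t1/my-asset-1.jpg as the value handed to the StorageProvider.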

StorageLocationMapping

(since 2.0.3)

Internally, a StorageProvider implementation may not necessarily store assets exactly at the path/URL provided by the caller.

Broadleaf has a StorageLocationMapping entity that can optionally be used by such StorageProvider implementations to keep track of the mappings from the 'original location' (path/URL) provided by the caller to the actual 'storage provider location' that holds the binary data. This has the additional advantage of making potential migrations of data easier to tackle, should it be necessary.

This entity is only expected to be managed/used internally by StorageProvider implementations that need it, and is not intended to be exposed to outside callers. It is mainly an implementation detail. Please see the StorageLocationMappingService/StorageLocationMappingRepository components for performing operations on this entity.

As of 2.0.3, both the FilesystemStorageProvider and the GoogleCloudStorageProvider have been updated to create StorageLocationMapping records for any newly created assets. During asset resolution/management, the actual storage location is determined first by checking for a StorageLocationMapping record. If no mapping record is available, the system will fall back to the legacy (pre-2.0.3) storage location calculation. This guarantees backward compatibility and successful resolution of previously-created assets.
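The lookup-then-fallback order described above might be sketched like this. StorageLocationResolver is a hypothetical class, the in-memory Map stands in for StorageLocationMapping records, and the legacy calculation is passed in as a function because its details depend on the provider.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of mapping-first resolution; not Broadleaf's implementation. */
final class StorageLocationResolver {
    // Stand-in for StorageLocationMapping records: original location -> actual location.
    private final Map<String, String> mappings;
    // Stand-in for the legacy (pre-2.0.3) storage location calculation.
    private final Function<String, String> legacyCalculation;

    StorageLocationResolver(Map<String, String> mappings,
                            Function<String, String> legacyCalculation) {
        this.mappings = mappings;
        this.legacyCalculation = legacyCalculation;
    }

    /** Uses the mapping record when one exists, otherwise falls back to the legacy calculation. */
    String resolve(String originalLocation) {
        return Optional.ofNullable(mappings.get(originalLocation))
                .orElseGet(() -> legacyCalculation.apply(originalLocation));
    }
}
```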

Implementations

There are a few implementations of StorageProvider that come with Broadleaf by default.

FilesystemStorageProvider

This is the default StorageProvider implementation. It is active when broadleaf.asset.internal.storage-provider.implementation=FILESYSTEM, and can be configured further with properties under broadleaf.asset.internal.storage-provider.filesystem.*, as noted in InternalAssetProperties.

With this approach, the system will access and manage assets as files under a directory path. In a production environment, the typical pattern is to establish a shared persistent volume, and use NFS volume mounts to allow each individual container to access the data at a locally-available path.

Storage Path Calculation

In order to achieve a more efficient and balanced distribution of files in the filesystem, the FilesystemStorageProvider will internally hash the caller-provided URL/path and use the result to calculate the actual storage path.

This evenly distributes the files and avoids having a single directory with too many files.

  • Behavior since 2.0.3

    • If the URL is /product/myproductimage.jpg, then the MD5 would be 35ec52a8dbd8cf3e2c650495001fe55f. Assuming broadleaf.asset.internal.storage-provider.filesystem.max-generated-directory-depth=2, this would result in the following file on the filesystem: {providerRootLocation}/35/ec/{RANDOM-ULID}.jpg

      • In this implementation, the filename is set to a fully random ULID value (albeit with the same extension as the original URL). This ensures that a collision, where multiple distinct caller-provided URLs map to the same actual location, is not possible.

      • This requires the StorageLocationMapping concept to map between original/actual paths.

  • Behavior prior to 2.0.3

    • If the URL is /product/myproductimage.jpg, then the MD5 would be 35ec52a8dbd8cf3e2c650495001fe55f. Assuming broadleaf.asset.internal.storage-provider.filesystem.max-generated-directory-depth=2, this would result in the following file on the filesystem: {providerRootLocation}/35/ec/myproductimage.jpg.

      • In this implementation, the filename from the original URL is preserved in the final path.
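A minimal sketch of the directory derivation, assuming the URL is hashed as UTF-8 and that each directory level consumes two hex characters of the MD5 (as the examples above suggest). HashedPathSketch is a hypothetical class, and legacyPath mirrors only the pre-2.0.3 layout. The since-2.0.3 layout would substitute a random ULID for the filename.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of hash-based path calculation; not Broadleaf's implementation. */
final class HashedPathSketch {
    /** Returns the 32-character lowercase hex MD5 of the input (UTF-8 is an assumption). */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /** Builds "ab/cd/" style directories, two hex characters per level (maxDepth of 16 or less). */
    static String hashedDirectories(String url, int maxDepth) {
        String hex = md5Hex(url);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < maxDepth; i++) {
            sb.append(hex, i * 2, i * 2 + 2).append('/');
        }
        return sb.toString();
    }

    /** Pre-2.0.3 style: hashed directories followed by the original filename. */
    static String legacyPath(String root, String url, int maxDepth) {
        String fileName = url.substring(url.lastIndexOf('/') + 1);
        return root + "/" + hashedDirectories(url, maxDepth) + fileName;
    }
}
```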

GoogleCloudStorageProvider

(since 2.0.1)

This implementation integrates with Google Cloud Storage to store assets as objects in buckets.

It is active when broadleaf.asset.internal.storage-provider.implementation=GCS, and can be configured further with properties under broadleaf.asset.internal.storage-provider.google-cloud-storage.*, as noted in InternalAssetProperties. This also requires the com.google.cloud:google-cloud-storage dependency to be available.

With this approach, the system will access and manage assets as objects in a GCS bucket.

Storage Path Calculation

  • Behavior since 2.0.3

    • If the URL is /product/myproductimage.jpg, and broadleaf.asset.internal.storage-provider.google-cloud-storage.path-prefix-in-bucket=blc_assets, then this would result in the following object name under the configured bucket: blc_assets/{RANDOM-ALPHANUM-PREFIX}-{RANDOM-ULID}.jpg

      • In this implementation, the filename is set to a fully random value (albeit with the same extension as the original URL). This ensures that a collision, where multiple distinct caller-provided URLs map to the same actual location, is not possible. The filename is a ULID prepended with random alphanumeric characters to honor Google Cloud Storage object naming best practices, particularly those around using non-sequential names.

      • This requires the StorageLocationMapping concept to map between original/actual paths.

  • Behavior prior to 2.0.3

    • If the URL is /product/myproductimage.jpg, then the MD5 would be 35ec52a8dbd8cf3e2c650495001fe55f. Assuming broadleaf.asset.internal.storage-provider.google-cloud-storage.max-generated-directory-depth=2 and broadleaf.asset.internal.storage-provider.google-cloud-storage.path-prefix-in-bucket=blc_assets, this would result in the following object name under the configured bucket: blc_assets/35/ec/myproductimage.jpg.

      • In this implementation, the filename from the original URL is preserved in the final object name.

      • This implementation was originally just a carry-over that matched the approach from FilesystemStorageProvider for the sake of consistency.
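To make the since-2.0.3 object naming concrete, the sketch below builds a name of the form {prefix}/{RANDOM-ALPHANUM-PREFIX}-{RANDOM-ULID}.{ext}. ObjectNameSketch is hypothetical. The four-character lowercase prefix and the hand-rolled ULID encoder (48-bit timestamp plus 80 random bits in Crockford Base32) are assumptions for illustration, so the real provider may use a ULID library and a different prefix length.

```java
import java.security.SecureRandom;

/** Hypothetical sketch of randomized object naming; not Broadleaf's implementation. */
final class ObjectNameSketch {
    private static final String CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";
    private static final String ALPHANUM = "abcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encodes a ULID: 10 timestamp characters followed by 16 random characters. */
    static String ulid(long epochMillis) {
        char[] out = new char[26];
        long t = epochMillis;
        for (int i = 9; i >= 0; i--) {
            out[i] = CROCKFORD.charAt((int) (t & 31)); // 5 bits per character
            t >>>= 5;
        }
        for (int i = 10; i < 26; i++) {
            out[i] = CROCKFORD.charAt(RANDOM.nextInt(32));
        }
        return new String(out);
    }

    /** Random lowercase alphanumeric prefix so that names are non-sequential. */
    static String randomPrefix(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUM.charAt(RANDOM.nextInt(ALPHANUM.length())));
        }
        return sb.toString();
    }

    /** Keeps only the extension of the original URL; everything else is random. */
    static String objectName(String pathPrefixInBucket, String url) {
        int dot = url.lastIndexOf('.');
        String ext = dot > url.lastIndexOf('/') ? url.substring(dot) : "";
        // Assumption: a prefix length of 4 is arbitrary and chosen for the example.
        return pathPrefixInBucket + "/" + randomPrefix(4) + "-"
                + ulid(System.currentTimeMillis()) + ext;
    }
}
```

Because the resulting name carries no trace of the original URL apart from its extension, a StorageLocationMapping record is what links the two.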

Sandboxing Configuration

At the time of this writing, there are no sandboxable entities in this service. If a sandboxable entity is introduced in this service, the following configurations should be added:

spring:
  cloud:
    stream:
      bindings:
        persistenceOutput:
          triggeredJobEventInputPurgeSandbox:
            group: asset-purge-sandbox
            destination: triggeredJobEvent
broadleaf:
  transitionrequest:
    enabled: true
  changesummary:
    notification:
      active: true
  tracking:
    sandbox:
      purge:
        enabled: true

See Sandboxing In Detail for more details.

Note
These configurations typically only affect the Granular Deployment model. For Min and Balanced deployments, these configurations are likely already included in the flexpackage-level configuration.