Purging Obsolete Sandbox Data

(since DataTracking 2.0.5)

Overview

As data moves through sandboxing workflows, database records representing obsolete sandbox state build up in the system.

These are identified as records whose trk_sandbox_archived column is set to 'Y'.

The mechanism outlined in this article is responsible for purging these obsolete records from the system.

High Level Approach and Key Components

  • Basic purge configuration settings are listed in the Obsolete Sandbox Data Purge Properties documentation

  • Out of the box, Broadleaf comes with a ScheduledJob of type PURGE_OBSOLETE_SANDBOX_DATA that runs periodically

  • PurgeObsoleteSandboxDataListener listens for triggered PURGE_OBSOLETE_SANDBOX_DATA job events and invokes PurgeObsoleteSandboxDataHandler to execute the processing

    • To avoid holding up the message listener for long periods of time, execution is performed on a background thread managed by the TaskExecutor with bean name purgeObsoleteSandboxDataTaskExecutor

  • PurgeObsoleteSandboxDataHandler invokes TrackableRepository.purgeObsoleteSandboxData on each eligible repository.

  • In each repository, TrackableRepository.purgeObsoleteSandboxData is responsible for purging obsolete sandbox data for the domain managed by that repository.
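The flow described above can be sketched in plain Java. This is only an illustration of the listener, handler, and repository hand-off; every type name ending in Sketch is hypothetical, the int return value is an assumption, and a plain ExecutorService stands in for the purgeObsoleteSandboxDataTaskExecutor bean. The real Broadleaf classes have richer contracts than shown here.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for TrackableRepository; only the purge entry point is modeled.
interface PurgeableRepositorySketch {
    // Assumed to return the number of purged records; the real return type may differ.
    int purgeObsoleteSandboxData();
}

// Hypothetical stand-in for PurgeObsoleteSandboxDataHandler.
class PurgeHandlerSketch {
    private final List<PurgeableRepositorySketch> repositories;

    PurgeHandlerSketch(List<PurgeableRepositorySketch> repositories) {
        this.repositories = repositories;
    }

    // Invokes the purge on each eligible repository in turn.
    int purgeAll() {
        int total = 0;
        for (PurgeableRepositorySketch repository : repositories) {
            total += repository.purgeObsoleteSandboxData();
        }
        return total;
    }
}

// Hypothetical stand-in for PurgeObsoleteSandboxDataListener.
class PurgeListenerSketch {
    static final String JOB_TYPE = "PURGE_OBSOLETE_SANDBOX_DATA";

    private final PurgeHandlerSketch handler;
    private final ExecutorService executor;

    PurgeListenerSketch(PurgeHandlerSketch handler, ExecutorService executor) {
        this.handler = handler;
        this.executor = executor;
    }

    // Returns immediately; the purge itself runs on the executor's thread.
    // Returns null for job types this listener does not handle.
    Future<Integer> onJobEvent(String jobType) {
        if (!JOB_TYPE.equals(jobType)) {
            return null;
        }
        Callable<Integer> task = handler::purgeAll;
        return executor.submit(task);
    }
}

class PurgeFlowSketch {
    // Wires two fake repositories together and waits for the background purge to finish.
    static int runDemo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            PurgeableRepositorySketch products = () -> 3;   // pretends 3 records were purged
            PurgeableRepositorySketch categories = () -> 2; // pretends 2 records were purged
            PurgeHandlerSketch handler = new PurgeHandlerSketch(List.of(products, categories));
            PurgeListenerSketch listener = new PurgeListenerSketch(handler, executor);
            return listener.onJobEvent(PurgeListenerSketch.JOB_TYPE).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Purged records: " + runDemo());
    }
}
```

Submitting the work to an executor is what keeps the listener thread free: onJobEvent returns as soon as the task is queued, and only the demo code blocks on the result.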

JpaTrackableRepository.purgeObsoleteSandboxData implementation

Note
Please see the JPA Obsolete Sandbox Data Purge Properties for additional configuration options specific to this implementation.

The default JpaTrackableRepository.purgeObsoleteSandboxData implementation is fairly comprehensive and should work well for most entities:

  • Handles boilerplate logic around deleting entities in batches as driven by configuration properties

  • Applies a 'best effort' approach to also clean up foreign key references coming from OneToOne and OneToMany mappings. Entities that hold hard references to obsolete records must also be deleted, as database constraints would otherwise prevent deletion of the obsolete record.

    Note

    By default, hard references made from unidirectional relationships (where the relationship is defined only on the 'other' entity, with no backreference to it from the current entity) are not handled.

    The implementation only supports deleting hard references from bidirectional relationships, as it relies on the backreferences in the repository’s managedType definition to find any mappings that may be making a foreign-key reference to obsolete entities of that managedType.

    • At a high level, the implementation uses JPA Criteria to query for sandbox records that meet the "obsolete" criteria and collect their native IDs. Then, using EntityManager, it finds all the fields in other classes that make hard references to the managedType at hand. This information (held in an EntityHardReferenceMapping object) is used to build a delete query for the entities of those classes that point to any native IDs in the original list.

    • Notably, there is no explicit logic to handle hard references coming from ManyToMany or ElementCollection relationships, as this is implicitly handled by Hibernate.
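The batching and hard-reference cleanup described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the JpaTrackableRepository code: the class BatchPurgeSketch, its maps, and the batch size parameter are all hypothetical, and plain collections stand in for the JPA Criteria queries and delete statements.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory model of one managed type and one entity type referencing it.
class BatchPurgeSketch {
    // Parent native id -> archived flag (stands in for trk_sandbox_archived = 'Y').
    final Map<Long, Boolean> parents = new LinkedHashMap<>();
    // Child native id -> parent native id (stands in for a foreign-key column).
    final Map<Long, Long> children = new LinkedHashMap<>();

    // Stands in for the query that collects the native ids of obsolete records.
    List<Long> findObsoleteIds(int batchSize) {
        return parents.entrySet().stream()
                .filter(entry -> entry.getValue())
                .map(Map.Entry::getKey)
                .limit(batchSize)
                .collect(Collectors.toList());
    }

    // Purges obsolete parents batch by batch and returns how many parents were removed.
    int purge(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int purged = 0;
        while (true) {
            List<Long> ids = findObsoleteIds(batchSize);
            if (ids.isEmpty()) {
                return purged;
            }
            // Remove rows holding hard references first, so the parent delete is not blocked.
            children.values().removeIf(ids::contains);
            // Then remove the obsolete parents themselves.
            parents.keySet().removeAll(ids);
            purged += ids.size();
        }
    }

    // Builds sample data: parents 1-3 are obsolete, parents 4-5 are not.
    static BatchPurgeSketch sample() {
        BatchPurgeSketch sketch = new BatchPurgeSketch();
        for (long id = 1; id <= 5; id++) {
            sketch.parents.put(id, id <= 3);
        }
        sketch.children.put(10L, 1L);
        sketch.children.put(11L, 2L);
        sketch.children.put(12L, 4L);
        return sketch;
    }

    public static void main(String[] args) {
        BatchPurgeSketch sketch = sample();
        System.out.println("Purged: " + sketch.purge(2));
        System.out.println("Remaining parents: " + sketch.parents.keySet());
        System.out.println("Remaining children: " + sketch.children.keySet());
    }
}
```

With a batch size of 2, the sample data is processed in two batches, and the child row pointing at the non-obsolete parent 4 is left untouched.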

Customization via JpaPurgeObsoleteSandboxRecordsHandler

It is possible that the default behavior of JpaTrackableRepository.purgeObsoleteSandboxData is not suitable for a particular entity.

If the customization needs to be drastic, clients can override the JpaTrackableRepository.purgeObsoleteSandboxData method as a whole for the entity’s repository.

However, in many cases, clients may only want to tweak the deletion behavior, while still leveraging the boilerplate obsolete entity fetching and batching logic provided by the default implementation. In support of this, Broadleaf has a JpaPurgeObsoleteSandboxRecordsHandler interface which can be implemented for each entity requiring custom behavior.

An implementation of JpaPurgeObsoleteSandboxRecordsHandler overrides the delete behavior for a single batch of obsolete sandbox records of a specific entity type. If an implementation exists for a particular entity, it is invoked once the list of native ids of obsolete records to delete has been collected. This should facilitate what we expect to be a common customization point for clients by reducing the boilerplate needed for a working extension.
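The per-entity dispatch idea can be sketched as follows. The actual method signatures of JpaPurgeObsoleteSandboxRecordsHandler are not shown in this article, so the interface below is a hypothetical stand-in: the names BatchPurgeHandlerSketch, canHandle, and purgeBatch, the use of String native ids, and the int return value are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JpaPurgeObsoleteSandboxRecordsHandler; the real signatures may differ.
interface BatchPurgeHandlerSketch {
    boolean canHandle(Class<?> entityType);

    // Deletes one batch of obsolete records and returns how many were deleted (assumed).
    int purgeBatch(List<String> nativeIds);
}

// Hypothetical entity types used only for the demonstration.
class ProductSketch { }

class CategorySketch { }

// Models the point where a collected batch is handed to a custom handler, if one exists.
class BatchDispatchSketch {
    private final List<BatchPurgeHandlerSketch> handlers;
    // Records which path handled each batch, for demonstration purposes only.
    final List<String> log = new ArrayList<>();

    BatchDispatchSketch(List<BatchPurgeHandlerSketch> handlers) {
        this.handlers = handlers;
    }

    int purgeBatch(Class<?> entityType, List<String> nativeIds) {
        for (BatchPurgeHandlerSketch handler : handlers) {
            if (handler.canHandle(entityType)) {
                log.add("custom:" + entityType.getSimpleName());
                return handler.purgeBatch(nativeIds);
            }
        }
        // No custom handler registered: fall back to the default delete logic (simulated here).
        log.add("default:" + entityType.getSimpleName());
        return nativeIds.size();
    }

    // Builds a dispatcher with one custom handler registered for ProductSketch.
    static BatchDispatchSketch sample() {
        BatchPurgeHandlerSketch productHandler = new BatchPurgeHandlerSketch() {
            @Override
            public boolean canHandle(Class<?> entityType) {
                return ProductSketch.class.equals(entityType);
            }

            @Override
            public int purgeBatch(List<String> nativeIds) {
                // Entity-specific delete logic would go here.
                return nativeIds.size();
            }
        };
        return new BatchDispatchSketch(List.of(productHandler));
    }

    public static void main(String[] args) {
        BatchDispatchSketch dispatcher = sample();
        dispatcher.purgeBatch(ProductSketch.class, List.of("a", "b"));
        dispatcher.purgeBatch(CategorySketch.class, List.of("c"));
        System.out.println(dispatcher.log);
    }
}
```

The point of the design is that the custom handler only sees an already-collected batch of ids, so fetching and batching stay with the default implementation.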