Magnus' blog


Supporting SQL Data Migration with Rich Domains

In an ideal scenario, only the application would ever touch the database. You might even use libraries to automatically generate and update schemas, almost forgetting the database exists. But data migrations change that — suddenly the database becomes another interface where things can go wrong, exposed to direct manipulation outside the application’s control. This post is about how to prevent data integrity issues during migrations by modelling rich domains.

Migrating data between systems is usually done directly at the data layer. This is the most convenient and performant approach, but it makes testing much harder. SQL databases tend to enforce only the most basic data requirements, and several common patterns fall through the cracks:

  • Enum values are usually stored as numbers or strings, making it easy to insert a value the application has no corresponding model for.
  • Dependent fields are hard to enforce. For example, if a boolean flag is set, another field may always need to be populated — but the database will only see a nullable column, not the business rule behind it.
  • Calculated values stored for performance reasons may be impossible to validate or recompute in pure SQL.
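To make the second pattern concrete, here is a hypothetical Shipping value object (the class name and the rule are invented for illustration). A direct INSERT during a migration can happily set the express flag without a carrier; only code that constructs through the factory is forced to honor the rule:

```java
// Hypothetical example: express shipments must name a carrier. The schema
// only sees a nullable carrier column; the rule lives in the factory method.
final class Shipping {
    public final boolean express;
    public final String carrier; // nullable in the database

    private Shipping(boolean express, String carrier) {
        this.express = express;
        this.carrier = carrier;
    }

    public static Shipping of(boolean express, String carrier) {
        if (express && (carrier == null || carrier.isBlank())) {
            throw new IllegalArgumentException("express shipping requires a carrier");
        }
        return new Shipping(express, carrier);
    }
}
```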

These are just a few examples. In practice, any business rule that lives in application code rather than the schema is invisible to a migration working at the data layer.

Rich Domain Hydration

Rich domain models wrap primitives in typed value objects that enforce business rules at construction time. They typically support at least two ways to be constructed: a normal factory method (of) that validates external state, and a rehydration method (rehydrate) that reconstructs an object from persisted data. The key difference is intent — of enforces business rules, while rehydrate only checks that the data fits the expected format.

Here is an example:

public class Order {
    public final OrderNumber number;
    public final RecipientId recipient;
    public PaymentId payment;

    private Order(OrderNumber number, RecipientId recipient, PaymentId payment) {
        this.number = number;
        this.recipient = recipient;
        this.payment = payment;
    }

    public static Order of(
        RecipientId recipient,
        OrderNumberReservationSystem orderNumberReservationSystem
    ) {
        final var orderNumber = orderNumberReservationSystem.reserve();
        return new Order(orderNumber, recipient, null);
    }

    public void registerPayment(
        Payment payment
    ) {
        // verify throws if it does not match
        payment.verifyOrderNumber(number);
        this.payment = payment.getId();
    }

    public static Order rehydrate(
        final String orderNumber,
        final Long recipientId,
        final Long paymentId
    ) {
        // Any of the `of` methods could throw if they fail their checks
        final var orderNumberDomain = OrderNumber.of(orderNumber);
        final var recipientIdDomain = RecipientId.of(recipientId);
        // The payment is optional: orders created before a payment arrives
        // legitimately have no payment id, so only validate when present
        final var paymentIdDomain = paymentId == null ? null : PaymentId.of(paymentId);
        return new Order(orderNumberDomain, recipientIdDomain, paymentIdDomain);
    }
}

An Order is initially created without a payment; the payment is attached later, once it has been received. Because the normal construction path reserves a fresh order number, it cannot reconstruct an existing order, so rehydrate accepts the persisted values directly. Importantly, rehydrate still delegates to the value object factories (OrderNumber.of, RecipientId.of, etc.): if a stored value is malformed or out of range, it throws, surfacing the data problem instead of silently loading bad state.
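The value objects that rehydrate delegates to are not shown above. As a sketch, OrderNumber might look like the following; the concrete format rule (two uppercase letters followed by six digits) is an invented assumption, and DomainValidationException is reduced to a minimal RuntimeException subclass:

```java
// Minimal stand-in for the exception type the verification test catches.
class DomainValidationException extends RuntimeException {
    DomainValidationException(String message) {
        super(message);
    }
}

// Sketch of a value object: validation happens once, at construction,
// so holding an OrderNumber implies the value is well-formed.
final class OrderNumber {
    public final String value;

    private OrderNumber(String value) {
        this.value = value;
    }

    public static OrderNumber of(String value) {
        // The format rule here is an assumption for illustration
        if (value == null || !value.matches("[A-Z]{2}\\d{6}")) {
            throw new DomainValidationException("malformed order number: " + value);
        }
        return new OrderNumber(value);
    }
}
```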

Data Integrity Through Rehydration

Because rehydration still validates that data fits the expected format, it gives you a simple and powerful way to verify your database after a migration. Here is an example test using a JPA entity OrderJpa that maps directly to the orders table:

@PersistenceContext
EntityManager em;

@Test
public void verifyData() {
    final var orders = em.createQuery("select o from OrderJpa o", OrderJpa.class)
        .getResultList();

    var failed = false;

    for (final var orderJpa : orders) {
        try {
            Order.rehydrate(
                orderJpa.getOrderNumber(),
                orderJpa.getRecipientId(),
                orderJpa.getPaymentId()
            );
        } catch (DomainValidationException ex) {
            ex.printStackTrace();
            failed = true;
        }
    }

    assertFalse(failed);
}

This iterates through all orders and attempts to rehydrate each one, logging any failures to the console. Rather than stopping on the first error, it collects all failures so you get a complete picture of what went wrong in one run.

In practice, this test works well as an integration test run against a staging environment after the migration completes, but before switching traffic over to the new system. For larger datasets, it can also be structured as a lightweight script run directly against production in a read-only transaction. Either way, the feedback loop is fast: run the test, see which records failed and why, fix the migration, and run again.
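For the script variant, the same loop can be factored into a small helper that returns one message per failing row rather than printing stack traces. This is a sketch: the row type is generic and the exception handling is generalized to RuntimeException:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic failure collector: runs a rehydration function over every row and
// records a message per failing row instead of aborting on the first error.
class RehydrationCheck {
    static <T> List<String> collectFailures(List<T> rows, Consumer<T> rehydrate) {
        final var failures = new ArrayList<String>();
        for (final var row : rows) {
            try {
                rehydrate.accept(row);
            } catch (RuntimeException ex) {
                failures.add(row + ": " + ex.getMessage());
            }
        }
        return failures;
    }
}
```

With the JPA entities from the test above, this would be invoked as collectFailures(orders, o -> Order.rehydrate(o.getOrderNumber(), o.getRecipientId(), o.getPaymentId())), and the resulting list can be logged or asserted empty.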

Conclusion

Data migrations are inherently risky, and issues often only surface after the migration is complete. Rich domain models give you a testing interface that is not usually present in a migration setup, letting you verify data integrity at the application layer rather than relying solely on database constraints.

This pattern scales well as the domain grows. As new value objects and rules are added to the application, the rehydration tests automatically cover them — there is no extra test maintenance burden. Combined with other migration strategies like incremental rollouts or dual-write periods, using rich domains for post-migration verification gives you much stronger confidence that the data your application depends on is in the shape it expects.
