Magnus' blog


Data cloning - Part 4: Verifying the configuration

In the previous post a builder was introduced to configure the cloning. Since this is user input, it has to be verified before it can be used. This helps give early warnings, and it can be used by developers in tests to verify the configuration early before development. This is an important aspect as it makes the user implementation unit testable.

The simple verification is the most straightforward:

  • Does the schema exist?
  • Are all the tables present?
  • Does the referenced column exist?

This is all checked by a verifier interface:

public interface ConfigurationVerifier {
    List<VerificationError> verify(Connection connection, CloneConfiguration configuration) throws CloneException;
}

Where the errors are represented as a Discriminated Union that is made in Java using a Sealed interface:

public sealed interface VerificationError
        permits ColumnNotFound, ReferencedTableNotConfigured, ReferencingTableNotConfigured, SchemaNotFound, TableNotFound
{}

With the example of an error being:

public record ColumnNotFound(
        String tableName,
        String columnName
) implements VerificationError { }

These are quite easily verified by querying the PostgreSQL pg_catalog, which contains all the information about how the tables in the database are structured.

For example all the columns are found with the following query:

select c.relname as table_name,
    a.attname as column_name,
    pg_catalog.format_type(a.atttypid, a.atttypmod) as column_type
from pg_catalog.pg_attribute a
    join pg_catalog.pg_class c on a.attrelid = c.oid
    join pg_catalog.pg_namespace n on c.relnamespace = n.oid
where n.nspname = ?
    and c.relname = any(?)
    and c.relkind = 'r'
    and a.attnum > 0
    and not a.attisdropped
order by c.relname, a.attnum

Where the conditions are defined by:

  • n.nspname is the schema
  • c.relname is the table name
  • c.relkind = 'r' means only ordinary tables as opposed to indexes and such
  • a.attnum > 0 means non-system columns. (system columns have values below 0)
  • not a.attisdropped means the column has not been dropped.

A similar query can be made against the pg_catalog to find table names, schema names or foreign key constraints.

Using these queries the Verifier checks that the configuration will not run into any obvious issues when executing.

Foreign keys

The first simple errors are what happens when the user has done something wrong. In those cases the cloner will not work without fixing them. The foreign key checks are a little bit different. Specifically what we want to check are the following scenarios:

  • A foreign key from a non-configured table points to a configured table.
    • This might be a sub-table of the previously configured table that was forgotten in the implementation.
  • A foreign key from a configured table points to a non-configured table.
    • This might be a forgotten table, but it could also be a central table where the entry is known to be present in other databases too. For example, a system setting might have a constant id across environments.

With both checks they are clear warnings as they could allude to a misconfiguration, but it might also be expected behaviour. By having these checks and configuring them as warnings, it can be up to the implementor to verify the warnings are not wrong.

For example consider a test that could be in a user implementation:

@Test
void verifyConfiguration(Connection connection) {
    // Arrange
    final var verifier = ConfigurationVerifier.create();
    
    final var expected = List.of(
        // The central_config table is constant across environment so is expected to not be configured.
        new ReferencedTableNotConfigured("central_config", "entity")
    );

    // Act
    final var errors = verifier.verify(connection, CENTRAL_CONFIGURATION);

    // Assert
    assertEquals(
        expected,
        errors,
        "An unexpected configuration error. Add it to the expected list if the entry is constant."
    );
}

Here the project implementation verifies that a central_config is purposely not configured. However, the test will fail and force a developer to take into account if they forgot to configure a table.

Conclusion

Querying pg_catalog lets the verifier catch obvious misconfigurations before any data is moved. The foreign key warnings in particular give developers a safety net against forgotten tables, and the expected-errors test pattern makes the configuration unit testable from day one.

View the source on github.com/HHMagnus/PostgresqlJavaEntityCloner

View next or previous post: