Magnus' blog


Data cloning - Part 2: Moving multiple tables

In the last post the COPY-function was explored. This one expands on it to allow cloning an entity from multiple tables at a time.

Read the rest of the series:

In order to move multiple tables the idea is to have a list specifying the tables and how to select the specific data:

record CloneTable(String table, String select) {}

List<CloneTable> transfer(long id) {
    return List.of(
        new CloneTable("my_table", "select * from my_table where id = " + id),
        new CloneTable("second_table", "select * from second_table where my_table_id = " + id),
        new CloneTable("third_table", "select tt.* from third_table tt join second_table st on st.id = tt.second_table_id where st.my_table_id = " + id )
    );
}

In the example the first statement to select the main table is simple, but for each join table the select gets more troublesome. This will be explored more in a future post.

Afterwards a ZIP file can be constructed with the content from the COPY. Imagine a ZIP-file containing:

  • my_table.csv
  • second_table.csv
  • third_table.csv

In a closed system the ZIP can be trusted to maintain the order of the selects such that they match the same order when they are inserted.

Now the export becomes:

void exportClone(OutputStream outputStream, List<CloneTable> tables, Connection connection) throws SQLException, IOException {
    final var copyManager = getCopyManager(connection);

    try (final var zos = new ZipOutputStream(outputStream)) {

        for (final var cloneTable : tables) {

            zos.putNextEntry(new ZipEntry(cloneTable.table() + ".csv"));

            final var copyOut = "copy (" + cloneTable.select() + ") to stdout with csv header";

            copyManager.copyOut(copyOut, zos);

            zos.closeEntry();
        }

        zos.finish();
    }
}

Since the import can rely on the ordering it becomes very simple:

void importClone(InputStream inputStream, Connection connection) throws SQLException, IOException {
    final var copyManager = getCopyManager(connection);

    try (final var zis = new ZipInputStream(inputStream)) {
        ZipEntry entry;

        while ((entry = zis.getNextEntry()) != null) {
            final var fileName = entry.getName();
            final var tableName = fileName.substring(0, fileName.length() - 4);

            final var copyInSql = "copy " + tableName + " from stdin with csv header";
            copyManager.copyIn(copyInSql, zis);

            zis.closeEntry();
        }
    }
}

This supports the most basic cases, and is a simplified version of a similar solution I wrote for another project. It supports moving the data, but is error-prone and can become very complex, since you have to write the SQL.

It is important to note that these approaches use streams, since it will allow for larger data sizes to be transferred instead of saving a possibly big ZIP file or having it in-memory.

Row Level Security

PostgreSQL has a limitation on COPY IN for tables that contain Row Level Security policies. In those cases the import can be expanded to use a temporary table to store the data before moving it into the main tables:

// while loop from importClone
while ((entry = zis.getNextEntry()) != null) {
    final var fileName = entry.getName();
    final var tableName = fileName.substring(0, fileName.length() - 4);

    try (final var stmt = connection.createStatement()) {
        stmt.execute("create temp table temp_" + tableName + " (like " + tableName + " including defaults)");
    }

    final var copyInSql = "copy temp_" + tableName + " from stdin with csv header";
    copyManager.copyIn(copyInSql, zis);

    try (final var stmt = connection.createStatement()) {
        stmt.execute("insert into " + tableName + " select * from temp_" + tableName);
    }

    zis.closeEntry();
}

This bypasses the RLS restriction because the including defaults of the temp table does not include security policies.

Conclusion

Packaging multiple tables into a streamed ZIP archive makes it possible to move an entire slice of related data between environments in one operation. The ordering of inserts is handled implicitly by the ZIP entry order, though writing the select statements manually is still error-prone. The next post introduces a builder to address that.

View the source on github.com/HHMagnus/PostgresqlJavaEntityCloner

View next or previous post: