Magnus' blog


Data cloning - Part 1: Moving data between environments

Reproduction is the most important part of bug fixing. Without it you cannot verify that an issue is fixed. Reproducing a bug requires knowing the state of the system and the action that caused it to fail. However, the state of a system (the data) is not always available to the developer. In highly sensitive environments, data access is often restricted to minimize the risk of leaks, and with these restrictions it becomes near impossible to reproduce bugs. This blog post is the first in a series introducing the Data Cloner project I have been working on to solve this issue.

This is the first post in the series. Stay tuned for more!

Before an issue has been investigated it is usually hard to know exactly what data is needed to reproduce a production issue. In most cases the issue can be related to an Aggregate Root Entity, so the idea is to clone the specific root and all its related entities into a separate database. This approach allows picking only the required information to reproduce a certain issue, moving it to another environment without leaking other data.

The Aggregate Root Entity usually has related information in other tables that needs to be joined for it to make sense. The cloning therefore needs to support one-to-one, one-to-many, many-to-one, and many-to-many relations, in a way that still only picks out the related data.

The project originated as a solution to this very case, but in a less general form: it solved exactly the needs we had on another project. I wanted to make a general version that could be more easily reused. It can be found at github.com/HHMagnus/PostgresqlJavaEntityCloner

This post introduces the initial version of moving data between environments, which will be expanded upon in later posts.

Transferring data with COPY

The initial version uses COPY to export and import the data. COPY allows extracting the result of a specific select statement and inserting it into another database:

  1. The data is extracted to a CSV using:
    copy (select * from my_table where id = :id) to stdout with csv header
    
  2. The data is inserted into another database using:
    copy my_table from stdin with csv header
    

And in theory the data is now transferred.

COPY in Java

In order to do the COPY in Java, the PostgreSQL JDBC driver provides a CopyManager that can be accessed by unwrapping the Connection:

// Requires the PostgreSQL JDBC driver (org.postgresql.PGConnection,
// org.postgresql.copy.CopyManager).
CopyManager getCopyManager(Connection connection) throws SQLException {
    return connection.unwrap(PGConnection.class).getCopyAPI();
}

Data can then be exported using:

void exportClone(OutputStream outputStream, Connection connection, long id) throws SQLException, IOException {
    final var copyManager = getCopyManager(connection);
    // COPY cannot take JDBC bind parameters, so the id is concatenated
    // into the SQL; this is only injection-safe because the id is a long.
    final var select = "select * from my_table where id = " + id;
    final var copyOut = "copy (" + select + ") to stdout with csv header";
    copyManager.copyOut(copyOut, outputStream);
}
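Since COPY statements cannot use JDBC bind parameters, the id has to be concatenated into the SQL string, which is only injection-safe because the id is a long. A small helper (hypothetical, not part of the project) could build both statements and keep that assumption in one place:

```java
// Hypothetical helpers for building the two COPY statements.
// COPY cannot be parameterized with '?' placeholders, so the key is
// inlined into the SQL; restricting it to a long keeps that safe.
public class CopySql {
    // COPY ... TO STDOUT statement selecting a single row by numeric id.
    public static String copyOut(String table, long id) {
        return "copy (select * from " + table + " where id = " + id
                + ") to stdout with csv header";
    }

    // Matching COPY ... FROM STDIN statement for the import side.
    public static String copyIn(String table) {
        return "copy " + table + " from stdin with csv header";
    }

    public static void main(String[] args) {
        System.out.println(copyOut("my_table", 42L));
        System.out.println(copyIn("my_table"));
    }
}
```

The table name is still concatenated, so it must come from code, never from user input.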

And imported using:

void importClone(InputStream inputStream, Connection connection) throws SQLException, IOException {
    final var copyManager = getCopyManager(connection);
    // The first line of the input is treated as a header and skipped.
    final var copyIn = "copy my_table from stdin with csv header";
    copyManager.copyIn(copyIn, inputStream);
}
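The two methods connect through nothing more than a byte stream: whatever copyOut writes, copyIn reads. A minimal sketch of that plumbing with pure JDK streams (the hard-coded CSV stands in for what the export would actually produce, since the real methods assume live database connections):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamPlumbing {
    public static void main(String[] args) throws Exception {
        // Stand-in for exportClone: the CSV that COPY ... TO STDOUT would
        // emit for one my_table row (header line plus one data line).
        ByteArrayOutputStream exported = new ByteArrayOutputStream();
        exported.write("id,name\n42,example\n".getBytes(StandardCharsets.UTF_8));

        // The buffer filled in one environment becomes the InputStream
        // handed to importClone in the other.
        InputStream toImport = new ByteArrayInputStream(exported.toByteArray());
        System.out.print(new String(toImport.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

In practice the buffer would be a file or a network transfer between the two environments rather than an in-memory array.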

Conclusion

With these simple copy methods, a single table can be cloned from one environment to another. The COPY command gives us an efficient, low-overhead mechanism for moving precise slices of data — rather than dumping entire tables — which is exactly what we need when working under data access restrictions.

That said, a real production issue rarely lives in a single table. An Aggregate Root typically spans multiple related tables, and cloning it meaningfully requires understanding and following those relationships. The next post in the series will tackle exactly that: traversing the full aggregate graph, handling one-to-one, one-to-many, many-to-one, and many-to-many relations, and ensuring everything lands in the target environment in a consistent state.

View the source on github.com/HHMagnus/PostgresqlJavaEntityCloner
