When writing automated tests, it's common to mock or simulate the database. The motivation is clear: mocks are fast, easy to set up, and avoid the overhead of managing a real database. Frameworks like Entity Framework Core offer in-memory implementations that make this approach even more convenient.

But there's a serious problem: these tests can pass while hiding real bugs.

In-memory databases and mocks often fail to accurately reproduce the behavior of a real database system — especially when it comes to things like joins, includes, constraints, and null handling. As a result, tests may appear to validate your code, while silently missing critical errors that only surface in production.

This post presents a real-world example where this mismatch caused a production bug — despite having what seemed like complete test coverage. It then outlines practical solutions for how to avoid these pitfalls and test database-dependent logic with confidence.

Case Study: Not Getting the Negatives

The system in question is a transactional platform that records incoming and outgoing financial transactions. It maintains a ledger of agreements, where each agreement contains a list of Outgoing and Incoming transactions. The primary function of the system is to pay out money, represented as Outgoing entries. When too much is paid out, a corresponding Incoming transaction is recorded for reconciliation. Certain Receivers appear across multiple agreements, and part of the system's job is to aggregate payouts for each receiver across all agreements and generate a Transaction for each.

At launch, the system had around 250 receivers and roughly half a million agreements.

The code looked something like this:

var agreements = context.Transaction
    .Include(p => p.Outgoing)
    .Include(p => p.Incoming)
    .ToList();

// aggregate functionality (omitted for brevity)

// send to bank (involved parsing into a specific file format)

The Include method in Entity Framework Core is used to eagerly load related data — in this case, the Outgoing and Incoming lists.

About 20 different test cases were identified and implemented, including:

Multiple agreements with outgoing transactions aggregating correctly across multiple receivers.
A single large receiver correctly aggregating across multiple outgoing and incoming transactions.

The system was written in C# using Entity Framework Core, and the tests used its in-memory implementation.

When the code was finally deployed to production, an error was discovered in the aggregation logic: the payout to a specific large receiver was different from the expected amount. Fortunately, manual verification mechanisms were in place to catch this discrepancy.

It took several days of debugging to trace the issue. The root cause was that the PostgreSQL provider for Entity Framework Core behaved differently from the in-memory implementation. Specifically, using Include on multiple lists resulted in only the first list being populated correctly; the second always contained just a single element. The issue was resolved by rewriting the query to use separate Select statements for each list, which worked as expected. EF Core’s in-memory provider does not truly emulate relational database behavior — it simulates navigation properties but lacks SQL join semantics, which can lead to incorrect relationship materialization.

What's important here is that the test case which mirrored the failing production case did exist — but the bug went unnoticed because the in-memory database behaved differently from the actual one.

Solutions

This case study highlights the serious drawbacks of mocking or simulating the database in tests.

There are multiple solutions to using the actual database in testcases:

Integration testing against a shared real database with test data is another option. While slightly more involved, it enables faster and more accurate validation of database-related code.
Testcontainers provides lightweight, disposable containers for databases (and other services). You can spin up a real database instance for each test, ensuring behavior matches production.

In addition, it’s important to separate business logic from infrastructure code. Logic that doesn’t require a database should live in its own layer and be tested independently, which allows for high-confidence unit tests while keeping the actual database interactions properly tested through integration tests. This idea is further explored in a previous post: Logic vs. Communication.

Lessons learned

In-memory or mocked databases don't guarantee production behavior. Even when tests pass, subtle differences — like how includes or joins are materialized — can introduce silent bugs.
Test coverage is not the same as confidence. You may have test cases that seem to cover critical paths, but if the underlying database isn't behaving the same, you're not truly validating them.
Always test with your production database provider. If you're deploying to PostgreSQL, then PostgreSQL should be part of your automated test pipeline — even if only in integration tests.
Use integration tests for infrastructure-dependent logic. Business logic can and should be tested in isolation. But anything involving data persistence, relationships, or queries must be tested with a real (or realistic) backing store.
Bugs can survive despite having the “right” test case. The presence of a test case isn't meaningful if the environment doesn't behave the same as production.
Real databases are worth the setup cost. They allow fast, and realistic database testing — catching issues before they reach production.