Don't Rely on the File System
File systems are ubiquitous in development environments, but this availability often leads developers to assume file access will always be present. This assumption can cause issues in production, especially in containerized or cloud setups. There are two important questions that are often overlooked:
- Does the program have access to a file system?
- Is the expected output truly a file?
This blog post explores these issues and offers alternative approaches.
TL;DR: Don’t assume file system access in production environments such as Docker containers or cloud deployments. Favor in-memory structures, and defer file output until it is actually needed.
Does the Program Have Access to a File System?
During development, the answer is usually a resounding "yes": you have access to your own file system. The same cannot always be said of production environments:
- On Linux, it's common to run programs with minimal permissions, which may mean the application cannot write to the file system.
- A Docker container uses a virtual file system, which may not persist or may be isolated from the host.
- Cloud deployments often have limited disk space. For example, on Google Cloud, the default configuration might allocate only 10 GB of disk space alongside 32 GB of RAM. In such setups, you might actually have more RAM than disk space.
- In cloud-native environments, object storage (like S3, Azure Blob, or GCS) is often a better choice for persistent storage than local disk.
- Some application servers may lack file system access by default. For example, certain WildFly or JBoss configurations do not include file system access unless explicitly configured.
These limitations are a primary reason why this blog post recommends avoiding the file system when possible. If persistent storage is truly required, it's crucial to identify that need early and ensure the necessary infrastructure is in place. In environments where developers don't manage the servers directly, a storage requirement that surfaces late often causes problems.
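One way to surface a storage requirement early is a fail-fast check at startup. The following is a minimal sketch, not a definitive implementation: the class name `StorageCheck` and both method names are hypothetical, and it only assumes the standard `java.nio.file` API. It probes whether a directory really accepts writes and how much space is usable there.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the class and method names are illustrative only.
final class StorageCheck {

    /** Returns true if a temporary file can be created and deleted in dir. */
    static boolean canWriteTo(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "probe-", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException | UnsupportedOperationException e) {
            // Missing directory, read-only mount, or insufficient permissions.
            return false;
        }
    }

    /** Returns the usable bytes on the store that holds dir, or -1 if unknown. */
    static long usableBytes(Path dir) {
        try {
            FileStore store = Files.getFileStore(dir);
            return store.getUsableSpace();
        } catch (IOException | SecurityException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        // Assumption: the JVM's default temp directory is the one the app would use.
        Path dir = Path.of(System.getProperty("java.io.tmpdir"));
        System.out.println("writable: " + canWriteTo(dir));
        System.out.println("usable bytes: " + usableBytes(dir));
    }
}
```

Running a check like this when the application starts turns a late, confusing I/O failure into an immediate and explicit one.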
Is the Expected Output Really a File?
Do you actually need to write a file to disk, or could you simply work with the data in memory? A file implies a specific structure saved on disk, often with metadata attached. But in many web applications, for example, the goal is to return the data to the user. In those cases, writing the data to the server’s file system is unnecessary, and may even be the wrong abstraction.
Instead, consider using an in-memory structure (such as a byte array) and returning it directly in the response.
It’s not always immediately clear whether a file is truly needed. In such cases, it helps to delay file writing until a later stage. For instance, your core logic could return a byte array, with file storage handled separately if needed. Combining data generation and file writing in the same place often makes the code harder to change later.
When dealing with very large data sets that don’t consistently fit in memory, writing to disk may be necessary. Even then, you should carefully assess the operational constraints, especially since available disk space may still be more limited than RAM.
Example: Mixing Data Construction with File Writing
As an example, consider exporting user data to a CSV file. A first attempt might look like this:
// Anti-pattern: CSV generation and file writing are tangled together.
public void exportUserDataToFile(List<User> users, Path filePath) throws IOException {
    // Files.newBufferedWriter uses UTF-8 by default.
    try (BufferedWriter writer = Files.newBufferedWriter(filePath)) {
        writer.write("ID,Name,Email\n");
        for (User user : users) {
            writer.write(user.getId() + "," + user.getName() + "," + user.getEmail() + "\n");
        }
    }
}
This approach works, but it tightly couples the act of generating the CSV content with writing it to disk. That makes it hard to:
- Return the data to a web client instead of saving it
- Write to a cloud bucket, database, or memory stream
- Unit test the CSV generation in isolation
A better approach would be to refactor the code so that content generation and output are independent:
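import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
// Assumes the javax.servlet API; newer containers use jakarta.servlet.http instead.
import javax.servlet.http.HttpServletResponse;

public class UserCsvExporter {

    // Pure content generation: no I/O, so it is easy to test and reuse.
    // Note: values are not escaped, so fields containing commas or quotes need extra handling.
    public byte[] generateCsv(List<User> users) {
        StringBuilder sb = new StringBuilder();
        sb.append("ID,Name,Email\n");
        for (User user : users) {
            sb.append(user.getId())
              .append(',')
              .append(user.getName())
              .append(',')
              .append(user.getEmail())
              .append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Output option 1: persist the bytes to a file.
    public void saveToFile(byte[] csvData, Path filePath) throws IOException {
        Files.write(filePath, csvData);
    }

    // Output option 2: send the bytes to the client as a download.
    public void writeToHttpResponse(byte[] csvData, HttpServletResponse response) throws IOException {
        // The bytes are UTF-8 encoded, so declare the charset explicitly.
        response.setContentType("text/csv; charset=UTF-8");
        response.setHeader("Content-Disposition", "attachment; filename=\"users.csv\"");
        response.getOutputStream().write(csvData);
    }
}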
With this separation, you can:
- Serve the CSV directly in a web response.
- Save it to disk if needed.
- Unit test generateCsv() without mocking the file system.
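To illustrate the last point, here is a small sketch of such a check. It is deliberately dependency-free: plain comparisons stand in for a test framework such as JUnit, the nested `User` class is a hypothetical stand-in for the real one, and `generateCsv` is copied from the exporter so the snippet compiles on its own. No file, temp directory, or mock is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class UserCsvExporterCheck {

    // Hypothetical stand-in for the User type used in the exporter.
    static final class User {
        private final long id;
        private final String name;
        private final String email;

        User(long id, String name, String email) {
            this.id = id;
            this.name = name;
            this.email = email;
        }

        long getId() { return id; }
        String getName() { return name; }
        String getEmail() { return email; }
    }

    // Same logic as generateCsv in the exporter, copied to keep this sketch self-contained.
    static byte[] generateCsv(List<User> users) {
        StringBuilder sb = new StringBuilder();
        sb.append("ID,Name,Email\n");
        for (User user : users) {
            sb.append(user.getId())
              .append(',')
              .append(user.getName())
              .append(',')
              .append(user.getEmail())
              .append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User(1, "Ada", "ada@example.com"),
                new User(2, "Linus", "linus@example.com"));

        // Decode the bytes and compare against the expected CSV text.
        String csv = new String(generateCsv(users), StandardCharsets.UTF_8);
        String expected = "ID,Name,Email\n"
                + "1,Ada,ada@example.com\n"
                + "2,Linus,linus@example.com\n";

        if (!expected.equals(csv)) {
            throw new AssertionError("unexpected CSV: " + csv);
        }
        System.out.println("generateCsv check passed");
    }
}
```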
A more advanced approach could use a stream, but a byte array will usually be enough.
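For reference, a stream-based variant might look like the sketch below. It is one possible shape rather than the definitive design: `StreamingCsvSketch`, `writeCsv`, and the nested `User` class are hypothetical names introduced for illustration. The generator writes rows to any `OutputStream` as it goes, so the full CSV never has to sit in memory at once.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class StreamingCsvSketch {

    // Hypothetical stand-in for the User type used in the exporter.
    static final class User {
        private final long id;
        private final String name;
        private final String email;

        User(long id, String name, String email) {
            this.id = id;
            this.name = name;
            this.email = email;
        }

        long getId() { return id; }
        String getName() { return name; }
        String getEmail() { return email; }
    }

    // Writes the CSV row by row to whatever destination the caller provides.
    static void writeCsv(Iterable<User> users, OutputStream out) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.write("ID,Name,Email\n");
        for (User user : users) {
            writer.write(user.getId() + "," + user.getName() + "," + user.getEmail() + "\n");
        }
        // Flush but do not close: the caller owns the stream.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream is used here; no file system is needed.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeCsv(List.of(new User(1, "Ada", "ada@example.com")), buffer);
        System.out.print(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The same method could be handed `response.getOutputStream()` for a web download or `Files.newOutputStream(path)` for a file, so the decision about where the data ends up still stays with the caller.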
Conclusion
In most cases, use in-memory data structures to manage your data, and only persist it to a file if absolutely necessary. In web applications, you can often return data directly without saving it to disk at all. When file output is required, structure your code to defer the write operation, making it easier to change or remove later.
Understanding these issues up front can save significant time and prevent critical production failures.