Version Control for Reservoir Simulation Data: Git, DVC, and ZFS

Managing simulation files in reservoir engineering gets messy fast without a system. A single model run can produce gigabytes of data. Multiply that by dozens of sensitivity cases or history-match iterations and the file count quickly becomes unmanageable. Here's how we solved it.

The Problem

Every reservoir engineer has been there — digging through folders named "final_v3_ACTUALLY_FINAL" trying to figure out which run produced a specific result. It's not just an annoyance. When a client asks "how did you get that result six months ago?" you need a real answer, not an archaeological expedition through your file system.

Git + DVC + ZFS: A Practical Stack

At Ridgeline we built a workflow around a self-hosted Gitea server paired with DVC (Data Version Control) running on ZFS storage. Each tool handles what it's good at:

Git tracks what matters: the input decks, scripts, and configuration files that define a simulation. These are the small text files version control was designed for.
DVC handles the heavy lifting. It manages the large binary outputs (restart files, grid properties, simulation results) and records each one in a small pointer file that is committed to Git, which links the data to the exact commit that produced it.
ZFS sits underneath as the storage layer, with built-in compression and snapshot capabilities that keep storage costs manageable even as projects scale.
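
To make the division of labor concrete, here is a minimal Python sketch of the commands one recorded run might involve. It only builds and prints the command lines; it does not execute them. The function names, file paths (decks/CASE_A.DATA, results/CASE_A), the dataset name tank/simdata, and the snapshot name are all hypothetical, and the choice of lz4 compression and of snapshotting after every run are assumptions, not a description of our exact setup.

```python
from pathlib import PurePosixPath


def zfs_setup_command(zfs_dataset):
    """One-time setup: enable transparent compression on the dataset (lz4 is an assumption)."""
    return ["zfs", "set", "compression=lz4", zfs_dataset]


def plan_run_commit(deck, output_dir, message, zfs_dataset, snapshot_name):
    """Return the commands (as argv lists) that record one simulation run."""
    out = PurePosixPath(output_dir)
    pointer = f"{out}.dvc"                   # small pointer file written by `dvc add`
    ignore = str(out.parent / ".gitignore")  # `dvc add` lists the output here
    return [
        # DVC: move the large outputs into its cache and write the pointer file.
        ["dvc", "add", str(out)],
        # Git: commit the text inputs together with the pointer file.
        ["git", "add", deck, pointer, ignore],
        ["git", "commit", "-m", message],
        # DVC: upload the cached outputs to the configured remote.
        ["dvc", "push"],
        # ZFS: take a point-in-time snapshot of the storage dataset.
        ["zfs", "snapshot", f"{zfs_dataset}@{snapshot_name}"],
    ]


# Hypothetical case and dataset names, for illustration only.
plan = plan_run_commit(
    deck="decks/CASE_A.DATA",
    output_dir="results/CASE_A",
    message="CASE_A: base case",
    zfs_dataset="tank/simdata",
    snapshot_name="case_a_base",
)
for argv in plan:
    print(" ".join(argv))
```

Keeping the plan as data rather than running it directly makes it easy to review or log before anything touches the repository. To execute it, each list could be passed to subprocess.run(argv, check=True) on a machine where git, dvc, and zfs are installed.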

The Payoff: Reproducibility

The real value is reproducibility. When a client asks about a result from six months ago, we check out the exact commit, pull the associated data with DVC, and have the complete picture: the input decks and scripts from Git and the outputs they produced from DVC. No guessing, no recreating runs, no "I think it was this version."
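
The retrieval step can be sketched the same way. The Python below builds the commands for finding the last commit before a given date and for restoring that commit's inputs and outputs. The function names, the branch name main, and the example date, hash, and path used with them are hypothetical placeholders, and the commands are only constructed, not run.

```python
def find_commit_command(before_date, branch="main"):
    """Command that prints the last commit on `branch` made before `before_date`."""
    return ["git", "rev-list", "-n", "1", f"--before={before_date}", branch]


def restore_commands(commit, targets=()):
    """Commands that restore the inputs and outputs recorded at `commit`."""
    return [
        # Git: put the decks, scripts, and .dvc pointer files back as committed.
        ["git", "checkout", commit],
        # DVC: download and check out the outputs those pointer files name.
        ["dvc", "pull", *targets],
    ]


# Placeholder date and hash, for illustration only.
print(" ".join(find_commit_command("2024-01-15")))
for argv in restore_commands("abc1234", ["results/CASE_A.dvc"]):
    print(" ".join(argv))
```

Note that checking out a bare commit hash leaves the repository in a detached HEAD state, which is fine for inspecting an old result. Passing specific .dvc files as targets avoids pulling every output in the project when only one case is needed.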

This matters especially in carbon capture and storage projects where regulatory submissions require documented, traceable workflows. Being able to demonstrate exactly how a simulation result was produced isn't optional — it's a requirement.

Worth the Investment

Setting up this infrastructure takes time upfront, but that time is a fraction of what you'd lose chasing down lost files or recreating results. For any engineering team running complex subsurface models, the investment in data management infrastructure pays for itself.