Our Great Migration from MongoDB to PostgreSQL

Published on
July 2nd, 2024
Author
Category
How To
Share On
Hey Product Hunt!
I’m excited to share our story for how we recently performed a full database migration from MongoDB to PostgreSQL at Infisical. This entailed deliberating the initiative, adopting new tech, creating new database schemas, rewiring logic, re-writing queries, to migrating millions (if not billions) of database records over to PostgreSQL.

Where we started

When we first built Infisical, we built it with the stack that felt most familiar with the team. As part of that stack, we chose MongoDB + Mongoose ORM because the combination presented least overhead and allowed us to ship quality features quickly. As Donald Knuth states, “premature optimization is the root of all evil,” and there was certainly no need for further optimization at the time.
At the time, we were also more focused on building Infisical Cloud, the managed SaSS offering, and given this focus, we didn’t anticipate as many users self-hosting the product and hence it wasn’t designed with that use-case in mind.
Note: Looking back, we followed YC’s traditional advice that you should not over-optimize especially when just starting out — Just pick a stack that you can build with most quickly, reduce scope, and hack together the MVP. We continue to follow this principle with optimizations made overtime (e.g. moving from Heroku to DigitalOcean to AWS, revamping features based on feedback, etc.).

Why not MongoDB?

While MongoDB served Infisicalwell in the early days, it started showing signs of shortcoming when the use-case of our product evolved beyond the managed service. As time passed, we found that many organizations, especially ones operating at the intersection of compliance and security, preferred self-hosting Infisical as opposed to using Infisical Cloud; others had on-prem requirements that needed to be met.
With demand growing for self-hosting Infisical, we found ourselves shipping many features catered to reducing the learning curve needed to self-host Infisical and, as part of that, we ended up leaving MongoDB in favor of PostgreSQL.
In practice, we and our customers often ran into constraints around the capabilities and usability of MongoDB like the lack of support for transactions, clean-up, inconsistent versioning across managed offerings by cloud providers, not to mention issues associated with schema-less database design structure.
Amongst a dozen more reasons, we came to the realization that a full database migration to something more universal was the ultimate feature needed to make Infisical more accessible to teams and organizations around the world.

Why PostgreSQL?

When searching for a new database, we began by listing out what aspects mattered most to us: ease of management (i.e. configuration, deployment, and scaling included), built-in support for transactions, and relational capabilities. As part of the deliberation, we also contemplated whether or not we should build our own integrated storage or pursue an external storage solution.
Here’s what that meant for each option:
Integrated storage: We could package in a database system like SQLite directly into Infisical and pursue a horizontal replication strategy to reduce latency by avoiding extra network hops. In this model, scaling the system would mean deploying multiple instances of Infisical and have them communicate with each other via some consensus algorithm like Raft. While this seemed like an excellent solution since customer’s wouldn’t need to connect any dependencies to run Infisical, the tooling ecosystem to execute this vision felt immature and the engineering effort required for it felt nothing short of overwhelming.
External storage: We could simply replace MongoDB with another database(s) like PostgreSQL or MySQL and use its built-in scaling capabilities. Although this solution didn’t fully eliminate friction associated with needing external dependencies to use Infisical, we felt that it already delivered significant benefits by virtue of not being MongoDB. When it came to supporting one or multiple databases, we felt that supporting multiple would mean missing out on the unique advantages of each solution; it would also add to our engineering overhead.
After careful consideration, we chose PostgreSQL. Beyond having a vibrant community, extensive documentation, and a myriad of solutions and extensions available, we appreciated most its open source nature and how the vast majority of cloud providers offered managed services of PostgreSQL.
Above all, this meant that users of Infisical could more easily self-host our platform on any cloud provider and pair it with its corresponding managed PostgreSQL service. Moreover, given how widely-adopted the database has become, we were confident that users would have less trouble operating it when using Infisical.

What about the ORM?

After settling on PostgreSQL, we needed to figure out how our application would interact with the database. Right off the bat, we wanted something comparable to our experience with MongoDB where we used Mongoose ORM. So, we began evaluating candidates on the basis of maturity, visualization and migration support, and appropriate level of abstraction; we primarily considered Drizzle ORM, Prisma ORM, TypeORM, and Knex.js, a query builder.
At the end, we decided to use Knex.js, a query builder, instead of a ORM to maintain better control over the database. While admittedly, going with raw SQL would be most versatile with least abstraction in place, we felt the approach would be far too error-prone and frankly cumbersome to maintain, especially without proper TypeScript support. Moreover, beyond being close to bare SQL, Knex.js came with its own toolkit for seeding and migration, had a mature ecosystem with excellent documentation and answers for almost any possible query. Coupled with some custom Zod integration work, we managed to get it to a satisfactory level for TypeScript support.
Having decided on the database and ORM, we kicked off a process that would ultimately result in a re-write of dozens of data structures and hundreds of queries across the application.

How did we plan the migration?

Toward the end of the code-rewrite, we started to think about how we would conduct the migration operation to map our MongoDB data to PostgreSQL with minimal disruption to the Infisical Cloud platform.
Given Infisical’s critical role in customers infrastructure, we immediately ruled out the possibility of having any absolute downtime. The part where we had to make a compromise was in disallowing write operations during the brief migration window (i.e. customers would not be able to create or update application configuration) in exchange for higher guarantee of data integrity. This tradeoff seemed acceptable given that customers primarily fetched back secrets from Infisical and, to a much lesser extent, updated their application configuration on a second-by-second basis.
Next, regarding the actual migration operation, we needed to dump data from MongoDB, transform it carefully, and insert it back into PostgreSQL. As we audited the migration sequence, we grappled through challenges like making sure that various tree-like structures from NoSQL were correctly transformed to their relational counterparts; this was particularly sensitive for data structures like folders that had recursive considerations. We also found that we needed a persistent way to store and map identifiers in MongoDB to those in PostgreSQL; doing so in-memory would not work considering how much data we were dealing with. In the end, we settled for using the LevelDB key-value store to assist with identifier storage and lookup operations. With it, we would move data table-by-table into PostgreSQL.

The Great Migration

Finally, we were ready to conduct the migration. At this point, folks not directly involved in the codebase re-write had spent a much-needed quarter improving other aspects of Infisical including making frontend changes, performing maintenance patches, extending client functionality, and writing up better documentation. We now all reconvened to prepare for the migration itself that is replacing the application codebase with the new one and transferring data over from MongoDB to PostgreSQL.
As part of the preparation, we drafted a detailed migration checklist with an expected timeline.
On a high-level, the plan looked something like this:
  • In the weeks building up to the migration, we would communicate in advance via both email and in-app banner to let users know about the impending database upgrade. We would conduct thorough testing of every feature flow on the platform and perform trial runs for the migration.
  • The migration itself would occur within a six-hour window where only read operations would be allowed to the platform. During this window, we would run the migration script to move data from MongoDB to PostgreSQL, check that no data was lost, and if successful switch the DNS to the new instance. There were of course backup plans in place in case things went south.
  • Finally, after the migration, we would iron out any residual issues and start rolling out new documentation for working with Infisical and PostgreSQL.
Fortunately, the migration turned out to be smooth with zero data loss and only a few non-essential incidents of feature malfunction; we ironed out these bugs out in the following 36 hours with minimal impact to customers. Overall, we considered the initiative to be a success given the objective at hand, the scope of the task, and the resulting execution of it.

Closing Thoughts

The decision to move from MongoDB to PostgreSQL was not an easy one from the get-go. All in all, the initiative took us 3–4 months to perform with careful planning and discussion around why we needed to perform it, how we were going to do it; and then to execute it all with care. For those of you who are still with me, I’d highly recommend thinking through the use-case and implementation deeply before attempting such a big endeavor.
Overall, I’m extremely happy that everything went according to plan and we were able to deliver such a huge update that will make a large difference to users of Infisical moving forward!
Tony is a co-founder of Infisical where he works on all things Engineering and Product; he's a YC alum and Cornell grad where he studied CS.
Comments
Be the first to comment