Database changes via spring boot vs SQL flyway script
/ 4 min read
Table of Contents
Data migration on a live database
The application of one project I used to work on had a spring boot backend. We used Flyway scripts to update the postgres database alongside the backend code.
Updating data leads to challenges, especially with live data, already stored in the database.
A simple and common scenario would be renaming a field from country to alphaCountryCode.
In such a case, it is sufficient to simply rename the column using a SQL script:
ALTER TABLE items RENAME COLUMN country TO alpha_country_code;But what if existing data needed to be migrated because the business logic changed?
Dry run and logging
At first, I tried to execute a dry run in SQL. All changes needed to be logged, so we would be able to investigate any possible issues later on. I ended up with a script similar to the one below, but this just does not feel satisfying:
BEGIN;
DO$$ BEGIN IF NOT EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'items' AND column_name = 'name') THEN ALTER TABLE items ADD COLUMN name CHAR END IF; END$$;
CREATE TEMP TABLE dryrun_check( uuid UUID PRIMARY KEY, name CHAR(1), status TEXT CHECK (status IN ('UPDATED', 'NOT_UPDATED_OR_NULL'))) ON COMMIT PRESERVE ROWS;
WITH upd AS ( UPDATE items item SET name = CASE WHEN item.type IN ('REGULAR', 'LARGE') THEN 'B' WHEN item.type = 'SMALL' THEN 'S' ELSE 'X' END FROM collections col WHERE col.uuid = item.col_uuid RETURNING item.uuid)INSERTINTO dryrun_check (uuid, name, status)SELECT *FROM (SELECT item.uuid, item.name, 'UPDATED' AS status FROM items item JOIN upd u ON u.uuid = item.uuid UNION ALL SELECT item.uuid, item.name, item.payment_scheme_2, 'NOT_UPDATED_OR_NULL' AS status FROM items item LEFT JOIN upd u ON u.uuid = item.uuid WHERE u.uuid IS NULL OR item.name IS NULL) srcON CONFLICT (uuid) DO NOTHING;
DO$$ DECLARE r RECORD; BEGIN FOR r IN SELECT * FROM dryrun_check ORDER BY uuid LOOP RAISE NOTICE '%', json_build_object( 'uuid', r.uuid, 'name', r.name, 'status', r.status )::text; END LOOP; END$$;
ROLLBACK;A better approach?
An event listener triggered by the ApplicationReadyEvent is perfect for this scenario.
First, flyway scripts are executed, then the application boots up, and lastly, the migration method gets executed.
Applying new logic to data for me as a developer with a higher focus on backend is much more intuitive using backend code instead of SQL scripts. The compiler is extremely helpful, too. Also, it is possible to write tests for such methods to assert the outcome is as intended. If there are any issues or unexpected results during the migration, it is much easier to log what has happened. I would rather have to analyse my structured log files, instead of trying to figure out the root cause of the problem from error messages thrown during the script execution.
@Serviceclass ItemsMigrationProcessor( private val itemRepository: ItemRepository,) { private val logger = LoggerFactory.getLogger(javaClass)
@EventListener(ApplicationReadyEvent::class) @Transactional fun migrateItems() { logger.atDebug().log { "Starting to migrate items" }
val items = itemRepository.findAll() items.forEach { it.editSomeStuff() }
logger.atDebug().log { "Migrated [${items.size}] items" } }
private fun Item.editSomeStuff() { logger.atDebug().log { "Migrating item ${this.uuid}" } this.name = when (this.type) { Type.REGULAR, Type.LARGE -> "B" Type.SMALL -> "S" } logger.atDebug().log { "Migrated item ${this.uuid} to name ${this.name}" } }}Disadvantages
Using the spring boot application to migrate data has some downsides too!
1. Limited memory on large datasets
Make sure the application has enough memory to update all entries at once. Alternatively, do not update all data in one run, but use chunks instead.
Issues like that are easy to overlook, because usually tests focus on correct logic, but not on performance/memory restrictions.
2. Execution time on large datasets
Using Flyway, the script gets executed before the application has booted. Our migration processor needs to wait until the application has finished starting in order to execute.
In most deployment processes, the application gets live as soon as it is started up and processes, such as the data migration, get executed on a live system. Keep in mind that users work with outdated data, if a freshly deployed application is accessible, before the migration has finished.
3. Testability does not mean all scenarios get tested
Tests need realistic data and a proper test setup to be able to predict issues.
4. Clean up
Do not forget to remove migration methods: even with idempotency, the new business logic of the application already gets applied and Database entries only need to be migrated one time. In general, it is preferable to remove unnecessary code after its execution.
Conclusion
- I am more familiar with writing backend code
- The migration code is testable, but requires proper test setup
- Logged errors are easier to analyse