Data migration on a live database

A project I used to work on had a Spring Boot backend. We used Flyway scripts to update the PostgreSQL database alongside the backend code.

Updating data leads to challenges, especially with live data that is already stored in the database.

A simple and common scenario would be renaming a field from country to alphaCountryCode. In such a case, it is sufficient to simply rename the column using a SQL script:

ALTER TABLE items RENAME COLUMN country TO alpha_country_code;
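
With Flyway, such a statement would go into a versioned migration script, e.g. a file named V7__rename_country_to_alpha_country_code.sql (the version number and file name here are just an example), which is applied automatically on startup before the new backend code runs.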

But what if existing data needed to be migrated because the business logic changed?

Dry run and logging

At first, I tried to execute a dry run in SQL. All changes needed to be logged so that we would be able to investigate possible issues later on. I ended up with a script similar to the one below, but it just does not feel satisfying:

BEGIN;

-- Add the new column if it does not exist yet.
DO
$$
BEGIN
    IF NOT EXISTS (SELECT 1
                   FROM information_schema.columns
                   WHERE table_name = 'items'
                     AND column_name = 'name') THEN
        ALTER TABLE items
            ADD COLUMN name CHAR(1);
    END IF;
END
$$;

-- Temporary table to record what the migration would change.
CREATE TEMP TABLE dryrun_check
(
    uuid   UUID PRIMARY KEY,
    name   CHAR(1),
    status TEXT CHECK (status IN ('UPDATED', 'NOT_UPDATED_OR_NULL'))
) ON COMMIT PRESERVE ROWS;

-- Apply the new naming logic and record every affected row.
WITH upd AS (
    UPDATE items item
        SET name = CASE
                       WHEN item.type IN ('REGULAR', 'LARGE') THEN 'B'
                       WHEN item.type = 'SMALL' THEN 'S'
                       ELSE 'X'
            END
        FROM collections col
        WHERE col.uuid = item.col_uuid
        RETURNING item.uuid)
INSERT
INTO dryrun_check (uuid, name, status)
SELECT *
FROM (SELECT item.uuid,
             item.name,
             'UPDATED' AS status
      FROM items item
               JOIN upd u ON u.uuid = item.uuid
      UNION ALL
      SELECT item.uuid,
             item.name,
             'NOT_UPDATED_OR_NULL' AS status
      FROM items item
               LEFT JOIN upd u ON u.uuid = item.uuid
      WHERE u.uuid IS NULL
         OR item.name IS NULL) src
ON CONFLICT (uuid) DO NOTHING;

-- Log every recorded row as JSON via RAISE NOTICE.
DO
$$
DECLARE
    r RECORD;
BEGIN
    FOR r IN
        SELECT *
        FROM dryrun_check
        ORDER BY uuid
        LOOP
            RAISE NOTICE '%',
                json_build_object(
                        'uuid', r.uuid,
                        'name', r.name,
                        'status', r.status
                    )::text;
        END LOOP;
END
$$;

-- Dry run: discard all changes.
ROLLBACK;

A better approach?

An event listener triggered by the ApplicationReadyEvent is perfect for this scenario. First, the Flyway scripts are executed, then the application boots up, and lastly, the migration method gets executed.

For me as a developer with a stronger focus on the backend, applying new logic to data is much more intuitive in backend code than in SQL scripts. The compiler is extremely helpful, too. It is also possible to write tests for such methods to assert that the outcome is as intended. If there are any issues or unexpected results during the migration, it is much easier to log what has happened. I would rather analyse my structured log files than try to figure out the root cause of a problem from error messages thrown during script execution.

import org.slf4j.LoggerFactory
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.event.EventListener
import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Transactional

@Service
class ItemsMigrationProcessor(
    private val itemRepository: ItemRepository,
) {

    private val logger = LoggerFactory.getLogger(javaClass)

    @EventListener(ApplicationReadyEvent::class)
    @Transactional
    fun migrateItems() {
        logger.atDebug().log { "Starting to migrate items" }
        val items = itemRepository.findAll()
        items.forEach {
            it.editSomeStuff()
        }
        // no explicit save here; with a JPA repository the managed
        // entities are flushed on commit via dirty checking
        logger.atDebug().log { "Migrated [${items.size}] items" }
    }

    private fun Item.editSomeStuff() {
        logger.atDebug().log { "Migrating item ${this.uuid}" }
        this.name = when (this.type) {
            Type.REGULAR, Type.LARGE -> "B"
            Type.SMALL -> "S"
        }
        logger.atDebug().log { "Migrated item ${this.uuid} to name ${this.name}" }
    }
}
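
To illustrate the testability claim, here is a minimal test sketch. It assumes JUnit 5 and MockK for stubbing the repository (both are my assumptions, not necessarily the project's actual test stack), as well as a hypothetical Item constructor:

import io.mockk.every
import io.mockk.mockk
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import java.util.UUID

class ItemsMigrationProcessorTest {

    private val itemRepository = mockk<ItemRepository>()
    private val processor = ItemsMigrationProcessor(itemRepository)

    @Test
    fun `sets the name according to the item type`() {
        // hypothetical constructor, adjust to the real Item entity
        val small = Item(uuid = UUID.randomUUID(), type = Type.SMALL, name = null)
        val large = Item(uuid = UUID.randomUUID(), type = Type.LARGE, name = null)
        every { itemRepository.findAll() } returns listOf(small, large)

        processor.migrateItems()

        assertEquals("S", small.name)
        assertEquals("B", large.name)
    }
}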

Disadvantages

Using the Spring Boot application to migrate data has some downsides too!

1. Limited memory on large datasets

Make sure the application has enough memory to update all entries at once. Alternatively, do not update all data in one run, but use chunks instead (see the sketch below).

Issues like this are easy to overlook, because tests usually focus on correct logic, not on performance or memory constraints.
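
What chunked processing could look like, as a sketch: it assumes ItemRepository is a Spring Data repository that supports paging (findAll(Pageable)) and that a TransactionTemplate bean is available, which Spring Boot auto-configures when a transaction manager is present.

import org.slf4j.LoggerFactory
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.event.EventListener
import org.springframework.data.domain.PageRequest
import org.springframework.data.domain.Pageable
import org.springframework.data.domain.Sort
import org.springframework.stereotype.Service
import org.springframework.transaction.support.TransactionTemplate

@Service
class ChunkedItemsMigrationProcessor(
    private val itemRepository: ItemRepository,
    private val transactionTemplate: TransactionTemplate,
) {

    private val logger = LoggerFactory.getLogger(javaClass)

    @EventListener(ApplicationReadyEvent::class)
    fun migrateItemsInChunks() {
        // 500 is an arbitrary example size; sorting by uuid keeps the pages stable
        var pageable: Pageable = PageRequest.of(0, 500, Sort.by("uuid"))
        var hasNext = true
        while (hasNext) {
            // each chunk is loaded and committed in its own transaction,
            // so only one page of entities is held in memory at a time
            hasNext = transactionTemplate.execute {
                val chunk = itemRepository.findAll(pageable)
                chunk.forEach { it.editSomeStuff() }
                chunk.hasNext()
            } ?: false
            logger.atDebug().log { "Migrated chunk ${pageable.pageNumber}" }
            pageable = pageable.next()
        }
    }

    // same mapping as in the processor above
    private fun Item.editSomeStuff() {
        this.name = when (this.type) {
            Type.REGULAR, Type.LARGE -> "B"
            Type.SMALL -> "S"
        }
    }
}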

2. Execution time on large datasets

With Flyway, the script is executed before the application has booted. Our migration processor, on the other hand, has to wait until the application has finished starting before it can run.

In most deployment processes, the application goes live as soon as it has started up, so processes such as the data migration run on a live system. Keep in mind that users work with outdated data if a freshly deployed application is accessible before the migration has finished.

3. Testability does not mean all scenarios get tested

Tests need realistic data and a proper test setup to be able to predict issues; a handful of handcrafted entities rarely covers the edge cases hiding in real production data.

4. Clean up

Do not forget to remove migration methods: even if they are idempotent, the new business logic is already applied by the application, and database entries only need to be migrated once. In general, it is preferable to remove such code once it has served its purpose.
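
Until that cleanup happens, a guard can at least make the method safe to run on every start. A minimal sketch of how migrateItems from above could be guarded, assuming an item that has not been migrated yet can be recognised by its name still being null (that criterion is an assumption for this example):

    @EventListener(ApplicationReadyEvent::class)
    @Transactional
    fun migrateItems() {
        // only touch items that have not been migrated yet,
        // so re-running the method after a restart changes nothing
        val unmigrated = itemRepository.findAll().filter { it.name == null }
        unmigrated.forEach { it.editSomeStuff() }
        logger.atDebug().log { "Migrated [${unmigrated.size}] items" }
    }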

Conclusion

  1. I am more familiar with writing backend code
  2. The migration code is testable, but requires proper test setup
  3. Logged errors are easier to analyse