
Load service data via portable and persistent data volumes #22235

Merged
merged 7 commits into master from db-volmes on May 14, 2024

Conversation

@KevinMind (Contributor) commented May 10, 2024

Description

Relates to: mozilla/addons#14784

Load data for the mysqld, elasticsearch and rabbitmq services via persistent data volumes. Also load the storage data via a persistent volume.

Some of these volumes can be exported and imported via make commands, which allows an environment to be "saved" at a particular point in time and later "resumed" with that exact set of data across the relevant containers.

Persistence:

  • Mysql: exportable/importable persistent volume
  • Storage: persistent volume mounted on the host (already exported)
  • Rabbitmq: persistent but not restorable (is there a need?)
  • Elasticsearch: persistent but not restorable (can be reindexed instead)
  • Redis: persistent but not restorable (no need for in-memory data)
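
For reference, you can inspect these volumes directly with the docker CLI. A minimal sketch, assuming compose prefixes volume names with the project name (the exact names below are illustrative):

# List the named volumes the compose project owns.
docker volume ls --filter "name=addons-server"

# Inspect one of them to see where its data lives on the host.
docker volume inspect addons-server_data_mysqld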

Context

By moving our data to data volumes, we decouple the lifecycle of a compose project and its containers from the data in the containers.

This means you can restart, recreate or completely re-instantiate your entire environment without destroying any historical data you may have.

Additionally, this opens up the possibility of loading a snapshot of the database from one environment in another. There are many use cases for this, such as testing or supporting multiple profiles of initial data.

Testing

Export data

First, make a snapshot by running the command below. It's important you do this before anything else.

make data_export

This will create a directory backups/<timestamp> containing a tar file for each volume. Each tar file holds the data its volume contained at the moment of export. We do this first to make sure we keep a snapshot of the original data.
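
Under the hood, exporting a named volume to a tar file can be done with a throwaway container. This is a hypothetical sketch, not the exact implementation of make data_export; the volume and file names are illustrative:

# Create a timestamped backup directory.
BACKUP_DIR="backups/$(date +%s)"
mkdir -p "$BACKUP_DIR"

# Mount the volume read-only and tar its contents into the backup directory.
docker run --rm \
  -v addons-server_data_mysqld:/data:ro \
  -v "$(pwd)/$BACKUP_DIR":/backup \
  alpine tar czf /backup/data_mysqld.tar.gz -C /data .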

Data persisting across containers

Now, try completely recreating your environment to see that the volume data persists. Note that we are not using the snapshot yet; we are just recreating the containers.

make down
make up

This creates a totally fresh environment by building the docker image and re-creating all containers, wiping any local state they may have. You will still have all of your data, though, because the data itself lives in the named volumes, which outlive the containers.

You should run make down the first time to decouple any links to anonymous volumes; after that, running make up on its own will recreate containers even if they are already running.
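
If you want to convince yourself that the named volumes survive the recreate, you can check the volume list in between. A small sketch, assuming make down leaves named volumes in place at this stage (volume names are illustrative):

make down
docker volume ls --filter "name=addons-server"   # named volumes are still listed
make up                                          # containers are recreated against the same volumes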

Now you can test that the app is running, but there should basically be no database data: the volumes are empty.

You should get an error like this on /login:

[screenshot]

And like this on the home page:

[screenshot]

That is where our snapshot comes in.

Restore from the base snapshot.

Run make data_restore and you should have a restored environment.
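
Restoring is roughly the reverse of the export. This is a hypothetical sketch for a single volume, not the exact implementation of make data_restore (names are illustrative):

# Pick the most recent backup directory.
LATEST=$(ls -d backups/* | sort | tail -n 1)

# Wipe the volume and unpack the tar back into it.
docker run --rm \
  -v addons-server_data_mysqld:/data \
  -v "$(pwd)/$LATEST":/backup:ro \
  alpine sh -c "rm -rf /data/* && tar xzf /backup/data_mysqld.tar.gz -C /data"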

Data persist from snapshot

Now, change something in the environment. Maybe change your admin user name or update an addon; anything that changes the state from the first snapshot.

Create a new snapshot by running make data_export again.

Now you can restore the first snapshot and see the data snap back.

make data_restore RESTORE_DIR=/Users/kmeinhardt/src/mozilla/addons-server/backups/<timestamp>

Make sure to include the full absolute path, and also include the timestamp from the FIRST snapshot.

Now you can check that the data has been restored to the state before you made the change. You can bring the change back by running make data_restore again and passing the path to the newer snapshot, or by passing no arguments; by default the command restores the latest snapshot.
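
Putting the whole round trip together with the documented make targets (the <timestamp> placeholders stand in for the real directory names):

make data_export                                                 # snapshot A
# ...change an admin user name, edit an addon, etc...
make data_export                                                 # snapshot B
make data_restore RESTORE_DIR=$(pwd)/backups/<timestamp-of-A>    # jump back to A
make data_restore                                                # no argument: latest snapshot (B)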

@KevinMind force-pushed the db-volmes branch 3 times, most recently from d6b3a7d to 1c17fa5, May 13, 2024 07:40
@KevinMind requested a review from diox May 13, 2024 07:41
@KevinMind marked this pull request as draft May 13, 2024 09:11
@KevinMind force-pushed the db-volmes branch 3 times, most recently from f0332ce to 1d30887, May 13, 2024 13:58
@KevinMind changed the title from "Load mysql data via persistent data volume" to "Load service data via portable and persistent data volumes" May 13, 2024
@KevinMind force-pushed the db-volmes branch 3 times, most recently from 93cbeab to f35de6d, May 13, 2024 15:41
@KevinMind marked this pull request as ready for review May 13, 2024 16:03
@diox (Member) commented May 14, 2024

I don't think we care that much about existing redis or rabbitmq data FWIW.

Interestingly for #22244 I explicitly wanted rabbitmq not to persist to make my life easier (can't control when we're forcing users to enable all feature flags, forcing to start from scratch works around the issue)

@KevinMind (Contributor, Author) replied, quoting diox:

Interestingly for #22244 I explicitly wanted rabbitmq not to persist to make my life easier

Regarding this, it's not clear to me whether we have the same understanding of how this currently works, but by adding the data to a persistent volume we are not persisting the state of the container. Running make up in this case creates a totally new instance of the container with whatever parameters you set. The only thing that would theoretically remain is the state of the data itself. If you make up and in between change feature flags or any configuration, all of that gets propagated to the new container; it is just reading from a persistent data store.

One argument I have for persisting these volumes is that it removes the problem (which I've discovered) of dangling volumes. When a container (like rabbit, mysql and redis currently) relies on a data volume and you don't specify one, it creates an anonymous volume. These sometimes end up dangling on compose down and just sit there forever. By using named data volumes, we have an explicit understanding of which volume relates to which service, and the volumes don't dangle because they are easily identifiable.

IDK, this is a trade-off, because as you pointed out, if you actually WANT to destroy the data you have to do that explicitly. FWIW, you can do that with:

docker compose down --volumes

which is now part of the clean_docker command. If we want this to be more granular, I could split clean_docker into sub-commands so you could run clean_volumes specifically... idk, wdyt?
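
For context, dangling volumes can be spotted and cleaned up directly with the docker CLI:

# List volumes no container references any more.
docker volume ls --filter "dangling=true"

# Remove unused local volumes (asks for confirmation).
docker volume prune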

@KevinMind KevinMind requested a review from diox May 14, 2024 12:52
@KevinMind (Contributor, Author) commented

@diox and I aligned on a hybrid solution: adding --volumes to docker_compose_down removes both anonymous and named volumes.

To persist the mysqld volume specifically, we made it an "external" volume with its own lifecycle management. This way the mysql data can persist while the other named volumes get cleaned up whenever you make down.
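
Since compose never creates or removes external volumes itself, the mysqld volume has to be managed explicitly. A minimal sketch of that lifecycle from the CLI (the volume name is an assumption):

# Create the external volume once, outside of compose's lifecycle.
docker volume create addons-server_data_mysqld

# make down / docker compose down --volumes will leave it alone;
# deleting the mysql data is a deliberate, separate step:
docker volume rm addons-server_data_mysqld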

@diox (Member) left a comment


Works great 👍

@KevinMind merged commit bc18619 into master May 14, 2024
15 checks passed
@KevinMind deleted the db-volmes branch May 14, 2024 14:17