
Load service data via portable and persistent data volumes #22235

Merged
merged 7 commits into master from db-volmes on May 14, 2024

Conversation

@KevinMind (Contributor) commented May 10, 2024

Description

Relates to: mozilla/addons#14784

Load data for the mysqld, elasticsearch and rabbitmq services via persistent data volumes. Also load the storage data via a persistent volume.

Some of these volumes can be exported and imported via make commands, which allows an environment to be "saved" at a particular point in time and later "resumed" with that exact set of data across the relevant containers.

Persistence:

  • Mysql: exportable/importable persistent volume
  • Storage: persistent volume mounted on the host (already exported)
  • Rabbitmq: persistent but not restorable (is there a need?)
  • Elasticsearch: persistent but not restorable (can be reindexed instead)
  • Redis: persistent but not restorable (no need for in-memory data)
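
For reference, you can inspect these volumes directly with the docker CLI. A minimal sketch, assuming compose prefixes volume names with the project name (the exact names below are illustrative):

# List the named volumes the compose project owns.
docker volume ls --filter "name=addons-server"

# Inspect one of them to see where its data lives on the host.
docker volume inspect addons-server_data_mysqld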

Context

By moving our data to data volumes, we decouple the lifecycle of a compose project and its containers from the data in the containers.

This means you can restart, recreate or completely re-instantiate your entire environment without destroying any historical data you may have.

Additionally, this opens up the possibility of loading a snapshot of the database from one environment in another. There are many use cases for this, such as testing or supporting multiple profiles of initial data.

Testing

Export data

First, make a snapshot by running the command below. It's important you do this before anything else.

make data_export

This will create a directory backups/<timestamp> containing a tar file for each volume. Each tar file holds the data its volume contained at the moment of export. We do this first to make sure we keep a snapshot of the original data.
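
Under the hood, exporting a named volume to a tar file can be done with a throwaway container. This is a hypothetical sketch, not the exact implementation of make data_export; the volume and file names are illustrative:

# Create a timestamped backup directory.
BACKUP_DIR="backups/$(date +%s)"
mkdir -p "$BACKUP_DIR"

# Mount the volume read-only and tar its contents into the backup directory.
docker run --rm \
  -v addons-server_data_mysqld:/data:ro \
  -v "$(pwd)/$BACKUP_DIR":/backup \
  alpine tar czf /backup/data_mysqld.tar.gz -C /data .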

Data persisting across containers

Now, try completely recreating your environment to see that the volume data persists. Note that we are not using the snapshot yet; we are just recreating the containers.

make down
make up

This creates a totally fresh environment by building the docker image and re-creating all containers, wiping any local state they may have. You will still have all of your data, though, because the data itself lives in the named volumes, which outlive the containers.

You should run make down the first time to decouple any links to anonymous volumes; after that, running make up on its own will recreate containers even if they are already running.
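
If you want to convince yourself that the named volumes survive the recreate, you can check the volume list in between. A small sketch, assuming make down leaves named volumes in place at this stage (volume names are illustrative):

make down
docker volume ls --filter "name=addons-server"   # named volumes are still listed
make up                                          # containers are recreated against the same volumes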

Now you can test that the app is running, but there should basically be no database data: the volumes are empty.

You should get an error like this on /login:

[screenshot]

And like this on the home page:

[screenshot]

That is where our snapshot comes in.

Restore from the base snapshot.

Run make data_restore and you should have a restored environment.
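
Restoring is roughly the reverse of the export. This is a hypothetical sketch for a single volume, not the exact implementation of make data_restore (names are illustrative):

# Pick the most recent backup directory.
LATEST=$(ls -d backups/* | sort | tail -n 1)

# Wipe the volume and unpack the tar back into it.
docker run --rm \
  -v addons-server_data_mysqld:/data \
  -v "$(pwd)/$LATEST":/backup:ro \
  alpine sh -c "rm -rf /data/* && tar xzf /backup/data_mysqld.tar.gz -C /data"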

Data persist from snapshot

Now, change something in the environment. Maybe change your admin user name or update an addon; anything that changes the state from the first snapshot.

Create a new snapshot by running make data_export again.

Now you can restore the first snapshot and see the data snap back.

make data_restore RESTORE_DIR=/Users/kmeinhardt/src/mozilla/addons-server/backups/<timestamp>

Make sure to include the full absolute path, and also include the timestamp from the FIRST snapshot.

Now you can check that the data has been restored to the state before you made the change. You can bring the change back by running make data_restore again and passing the path to the newer snapshot, or by passing no arguments; by default the command restores the latest snapshot.
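
Putting the whole round trip together with the documented make targets (the <timestamp> placeholders stand in for the real directory names):

make data_export                                                 # snapshot A
# ...change an admin user name, edit an addon, etc...
make data_export                                                 # snapshot B
make data_restore RESTORE_DIR=$(pwd)/backups/<timestamp-of-A>    # jump back to A
make data_restore                                                # no argument: latest snapshot (B)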

@KevinMind force-pushed the db-volmes branch 3 times, most recently from d6b3a7d to 1c17fa5, May 13, 2024 07:40
@KevinMind requested a review from diox May 13, 2024 07:41
@KevinMind marked this pull request as draft May 13, 2024 09:11
@KevinMind force-pushed the db-volmes branch 3 times, most recently from f0332ce to 1d30887, May 13, 2024 13:58
@KevinMind changed the title from "Load mysql data via persistent data volume" to "Load service data via portable and persistent data volumes" May 13, 2024
@KevinMind force-pushed the db-volmes branch 3 times, most recently from 93cbeab to f35de6d, May 13, 2024 15:41
@KevinMind marked this pull request as ready for review May 13, 2024 16:03
@diox (Member) commented May 14, 2024

I don't think we care that much about existing redis or rabbitmq data FWIW.

Interestingly for #22244 I explicitly wanted rabbitmq not to persist to make my life easier (can't control when we're forcing users to enable all feature flags, forcing to start from scratch works around the issue)

@KevinMind (Contributor, Author) replied, quoting diox:

Interestingly for #22244 I explicitly wanted rabbitmq not to persist to make my life easier

Regarding this, it's not clear to me whether we have the same understanding of how this currently works, but by adding the data to a persistent volume we are not persisting the state of the container. Running make up in this case creates a totally new instance of the container with whatever parameters you set. The only thing that would theoretically remain is the state of the data itself. If you make up and in between change feature flags or any configuration, all of that gets propagated to the new container; it is just reading from a persistent data store.

One argument I have for persisting these volumes is that it removes the problem (which I've discovered) of dangling volumes. When a container (like rabbit, mysql and redis currently) relies on a data volume and you don't specify one, it creates an anonymous volume. These sometimes end up dangling on compose down and just sit there forever. By using named data volumes, we have an explicit understanding of which volume relates to which service, and the volumes don't dangle because they are easily identifiable.

IDK, this is a trade-off, because as you pointed out, if you actually WANT to destroy the data you have to do that explicitly. FWIW, you can do that with:

docker compose down --volumes

which is now part of the clean_docker command. If we want this to be more granular, I could split clean_docker into sub-commands so you could run clean_volumes specifically... idk, wdyt?
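
For context, dangling volumes can be spotted and cleaned up directly with the docker CLI:

# List volumes no container references any more.
docker volume ls --filter "dangling=true"

# Remove unused local volumes (asks for confirmation).
docker volume prune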

@KevinMind KevinMind requested a review from diox May 14, 2024 12:52
@KevinMind (Contributor, Author) commented

@diox and I aligned on a hybrid solution: adding --volumes to docker_compose_down removes both anonymous and named volumes.

To persist the mysqld volume specifically, we made it an "external" volume with its own lifecycle management. This way the mysql data can persist while the other named volumes get cleaned up whenever you make down.
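
Since compose never creates or removes external volumes itself, the mysqld volume has to be managed explicitly. A minimal sketch of that lifecycle from the CLI (the volume name is an assumption):

# Create the external volume once, outside of compose's lifecycle.
docker volume create addons-server_data_mysqld

# make down / docker compose down --volumes will leave it alone;
# deleting the mysql data is a deliberate, separate step:
docker volume rm addons-server_data_mysqld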

@diox (Member) left a comment


Works great 👍

@KevinMind merged commit bc18619 into master May 14, 2024
15 checks passed
@KevinMind deleted the db-volmes branch May 14, 2024 14:17