Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use asynchronous childProcess commands in Network tests - Closes #2334 #2382

Merged
merged 10 commits into from Sep 14, 2018

Conversation

jondubois
Copy link
Contributor

What was the problem?

  • Functions used by network tests to spawn nodes were synchronous.
  • Timeouts were used to estimate when nodes would be ready (before running test cases).

How did I fix it?

  • Made test cases asynchronous.
  • Wait for nodes to be ready before executing tests.

How to test it?

grunt mocha:default:network

Review checklist

@jondubois jondubois self-assigned this Sep 7, 2018
@diego-G diego-G requested review from ManuGowda, nazarhussain and 4miners and removed request for MaciejBaj and diego-G September 7, 2018 08:43
@@ -17,6 +17,9 @@
const childProcess = require('child_process');
const utils = require('../utils');

const NODE_READY_REGEX = /Finished sync/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to check loading status by log message then better would be Blockchain ready

https://github.com/LiskHQ/lisk/blob/ed88a37008d92b82260d86db956ec543c5c71eca/modules/loader.js#L430

Copy link
Contributor Author

@jondubois jondubois Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I only wait for 'Blockchain ready', it breaks the test since it doesn't give enough time for all the nodes to sync up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of regular expression you are using is NODE_READY_REGEX https://github.com/LiskHQ/lisk/blob/ed88a37008d92b82260d86db956ec543c5c71eca/test/network/scenarios/common.js#L20

Which clearly is not the case, you are not waiting for the node to be ready, you are waiting for node to be synced at least once. If that's the case please use proper naming for the variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not a big fan of grepping the log message to verify if the node is ready, the reliable way is to either subscribe to the blockchainReady event which clearly indicated the blockchain is loaded or using the API endpoint to check if the syncing is true or false and make the decision based on the boolean result.

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which clearly is not the case

It's not so black and white. 'Node ready' can mean a lot of things depending on whether we're looking at it from the perspective of the node or from the perspective of the test.

I'll rename to NODE_FINISHED_SYNC_REGEX to avoid any confusion.

pm2LogProcess.stdout.removeAllListeners('data');
resolve();
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better practice to handle this use case would be to use pm2 builtin feature for it. PM2 have a CLI option –wait-ready and then from the process you can send an event process.send('ready') to notify PM2 that app is ready.

This would be the best way to in the core and then either PM2 or some other process manager can also use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about this feature. I'll give it a try.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feature would be perfect but I can't get it to work for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we can sit together to check it out.

})
.catch(err => {
done(err.message);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this code will have similar impact.

before(done => {
   common.stopNode('node_1').then(done).catch(done);
});

or simply returning the promise will also work here.

before(() => common.stopNode('node_1'))

Copy link
Contributor Author

@jondubois jondubois Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested approach looks cleaner but it's more brittle; if the stopNode() function changes in the future (e.g. returns a value from the promise), it could break stuff in an unexpected way. I think it's better to be explicit about what is passed to the done function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If stopNode change in the future we should should its usage. So why not use the cleaner approach which fits in current circumstances.

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function should be a blackbox as much as possible. The stopNode function shouldn't have to worry about how its output is being used by dependent functions (done in this case). That's why the extra fat-arrow functions are important in this case. It's also more readable because looking at the test you can see exactly what parameters go into the done function without having to dig into the implementation details of the stopNode function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing to do with internal functionality of the function.

Yes function should be blackbox but also follow the documented interface. If its documented that the function will return a Promise then it should always return promise. And all relevant code should be written in that perspective that it is returning the Promise.

And once function return type is change, its reference usage must be updated accordingly.

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do it since most people seem to disagree with me :'(

})
.catch(err => {
done(err.message);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

})
.catch(err => {
done(err.message);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be much compact option.

it('stop all the nodes in the network except node_0', () => {
    return Promise.map(new Array(TOTAL_PEERS), (value, index) => {
        return common.stopNode(`node_${index}`); 
    });
});

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't work. There is no node_10 - We need to loop from node_0 to node_9.
Also, a for loop is easier to read in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove the +1 from the index.

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then it stops node_0 which we want to keep running.

})
.catch(err => {
done(err.message);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -17,6 +17,9 @@
const childProcess = require('child_process');
const utils = require('../utils');

const NODE_READY_REGEX = /Finished sync/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not a big fan of grepping the log message to verify if the node is ready, the reliable way is to either subscribe to the blockchainReady event which clearly indicated the blockchain is loaded or using the API endpoint to check if the syncing is true or false and make the decision based on the boolean result.

@@ -17,6 +17,9 @@
const childProcess = require('child_process');
const utils = require('../utils');

const NODE_READY_REGEX = /Finished sync/;
const NODE_READY_TIMEOUT = 20000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the disadvantage of having the timeout as a way to check if the application starts within this time frame is that it will be indeterministic if the performance of the application degrades then the tests will fail and if the performance improves then still it has to wait 20 seconds, as I said before if we subscribe to blockchainReady event then regardless of the time the tests will be consistent.

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout specifies the upper bound; there needs to be a timeout in case the node is taking way too long to launch (due to an unforeseen issue with the node itself); then it should fail the test with an error. It should be a large value though. I can make it higher. 20s is a bit short.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it 40 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's not the best but we're running the node with PM2 so we can't listen to the node's events directly. I investigated using the HTTP API to check the sync status of the node but that would add a lot of complexity because of how we're running the nodes, it's difficult to associate them with a URL.

@jondubois jondubois force-pushed the 2334-network_tests_async branch 2 times, most recently from 5fc911f to 55fc17d Compare September 10, 2018 14:17
@diego-G diego-G requested review from nazarhussain, SargeKhan and ManuGowda and removed request for 4miners September 10, 2018 15:08
let pm2LogProcess = childProcess.spawn('pm2', ['logs', nodeName]);
pm2LogProcess.once('error', err => {
clearTimeout(nodeReadyTimeout);
pm2LogProcess.stdout.removeAllListeners('data');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not remove error listener here as well?

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses the once('error', ...) handler so if the handler executes once then it gets removed implicitly. I can change it to on('error', ...) and then explicitly remove the 'error' event handler if you prefer.

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, based on your earlier feedback, I moved up the let pm2LogProcess = ... declaration since that's more clear. Also I made it const.

@@ -17,6 +17,9 @@
const childProcess = require('child_process');
const utils = require('../utils');

const NODE_FINISHED_SYNC_REGEX = /Finished sync/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using Finished sync as a regular expression?

Copy link
Contributor Author

@jondubois jondubois Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like doing regex.test(...) instead of string.indexOf(...) !== ... it really doesn't matter for me though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment in the code.

@jondubois jondubois dismissed stale reviews from ManuGowda and nazarhussain September 11, 2018 09:03

Addressed key issues and justified the ones which couldn't be resolved.

Copy link
Contributor

@nazarhussain nazarhussain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving the PR as its hanging for long. But I am not convinced with the approach to wait for syncing as part of the node startup.

That particular part should be done explicitly only in the cases where needed. With detail comments that why we need that in that particular test case.

common
.stopNode('node_1')
.then(done)
.catch(done);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check if pm2 kill is being executed after the tests failed.

Copy link
Contributor Author

@jondubois jondubois Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were running the network tests with the mocha --bail flag which meant that our after() hook was skipped (when an error was thrown inside our before() hook) and so the test suite would not cleanup after itself.

This is an open issue with mocha: mochajs/mocha#3398

Removing the --bail flag fixed the issue.

@jondubois
Copy link
Contributor Author

jondubois commented Sep 13, 2018

The purpose of the --bail flag in mocha is to terminate the test suite as soon as any one of the test cases fails; this feature saves time and it would have been nice to keep it but none of the alternative solutions I investigated could be used.

  • Using process.on('exit', cleanupPm2Processes) doesn't work because we need to launch the pm2 kill command in a separate process; by the time the 'exit' event triggers, the main process is already shutting down and so this doesn't give our process enough time to execute the pm2 kill command as a child process (in Node.js, child processes are killed with their parent); this means that the cleanup doesn't happen and the PM2 nodes are left running.
  • Using process.on('SIGTERM', cleanupPm2Processes) also doesn't work; same reason as above.

The best way to detect if the tests have finished is to use Mocha's after hook but currently this is not possible if we use the --bail flag. Until the issue is fixed in Mocha, I recommend to not use the --bail flag for the network tests.

@MaciejBaj MaciejBaj merged commit d9a9ba9 into development Sep 14, 2018
@MaciejBaj MaciejBaj deleted the 2334-network_tests_async branch September 14, 2018 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use asynchronous childProcess commands in Network tests
7 participants