Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements concurrency pool for Bluebird.maps() #1637

Open
UrielCh opened this issue Jan 30, 2020 · 4 comments
Open

Implements concurrency pool for Bluebird.maps() #1637

UrielCh opened this issue Jan 30, 2020 · 4 comments

Comments

@UrielCh
Copy link

UrielCh commented Jan 30, 2020

Hi,

This is a code pattern I have commonly:

With the current Bluebird

const api = {} as any;// my dataprovider

async function main() {
    // first ids level
    const Ids: number[] = await api.getIds();
    Bluebird.map(Ids, async (id) => {
        const object: { name: string, childIds: number[] } = await api.getObject(id);
        // second ids level
        const subId: number[] = object.childIds;
        Bluebird.map(subId, async (subid) => {
            const subObject = await api.getSubElm(id, subId);
            // third ids level
            // subObject contains some childs that will be iterate with a third Bluebird.map call
        }, { concurrency: 50 })
    }, { concurrency: 50 })
}

this code can start up to 50 x 50 concurrent calls to the API.

with a 4 level loop, the concurrency value can be 20 x 20 x 20 x 20
but allowing 160 000 concurrent can will blow up the nodejs process.

A real life 3 level sample can be found here:

If the ids array size vary a lots from empty / tiny to very large, there is no way to have a constant processing speed.

it's possible to refactor this code to improve linear performances, by preloading each level of API call, but that force me to pre-load each API layer in memory.

in java I can use Executors.newFixedThreadPool(max_threads); to fix a number of parallels threads.

With some new features

something like:

const pool = Bluebird.pool({name: pool, concurrency: 250});
...
 Bluebird.map(Ids, async (id) => {
...
 Bluebird.map(subIds, async (subId) => {
  ...
  }, { concurrency: pool })
...
  }, { concurrency: pool })
....

to limit the number of concurrent task to 250.

or

const poolApi1 = Bluebird.pool({name: pool, concurrency: 10});
const poolApi2 = Bluebird.pool({name: pool, concurrency: 240});
...
 Bluebird.map(Ids, async (id) => {
...
 Bluebird.map(subIds, async (subId) => {
  ...
  }, { concurrency: poolApi2 })
...
  }, { concurrency: poolApi1 })
....

In this case:

  • no more that 10 parallels call are done to the first API.
  • no more that 240 parallels call are done to the second API.
  • no more that 250 parallels call are done to all APIs.
  • the first and the second API are calling together.
@steveetm
Copy link

steveetm commented Feb 27, 2020

Regarding your first example, what should happen when the outer map eats up all available resources from the pool and the inner map gets an empty pool?

@UrielCh
Copy link
Author

UrielCh commented Mar 4, 2020

The outer map won't close until all of his child get resolved..

In the worst case the outer loop stay running because a single task is not completer,
the same think append in the second map.
and the 3th map is active with all his concurrent Promise.

@UrielCh UrielCh closed this as completed Mar 4, 2020
@UrielCh UrielCh reopened this Mar 4, 2020
@spion
Copy link
Collaborator

spion commented Mar 6, 2020

Why not simply use https://github.com/ForbesLindesay/throat instead?

@UrielCh
Copy link
Author

UrielCh commented Mar 11, 2020

it should work...
I will try it soon.thx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants