Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance of including relationships #1800

Merged
merged 2 commits into from Apr 12, 2018

Conversation

alechirsch
Copy link
Contributor

Including a relationship causes very slow performance. I first noticed this when including a relationship that had a .through, It was sometimes taking over 2 seconds to return from bookshelf. I did some digging and found that one of the queries that was supposed to return only 2 results, returned over 4k results. All of the 4k results were duplicate rows. Here is an example of a query that was generated for the following relationship:

// relationship
	supplier(){
		return this.belongsTo(Models.Supplier).through(Models.ClientSupplier, 'client_supplier_id');
	},

// query
select "suppliers".*, "clients_suppliers"."id" as "_pivot_id", "clients_suppliers"."supplier_id" as "_pivot_supplier_id"
from "suppliers" inner join "clients_suppliers" on "clients_suppliers"."supplier_id" = "suppliers"."id"
inner join "invoices" on "clients_suppliers"."id" = "invoices"."clients_supplier_id"
where "invoices"."client_supplier_id" in (17, 18)

Since my invoices table had thousands of rows, this query returns many duplicates of the same supplier.

Making the request use the distinct keyword ensures that the result only includes the desired results without duplicates.

@alechirsch alechirsch changed the title fixed issue where relation query returns thousands of duplicate rows Fix performance of including relationships Mar 26, 2018
Copy link
Member

@ricardograca ricardograca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! However the tests must pass before this gets merged. Can you take a look at it? It's probably just a matter of changing the expected query.

@ricardograca ricardograca added this to To Do in Version 0.14.0 via automation Mar 27, 2018
@ricardograca ricardograca moved this from To Do to In progress in Version 0.14.0 Mar 27, 2018
@alechirsch
Copy link
Contributor Author

alechirsch commented Mar 27, 2018

@ricardograca Will do, I just got my environment all set up for testing

@alechirsch
Copy link
Contributor Author

@ricardograca All of the tests pass on my local machine, have you ever run into issues where tests differ between machines?

@alechirsch
Copy link
Contributor Author

screen shot 2018-03-27 at 9 43 31 am

@ricardograca
Copy link
Member

ricardograca commented Mar 27, 2018

That's weird. Looking at the failed test it seems like the two expected objects in the response array have their order swapped, even though they are correct individually. I've never seen this particular error before. Can you check if there could be some kind of race condition here? It's important that any async tests always return promises or call done() when done.

Also, what Node version are you using?

@alechirsch
Copy link
Contributor Author

I am using Node 8.9.2

@ricardograca
Copy link
Member

Looking at the test more closely it's a bit weak since there shouldn't be any requirement (or guarantee) that the columns are retrieved in any particular order. I'll ignore that failed test, but I still need to review this more carefully to make sure no new bugs or breaking changes are introduced and I don't have time for that right now.

@alechirsch
Copy link
Contributor Author

alechirsch commented Mar 27, 2018

It is very strange, the test that is failing here is passing consistently on my end.

Copy link
Member

@ricardograca ricardograca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm merging this even with the tests failing because they only fail for PostgreSQL and only because the order of the expected results isn't as expected, but there's also no ORDER BY clause on the queries, so some randomness isn't too surprising.

The issue of failing tests will be fixed with a later PR.

@ricardograca ricardograca merged commit 880f3bf into bookshelf:master Apr 12, 2018
Version 0.14.0 automation moved this from In progress to Done Apr 12, 2018
@jwhitmarsh
Copy link

Apologies if this is not the place to put this - thank you for this great fix, but one (rare) GOTCHA is that case statements in an order statement stop working now that all relationships are called with the distinct keyword. It's taken me a while to work out what was causing my errors so just wanted to put this out there.

My solution was to create a "computed" column using case in the select and to apply the order to that. Again, just in case someone in the future is experiencing the same thing.

select * 
from table 
order by 
    case when col1 is not null then col1 else col2 end

becomes

select *, 
case when col1 is not null then col1 else col2 end as order_col 
from table
order by order_col

(obviously all achieved via knex, not raw sql)

@ricardograca
Copy link
Member

@jwhitmarsh Can you share your actual code solution to the issue you describe?

@jwhitmarsh
Copy link

No probs. In the instance that I'm fixing it it's a fairly complicated query, but this is the key bit:

query.column([
    'sections.*', 
    this.database.knex.raw(`
        case when section_order > 0 then section_order else section_order_draft end as order_col
    `)
]);

query.orderByRaw(`order_col`);

for reference, this used to look like:

// no need to define query.columns()

query.orderByRaw( 
    `case when section_order > 0 then section_order end, case when section_order < 0 then section_order_draft end`
)

The "fixed" way is actually much neater and, like I said, I only wanted to post this in case someone stumbled across it in the future (relevant XKCD https://xkcd.com/979/)

@juliusza
Copy link

juliusza commented Feb 18, 2019

The extra distinct table.* causes problems with JSON fields.

"could not identify an equality operator for type json"

See #1941

@juliusza
Copy link

@ricardogama do you think this change is adequate?

It seems to me that this particular query can be optimized with .query(). Or add DISTINCT only to queries generated by .through().
Surely there is no reason to add DISTINCT to every single select query ?

See also #1941

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants