New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extreme memory-intensive SHA256 calculation #3186
Comments
After investigating it a little further, the reason behind this huge memory allocation can be found in the following lines (from At first I suspected a missing pre-allocation to be the reason, but even the final array size is more than my system could handle. For example, the size of my video buffer was 229878635. Creating just this array in a fresh NodeJS console: Array.from({ length: 229878635 }, () => 0) already resulted in a
for me. |
Thanks for finding this. The reason for not using "crypto" is that we do not have crypto available in the browser build.
In the browser build, it is replaced by a fallback that throws an error when used. Nevertheless, we need the hashes e.g. for the REPL, and I know there are other uses of Rollup in the browser. Also I vaguely remember that crypto performance may not have been the best, but this would require benchmarking. But maybe we should shop for another library. https://github.com/asmcrypto/asmcrypto.js seems to be very fast and from anecdotal evidence should be capable of handling files in that order of magnitude, but we should check. |
So in my experiment, it was able to Hash a 1GB file without me having to increase memory, which could be a good sign. |
Thanks for the very fast reply and your time to look into this issue. Awesome 😃
I will try a few larger files (tested it with my 250 MB video just now) to be sure about this and could provide a pull request with the necessary changes to use |
Ah, I did not consider the Web Crypto API. This could also be a nice solution for the browser build. Would be nice to know how they compare performance-wise. If |
Could the crypto lib not be swapped out for the browser build? It's what I do for the repl on m-css.com and it works pretty well in practice. |
Yes exactly. We could swap crypto for WebCrypto just as we swap fs and path. The question is just how they stack up. |
Memory-wise, I didn't see much of a difference between asmcrypto.js100 MB: took 816.19 ms crypto100 MB: took 249.60 ms Web Crypto API100 MB: took 314.15 ms A mixed I can provide you a pull request tomorrow. if I saw it correctly, it should contain the following:
|
Wow, these numbers look impressive, and we have a clear winner, thanks so much! Also your TODO list looks spot on. Only point: You do not need to remove hash.js from the license file, it should automatically correct itself when you do a full rebuild of Rollup without the dependency. |
While finishing the Web Crypto API based I would therefore suggest to use My Web Crypto API based wrapper looked like this: function createHash (algorithm) {
return new Hasher(algorithm);
}
class Hasher {
constructor(algorithm) {
switch(algorithm) {
case 'sha256':
this.algorithm = 'SHA-256';
break;
default:
throw new Error(`Unsupported algorithm: '${algorithm}'.`);
}
this.buffer = new Uint8Array(0);
}
update(data) {
let dataBuffer = (typeof data === 'string' ? new TextEncoder('utf-8').encode(data) : data);
const appendedBuffer = new Uint8Array(this.buffer.length + dataBuffer.length);
appendedBuffer.set(this.buffer, 0)
appendedBuffer.set(dataBuffer, this.buffer.length)
this.buffer = appendedBuffer;
return this;
}
async digest(encoding) {
let hash = await crypto.subtle.digest(this.algorithm, this.buffer);
switch(encoding) {
case 'hex':
hash = Array.from(new Uint8Array(hash))
.map(x => x.toString(16).padStart(2, '0'))
.join('');
break;
default:
throw new Error(`Unsupported encoding: '${encoding}'.`);
}
return hash;
}
} My import * as asmCrypto from 'asmcrypto.js';
function createHash(algorithm: string) {
return new Hasher(algorithm);
}
class Hasher {
hasher: asmCrypto.Sha256;
constructor(algorithm: string) {
switch (algorithm) {
case 'sha256':
this.hasher = new asmCrypto.Sha256();
break;
default:
throw new Error(`Unsupported algorithm: '${algorithm}'.`);
}
}
digest(encoding: string) {
this.hasher.finish();
// The type of '.result' is always 'Uint8Array' after calling 'finish()'. Before that, it could also be 'null'. Casting
// is necessary to avoid type errors when encoding the hash.
const hash = this.hasher.result as Uint8Array;
let encoded: string;
switch (encoding) {
case 'hex':
encoded = asmCrypto.bytes_to_hex(hash);
break;
default:
throw new Error(`Unsupported encoding: '${encoding}'.`);
}
return encoded;
}
update(data: string | Buffer) {
if (typeof data === 'string') {
this.hasher.process(new Uint8Array(data.length).map((_, i) => data.charCodeAt(i)));
} else {
this.hasher.process(data);
}
return this;
}
}
export { createHash }; |
Fixed by #3194 |
How Do We Reproduce?
Emit a very huge asset (250mb on my system -- a mp4 file for my use-case) without a final file name (adding a
name
option, instead offileName
)Expected Behavior
Everything builds fine.
Actual Behavior
A few GB of memory are allocated (even increasing the NodeJS limit to 8 GB was not enough), until it crashes with an
error.
Proposed solution
The reason for this seems to be located in the
generateAssetFileName
, especiallyrollup/src/utils/FileEmitter.ts
Lines 36 to 40 in 06fb16a
The
hash.js/lib/hash/sha/256
library used to generate the hash seems to be very memory-intensive.Using
crypto
instead (I assume NodeJS modules are fine, asfs
is also used) resolvs this problem perfectly, without any notable memory increase:The text was updated successfully, but these errors were encountered: