I was inspired by Mourner's data load test (which is where I took the code for the boxes). That approach is all well and good if we want to do all the calculations in one worker, but what if we want to divide the work up amongst several workers for processing?
Easy, you say, you'll just open some more workers inside your worker. Alas, Chrome hasn't gotten around to implementing nested workers, so you're probably going to have to transfer the data back to the main thread first.
Before we go any farther: we're going to be working with a massive 28 MiB (12 MiB gzipped) text file, so I'd hit the button to download the dictionary now. The file in question is Webster's Unabridged English Dictionary, originally from Project Gutenberg, but I grabbed it from this repo.
The problem is that transferring between threads is a blocking operation. Once the dictionary is done loading you'll see a button marked "Transfer with structured clone"; hit it while looking at the boxes and you'll notice them jump. Here is where transferable objects come in. For certain JavaScript data types (array buffers) we can transfer ownership, which is slightly different from sending a message because the object is no longer available in the original context. To see this in action, click the button marked "Transfer with transferable objects" and again look at the boxes: no jump, for me at least.
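To make the difference concrete, here is a minimal sketch of what transferring ownership does to a buffer. It is not the demo's code: it uses the global structuredClone (current browsers and Node 17+) as a stand-in for posting to a worker, on the assumption that both go through the same structured clone algorithm with an optional transfer list. makeBuffer is a hypothetical helper that fakes a small chunk of dictionary data.

```javascript
// Hypothetical helper: an ArrayBuffer of `bytes` bytes standing in for the dictionary.
function makeBuffer(bytes) {
  return new Uint16Array(bytes / 2).fill(65).buffer;
}

// Structured clone: the data is copied and the original stays usable.
const copied = makeBuffer(1024);
const clone = structuredClone(copied);

// Transfer: ownership moves and the original buffer is detached (length 0).
const moved = makeBuffer(1024);
const received = structuredClone(moved, { transfer: [moved] });

const result = {
  copiedLength: copied.byteLength,
  cloneLength: clone.byteLength,
  movedLength: moved.byteLength,
  receivedLength: received.byteLength
};
console.log(result);
```

With a real worker the equivalent calls would be worker.postMessage(buffer) for the copy and worker.postMessage(buffer, [buffer]) for the transfer.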
We've actually done some preprocessing on the object, so we're really transferring ownership of about 5000 separate typed arrays, which slows things down. For me the preprocessing increased the time it took to transfer by about 42%.
The downside of transferable objects is one I hinted at earlier: you can only use them with array buffers (and message ports, but I'm not sure whether those are implemented or what they are). Array buffers are the basis of typed arrays: a buffer is an abstract container for bytes, and typed arrays are views onto those bytes with various element widths. It is a non-trivial task to convert a typed array back to text. For the array buffer version we turned the text into unsigned 16 bit integers. I'd give you a demo of trying to convert it back in one go, but it fills up the heap and crashes your browser.
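The round trip between text and unsigned 16 bit integers can be sketched roughly like this. It is an illustration under assumptions rather than the demo's actual code: textToUint16, uint16ToText, and the chunkSize default are names and values I made up, and decoding in slices is one common way to avoid converting everything in one go.

```javascript
// Encode a string as unsigned 16 bit integers, one per UTF-16 code unit.
function textToUint16(text) {
  const out = new Uint16Array(text.length);
  let i = 0;
  while (i < text.length) {
    out[i] = text.charCodeAt(i);
    i++;
  }
  return out;
}

// Decode slice by slice so String.fromCharCode never gets the whole array at once.
function uint16ToText(codes, chunkSize = 8192) {
  const parts = [];
  for (let start = 0; start < codes.length; start += chunkSize) {
    const slice = codes.subarray(start, start + chunkSize);
    parts.push(String.fromCharCode.apply(null, slice));
  }
  return parts.join('');
}

const sample = 'Webster';
const roundTrip = uint16ToText(textToUint16(sample), 3);
console.log(roundTrip === sample);
```

The tiny chunk size in the usage line is only there to force several slices on a short string.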
So instead we cut the text up into about 5000 pieces. To make it even, we cut both the text and the array buffer version into arrays with the same number of pieces. Then we fire up some more workers and start handing out those pieces to them for a mapreduce of letter frequencies. We send a piece to one of the map workers, it computes an object with the counts for each letter and sends that back, and we pass that to the reducer function and send the map worker a new piece. Once we are out of data, we shut down the map workers and ask the reducer for its data.
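The map and reduce steps themselves look roughly like the sketch below. To keep it self-contained it runs everything on one thread and leaves out the worker scheduling entirely; mapPiece, reduceCounts, and toLetters are hypothetical names, not functions from the demo. It counts by char code and only converts back to characters at the very end.

```javascript
// Map step: count how often each char code occurs in one piece.
function mapPiece(piece) {
  const counts = {};
  let i = 0;
  while (i < piece.length) {
    const code = piece[i];
    counts[code] = (counts[code] || 0) + 1;
    i++;
  }
  return counts;
}

// Reduce step: fold one map result into the running totals.
function reduceCounts(totals, counts) {
  for (const code of Object.keys(counts)) {
    totals[code] = (totals[code] || 0) + counts[code];
  }
  return totals;
}

// Convert char codes back to characters once, after everything is reduced.
function toLetters(totals) {
  const letters = {};
  for (const code of Object.keys(totals)) {
    letters[String.fromCharCode(Number(code))] = totals[code];
  }
  return letters;
}

// Two small pieces standing in for the roughly 5000 real ones.
const pieces = [
  Uint16Array.from('abba', c => c.charCodeAt(0)),
  Uint16Array.from('cab', c => c.charCodeAt(0))
];
const frequencies = toLetters(pieces.map(mapPiece).reduce(reduceCounts, {}));
console.log(frequencies);
```

In the multi-worker version each mapPiece call would happen in a map worker, with its result posted back and folded in by the reducer.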
Results, updated 2. A massive optimization effort, which involved going through and replacing all of the map and forEach loops with while loops and holding off converting back from char codes until we are all the way reduced, clears up the results dramatically. Transferable objects are dramatically faster, to the point that it takes approximately the same amount of time to transfer the structured clone as it does to parse the transferable object. This is on my desktop; on my Nexus 10 tablet I get the opposite results, with single threaded fastest and transferable objects slowest.
I've noticed that workers are extremely poorly documented on MDN (update: not the case anymore ... you're welcome) and HTML5 Rocks has a few omissions. Namely, worker.postMessage is unprefixed for transferable objects (i.e. not worker.webkitPostMessage), and it also works fine in Firefox. BlobBuilder is no longer with us; you can just use Blob. Chromium still does not allow subworkers, and IE10, which in theory should work fine, has a security restriction if you try to open a worker from a blob URL. Lastly, it is A-OK to send circular references to a worker: they are serialized using structured clone, not JSON.stringify.
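The circular reference point is easy to check without a worker at all. This is a small sketch that assumes the global structuredClone (current browsers and Node 17+) behaves like the cloning postMessage does; the node object is just a made-up example.

```javascript
// An object that refers to itself.
const node = { name: 'root' };
node.self = node;

// Structured clone copies the object and keeps the cycle intact in the copy.
const cloned = structuredClone(node);

// JSON.stringify cannot serialize the same object and throws instead.
let jsonError = null;
try {
  JSON.stringify(node);
} catch (err) {
  jsonError = err;
}

console.log(cloned.self === cloned, jsonError instanceof TypeError);
```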
In case you're curious, all the way on the right is a version that just does it single threaded in the worker. Update: with a better written function it doesn't take a huge amount of time, but it's still slower.
This has been built with my catiline library.