The module performs well in single-worker mode. I reached 2 Gbps (the maximum bandwidth of the card) with a single worker, and the Xeon core still had plenty of idle time. However, multi-worker mode is an exciting challenge on the way to much heavier traffic when a fast network card is available.
The first step is already done. The new branch 'auto-relay' implements automatic pushing of a stream from the worker that accepts it to all workers, using per-worker Unix sockets. Much work remains before this feature is finished, but it is already functional with hardcoded socket names and empty stream arguments.
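
To illustrate the idea only, here is a minimal Python sketch of relaying over per-worker Unix sockets. It is not the code from the 'auto-relay' branch: the socket naming scheme (`w<N>.sock`), the function names, and the use of threads in place of real worker processes are all assumptions made for the example, and the accepting worker is assumed not to push to itself.

```python
import os
import socket
import tempfile
import threading


def socket_path(base_dir, worker_id):
    # Hypothetical naming scheme: one socket file per worker.
    return os.path.join(base_dir, f"w{worker_id}.sock")


def open_listener(path):
    """Create the listening Unix socket owned by one worker."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    return srv


def receive_all(srv, received):
    """Accept one relay connection and collect everything pushed to it."""
    conn, _ = srv.accept()
    with conn:
        while chunk := conn.recv(4096):
            received.append(chunk)
    srv.close()


def relay(base_dir, origin_id, worker_ids, chunks):
    """Push chunks from the accepting worker to every other worker."""
    peers = []
    for wid in worker_ids:
        if wid == origin_id:
            continue  # assumption: the origin does not relay to itself
        peer = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        peer.connect(socket_path(base_dir, wid))
        peers.append(peer)
    for chunk in chunks:
        for peer in peers:
            peer.sendall(chunk)
    for peer in peers:
        peer.close()  # closing signals end-of-stream to the receiver


def demo():
    worker_ids = [0, 1, 2]
    origin = 0  # the worker that accepted the incoming stream
    chunks = [b"frame-1", b"frame-2"]
    results = {}
    threads = []
    with tempfile.TemporaryDirectory() as base:
        for wid in worker_ids:
            if wid == origin:
                continue
            results[wid] = []
            srv = open_listener(socket_path(base, wid))
            t = threading.Thread(target=receive_all, args=(srv, results[wid]))
            t.start()
            threads.append(t)
        relay(base, origin, worker_ids, chunks)
        for t in threads:
            t.join()
    # A stream socket does not preserve chunk boundaries, so join the pieces.
    return {wid: b"".join(parts) for wid, parts in results.items()}


if __name__ == "__main__":
    print(demo())
```

In a real multi-worker server each listener would live in its own process and the pushing would be non-blocking; the sketch only shows the fan-out from the accepting worker to the per-worker sockets.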