From f41d672514c2a0cb50d5fa1d5e11072dcbb4566c Mon Sep 17 00:00:00 2001
From: Henrik Carlqvist
Date: Mon, 25 Jul 2022 15:21:11 +0700
Subject: system/splitjob: Added (Tool to split up data).

Signed-off-by: Willy Sudiarto Raharjo
---
 system/splitjob/README | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 system/splitjob/README

diff --git a/system/splitjob/README b/system/splitjob/README
new file mode 100644
index 0000000000000..985797496be8a
--- /dev/null
+++ b/system/splitjob/README
@@ -0,0 +1,28 @@
+This program is used to split up data from stdin into blocks which are
+sent as input to parallel invocations of commands. The output from
+those is then concatenated in the right order and sent to stdout.
+
+Splitting up and parallelizing jobs like this can be useful to speed
+up compression using multiple CPU cores or even multiple computers.
+
+For this approach to be useful, the compressed format needs to allow
+multiple compressed files to be concatenated. This is the case for
+gzip, bzip2, lzip and xz.
+
+Example 1, use multiple logical cores:
+splitjob -j 4 bzip2 < bigfile > bigfile.bz2
+
+Example 2, use remote machines:
+splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz
+
+The above example assumes that ssh is configured to allow logins
+without asking for a password. See the manpage for ssh-keygen or do
+a web search for examples on how to accomplish this.
+
+Example 3, use bigger blocks to reduce overhead:
+splitjob -j 2 -b 10M gzip < file > file.gz
+
+For "xz -9" a block size of 384 MB gives the best compression.
+
+Example 4, parallel decompression:
+splitjob -X -r 10 -j 10 -b 384M "xz -d -" < file.xz > file
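
The concatenation property the README relies on can be checked directly.
The sketch below is not part of the patch; the file names are illustrative.
It compresses two pieces of data separately with gzip, concatenates the
compressed streams, and verifies that decompressing the result reproduces
the concatenated original data, which is what lets splitjob stitch the
outputs of parallel compressors back together:

```shell
# Create two pieces of input data (stand-ins for splitjob's blocks).
printf 'first half\n'  > part1
printf 'second half\n' > part2

# Compress each piece independently, as parallel workers would.
gzip -c part1 > part1.gz
gzip -c part2 > part2.gz

# Concatenate the compressed streams in order.
cat part1.gz part2.gz > whole.gz

# gzip treats the concatenation as a multi-member stream and
# decompresses all members in sequence.
gzip -dc whole.gz > whole

# The result must equal the concatenation of the original inputs.
cat part1 part2 > expected
cmp whole expected && echo "concatenated streams decompress correctly"
```

bzip2, lzip and xz accept concatenated streams in the same way, which is
why the README limits the technique to those formats.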