./run_cli $ python -m tosagent.main -t 767d833d-4848-47f9-996b-4c0773077648
767d833d-4848-47f9-996b-4c0773077648 is done: ('2021-03-15T08:44:06 | You have started this Job with 20000 Patients and 50000 rows. Will it fail? false\n2021-03-15T08:53:41 | This is an intermediate message. I have pseudonymised all 20000 patients.\n2021-03-15T09:20:04 | This is an intermediate message. I have pseudonymised all 50000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-15T09:20:05 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 18304, 'Pseudonymized Visits': 50000, 'Processed Rows': 50000})
2021-03-15T08:44:06 until 2021-03-15T09:20:05 -> 36 minutes (35 min 59 s)
./run_cli $ python -m tosagent.main -t 218c5075-8cb5-4b71-abd6-afb1fc4f4455
218c5075-8cb5-4b71-abd6-afb1fc4f4455 is done: ('2021-03-15T08:44:25 | You have started this Job with 20000 Patients and 50000 rows. Will it fail? false\n2021-03-15T08:54:08 | This is an intermediate message. I have pseudonymised all 20000 patients.\n2021-03-15T09:20:19 | This is an intermediate message. I have pseudonymised all 50000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-15T09:20:20 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 18316, 'Pseudonymized Visits': 50000, 'Processed Rows': 50000})
2021-03-15T08:44:25 until 2021-03-15T09:20:20 -> 36 minutes (35 min 55 s)
./run_cli.sh $ python -m tosagent.main -t 9088f9aa-91fd-43cf-b859-1d1ce652b072
9088f9aa-91fd-43cf-b859-1d1ce652b072 is done: ('2021-03-15T12:08:22 | You have started this Job with 20000 Patients and 50000 rows. Will it fail? false\n2021-03-15T12:14:53 | This is an intermediate message. I have pseudonymised all 20000 patients.\n2021-03-15T12:32:21 | This is an intermediate message. I have pseudonymised all 50000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-15T12:32:22 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 18347, 'Pseudonymized Visits': 50000, 'Processed Rows': 50000})
2021-03-15T12:08:22 until 2021-03-15T12:32:22 -> 24 minutes
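The durations above can be recomputed from the first and last timestamp of each Job log. The following is a minimal sketch; the helper name `elapsed` is made up for illustration and is not part of tosagent.

```python
from datetime import datetime


def elapsed(start: str, end: str) -> str:
    """Return the wall-clock time between two ISO timestamps as 'M min SS s'."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes} min {seconds:02d} s"


# Start and end timestamps copied from the three Job logs above.
runs = [
    ("2021-03-15T08:44:06", "2021-03-15T09:20:05"),
    ("2021-03-15T08:44:25", "2021-03-15T09:20:20"),
    ("2021-03-15T12:08:22", "2021-03-15T12:32:22"),
]

for start, end in runs:
    print(f"{start} until {end} -> {elapsed(start, end)}")
```

Keeping the seconds shows that the first two runs took almost 36 minutes, while the third took exactly 24.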
That was only one container, though, so I do not consider the difference significant for now.
After pretty much a day's worth of testing, I could not find any parameters that improve the performance in a measurable way. A further option would be to attach VisualVM or JConsole to a container to find out more.
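Attaching VisualVM or JConsole usually requires the JVM inside the container to expose a JMX port. The sketch below only builds keyword arguments that could be passed to `containers.run` of the Docker SDK for Python. How the worker actually starts its containers is not shown here, so the function name, the port 9010 and the use of `JAVA_TOOL_OPTIONS` are assumptions. Authentication and SSL are switched off, which is only acceptable on a local test machine.

```python
def jmx_run_kwargs(port: int = 9010, host: str = "localhost") -> dict:
    """Build docker-py run() kwargs that expose an unauthenticated JMX port.

    Hypothetical helper for local profiling only; not part of tosagent.
    """
    flags = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        # Pin the RMI port to the same value so a single port mapping suffices.
        f"-Dcom.sun.management.jmxremote.rmi.port={port}",
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
        f"-Djava.rmi.server.hostname={host}",
    ]
    return {
        # The JVM reads JAVA_TOOL_OPTIONS at startup, so the Job's run script
        # does not have to be edited.
        "environment": {"JAVA_TOOL_OPTIONS": " ".join(flags)},
        "ports": {f"{port}/tcp": port},
    }


if __name__ == "__main__":
    print(jmx_run_kwargs())
```

With such a container running, JConsole or VisualVM could connect to `localhost:9010`, assuming the port mapping is reachable from the machine running the tool.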
Created task 2e4c0331-899f-4df5-9e39-aa089a2701ea
Created task 2e71ace0-9f03-4aa3-a78f-de30c61095af
Created task 46d4849c-dc84-47b8-a256-c5071b42d8e9
Created task 8847636b-9c6a-4fe6-a7da-15a7d6395666
While I copied these commands, all Jobs finished:
./run_cli.sh $ python -m tosagent.main -t 2e4c0331-899f-4df5-9e39-aa089a2701ea
2e4c0331-899f-4df5-9e39-aa089a2701ea is done: ('2021-03-16T03:20:28 | You have started this Job with 20000 Patients and 50000 rows. Will it fail? false\n2021-03-16T03:20:35 | This is an intermediate message. I have pseudonymised all 20000 patients.\n2021-03-16T03:20:36 | This is an intermediate message. I have pseudonymised all 50000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:20:38 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 18346, 'Pseudonymized Visits': 50000, 'Processed Rows': 50000})
./run_cli.sh $ python -m tosagent.main -t 2e71ace0-9f03-4aa3-a78f-de30c61095af
2e71ace0-9f03-4aa3-a78f-de30c61095af is done: ('2021-03-16T03:20:28 | You have started this Job with 20000 Patients and 50000 rows. Will it fail? false\n2021-03-16T03:20:36 | This is an intermediate message. I have pseudonymised all 20000 patients.\n2021-03-16T03:20:37 | This is an intermediate message. I have pseudonymised all 50000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:20:38 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 18328, 'Pseudonymized Visits': 50000, 'Processed Rows': 50000})
./run_cli.sh $ python -m tosagent.main -t 46d4849c-dc84-47b8-a256-c5071b42d8e9
46d4849c-dc84-47b8-a256-c5071b42d8e9 is done: ('2021-03-16T03:20:28 | You have started this Job with 20000 Patients and 50000 rows. Will it fail? false\n2021-03-16T03:20:36 | This is an intermediate message. I have pseudonymised all 20000 patients.\n2021-03-16T03:20:37 | This is an intermediate message. I have pseudonymised all 50000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:20:38 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 18323, 'Pseudonymized Visits': 50000, 'Processed Rows': 50000})
./run_cli.sh $ python -m tosagent.main -t 8847636b-9c6a-4fe6-a7da-15a7d6395666
8847636b-9c6a-4fe6-a7da-15a7d6395666 is done: ('2021-03-16T03:20:56 | You have started this Job with 20000 Patients and 50000 rows. Will it fail? false\n2021-03-16T03:21:00 | This is an intermediate message. I have pseudonymised all 20000 patients.\n2021-03-16T03:21:01 | This is an intermediate message. I have pseudonymised all 50000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:21:01 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 18387, 'Pseudonymized Visits': 50000, 'Processed Rows': 50000})
All Jobs completed successfully. Once more, I am multiplying the input context by 10 to see how the Containers cope. I will add four of these Jobs and eight reduced ones:
80a80a85-eae8-46cc-a033-90b6d7e0345d is done: ('2021-03-16T03:30:54 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T03:32:25 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T03:32:40 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:32:58 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183638, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
f845cb11-f7c2-4131-8e1a-264166ce675f is done: ('2021-03-16T03:30:55 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T03:32:26 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T03:32:41 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:33:02 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183741, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
57467bd3-119c-4cc1-ad06-f4408f4968ec is done: ('2021-03-16T03:33:21 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T03:34:51 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T03:35:05 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:35:23 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183585, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
bba9fea5-2098-4afe-9070-192e5111d346 is done: ('2021-03-16T03:33:21 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T03:34:50 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T03:35:04 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T03:35:26 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183455, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
The remaining tasks are not ready yet. The workers seem to have reduced the number of active tasks, as only two Containers are running right now. No tasks have failed so far, which I take as a good sign. I will check on the tasks again tomorrow.
42995183-891a-4307-830f-326cbb097a5c is done: ('2021-03-16T08:31:18 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T08:32:38 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T08:32:53 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T08:33:16 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183523, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
363f811e-22f5-4b79-b0eb-a3042642f9be is done: ('2021-03-16T07:31:18 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T07:32:40 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T07:32:54 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T07:33:17 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183819, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
2f92f05b-bd33-46a5-b522-c223e558d299 is done: ('2021-03-16T04:31:13 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T04:32:15 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T04:32:25 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T04:32:43 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183415, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
49f32828-e08b-44a1-bff1-7d783bfdf01f is done: ('2021-03-16T06:31:17 | You have started this Job with 200000 Patients and 500000 rows. Will it fail? false\n2021-03-16T06:32:39 | This is an intermediate message. I have pseudonymised all 200000 patients.\n2021-03-16T06:32:53 | This is an intermediate message. I have pseudonymised all 500000 rows.\nSLF4J: Class path contains multiple SLF4J bindings.\nSLF4J: Found binding in [jar:file:/job/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: Found binding in [jar:file:/job/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]\nSLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.\nSLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]\n2021-03-16T06:33:16 | I am done. Thank you for travelling with TOS.\n', {'Pseudonymized Patients': 183552, 'Pseudonymized Visits': 500000, 'Processed Rows': 500000})
The four non-huge Jobs succeeded. Two of the huge Jobs crashed with a java.lang.OutOfMemoryError:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/var/app/tosagent/main.py", line 60, in <module>
    main()
  File "/var/app/tosagent/main.py", line 32, in main
    print(f"{args.task} is done: {result.get(0)}")
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 219, in get
    self.maybe_throw(callback=callback)
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 335, in maybe_throw
    self.throw(value, self._to_remote_traceback(tb))
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 328, in throw
    self.on_ready.throw(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/vine/promises.py", line 234, in throw
    reraise(type(exc), exc, tb)
  File "/usr/local/lib/python3.9/site-packages/vine/utils.py", line 30, in reraise
    raise value
Exception: <class 'docker.errors.ContainerError'>(('Command \'[\'/job/ResourceHungry/ResourceHungry_run.sh\', \'--context_param=patient_n=2000000\', \'--context_param=row_n=5000000\', \'--context_param=error=False\', \'--context_param=output_file=/output/something.csv\', \'--context_param=aw_message_filepath=/output/rgsxypyaeodi.json\']\'in image \'openjdk:8\' returned non-zero exit status 1: b\'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space\\n\\tat java.lang.Class.getDeclaredConstructors0(Native Method)\\n\\tat java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)\\n\\tat java.lang.Class.getConstructor0(Class.java:3075)\\n\\tat java.lang.Class.getConstructor(Class.java:1825)\\n\\tat java.security.Provider$Service.newInstance(Provider.java:1594)\\n\\tat sun.security.jca.GetInstance.getInstance(GetInstance.java:236)\\n\\tat sun.security.jca.GetInstance.getInstance(GetInstance.java:164)\\n\\tat java.security.SecureRandom.getInstance(SecureRandom.java:288)\\n\\tat java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:205)\\n\\tat java.security.SecureRandom.<init>(SecureRandom.java:162)\\n\\tat routines.TalendString.getAsciiRandomString(TalendString.java:89)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry$1tRowGenerator_3Randomizer.getRandomother_psn(ResourceHungry.java:22701)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tRowGenerator_3Process(ResourceHungry.java:22710)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFileInputDelimited_2Process(ResourceHungry.java:12207)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFixedFlowInput_3Process(ResourceHungry.java:9394)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFileInputDelimited_1Process(ResourceHungry.java:8631)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tRowGenerator_1Process(ResourceHungry.java:4299)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFixedFlowInput_2Process(ResourceHungry.java:2512)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.runJobInTOS(ResourceHungry.java:24969)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.main(ResourceHungry.java:24739)\\n\'',))
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/var/app/tosagent/main.py", line 60, in <module>
    main()
  File "/var/app/tosagent/main.py", line 32, in main
    print(f"{args.task} is done: {result.get(0)}")
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 219, in get
    self.maybe_throw(callback=callback)
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 335, in maybe_throw
    self.throw(value, self._to_remote_traceback(tb))
  File "/usr/local/lib/python3.9/site-packages/celery/result.py", line 328, in throw
    self.on_ready.throw(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/vine/promises.py", line 234, in throw
    reraise(type(exc), exc, tb)
  File "/usr/local/lib/python3.9/site-packages/vine/utils.py", line 30, in reraise
    raise value
Exception: <class 'docker.errors.ContainerError'>(('Command \'[\'/job/ResourceHungry/ResourceHungry_run.sh\', \'--context_param=patient_n=2000000\', \'--context_param=row_n=5000000\', \'--context_param=error=False\', \'--context_param=output_file=/output/something.csv\', \'--context_param=aw_message_filepath=/output/eitlxwudanxj.json\']\'in image \'openjdk:8\' returned non-zero exit status 1: b\'Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded\\n\\tat java.util.Random.<init>(Random.java:140)\\n\\tat java.security.SecureRandom.<init>(SecureRandom.java:239)\\n\\tat java.security.SecureRandom.getInstance(SecureRandom.java:290)\\n\\tat java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:205)\\n\\tat java.security.SecureRandom.<init>(SecureRandom.java:162)\\n\\tat routines.TalendString.getAsciiRandomString(TalendString.java:89)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry$1tRowGenerator_3Randomizer.getRandomother_psn(ResourceHungry.java:22701)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tRowGenerator_3Process(ResourceHungry.java:22710)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFileInputDelimited_2Process(ResourceHungry.java:12207)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFixedFlowInput_3Process(ResourceHungry.java:9394)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFileInputDelimited_1Process(ResourceHungry.java:8631)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tRowGenerator_1Process(ResourceHungry.java:4299)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.tFixedFlowInput_2Process(ResourceHungry.java:2512)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.runJobInTOS(ResourceHungry.java:24969)\\n\\tat local_project.resourcehungry_0_3.ResourceHungry.main(ResourceHungry.java:24739)\\n\'',))
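Both tracebacks bury the actual cause deep inside the escaped container output. As a rough sketch, the reason could be pulled out of such an error string with a regular expression. The helper name `oom_reason` is made up for illustration, and the pattern only assumes the `java.lang.OutOfMemoryError: <reason>` shape seen in the two messages above.

```python
import re
from typing import Optional

# Capture everything after "OutOfMemoryError: " up to the next backslash
# (the escaped "\n" in the repr shown above) or a real newline.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError: ([^\\\n]+)")


def oom_reason(error_text: str) -> Optional[str]:
    """Return the OutOfMemoryError reason found in error_text, or None."""
    match = OOM_PATTERN.search(error_text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # Shortened excerpts of the two error messages above.
    first = 'Exception in thread "main" java.lang.OutOfMemoryError: Java heap space\\n\\tat java.lang.Class'
    second = 'Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded\\n\\tat java.util.Random'
    print(oom_reason(first))
    print(oom_reason(second))
```

Applied to the two failures, this distinguishes "Java heap space" from "GC overhead limit exceeded", which are both memory exhaustion but are reported differently by the JVM.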
Two Jobs are still running:
2bf99511-af37-4d53-98b1-d6f346ca0a43 is not ready yet!
84b8a63d-cc21-404e-960c-4e78ca32f702 is not ready yet!
Although these Jobs crashed, the failures were reported back gracefully. The other Jobs ran successfully, and two are still running.
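The three outcomes seen in this test (not ready, done, crashed) could also be reported without a raw traceback on the client side. The following is only a sketch and not the actual tosagent code: `describe` is a hypothetical helper that relies on the `ready()`, `successful()` and `result` members that a Celery `AsyncResult` provides, and `FakeResult` is a stand-in so the example runs without a broker.

```python
class FakeResult:
    """Hypothetical stand-in mimicking the parts of celery.result.AsyncResult used below."""

    def __init__(self, ready, ok=False, result=None):
        self._ready = ready
        self._ok = ok
        self.result = result  # return value on success, exception on failure

    def ready(self):
        return self._ready

    def successful(self):
        return self._ok


def describe(task_id: str, result) -> str:
    """Summarise a task's state in one line instead of re-raising its error."""
    if not result.ready():
        return f"{task_id} is not ready yet!"
    if result.successful():
        return f"{task_id} is done: {result.result}"
    # For a failed task, Celery stores the raised exception in .result.
    return f"{task_id} failed: {result.result!r}"


if __name__ == "__main__":
    print(describe("2bf99511-af37-4d53-98b1-d6f346ca0a43", FakeResult(False)))
    print(describe("example-ok", FakeResult(True, True, "I am done.")))
    print(describe("example-crash", FakeResult(True, False, RuntimeError("OutOfMemoryError"))))
```

With a real `AsyncResult` in place of `FakeResult`, a crashed Job would produce a single "failed" line while the remaining tasks can still be polled.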
Result
From my perspective, the solution is robust enough to handle our workload. Erroneous cases are handled gracefully, even if Jobs run for hours and crash at the end. Other Jobs are not affected by such a crash.