Add a native resumeable uploads protocol.

The existing cdstar-tus plugin allows creating temporary files that can then be imported into an archive. This requires an additional server-side copy and adds a significant delay because the hashes must be computed (a second time) according to vault settings. Uploading directly to a file endpoint (PUT /v3/<vault>/<id>/<filename>) is way more efficient, but does not allow resumes after an interrupted upload. Supporting resumeable uploads directly to file endpoints would close this gap, and make the tus plugin more or less obsolete.

Why not tus? The https://tus.io/ protocol unfortunately has no standardized way to create resources with a known length and a known name or URL. The core upload protocol does not support Upload-Length, which is required to detect incomplete uploads. The create extension on the other hand does not allow a user to specify a target file name. It supports metadata, but the filename attribute is only a suggestion. Adding the missing header to the tus core upload protocol would defeat the purpose of using an existing protocol, as no client library would support this out of the box. tus also tends to be more complex than necessary. A much simpler protocol would suffice for this specific use-case.

Why not [Content-]Range? The range-family request headers are already standardized for partial downloads, but undefined for uploads. If resumeable uploads are standardized in the future, they are very likely to use these headers, but it is hard to predict in which way. To avoid conflicts with a future standard, we should use our own set of non-standard headers.

Protocol (draft): To start or resume a resumeable upload, send a PATCH request to /v3/<vault>/<id>/<filename> with the following headers and content:

Patch-Offset: <int> (required) The upload offset (in bytes) as a non-negative integer. The offset MUST be 0 or match the current size of the resource. An offset value of 0 will truncate existing or create missing resources if required. If the offset is larger than 0 and does not match the current resource size, a 409 InvalidPatchOffset error is returned.
Patch-Length: <int> (required) The target size (in bytes) of the entire resource as a non-negative integer, or -1 to indicate an unknown target size. The upload is considered complete if the resource size matches the last recorded Patch-Length.
X-Transaction: <id> (required) Resumeable uploads only work within a transaction context.
The request body should contain a chunk of raw data, starting at Patch-Offset.
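As an illustration of the request described above, here is a minimal Python sketch that builds (but does not send) one PATCH request. The function name `build_patch_request`, the base URL, and the client-side sanity checks are assumptions made for this example; only the path layout and the three headers come from the draft.

```python
from urllib.parse import quote
from urllib.request import Request


def build_patch_request(base_url, vault, archive_id, filename,
                        tx_id, chunk, offset, total_length=-1):
    """Build one PATCH request for a chunk of raw data (hypothetical helper)."""
    if offset < 0:
        raise ValueError("Patch-Offset must be a non-negative integer")
    if total_length < -1:
        raise ValueError("Patch-Length must be non-negative, or -1 if unknown")
    # Client-side sanity check (an assumption, not part of the draft):
    # a chunk must not extend past the announced target size.
    if total_length >= 0 and offset + len(chunk) > total_length:
        raise ValueError("chunk would exceed Patch-Length")

    url = "{}/v3/{}/{}/{}".format(
        base_url.rstrip("/"),
        quote(vault, safe=""),
        quote(archive_id, safe=""),
        quote(filename, safe=""),
    )
    headers = {
        "Patch-Offset": str(offset),
        "Patch-Length": str(total_length),
        "X-Transaction": tx_id,
    }
    return Request(url, data=chunk, headers=headers, method="PATCH")
```

The returned object can be passed to `urllib.request.urlopen()`. Note that `urllib` normalizes header names to the form `Patch-offset` internally, which is harmless because HTTP header names are case-insensitive.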

Successful PATCH requests will return 204 No Content with updated Patch-Offset and Patch-Length headers as long as the upload is still incomplete. The last PATCH request that completes an upload will behave similarly to a successful PUT request and return 201 Created or 200 OK (FileInfo). HEAD or GET requests will also return Patch-* headers if they target incompletely uploaded files.
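The offset and status rules can be modelled in a few lines. The following is only an in-memory Python sketch, not the cdstar implementation: `Resource` and `apply_patch` are hypothetical names, and two details are assumptions the draft leaves open (a missing resource with an offset larger than 0 is treated as 409, and 201 is returned only if the file did not exist before the upload started at offset 0).

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    data: bytearray = field(default_factory=bytearray)
    target: int = -1      # last recorded Patch-Length, -1 means unknown
    is_new: bool = True   # assumption: decides between 201 and 200


def apply_patch(files, name, offset, length, chunk):
    """Apply one PATCH to the dict `files` and return (status, headers)."""
    res = files.get(name)
    if offset == 0:
        # Offset 0 truncates an existing resource or creates a missing one.
        res = Resource(is_new=res is None)
        files[name] = res
    elif res is None or offset != len(res.data):
        return 409, {"X-Error": "InvalidPatchOffset"}

    res.data += chunk
    res.target = length

    if len(res.data) == res.target:
        # Upload complete: behaves like a successful PUT.
        return (201 if res.is_new else 200), {}
    return 204, {
        "Patch-Offset": str(len(res.data)),
        "Patch-Length": str(res.target),
    }
```

With an unknown target size (`length == -1`) the size never matches, so every PATCH in this model returns 204 until a later request records a real Patch-Length.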

IO errors (network issues, aborted requests, early EOF) during PUT or PATCH requests will NOT roll back the transaction. Instead, the server stores and persists as much data as possible and marks the file as incomplete. Trying to commit a transaction with incomplete resources will cause a (hard) error and trigger a rollback. Resources are considered incomplete if their size does not match their last recorded Patch-Length target size. After an error, the client should repeatedly issue HEAD requests for the incomplete resource until a valid answer is received, and resume the upload based on the returned Patch-Offset or Content-Length header.
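A client-side recovery loop along these lines could look like the sketch below. The transport is injected as two hypothetical callables so that the example stays independent of any HTTP library: `send_patch(offset, total, chunk)` is assumed to return the HTTP status code or raise `OSError` on network failure, and `head_offset()` is assumed to return the offset reported by a HEAD request. Retry counts and delays are arbitrary illustration values.

```python
import time


def poll_offset(head_offset, attempts=10, delay=1.0):
    """Issue HEAD requests until one succeeds and return the reported offset."""
    for _ in range(attempts):
        try:
            return head_offset()
        except OSError:
            time.sleep(delay)
    raise RuntimeError("no valid answer to HEAD requests")


def upload_with_resume(data, send_patch, head_offset,
                       chunk_size=1 << 20, max_recoveries=5, delay=1.0):
    """Upload `data` in chunks, resuming from the server-side offset on errors."""
    total = len(data)
    offset = 0
    recoveries = 0
    while True:
        chunk = data[offset:offset + chunk_size]
        try:
            status = send_patch(offset, total, chunk)
        except OSError:
            status = None  # network error: recover below
        if status in (200, 201):
            return  # upload complete
        if status == 204:
            offset += len(chunk)
            continue
        if status is not None and status != 409:
            raise RuntimeError("unexpected status {}".format(status))
        # Network error or offset mismatch: ask the server where to resume.
        recoveries += 1
        if recoveries > max_recoveries:
            raise RuntimeError("giving up after repeated failures")
        offset = poll_offset(head_offset, delay=delay)
        if offset == total:
            return  # assumption: full size reported means the upload finished
```

The loop never guesses how much of a failed chunk was persisted. It always trusts the offset reported by the server, which matches the rule that the server keeps as much data as possible after an IO error.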

Parallel uploads to different resources are allowed, but parallel uploads to the same resource are not supported. Only one request can write to a resource at a time. New requests will abort conflicting requests as a side effect. PATCH requests in particular MUST abort other write operations on the same resource in a timely fashion, so clients can quickly resume uploads that are still in a half-open state and have not yet been detected as failed by the server. In the worst case after a network failure during PATCH, there might still be data in network buffers, and the size of the resource changes between the HEAD and the second PATCH request. The second PATCH will fail, but it also closes the hanging first PATCH, so the third attempt will succeed. This is much better than waiting 30 seconds or more for a network timeout.
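One possible way to enforce the single-writer rule on the server side is a registry that hands each writer a cancellation flag and sets the flag of the previous writer whenever a new one arrives. The Python sketch below only illustrates that idea; `WriterRegistry` is a hypothetical name and says nothing about how cdstar would actually implement it.

```python
import threading


class WriterRegistry:
    """Tracks at most one active writer per resource (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}  # resource name -> cancel flag of the current writer

    def acquire(self, resource):
        """Register a new writer and abort any previous writer on the resource."""
        cancel = threading.Event()
        with self._lock:
            previous = self._active.get(resource)
            if previous is not None:
                previous.set()  # tell the hanging request to stop writing
            self._active[resource] = cancel
        return cancel

    def release(self, resource, cancel):
        """Unregister a writer, unless a newer writer has already replaced it."""
        with self._lock:
            if self._active.get(resource) is cancel:
                del self._active[resource]
```

A request handler would call `acquire()` before writing, check `cancel.is_set()` between reads from the request body, and call `release()` when it finishes. A half-open first PATCH then stops as soon as the second PATCH for the same resource arrives, instead of waiting for a network timeout.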

Edited Dec 09, 2019 by Marcel Hellkamp
