Workflow for Large File Upload Design
-
Initialize File Upload Session (File Creation Protocol)
- Client Action: The client starts by sending basic file metadata (e.g., filename, size) to the server.
- Server Response: The server returns a unique upload token that the client will use for all subsequent actions related to this upload.
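As a rough illustration of this step, the server side could look like the sketch below. The names `UploadSession`, `SESSIONS`, and `init_upload`, the in-memory store, and the 4 MiB chunk size are all assumptions made for the example; the original does not specify them.

```python
import math
import secrets
from dataclasses import dataclass, field

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size (4 MiB); not specified in the original


@dataclass
class UploadSession:
    filename: str
    size: int
    total_chunks: int
    received: dict = field(default_factory=dict)  # chunk index -> chunk hash


# Hypothetical in-memory session store: upload token -> UploadSession.
SESSIONS = {}


def init_upload(filename: str, size: int, chunk_size: int = CHUNK_SIZE) -> str:
    """Record the file metadata and return an unguessable upload token."""
    token = secrets.token_urlsafe(32)
    # An empty file is still tracked as one (empty) chunk in this sketch.
    total_chunks = max(1, math.ceil(size / chunk_size))
    SESSIONS[token] = UploadSession(filename, size, total_chunks)
    return token
```

Deriving the expected chunk count from the declared size at this point gives the server something to check against when the client later reports completion.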
-
File Chunking and Hash Calculation
- Client Action:
- The client reads the file in chunks and calculates the hash for each chunk.
- This hash will be used to check if the chunk already exists on the server, optimizing the upload by preventing redundant chunk uploads.
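The chunking and hashing step can be sketched as a generator on the client. SHA-256 and the 4 MiB default are assumptions here, since the original names neither a hash function nor a chunk size, and `iter_chunk_hashes` is a hypothetical helper name.

```python
import hashlib
from typing import BinaryIO, Iterator, Tuple


def iter_chunk_hashes(
    stream: BinaryIO, chunk_size: int = 4 * 1024 * 1024
) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (index, hex digest, data) for each fixed-size chunk of the stream."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of file
            break
        # SHA-256 is an assumed choice of hash function.
        yield index, hashlib.sha256(chunk).hexdigest(), chunk
        index += 1
```

Reading one chunk at a time keeps memory use bounded by the chunk size, regardless of how large the file is.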
-
Verify Chunk (Hash Verification Protocol)
- Client Action: For each chunk, the client sends the chunk hash and upload token to the server to verify if the chunk has already been uploaded.
- Server Response:
- If the chunk hash exists on the server, it returns a confirmation that this chunk doesn’t need to be uploaded.
- If the chunk hash does not exist, the client is instructed to upload the chunk.
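A minimal sketch of the server-side check follows. `KNOWN_HASHES`, `SESSIONS`, `verify_chunk`, and the response shapes are hypothetical. The sketch also assumes the client sends the chunk index with the hash, so the session can record which position an already-stored chunk fills; the original mentions only the hash and token for this step.

```python
from typing import Dict, Set

# Hypothetical index of chunk hashes already present in storage.
KNOWN_HASHES: Set[str] = set()
# Hypothetical session store: upload token -> {chunk index: chunk hash}.
SESSIONS: Dict[str, Dict[int, str]] = {}


def verify_chunk(token: str, index: int, chunk_hash: str) -> dict:
    """Tell the client whether it still needs to upload this chunk."""
    session = SESSIONS.get(token)
    if session is None:
        return {"status": "error", "reason": "unknown token"}
    if chunk_hash in KNOWN_HASHES:
        # Chunk already stored: mark this position as covered, skip the upload.
        session[index] = chunk_hash
        return {"status": "exists"}
    return {"status": "upload_required"}
```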
-
Upload Missing Chunks (Chunk Upload Protocol)
- Client Action: For each chunk the server reported as missing, the client uploads the binary data along with the chunk's index and the upload token.
- Server Action:
- The server stores the chunk with a unique identifier based on the token and chunk hash, ensuring no duplicates are saved.
- The server keeps a record of the chunk’s upload status in the upload session.
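One way the storage step might look is sketched below. Note the simplification: the text keys each chunk by token and hash, while this sketch uses the hash alone as the file name (an assumption), so identical chunks are written only once even across uploads. `store_chunk` and the session dictionary are hypothetical, and SHA-256 is assumed as the hash function.

```python
import hashlib
from pathlib import Path


def store_chunk(
    storage_dir: Path, session: dict, index: int, data: bytes, claimed_hash: str
) -> bool:
    """Write a chunk once, keyed by its hash, and record it in the session.

    Returns False if the data does not match the hash the client claimed.
    """
    if hashlib.sha256(data).hexdigest() != claimed_hash:
        return False
    path = storage_dir / claimed_hash
    if not path.exists():  # an identical chunk is never written twice
        path.write_bytes(data)
    session[index] = claimed_hash  # upload status kept per chunk index
    return True
```

Recomputing the hash on the server guards against a corrupted transfer or a client that reports the wrong hash.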
-
Notify Completion and Merge (Chunk Merging Protocol)
- Client Action: After uploading all chunks, the client sends a completion request to the server, indicating that all chunks are uploaded.
- Server Action:
- The server verifies that all expected chunks are uploaded and updates the file’s status in the database.
- No actual merging of the chunks occurs at this point. Instead, the server prepares a virtual URL or access point for the file.
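The completion check can be sketched as follows. The session layout, the `manifest` field, and the `/files/<id>` URL shape are assumptions for illustration only; the point is that the server records an ordered list of chunk hashes instead of copying any bytes.

```python
def missing_chunks(received: dict, total_chunks: int) -> list:
    """Return the indices of chunks that have not been recorded yet."""
    return [i for i in range(total_chunks) if i not in received]


def complete_upload(session: dict) -> dict:
    """Mark the file complete if every chunk is present; no data is merged."""
    missing = missing_chunks(session["received"], session["total_chunks"])
    if missing:
        return {"status": "incomplete", "missing": missing}
    session["status"] = "complete"
    # The "merge" is only an ordered list of chunk hashes (a manifest).
    session["manifest"] = [
        session["received"][i] for i in range(session["total_chunks"])
    ]
    # Hypothetical access URL; the original does not define its format.
    return {"status": "complete", "url": f"/files/{session['file_id']}"}
```

Returning the missing indices lets the client resume by uploading only those chunks.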
-
File Access on Request (On-Demand Chunk Streaming)
- User Action: The user or another system requests the file; the server then reads each chunk in sequence from storage.
- Server Action:
- The server streams the chunks directly to the client, using a file stream to ensure seamless access without duplicating the entire file on storage.
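On-demand streaming can be sketched as a generator that walks the ordered chunk list and yields the bytes in small blocks. It assumes chunks are stored as files named by their hash and that `manifest` is the ordered list of those hashes; `stream_file` and both of these conventions are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator


def stream_file(
    storage_dir: Path, manifest: Iterable[str], buffer_size: int = 64 * 1024
) -> Iterator[bytes]:
    """Yield the file's bytes chunk by chunk without building a merged copy."""
    for chunk_hash in manifest:
        with open(storage_dir / chunk_hash, "rb") as f:
            while True:
                block = f.read(buffer_size)
                if not block:  # this chunk is exhausted, move to the next
                    break
                yield block
```

A generator like this can typically be handed to a web framework's streaming response, so only one buffer of data is held in memory at a time.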
Design Flow Summary:
Step | Client Action | Server Action |
---|---|---|
1. Initialize Upload Session | Send file metadata to server | Generate and return upload token |
2. File Chunking & Hashing | Chunk the file, calculate hash for each chunk | N/A |
3. Verify Chunk | Send chunk hash to verify if chunk exists | Confirm existence or prompt client to upload |
4. Upload Missing Chunks | Upload missing chunks with token | Store chunk uniquely and update session |
5. Notify Completion & Merge | Send completion request after all chunks uploaded | Verify all chunks, update file status, prepare access URL |
6. File Access on Request | Request file access | Stream chunks directly to client, avoiding full file storage copy |
Diagram of Workflow
A high-level flowchart to visualize this can include:
-
Start
- ⬇️ Client initializes upload ↔ Server returns token
-
Loop through chunks
- For each chunk:
- Calculate chunk hash ↔ Check with server ↔ Server confirms existence or prompts upload
-
Upload missing chunks
- ⬇️ Upload chunk ↔ Server saves chunk
-
Completion Notification
- Notify server of completion ↔ Server updates status, no physical merge
-
File Access
- Client requests access ↔ Server streams chunks directly
This design avoids re-uploading chunks the server already holds, skips a physical merge step on completion, and streams chunks on demand, which keeps client bandwidth and server storage use low for large file uploads.