System Design 101: Large File Upload Design

Workflow for Large File Upload Design

  1. Initialize File Upload Session (File Creation Protocol)

    • Client Action: The client starts by sending basic file metadata (e.g., filename, size) to the server.
    • Server Response: The server returns a unique upload token that the client will use for all subsequent actions related to this upload.
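
A minimal sketch of this step, assuming an in-memory session store. The names `create_upload_session`, `SESSIONS`, and the 4 MiB `CHUNK_SIZE` are hypothetical; the design above does not prescribe a token format, chunk size, or storage backend.

```python
import math
import secrets

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size (4 MiB); not fixed by the design
SESSIONS = {}  # hypothetical in-memory store: upload token -> session record


def create_upload_session(filename: str, size: int) -> str:
    """Register file metadata and return an unguessable upload token."""
    if size < 0:
        raise ValueError("size must be non-negative")
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = {
        "filename": filename,
        "size": size,
        # Number of chunks the server should see before the upload can complete.
        "expected_chunks": math.ceil(size / CHUNK_SIZE),
        "received": {},  # chunk index -> chunk hash
        "status": "uploading",
    }
    return token
```

Storing the expected chunk count at creation time is one possible way to let the server later decide whether the upload is complete without trusting the client.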
  2. File Chunking and Hash Calculation

    • Client Action:
      • The client reads the file in chunks and calculates the hash for each chunk.
      • This hash will be used to check if the chunk already exists on the server, optimizing the upload by preventing redundant chunk uploads.
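
The chunk-and-hash step could look roughly like the sketch below. SHA-256 and the 4 MiB default are assumptions (the text only says "hash"), and `iter_chunk_hashes` is a hypothetical helper name.

```python
import hashlib
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size (4 MiB)


def iter_chunk_hashes(
    stream: BinaryIO, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (index, SHA-256 hex digest, data) for each chunk of the stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:  # an empty read means end of file
            break
        yield index, hashlib.sha256(data).hexdigest(), data
        index += 1
```

Because the function is a generator, only one chunk is held in memory at a time. A caller would typically pass a file opened with `open(path, "rb")`.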
  3. Verify Chunk (Hash Verification Protocol)

    • Client Action: For each chunk, the client sends the chunk hash and upload token to the server to verify if the chunk has already been uploaded.
    • Server Response:
      • If the chunk hash exists on the server, it returns a confirmation that this chunk doesn’t need to be uploaded.
      • If the chunk hash does not exist, the client is instructed to upload the chunk.
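
On the server side, the verification check might be sketched as follows. `ACTIVE_TOKENS`, `STORED_HASHES`, and the response fields are hypothetical stand-ins for whatever session table and chunk index a real service would use.

```python
from typing import Dict, Set

ACTIVE_TOKENS: Set[str] = set()  # hypothetical: tokens of open upload sessions
STORED_HASHES: Set[str] = set()  # hypothetical: hashes of chunks already stored


def verify_chunk(token: str, chunk_hash: str) -> Dict[str, object]:
    """Tell the client whether it can skip uploading the chunk with this hash."""
    if token not in ACTIVE_TOKENS:
        return {"ok": False, "error": "unknown upload token"}
    exists = chunk_hash in STORED_HASHES
    # "skip" means the server already holds the chunk; "upload" asks for the bytes.
    return {"ok": True, "exists": exists, "action": "skip" if exists else "upload"}
```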
  4. Upload Missing Chunks (Chunk Upload Protocol)

    • Client Action: For each chunk the server reported as missing, the client uploads the binary data along with the chunk’s index and the upload token.
    • Server Action:
      • The server stores the chunk with a unique identifier based on the token and chunk hash, ensuring no duplicates are saved.
      • The server keeps a record of the chunk’s upload status in the upload session.
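
A rough sketch of the storage step, assuming an in-memory blob store. The server recomputes the hash from the received bytes (SHA-256 is an assumption) and builds the chunk identifier from the token and hash as described above. `store_chunk`, `CHUNK_STORE`, and `SESSION_CHUNKS` are hypothetical names.

```python
import hashlib
from typing import Dict

CHUNK_STORE: Dict[str, bytes] = {}  # hypothetical blob store: chunk id -> bytes
SESSION_CHUNKS: Dict[str, Dict[int, str]] = {}  # token -> {chunk index: chunk id}


def store_chunk(token: str, index: int, data: bytes) -> str:
    """Store one chunk idempotently and record it in the upload session."""
    chunk_hash = hashlib.sha256(data).hexdigest()
    chunk_id = f"{token}:{chunk_hash}"  # identifier built from token and chunk hash
    # setdefault keeps the first copy, so a retried upload writes nothing new.
    CHUNK_STORE.setdefault(chunk_id, data)
    # Record which stored chunk fills this position in the file.
    SESSION_CHUNKS.setdefault(token, {})[index] = chunk_id
    return chunk_id
```

Keying on the content hash makes the upload idempotent: if a client retries after a timeout, the second request maps to the same identifier and no duplicate is written.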
  5. Notify Completion and Merge (Chunk Merging Protocol)

    • Client Action: After uploading all chunks, the client sends a completion request to the server, indicating that all chunks are uploaded.
    • Server Action:
      • The server verifies that all expected chunks are uploaded and updates the file’s status in the database.
      • No actual merging of the chunks occurs at this point. Instead, the server prepares a virtual URL or access point for the file.
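
The completion check could be sketched like this. The session layout (`expected_chunks`, `received`, `status`) and the `/files/<token>` access path are assumptions made for illustration, not part of the original design.

```python
from typing import Dict, List


def complete_upload(token: str, session: Dict) -> List[int]:
    """Mark the session complete if every chunk arrived.

    Returns the list of missing chunk indices; an empty list means success.
    """
    received = session["received"]  # chunk index -> chunk hash or id
    missing = [i for i in range(session["expected_chunks"]) if i not in received]
    if not missing:
        session["status"] = "complete"
        # No merge happens here; the server only publishes a virtual access path.
        session["access_path"] = f"/files/{token}"
    return missing
```

Returning the missing indices, rather than a bare failure, lets the client re-send only those chunks.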
  6. File Access on Request (On-Demand Chunk Streaming)

    • User Action: The user or another system requests the file.
    • Server Action:
      • The server reads each chunk in sequence from storage and streams it directly to the client, so the file is served without ever writing a merged copy to storage.
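
On-demand streaming can be illustrated with a generator that yields chunks in index order. This is a sketch under the assumption that the session maps chunk indices to chunk identifiers and that chunks live in a dictionary-like store; `stream_file` is a hypothetical name.

```python
from typing import Dict, Iterator


def stream_file(
    chunk_ids_by_index: Dict[int, str], chunk_store: Dict[str, bytes]
) -> Iterator[bytes]:
    """Yield chunk bytes in file order without assembling the whole file."""
    for index in sorted(chunk_ids_by_index):
        yield chunk_store[chunk_ids_by_index[index]]
```

A web framework's streaming response could consume this generator directly, so memory use stays bounded by the size of one chunk.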

Design Flow Summary:

| Step | Client Action | Server Action |
|------|---------------|---------------|
| 1. Initialize Upload Session | Send file metadata to server | Generate and return upload token |
| 2. File Chunking & Hashing | Chunk the file, calculate hash for each chunk | N/A |
| 3. Verify Chunk | Send chunk hash to verify if chunk exists | Confirm existence or prompt client to upload |
| 4. Upload Missing Chunks | Upload missing chunks with token | Store chunk uniquely and update session |
| 5. Notify Completion & Merge | Send completion request after all chunks uploaded | Verify all chunks, update file status, prepare access URL |
| 6. File Access on Request | Request file access | Stream chunks directly to client, without storing a merged copy of the file |

Diagram of Workflow

A high-level flowchart to visualize this can include:

  1. Start

    • ⬇️ Client initializes upload → Server returns token
  2. Loop through chunks

    • For each chunk:
      • Calculate chunk hash → Check with server → Server confirms existence or prompts upload
  3. Upload missing chunks

    • ⬇️ Upload chunk → Server saves chunk
  4. Completion Notification

    • Notify server of completion → Server updates status, no physical merge
  5. File Access

    • Client requests access → Server streams chunks directly

This design avoids uploading chunks the server already holds and avoids writing a second, merged copy of the file, which saves bandwidth on the client and storage and I/O on the server for large file uploads.
