Flat Datacenter Storage Paper Review

March 30, 2016
Distributed System

[TOC]

A review of the paper: Nightingale, E. B., Elson, J., Fan, J., et al. "Flat Datacenter Storage." In the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), 2012, pp. 1-15.

Introduction

What is FDS?

How Does It Perform?

Architecture

High-level design – a common pattern

Figure: comparison of the two distributed storage models.

Right: the conventional model (GFS & HDFS) – data is either on a local disk or on a remote disk.

Left: the FDS model – object storage assuming no network oversubscription; all data is remote.

Figure: the high-level architecture of FDS.

Design Overview

How to store data? – Blobs and Tracts

  1. Data is logically stored in blobs.

    • A blob is a byte sequence named with a 128-bit GUID.
    • Blobs can be any length up to the system’s storage capacity.
    • Blobs are divided into tracts.
  2. Tracts are the unit of reads and writes.

    • Tracts are sized such that random and sequential access achieve nearly the same throughput.
    • The tract size is set when the cluster is created, based on the cluster hardware (64 KB to 8 MB).
    • All tracts’ metadata is cached in memory, eliminating many disk accesses.
  3. Every disk is managed by a process called a tractserver:

    • Services read and write requests from clients.
    • Lays out tracts directly on disk by using the raw disk interface.
    • Provides an API with the following features (see the sketch after this list):
      • Tract reads are not guaranteed to arrive in order of issue, and writes are not guaranteed to be committed in order of issue.
      • Tractserver writes are atomic: a write either commits completely or fails completely.
      • Calls are asynchronous: callbacks allow deep pipelining, which achieves good performance.
      • Provides only weak consistency guarantees to clients.
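
To make the tract and API descriptions above concrete, here is a minimal Go sketch of what a callback-based client interface could look like. All names here (GUID, TractClient, ReadTract, WriteTract, TractForOffset, ReadAll) are illustrative assumptions; the paper specifies the properties of its API, not this exact shape.

```go
package fds

// GUID identifies a blob; in FDS it is a 128-bit value.
type GUID [16]byte

// Callback is invoked when a read or write completes (or fails).
// Completions are not guaranteed to arrive in the order calls were issued.
type Callback func(data []byte, err error)

// TractClient issues tract-level operations against tractservers.
type TractClient interface {
	// ReadTract asynchronously reads tract number `tract` of blob `blob`.
	ReadTract(blob GUID, tract uint64, cb Callback)

	// WriteTract asynchronously writes one tract. Each write is atomic:
	// it either commits completely or fails completely.
	WriteTract(blob GUID, tract uint64, data []byte, cb Callback)
}

// TractForOffset shows how a byte offset within a blob maps to a tract
// number, given the cluster-wide tract size chosen at creation time.
func TractForOffset(offset, tractSize uint64) uint64 {
	return offset / tractSize
}

// ReadAll illustrates deep pipelining: every read is issued without waiting
// for the previous one; completions may arrive in any order.
func ReadAll(c TractClient, blob GUID, numTracts uint64, done chan<- error) {
	for t := uint64(0); t < numTracts; t++ {
		c.ReadTract(blob, t, func(data []byte, err error) {
			done <- err
		})
	}
}
```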

How to organize and manage metadata? – Deterministic data placement

  1. Many systems solve this problem using a metadata server that stores the location of data blocks.

    • Advantage: allows maximum flexibility of data placement and full visibility into the system's state.
    • Drawback: the metadata server is a central point of failure, so it is usually implemented as a replicated state machine.
  2. FDS also uses a metadata server, but its role is simple and limited to maintaining the tract locator table (TLT):

    • Collects a list of the system's active tractservers and distributes it to clients.
    • With k-way replication, each entry holds the addresses of k tractservers.
    • Entries are weighted by disk speed.
    • The TLT is updated only when the cluster changes.
  3. To read or write, a client computes a tract locator, which is designed to be deterministic and to produce uniform disk utilization (see the Go sketch after this list): Tract_Locator = TLT[(Hash(GUID) + Tract) % len(TLT)]

    • Hash(GUID): randomizes the blob's placement across tractservers, even if GUIDs aren't random (SHA-1 is used).
    • Tract: the tract number is added outside the hash, so a large blob uses all TLT entries uniformly.
  4. Blob metadata is located the same way, at "tract -1", which distributes blob metadata across tractservers: Tract_Locator = TLT[(Hash(GUID) - 1) % len(TLT)]

    • The metadata server isn’t a single point of failure.
    • Operations can be served in parallel by independent tractservers.
  5. To summarize, the FDS metadata scheme has the following properties:

    • The metadata server is in the critical path only when a client process starts.
    • The TLT can be cached long-term, eliminating all traffic to the metadata server under normal conditions.
    • The TLT contains random permutations of the tractserver list, which makes sequential reads and writes parallel across disks.
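
The two locator formulas above translate directly into code. Below is a minimal Go sketch, assuming a simple TLTEntry type and a SHA-1 digest folded to 64 bits; only the lookup formulas come from the paper, while the type and helper names (TLTEntry, hashGUID, LocateTract, LocateMetadata) are assumptions for illustration.

```go
package fds

import (
	"crypto/sha1"
	"encoding/binary"
)

// TLTEntry lists the tractservers holding replicas for one locator slot
// (k addresses under k-way replication). The field layout is an assumption.
type TLTEntry struct {
	Tractservers []string
}

// hashGUID hashes the 128-bit blob GUID with SHA-1 (as the paper does) and
// folds the digest into a 64-bit value for the modular lookup below.
func hashGUID(guid [16]byte) uint64 {
	sum := sha1.Sum(guid[:])
	return binary.BigEndian.Uint64(sum[:8])
}

// LocateTract implements TLT[(Hash(GUID) + Tract) % len(TLT)] for data tracts.
func LocateTract(tlt []TLTEntry, guid [16]byte, tract uint64) TLTEntry {
	n := uint64(len(tlt))
	return tlt[(hashGUID(guid)%n+tract%n)%n]
}

// LocateMetadata implements TLT[(Hash(GUID) - 1) % len(TLT)]: blob metadata
// lives at "tract -1"; adding n avoids unsigned underflow.
func LocateMetadata(tlt []TLTEntry, guid [16]byte) TLTEntry {
	n := uint64(len(tlt))
	return tlt[(hashGUID(guid)%n+n-1)%n]
}
```

Because the index depends only on the GUID, the tract number, and the cached TLT, a client never needs to contact the metadata server on the data path.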

What kinds of applications will / will not benefit from FDS? – Dynamic Work Allocation

Replication and Failure Recovery

Replication

Failure recovery

Replicated data layout

Cluster growth

Networking


References:

A System Analysis of Flat DataCenter Storage (in Chinese)

How does FDS (flat datacenter storage) make optimizations around locality unnecessary?

YouTube video
