Distributed File System


Category : research


Distributed File System

  • Storage
  • Communications
  • Computation

Distributed System


Abstractions

A distributed system's abstraction should hide the fact that it is distributed.

Although this is rarely fully achievable, the goal is to let people use a distributed system just like a normal, single-machine system.


Implementation

  • RPC
  • threads
  • concurrency control

Performance

  • Scalability : ideally, 2 computers means 2 times the throughput.

The bottleneck can shift.

E.g. Client -> Server -> DB

Adding servers does not guarantee a proportional performance increase once the number of servers passes a certain level, because the bottleneck shifts from the servers to the shared DB.
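The shifting bottleneck can be sketched with a toy model. This is only an illustration: the function name and the request rates are made-up assumptions, not measurements. Every request goes through one server and then the single DB, so the slower stage caps the total.

```python
def system_throughput(num_servers: int, per_server_rps: float, db_rps: float) -> float:
    """Requests/second that a Client -> Server -> DB pipeline can sustain.

    Each request passes through one server and then the single database,
    so the slower of the two stages limits the whole system.
    """
    return min(num_servers * per_server_rps, db_rps)


if __name__ == "__main__":
    # Hypothetical numbers: each server handles 100 req/s, the DB 350 req/s.
    for n in range(1, 7):
        print(n, system_throughput(n, per_server_rps=100, db_rps=350))
```

With these assumed numbers, throughput grows linearly up to 3 servers and then stays flat at the DB's limit.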


Fault Tolerance

Masking failures (a minor concern in a non-distributed system) has to be considered in your design.

  • Availability
  • Recoverability

Key tools :

  • Non-volatile storage
  • Replication


Consistency

  • Put(k, v)
  • Get(k) -> v
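Why consistency becomes a question for Put and Get can be sketched with two replicas. This is a minimal sketch, not a real design: `Replica`, `put_all` and the `fail_after` parameter are hypothetical names for illustration. A Put that reaches only one replica leaves Get returning different values depending on which replica answers.

```python
from typing import Dict, List, Optional


class Replica:
    """One copy of the key-value table (hypothetical helper for this sketch)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def put(self, k: str, v: str) -> None:
        self.data[k] = v

    def get(self, k: str) -> Optional[str]:
        return self.data.get(k)


def put_all(replicas: List[Replica], k: str, v: str,
            fail_after: Optional[int] = None) -> None:
    """Write to each replica in turn; stop early to simulate a client crash."""
    for i, replica in enumerate(replicas):
        if fail_after is not None and i >= fail_after:
            return
        replica.put(k, v)


if __name__ == "__main__":
    a, b = Replica(), Replica()
    put_all([a, b], "x", "1")                # both replicas see x = 1
    put_all([a, b], "x", "2", fail_after=1)  # crash after updating only `a`
    print(a.get("x"), b.get("x"))            # the two replicas now disagree
```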

Map Reduce

Non-specialists can run software on a distributed system without knowing the details.

MapReduce fits a typical class of problems.

When we have such a problem, we first write our map and reduce functions. The framework then distributes the workload across the machines of the distributed system.


Word Count : Map Reduce

Map(k, v) : // k is the filename, v is the file's text

1. Split v into words
2. For each word w
    * emit(w, 1)

Reduce(k, v) : // k is a word, v is a vector of the values emitted for that key
    emit(k, len(v)) // every value is 1, so the length of v is the count
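The word count above can be sketched as runnable code on a single machine. This is an assumption-laden sketch: `run_word_count` is a hypothetical stand-in for the framework that only groups values by key, with no distribution, scheduling or fault handling.

```python
from collections import defaultdict


def map_fn(filename, text):
    """Map: emit (word, 1) for every word in the file's text."""
    for word in text.split():
        yield word, 1


def reduce_fn(word, values):
    """Reduce: every value is 1, so the number of values is the count."""
    return word, len(values)


def run_word_count(files):
    """Stand-in for the framework: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for filename, text in files.items():
        for key, value in map_fn(filename, text):
            grouped[key].append(value)
    return dict(reduce_fn(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    files = {"a.txt": "the cat sat", "b.txt": "the dog sat down"}
    print(run_word_count(files))
```

The user only writes `map_fn` and `reduce_fn`; everything in `run_word_count` is what a real framework would do across many machines.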


GFS


Why hard?

Performance -> Sharding

Faults -> Tolerance

Tolerance -> Replication

Replication -> Inconsistency

Consistency -> Low performance


Replication

  • State Transfer
    • Directly copy the contents of the storage medium to another place
  • Replicated State Machine
    • Observation : The behavior of most services is determined by their internal instructions and their external input.

In a ransomware testing framework, both should be subject to testing.
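The two replication approaches listed above can be compared on a toy service. This is a minimal sketch under the assumption that the service is fully deterministic; `Counter`, `state_transfer` and `replicate_ops` are hypothetical names used only here.

```python
import copy


class Counter:
    """Tiny deterministic service: its state is a dict of named counters."""

    def __init__(self):
        self.state = {}

    def apply(self, op):
        # op is a (name, amount) pair, the only kind of external input here.
        name, amount = op
        self.state[name] = self.state.get(name, 0) + amount


def state_transfer(primary, backup):
    """State transfer: ship a full copy of the primary's state."""
    backup.state = copy.deepcopy(primary.state)


def replicate_ops(backup, ops):
    """Replicated state machine: ship the inputs and re-execute them in order."""
    for op in ops:
        backup.apply(op)


if __name__ == "__main__":
    ops = [("a", 1), ("b", 5), ("a", 2)]
    primary = Counter()
    for op in ops:
        primary.apply(op)

    b1, b2 = Counter(), Counter()
    state_transfer(primary, b1)
    replicate_ops(b2, ops)
    print(b1.state == primary.state, b2.state == primary.state)
```

Both backups end in the primary's state. State transfer sends the whole state, while the replicated state machine sends only the operations, which is usually far less data.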


Non-det. events

  • Inputs : network packets, each arriving as data plus an interrupt
  • Weird instructions, such as random number generation (or multicore interleaving)
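How logging non-deterministic events keeps a backup in step can be sketched as follows. This is an illustrative assumption rather than any real system's protocol: the `Primary` and `Backup` classes and the log format are invented for this sketch. The primary records each request together with the random value it drew, and the backup replays the log instead of drawing its own.

```python
import random


class Primary:
    """Executes requests and logs the outcome of each non-deterministic step."""

    def __init__(self):
        self.tokens = []
        self.log = []  # one (request, nonce) entry per handled request

    def handle(self, request):
        nonce = random.randrange(1_000_000)  # non-deterministic instruction
        self.log.append((request, nonce))    # record the input and the outcome
        self.tokens.append(f"{request}-{nonce}")


class Backup:
    """Replays the log instead of drawing its own random numbers."""

    def __init__(self):
        self.tokens = []

    def replay(self, log):
        for request, nonce in log:
            self.tokens.append(f"{request}-{nonce}")


if __name__ == "__main__":
    primary = Primary()
    for req in ["login", "upload", "logout"]:
        primary.handle(req)

    backup = Backup()
    backup.replay(primary.log)
    print(backup.tokens == primary.tokens)
```

If the backup called `random.randrange` itself, its tokens would almost certainly differ from the primary's, which is why the outcome has to travel in the log.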

Inspiration : Can we use the same method, logging only the non-det. events, for normal data replication?

Why ?

At a higher level, we don’t even have to consider multi-core complexity.


About moomoohorse
moomoohorse

Hi, I am Hao Ren. Please get to know me on my website.

Email : haor2@illinois.edu

Website : http://moomoohorse.com
