The most exciting thing about this world is its ever changing quality.

Monday, June 08, 2009

Distributed system notes - Part III

Virtualisation

  • Memory virtualisation - each process owns an individual memory space, created by the MMU and configured by the OS.
  • Storage virtualisation - logical storage images. A DFS (distributed file system) is one typical example; another is a SAN (Storage Area Network).
  • CPU/Machine virtualisation - each process appears to have its own CPU and machine resources, created by OS pre-emption and the scheduler.
    • Unprivileged instructions run directly on the physical CPU.
    • Privileged instructions are trapped by the VMM and emulated. However, classic x86 has sensitive instructions that do not trap when executed in user mode. So either these instructions have to be replaced (VMware, via binary translation) or the guest OS is modified so that it does not use them (Xen, via paravirtualisation).
    • Virtual Machine Monitor (hypervisor) - the program responsible for the virtualisation
      • Arbitrates access to physical resources (the hypervisor can sometimes itself run on top of a thin OS layer - Xen.)
      • Presents a set of virtual device interfaces to each guest OS
      • Hypervisor-based rootkits
      • Products: VMware, Parallels, Xen, MS Virtual PC, xVM VirtualBox, Altiris
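
The trap-and-emulate idea above can be sketched with a toy model. Everything here is hypothetical (the ToyCPU and ToyHypervisor classes, the instruction names, the virtual_state dictionary); it only illustrates the control flow, not how any real hypervisor is built.

```python
class PrivilegedInstruction(Exception):
    """Raised when guest code attempts a privileged operation (the 'trap')."""

    def __init__(self, op, arg):
        super().__init__(op)
        self.op = op
        self.arg = arg


class ToyCPU:
    """Hypothetical CPU: runs unprivileged ops directly, traps on the rest."""

    PRIVILEGED = {"set_interrupts", "load_page_table"}

    def __init__(self):
        self.acc = 0  # a single accumulator register

    def execute(self, op, arg):
        if op in self.PRIVILEGED:
            # The guest runs unprivileged, so this instruction traps.
            raise PrivilegedInstruction(op, arg)
        if op == "add":
            self.acc += arg
        elif op == "mul":
            self.acc *= arg
        else:
            raise ValueError(f"unknown instruction: {op}")


class ToyHypervisor:
    """Hypothetical VMM: catches traps and emulates them on virtual state."""

    def __init__(self):
        self.cpu = ToyCPU()
        # Per-guest virtual copy of the privileged machine state.
        self.virtual_state = {"set_interrupts": None, "load_page_table": None}
        self.traps = 0

    def run(self, program):
        for op, arg in program:
            try:
                self.cpu.execute(op, arg)  # direct execution
            except PrivilegedInstruction as trap:
                self.traps += 1
                # Emulate: update the guest's virtual state, not the real one.
                self.virtual_state[trap.op] = trap.arg
        return self.cpu.acc
```

Running the program [("add", 2), ("set_interrupts", False), ("mul", 5)] through ToyHypervisor().run executes the two arithmetic instructions directly and takes a single trap for set_interrupts.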

Clustering

Clustering achieves reliability and scalability by interconnecting multiple autonomous standard servers so that they appear as a single machine image.
Existing clustering types:
  • High Performance (Super)-Computing (HPC)
    • Typically Linux + message passing software + remote exec + remote monitoring
    • Interconnect network isolated from external network
    • Network load is determined only by application
    • Global process ID provided, Global signaling mechanism

  • Batch processing
    • Single-queue work distribution
    • Application specific: graphics rendering
    • Dispatcher remotely exec process
    • A portable Batch System - OpenPBS
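
A minimal sketch of single-queue work distribution, assuming local threads stand in for the remote nodes a real dispatcher would exec processes on. The run_batch function and the squaring "job" are hypothetical placeholders; this is not how OpenPBS is implemented.

```python
import queue
import threading


def run_batch(jobs, worker_count=3):
    """Distribute jobs from one shared queue to a pool of workers."""
    work = queue.Queue()
    for job_id, payload in enumerate(jobs):
        work.put((job_id, payload))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job_id, payload = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            outcome = payload * payload  # stand-in for real work (e.g. rendering a frame)
            with lock:
                results[job_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return results in submission order, regardless of which worker ran them.
    return [results[i] for i in range(len(jobs))]
```

Because every worker pulls from the same queue, a slow job only delays the worker that picked it up; the others keep draining the queue.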

  • Grid Computing
    • Not application-specific, use general-purpose protocols for accessing resources, authenticating, discovery etc.
    • Coordinate computing resources that are not subject to centralized control
    • Open Grid Services Architecture (Globus Toolkit - open source software tool kit used for building Grid systems and applications.)
      • All grid resources are modeled as services
      • Protocols
        • Grid Resource Information Protocol (GRIP), Register resources; based on LDAP
        • Grid Resource Access and Management Protocol (GRAM) Allocation & monitoring of resources; based on HTTP
        • GridFTP, Data access
      • Service
        • Define service’s interface (GWSDL)
        • Implement service (Java)
        • Define deployment parameters (WSDD)
        • Compile & Generate GAR file (Ant)
        • Deploy service with Ant

  • High availability (HA)

  • Load balancing - redirect requests. Most routers now have load-balancing capability, e.g. simple NAT-table load balancing.
    • Software load balancing - forward requests via a load balancer (IBM Interactive Network Dispatcher)
      • Leaves original source address
      • Load balancer not in path of outgoing traffic (high bandwidth)
      • Kernel extensions for routing TCP and UDP requests
        • Each back-end server (a client of the dispatcher) accepts connections on its own address and the dispatcher’s address
        • Dispatcher changes MAC address of packets.
    • Load balancing Router - Virtual Router Redundancy Protocol, LSNAT protocol
      • Pick machine with least # TCP connections
      • Factor in weights when selecting machines
      • Pick machines round-robin
      • Pick fastest connecting machine (SYN/ACK time)
      • Port level redirection
      • Multiple virtual addresses can be assigned to one physical address
      • Manage sessions across redirections.
    • Network load balancing, component load balancing
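
Three of the machine-selection policies listed above (least connections, weighted selection, round-robin) can be sketched as follows. The Backend class and the function names are hypothetical, and the weighting rule (connections divided by weight) is only one plausible way to factor in weights.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1              # higher weight = more capacity (assumption)
    active_connections: int = 0  # current number of TCP connections


def least_connections(backends):
    """Pick the machine with the fewest TCP connections."""
    return min(backends, key=lambda b: b.active_connections)


def weighted_least_connections(backends):
    """Pick the machine with the fewest connections per unit of weight."""
    return min(backends, key=lambda b: b.active_connections / b.weight)


def round_robin(backends):
    """Return an endless iterator that cycles through the machines in order."""
    return itertools.cycle(backends)
```

For a pool a (weight 1, 4 connections), b (weight 2, 6 connections) and c (weight 1, 5 connections), plain least-connections picks a, while the weighted variant picks b because 6 / 2 is the lowest load per unit of weight.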

Multi-tier clustering system
  • Top tier: cluster abstractions - Failover manager, resource monitor, cluster registry
  • Middle tier: distributed operations - Global status update, quorum (keeps track of who’s in charge), membership
  • Bottom tier: OS and drivers - Cluster disk driver, cluster network drivers, IP address takeover

Clock synchronisation

Physical clock - Keep consistent wall clock across all components.

Problems: clock drift and skew, which the NTP and SNTP protocols are designed to correct.
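
As a rough illustration of how an NTP-style exchange estimates skew, the sketch below uses the standard four-timestamp calculation. The function name ntp_offset_and_delay is hypothetical, and the calculation assumes the network delay is symmetric in both directions.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one request/reply.

    t0: client clock when the request is sent
    t1: server clock when the request is received
    t2: server clock when the reply is sent
    t3: client clock when the reply is received
    """
    # How far the client clock is behind (+) or ahead (-) of the server.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Time spent on the network, excluding server processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay
```

With t0 = 100.0, t1 = 105.1, t2 = 105.2 and t3 = 100.4, the client is about 4.95 seconds behind the server and the round-trip delay is about 0.3 seconds.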
Logical clock - Maintain the logical relationship between events. The key is the sequence of messages; each component maintains its own logical clock.
  • Partial ordering - Lamport's algorithm
    • Each message carries the sender's clock value as a timestamp;
    • When a message arrives, if the receiver's clock < timestamp, set the receiver's clock to the message's timestamp + 1; otherwise do nothing;
    • Clock must be advanced between two events in the same process.
    • When there is no logical sequential ordering among events in different components, i.e. no send-receive causal relationship, we have concurrency.
  • Total ordering
    • Force each timestamp in the system to be unique by using the pair {Ti, i}, where "i" is the process identifier. The tie-break by process identifier is arbitrary, so the resulting total order is not related to the actual ordering of events.
    • Use Vector clock to analyse and identify concurrent events in a distributed system.
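
The logical clock rules above can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: the LamportClock class and the vc_* helpers are hypothetical names, and the receive rule uses the common max(local, received) + 1 formulation, which always advances the clock on receipt.

```python
class LamportClock:
    """Logical clock for one process, identified by pid."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """Jump past the sender's timestamp so receive is ordered after send."""
        self.time = max(self.time, msg_time) + 1
        return self.time

    def stamp(self):
        """(Ti, i) pair: unique across processes, so tuples sort totally."""
        return (self.time, self.pid)


def vc_happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def vc_concurrent(a, b):
    """True if neither vector timestamp causally precedes the other."""
    return a != b and not vc_happened_before(a, b) and not vc_happened_before(b, a)
```

Comparing (Ti, i) stamps as tuples gives the total order, with the process identifier breaking ties. The vector comparison is what identifies concurrency: (2, 0) and (0, 1) are concurrent because each is ahead of the other in one component.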

