Research Projects
Previous Research (2001-2003)
Fault Tolerant Software Distributed Shared Memory System
A Software Distributed Shared Memory (Software DSM) system provides a
shared-memory abstraction over the physically distributed memory of a cluster.
Programming with Software DSM is generally considered easier than
programming with a message-passing model. High availability and reliability
become critical for DSM as long-running applications are deployed on
larger clusters. We developed a fault-tolerant Software DSM for Linux clusters.
We proposed (i) a lightweight logging scheme (called Remote Logging) and
(ii) a recovery protocol for Home-based DSM.
Remote logging stores coherence-related data in the volatile memory
of a remote node; the logging overhead is kept moderate by using a
high-speed system area network and the user-level DMA operations supported
by modern communication protocols.
In addition, the logged information is exploited to improve the performance
of the base model when remote logging is enabled.
In essence, it reduces the stall time needed to update invalid pages,
which in turn shortens the failure-free execution time.
We implemented the fault-tolerant Software DSM as a user-level library,
using pthreads and user-level DMA libraries on Linux.
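The idea behind remote logging can be illustrated with a small sketch. This is not the project's implementation or its recovery protocol: every name here (`Diff`, `LoggerNode`, `HomeNode`, `recover_home`) is hypothetical, the "remote node" is an ordinary in-process object standing in for a user-level DMA write over a system area network, and recovery is assumed to start from zeroed pages with no checkpoint. It only shows that updates logged in another node's volatile memory are enough to rebuild the pages of a failed home node by replaying them in order.

```python
# Hypothetical sketch; names and structure are assumptions, not the original library.
from dataclasses import dataclass, field

PAGE_SIZE = 8  # tiny page size, for illustration only


@dataclass
class Diff:
    """A coherence-related update: bytes changed within one page."""
    page_id: int
    offset: int
    data: bytes


@dataclass
class LoggerNode:
    """Remote node that keeps another node's log in its volatile memory."""
    log: list = field(default_factory=list)

    def append(self, diff):
        # Stands in for a user-level DMA write to the remote node (assumption).
        self.log.append(diff)


class HomeNode:
    """Home of a set of pages; applies the diffs sent by writers."""

    def __init__(self, num_pages, logger):
        self.pages = {i: bytearray(PAGE_SIZE) for i in range(num_pages)}
        self.logger = logger

    def apply_diff(self, diff, log=True):
        if log:
            self.logger.append(diff)  # log remotely before applying locally
        page = self.pages[diff.page_id]
        page[diff.offset:diff.offset + len(diff.data)] = diff.data


def recover_home(num_pages, logger):
    """Rebuild a failed home node by replaying the remote log in order."""
    fresh = HomeNode(num_pages, logger)
    for diff in list(logger.log):
        fresh.apply_diff(diff, log=False)  # replay without logging again
    return fresh


logger = LoggerNode()
home = HomeNode(2, logger)
home.apply_diff(Diff(0, 0, b"ab"))
home.apply_diff(Diff(1, 4, b"xyz"))
home.apply_diff(Diff(0, 1, b"Q"))
snapshot = {pid: bytes(p) for pid, p in home.pages.items()}

# Simulate losing `home` and rebuilding it from the remote log alone.
recovered = recover_home(2, logger)
```

After the replay, `recovered.pages` holds the same contents as the snapshot taken before the simulated failure. Because the log lives in volatile memory, this scheme avoids disk writes on the failure-free path, at the cost of losing the log if the logging node itself also fails.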
(Papers: PDCS03, ISPDC03, M.S. Thesis)
Project Homepage: KAIST Distributed Shared Memory