Research Projects

Previous Research (2001-2003)

  • Fault Tolerant Software Distributed Shared Memory System
    • A Software Distributed Shared Memory (Software DSM) system provides a shared memory abstraction over the physically distributed memory of a cluster. Programming with Software DSM is generally considered easier than programming with a message-passing model. As long-running applications on larger clusters become necessary, high availability and reliability of the DSM become critical. We developed a Fault-Tolerant Software DSM for Linux clusters and proposed (i) a lightweight logging scheme called Remote Logging and (ii) a recovery protocol for Home-based DSM. Remote Logging stores coherence-related data in the volatile memory of a remote node, so the logging overhead can be kept moderate by the high-speed system area network and the user-level DMA operations supported by modern communication protocols. In addition, the logged information is exploited to improve the performance of the base model with Remote Logging: it reduces the stall time spent updating invalid pages, which in turn lowers the failure-free execution time. We implemented the fault-tolerant Software DSM as a user-level library built on pthreads and user-level DMA libraries under Linux. (Papers: PDCS03, ISPDC03, M.S. Thesis)

  • Project Homepage: KAIST Distributed Shared Memory
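The Remote Logging and home-based recovery idea described above can be illustrated with a minimal single-process simulation. This is a hypothetical sketch, not the implementation from the papers: it assumes the logged coherence data are per-page diffs produced at a release, that each record is written to a designated logger node other than the page's home (a list append standing in for a user-level DMA write), that a checkpoint of the home's pages exists, and that a hypothetical global sequence number `seq` orders replay. The names `Node`, `release`, `recover_home`, and `PAGE_SIZE` are illustrative only.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8  # hypothetical tiny page size, chosen only to keep the demo readable


@dataclass
class Node:
    node_id: int
    # Pages for which this node is the home (page_id -> bytearray); volatile memory.
    pages: dict = field(default_factory=dict)
    # Log records this node holds on behalf of other nodes; also volatile memory.
    remote_log: list = field(default_factory=list)


def make_diff(twin, dirty):
    """Return (offset, new_value) pairs where the dirty copy differs from its twin."""
    return [(i, b) for i, (a, b) in enumerate(zip(twin, dirty)) if a != b]


def apply_diff(page, diff):
    for offset, value in diff:
        page[offset] = value


def release(writer, home, logger, page_id, twin, dirty, seq):
    """At a release: ship the diff to the page's home and log it at a remote node.

    Assumption: `logger` is a node other than `home`; the append stands in for a
    one-sided user-level DMA write into that node's memory. `seq` is a
    hypothetical global ordering number used to replay diffs in order.
    """
    diff = make_diff(twin, dirty)
    apply_diff(home.pages[page_id], diff)
    logger.remote_log.append((seq, writer.node_id, page_id, diff))
    return diff


def recover_home(failed_home, checkpoint, loggers):
    """Rebuild a failed home's pages: last checkpoint plus logged diffs, in seq order."""
    pages = {pid: bytearray(data) for pid, data in checkpoint.items()}
    records = [r for lg in loggers for r in lg.remote_log if r[2] in pages]
    for _seq, _writer, page_id, diff in sorted(records, key=lambda r: r[0]):
        apply_diff(pages[page_id], diff)
    failed_home.pages = pages
    return pages


if __name__ == "__main__":
    home, writer, logger = Node(0, {7: bytearray(PAGE_SIZE)}), Node(1), Node(2)
    checkpoint = {7: bytes(home.pages[7])}  # snapshot taken before any writes

    twin = bytes(PAGE_SIZE)
    dirty = bytearray(twin)
    dirty[3] = 42
    release(writer, home, logger, 7, twin, bytes(dirty), seq=1)

    twin = bytes(dirty)
    dirty[5] = 9
    release(writer, home, logger, 7, twin, bytes(dirty), seq=2)

    home.pages = {}  # simulate the home crashing: its volatile pages are lost
    recover_home(home, checkpoint, [logger])
    print(list(home.pages[7]))  # -> [0, 0, 0, 42, 0, 9, 0, 0]
```

Because each log record lives in another node's volatile memory rather than on disk, a log write in this model costs roughly one network transfer, which is why a fast system area network and user-level DMA keep the overhead moderate. The sketch tolerates only the failure of a node that is not also holding the relevant log records.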