Gary Grider, HPC luminary and the High Performance Computing Division Leader at Los Alamos National Laboratory, will give a live BrightTALK online presentation on Nov. 10 at 2:00 pm US Eastern time. The video will be available 45 minutes after his presentation, at 2:45 pm Eastern. Register here to attend or to view the video. Those who cannot wait can find Gary's slides on GitHub.
MarFS is a cloud and HPC file system designed to deliver a scalable near-POSIX namespace over standard object systems, with scaling targets of trillions of POSIX files, hundreds of gigabytes/sec of data bandwidth, and millions of POSIX metadata operations/sec. Features include:
- Near-POSIX global scalable namespace over many POSIX and non-POSIX data repositories (scalable object systems – CDMI, S3, etc.)
- Scales the namespace by sewing together multiple POSIX file systems, both as parts of the tree and as parts of a single directory, allowing scaling across the tree and within a single directory
- A small amount of code (C/C++/scripts):
  - A small Linux FUSE implementation
  - A fairly small parallel batch copy/sync/compare utility
  - A set of other small parallel batch utilities for management
  - A moderately sized library that both FUSE and the batch utilities call
- Data movement scales just like many scalable object systems
- Metadata scales like N×M POSIX namespaces, both across the tree and within a single directory
- Friendly to object systems by:
  - Spreading very large files across many objects
  - Packing many small files into one large data object
To run MarFS you need:

- Linux system(s) with C/C++ and FUSE support
- MPI for parallel communication in Pftool (a parallel data transfer tool)
  - The MPI library can use many communication methods, such as TCP/IP, InfiniBand OFED, etc.
Design goals include:

- Lazy data and metadata quotas, per user per namespace
- Wide parallelism for data and metadata
- Trying hard not to walk trees for management (using inode scans, etc.)
- A trash mechanism for user recovery
Deployment notes:

- If you use MarFS to combine multiple POSIX file systems into one mount point, any set of POSIX file systems can be used.
- For multi-node parallelism, the metadata file systems must be globally visible somehow.
- When using an object store as the data repo, the object store must be globally visible.
- The MarFS metadata file systems must support POSIX xattrs and sparse files.
- You don't have to use GPFS; we use it for its ILM inode-scan features.