Gary Grider, HPC luminary and the High Performance Computing Division Leader at Los Alamos National Laboratory, will give a live BrightTALK online presentation on Nov. 10 at 2:00 pm US Eastern time. The video will be available 45 minutes after his presentation, at 2:45 pm Eastern. Register here to attend or to view the video. Those who cannot wait can find Gary's slides on GitHub.
MarFS is a cloud and HPC file system designed to deliver a scalable near-POSIX namespace over standard object systems, with scaling targets of trillions of POSIX files, hundreds of gigabytes/sec of data bandwidth, and millions of POSIX metadata operations/sec. Features include:
- Near-POSIX global scalable namespace over many POSIX and non-POSIX data repositories (scalable object systems – CDMI, S3, etc.)
- Scales the namespace by sewing together multiple POSIX file systems, both as parts of the tree and as parts of a single directory, allowing scaling across the tree and within a single directory
- A small amount of code (C/C++/scripts):
  - A small Linux FUSE implementation
  - A fairly small parallel batch copy/sync/compare utility
  - A set of other small parallel batch utilities for management
  - A moderately sized library that both FUSE and the batch utilities call
- Data movement scales just like many scalable object systems
- Metadata scales like N×M POSIX namespaces, both across the tree and within a single directory
- Friendly to object systems by:
  - Spreading very large files across many objects
  - Packing many small files into one large data object
To run MarFS you need:

- Linux system(s) with C/C++ and FUSE support
- MPI for parallel communication in Pftool (a parallel data transfer tool)
  - The MPI library can use many communication methods, such as TCP/IP, InfiniBand OFED, etc.
Design goals include:

- Lazy data and metadata quotas, per user per namespace
- Wide parallelism for data and metadata
- Trying hard not to walk trees for management (using inode scans, etc.)
- A trash mechanism for user recovery
Deployment notes:

- If you use MarFS to combine multiple POSIX file systems into one mount point, any set of POSIX file systems can be used.
- For multi-node parallelism, the metadata file systems must be globally visible somehow.
- When using an object store as the data repo, the object store must be globally visible.
- The MarFS metadata file systems must support POSIX xattrs and sparse files.
- You don't have to use GPFS; we use it for its ILM inode-scan features.