Second International Symposium on Checkpointing for Supercomputing (SuperCheck-SC21)
The Second International Symposium on Checkpointing for Supercomputing will be held on November 15, 2021 in St. Louis, USA, in conjunction with SC21: The International Conference for High Performance Computing, Networking, Storage and Analysis. This workshop will feature the latest work in checkpoint/restart research, tools development, and production use.
As a primary approach to fault-tolerant computing, Checkpoint/Restart (C/R) is essential to a wide range of high performance computing (HPC) communities. While there has been much C/R research and tools development, continued C/R research is indispensable to keep pace with ever-changing HPC architectures, technologies, and workloads. More effort is also needed to narrow the gap between proof-of-concept C/R research codes and production-quality codes capable of deployment in real-world workloads. In this workshop, we will bring together C/R researchers and tools developers, practitioners, application developers, and end users to focus on C/R research and successes in production use, motivating the development of usable C/R tools, the closing of the gap between state-of-the-art research and production, and the harnessing of the full benefits of C/R for the HPC community. Paper submissions will be peer-reviewed, and a venue for accepted papers will be identified. We especially encourage PhD students and HPC end users to participate.
The workshop scope covers all aspects of checkpointing for science and engineering in the High Performance Computing (HPC) context, including the latest research results and development, deployment, and application experiences. Topics include, but are not limited to:
C/R research and tools development:
C/R targeting the full range of supercomputing software, including MPI, OpenMP, GPGPU software, FPGAs, cloud, container, and serverless applications, etc.
Both pure and hybrid approaches to transparent checkpointing (some examples of hybrid approaches are: application-specific plugins to aid in checkpointing; and integrated modules for transparent checkpointing as part of larger scientific/engineering toolkits)
Frameworks for multi-level checkpointing
The development of new methods for low-overhead checkpointing, newer fundamental algorithms, software development methods, the impact of future supercomputer hardware, performance evaluation, reproducibility, and fault recovery
Research on optimal checkpointing intervals, C/R-aware job scheduling, and resource management
C/R use in production (including all levels of checkpointing: application, job, and system levels):
The adoption of transparent C/R tools in production workloads (C/R use cases)
The application-initiated use of C/R tools (alternative to built-in internal checkpointing)
C/R applications and support on HPC systems (e.g., resource scheduling, system utilization, batch system integration, best practice, etc.)
The workshop has two paper submission tracks: research and production. For the production track, we broaden the definition of novelty to include work that incorporates novel research results into practice with real-life impact.
We invite authors to submit their original, high-quality work in the following categories:
(a) Regular papers:
Intended for submissions describing original work and ideas that have NOT appeared in another conference or journal, and are NOT currently under review for any other conference or journal. Regular papers may be submitted to both the research and production tracks. Regular paper submissions must be at least six (6) pages and must not exceed eight (8) pages in the IEEE format. The page limit will be increased to 10 for accepted submissions.
Accepted regular papers (subject to post-review revisions) will be published in the workshop proceedings in cooperation with IEEE TCHPC (pending acceptance).
(b) Short papers:
Intended for material that is not mature enough for a full paper, allowing authors to present novel, interesting ideas or preliminary results that will be formally submitted elsewhere later. Short papers are also for authors sharing their new efforts on adopting C/R tools in production use. Short paper submissions must not exceed two (2) pages in the IEEE format. The page limit will be increased to 3 for accepted submissions.
Accepted short papers will NOT be included in the workshop proceedings published with IEEE TCHPC (conditional approval); instead, they will be published on arXiv. We will provide links to those arXiv papers on our workshop website, as we did for our previous workshop.
Note that the page limit above includes figures and tables, but does not include references, for which there is no page limit.
All submissions should be made electronically through the SC21 submission website and must follow the IEEE format.
Submissions must be double blind, i.e., authors should remove their names, institutions, and any identifying hints in references to earlier work. When discussing their own past work, authors must refer to themselves in the third person, as if they were discussing another researcher’s work. Furthermore, authors can identify any conflict of interest with the PC members (reviewers) at the SC21 submission site after their papers are submitted (using the “My Conflicts” tab).
While an Artifact Description (AD) Appendix and the Artifact Evaluation (AE) are optional, we encourage authors to follow the SC21 reproducibility and transparency initiative. The SC21 details can be found at: https://sc21.supercomputing.org/submit/reproducibility-initiative.
Call for Participation Released: June 14, 2021
Paper Submission Due: September 20, 2021 AOE (originally September 13, 2021 AOE)
Acceptance Notification: October 1, 2021 AOE
Workshop Ready Submission Due: October 7, 2021 AOE
Presentation slides and recordings: November 1, 2021
Workshop @SC21: November 15, 2021 (full day)
Camera Ready Deadline: December 10, 2021 AOE
This is a partial list of the PC members who have been confirmed so far.
Gheorghe Almási, Thomas J. Watson Research Center
Kapil Arya, Microsoft Research
Leonardo Bautista-Gomez, Barcelona Supercomputing Center
Franck Cappello, Argonne National Laboratory
Rohan Garg, Nutanix Corp
Amina Guermouche, University of Tennessee, Knoxville
Twinkle Jain, Northeastern University
Zbigniew T Kalbarczyk, University of Illinois at Urbana-Champaign
Preeti Malakar, IIT Kanpur
Yue Li, MemVerge Inc
Rafael Mayo-García, CIEMAT, Madrid
Bogdan Nicolae, Argonne National Laboratory
Sarp Oral, Oak Ridge National Laboratory
Dhabaleswar K. (DK) Panda, Ohio State University
Yves Robert, ENS Lyon
Kento Sato, RIKEN Center for Computational Science
Martin Schulz, Technical University of Munich
Tony Skjellum, University of Tennessee at Chattanooga
Osman Unsal, Barcelona Supercomputing Center, Spain
Amelie Chi Zhou, Shenzhen University
Zhengji Zhao, email@example.com
Workshop Website: https://supercheck.lbl.gov
SC21 Workshop Session: https://sc21.supercomputing.org/presentation/?id=wksp139&sess=sess139