Note that the schedule is tentative; please check back regularly for changes.

      Week 1   Week 2   Week 3   Week 4   Week 5   Week 6   Week 7   Week 8
Mon            1/22     1/29     2/5      2/12     2/19     2/26     3/4
Wed   1/17     1/24     1/31     2/7      2/14     2/21     2/28     3/6

      Week 9   Week 10  Week 11  Week 12  Week 13  Week 14  Week 15  Week 16
Mon   3/11     3/18     3/25     4/1      4/8      4/15     4/22     4/29
Wed   3/13     3/20     3/27     4/3      4/10     4/17     4/24

Introduction

Wed 1/17
Course Overview slides

Mon 1/22
Introduction to Cloud slides


Fundamentals in Distributed and Cloud Systems

Wed 1/24
Distributed System Foundation slides

Deadline for Lab0: 1/26

Mon 1/29
Example: MapReduce slides

Wed 1/31
RPC slides

Mon 2/5
Transaction slides

Wed 2/7
Transaction (Contd.) slides

Mon 2/12
Time and Coordination slides

Wed 2/14
Agreement slides

Deadline for Lab1: 2/18

Mon 2/19
Two-phase Commit slides

Wed 2/21

Guest talk: Managing Cloud Health with AIOps, Cong Chen, Microsoft Azure slides

Cong Chen is a Principal Data Scientist Manager in the Azure Edge and Platform group at Microsoft. He owns model development and quality improvement for several AIOps solutions. Previously, he was responsible for the detection, diagnosis, and mitigation of resource leaks on Azure host servers. In this talk, he will share his experience and vision for how AIOps is transforming the way we manage the health of the ever-growing Azure cloud. He will demonstrate how automation and intelligence are crucial for achieving high availability and premier performance, with BRAIN and Gandalf as examples.

Mon 2/26
Consensus slides

Wed 2/28
Consensus (Contd.) slides

Mon 3/4

Spring recess (No Class)

Wed 3/6

Spring recess (No Class)

Mon 3/11

Hacker Day (No Class)
Deadline for Lab2a: 3/12

Wed 3/13
Midterm Review slides

Mon 3/18

Midterm exam: selectively covers topics included in "Introduction" and "Fundamentals in Distributed and Cloud Systems".

Wed 3/20
Isolation and Consistency slides


Real-world Cloud

Mon 3/25

Guest talk: Block Store over the Cloud, Erci Xu, Alibaba Cloud

Erci Xu is a research scientist at Alibaba Cloud Storage, where he focuses on developing distributed storage systems and improving both software and hardware reliability. He has authored multiple papers in top conferences such as USENIX OSDI, FAST, ATC, and ACM EuroSys. He is the recipient of two USENIX FAST Best Paper Awards (FAST’23 and FAST’24) and the 2023 ACM SIGOPS China Rising Star Award.

In this guest lecture, we will study a typical cloud service, Elastic Block Store (EBS). Specifically, we go through the design choices, production experience, and lessons from building EBS at Alibaba Cloud over the past decade. To cope with hardware advancement and users’ demands, we shifted our focus from design simplicity in EBS1 to high performance and space efficiency in EBS2, and finally to reducing network traffic amplification in EBS3. We will also cover other interesting topics along the way, including maintaining high availability, adopting hardware offloading, and the pros and cons of alternative solutions.

Wed 3/27
Case study: Google File System slides

Prepare: Read GFS paper
Optional: How to Read an Engineering Research Paper

Mon 4/1
Case study: ZooKeeper slides
Prepare: Read ZooKeeper paper

Wed 4/3
Lab Day I: Play with ZooKeeper guide

Mon 4/8
Cloud Infrastructure in Industry slides

Deadline for Lab2b: 4/8

Wed 4/10
Lab Day II: Hack ZooKeeper guide


Special Topics in Cloud Computing

Mon 4/15
Virtualization slides

Wed 4/17

Hacker Day (No Class)

Mon 4/22
Machine Learning Systems slides

Wed 4/24
Reliability slides TLA+ code TLA+ cfg

Deadline for Lab2c: 4/29

Mon 4/29
Final Review slides

Mon 5/6
2-5pm, Olsson Hall 011

Final exam: selectively covers topics included in "Fundamentals in Distributed and Cloud Systems", "Real-world Cloud", and "Special Topics in Cloud Computing".