Internet_Scale_Computing_1a Giant Scale Services

You are deploying a large-scale machine learning model for inference in a cloud data center. The model is 960 GB in size and can be broken down into 8 GB chunks that must be executed in a pipelined manner. Each chunk takes 0.8 ms to process. Each available machine has 8 GB of RAM. You are required to serve 600,000 queries per second. Assume perfect overlap of compute and communication, and no additional intermediate memory usage during execution. What is the minimum number of machines required to support this throughput? You may assume pipelined execution of the chunks.
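One way to set up the arithmetic is sketched below. It rests on assumptions the problem does not state outright: each machine holds exactly one 8 GB chunk (so one pipeline needs one machine per chunk), each pipeline stage handles one query at a time with no batching, and throughput is scaled by running identical copies of the whole pipeline side by side. The function name `min_machines` and its parameters are made up for illustration.

```python
def min_machines(model_gb, chunk_gb, stage_us, target_qps):
    """Estimate machines needed, assuming one chunk per machine and
    one query in flight per pipeline stage (no batching)."""
    # Number of pipeline stages = number of chunks (ceiling division).
    stages = -(-model_gb // chunk_gb)

    # A full pipeline completes one query every stage_us microseconds,
    # so one pipeline sustains 1_000_000 / stage_us queries per second.
    # Replicas needed = ceil(target_qps / per_pipeline_qps), kept in
    # integer arithmetic to avoid floating-point rounding.
    replicas = -(-(target_qps * stage_us) // 1_000_000)

    return stages, replicas, stages * replicas


# Values from the problem: 960 GB model, 8 GB chunks,
# 0.8 ms (800 microseconds) per chunk, 600,000 queries per second.
stages, replicas, total = min_machines(960, 8, 800, 600_000)
print(stages, replicas, total)
```

Under these assumptions a single pipeline has 120 stages and sustains 1,250 queries per second, so 480 replicas are needed, for 120 × 480 = 57,600 machines. If the intended model allows batching or sharing a machine between chunks, the count would differ.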