Understanding the Problem
Let's assume we have a very large stream of views on YouTube (our stream is a firehose of VideoIDs). At any given moment we'd like to be able to query, precisely, the top K most viewed videos for a given time period (say 1 hour, 1 day, 1 month, all time) together with their counts.
The word precisely here has dramatic implications for this problem. A precision requirement is a bit unrealistic for most applications, but an approximate solution leans heavily on probabilistic/approximate data structures which, for many roles, rely on esoteric knowledge of e.g. the count-min sketch. Except in specialist situations, most reasonable interviewers are going to test general knowledge rather than something you'd learn on the job after reading some papers.
Our interviewer might give us some quantities to help us understand the scale of the system: YouTube Shorts had 70 billion views per day, and approximately 1 hour of YouTube content is uploaded every second. Big!
Functional Requirements
Core Requirements
- Clients should be able to query the top K videos (max 1,000) for a given time period.
- Time periods should be limited to 1 hour, 1 day, 1 month, and all-time.
Below the line (out of scope):
- Arbitrary time periods.
- Arbitrary starting/ending points (we'll assume all queries are looking back from the current moment).
Non-Functional Requirements
Core Requirements
- We'll tolerate at most 1 minute of delay between when a view occurs and when it should be tabulated.
- Our results must be precise, so we should not approximate. (Note: this would be unusual for most production systems.)
- Our system should be able to handle a massive number (TBD - we'll cover this later) of views per second.
- We should support a massive number (TBD - we'll cover this later) of videos.
- We should return results within tens of milliseconds.
- Our system should be economical. We shouldn't need a 10k-host fleet to solve this problem.
Having quantities on your non-functional requirements will help you make decisions during your design. Note here that a system returning within tens of milliseconds eliminates many candidates from the solution space - we'll need to favor precomputation.
For quantities that are important to our design but that we can't estimate offhand, we'll reserve some time to do so.
Here's how it might look on your whiteboard:
Requirements
Scale Estimation
We've earmarked two quantities important to our design: (a) the number of views per second, and (b) the total number of videos. The first will help us understand the overall throughput of the system while the second is important for bounding the storage we'll need.
First let's look at throughput:
70B views/day / (100k seconds/day) = 700k tps
Woo, that's a lot. We're definitely going to need to look for ways to shard this across many different hosts.
Now, let's talk storage. First we need the number of videos:
Videos/Day = 1 hour content/second / (6 minutes content/video) * (100k seconds/day) = 1M videos/day
Total Videos = 1M videos/day * 365 days/year * 10 years = 3.6B videos
With that let's estimate how big a naive table of IDs and counts would be:
Naive Storage = 4B videos * (8 bytes/ID + 8 bytes/count) = 64 GB
Ok, probably something we can keep in memory if we're clever, especially if we use a number of hosts.
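The back-of-envelope math above can be sanity-checked in a few lines. This is just a sketch of the same arithmetic, using the same rounding the text does (100k seconds/day, rounding 3.65B videos up to 4B for storage):

```python
# Back-of-envelope check of the estimates above. Figures are rounded
# the same way as in the text, not exact.

SECONDS_PER_DAY = 100_000                        # ~86,400, rounded for easy math

views_per_day = 70e9
tps = views_per_day / SECONDS_PER_DAY            # views per second

videos_per_day = (60 / 6) * SECONDS_PER_DAY      # 1 hr of content/sec at ~6 min/video
total_videos = videos_per_day * 365 * 10         # ~10 years of uploads

bytes_per_entry = 8 + 8                          # 8-byte ID + 8-byte count
naive_storage_gb = 4e9 * bytes_per_entry / 1e9   # round up to 4B videos

print(f"{tps:,.0f} tps, {total_videos / 1e9:.2f}B videos, {naive_storage_gb:.0f} GB")
```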
The goal of estimations isn't actually to see if you can do mental math (so don't be afraid of fudging the numbers). Your reflections on what the quantities imply for your design, like which architectures are on or off the table as a result, are often the most important factor.
The Set Up
Planning the Approach
Based on our requirements, we know we're going to make some observations for our interviewer:
Sharing your thought process is a great way to demonstrate seniority by showing you're thinking a step ahead.
- First, we need to index data from a very high-volume stream. Most quantities will need to be precomputed in order to meet the latency requirements.
- Next, problems like this typically have bottlenecks hidden behind bottlenecks: solving one problem creates (at least) one more. So we'll aim to solve the simplest problem first, then add complexity as we go.
- Finally, we'll note that the sliding time window adds more challenge. So we'll start with all-time and then try to figure out the rest.
Our rough plan is thus:
- Generate a basic (but not scalable) solution to the all-time top K problem.
- Solve the primary issues of our basic solution.
- Add a solution for the time period inputs.
- Deep dive remaining bottlenecks until we run out of time.
Defining the Core Entities
In our problem, we have some basic entities we're going to work with to build our API:
- Video
- View
- Time Window
From a conceptual perspective this problem is straightforward so we're not going to spend any more time here. We might even skip this section to save time.
API or System Interface
Our API guides the rest of the interview, but in this case it's really basic too! We simply need an API to retrieve the top K videos.
GET /views/top?window={WINDOW}&k={K}

Response:
{
  "videos": [
    {
      "videoId": // ...
      "views": // ...
    }
  ]
}
We're not going to dawdle here and keep moving on to the meat of the interview.
Especially for more senior candidates, it's important to focus your efforts on the "interesting" aspects of the interview. Spending too much time on obvious elements deprives you of time for the more interesting parts of the interview and signals to the interviewer that you may not be able to distinguish complex pieces from trivial ones: a critical skill for senior engineers.
High-Level Design
1) A Basic Solution for All-Time
Let's start with a simple solution for all-time top K videos which we'll build on a gigantic single host, then we can start to whittle away at optimization.
Especially for infrastructure-style interviews, I highly recommend reasoning about a solution first from the lens of a single host. Oftentimes the path to scale is straightforward from there. On the other hand, if you solve scale first without thinking about how the actual mechanics of your solution work underneath, you're likely to back yourself into a corner.
We can do this by maintaining a table of video IDs and counts. This gets us an up-to-date count of every video, but iterating over all 4B keys to find the largest values is untenable, so we'll keep a heap of the top K videos which we can update with each increment. The vast majority of views will never touch this heap since they'll be below the threshold of the top 1000 (the max K we established in our functional requirements).
Basic Solution
The basic function is this: when a request comes in, we atomically increment the counter in the hash table with the incoming ID. We retrieve the updated count and test it against the floor of our heap. If the count is higher than our floor (i.e. the video belongs in the top 1,000) we update/insert it into the heap and heapify. Our clients query directly from that heap to retrieve the top K videos.
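The increment-then-test flow above can be sketched in a few lines. This is a minimal single-host illustration, not the article's implementation; the names (`TopKCounter`, `record_view`) are made up, and the "update an entry already in the heap" case is handled with a lazy O(k) re-heapify for simplicity:

```python
import heapq

class TopKCounter:
    """Single-host sketch: a count table plus a min-heap of the top K."""

    def __init__(self, k: int):
        self.k = k
        self.counts: dict[str, int] = {}
        self.heap: list[tuple[int, str]] = []  # min-heap of (count, video_id)
        self.in_heap: set[str] = set()

    def record_view(self, video_id: str) -> None:
        # Atomically increment (trivial on one host, one thread).
        count = self.counts[video_id] = self.counts.get(video_id, 0) + 1
        if video_id in self.in_heap:
            # Entry already in the heap is now stale: refresh counts and re-heapify.
            self.heap = [(self.counts[v], v) for _, v in self.heap]
            heapq.heapify(self.heap)
        elif len(self.heap) < self.k:
            heapq.heappush(self.heap, (count, video_id))
            self.in_heap.add(video_id)
        elif count > self.heap[0][0]:
            # Beats the heap's floor: evict the smallest, insert this video.
            _, evicted = heapq.heapreplace(self.heap, (count, video_id))
            self.in_heap.discard(evicted)
            self.in_heap.add(video_id)
        # Otherwise: below the floor, the common case - the heap is untouched.

    def top_k(self) -> list[tuple[str, int]]:
        return [(v, c) for c, v in sorted(self.heap, reverse=True)]
```

Note how most views hit only the final "below the floor" branch, which is exactly why the heap stays cheap to maintain.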
This is really simple and fast because we run it on a single host. And while this is possible conceptually, in memory, on a single host, we wouldn't want to do that. First, because the throughput we can support is likely more than an order of magnitude shy of the 700k TPS we need and secondly because that host becomes a single point of failure. What to do here?
2) Primary Issues of the Basic Solution
We have two issues we need to address: how to maintain reliability in the presence of failures and how to scale the write throughput of the system. Let's talk about them in order.
In order for our system to be reliable, we need to be able to gracefully handle node failures. In our single-node system we're going to be in the job search again if that host fails. No good.
Bad Solution: Write out to a database
Good Solution: Replication
Great Solution: Replicas with Snapshots
Ok, with some replicas and snapshots we're in a much more fault-tolerant state. Next, we need to scale the write throughput of our system as our replicas don't solve for the problem of having a massive firehose of incoming data. Your mind should immediately go to sharding/partitioning here.
Bad Solution: Fixed partitioning by ID
Good Solution: Elastic partitioning
Ok cool, now we have a basic in-memory solution which is both fault-tolerant and (somewhat) scalable. But we haven't solved all our functional requirements yet. On to those pesky time windows.
Potential Deep Dives
1) Handling Time Windows
While our "All-Time" solution conveniently can aggregate views forever, to handle time windows we need to age out views that happened outside that window. As an example, if a video got a single view at time T=0 and our time window is 1 hour, then by T=2 hours we need to make sure that video has a count of 0.
One advantage we have is that the time windows we're working with are fixed and small: we only have 3. One disadvantage is that they span very different granularities: from 1 hour to 1 month.
This is complicated, so our best strategy is to start with something basic and probably bad, then use it as inspiration to try to come up with alternative solutions.
Some candidates are so afraid of saying something wrong that they get stuck in a spot where they can't iterate to an optimal answer. The trick is communication! Let your interviewer know you understand the complexity and that you're going to start with something poor then iterate toward something good.
This avoids two common failure modes: (1) getting stuck trying to jump two steps, and (2) having your interviewer hop in to correct you prematurely because they think your initial bad idea is ... well ... bad.
Your interviewer is going to be looking for how you can think through this problem, not (necessarily) that you get the optimal answer. Identifying pinch points, solving them, and not getting stuck is critical. But if you can think of the best solution go for it!
Bad Solution: Naive micro-buckets
Good Solution: Heap expirations
Great Solution: Use two pointers
2) Large number of incoming requests
So far we've been talking about how to handle a lot of views/writes, but what about reads? Given we have 1 minute between when a view happens and when it needs to be tabulated, the most natural solution is to add a cache. We can put a 1 minute TTL on the cache so results are never more stale than our requirement. When a request comes in, we either serve it from the cache, or we query all Counters for the window of the request and store the merged values back in the cache.
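The read path above is a standard cache-aside pattern with a TTL matching our staleness budget. A minimal sketch, assuming a hypothetical `fetch_from_counters` callable standing in for the scatter-gather query to the Counter shards:

```python
import time

TTL_SECONDS = 60  # matches our 1-minute staleness requirement

# (window, k) -> (timestamp when cached, result)
_cache: dict[tuple[str, int], tuple[float, list]] = {}

def top_k(window: str, k: int, fetch_from_counters) -> list:
    """Serve from cache if fresh; otherwise query the Counters and repopulate.

    fetch_from_counters(window, k) is a hypothetical stand-in for merging
    results across all Counter shards.
    """
    key = (window, k)
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                        # fresh enough: no shard fan-out
    result = fetch_from_counters(window, k)  # scatter-gather + merge
    _cache[key] = (time.time(), result)      # repopulate with current timestamp
    return result
```

Since K is capped at 1,000 and the windows are fixed, the cache key space is tiny, so nearly all reads within any given minute are served without touching the Counters at all.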
Full design with cache