What's TinyURL?
TinyURL (opens in a new tab) is a URL shortening service that allows users to convert long URLs into short URLs. Similar services include bit.ly, goo.gl and t.co.
Why do we need TinyURL?
URL shortening is used to create shorter aliases for long URLs. We call these shortened aliases “short links.” Users are redirected to the original URL when they hit these short links. Short links save a lot of space when displayed, printed, messaged, or tweeted. Additionally, users are less likely to mistype shorter URLs.
For example, if we shorten the following URL through TinyURL:
https://interview.ligangyan.com/system-design/problems/tinyurl
We would get:
https://tinyurl.com/vzet59pa
The shortened URL is nearly one-third the size of the actual URL.
URL shortening is used to optimize links across devices, track individual links to analyze audience, measure ad campaigns' performance, or hide affiliated original URLs.
If you haven’t used tinyurl.com before, please try creating a new shortened URL and spend some time going through the various options their service offers. This will help you a lot in understanding this chapter.
Requirements
Start your interview by defining the functional and non-functional requirements. For user facing applications like this one, functional requirements are the "Users should be able to..." statements whereas non-functional defines the system qualities via "The system should..." statements.
Prioritize the top 3 functional requirements. Everything else shows your product thinking, but clearly note it as "below the line" so the interviewer knows you won't be including them in your design. Check in to see if your interviewer wants to move anything above the line or move anything down. Choosing just the top 3 is important to ensuring you stay focused and can execute in the limited time window.
When evaluating non-functional requirements, it's crucial to uncover the distinct characteristics that make this system challenging or unique. To help you identify these, consider the following questions: CAP theorem: Does this system prioritize availability or consistency? Note, that in some cases, the answer is different depending on the part of the system -- as you'll see is the case here. Read vs write ratio: is this a read heavy system or write heavy? Are either notably heavy for any reason? Query access pattern: Is the access pattern of the system regular or are their patterns or bursts that require particular attention. For example, the holidays for shopping sites or popular events for ticket booking.
1. Functional Requirements
Core Requirements:
- The shorter URL should be unique and not guessable.
- A user should have the option of picking a custom shorter URL.
- URLs will expire either after a user specified timespan, or if one is not given, a default time.
- Redirection should happen in real-time with minimal latency.
2. Non-Functional Requirements
Core Requirements:
-
The system should handle a large number of requests efficiently, providing fast redirections with minimal latency and scaling horizontally to accommodate increased load.
-
Security and Reliability: The service should ensure secure HTTPS redirection, prevent brute-force attacks, and maintain high reliability with minimal downtime, aiming for 99.99% uptime.
-
Usability and Maintainability:The system should be user-friendly with clear feedback, easy URL management, and a maintainable codebase that supports future updates and enhancements.
3. Capacity estimation
We now have our access patterns for the URLs. The next thing we look at is bandwidth and storage requirements.
We have 10 million URLs being created a month, which means around 4 creations a second. If we assume URL access is uniform then we will have 40 accesses a second. The one thing we don’t know yet is the size of a request. Let’s call this x
bytes for the time being.
This means our bandwidth requirement for creation would be 4x
bytes, and 40x
bytes for reading.
In terms of storage, let’s take a worst case scenario. If we create 10 million URLs a month for a year we will have 120 million URLs. Therefore our storage requirements will be 120,000,000x
bytes.
Core Entities Design
根据我们的Delivery Framework,我们收集完成requirements后,我们可以开始核心实体的设计。在这个阶段,我们没有必要一下子就列出每个实体的所有细节,但是我们至少要根据我们的需求,将必要的实体都列举出来,随着我们设计流程的推进,我们可以再逐渐补充列细节,或者更多实体类。
TinyUrl的系统设计,目前我们只需要设计一个实体类就可以,那就是URL。我们可以定义一个URL类,包含以下属性:
- originalUrl
- shortUrl
- shortKeyl
- expiryAt
- createAt
System interface Design
Our system has two main functionalities: creating and retrieving URLs. It’s reasonable to assume we will use a REST format, with a POST request to create a URL, and a GET request to retrieve a URL for redirection.
Some terminology before we dig in. Thinking back to the introduction, we will be providing shortened URLs of the form https://www.shorturl.com/{key}
, where the key is a random string. In the rest of our design we will not think about this fully qualified URL, instead worrying only about the key.
We have used Java Spring Boot (opens in a new tab) to define the endpoint for creating a new URL.
@PostMapping("/url")
public ResponseEntity<Url> createUrl(
@RequestBody Url shortUrl
) {
...
}
Where the Url
object resembles the below.
{
"longUrl": "<Value of URL to be shortened>",
"key": "<Optional, requested path of short URL>",
"expiryTime": "<Optional, requested expiry time in seconds>"
}
Retrieving a URL can then be done via
@GetMapping("/{key}")
public RedirectView getUrl(
@PathVariable(**"**key**"**) String key
) {
...
}
Our response codes could then be:
302
: We have found the long URL relating to the input path, and will be redirecting to it. All this does is force the browser to make a secondGET
to the new URL. Note, we need to include the (opens in a new tab)[location](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/302)
header (opens in a new tab) with the original URL.404
: This shortened URL does not exist.
预估 假设读写比例200:1, read/write ratio 100million 每月写入 100,000,000/30/24/3600=40urls/s 每秒写入40 40*200 = 8000每秒读
shortening algorithm For shortening a url we can following two solutions
- url encoding
- base62. base62 就是0-9,a-z,A-Z 10+26+26=62
- md5
- key generation service
5位base62 = 62^5 = 916m 6 -> 62^6 = 56b 7->62^7=3500b
technique 1 - short url from random numbers 我们可以从字母表中随机取7次,生成一个字符串,然后取数据库检查是否存在,如何存在的话,就重试以上过程。
private static final int NUM_CHARS_SHORT_LINK = 7;
private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";private Random random = new Random();public String generateRandomShortUrl() {
char[] result = new char[NUM_CHARS_SHORT_LINK];while (true) {
for (int i = 0; i < NUM_CHARS_SHORT_LINK; i++) {
int randomIndex = random.nextInt(ALPHABET.length() - 1);
result[i] = ALPHABET.charAt(randomIndex);
} String shortLink = new String(result); // make sure the short link isn't already used
if (!DB.checkShortLinkExists(shortLink)) {
return shortLink;;
}
}
}
Technique 2- 基于转换 Technique 3- MD5 Technique 4-KGS