Video streaming platforms are among the most challenging web apps to develop. You can find plenty of resources that talk about designing the backend (nearly all the major resources cover the backend only), but hardly any that talk about designing the frontend system.
Real-life examples:
From your favorite stand-up comedy to your favorite anime to your favorite sport, we can binge-watch it all, on demand or live, from the comfort of any device, thanks to video streaming platforms.
Video streaming platforms are engineering marvels where the backend and frontend are equally difficult to design and develop as we have to deal with multiple complexities such as video encoding, adaptive streaming, content delivery networks, user interface design, and more.
In designing the frontend system of a video streaming platform like YouTube, it is crucial to prioritize a seamless user experience with intuitive navigation, responsive layouts, and fast loading times. Additionally, incorporating features like personalized recommendations, user-generated content interactions, and social sharing functionalities can enhance user engagement and retention on the platform.
This topic is too vast to cover completely in one article. Instead, we will cover only the important things that are at the core of the frontend, or the web app, of any video streaming platform.
Create a video streaming platform like YouTube or Amazon Prime where the user can browse through the recommendations, search for or query the videos, and play them.
A video streaming platform can be split into two different parts, each of which could be tackled independently.
We will first design the website's skeleton, and then we will discuss the design of the video player. This will help us cover all the features.
A video streaming website relies heavily on SEO for growth and business, so the rendering strategy we choose should provide optimal SEO results. At the same time, video streaming websites are also very performance-intensive, so we have to keep them fast.
We can use any JavaScript framework that provides incremental static regeneration, like Next.js or Nuxt.js, for developing the web app.
YouTube uses server-side rendering with the 3-tier rendering strategy of "as little as possible, as early as possible," popularized by Facebook. Delivering only what we require is important, and we should aim to have the resources arrive just in time.
It splits the rendering into three parts:
In the first part, YouTube renders the skeleton UI on the server side, which loads the page extremely fast with all the necessary SEO metadata while adhering to the Core Web Vitals.
In the second part, it hydrates the skeleton UI with the video metadata, like the text and the thumbnail, which is included in a JavaScript bundle.
At this stage, hovering over a video does not show its duration, as the card is still just an image placeholder.
After that, the playback JavaScript bundle is loaded in parallel. It handles all user actions and hydrates the video cards with the video details so that they can be played on hover.
You can verify this by loading YouTube.com with slow 3G network throttling.
Netflix, in contrast, renders the HTML page with all the images and video metadata populated.
You will notice that on the slow network, the video thumbnails are still there, but once JavaScript is loaded and parsed, the images will scale on hover and the video will be autoplayed.
Examining both popular websites, it is clear that we can use server-side rendering along with progressive and selective hydration to get optimal results. Images can be pre-loaded so that they can be visible as early as possible.
What you hydrate and how you hydrate it is subjective and comes out of various experiments and a lot of research within each organization.
This will provide us with the fast load time of client-side rendering and better SEO because of server-side rendering, providing a great experience to the users.
We are going to rely heavily on client-side caching for this type of application.
On the homepage of Netflix, except for the recommendation list and the new arrivals that can be updated frequently, all the genre lists can be cached for a given amount of time to boost performance.
Similarly, the preferences, the theme, and the settings of the website can be cached to avoid round-trips to the server.
Thus, we can use React-Query for query caching and Zustand or Redux for state management.
We can perform state management using the unidirectional flow of the Flux architecture, which Zustand and Redux both use.
Zustand provides a more native experience with hooks, making it easier to work with. Redux with Redux Toolkit can achieve the same, but the state could become complex to scale.
The video list and recommendations can be in the shared state, while the video playback data will be in the local state as they will be specific to each video.
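The split between shared and local state can be sketched with a tiny store that follows a unidirectional flow. This is plain JavaScript written for illustration, not the API of Zustand or Redux; `createStore`, the reducer, and the action names are all hypothetical.

```javascript
// Minimal Flux-style store: state changes only through dispatched actions.
// createStore, sharedReducer and the action names are illustrative assumptions.
function createStore(reducer, initialState) {
  let state = initialState;
  const listeners = new Set();
  return {
    getState: () => state,
    dispatch(action) {
      state = reducer(state, action);
      listeners.forEach((listener) => listener(state));
    },
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener); // call to unsubscribe
    },
  };
}

// Shared state: data that every part of the app may read.
function sharedReducer(state, action) {
  switch (action.type) {
    case "SET_RECOMMENDATIONS":
      return { ...state, recommendations: action.videoIds };
    case "SET_CAPTIONS":
      return { ...state, captionsEnabled: action.value };
    default:
      return state;
  }
}

const sharedStore = createStore(sharedReducer, {
  recommendations: [],
  captionsEnabled: false,
});

// Local state: playback data that belongs to a single player instance.
function createPlaybackState() {
  return { currentTime: 0, playbackRate: 1 };
}

const notifications = [];
const unsubscribe = sharedStore.subscribe((s) => notifications.push(s.captionsEnabled));
sharedStore.dispatch({ type: "SET_CAPTIONS", value: true });
unsubscribe();
```

Every change goes through `dispatch`, so subscribers always see state updates in one direction, while each player keeps its own object from `createPlaybackState`.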
Given that all major video streaming platforms follow a design system for consistency and to provide the same user experience across different platforms, we can adopt the same.
All the components can be styled using styled-components, which helps to write CSS in JavaScript, making it easier to extend or mutate the styles of the existing components.
But you can also choose to have a separate style sheet for your application.
Because we have to create a responsive application, we can follow the approach of progressive enhancement while styling, which enforces mobile-first development and progressively updating the styles for different screen sizes. This results in less CSS.
If you opt for the desktop-first approach instead, you rely on graceful degradation of the styles, in which you have to override the styles for smaller screen sizes.
Learn about 3 different ways to write CSS in React.
It is important to focus on accessibility and internationalization during component creation and styling so that each component is designed to provide the best user experience and support different languages and typographies.
Accessibility refers to the practice of making web content and applications usable for people with disabilities, ensuring that all users can access and navigate through the components easily. This includes considerations such as providing alternative text for images, using proper semantic markup, and ensuring keyboard accessibility.
Internationalization, on the other hand, involves adapting the components to different languages and cultural preferences, allowing users from various regions to understand and interact with the content effectively. By incorporating accessibility and internationalization into component creation and styling, developers can create inclusive components.
The website should be accessible to all the different types of users with disabilities through keyboard, mouse, touch devices, and screen readers.
Similarly, users from different demographics and languages should also be supported. All types of scripts and fonts should be handled (Japanese is among the most challenging, as it can be written from top to bottom). We should also be able to support languages that are written from right to left, like Hebrew, Arabic, and Urdu.
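As a small sketch of the right-to-left support mentioned here, a helper can map a locale to the value of the HTML `dir` attribute. The language list is a deliberately short, non-exhaustive assumption that only covers the languages named above, and `textDirection` is a hypothetical helper name.

```javascript
// Map a locale such as "ar-EG" to the value of the HTML dir attribute.
// RTL_LANGUAGES is a non-exhaustive assumption for this sketch.
const RTL_LANGUAGES = new Set(["ar", "he", "ur"]);

function textDirection(locale) {
  const language = locale.toLowerCase().split("-")[0];
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

const arabicDirection = textDirection("ar-EG");
const englishDirection = textDirection("en-US");
```

The returned value would typically be set once on the root element so that every component inherits the direction.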
From the requirements for the website, we can finalize the routes that we are going to need in the application.
To design the current system, we will need only two routes.
The reason why the single video page is taking the video URL as a query parameter on the watch page is that we can also pass additional details in the parameter, like whether the video is subtitled or not, the time from which the video should start playing, etc.
You can also have it as a route parameter, /watch/:video-slug?q="", which is also fine.
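To illustrate how the watch page could read these details from the query string, here is a minimal sketch using the standard `URL` API. The parameter names `v`, `t`, and `cc` are assumptions made for this example, as is the `parseWatchUrl` helper.

```javascript
// Parse a watch-page URL into the options the player needs.
// The parameter names v (video), t (start time) and cc (captions) are assumed.
function parseWatchUrl(href) {
  const params = new URL(href).searchParams;
  const start = Number.parseInt(params.get("t") ?? "0", 10);
  return {
    videoId: params.get("v"),
    startAt: Number.isNaN(start) ? 0 : start, // seconds from the beginning
    captions: params.get("cc") === "1",
  };
}

const watchOptions = parseWatchUrl("https://example.com/watch?v=abc123&t=90&cc=1");
```

Missing or malformed parameters fall back to safe defaults, so a bare `/watch?v=...` link still plays from the start without captions.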
Depending on the platform, we can have public as well as private pages that require the user to be authenticated to access them. For example, on Netflix, we can browse the homepage and see a part of the video or trailer, but we will have to log in for full access.
We can define outlets in React Router v6 to redirect the routes if the user is not authorized.
privateRoutes.js

    import React from "react";
    import { Outlet, Navigate } from "react-router-dom";
    import useStore from "../store/user";

    // Renders the nested private routes only for logged-in users.
    const Private = () => {
      const isLoggedIn = useStore((state) => state.isLoggedIn);
      return isLoggedIn ? <Outlet /> : <Navigate to="/login" />;
    };

    export default Private;

publicRoutes.js

    import React from "react";
    import { Outlet, Navigate } from "react-router-dom";
    import useStore from "../store/user";

    // Renders the nested public routes only for logged-out users.
    const Public = () => {
      const isLoggedIn = useStore((state) => state.isLoggedIn);
      return !isLoggedIn ? <Outlet /> : <Navigate to="/dashboard" />;
    };

    export default Public;

route.js

    import { BrowserRouter, Routes, Route } from "react-router-dom";
    import Login from "./pages/login";
    import Signup from "./pages/signup";
    // Assumed path: NotFound is used below but was never imported.
    import NotFound from "./pages/notFound";
    import PrivateRoutes from "./routes/private";
    import PublicRoutes from "./routes/public";

    const App = () => {
      return (
        <BrowserRouter>
          <Routes>
            <Route path="/" element={<PublicRoutes />}>
              <Route index element={<h1>Browse</h1>} />
              <Route path="login" element={<Login />} />
              <Route path="signup" element={<Signup />} />
            </Route>
            <Route path="/" element={<PrivateRoutes />}>
              <Route path="dashboard" element={<h1>Dashboard</h1>} />
            </Route>
            <Route path="*" element={<NotFound />} />
          </Routes>
        </BrowserRouter>
      );
    };

    export default App;
There can be restrictions within a page as well, prompting the user to log in or sign up to perform certain actions.
Referencing both YouTube and Netflix, or any video streaming website, we can think of the following common components:
We will need a single module to display the list of videos; now this list can be vertical (YouTube) as well as horizontal (YouTube and Netflix).
The card and the module will both be presentational components that will have the local state only for navigation or lazy loading of the details.
The data that has to be displayed in these components will be passed to them as props. This is known as the Hook/View component design pattern, which helps to maintain separation of concerns while making the components reusable.
In both applications, there is a provision to create a family account to onboard friends and family; thus, providing a good authentication and authorization experience is a vital part of the frontend application.
YouTube, being a part of the Google family, allows single sign-on with Gmail credentials along with two-factor authentication.
Later, if you opt for the premium features, you can add limited users by using the email addresses of your friends and family to share it with them.
This means the email address is the primary key for authentication and authorization on YouTube.
Similarly, on Netflix you can log in with an email and password for password-based authentication, or with a phone number and OTP for password-less authentication, and then later provide the email address as a secondary identifier.
Providing quick access to the application is key to user retention, and then, using the email or phone number, the account can be shared with a limited set of users.
After a successful login, either token-based or cookie-based authentication can be used. A token-based approach is well suited for the microservice architecture.
Read more about authentication and authorization best practices.
Data modeling for the video streaming platform is fairly straightforward, as we don't have to store any type of media on the client side. The CDN URL of the media will be provided, which we will have to consume for streaming.
The following can be considered as the architecture of the video streaming platform with the feature set that we are designing:
All the API responses will be cached using query-cache and will persist in local storage. We can also define the stale time for the cache to invalidate them and pull the new data.
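The stale-time idea can be sketched with a small in-memory cache. This is an illustration of the concept only, not the React-Query API: the `QueryCache` class here, its methods, and the key format are hypothetical, and persistence to local storage is left out so the sketch also runs outside a browser.

```javascript
// A tiny cache with a stale time. Entries older than staleTimeMs are
// invalidated so that the caller knows it has to pull fresh data.
class QueryCache {
  constructor(staleTimeMs, now = () => Date.now()) {
    this.staleTimeMs = staleTimeMs;
    this.now = now; // injectable clock, which makes the cache easy to test
    this.entries = new Map();
  }

  set(key, data) {
    this.entries.set(key, { data, storedAt: this.now() });
  }

  // Returns the cached data while it is fresh, otherwise undefined.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.staleTimeMs) {
      this.entries.delete(key); // stale: invalidate the entry
      return undefined;
    }
    return entry.data;
  }
}

// Usage with a fake clock and a stale time of one minute.
let clock = 0;
const genreCache = new QueryCache(60000, () => clock);
genreCache.set("list:comedy", ["video-1", "video-2"]);
clock = 30000;
const freshResult = genreCache.get("list:comedy"); // still cached
clock = 60000;
const staleResult = genreCache.get("list:comedy"); // invalidated
```

A longer stale time suits the genre lists, while frequently updated lists such as recommendations would use a shorter one.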
Things that are specific to a component, like the video's current playtime, playback rate, etc., will be stored in the local state.
Things that have to be shared, like auto-play-next or closed captions being enabled for all videos, should be stored in the shared state.
Having a normalized state will really help to create a robust and scalable frontend application.
Let us explore one of the most important features: the list of videos by recommendation, new arrivals, or genre.
There is no need to load all the data at once. Instead, for each category we can pull twice the number of videos that are visible at a time across the screen width, and all the others can be lazy loaded.
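The size of that first batch can be worked out from the layout. In this sketch, `initialBatchSize` is a hypothetical helper and the container width, card width, and gap are made-up pixel values.

```javascript
// Number of cards to request up front: twice what fits in one row.
// cardWidth and gap are hypothetical layout values in pixels.
function initialBatchSize(containerWidth, cardWidth, gap = 0) {
  // n cards need n * cardWidth + (n - 1) * gap pixels.
  const visible = Math.max(1, Math.floor((containerWidth + gap) / (cardWidth + gap)));
  return visible * 2;
}

const desktopBatch = initialBatchSize(1280, 300, 20); // 4 cards fit, so 8 are fetched
```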
To avoid redundancy and duplication of data, we will follow the byId and allId structures for state normalization.
A single video data schema could be like this:
    {
      id: "xxx-xxx-xxx",
      thumbnail_url: "xxx-xxx-xxx.jpg",
      title: "abcd",
      description: "",
      created_on: "",
      updated_on: "",
      chapters: [],
      pulisher_details: {},
      video_meta: {
        available_resolutions: [],
        playback_speed: [],
        subtitles: [],
        closed_captions: []
      }
    }
And in the normalized state, we can store the video details as IDs.
    const normalized_data = {
      byIds: {
        "video-1": {
          id: "video-1",
          thumbnail_url: "xxx-xxx-xxx.jpg",
          title: "abcd",
          description: "",
          created_on: "",
          updated_on: "",
          chapters: [],
          pulisher_details: {},
          video_meta: {
            available_resolutions: [],
            playback_speed: [],
            subtitles: [],
            closed_captions: []
          }
        },
        "video-2": {
          id: "video-2", // each id matches its key in byIds
          thumbnail_url: "xxx-xxx-xxx.jpg",
          title: "abcd",
          description: "",
          created_on: "",
          updated_on: "",
          chapters: [],
          pulisher_details: {},
          video_meta: {
            available_resolutions: [],
            playback_speed: [],
            subtitles: [],
            closed_captions: []
          }
        },
        "video-3": {
          id: "video-3",
          thumbnail_url: "xxx-xxx-xxx.jpg",
          title: "abcd",
          description: "",
          created_on: "",
          updated_on: "",
          chapters: [],
          pulisher_details: {},
          video_meta: {
            available_resolutions: [],
            playback_speed: [],
            subtitles: [],
            closed_captions: []
          }
        }
      }
    }
And then, for each category, we can have an array with the list of video IDs.
    const normalized_data = {
      byIds: { ... },
      categorized: {
        recommended: {
          label: "",
          description: "",
          videos: ["video-1", "video-2", "video-3"]
        },
        comedy: {
          label: "",
          description: "",
          videos: ["video-1", "video-2", "video-3"]
        },
        action: {
          label: "",
          description: "",
          videos: ["video-1", "video-2", "video-3"]
        }
      }
    }
This will allow us to have a single source of data for the videos, and whenever a new video is fetched, it will be stored by ID, and its ID will be added to the list.
It also allows fetching the list of videos for a category in linear time and accessing the details of a video in constant time.
    // Constant-time lookup of a single video.
    const getVideoById = (id) => {
      return normalized_data.byIds[id];
    }

    // Linear-time lookup of all the videos in a category.
    const getVideosByCategory = (category) => {
      return normalized_data.categorized[category].videos.map((e) => getVideoById(e));
    }
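Storing newly fetched videos in this structure could look like the sketch below. `addVideos` is a hypothetical helper, not part of any library; it assumes the `byIds` and `categorized` shape shown above and returns a new state object instead of mutating the old one.

```javascript
// Merge a fetched page of videos into the normalized state without mutating it.
// addVideos is a hypothetical helper for this sketch.
function addVideos(state, category, videos) {
  const byIds = { ...state.byIds };
  const existing = state.categorized[category] ?? { label: "", description: "", videos: [] };
  const ids = [...existing.videos];
  for (const video of videos) {
    byIds[video.id] = { ...byIds[video.id], ...video }; // single source of data
    if (!ids.includes(video.id)) ids.push(video.id); // keep the list free of duplicates
  }
  return {
    byIds,
    categorized: { ...state.categorized, [category]: { ...existing, videos: ids } },
  };
}

const emptyState = { byIds: {}, categorized: {} };
const afterComedy = addVideos(emptyState, "comedy", [{ id: "video-1", title: "abcd" }]);
const afterRecommended = addVideos(afterComedy, "recommended", [
  { id: "video-1", title: "abcd" },
  { id: "video-2", title: "efgh" },
]);
```

Here `video-1` appears in two categories but is stored only once in `byIds`, which is the point of the normalization.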
On the browse page, use as little data as possible, and all the other video details can be pulled when the single video page is loaded.
As this data is specific to the video, it does not have to be shared and can be kept in the local state.
The recommendation list on the browse page and the single video page can be different, so they can be stored in separate categories.
To meet the requirements of the feature sets we are designing, we are going to need only two APIs.
To get the list of videos by category, we can have a single API with a query parameter that will fetch us the videos by genre or category.
1. Path: /list?category="recommendation"&start=0&limit=10, method: GET
    request-payload: null
    response: 200 - ok
    response-payload: [{
      id: "xxx-xxx-xxx",
      thumbnail_url: "xxx-xxx-xxx.jpg",
      title: "abcd",
      description: "",
      created_on: "",
      updated_on: "",
      chapters: [],
      pulisher_details: {},
      video_meta: {
        available_resolutions: [],
        playback_speed: [],
        subtitles: [],
        closed_captions: []
      }
    }]
We have added the pagination option to the API to lazy load the list.
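A sketch of how the client could build the request for each page is shown below. The `/list` path and the `category`, `start`, and `limit` parameters come from the API described above; the base URL is a placeholder and `listUrl` is a hypothetical helper.

```javascript
// Build the list endpoint URL for one page of a category.
// The base URL is a placeholder; page numbers are zero-based.
function listUrl(baseUrl, category, page, limit = 10) {
  const url = new URL("/list", baseUrl);
  url.searchParams.set("category", category);
  url.searchParams.set("start", String(page * limit));
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

const firstPageUrl = listUrl("https://api.example.com", "recommendation", 0);
const thirdPageUrl = listUrl("https://api.example.com", "recommendation", 2);
```

When the user scrolls towards the end of a list, the next page number is passed in and the response is appended to the category's list of video IDs.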
Second, we need an API to get the details of a single video.
2. Path: (/watch?v="video:slug") or (/watch/:video-slug), method: GET
    request-payload: null
    response: 200 - ok
    response-payload: {
      id: "xxx-xxx-xxx",
      thumbnail_url: "xxx-xxx-xxx.jpg",
      title: "abcd",
      description: "",
      created_on: "",
      updated_on: "",
      chapters: [],
      pulisher_details: {
        cast_details: {},
        crew_details: {},
      },
      video_meta: {
        resolutions: [],
        playback_speed: [],
        subtitles: [],
        closed_captions: [],
        audio_tracks: []
      }
    }
It will contain the complete details of the video with the slug.
Performance is the key to creating video streaming platforms; every aspect has to be considered where the application can be optimized.
The video player, or playback, is the core of video streaming platforms, and building it requires extensive knowledge to provide the best possible experience to the users.
In this part, we will see how the streaming moguls have designed their video players over time and what we need to know so that we can design the same.
Before exploring video player creation or designing the video playback experience, let us understand a few things about media streaming over the internet.
This glossary will help you understand things better:
The simplest way to render and play a video on the web is by including a video source in the video HTML element. It can directly source the video in multiple formats and play it.
    <video width="320" height="240" controls>
      <source src="movie.mp4" type="video/mp4">
      <source src="movie.ogg" type="video/ogg">
      Your browser does not support the video tag.
    </video>
Multiple sources are added as fallbacks, and the browser plays the first format it supports. If the video tag itself is not supported, then the text placed after the sources will be displayed.
While it is possible to render a high-quality video with the native HTML5 video element, it does not provide the optimizations needed to serve the video over a flaky internet connection.
HTML5 video elements use the conventional approach of progressive video downloading, in which the videos are linearly streamed. The video will be downloaded completely, irrespective of the available network bandwidth of the user, and it will start playing as soon as it has enough data downloaded for uninterrupted playback. The video is played simultaneously as it is being downloaded.
The user can seek to any point in the video thanks to HTTP range requests. A range request instructs the server to return only a portion of the file to the client, which gives media players random access to it, so the player downloads only the segment it needs to resume playback from the new position.
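The headers involved in a range request can be sketched as follows. The `Range` and `Content-Range` header formats are standard HTTP; the helper names are hypothetical, and mapping a playback time to a byte offset is left out because it depends on the video container's index.

```javascript
// Build the value of the Range request header for a byte window.
function rangeHeader(startByte, endByte) {
  // An open-ended range such as "bytes=500-" asks for everything from startByte on.
  return endByte === undefined ? `bytes=${startByte}-` : `bytes=${startByte}-${endByte}`;
}

// Parse the Content-Range header of a 206 Partial Content response.
function parseContentRange(value) {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value);
  if (!match) return null;
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: match[3] === "*" ? null : Number(match[3]), // "*" means unknown size
  };
}

const firstKilobyte = rangeHeader(0, 1023);
const servedRange = parseContentRange("bytes 0-1023/146515");
```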
This conventional method of progressive download is a very expensive way of streaming video, as the same quality of video will be downloaded irrespective of the user's network bandwidth and device capabilities. For example, even if you are on a mobile device where you can view only 720p video, the full 1080p (Full HD) video will be downloaded.
All the modern video streaming giants use adaptive bitrate streaming (ABR) to provide the best possible video streaming experience to their users, irrespective of their network bandwidth and device capabilities.
Adaptive bitrate streaming introduces dynamic delivery of videos. With ABR, the video is split into manageable segments, each of which is encoded at several quality levels and made available by the streaming server. The streaming client, which is the video player in use, analyzes the viewer's internet connection and picks the best-quality segment that can be played smoothly at that moment. This means that if the viewer's internet speed drops, the video quality is lowered automatically to prevent interruptions to playback.
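The client-side choice can be sketched as picking the highest quality that fits the measured bandwidth. Real players use more elaborate heuristics (buffer level, throughput history); the bitrate ladder, the 0.8 safety factor, and the `pickRendition` helper below are illustrative assumptions.

```javascript
// Pick the highest rendition whose bitrate fits within the measured bandwidth.
// The bitrate values and the safety factor are illustrative assumptions.
const renditions = [
  { name: "240p", bitrateKbps: 400 },
  { name: "360p", bitrateKbps: 800 },
  { name: "480p", bitrateKbps: 1400 },
  { name: "720p", bitrateKbps: 2800 },
  { name: "1080p", bitrateKbps: 5000 },
];

function pickRendition(ladder, bandwidthKbps, safetyFactor = 0.8) {
  const budget = bandwidthKbps * safetyFactor; // leave headroom for fluctuations
  const sorted = [...ladder].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  let choice = sorted[0]; // always fall back to the lowest quality
  for (const rendition of sorted) {
    if (rendition.bitrateKbps <= budget) choice = rendition;
  }
  return choice;
}

const onGoodNetwork = pickRendition(renditions, 10000).name;
const onAverageNetwork = pickRendition(renditions, 4000).name;
const onPoorNetwork = pickRendition(renditions, 100).name;
```

Running the selection again before each segment request is what lets the quality follow the network as it changes.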
For example, an uploaded video is converted to multiple video formats, and then it is broken into multiple segments, each anywhere between 2 and 12 seconds long. This means a 2-hour-long video (7,200 seconds) broken into chunks of 10 seconds each would yield 720 segments.
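The segment arithmetic can be checked with two small helpers. The function names are hypothetical; the numbers are the ones from the example above.

```javascript
// Number of segments needed to cover a video of the given duration.
function segmentCount(durationSeconds, segmentSeconds) {
  return Math.ceil(durationSeconds / segmentSeconds);
}

// Zero-based index of the segment that contains a playback position.
function segmentIndexAt(positionSeconds, segmentSeconds) {
  return Math.floor(positionSeconds / segmentSeconds);
}

const totalSegments = segmentCount(2 * 60 * 60, 10); // 7200 / 10 = 720
const segmentAtSeek = segmentIndexAt(95, 10); // 95 seconds falls in segment 9
```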
These segments will be downloaded as the user keeps streaming, while a buffer is maintained in case of a fluctuating network connection.
ABR takes this one step further by creating a video of multiple resolutions (1080p, 720p, 480p, 360p, and 240p) of each format and creating segments in each of those.
This makes it easier to serve the best-quality video to the user depending on their network bandwidth and their device's support, so the same video can still be viewed on slower networks.
It also helps the player switch the video quality when a poor network is encountered, as the next segment can be pulled from a lower quality. The only requirement is that the segments are divided at the same boundaries for each resolution.
The manifest files that store the segment details are in charge of handling this video segmentation. They follow a master-slave pattern: a master manifest file holds the details of all the available resolutions, and a separate manifest file for each resolution holds its segment details.
In adaptive streaming, the manifest file is passed as the video source. It contains the details of the segments and the video resolutions, using which the video is streamed on demand.
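How a player might use this information can be sketched with a simplified in-memory stand-in for the manifests. Real DASH and HLS manifests are much richer; the object shape, field names, and URL template below are hypothetical and only illustrate why aligned segment boundaries make quality switching simple.

```javascript
// Simplified stand-in for a master manifest and its per-resolution details.
// The field names and the URL template are hypothetical.
const masterManifest = {
  segmentSeconds: 10,
  renditions: {
    "1080p": { urlTemplate: "1080p/segment-{index}.m4s" },
    "720p": { urlTemplate: "720p/segment-{index}.m4s" },
    "480p": { urlTemplate: "480p/segment-{index}.m4s" },
  },
};

// Resolve the segment URL for a resolution and a playback position.
function segmentUrl(manifest, resolution, positionSeconds) {
  const rendition = manifest.renditions[resolution];
  if (!rendition) throw new Error(`Unknown resolution: ${resolution}`);
  const index = Math.floor(positionSeconds / manifest.segmentSeconds);
  return rendition.urlTemplate.replace("{index}", String(index));
}

// Switching quality keeps the same segment index and changes only the rendition.
const beforeSwitch = segmentUrl(masterManifest, "1080p", 42);
const afterSwitch = segmentUrl(masterManifest, "480p", 42);
```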
Sample manifest file of the DASH protocol (.mpd file):