I frequently get asked about H.264 SVC, so I decided to write this post to share my thoughts. My background is in the service provider space, so let me be clear up front: what follows applies to SVC used for real-time, two-way communication.
Let's start with some background. Scalable Video Coding (SVC) is the Annex G extension of H.264 AVC, approved in July 2007. As the name implies, the idea behind SVC was to create a bitstream containing scalable subsets of bitstreams that represent different resolutions, qualities, and frame rates. You can think of SVC as a layered stream: the decoder can decode all of the layers, or just the base layer, depending on the desired quality. If all that is needed is the base layer, the decoder simply discards the packets from the other layers.
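To make the "discard the other layers" step concrete, here is a minimal Python sketch of a layer extractor. It assumes NAL units arrive already split (no start codes). It reads `dependency_id` and `quality_id` from the 3-byte SVC header extension that Annex G adds to NAL unit types 14 (prefix) and 20 (coded slice extension), and it treats every other NAL unit as base layer. The function names and the sample bytes are hypothetical. The keep/drop rule is also deliberately simpler than a real extractor, which would also handle `temporal_id`, parameter sets, and inter-layer dependencies. Note that wherever this filter runs, at the decoder or in a middlebox, the dropped bytes have already crossed the network up to that point.

```python
from collections.abc import Iterable, Iterator

# NAL unit types that carry the 3-byte SVC header extension (H.264 Annex G):
# 14 = prefix NAL unit, 20 = coded slice in scalable extension.
SVC_NAL_TYPES = {14, 20}


def layer_ids(nal: bytes) -> tuple[int, int]:
    """Return (dependency_id, quality_id) for one NAL unit.

    Assumes `nal` begins at the 1-byte NAL header (start code already stripped).
    NAL units without the SVC extension are treated as base layer (0, 0).
    """
    nal_unit_type = nal[0] & 0x1F
    if nal_unit_type not in SVC_NAL_TYPES:
        return (0, 0)
    # nal[1]: svc_extension_flag | idr_flag | priority_id
    # nal[2]: no_inter_layer_pred_flag | dependency_id (3 bits) | quality_id (4 bits)
    dependency_id = (nal[2] >> 4) & 0x07
    quality_id = nal[2] & 0x0F
    return (dependency_id, quality_id)


def extract(nals: Iterable[bytes], max_dependency_id: int, max_quality_id: int) -> Iterator[bytes]:
    """Keep NAL units at or below the target layer and drop the rest (simplified rule)."""
    for nal in nals:
        dependency_id, quality_id = layer_ids(nal)
        if dependency_id <= max_dependency_id and quality_id <= max_quality_id:
            yield nal


# Hypothetical access unit: prefix NAL + base AVC IDR slice + two enhancement slices.
prefix = bytes([0x6E, 0x80, 0x00, 0x00])           # type 14, D=0, Q=0
base = bytes([0x65, 0x88, 0x84])                   # type 5 (IDR slice), no extension
enh_d1_q0 = bytes([0x74, 0x80, 0x10, 0x00, 0x9A])  # type 20, D=1, Q=0
enh_d1_q1 = bytes([0x74, 0x80, 0x11, 0x00, 0x9A])  # type 20, D=1, Q=1

stream = [prefix, base, enh_d1_q0, enh_d1_q1]
base_only = list(extract(stream, max_dependency_id=0, max_quality_id=0))
print(len(stream), "NAL units in,", len(base_only), "out")
```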
If you search for H.264 SVC, you are quickly overwhelmed by a flood of pages singing its praises. The sad part is that most of them are authored by marketing guys rather than technologists who understand the technical differences between SVC and standard H.264 AVC.
Let's first look at some of the proclaimed benefits of SVC and the reality when it comes to service providers.
SVC allows up to 20% packet loss without affecting quality.
A number of vendors use SVC layers to add redundancy information to a bitstream. Providing redundancy is nothing new: an 8-disk RAID 5 array can lose one disk without losing any data, and a RAID 6 array can lose two. This redundancy, however, comes at a cost. Depending on the desired level of redundancy, extra data is sent, making the total bitstream 10–30% larger. Often this increase in bandwidth can itself cause quality issues, but let's pretend for now that it is worth sending the extra data. I have seen many tests showing how great using SVC layers for redundancy is. They introduce lots of loss, and you can clearly see that the quality is far superior to standard AVC. However, if you dig deeper, you start to see the flaw in this approach. If you run an IP network, you know that most of your packet loss is not random; it comes in bursts. A router's queue fills up and a bunch of packets are dumped on the floor. In cases like that, which are the most common form of packet loss on the internet, the redundancy information does not help at all, because the base stream AND the redundant information are both lost!
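The burst-loss argument is easy to check with the simplest form of redundancy, the XOR parity idea that RAID 5 uses. The Python sketch below is a generic parity FEC, not any vendor's SVC redundancy scheme. One parity packet protects each group of `k` equal-length packets, so a group can survive at most one loss. The helper names and the loss patterns are illustrative assumptions.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: List[bytes], k: int) -> List[bytes]:
    """Send each group of k data packets followed by one XOR parity packet."""
    wire: List[bytes] = []
    for i in range(0, len(data), k):
        group = data[i:i + k]
        wire.extend(group)
        wire.append(reduce(xor_bytes, group))
    return wire


def decode(wire: List[Optional[bytes]], k: int) -> List[Optional[bytes]]:
    """Rebuild data packets (None = lost). Each group can repair one missing packet."""
    data: List[Optional[bytes]] = []
    for i in range(0, len(wire), k + 1):
        block = wire[i:i + k + 1]
        group, parity = list(block[:-1]), block[-1]
        missing = [j for j, p in enumerate(group) if p is None]
        if len(missing) == 1 and parity is not None:
            survivors = [p for p in group if p is not None] + [parity]
            group[missing[0]] = reduce(xor_bytes, survivors)
        data.extend(group)
    return data


def lose(wire: List[bytes], lost: set) -> List[Optional[bytes]]:
    """Simulate the network dropping the packets at the given wire positions."""
    return [None if i in lost else p for i, p in enumerate(wire)]


data = [bytes([n]) * 4 for n in range(8)]  # 8 data packets
wire = encode(data, k=4)                   # 10 packets on the wire: 25% overhead

scattered = decode(lose(wire, {2, 7}), k=4)  # one loss in each parity group
burst = decode(lose(wire, {2, 3}), k=4)      # two back-to-back losses in one group

print("unrecovered after scattered loss:", sum(p is None for p in scattered))
print("unrecovered after burst loss:    ", sum(p is None for p in burst))
```

Both runs lose 2 of the 10 packets on the wire (20%) while paying 25% overhead. The scattered pattern is fully repaired. The burst leaves two packets unrecoverable, because both losses fall in the same parity group.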
SVC allows a single encoder to provide different resolutions, qualities, or frame rates.
Yes, this is very true; in fact, it is what SVC was designed for. The question is how relevant this is to two-way, real-time video communication on the internet. The problem is that ALL of this data is in the full bitstream. As I mentioned before, if a decoder does not need the higher layers, it simply discards the packets, but they are still sent! Many vendors have a solution for this: you simply buy their boxes and put them all over your network and your customers' networks, and the devices drop the unneeded layers. While I find this approach great for the guys selling more boxes, from a network engineering standpoint there are many drawbacks.
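The cost of shipping every layer comes down to simple arithmetic. The sketch below uses made-up per-layer bitrates, which are assumptions for illustration and not measurements of any SVC encoder. It compares what a receiver is sent when nothing prunes the stream with what that receiver actually decodes.

```python
# Hypothetical bitrate of each SVC layer, base layer first (kbps).
LAYER_KBPS = [("QVGA base", 256), ("VGA enhancement", 512), ("720p enhancement", 1024)]


def delivered_kbps(needed_layers: int, pruned: bool) -> int:
    """Bitrate reaching a receiver that decodes the first `needed_layers` layers.

    Without a pruning box in the path, every layer is delivered regardless of need.
    """
    layers = LAYER_KBPS[:needed_layers] if pruned else LAYER_KBPS
    return sum(rate for _, rate in layers)


for needed in (1, 2, 3):
    sent = delivered_kbps(needed, pruned=False)
    used = delivered_kbps(needed, pruned=True)
    print(f"needs {needed} layer(s): sent {sent} kbps, used {used} kbps, wasted {sent - used} kbps")
```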
SVC is a standard, just like H.264 AVC
Again, this is correct; however, just because there is a standard for SVC does NOT mean that vendors deploying SVC are able to interoperate. Let's face it, in our industry most of our “standards” are known as RFCs from the IETF. I am a big fan of the IETF and I love the process. It has some HUGE advantages over other standards bodies such as the ITU, where standards become inch-thick documents and the process is sometimes more political than technical. At the same time, let's not forget that RFC stands for “Request for Comments”. Many RFCs are only a dozen or so pages long, with much of the implementation left to the reader. Some “standards” are only Internet-Drafts that never become an RFC, written by one or two people at a company that just wants to say it follows a “standard”. We are still working out issues with H.264 AVC over the public internet today. SVC has a much smaller following, and in my opinion it will never be “standardized” by the market, which is much more crucial than the technical standard.
SVC provides a totally new way of offering low cost conferencing
One interesting approach to SVC is the idea of an application router that simply routes SVC layers between users, rather than the typical MCU approach that decodes and then re-encodes the video stream. While I think this is interesting, I don't think it is more than a niche play. First, let me point out that this idea is far from new. Skype and others who do not have centralized MCUs have been routing streams between users for years with H.264 AVC and other codecs. A big problem with this approach is that it shifts much of the burden from a centralized MCU to the client. If 8 people are in the conference, each client is forced to decode 7 video streams rather than just one. The bandwidth required to receive and send video to and from 7 other users is far greater than a single flow to an MCU. To make matters worse, it goes in the opposite direction from where the industry has been trying to move. We have been working on technologies like rtcp-mux and bundle so that voice, video, and their 2 RTCP flows can all share one flow rather than 4. With the application router approach, even if rtcp-mux and bundle are used, you still need port bindings for each of the other participants' audio, video, and signaling streams.
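The per-client scaling can be sketched in a few lines of Python. The model is deliberately crude and its numbers are assumptions. Every participant sends one stream at the same bitrate. An MCU returns a single mixed stream. An application router forwards one stream per other participant. A full mesh sends to and receives from every peer. The port-binding count follows the accounting in the paragraph above: audio, video, and signaling per other participant, versus one bundled, rtcp-mux flow to an MCU.

```python
from dataclasses import dataclass


@dataclass
class ClientLoad:
    decoded_streams: int
    down_kbps: int
    up_kbps: int
    port_bindings: int


def client_load(participants: int, stream_kbps: int, topology: str) -> ClientLoad:
    """Rough per-client load for one conference (crude, hypothetical model)."""
    others = participants - 1
    if topology == "mcu":
        # One mixed stream down, one stream up, everything bundled on one flow.
        return ClientLoad(1, stream_kbps, stream_kbps, 1)
    if topology == "router":
        # Send once to the router, then receive and decode every other participant.
        return ClientLoad(others, others * stream_kbps, stream_kbps, 3 * others)
    if topology == "mesh":
        # Send to and receive from every other participant directly.
        return ClientLoad(others, others * stream_kbps, others * stream_kbps, 3 * others)
    raise ValueError(f"unknown topology: {topology}")


for topology in ("mcu", "router", "mesh"):
    print(topology, client_load(participants=8, stream_kbps=512, topology=topology))
```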
SVC and quality based on user bandwidth.
While closely related to an encoder being able to provide different resolutions, qualities, or frame rates, this claim is more focused on the end device. Last-mile bandwidth has always been a real issue for two-way video services. The idea here is that the device can somehow request just the layers it has bandwidth for, rather than everything. However, if we look more closely at last-mile bandwidth, this is trying to fix the wrong direction! The problem lies in the fact that most last-mile bandwidth is asymmetrical. I won't bore you with the reasons, but upload speed is normally a fraction of download speed. If a device could back off on what it receives (the big direction), that does nothing for what it sends. Does the device simply send AVC, or does it try to send a base layer plus high-quality layers using SVC up the small side of the pipe? If one wants to be smarter about this, a better approach would be to use standard AVC on both ends and allow both encoders to be adjusted in real time, rather than only at the start of the call.
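One way to picture "adjusting both encoders in real time" is a loss-driven rate controller running at each sender. Each controller would be fed the loss its far end reports back, for example the fraction-lost field of RTCP receiver reports. The Python sketch below is loosely modeled on the loss-based rule described in the IETF RMCAT Google Congestion Control draft. The thresholds, floor, ceiling, and function name are assumptions for illustration, not a production controller.

```python
def next_bitrate_kbps(current: float, loss_fraction: float,
                      floor: float = 150.0, ceiling: float = 2500.0) -> float:
    """Pick the AVC encoder's next target bitrate from reported packet loss.

    Hypothetical thresholds: back off above 10% loss, probe upward below 2%.
    """
    if loss_fraction > 0.10:
        target = current * (1 - 0.5 * loss_fraction)  # back off in proportion to loss
    elif loss_fraction < 0.02:
        target = current * 1.05                      # probe upward slowly
    else:
        target = current                             # hold steady
    return max(floor, min(ceiling, target))


# Each direction runs its own controller, so a thin uplink only throttles the
# encoder that sends over it.
rate = 1000.0
for loss in [0.0, 0.0, 0.25, 0.25, 0.05, 0.01]:
    rate = next_bitrate_kbps(rate, loss)
    print(f"reported loss {loss:.0%} -> encode at {rate:.0f} kbps")
```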
SVC encoding speed
The SVC guys like to say it's faster, but I'm sorry, in the real world it is not. Yes, encoding 1080p + 720p + VGA + QVGA separately with H.264 AVC is generally slower than a single SVC encode of a QVGA base plus the other layers, but why would we be sending ALL of that to one device? You are much better off encoding just the video feed each endpoint requires rather than trying to encode everything. The SVC guy will then say, "But if you do it once with SVC, you can send that same feed to everyone." While I think there are applications where the exact same content can be sent to every user, in the real world this is rarely the case. Different users want different layouts, options, etc. The world of pushing the same static thing to everyone is going away; people want it their way! Add to this the fact that CPUs, and especially DSPs, are optimized for AVC rather than SVC, and it becomes even clearer.
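A rough way to see the "why send ALL of that" point is to count pixels per frame. This is only a crude proxy for encoding work: it ignores motion search, inter-layer prediction, and everything else a real encoder does. The sketch compares encoding every resolution named above with encoding only the one a given endpoint displays. Here that is 720p, chosen as an assumed example.

```python
# Pixels per frame for the resolutions mentioned above.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "720p": (1280, 720),
    "VGA": (640, 480),
    "QVGA": (320, 240),
}


def pixels(names) -> int:
    """Total pixels per frame across the named resolutions."""
    return sum(w * h for name, (w, h) in RESOLUTIONS.items() if name in names)


everything = pixels(RESOLUTIONS)  # every resolution, for every endpoint
just_720p = pixels({"720p"})      # only what this endpoint displays
print(everything, just_720p, f"{just_720p / everything:.0%}")
```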
Is SVC good for anything? Yes, I believe it is. SVC is great for the content storage industry, allowing it to store one video file with many different quality levels. It's great for the security industry, allowing video feeds to be processed by systems at a base level and then expanded in quality if something needs to be analyzed further. However, I think most of the people making noise about SVC today are just looking for marketing spin rather than real technical advantages.