A Post on LinuxTV Quality

Hello fellow LinuxTV enthusiasts!

I spent yesterday’s post talking about API deprecation, and got onto a tangent about quality. For those of you who have a tuner which “generally works, but isn’t reliable enough to use in production”, I feel your pain. An unfortunately high percentage of the TV tuner products with any Linux support at all have some glaring problem, either from a reliability standpoint or in terms of application compatibility.

Not sure what I’m talking about? Let’s look at a couple of examples:

The HVR-12xx and 18xx: these are relatively popular PCIe hybrid tuners currently sold by Hauppauge. In pretty much all the variants out there, the digital side of the board works well enough for production use (ATSC/ClearQAM). The analog side is a completely mixed bag, though. If you have one of the older 1250s or 1800s, there is some analog support, but it isn’t reliable enough to be used under MythTV (due to some race conditions in the driver that MythTV in particular exposes). If you have a 1255 or 1850, you’ve got no analog support at all.

The HVR-950q: generally a pretty reliable driver for both analog and digital, except for the edge case which causes analog to lock up occasionally. Oh, and the IR support isn’t implemented. And did I mention you have to play around with some modprobe options to make it work under MythTV? As a result, the “out of the box” experience is pretty lousy unless you know what magic incantations to cast.
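For the curious, the “magic incantation” in question is a one-line modprobe config file. The option most commonly cited for the 950q/MythTV combination is `no_poweroff` on the xc5000 tuner module, though treat this as an illustration rather than gospel, and check `modinfo xc5000` on your own kernel:

```shell
# /etc/modprobe.d/hvr950q.conf
# Keep the xc5000 tuner powered between uses, so MythTV's frequent
# open/close cycles don't force a slow firmware reload each time.
options xc5000 no_poweroff=1
```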

The Nova-T 500: a PCI DVB-T tuner which is finally pretty reliable with the latest firmware, as long as you never soft-reboot your PC, which puts the card into an unknown state. Oh, and because of its history of being unreliable, there’s a ton of misinformation in the various docs and wikis out there about how to make the tuner work well.

I could go on and on. For practically every device I’ve worked on (which is over two dozen by this point), there are edge cases that people point out which affect the end-user experience.

You could argue I’m a pretty crappy engineer (and it’s easy to feel that way if you personally suffer from one of these issues). In reality though (and this is my opinion/speculation), this stuff is *hard* to get right. It can be fairly straightforward to get a new product up and running to the point where you’re seeing video, but then an enormous amount of additional effort is required to go from “basically works” to “rock solid”. Tuners deliver data in real time at tens or sometimes hundreds of megabits per second, and unlike a hard drive, there is no “retransmit” mechanism. A couple of lost packets per second (out of an ATSC stream carrying about 13,000 packets per second) can result in pixelation, blocking, and a generally crappy TV watching experience.
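That 13,000 figure is easy to sanity-check from the ATSC payload rate (about 19.39 Mbit/s) and the fixed 188-byte MPEG-TS packet size:

```python
# Back-of-the-envelope check of the "~13,000 packets/s" figure.
ATSC_BITRATE_BPS = 19_392_658   # 8-VSB payload rate, ~19.39 Mbit/s
TS_PACKET_BYTES = 188           # fixed MPEG transport stream packet size

packets_per_sec = ATSC_BITRATE_BPS / (TS_PACKET_BYTES * 8)
print(f"{packets_per_sec:.0f} packets/s")   # ~12,894 packets/s

# Even 2 lost packets/s is a minuscule fraction of the stream,
# yet it's enough to cause visible glitching.
loss_pct = 2 / packets_per_sec * 100
print(f"{loss_pct:.4f}% loss")
```

In other words, a loss rate of well under a hundredth of a percent is already user-visible, which is why “basically works” and “rock solid” are so far apart.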

Unlike getting a board to work, finding the edge cases really is hard work, and in some cases it is highly disproportionate to the work required to get the device working in the first place. I could definitely share stories where I got a board up and running in a few hours, then spent the next two weeks chasing down a glitch in the video that only occurred once every ten to twenty seconds.

The problem is compounded by users’ antenna and reception conditions, the signal quality of whatever cable provider they are using, the applications they are trying to use, and whether their PC has interrupt latency problems, perhaps introduced by other device drivers in the system. These things make it difficult to reproduce end-user problems, and in some cases you burn a lot of time trying to understand whether the user is reporting a generic problem that always happens or something specific to his/her environment.

And let’s not forget that currently available products have a relatively short shelf-life, being replaced by newer products which often use different components. This means that a driver that takes a couple of months to write and get stable may only be useful in a product that is shipping for 12 to 18 months. For a driver developer, this always puts you between two extremes: Brand new products have low adoption in the beginning, meaning doing the work doesn’t help that many users. But should time really be spent writing drivers for a product that is no longer shipping or is near end-of-life?

Regressions make the situation even worse. Drivers for products that were working at one point can stop working when users update their kernels. This is usually a result of poor quality control combined with users submitting support for some new device in the same family as an existing device, which causes breakage for the existing device. You cannot really blame the user who caused the regression, since he probably doesn’t have all the products that are affected by a given driver. Another class of regression occurs when somebody makes improvements to the internal framework, which exposes bugs in individual drivers that relied on some implicit behavior. Again, it simply isn’t practical to test *every* product which is touched by the framework, since no developer has every single product supported under Linux.

But it’s not like anybody is being paid to try out every board with every new kernel release. Hence regressions often go undetected until the kernel gets released, at which point real users are the first to actually try out the driver with their device and are shocked to find it now broken.

None of the above should be interpreted as a scathing review of the LinuxTV developers. They tend to be hobbyists who are not compensated for their contributions, and hence you could easily argue that users “get what they pay for”. Likewise, for each class of problem described above, I’ve actually been a culprit myself. I’ve caused regressions in products unrelated to the one I was developing for. I’ve introduced device support that worked “good enough” but was loaded with edge cases. I’ve written new drivers without verifying compatibility with all the possible open source applications.

There are no easy answers to any of the above issues. Let’s take a minute and look at some possible things we could do:

  • Early feedback from users trying out the drivers before the new kernels are released is very useful, but recompiling a kernel from source is something that many users are not up for (and given it can render their computer unbootable, I can hardly blame them).
  • More time cleaning up warnings from the compiler and the Sparse tool is not a bad use of time, but in reality I don’t think a whole lot of real bugs are being fixed preemptively as a result of those tools. And in some cases, well-meaning users providing patches for such warnings without being able to test them against the real hardware can actually cause subtle breakage in something that was previously working.
  • More driver developers would certainly help, but it seems that in most cases, once a board is working, people are unwilling to invest any real time in using that knowledge to work on other boards, or even to help test to ensure that their board *stays* working as new kernel releases are being developed. There’s also a rather high barrier to entry in terms of learning curve, so developers have to be willing to make a significant investment to be useful to the project. And it goes without saying that NDAs and the general lack of datasheet availability just increase this barrier to entry.
  • Automated periodic regression testing of boards would be great (at least for some popular subset of boards). But this has costs both in terms of manpower to write/maintain the tools as well as actual costs in terms of hardware doing the testing. And while automated regression testing does catch some classes of bugs by exercising popular use cases, it often misses more subtle issues such as race conditions.

If you’ve read this far, you can see that I spend a lot of time thinking about these sorts of issues. If anyone has ideas on how to improve the situation, constructive comments are always welcome below.

5 thoughts on “A Post on LinuxTV Quality”

  1. I would recommend creating a test suite.

    It probably won’t be fully automatic, since things like IR remote controls can’t easily be automated. Tuner testing needs a valid signal, and analog TV encoders are rare. DVB muxers are not only rare but also expensive.

    Still, it would be great if there were a way to test the full functionality of a device and all known edge cases. It would help with bug reports, and if there are enough enthusiasts, it could provide a wide enough test net, at least for catching regressions between kernel releases.

  2. Devin Heitmueller

    Hello iive,

    As I pointed out, a regression test suite would indeed be nice to have, but the costs here are in both time and money. Somebody has to build a relatively comprehensive test suite, and then it has to be hosted. I’ve got something like 25-30 products alone, and in order to host it I would need a dozen PCs as well as numerous signal sources. Also, in many cases it is necessary to do things like change cabling (for example, for tuners that support multiple signal types with different modulations on a single input). This doesn’t even account for the ongoing maintenance necessary to analyze the results and fix bugs.

    We cannot even get people to look at Hans’ daily build report and fix compile-level problems on a regular basis. We cannot even ensure that the media_build tree compiles successfully! These are things that are black and white (either it compiles or it doesn’t), the fixes are usually trivial, and nobody is willing to do even that level of ongoing maintenance. Do we really think we have resources to do any *real* testing?


  3. A driver writer has to satisfy a number of interface specifications, some formal and some in constant flux:

    1) hardware control interfaces
    2) hardware DMA engine and interrupt unit
    3) basic linux kernel infrastructure: PCI, USB, modules, deferred work handlers, mutexes, spinlocks, file handles, device model, driver model, etc.
    4) DVB API
    5) V4L2 API
    6) ALSA API
    7) V4L2 infrastructure (to help driver writers implement the V4L2 API)

    Those all have some level of information hiding, which makes performing proper locking and getting decent performance difficult until you dig into all the gory details of the helpers.

    And the V4L2 API is by far the biggest hurdle. The spec is just huge. There are over 50 different ioctls alone. There are also a good number of TV formats worldwide that driver writers should consider when bringing things up.

    Many of the drivers I consider “in bad shape” have implemented a portion of the V4L2 API in such a way that it is hard to move forward and implement the rest. Quick hacks to make something work for limited use cases, e.g. only for NTSC or only for PAL-B/G, add to the quality perception problem.

  4. @Devin,
    Forget automated testing. Just give users (and developers) something to test the driver for their hardware. Something that doesn’t involve hacking the source on their own.

    • Devin Heitmueller

      Hi iive,

      We can barely get end-users to install the current media_build tree and try it with regular applications like mplayer or tvtime. Forget about the weird edge cases – there are many products that are flat out broken (e.g. crashes the system on driver load or on first attempt to capture video).

      Worse, we have lots of reports of this nature (for those cases where a user actually does try the device), and very few people to triage them and do fixes. So in many instances even if a user does go through all the trouble to try the latest code, there is no developer who both has the hardware *and* actually cares enough to debug the issue and make a fix.

      For evidence, just look at how often you see posts on linux-media that say things like “I plugged in board X and my kernel panic’d” where nobody even replies to the email.

