Hello fellow LinuxTV enthusiasts!
I spent yesterday’s post talking about API deprecation, and got onto a tangent about quality. For those of you who have a tuner which “generally works, but isn’t reliable enough to use in production”, I feel your pain. Of the TV tuner products that are supported under Linux at all, an unfortunately high percentage have some glaring problem, either from a reliability standpoint or in terms of application compatibility.
Not sure what I’m talking about? Let’s look at a couple of examples:
The HVR-12xx and 18xx: these are relatively popular PCIe hybrid tuners currently sold by Hauppauge. In pretty much all the variants out there the digital side of the board works well enough for production use (ATSC/ClearQAM). The analog side is a completely mixed bag though. If you have one of the older 1250 or 1800s, then there is some analog support, but it isn’t reliable enough to be used under MythTV (due to some race conditions in the driver that MythTV in particular exposes). If you have a 1255 or 1850, then you’ve got no analog support at all.
The HVR-950q: generally a pretty reliable driver for both analog and digital, except for the edge case which causes analog to lock up occasionally. Oh, and the IR support isn’t implemented. And did I mention you have to play around with some modprobe options to make it work under MythTV? The result is that the “out of the box” experience is pretty lousy unless you know what magic incantations to cast.
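For the curious, the sort of incantation I’m alluding to looks something like the following. Treat this as an illustration rather than a recipe: the exact option names vary by driver and kernel version, so check `modinfo` on your system rather than copying this blindly.

```shell
# /etc/modprobe.d/hvr950q.conf  (illustrative example, not gospel)
#
# Keep the xc5000 tuner powered between uses, so that an application
# like MythTV that frequently opens and closes the device doesn't
# trigger a firmware reload (and the associated tuning delay/failure)
# every time.  Verify the option exists on your kernel first:
#     modinfo xc5000
options xc5000 no_poweroff=1
```

Nothing about this is discoverable out of the box, which is exactly the problem.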
The Nova-T 500: a PCI DVB-T tuner which is finally pretty reliable with the latest firmware, as long as you never do a soft-reboot of your PC; if you do, the card is left in an unknown state. Oh, and because of its history of being unreliable, there’s a ton of misinformation in the various docs and wikis out there about how to make the tuner work well.
I could go on and on. For practically every device I’ve worked on (which is over two dozen by this point), there are edge cases that people point out which affect the end-user experience.
You could argue I’m a pretty crappy engineer (and it’s easy to feel that way if you personally suffer from one of these issues). In reality though (and this is my opinion/speculation), this stuff is *hard* to get right. It can be fairly straightforward to get a new product up and running to the point where you’re seeing video, but then an enormous amount of additional effort is required to go from “basically works” to “rock solid”. Tuners provide real-time delivery at tens or sometimes hundreds of megabits per second, and unlike a hard drive, there is no “retransmit” mechanism. A couple of lost packets per second (out of an ATSC stream carrying about 13,000 packets per second) can result in pixelation, blocking, and a generally crappy TV watching experience.
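To put some numbers behind that 13,000 figure: an ATSC broadcast is a 19.39 Mbps MPEG transport stream carried in 188-byte packets, so the packet rate falls straight out of the arithmetic:

```python
# Back-of-the-envelope: how many transport stream packets per second
# does an ATSC tuner deliver, and what loss rate does "a couple of
# lost packets per second" actually represent?

ATSC_BITRATE = 19_392_658   # bits/s: the ATSC 8VSB payload rate (~19.39 Mbps)
TS_PACKET_BITS = 188 * 8    # MPEG transport stream packets are 188 bytes

packets_per_second = ATSC_BITRATE / TS_PACKET_BITS
print(f"{packets_per_second:.0f} packets/s")   # ~12,894 -- "about 13,000"

# Even a seemingly tiny loss is enough to be visible on screen:
lost_per_second = 2
print(f"loss rate: {lost_per_second / packets_per_second:.4%}")
```

In other words, a driver can be delivering 99.98% of its packets and the user will still see glitches, which is why “basically works” and “rock solid” are so far apart.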
Unlike getting a board to work, finding the edge cases really is hard work, and in some cases it is highly disproportionate to the work required to get the device working in the first place. I could definitely share stories where I got a board up and running in a few hours, then spent the next two weeks chasing down a glitch in the video that only occurs once every ten to twenty seconds.
The problem is compounded by the user’s antenna and reception conditions, the signal quality of whatever cable provider they are using, the applications the user is trying to use, and whether the user’s PC has interrupt latency problems perhaps introduced by other device drivers in the system. These things make it difficult to reproduce end-user problems, and in some cases you burn a lot of time trying to understand whether the user is reporting a generic problem that always happens or something specific to his/her environment.
And let’s not forget that currently available products have a relatively short shelf-life, being replaced by newer products which often use different components. This means that a driver that takes a couple of months to write and get stable may only be useful in a product that is shipping for 12 to 18 months. For a driver developer, this always puts you between two extremes: Brand new products have low adoption in the beginning, meaning doing the work doesn’t help that many users. But should time really be spent writing drivers for a product that is no longer shipping or is near end-of-life?
Regressions make the situation even worse. Drivers for products that were working at one point can stop working when users update their kernels. This is usually a result of poor quality control combined with contributors submitting support for some new device in the same family as an existing device, which causes breakage for the existing device. You cannot really blame the contributor who caused the regression, since he probably doesn’t have all the products that are affected by a given driver. Another class of regression occurs when somebody makes improvements to the internal framework, which exposes bugs in individual drivers which relied on some implicit behavior. Again, it simply isn’t practical to test *every* product which is touched by the framework, since no developer has every single product supported under Linux.
But it’s not like anybody is being paid to try out every board with every new kernel release. Hence regressions often go undetected until the kernel gets released, at which point real users are the first to actually try out the driver with their device and are shocked to find it now broken.
None of the above should be interpreted as a scathing review of the LinuxTV developers. They tend to be hobbyists who are not compensated for their contributions, and hence you could easily argue that users “get what they pay for”. And for each class of problem described above, I’ve actually been a culprit myself. I’ve caused regressions in products unrelated to the one I was developing for. I’ve introduced device support that worked “good enough” but was loaded with edge cases. I’ve written new drivers without verifying compatibility with all the possible open source applications.
There are no easy answers to any of the above issues. Let’s take a minute and look at some possible things we could do:
- Early feedback from users trying out the drivers before the new kernels are released is very useful, but recompiling a kernel from source is something that many users are not up for (and given it can render their computer unbootable, I can hardly blame them).
- More time cleaning up warnings from the compiler and the Sparse tool is not a bad use of time, but in reality I don’t think a whole lot of real bugs are being fixed preemptively as a result of those tools. And in some cases, well-meaning users providing patches for such warnings without being able to test them against the real hardware can actually cause subtle breakage in something that was previously working.
- More driver developers would certainly help, but it seems that in most cases once a board is working people are unwilling to invest any real time in using that knowledge to work on other boards, or even to help test to ensure that his/her board *stays* working as new kernel releases are being developed. There’s also a rather high barrier to entry in terms of learning curve, so developers have to be willing to make a significant investment to be useful to the project. And it goes without saying that NDAs and general lack of datasheet availability just increases this barrier to entry.
- Automated periodic regression testing of boards would be great (at least for some popular subset of boards). But this has costs both in terms of manpower to write/maintain the tools as well as actual costs in terms of hardware doing the testing. And while automated regression testing does catch some classes of bugs by exercising popular use cases, it often misses more subtle issues such as race conditions.
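To make that last idea concrete, here’s the kind of cheap automated check I have in mind: a sketch (hypothetical tooling, not an existing LinuxTV utility) that scans a captured MPEG transport stream for continuity-counter gaps, which is one simple way to detect dropped packets in a recording produced by a driver under test.

```python
# Sketch of one automated regression check: scan an MPEG-TS capture
# for continuity_counter discontinuities, which indicate dropped
# packets.  Hypothetical example code, not an existing LinuxTV tool.

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF

def count_cc_errors(data: bytes) -> int:
    """Count continuity-counter gaps (per PID) in a transport stream."""
    last_cc = {}   # pid -> last continuity counter seen
    errors = 0
    for off in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = data[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            errors += 1          # loss of sync counts as an error too
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid == NULL_PID:
            continue             # null packets carry no meaningful counter
        cc = pkt[3] & 0x0F
        has_payload = bool(pkt[3] & 0x10)
        if pid in last_cc and has_payload:
            expected = (last_cc[pid] + 1) & 0x0F
            if cc != expected:
                errors += 1      # a gap here means packets were dropped
        last_cc[pid] = cc
    return errors

if __name__ == "__main__":
    # Tiny synthetic capture on PID 0x100: counters 0, 1, then 3
    # (counter 2 is missing, simulating a dropped packet).
    def ts_pkt(pid: int, cc: int) -> bytes:
        hdr = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | cc])
        return hdr + bytes(TS_PACKET_SIZE - 4)

    capture = ts_pkt(0x100, 0) + ts_pkt(0x100, 1) + ts_pkt(0x100, 3)
    print(count_cc_errors(capture))   # -> 1
```

Run nightly against a handful of boards with a known-good signal source, a check like this would catch the “couple of lost packets per second” class of regression before a kernel ships. It would not, of course, catch the race conditions mentioned above.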
If you’ve read this far, you can see that I spend a lot of time thinking about these sorts of issues. If anyone has ideas on how to improve the situation, constructive comments are always welcome below.