Adventures in OTA Land
A very interesting post from Steve Kondik, especially for everyone wondering why there is no OTA or why it was so late...
Hey everyone.. I wanted to write a little bit about the crazy adventure we had when rolling out Cyanogen OS 12. This was possibly the most troublesome OTA we've done for any device yet, not because of the software (which is pretty damn good IMHO), but because of the process required to actually install it when coming from CM11. This is pretty technical stuff, but I wanted to be transparent about it and mostly get it off my chest because it was as frustrating for us as it was for some of you.
Moving up to L was a big project. This was only the second time in the history of CM where we decided to "start over" using a clean AOSP + CAF base, reevaluate our features vs. the new stuff Google did for L, and bring all of our stuff back in. We did this when ICS came out too. It's a bit more than a "rebase" since large amounts of code had to be refactored or redesigned due to the extensive platform changes in L. It's good to do this from time to time when dealing with an upstream like AOSP, because you can make sure your code stays "fresh" and works well with the system. It took longer than we had anticipated, but we did the bulk of the work in the open so that diehards could use the nightly builds and so that other ROMs could build off of us too while we got it production ready.
With all that out of the way, we moved on to certification. And of course, this was not a "fire and forget" situation either as it had been in the past. The first issue we hit was with the camera stack. While we have our own camera application in CM, the factory team owns the actual imaging stack as it integrates a number of third-party products that we surface as features like Clear Image, Smart Scene, and RAW capture. Unfortunately that stack was designed for KitKat and doesn't pass certification. While we have our own stack that did pass, it didn't have these third-party features and we wanted to avoid axing these awesome features at all costs. So we got that sorted out and while the solution isn't perfect, the quality and feature set is still as good as it has been. Simultaneously, Google released a new test suite which "fixed" one test which you were previously "allowed" to fail in the DRM video playback tests. As it turns out, the reason we failed it wasn't actually the reason that failures were waived, and we discovered that we had a bug in our extended media codec support (I lost a lot of sleep over this one). With these out of the way, we decided to ship.
It's no secret that we use a slow rollout system. We do this so we can monitor the feedback you guys post, and also view diagnostic data that we get from the process. We found an issue where Play Services could wreck your battery, and we also found that encrypted devices were failing to upgrade. We also found that some alternative recoveries would fail to install the update, or otherwise leave the device in an unusable state. This was serious enough that we stopped the rollout to resolve the issues before causing any more carnage. With the Play Services issue fixed, and the upgrade process for encrypted devices all set, we started the rollout again. We started getting feedback that people weren't able to upgrade their encrypted devices. This made no sense at first: we had tested this a hundred times end-to-end and could not reproduce the issue at all. Fortunately, we were able to locate a user in Seattle (where our engineering office is) who had the issue, and he brought the device down to us to have a look at. After much head scratching, what we found was a highly demonic bug lurking in the "uncrypt" code.
Traditionally, the "/cache" partition is used to download OTA files before flashing them, but this update was much bigger than the small cache partition on the Bacon. The solution to this is to download it to the /data partition, but the /data partition is encrypted and you can't read it from the recovery system. Google came up with a clever solution called "uncrypt" which runs in Android before shutting down after downloading the OTA file. Uncrypt rewrites the decrypted file back into the raw partition and creates a "block map" which the recovery system can use to reconstruct the file without having to actually mount the encrypted storage. Sounds crazy, but it works well. Well, it does now.
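To make the block map idea concrete, here is a minimal Python sketch. It is not the actual uncrypt code or its on-disk format: the function names, the tiny 4-byte block size, and the in-memory "device" are all assumptions made for illustration. It shows the general technique of recording which raw blocks hold a file as a list of ranges, then rebuilding the file from the raw device using only those ranges, with no filesystem mounted.

```python
def write_blocks(device, blocks, block_size, data):
    """Place `data` into the given (possibly scattered) blocks of a raw device."""
    for i, b in enumerate(blocks):
        chunk = data[i * block_size:(i + 1) * block_size].ljust(block_size, b"\0")
        device[b * block_size:(b + 1) * block_size] = chunk


def build_block_map(blocks):
    """Collapse an ordered list of block numbers into [start, end) ranges."""
    ranges = []
    for b in blocks:
        if ranges and ranges[-1][1] == b:
            ranges[-1][1] = b + 1          # extend the current contiguous run
        else:
            ranges.append([b, b + 1])      # start a new run
    return [tuple(r) for r in ranges]


def read_via_block_map(device, ranges, block_size, file_size):
    """Rebuild the file from raw blocks, trimming padding in the last block."""
    out = bytearray()
    for start, end in ranges:
        out += device[start * block_size:end * block_size]
    return bytes(out[:file_size])


# Hypothetical toy setup: a 10-block "partition" with 4-byte blocks.
BLOCK_SIZE = 4
device = bytearray(b"." * (10 * BLOCK_SIZE))
payload = b"OTA-UPDATE"
file_blocks = [2, 3, 7]                    # where the file happens to live

write_blocks(device, file_blocks, BLOCK_SIZE, payload)
block_map = build_block_map(file_blocks)
restored = read_via_block_map(device, block_map, BLOCK_SIZE, len(payload))
```

The point of the sketch is that the reader needs only the raw device, the ranges, and the file size. If any range in the map is wrong, the reader silently picks up whatever bytes sit in the wrong blocks.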
After a bit more lost sleep, our engineers found an integer overflow in the uncrypt code which manifested when the /data partition was full enough. The overflow made the block map inaccurate, so recovery would fail to verify the downloaded file (what it read back was random encrypted data). Mystery solved!
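The post does not say exactly where the overflow was, so the following is only a hypothetical illustration of this class of bug, not the real uncrypt code. It assumes a byte offset computed as block number times block size in a signed 32-bit integer, and mimics two's-complement wraparound in Python (whose own integers never overflow). The helper names and the sample block numbers are made up.

```python
def to_int32(value):
    """Wrap a Python int the way a signed 32-bit integer typically wraps."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def byte_offset_32(block, block_size):
    """Offset computed in 32-bit arithmetic (the hypothetical buggy version)."""
    return to_int32(block * block_size)


def byte_offset_64(block, block_size):
    """Offset computed with enough width (the hypothetical fixed version)."""
    return block * block_size


BLOCK_SIZE = 4096
low_block = 1_000        # near the start of the partition
high_block = 600_000     # past the 2 GiB mark, reached only on a fuller partition

for block in (low_block, high_block):
    print(block, byte_offset_32(block, BLOCK_SIZE), byte_offset_64(block, BLOCK_SIZE))
```

Under these assumptions the two versions agree for blocks near the start of the partition and diverge only once the file lands far enough in. That would be consistent with a bug that shows up on well-used devices and never on freshly set up test units.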
We restarted the rollout today and plan to have it out to everyone by the end of the week.
We're also working hard to get 12.1 rolled out in the coming weeks which brings in the new changes from Android 5.1.1 as well as a few new features of our own like LiveDisplay and of course bugfixes.
Full thread:
https://forums.oneplus.net/threads/adventures-in-ota-land.306573/.