[Web] The traps of window.applicationCache－Foxbrush

From: http://alistapart.com/article/application-cache-is...

-----------------------------------------------------------------------------------------

Now, I don’t mean he’s useless or should be avoided, you just have to be very careful when and how you work with him. If you get it wrong, the douchebaggery oozes through onto the end user. By reading through my own painful experiences with AppCache, you’ll know what to expect from AppCache and how to deal with it.

When is offline access useful?

We’re better connected than we’ve ever been, but we’re not always connected. For instance, I’m writing this on a train hurtling through the data-barren plains of West Sussex. Alternatively, you may choose to be offline. When I use data abroad I can almost hear the champagne corks popping in the offices of my network provider. I know I’m haemorrhaging money when I data-roam, but the internet has the data I need, and it won’t let me at most of it without a working connection.

Sites that are useful offline generally fall into two categories, ones that let you do stuff and ones that let you look stuff up.

“Look stuff up” sites include Wikipedia, YouTube, and Twitter. The heavy lifting tends to be on the server and there’s a large amount of data available but users only use a fraction of it.

“Do stuff” sites include Cut the Rope, CSS Lint, and Google Docs. With these sites the heavy lifting tends to be done on the client. They offer a limited amount of data, but you can use that data in multiple ways, or create your own data. This is the case Application Cache was designed for, so we’ll look at that first for a nice easy introduction.

Offlining a “do stuff” site

Yes, that’s right, I verbified “offline.” Yes, I verbified “verb.” Feel free to inbox me grammar complaints that I’ll trashinate.

Sprite Cow fits into the “do stuff” box. CSS sprites are great for performance, but finding out the size and position of an item in the sprite sheet can be fiddly. Sprite Cow loads sprite sheets, and spits out the CSS to display a particular portion of it. It’s one html file, a few assets, and all the processing is done on the client; the server does nothing except serve files.

It would be nice to be able to use Sprite Cow on the train, as we can with native apps. To do this, we create a manifest file listing all the assets the site needs:

CACHE MANIFEST
assets/6/script/mainmin.js
assets/6/style/mainmin.css
assets/6/style/fonts/pro.ttf
assets/6/style/imgs/sprites1.png

...then link that manifest to the html page via an attribute:

<html manifest="offline.appcache">

The HTML page itself isn’t listed in the manifest. Pages that associate with a manifest become part of it.

In practice, if you visit Sprite Cow with a data connection, you’ll be able to visit it subsequently without one.

The resource tab in Chrome’s Web Inspector will show you the files picked up by the manifest, and which page pointed to it. If you need to clean these caches, seechrome://appcache-internals/.

At first, this can seem like a magic bullet to the problem. Unfortunately I was lying when I said this would be an easy introduction. The ApplicationCache spec is like an onion: it has many many layers, and as you peel through them you’ll be reduced to tears.

GOTCHA #1: FILES ALWAYS COME FROM THE APPLICATIONCACHE, EVEN IF YOU’RE ONLINE

When you visit Sprite Cow, you’ll instantly get the version from your cache. Once the page has finished rendering, the browser will look for updates to the manifest and cached files.

This sounds like a bizarre way of doing things, but it means the browser doesn’t have to wait for connections to time out before deciding you’re offline.

The ApplicationCache fires an updateready event to let us know there’s updated content, but we can’t simply refresh the page at this point, because the user may have already interacted with the version they already have.

This isn’t a big deal here, because the old version is probably good enough. If need be, we can display a little “An update is available. Refresh to update” message. You may have seen this on Google apps such as Reader and Gmail.

Oh, remember four paragraphs ago when I said that ApplicationCache looks for updated content after rendering the page? I lied.

GOTCHA #2: THE APPLICATIONCACHE ONLY UPDATES IF THE CONTENT OF THE MANIFEST ITSELF HAS CHANGED

HTTP already has a caching model. Each file can define how it should be cached. Even at a basic level, individual files can say “never cache me,” or “check with the server, it’ll tell you if there’s an update,” or “assume I’m good until 1st April 2022.”

However, imagine you had 50 html pages in your manifest. Each time you visit any of them while online, the browser would have to make 50 http requests to see if they need to be updated.

As a slightly unusual workaround, the browser will only look for updates to files listed in the manifest if the manifest file itself has changed since the browser last checked. Any change that makes the manifest byte-for-byte different will do.

This works pretty transparently for static assets which are ideally served via a content delivery network and never change. When the CSS/JavaScript/etc., changes it’s served under a different url, which means the content of the manifest changes. If you’re unfamiliar with far-future caching and CDNs, check out Yahoo!’s performance best-practice guide.

Some resources cannot simply change their url like this, such as our HTML pages for instance. ApplicationCache won’t look for updates to these files without a friendly nudge. The simplest way to do this is to add a comment to your manifest and change that when necessary.

CACHE MANIFEST
# v1whatever.html

Comments start with # in manifests. If I update whatever.html I’d change my comment to # v2, triggering an update. You could automate this by having a build script output something similar to ETAGs for each file in the manifest as a comment, so each file change is certain to change the content of the manifest.

However, updating the text in the manifest doesn’t guarantee the resources within will be updated from the server.

GOTCHA #3: THE APPLICATIONCACHE IS AN ADDITIONAL CACHE, NOT AT ALTERNATIVE ONE

When the browser updates the ApplicationCache, it requests urls as it usually would. It obeys regular caching instructions: If an item’s header says “assume I’m good until 1st April 2022” the browser will assume that resource is indeed good until 1st April 2022, and won’t trouble the server for an update.

This is a good thing, because you can use it to cut down the number of requests the browser needs to make when the manifest changes.

This can catch people out while they’re experimenting if their servers don’t serve cache headers. Without specifics, the browser will take a guess at the caching. You could update whatever.html and the manifest, but the browser won’t update the file because it’ll “guess” that it doesn’t need updating.

All files you serve should have cache headers and this is especially important for everything in your manifest and the manifest itself. If a file is very likely to update, it should be served with no-cache. If the file changes infrequently must-revalidate is a better bet. For example, must-revalidate is a good choice for the manifest file itself. Oh, while we’re on the subject…

GOTCHA #4: NEVER EVER EVER FAR-FUTURE CACHE THE MANIFEST

You might think you can treat your manifest as a static file, as in “assume I’m good until 1st April 2022,” then change the url to the manifest when you need to make updates.

No! Don’t do that! *slap*

Remember Gotcha #1: When the user visits a page a second time they’ll get the ApplicationCached version. If you’ve changed the url to the manifest, bad luck, the user’s cached version still points at the old manifest, which is obviously byte-for-byte the same so nothing updates, ever.

GOTCHA #5: NON-CACHED RESOURCES WILL NOT LOAD ON A CACHED PAGE

If you cache index.html but not cat.jpg, that image will not display on index.htmleven if you’re online. No, really, that’s intended behaviour, see for yourself.

To disable this behaviour, use the NETWORK section of the manifest

CACHE MANIFEST
# v1index.htmlNETWORK:
*

The * indicates that the browser should allow all connections to non-cached resources from a cached page. Here, you can see it applied to the previous example. Obviously, these connections will still fail while offline.

Well done; you made it through the simple playing-to-its-strengths ApplicationCache example. Yes, really, that’s the simple case. Sorry. Let’s try something a bit tougher.

Offlinerifying a “look up stuff” site

As I mentioned at the start of this article (Remember how much happier you were back then?), Lanyrd recently launched a mobile website so people could look up conference schedules, locations, attendees, etc. Offline access to this data is important while you’re travelling and faced with data roaming charges.

There’s too much content to offlinificate everything, but a single user is generally only interested in events they’re participating in.

The late Dive into HTML5 gives us an example of how you could offlinerize Wikipedia, another “look up stuff” site. It works by having an almost-empty manifest which every page links to, so as users navigate around the site, those pages implicitly become part of their cache. While offline, they’ll be able to visit any of the pages they previously visited.

That solution is brilliantly simple, but thanks to a few “lumpy bits” in the specification, it’s completely disastrous. For starters, the user isn’t given any indication of which content is available while they’re offline, and there isn’t a JavaScript API we could use to get at that information. We could download and parse the manifest with JavaScript, but all these Wikipedia pages are implicitly cached, so they aren’t listed.

Furthermore, remember Gotcha #1: the cached version will be shown rather than a version from the server. The page will be frozen for the user when they first look at it, but as we found in Gotcha #2, we can trigger the browser to look for updates by changing the text in the manifest file. However, when do we change the manifest file? Whenever a Wikipedia entry is updated? That would be way too frequent, in fact if a manifest changes between starting and ending an update, the browser will consider this an update failure (step 24).

The frequency of these updates is a problem, but it’s the weight of these updates that’s the real killer. Consider the number of Wikipedia pages you’ve browsed—hundreds? thousands? An AppCache update would involve downloading every single one of those pages. AppCache doesn’t give us a way to remove implicitly cached items, so that number is going to keep growing and growing until it hits some kind of browser cache limit and the world explodes. That’s not great.

What do we want from an offlinable reference site?

My requirements for a reference site with offline capabilities are thus. It must:

show up-to-date data while online, as it would without ApplicationCache,
allow us (the developers) to control which content is cached, when it’s cached, and how it’s cached,
allow us to defer some of that control to the users, perhaps in the form of a “Save offline” or “Read later” button, and
a single visit to any page must give the browser what it needs to show content offline.

ApplicationCache, for all its bragging, doesn’t make this easy. If a page has a manifest, it becomes cached. You can’t have a page tell the browser about offline capabilities without that page being cached itself.

Limiting the reach of the ApplicationCache

The easiest way to make a page behave as it would without ApplicationCache is to serve it without a manifest. We can tell the browser about the offline stuff by injecting a hidden iframe pointing to a page that does have a manifest.

Let’s give that a spin. Visit this page while online. Once you’ve done that you’ll be able to visit this page while offline. The first page isn’t part of the cache, so the user will always get the most up-to-date information from it.

That’s not very impressive though. If you visit the first page while offline, you’ll get nothing.

Falling back

The ApplicationCache manifest lets us specify a fallback resource to use when another request fails.

CACHE MANIFEST
FALLBACK:
/ fallback.html
/assets/imgs/avatars/ assets/imgs/avatars/default-v1.png

This tells the ApplicationCache to display fallback.html whenever a request fails, unless the request fails within /assets/imgs/avatars/, in which case a fallback image will be used.

To see how this works in practice, visit this page. Visit it again without a network connection, and a fallback page will be shown instead. Notice how it isn’t a hard redirect? The url for the original page remains, and this will be useful.

Incidentally, having a fallback relaxes the network blocking rules we encountered in Gotcha #5. Now connections are allowed within the same domain, but you’ll still need to use the network wildcard for connections to other domains.

Now we’re getting somewhere—we’re showing a cached page only if the regular page didn’t succeed.

Using ApplicationCache for static content only

By “static content,” I mean content that never changes. Images, scripts, styles, and our fallback page.

CACHE MANIFEST
js/script-v1.js
css/style-v1.css
img/logo-v1.png
FALLBACK:
/ fallback/v1.html
/imgs/avatars/ imgs/avatars/default-v1.png

If we needed to make a change to the JavaScript, we’d upload a new file at a new url,script-v2.js, for instance.

This works around Gotchas #1 and #2: The user will never be served an out-of-date script or style because it’ll have a fresh url when it changes. We don’t have to deal with comment-based version numbers in the manifest because the text change in the url is enough to trigger an update. All resources will have a far-future cache, so only changed files will need an http request to update.

GOTCHA #6: BYE BYE CONDITIONAL DOWNLOADS

Oh come on, you didn’t think you’d get though a whole article without a reference to responsive design did you?

Do you have two sets of design images? Is one of them much smaller and lighter for people viewing on mobile devices? Do you use media queries to decide which of these to display? Well, ApplicationCache hates you and your family.

All images required to render your site go in the manifest and the browser downloads all of these. In the case of responsive images, the user ends up downloading both versions of the same asset. This defeats the point. Just use desktop-resolution imagery and resize it down on the client using CSS background size.

If the mobile version has a completely different design, at least put them into a sprite sheet along with the “high res” graphics so they can benefit from palette-based png compression together.

The same rule is true of fonts. I saw some idiot recommending using lots of font formats, which is all well and good for regular sites, but we can’t have all those in the manifest. For offline use, go with True Type Fonts (TTF) only. “Hey, isn’t Web Open Font Format (WOFF) the future?” Yeah, probably, but only for legal reasons. There’s no technical benefit to WOFF over TTF. Ok, WOFF has built-in compression but it’s no better than gzipping a TTF. Also, WOFF isn’t supported by the older versions of many browsers, whereas TTF support extends much further.

Anyway, back to Application Cache.

Using LocalStorage for dynamic offlinerification

We can’t offlinerify all our content because there’s too much. We want the user to pick what they want to have available offline. We can use LocalStorage to store that data.

Yes, LocalStorage is just a shelf, but it’s an extremely useful shelf that’s very simple to use. You can put whatever text data you want in there and get it back later, from any page on the same domain.

LocalStorage is stored on disk, so using it is cheap but not free. Because of this, we should keep the number of reads and writes down and we don’t want to be reading and writing more than we have to per read/write.

We’re going to use an entry for each page we store, and an additional entry to keep track of what we’ve saved offline, along with their page titles. This means we can list all the pages we have cached with one read, and display a particular page with two.

So, to save the page articles/1.html for offline use, we do this:

// Get the page content
var pageHtml = document.body.innerHTML;
var urlPath = location.pathName;
// Save it in local storage
localStorage.setItem( urlPath, pageHtml );
// Get our index
var index = JSON.parse( localStorage.getItem( index' ) );
// Set the title and save it
index[ urlPath ] = document.title;
localStorage.setItem( 'index', JSON.stringify( index ) );

Then, if the user visits articles/1.html without a connection, they’ll get fallback.html, which does the following:

var pageHtml = localStorage.getItem( urlPath );
if ( !pageHtml ) {
 document.body.innerHTML = 'Page not available';

}

else {

 document.body.innerHTML = localStorage.getItem( urlPath );

 document.title = localStorage.getItem( 'index' )[ urlPath ];

}

We can also iterate over localStorage.getItem( ‘index’ ) to get details on all the pages the user has cached.

Putting it all together

Here’s a demo of the above in action. The article pages can be cached via a button in the top-right of the page, and the index page will indicate which pages are available offline.

All of the code is available on GitHub. Any page can be cached with a call tooffliner.cacheCurrentPage(). This is called on every visit to the index page, and on every visit to a page the user wishes to be cached.

If the user ends up on the fallback page, offliner.renderCurrentPage() is called, which renders the intended page. If we don’t have a page to show, an error message is displayed. Oh, that reminds me…

GOTCHA #7: WE DON’T KNOW HOW WE GOT TO THE FALLBACK PAGE

When we can’t display a particular page, the error we show is rather vague. According to the spec, the fallback page is shown if the original request results in “a redirect to a resource with another origin (indicative of a captive portal), or a 4xx or 5xx status code or equivalent, or if there were network errors (but not if the user cancelled the download).”

This is good in some ways. If the user is online but our site goes down, their browser will just show cached data and the user might not even notice! Unfortunately, we’re not given access to the reason for the fallback. It could be because the user has no connection, it could be that they’ve followed a broken url or mistyped it, or we could have a server fault. We just don’t know.

Oh, did you spot that little bit about redirects?

GOTCHA #8: REDIRECTS TO OTHER DOMAINS ARE TREATED AS FAILURES

Yes, that’s right, another gotcha. I’m surprised you’re still reading. If you want to go and lock yourself in a toilet cubicle and refuse to come out until the internet has gone away, I’d totally understand.

If one of your urls decides it needs to redirect to Twitter or Facebook to do some authentication, our friendly Application Cache will decide that’s NOT ALLOWED and show our fallback page instead.

This rule has good intentions. If the user tries to visit your site and the wifi they’re using redirects them to http://rubbish-network/pay-for-wifi-access, showing our site’s fallback page instead is great.

In the case of intended spontaneous auth redirects, white-listing these in the NETWORKsection doesn’t work. Instead, you’ll have to use a JavaScript or a meta-redirect. Urgh.

Downsides to the LocalStorage approach

“Oh, we’re onto the downsides now? And what exactly was the rest of the article?” Yes, I know, please don’t hit me. There are some disadvantages over the plain ApplicationCache solution.

JavaScript is required, whereas Sprite Cow’s use of ApplicationCache isn’t JavaScript-dependent (although the rest of the site is). I’m going to stick my neck out and say there’ll be very few users with ApplicationCache support but no JavaScript support. That’s not to say no-JavaScript support isn’t important. Lanyrd’s mobile site works fine without JavaScript. In fact, we avoid parsing JavaScript on older devices to keep things simple and quick.

The experience on a temperamental connection isn’t great. The problem withFALLBACK is that the original connection needs to fail before any falling-back can happen, which could take a while if the connection comes and goes. In this case, Gotcha #1 (files always come from the ApplicationCache) is pretty useful.

It doesn’t work in Opera. Opera doesn’t support FALLBACK sections in manifests properly. Hopefully they’ll fix this soon.

What does m.lanyrd.com do differently?

The demo I showed earlier is a simplification of what we do at Lanyrd. Instead of storing the page’s HTML for offline use, we store JSON data in LocalStorage and templates for that data in ApplicationCache. This means we can update the templates independently of the data, and use one set of data in multiple templates.

Our JSON data is versioned per conference. These version numbers are checked when you navigate around the site. If they don’t match what you have stored, an update is downloaded. This means you only make a request for updated data when there’s an update.

Rather than provide a button to let the user store a particular page offline, we cache data for events the user is tracking or attending. As a result, the server knows what the user wants offline, so if they change devices or lose their cache somehow, we can quickly repopulate it.

Switching pages is done via XMLHttpRequest and pushState. This is much faster on mobile devices as it doesn’t have to reparse the JavaScript on each page load, and it makes it feel a bit more like an app than a site.

Oh, go on, for old times sake…

GOTCHA #9: AN EXTRA HOOP TO JUMP THROUGH FOR XHR

You can make XHR requests to cached resources while offline, unfortunately older versions of WebKit finish the request with a statusCode of 0, which popular libraries interpret as a failure.

// In jQuery…
$.ajax( url ).done( function(response) {
 // Hooray, it worked!
}).fail( function(response) {
 // Unfortunately, some requests
 // that worked end up here
});

In the wild, you’ll see this on the Blackberry Playbook, and devices that use iOS3 and Android 3/4. Android 2’s browser doesn’t have this bug. Oddly, it seems to run a newer version of WebKit. It also supports history.pushState whereas the browsers on later versions of Android do not. FANKS ANDROID. Here’s how you work around this issue:

$.ajax( url ).always( function(response) {
 // Exit if this request was deliberately aborted
 if (response.statusText === 'abort') { return; } // Does this smell like an error?
 if (response.responseText !== undefined) {
  if (response.responseText && response.status < 400) {
   // Not a real error, recover the content    resp
  }
  else {
   // This is a proper error, deal with it
   return;
  }
 } // do something with 'response'
});

And here’s a demo where you can test that out.

ApplicationCache: your friendly douchebag

I’m not saying that ApplicationCache should be avoided, it’s extremely useful. We all know someone who talks themselves up, or needs “observing” more than others in case they do something really stupid. ApplicationCache is one of those people.

ApplicationCache can, under careful instruction, do stuff no one else can. But when he says “You don’t need to hire a plumber! I’ll fit your bathroom for you! I did all the bathrooms in Buckingham Palace, y’see…” turn him down gently.

If you’re creating anything more complicated than a self-contained client-side “app,” you’ll have happier times using it to an absolute minimum and getting LocalStorage to do the rest.

Foxbrush

Foxbrush 發表在痞客邦留言(0) 人氣()

E-mail轉寄

Foxbrush

Where I Record Myself