
For us, the largest benefit of Javascript templating is reduced size

There are quite a few Javascript templating libraries. In my projects, however, there are very few cases where I would prefer using any of them in place of regular HTML pushed out from the server (running Ruby-on-Rails). The same can be said of the Ponzu conference system.

As far as I understand, the benefits of using Javascript templates are 1) reducing the load on the server (generating JSON is less work than generating full HTML), and 2) speed, when used in combination with a single-page design.

The downside is the additional work that browsers have to do, which can be a problem on mobile where the devices are not as powerful as their desktop counterparts.

I’ve touched on this subject before in these two posts: [1](https://code.castle104.com/?p=289), [2](https://code.castle104.com/?p=291).

As discussed by David Heinemeier Hansson, the same benefits can be achieved without Javascript templates by using a PJAX/Turbolinks/Kamishibai-like system that eliminates reloading of Javascript and CSS on each page transition, combined with aggressive caching on the server side to reduce the cost of HTML generation.

There is one real case, however, where I feel a strong need for a Javascript templating language.

That is when I try to cache responses in the browser-side cache. The issue is that HTML is extremely verbose, which is a real killer in terms of storage consumption when you are working with repetitive content.

For example, the following is a “social box” that we use in Ponzu for the like button and a voting button. It takes about 2,000 bytes. Each social box is associated with a presentation, so we have hundreds to thousands of social boxes for each conference. This can easily fill up the limited browser-side cache.

<div class="" id="presentation_326_social_box"><div class='like_box'>
<div class='like' id='like_button_326'>
<span class='social_controls'>
<!-- To invalidate Like related paths, we need a like object -->
<a href="/likes?like%5Bpresentation_id%5D=326" class="button icon like" rel="nofollow">like</a>
<div class='prompt_message'>
(
To add to your schedule, please &quot;like&quot; it first.
)
</div>
</span>
<div class='social_stats'>
<img alt="Like" src="/assets/like-c3719a03fc7b33c23fda846c3ccfb175.png" title="いいね!を押すと、応援メッセージになります。またあなたのスケジュールに登録されます。" />
<a href="/presentations/326/likes?user=1">15 people</a>
liked this.
</div>
<div class='likes_list' id='likes_presentation_326'></div>
</div>

</div>
<div class='vote_box'>
<div class='like' id='vote_button_326'>
<span class='social_controls'>
<form accept-charset="UTF-8" action="/likes/vote" class="new_like" id="presentation_326_new_like" method="post"><div style="margin:0;padding:0"></div>


<span>

<label for="presentation_326_like_score_1">Excellent!</label>
</span>
<span>

<label for="presentation_326_like_score_2">Unique!</label>
</span>
<span>

<label for="presentation_326_like_score_0">No Vote</label>
</span>
</form>
</span>
</div>

</div>
</div><div id="session_details_presentation_326"></div>

Most of this content is repetitive and will be identical for each “social_box”. In fact, the content that is unique to each individual social box can be summarized in the following JSON.

[{
    "presentation_id": 326,
    "user_id": 1,
    "score": 0,
    "liked": 0,
    "scheduled": 0
}]

If we could use Javascript templating to generate the 2,000-byte HTML from this small set of JSON, the local storage savings would be huge.
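
As a rough sketch of what this could look like (this is not the actual Ponzu/Kamishibai template, and the markup is heavily abbreviated), a small Javascript template function could expand the JSON record above back into social box markup on the client:

// Illustrative only: the real social box markup is much larger, but
// everything except these few values is identical for every box.
function renderSocialBox(record) {
  var id = record.presentation_id;
  return '<div id="presentation_' + id + '_social_box">' +
           '<div class="like_box">' +
             '<div class="like" id="like_button_' + id + '">' +
               '<a href="/likes?like%5Bpresentation_id%5D=' + id + '" class="button icon like">like</a>' +
               '<a href="/presentations/' + id + '/likes?user=' + record.user_id + '">likes</a>' +
             '</div>' +
           '</div>' +
         '</div>';
}

// Only the small JSON record needs to be cached locally; the verbose HTML
// is regenerated on demand.
var html = renderSocialBox({ presentation_id: 326, user_id: 1, score: 0, liked: 0, scheduled: 0 });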

This is one feature that we will be adding to Kamishibai and Ponzu in the near future, to enable the ultimate goal of complete offline viewing.

On HTTP caching

Kamishibai provides support for storing Ajax responses on the client using either localStorage or WebSQL (with IndexedDB support planned for the future). This enables us to dramatically speed up page loads by retrieving responses from local storage instead of sending requests out to the server. It also allows us to provide offline access to pages.

HTTP itself provides a caching protocol that allows the server to control the browser cache through the “Cache-Control”, “Expires”, “Last-Modified”, “If-Modified-Since”, “ETag” and “If-None-Match” headers. Ruby-on-Rails also provides methods that allow us to easily manage these headers.

The question is why we created our own caching mechanism for Kamishibai instead of using HTTP cache. In the following, I hope to provide the answer.

Many Ajax requests are Private Content

The following is an excerpt from the above link, describing use-cases for HTTP cache.

Private content (i.e. that which can be considered sensitive and subject to security measures) requires even more assessment. Not only do you as the developer need to determine the cacheability of a particular resource, but you also need to consider the impact of having intermediary caches (such as web proxies) caching the files, which may be outside of the user’s control. If in doubt, it is a safe option to not cache these items at all.

Should end-client caching still be desirable, you can ask for resources to only be cached privately (i.e. only within the end-user’s browser cache).

In Ponzu, our scientific conference information system with social network features, a lot of the content is “Private content”. For example, we generally only show the abstract text to conference participants (non-participants can only view the presentation titles and authors). Hence Ajax requests for presentation detail pages cannot be handled with HTTP cache.

URLs alone are not sufficient as the cache key

HTTP caching uses only the URL as the cache key. If the content changes depending on values in cookies, then HTTP caching doesn’t work.

However, with Ponzu, we use cookies to store the current user id and also the locale. We display slightly different content depending on the privileges of each user, and we also provide different translations. We do all this while keeping the URL the same. Keeping the URL the same is important to maximize social sharing.

Hence in Ponzu, URLs alone are not sufficient as the cache key.
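
A minimal sketch of what this implies for the cache key (the cookie names and key format here are assumptions for illustration, not Kamishibai’s actual scheme): the key must incorporate everything the response depends on, not just the URL.

// Read a cookie value by name.
function readCookie(name) {
  var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
  return match ? decodeURIComponent(match[1]) : null;
}

// Build a cache key from the URL plus the user id and locale stored in
// cookies, so the same URL can cache different content per user and locale.
function cacheKey(url) {
  return url + '|user:' + readCookie('user_id') + '|locale:' + readCookie('locale');
}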

Flexible purging of HTTP cache is not possible

HTTP cache does not allow flexible purging. For example, if you set “max-age” to a large value (e.g. a few days), then you cannot touch the browser’s cache until it has expired. If you suddenly have an emergency notification that you need to put up, you can’t do it. You have to wait until the cache expires, whatever happens.

With Ponzu, we want to set very long cache expiry dates to maximize browsing speed. On the other hand, we want to be able to flush the cache when an emergency arises. An emergency might be an urgent notification, but it might also be a bug.

Hence HTTP cache is not particularly suitable, and we would not want to set long expiry times with it unless we were extra sure that the content would not change.
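
With a browser-side cache under our own control, an emergency purge is straightforward. A minimal sketch, assuming cached responses live in localStorage under a common key prefix (the prefix is an assumption, not Kamishibai’s actual naming):

// Flush every cached response, e.g. when an urgent notification or a bug fix
// must reach users before their cached pages would normally expire.
function flushCachedResponses() {
  var doomed = [];
  for (var i = 0; i < localStorage.length; i++) {
    var key = localStorage.key(i);
    if (key.indexOf('cache:') === 0) { doomed.push(key); } // 'cache:' prefix is illustrative
  }
  for (var j = 0; j < doomed.length; j++) {
    localStorage.removeItem(doomed[j]);
  }
}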

Summary

As we can see, HTTP cache is not suitable for the majority of Ajax requests (HTML requests) in Ponzu. Although we use it to serve assets through the Ruby-on-Rails asset pipeline, we don’t use it for dynamically created content at all. Instead, we use Kamishibai caching, which provides more flexibility.

Using local data storage space efficiently

As described in my previous post, “Choosing the browser-side data storage API”, storing data locally is tricky. Despite the work that is being done for data storage in HTML5, there still isn’t a good storage option that allows us to easily scale from a small data set to a large one. Storage limitations differ with each browser and we cannot rely on them to provide us with sufficient space.

Therefore, it is important that we use the space that we have as efficiently as possible.

In Kamishibai, we store HTML fragments as-is in local storage. The size of the HTML fragments ranges from a whole page to a small “like” box. Breaking up a page into multiple HTML fragments is advantageous for effective cache management (each fragment might require a different expiry date) and would promote code reuse. However, multiple HTML fragments mean multiple HTTP requests. Downloading the page as a single fragment is the most efficient in terms of HTTP requests.

This means that we have to strike a compromise. Either we choose to optimize cache management by breaking pages up into small HTML fragments, or we optimize for network speed with larger HTML fragments.

The choice depends on how many pages we want to store locally, and the level of redundancy in each page. If we want to store a large number of pages, then it will be more efficient to use small fragments. Especially if the level of redundancy is high, we will be able to reuse small fragments effectively. Hence the choice will tend towards smaller fragments in conference systems.

Let’s look at the level of redundancy in conference systems.

The main pages in a Ponzu conference system are;

  1. A list of sessions.
  2. A list of presentations within a session.
  3. A presentation page (with a list of related presentations).
  4. A list of presentations in the search results.

A lot of the pages show a list of presentations. It therefore makes a lot of sense to store each element (title, authors and author affiliations of a presentation) separately so that we can construct different lists simply by combining elements.

Furthermore, the title, authors and author-affiliations section of each presentation is quite large. In addition to the text, the authors section is composed of links to user profile pages, and we additionally have superscript markup. The markup is significant, and we often see more than 1,000 characters per presentation for the heading alone (excluding the abstract text).

In MBSJ2012, we did not break up lists into fragments. In addition to the large size, we also observed that rendering took a long time when the number of presentations in a list was large. Rendering the author list required a lot of string manipulation and often triggered garbage collection, resulting in long response times (several hundred milliseconds).

In future versions of Ponzu and Kamishibai, we will break up presentation lists. Each presentation will have a long expiry time so that the version cached inside the browser will be used. Additionally, we will use caching on the server. Our current tests show that this should improve responsiveness in most cases.
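
As a rough sketch of what storing a fragment with its own expiry time might look like (the key names and record format are assumptions for illustration, not the actual Kamishibai code):

// Store an HTML fragment together with its own expiry time.
function storeFragment(key, html, maxAgeSeconds) {
  var record = { html: html, expires: Date.now() + maxAgeSeconds * 1000 };
  localStorage.setItem(key, JSON.stringify(record));
}

// Return the cached fragment, or null if it is missing or has expired.
function readFragment(key) {
  var raw = localStorage.getItem(key);
  if (!raw) { return null; }
  var record = JSON.parse(raw);
  return record.expires > Date.now() ? record.html : null;
}

// For example, a presentation heading could be cached for a week while a
// more volatile fragment gets a much shorter expiry.
// storeFragment('fragment:/presentations/3366/heading', headingHtml, 7 * 24 * 3600);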

Choosing the browser-side data storage API

Kamishibai has limited offline support at present, and we plan to bring it to full support within 2013. By full support, we mean that even though Ponzu is a web page, we want to make the vast majority of pages available even when offline. The reason we take this very seriously is that WiFi at conferences tends to be very sketchy and temperamental.

Choosing the data storage API

Although HTML5 has support for offline applications, it is still very much in its infancy.

The Application Cache API is relatively well supported on all new browsers (> IE9). However, this API, while easy to use for the most basic cases, tends to be difficult for more advanced uses. Most programmers reserve it for static assets.

To store each presentation, session and user profile, we need a more flexible cache. Hence, in these cases, we use a data storage API that allows us to create, update and remove records programmatically and at will (the Application Cache API can only update all entries or none). The APIs for this are not yet well developed: we have the localStorage, webSQL and indexedDB APIs.

localStorage is the simplest to use but also has the fewest features. The storage capacity is also rather small, and no mainstream browser allows more than 5MBytes of storage.

webSQL is basically a sqlite3 database wrapped in Javascript. Since it can use SQL statements, there is a lot that you can do with it. Unfortunately, it has been abandoned by the W3C working group because it is too tied to the sqlite3 implementation. webSQL can be expanded to 50MBytes of storage even on mobile devices, and it is well supported on both iOS and Android. The Safari and Chrome desktop browsers also support this API.

indexedDB is the replacement for webSQL. It is currently supported on IE10, Chrome and Firefox. Safari does not support it on either desktop or iOS, and neither does the stock Android browser. It is the newest standard and will most likely become the standard in the future, but it will take several years before it becomes mainstream enough.

We ended up combining localStorage and webSQL for Kamishibai. Our rationale is discussed below. In the future, we plan to add indexedDB support.

webSQL looks great on features, but falls short on some important details

With the full power of an SQL database and broad support on mobile devices, webSQL looked like the prime candidate for Kamishibai data storage. Furthermore, since the 5MByte storage limit of localStorage would most likely be insufficient for large conferences, we initially planned to go 100% webSQL. However, we hit some snags.

First of all, when we use the openDatabase command to create a new webSQL database on Safari, any database with an initial size exceeding 5MBytes brings up a modal dialog asking the user for confirmation. This is perfectly OK if the user is already familiar with the Ponzu system. However, a large number of people would be wary and alarmed. They might opt not to allow access; more seriously, they might be deterred from visiting the site altogether.
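
For reference, the size is requested up front in the openDatabase call, and it is this estimate that triggers the dialog (the database name and sizes below are just examples):

// Asking for 50MB up front: on Safari this immediately shows a modal dialog
// asking the user to allow the extra storage, before any data is written.
var db = openDatabase('ponzu', '1.0', 'Ponzu offline data', 50 * 1024 * 1024);

// Staying at or below 5MB avoids the dialog, but is far too small for a
// full conference program.
var smallDb = openDatabase('ponzu_small', '1.0', 'Ponzu offline data', 5 * 1024 * 1024);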

Memcached-like LRU (least recently used) management

Since localStorage can only store 5MBytes and WebSQL puts up a modal dialog to confirm usage of more than 5MB, we have to consider how we are going to manage with storage that is smaller than the conference scientific program (it will easily exceed 10MBytes).

One idea is to purge entries that are infrequently accessed. A good scheme is the LRU management in memcached. The idea is to detect when the storage quota has been exceeded, and then to delete old entries to make space. Old entries are those that have not been recently accessed.

Implementing an LRU-type scheme is possible in localStorage. localStorage throws a QUOTA_EXCEEDED_ERR when we try to store more than the quota (5MB). We can catch this error, delete old entries, and then retry.
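
A minimal sketch of that idea, assuming last-access times are kept in a separate index entry (the key names and bookkeeping are illustrative, not the exact Kamishibai implementation):

// Record the last access time of a key in a separate index entry.
function touch(key) {
  var index = JSON.parse(localStorage.getItem('lru_index') || '{}');
  index[key] = Date.now();
  localStorage.setItem('lru_index', JSON.stringify(index));
}

// Remove the `count` least recently used entries to make space.
function evictOldest(count) {
  var index = JSON.parse(localStorage.getItem('lru_index') || '{}');
  var keys = Object.keys(index).sort(function (a, b) { return index[a] - index[b]; });
  for (var i = 0; i < count && i < keys.length; i++) {
    localStorage.removeItem(keys[i]);
    delete index[keys[i]];
  }
  localStorage.setItem('lru_index', JSON.stringify(index));
}

// Store a value; on a quota error, evict old entries and retry once.
function lruSet(key, value) {
  try {
    localStorage.setItem(key, value);
    touch(key);
  } catch (e) {
    if (e.name === 'QuotaExceededError' || e.code === 22) { // QUOTA_EXCEEDED_ERR
      evictOldest(10);
      localStorage.setItem(key, value); // a second failure is left to the caller
      touch(key);
    } else {
      throw e;
    }
  }
}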

However, webSQL, at least on Safari, is different. A quota exceeded error does fire, but only after the modal dialog has been shown. This means that we cannot transparently delete old entries, which makes it very difficult to use webSQL as an LRU cache. Hence webSQL is good if we can get the user to agree to a large quota (like 50MB), but using it within the 5MB limit is very difficult if the user rejects the request.

Final scheme

Our final scheme is to start with localStorage, managed with the LRU-based scheme described above. Since we will not be able to store the whole scientific program, this is basically a history-like cache.

We will also provide a “store data locally” button that the user can click to download the whole scientific program. This will use 50MB of webSQL storage. A modal dialog will pop up, but the user is expected to approve it. If they don’t, then we fall back to localStorage. If the user uses webSQL, then we have no further need for localStorage, and we work 100% in webSQL.

Why Kamishibai uses hash-bangs for navigation

Kamishibai URLs look like this:

http://mbsj2012.castle104.com/ja#!_/ja/presentations/3366

We are using hash-bangs, which are hated by some. I’d like to outline why we use them despite their unpopularity.

  1. The most important reason is that we want to allow offline viewing. We thought over many alternative ways to do this, but our conclusion was that we needed a boot loader that we could keep in Application Cache. In the above URL, http://mbsj2012.castle104.com/ja is the bootloader; /ja/presentations/3366 is where the content resides.
  2. The stock Android browser for version 4.0 and above does not support popState. Neither do IE9 and below.

Obviously, the first reason is the most important for conference systems, because the network connection is generally very unreliable. The only solution for offline apps without a hash-bang scheme is the one outlined by Jake Archibald in A List Apart, and it’s a good solution. However, as Jake himself admits:

The experience on a temperamental connection isn’t great. The problem with FALLBACK is that the original connection needs to fail before any falling-back can happen, which could take a while if the connection comes and goes. In this case, Gotcha #1 (files always come from the ApplicationCache) is pretty useful.

Unfortunately, a temperamental connection is exactly what you get at conferences. We need a solution that allows us to programmatically define when to FALLBACK using either a timeout or the results of previous requests.
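
A rough sketch of the kind of programmatic fallback we have in mind, using a timeout on the Ajax request (readFromLocalCache is a hypothetical stand-in for Kamishibai’s storage lookup, and the 5-second timeout is arbitrary):

// Try the network first, but give up after a few seconds and render the
// locally cached copy instead of waiting for a flaky connection to fail.
function loadWithFallback(url, render) {
  var xhr = new XMLHttpRequest();
  var settled = false;
  var timer = setTimeout(function () {
    if (!settled) {
      settled = true;
      xhr.abort();
      render(readFromLocalCache(url)); // hypothetical cache lookup
    }
  }, 5000);

  xhr.open('GET', url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && !settled) {
      settled = true;
      clearTimeout(timer);
      if (xhr.status === 200) {
        render(xhr.responseText);
      } else {
        render(readFromLocalCache(url)); // fall back if the request failed
      }
    }
  };
  xhr.send();
}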

Below are some of our thoughts about the hash-bang and how we are providing solutions to its ill effects.

  1. We think that the hash-bang is necessary to provide a better experience to users. Offline viewing is the most important but there are other smaller things.
  2. The argument against hash-bangs is mostly about the nature of the web and crawlability. Our solution is to provide a mobile website that uses regular URLs. Crawlers don’t have to come to our sophisticated, Javascript-heavy website. We make sure that the mobile website contains all important information and that URLs for the mobile website are automatically converted to our Kamishibai ones.
  3. Although a hash-bang webpage requires two HTTP requests for the HTML, the first request is for the boot loader, which will be stored in Application Cache. Hence the first request is local if the user has visited this website before. As a result, only one HTTP request goes over the network.
  4. Although many people state that the hash-bang is temporary and that widespread popState support will make it unnecessary, I disagree. Hash-bang is the only way I know of that will support offline websites.
  5. GMail still uses hash-bangs for most of their stuff. Obviously, GMail doesn’t want bots crawling their website.

Interchangeability of regular URLs and hash-bang URLs

In Kamishibai, we support both regular URLs and hash-bang URLs. Regular URLs will be used if the browser is mobile (iMode), unsupported, or has Javascript disabled. Because we have two kinds of URLs, we have to provide mechanisms to reroute between them.

From regular URL to Kamishibai URL

This will be done within Rails and will be simple rerouting on the server. Browsers generally preserve the hash fragment across 302 redirects, so everything should be OK.

  1. Rails will receive the regular URL.
  2. If the browser that requested it is supported by Kamishibai, then redirect to the corresponding Kamishibai URL via a 302. If the browser is not supported, it gets a regular HTML response. (This may be implemented with a Rails before filter, because not all actions can be converted easily.)

From Kamishibai URL to regular URL

iMode is Javascript capable, but in reality, a large number of devices may have Javascript disabled. Phones from other carriers do not have Javascript. We cannot rely on Javascript on mobile sites. However, rudimentary Javascript should be provided for our non-supported browsers.

  1. Rails will receive the Kamishibai URL.
  2. Rails will return the top page, not the bootloader. The top page will have a short Javascript snippet that detects whether the URL was actually a Kamishibai URL or not. If it was a Kamishibai URL, then we automatically redirect to the corresponding regular URL (see the sketch after this list). This will work if Javascript is ON but not if Javascript is OFF.
  3. If Javascript is disabled, then there is nothing we can do.
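
A sketch of the kind of snippet we have in mind (the real one in Ponzu may differ):

// Runs on the top page served to unsupported browsers. If the requested URL
// was actually a Kamishibai hash-bang URL, send the browser to the
// corresponding regular URL so the content is still reachable.
(function () {
  var hash = window.location.hash;
  if (hash.indexOf('#!_') === 0) {
    // e.g. "#!_/ja/presentations/3366" -> "/ja/presentations/3366"
    window.location.replace(hash.slice(3));
  }
}());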

Summary

All clients with Javascript enabled will be able to handle both Kamishibai and regular URLs. If Javascript is disabled, they will be able to handle regular URLs but will only get the top page for Kamishibai URLs.

With regard to Googlebot, we try to make it index only the regular URLs. Requests to regular URLs return full iMode pages (if the client is Googlebot). If Googlebot tries to crawl a Kamishibai URL, which might happen if it follows an external link, it might get a bit confused, but because the Javascript is complicated, we expect it to simply end up at the top page. Finding the hash-bang, it might also try to index the page using the _escaped_fragment_ scheme. We should redirect any such requests to the regular URL version (using 301 redirects).