Offline data synchronization, Part 2: Advanced strategies to address a crucial challenge for mobile apps
In my last post, I talked about offline data synchronization for mobile devices, one of the most crucial challenges in app development. I also introduced some basic strategies for mobile data synchronization. Now, let’s explore some more advanced strategies.
1. Modification of offline data
Using mobile devices offline requires more than getting data onto the device; it also requires handling data modifications made while offline. The simplest option is to disallow or limit modifications while the user is offline, and this is feasible for some business scenarios. In an emergency such as a blackout, the main requirement is that users have all relevant data available offline to fix the problem; there is no immediate need to modify any of it. Asset maintenance is a different situation: users will certainly want to complete and document their maintenance tasks for the asset while offline.
Offline data modifications of this kind typically serve documentation rather than collaboration with other workers. It’s therefore sufficient to cache the changes on the device and play them back to the server the next time the user has an online connection. Most conveniently for the user, this can happen seamlessly in the background. Even so, the user needs some control over the process, or at least needs to be informed whether and when the data has been played back to the server.
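The cache-and-replay idea can be sketched as a small outbox. This is a minimal illustration under stated assumptions, not a reference implementation: the names Outbox, PendingChange and send are hypothetical, the queue lives in memory only (a real app would persist it on the device), and the uploader is assumed to return True on success.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingChange:
    record_id: str
    payload: dict


@dataclass
class Outbox:
    """Caches offline modifications and replays them in order once online."""
    # Hypothetical uploader: returns True if the server accepted the change.
    send: Callable[[PendingChange], bool]
    pending: List[PendingChange] = field(default_factory=list)
    # Status messages the app can show so the user knows what was synced.
    log: List[str] = field(default_factory=list)

    def record(self, record_id: str, payload: dict) -> None:
        """Store a modification made while offline."""
        self.pending.append(PendingChange(record_id, payload))
        self.log.append(f"queued {record_id}")

    def replay(self) -> int:
        """Send queued changes oldest first; stop at the first failure."""
        sent = 0
        while self.pending:
            change = self.pending[0]
            if not self.send(change):
                self.log.append(f"failed {change.record_id}, will retry")
                break
            self.pending.pop(0)
            self.log.append(f"synced {change.record_id}")
            sent += 1
        return sent
```

Calling replay() whenever connectivity returns keeps the process in the background, while the log gives the user the confirmation described above.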
2. Shared data sync
Modification of offline data requires caching data until network connectivity is available again. For data that is exclusive to a specific user, such as work orders or individual assignments, modifications are easy to handle. The real challenge starts with shared data. Once one user is allowed to modify data offline, the expectation is that many users can do the same, including on data they share. Examples include work orders assigned to a group, a pool of work orders with user self-assignment, or many users working on the same assets and needing to change status information.
Providing offline functionality for such scenarios carries considerable risk. By the very nature of the problem, conflicts can’t be avoided: while users are offline, their modifications simply can’t be made available to other users, so different users can modify the same data without knowing what the others are doing with it at the same time. If such scenarios are unavoidable and there is a strong business need to let users work offline on shared data, you need to focus on exception handling and on defining business rules for handling those exceptions. A viable approach for many scenarios is that the first update to a data set wins, and any later updates based on outdated data are ignored.
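One common way to implement a "first update wins" rule is a version check on the server. The sketch below assumes each record carries a version counter and each client sends the version it last saw; Record, Server, apply and base_version are hypothetical names chosen for illustration. Rejected updates are kept in a list rather than silently dropped, so that a business rule can decide what to do with them.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    version: int = 0


class Server:
    """Accepts an update only if it was based on the current version."""

    def __init__(self):
        self.records = {}   # record_id -> Record
        self.rejected = []  # (record_id, user, value) for exception handling

    def apply(self, record_id, new_value, base_version, user):
        current = self.records.setdefault(record_id, Record(value=""))
        if base_version != current.version:
            # The client edited outdated data: an earlier update already won.
            self.rejected.append((record_id, user, new_value))
            return False
        current.value = new_value
        current.version += 1
        return True


server = Server()
server.records["wo-7"] = Record("open", 3)
# Both users synced version 3 before going offline; the first to reconnect wins.
print(server.apply("wo-7", "in progress", 3, "user-a"))
print(server.apply("wo-7", "closed", 3, "user-b"))
```

The second call is refused because its base version no longer matches, which is the point where the business rules for exceptions come in.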
Another important consideration is investigating which conflicts can be avoided or better managed by the underlying work processes. If users have shared work orders for maintenance tasks but work in different locations, data conflicts are avoided by the defined work process. Or, if users need to track assets taken out of storage, other users could not take the same assets out because they are physically not there anymore. Exception handling can be significantly simplified in these cases. Hence, these scenarios need to be investigated and taken into consideration when defining potential conflicts for offline functions on shared data.
3. Auto-sync versus manual sync
Modern apps have accustomed users to automatic synchronization processes that run in the background. The user does not need to worry about data updates and is not blocked from doing work. This is very convenient, and many clients expect this approach as standard today. However, it might not be the best approach in complex synchronization scenarios. With long sync processing times, the app might be in an inconsistent state, especially if the user can work on the data while the sync process is running in the background. This creates many new exception scenarios that are very difficult to handle. If a large amount of data needs to be synced, the user might need to control when and where the download starts. And if data modifications are cached, the user needs enough control to reliably send them back to the server within a certain amount of time, such as before the end of a work shift.
For all of these reasons, it’s often advisable to use a manual synchronization process that the user controls. If the app is blocked while the sync runs, many nasty conflicts can be avoided. This gives the user more reliability and certainty about what data is synced and when. For example, users can manually start a sync when they come to work in the morning and prepare to go into the field. After the sync has finished, they can be sure their data is up-to-date and don’t need to worry about being offline for the rest of the day.
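A blocking manual sync can be sketched in a few lines. The example assumes a hypothetical fetch_all callable that downloads the complete data set; FieldApp, SyncInProgress and manual_sync are illustrative names, and a real app would block at the user-interface level rather than by raising an exception.

```python
class SyncInProgress(Exception):
    """Raised when the user tries to edit data while a sync is running."""


class FieldApp:
    def __init__(self, fetch_all):
        self.fetch_all = fetch_all  # hypothetical: returns a dict of all records
        self.data = {}
        self.syncing = False
        self.last_sync_ok = False

    def edit(self, key, value):
        if self.syncing:
            raise SyncInProgress("edits are blocked until the sync finishes")
        self.data[key] = value

    def manual_sync(self):
        """User-triggered sync; returns True once local data is up to date."""
        self.syncing = True
        self.last_sync_ok = False
        try:
            fresh = self.fetch_all()
            # Replace local data only after the full download succeeded,
            # so a failed sync never leaves a half-updated state.
            self.data = dict(fresh)
            self.last_sync_ok = True
        finally:
            # Unblock the app even if the download raised an error.
            self.syncing = False
        return self.last_sync_ok
```

Because edits are refused while the flag is set, the data cannot change underneath the sync, and last_sync_ok gives the user a clear answer about whether the device is ready for the field.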
4. Push versus pull
Another major topic to look at is how to initiate the synchronization. The two major approaches are a push initiated by the server and a pull initiated by the client.
Push is usually used for small changes: every time a small piece of data is modified, the server sends out a push notification to all clients. The advantage of this approach is that data is only synced when needed and is updated very quickly. However, this approach requires careful consideration. Standard push notification mechanisms do not guarantee if and when the client receives a notification. If more than one notification is in the queue, they also do not guarantee that the notifications arrive in sequential order. Moreover, when a client is offline for a while, notifications might get lost or the queue might grow large.
Having many notifications in the queue is generally a problem, whether it’s caused by long offline times or by many small changes delivered by the server. Stepping through the queue and processing all data updates can take a long time on the client devices, and the results can differ significantly between clients and be inconsistent because of different processing order and latency.
The other approach, a pull mechanism, provides more reliability. A client contacts the server and requests all data updates since its latest request. This can happen automatically at defined intervals, or the user can initiate it manually. The disadvantage is that the client data is not in sync between pull requests, but the higher reliability outweighs this in many scenarios. The only situation in which we successfully used a push mechanism was the synchronization of a back-end system with data in the mobile middleware. For the synchronization of larger numbers of mobile clients, only a pull mechanism has proven reliable for us so far.
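"All data updates since the latest request" is often implemented with a cursor that the client stores and sends with each pull. The following sketch assumes the server keeps an ordered change log with increasing sequence numbers; ChangeLogServer, PullClient and changes_since are hypothetical names, and the server is simulated in memory instead of being called over the network.

```python
class ChangeLogServer:
    """Keeps every change with an increasing sequence number."""

    def __init__(self):
        self.changes = []  # list of (seq, record_id, value)

    def write(self, record_id, value):
        seq = len(self.changes) + 1
        self.changes.append((seq, record_id, value))
        return seq

    def changes_since(self, cursor):
        """Return all changes with a sequence number above the cursor."""
        return [change for change in self.changes if change[0] > cursor]


class PullClient:
    def __init__(self, server):
        self.server = server
        self.cursor = 0  # sequence number of the last change applied
        self.data = {}

    def pull(self):
        """Fetch and apply everything missed since the previous pull."""
        updates = self.server.changes_since(self.cursor)
        for seq, record_id, value in updates:
            self.data[record_id] = value
            self.cursor = seq  # advance only after the change is applied
        return len(updates)
```

Because the client always asks from its own cursor, a long offline period costs one larger pull rather than lost or reordered notifications, and repeating a pull does no harm.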
Offline data handling and data synchronization are complex topics for many business scenarios. Organizations need to investigate the technical options, but even more importantly, they need to fully understand every detail of the business requirements to deliver viable solutions. The solution needs to work technically, be usable, and fit the business requirements and processes of the users. This can only be achieved by considering all options, even those that do not seem state-of-the-art at first glance.