The ability to store date & time (“datetime”) in a system is essential, but usually not given the consideration it deserves. It’s relevant for all kinds of data – sometimes reflecting a real event (“when does a sale end” or “when did an incident happen”), other times simply letting us know when data within a system has changed (“createdAt” or “updatedAt”).
A common pattern is to convert datetime information to Coordinated Universal Time (“UTC”) before it is stored in databases. This means all the datetime data is stored consistently, which makes it easier to work with in the future. If you wanted to generate a report showing what happened in a certain hour, for example, you’d just need to find records with the relevant datetimes (which are already in UTC, so you don’t need to worry about additional logic to convert them). This works really well. That is, until you realise the original time zone is actually essential information you shouldn’t have discarded.
We have this problem within our own business. While we’re primarily based in Melbourne, some of our team works remotely, in different timezones. Some then move around a fair bit, and in doing so, frequently change timezone as well.
This poses an interesting problem when you consider that our billing-cycles are based on the work done in each month. When exactly does our billing month start and end?
A hypothetical scenario:
We ask our team to log the hours they work in a timesheet. If all these timesheets are stored in UTC, and we include only those hours worked that have a timestamp in the month we want to bill for, one could argue the following: a team member in Melbourne could work until 11AM on the 1st of February and it would still be the 31st of January UTC time, as at that time the offset to UTC is +11 hours in Melbourne. Similarly if we had a team member sitting somewhere on the west coast of North America, where the offset to UTC is -08 hours on that day, they could not work past 4PM on the 31st of January to make the cut for that bill. Not exactly an ideal scenario.
A different level of complexity is added to this problem if we need to consider overtime. Timesheet entries in UTC can't tell us if a person has worked during their regular hours or if they've pulled a late night. 3pm UTC sounds reasonable, until we realise it was 4am in their local timezone. We need to know what time it was for them - to have the timezone offset - to be able to determine if which hours they worked impacts billing in any way.
Recently, we ran into this issue with the time-sheeting software we use. We have a reporting tool that gives us an overview of the hours we use on each project. This tool uses data from our time-sheeting software (by querying the API).
As noted by the software provider in their docs, they store all timesheet entries in UTC. The issue is that not only is data stored in UTC, but that inconveniently the timezone information has been stripped off, entirely and irreversibly. The time-sheet entry cannot be related back to the original timezone in which it was recorded.
There is, however, only one source of information about a user’s timezone in the timesheeting system. That is the time zone a user has currently set on their user profile. We can use this timezone to convert the UTC entries we receive, but of course, it only reflects what timezone that user is in now – not where they’ve been historically.
Now, you may be able to tell where this is headed. There are two major problems with this approach:
It took us time and effort to fully understand this problem. For some time, we’d known something was a little off when comparing reports from the timesheeting tool and our internal system – but not why. The API documentation provided by the time-sheeting software was also not particularly forthcoming about what was going on with their data, so we had to launch our own investigation.
We delved into GitHub issues, and spent a decent amount of time looking at the data we received from the timesheeting tool’s API. Eventually we realised, although they had stripped off the timezone information from the timesheet entries when they were recorded, they were substituting in new timezone data that made it non-obvious. What an adventure it was to figure this out.
Luckily for us we have an overseeable number of employees, and a good understanding of what timezone everyone is in – and where they’ve worked in the past. We are able to retrospectively convert the messed up data back to timezone-aware entries, but we shouldn’t have to.
Apps that collect and relate data to a particular person in a particular place at a particular time should always keep track of all three things and regard it as an entity. Don’t make an assumption that one of these things is not important because the use case isn’t immediately obvious. The timezone conveys critical information about a person in regards to their local business hours. Only storing timezones in UTC might seem like the simpler option – until the original datetime is needed.
It’s also good practice to ensure that, however you handle timezone data, it should be well documented. This exercise made it clear that different providers (including us) have different assumptions on how datetime should be stored. Save yourself and your users the headache – be as clear and consistent as possible.
As frustrating as these issues have been for us as an end-user, it was a good lesson to learn. We’re now even more aware of how we handle datetime when building our own software. As mentioned earlier, it’s not something to be overlooked – give it the attention it deserves.