25 June 2013

21 June 2013

An Analysis for Transition from Spreadsheets to Fusion Tables, part 3

This is part of a multipart post taken from my essay entitled "An Analysis for Transition from Spreadsheets to Fusion Tables". It may sound boring, but a questioning reader may find it rather useful to understand the value of Google Sheets and Google Fusion Tables in the workplace and readers may also find additional ideas for improving their usage of either product.

Costing Evolution

While Fusion Tables promises much in the way of streamlining operations and improving managed data sharing, it is not without some drawbacks. Even when entering the Promised Land, some of the wonders of Egypt had to be left behind.

Return to Structure

To start with, Fusion system would be a return to the database structure abandoned by Company data. While column changes would still be possible, they would be more difficult than currently supported (but still much easier the previously). Further, Views and data merges are non-additive. Once they have been completed columns can be removed but no new ones—or old ones, including one previously removed—can be added. To accomplish addition requires building a new merge or View (unless introducing a completely new data set). Because of this, it is likely to take several attempts to get the proper blend of data. Removing old attempts will invalidate links and bookmarks requiring them to be updated.

Lack of Presence

Those “feel good” social markers (namely the cell highlights) are completely absent from Fusion Tables. In fact, working on a Fusion Table is a bit like working in a black hole: there is no indication that any other user has a Fusion Table open let alone where they are working. The only indication that one is not working alone is if data changes after a browser refresh.

Stale Data

There is no real-time updating. All updates to a Fusion Table happen on refresh. This can increase the chances of overlapping work, duplicate data entry or functioning on old data.

More Data, Less Interpretation

Fusion Tables does not permit anything near the robust formulas of a spreadsheet. While they permit basic math and a few other limited functions, if/then and other comparative functions are limited at best but are frequently missing entirely. Useful features like links to clients folder based on online enrollment suddenly become impossible.

While it is possible, and even likely, that more powerful functions will be added as Fusion Tables continue to develop, they are not available now. The same is true for our conditional formatting. This shifts some of the analysis that Company data does automatically back to the user. Missing information will blend in with complete information, poor ratings will sit quietly beside good ones and words will go back to being just a number relying of the user to remember what the values mean. These issues can be migrated with concerted training but it is less elegant to have to track such drab details manually.

No Printing

Perhaps one of the most glaringly absent features is an inability to print. None of the data sets, Views or reports can be printed. Instead, they are locked securely in their digital existence. The nearest ability to print them is to export them and print from the exportation.

Dismissal of Menus

Currently, Fusion Tables lacks the ability to be scripted with custom menus. This means that running custom scripts will no longer be as simple as going to the Scripts menu and telling it to run. In fairness, this will only effect two scripts that were tied directly to Company data directly and thus should not be considered a major concern.

19 June 2013

An Analysis for Transition from Spreadsheets to Fusion Tables, part 2

This is part of a multipart post taken from my essay entitled "An Analysis for Transition from Spreadsheets to Fusion Tables". It may sound boring, but a questioning reader may find it rather useful to understand the value of Google Sheets and Google Fusion Tables in the workplace and readers may also find additional ideas for improving their usage of either product.

The Promises of Fusion

Some number of months ago, Google released a new product, Fusion Tables, to the public. While Fusion Tables are designed to handle large amount of data, they present a compelling opportunity for evolution to the Ecosystem. Such progression, however, does not come without some sacrifice.

Fusion Tables represent a return to the database format we had originally abandoned when creating the Ecosystem, but with some twists. With an eager excitement I began a new project called “Fusion system”, a functional prototype of our Company data spreadsheet converted to a set of Fusion Tables, to test out the potential of this new system. Adoption of Fusion system would not mean throwing out all of our spreadsheets, just converting our largest, most complicated spreadsheet into a Fusion Table. The rest of the Ecosystem would remain the same.

Custom Data Combination

One of the biggest benefits of the Fusion Tables is the ability to easily combine data across different tables without having to permanently combine the data. In fact, temporary “joins” as they are called in database jargon, are part of the Fusion Table backbone. It allows for almost endless, non-destructive data merging. This, in conjunction with what we smart data propagation, makes it advantageous to segregate data into logical groupings rather than a large conglomerate as we did with Company data because relevant data can be pulled and combined whenever desired for an enhanced view and, unlike importRange, this is accomplished more reliably.

Data Propagation

Data, no matter where it was originally stored, where it ends up being presented or with what other data it ends up being merged, always remains editable. More importantly, changes to data—again, without regard to where and what—are propagated throughout all iterations of the data so that wherever the data is presented, it is always the newest revision.

This ability for universal editing means that there would be no reservation in creating customized data sets because there would be no break in the data being viewed. In theory, each user could have a set of tables that have been tailored for efficiency in their specific task. This is vast improvement over our current spreadsheet that enable one-way transmission of data (they can read data from other sheets but cannot push edits back).

Record Editing

Fusion Tables have a built in record editing mechanism that we have not been able to mimic with satisfactory reliability in the Company data. Moreover, this edit mechanism adapts to the columns presented in the current view of the data. That is, users are always presented with an editing screen that matches the current table.

This improved presentation of editing help to ensure that data is only intentionally edited as the editing mode must be expressly entered (instead of simply typing over existing data) and prevents users from unknowingly switching their editing to a different row.

Filtering

Fusion Tables presents a simplified data filtration system that makes it easier to restrict the current data view based on any number of criteria. While the Fusion Table filtering is not as robust as that found in Sheets, it is more suitable for our purposes.

Custom Presentations

While Fusion Tables focuses on managing large data sets, it is also built to present the data with great flexibility. To this end, the system has a built in ability to present the data in many different ways including rows and cards. Rows are essentially echoes of the original table but cards allow for an almost endless customization of the data presentation. Using simplified HTML tags for the layout, cards permit users to quickly build templates for how the data will be presented. The presentation engine is not the most feature rich; still it enables basic reporting on a level that is extremely difficult within a spreadsheet all with ease.

Limited Views

Views, as they are called in Fusion Tables, are custom built representations of data sets. They can be crafted from a single table or several tables, they can include custom filters or present full data sets. Of most interest, however, is that once a View has been created and shared it is locked in. That is, the filters and tables connected to the View cannot be altered. All of the previously mentioned features are still available: the View will always show the most recent data and, if editing is allowed, will pass data edits back to the original data set. This is particularly useful in solving the problem of presenting each client with the most current data while not having to manage separate data stores and, at the same time, keeping other clients’ data protected.

An additional use of Views is in considering parent/child relationships. For example, some clients need access to information from several other clients by way of leasing agreement but the leasing clients should not have access to other leasing client data. Using Fusion Tables we can provide a View for the Lessor client that includes all the leasing client while providing individual leasing clients with Views of only their own data. This process would not require any additional data entries or any extra data updates. All data would be current and protected.

Improved Reporting

It would be wrong to say that Fusion Table will represent vastly improved external reporting (i.e. reports generated for email or print). Instead, reporting will go much the same as it has before, just run a little faster. Currently, entire data sets are reviewed in the process of finding the relevant information; scripts compare, line by line, data against a series of criterion. Fusion Tables uses an SQL like query methodology which means instead of importing a whole data set and then sifting through it to find the few relevant lines we can simply request the data that matching a given criterion and dispense with the sifting entirely.

17 June 2013

An Analysis for Transition from Spreadsheets to Fusion Tables, part 1

This is part of a multipart post taken from my essay entitled "An Analysis for Transition from Spreadsheets to Fusion Tables". It may sound boring, but a questioning reader may find it rather useful to understand the value of Google Sheets and Google Fusion Tables in the workplace and readers may also find additional ideas for improving their usage of either product.

Beginning the Ecosystem

Back when I first started working for my current employer, they were using an Access database. I say “they” because I did not use it any more than I had to. Instead, I instantly started adapting their systems to something I knew would handle the work better: Google Sheets.

I knew some of the issues the old systems had before even starting work. Being traditional software, the software and the data were bound to the hardware. Employees had to be at work in order to do work. This meant frequent calls to the office while on the road, daring “read-only” copies of the database for extend trips (and corresponding lists of changes that needed to be made) and messing remote control setups when phone calls were not enough.

Even on-site, there were still issues. Certain changes and reports could only be made while everyone was out of the system. Many days there was heavy coordination between users to determine who should access when and for how long and to make what changes.

I am no expert on Access change tracking (or Access in general) but there was no revision history. The database was always live and living. Catastrophic fails could only be recovered from the last back up. For small projects, this is an acceptable risk but considering that the entire company was centered around that single data store made the current revision controls difficult.

Finally, reporting was dismal. This is likely more because I was inexperienced with the tools, but even considering that, I knew that there were easier ways to liberate the data and combine it in new and interesting ways.

My first project was to create what would be dubbed “The Ecosystem”. It was so named because it would be a collection of spreadsheets, forms, documents, scripts and drawings that were all designed to flow and interact together. While the old database had provided a mostly unified data front, it was difficult to get elegant reports (again, more likely an issue with my skill than the actual software) and, perhaps more importantly, it was difficult to get the database to talk to anything beyond itself, let alone beyond our network.

Google Drive

The first leg of the project was actually not the data, it was the files. We had an onsite file server which was used to store all client documents. PDF files galore! There were folders for adding units, folders for removing units, folders for thinking about changing units, folders for actually doing it and more. Inside each folder were the pertinent files for the work done. Accuracy of records is rarely an issue because we tend to keep a record of everything.

The transition was smooth and easy. One day everything was on the server, the next day it started moving off. Files that were in transit to Google Drive were simultaneously moved to a new location on the server so that everyone knew that the files were in transit and not to make changes to the original documents. Using Google Drive, files were downloaded as soon as they had been uploaded so shortly everyone had access to the files.

There are a few added bonuses to using Google Drive for file storage. This method provides a form of distributed backup in that the software synchronizes a file across all of the computers to which the file has been shared. Previously, all data was hosted on the server which meant that a server crash would be detrimental to our operations. Now a server loss would be unfortunate but not detrimental. In fact, loss of any one computer would not affect operations beyond the function of the assigned user. This distributed scenario also works well for traveling users who can now access all but the most recent documents offline.

On the negative side, we have noticed that mass file changes can take an exorbitant amount of time to synchronize and has a high failure rate.

Company Data

The key component of the Ecosystem is a massive spreadsheet called “Company data”. This spreadsheet replaced the original database. While we lost some flexibility by converting a database into a spreadsheet, we gained far more power than we lost.

Access: Anytime, Anywhere

Putting Company data online instantly evolved our access options. Users can now access our data on their phones, tablets and computers while at home, across the state and, of course, at the office. No matter where we are, we can access the system. Google Drive apps provide adequate offline capability so that even if a device does not have an active internet connection, users can still see the data and edit it when connection is reestablished.

(While it would be superior to be able to edit the data while offline and sync when connected again, I can understand the immense technical issues with such a feat—not the least of which is figuring out conflict notification and resolutions—and we are quite content with the portability of data we currently enjoy; a vast improvement over our previous options.)

Most of the data transitioned very easily from the database into the spreadsheet. A separate sheet was created within Company data for each major table (i.e. “Truck data”, “Trailer data”, “Driver data”, etc.). While in some cases this combined multiple tables into a single sheet, we found that it generally worked with efficiency while reducing overlap and duplication of information.

Separate spreadsheets could have been created to handle the different tables and data types but we found it more efficient (for formulas, reporting and users) to put everything “under the same roof.” It is easier to say, “Everything that used to be in the old database can now be found here.”

Ad Hoc Modification

An unexpected benefit of Company data being a spreadsheet is that it became easy to modify the data structure. We began rearranging the column based on their logical groupings (i.e. keeping specialized columns together), moving columns most frequently referenced to the front and removing superfluous columns entirely.

While this was possible with the old database, such changes could not be done during live operations while everyone was in the database. The ease of change the structure in the spreadsheet means that changes are often instigated more by active users than the data keepers.

The changes could also be done by anyone, which is not necessarily a good thing. It is possible for a disgruntled or absent minded employee to easily rearrange the spreadsheet. We estimated the risk of such happening to be minimal as the changes are likely be noticed quickly and are easily recovered.

Formulas

One of my greatest qualms with a database is the incredibly static nature of the data or rather what can be done with it. Data in a spreadsheet is also inherently static but, being a spreadsheet, can easily be brought to life with formulas.

It is better to not think of formulas in the boring algebraic sense but in the more dynamic programmatic sense. A simple example is an early column that we added to Company data to indicate whether a client had registered for our online services or not. The first formula we had was an example of simple algebra: “=if(isblank(onlineKey), “No”, “Yes”)”, this simply puts a “Yes” or “No” in the column indicating if the client has an online key (an indication of online services registration). This formula was eventually replaced with a more programmatic one: “=if(isblank(onlineKey), “Not available”, hyperlink(‘http://spreadsheetURL’ & onlineKey, ‘Open’))”, which would provides users with a link to the client’s online services menu—a much more helpful feature in troubleshooting than simply saying, “Yes, the client is registered but you have to figure out how to get into their account before you can walk them through their problem.”

The ability to include not just basic information but more advanced features (such as the hyperlink) based on certain information allowed for an already valuable tool to become more useful and allowed for quick integration with new features and components of the Ecosystem as they came online. It should also be noted that this rapid adaptation via formulation was only possible when used in conjunction with the ability to simply and quickly alter the columns of the tables.

Collaboration

In the old database each user worked in a high degree of isolation. While we could all be social offline, there was no sociality online. Previous to my experiences, I would likely have scoffed at someone suggesting the need to be socially engaged online with people who sit within a few feet of your desk. I have learned, however, that it is more comfortable to work collaboratively online with people when there is a social presence online too.

Our Company data spreadsheet provides two tools that make online collaboration easy and natural. The first (and less useful of the two) in an in-sheet chat feature. With a click and a few keystrokes we can send a message to everyone looking at the spreadsheet. This is useful for moments when we are about to rearrange a sheet or, more frequently, resort the data (while the data will remain, its row will likely change, so it is a nice courtesy to inform other users before doing it).

The second tool (and, by far, the more useful of the two) is the cell highlight. It is an automatic feature that puts a brightly colored border around whatever cell each of the other users has clicked on. The highlight is a unique color for each user (though not each instance), moves in near real-time and includes a cell shading feature when the user is actually editing the cell (useful for knowing when something is about to update). While this presence feature may seem almost trivial, I have seen it profoundly affect our online workflow: reducing the online mystery, streamlining data input and work assistance.

Online collaboration is fraught with many pitfalls, one of which is overlapping work and conflict resolution. It is frustrating to be working through some data updates only to find out that someone else has been updating the same data, either right before or after you. In either case, time and effort were wasted. The cell highlight lets the users know where each other are working and, by inference, what they are working one. Duplication of work almost never happens with real-time collaboration tools.

This reduction of mystery also allows us to streamline data input. For example, if a user is adding a new row of data, another user can jump in and copy down any formatting and formulas to complete the input faster. The first user can focus on inputting the information (which only they have) while the second user can focus on the mundane (which they can access). This allows for more efficient work on a project when previously we would have been forced to separate the work for fear of overlap.

Beyond streamlining there are many times when working on a project with other users across the office where someone will be inquiring about a piece of data. In these cases, being able to see their highlight makes it easier to help quickly understand the context of their inquiry and to guide them (through one’s own highlight) to the answers they are looking for. This is especially true when they are on the phone with a client and we can casually identify the information they are looking at, anticipate what they will probably need and show them the relevant information.

Colors

While visual appeal should by no means be a “deal breaker,” it is certainly nice to be able to manipulate the presentation of Company data to look nice. Beyond just looking nice, we are able to adapt the spreadsheet to include the color schemes used in our branding. This helps to maintain a consistent image internally and thus externally. In general, we have matched our colors consistently across the work flow process so that banners printed on forms (and sometimes the color of the form itself) and shown in ancillary spreadsheets coordinate with the banner atop the appropriate Company data sheet.

In addition to making the spreadsheets look prettier, we also use their conditional formatting to quickly communicate important information through color coding. For example, cells with missing, non-vital information is flagged in a subtle yellow while missing vital information is flagged in a dull red (not too distracting but very apparent). Ratings and status updates also benefit from color coding; good is green, bad is red. This color coding helps users to quickly identify the most critical information, or lack thereof, in a data set.

Reporting

With formulas that incorporate programmatic formulas, we found a greatly reduced need for many basic reporting functions. Who wants to wait for a report to run when they can instantly sort, filter and read information they are looking for in the sheet already open on the screen?

Advanced reporting, however, has blossomed in the Ecosystem. This is for two reasons: the first is a semi-dedicated report designer (being myself) and a robust information infrastructure designed to be tapped for data.

Naming

This reporting friendly design came in the transition process from the old database to the new spreadsheet and was comprised mostly of consistent naming. For example, each of the sheets are labeled with a single plural descriptor and the word “data”. Additionally, each of the column names from the old database were tuned for optimal communication of the contents of the columns. Sometimes the column names are a bit lengthy, but that is preferable to not knowing what information can be found in the column.

While naming may seem something of a trivial nature, there can be a vast difference between building a report with highly cryptic placeholders that leave little indication for the casual user as to the data that should be expected and a robust naming scheme that removes most doubt about the expected data.

In a similar vein, placeholder tags in template reports are also verbosely named. For example, all client data related placeholders are flagged with a preceding “co” and our company data related placeholders (e.g. ratings generated using proprietary methods) are flagged with a preceding “tt”. While not as friendly as reading a completed report, this method still allows easy reading.

Styles

We identified two possible venues for running the reports and we use both depending on the usage case.

The first method generates the report entirely using internal process. This is best used for shorter reports that do not need to be visually sophisticated but where functionality is preferred. Most frequently, these are reports that we email out (either for employee usage or directly to clients). While such reports lack visual impact because we strip them of almost all visualizations, the bland environment actually improves readability because there are fewer distractions. In this case, blandness works very favorably for us: simple visual effects, such as font size changes, bolding or italicizing, can be used to quickly guide the reader to the most critical information.

The rule of simple visual presentation is particularly important considering that this style is most frequently used for emailed reports in which advanced visuals garner little respect and often discarded entirely leaving the recipient with a horrid jumbled mess of code and words. Keeping the presentation simple with infrequent accents reduces the chance for an unreadable message while keeping the message clean and readable.

Templates

The second method is almost the complete opposite of the first. While the data arranged and concatenated internally, the report layout is prebuilt in a documented template. This allows for the templates to be highly visually sophisticated, though in many cases less functional. Such a tradeoff is acceptable because generally these reports are delivered through print or specialized online methods and are intended to be more attention grabbing than day-to-day communications.

While we still strive to keep the layout clean and simple by reducing visual clutter and mitigating unnecessary distraction (i.e. not bolding and italicizing without good cause) we do allow for added accents to make it look better. For example, headers vary in color (to coordinate with the Company data color scheme) to differentiate the sections and a standardize company footer may be included.

Perhaps most importantly with the templates is that they are shared documents which means that they can be edited by anyone. That is, it does not take a feat of programming to alter the presentation of the reports. This was an important feature to prevent “personnel lock-in” while also greatly simplifying the process of programming the reports. Even simple visual alteration can be difficult to program, especially when compared with the ease of changing the look in even a basic text editor. Using templates allows for more time to be spent programming functionality instead of troubleshooting design through code.

Custom Menu Options

Using Google Apps Scripts, we were able to add a custom menu to the Company data spreadsheet (as well as some of our other ancillary sheets). These custom menus allow for the execution of scripts and thus expose the scripts to our users in an easily accessible manner.

One of the greatest learning experiences in building these menus options was building in protection against accidental execution. The first round of scripting merely required activation to run. This quickly led to problem of accidentally activating the wrong script. We added confirmation boxes to almost all scripts that describes what the script is about to do and then requests the user to confirm that they wish for the script to run. This generally adds a few seconds to the script’s running time but greatly reduces the number embarrassing, confusing and frequently redundant emails sent to our clients and users.

Connections

While Company data hosts most of our data, some of it feeds in from other spreadsheets. Google Sheets allows us to import this data nearly live by using a function called importRange. A good example of the usefulness of this function is our proprietary Ratings data. The source data and formulas are too complicated to want to include them in Company data, so they were set up in their own sheet. All the data in this spreadsheet is fed to a Ratings summary page which is then fed to the Company data sheet. While the data is not “live” it is very “fresh” (generally less than a minute old) and updates automatically. Breaking the two data sets apart while still being able to cross reference them made management of the data much easier.

Another example is one in which we “import” account numbers to a shared spreadsheet that tracks our progress, by company, through our work flow. Importing the data saves us the time of flipping between the tracking sheet and the Company data. Instead, all of the information we need is presented in a single view.

A drawback to using importRange is that in sharing the sheet for access, we have to enable full access to the reference sheet and the reference URL is exposed. This means that we cannot “import” data to show clients on spreadsheets that we share with them. To do so would expose other client data and represent a significant data breach. Instead, online reports generated for our clients have to have their data “pushed” to them instead of simply pulling it; this is a much more difficult process that is also less reliable.

06 June 2013

Paul Bloom: Why Do We Like What We Like?

Bloom said it well: "When you come to understand something, when you make sense of it and when you get pleasure from it, you're responding not merely to what you're looking at, not merely the physical features of the thing but to your beliefs about it: where you think it came from, who touched it, how it was made, what its deeper essence is."

Why Do We Like What We Like? (RSA)