The Shape of Data
The Shape of Data
I can picture data, typically for me a table is cuboidal.
On the left side of the face are columns and columns of keys. The first column is the primary key. The second column is the most important foreign key. We have document storage tables that have 30 foreign keys – 1 key for each shape that the data takes. Then we have various columns for meta data and interesting stuff. Somewhere on the far left are the numbers. At the very far right of the face of my cube are the fundamental audit columns answering who and when updated/create the data row.
The data is dimensional. Even inter-dimensional. It is visual to me. When looking at the document table, I see only the invoice and customer data when I am studying invoices. Then if the task relates to inventory, different columns stand in the mental foreground. This is more obvious when create an entity relationship diagram for a mature and robust database system that has been fully normalized and has like data consolidated with like data. For example, regardless of the nature of a document, put all documents in one table. Not necessarily guidance for others, but it is our best practice. “Like with Like” is a good database rule. It is the first step in normalizing data. Put your client profile data together, but don’t fill it up with phone numbers and email. If you have a lot of phone numbers, then clearly to place “like with like”, you need a phone number or an address or an email table.
There is a process of identifying and recognizing the shape of data when talking with clients.
Invoice data looks a certain way and it has for decades. At a minimum it requires two tables. There is a table that represents the header and footer seen on paper documents. This invoice is for someone, valued at some amount, issued on some day, due on some other day. The taxes are X and the discount for paying on time is Y. Each line, normally get set into a table that track the specific of what is being invoice: line number, quantity, product, description, unit price, extended price, taxes and total.
Like Plato and his ideal plane, these abstracted forms exist within the mind’s eye. When we, database developers, listen to civilian clients we compare their expectations with the ideal data model for each structure: client, contact people, concert ticket, airline ticket, inventory part, tractor repair service call (which is just like a car service).
In the recent quarter, my team encountered a request (demand) from the client that read: Create this table for these data. This is a temporary solution that we will shortly replace.
I was out of the office that day. I didn’t read the ticket and I wasn’t in a meeting. My team said: “That’s stupid. The client is wrong.” By the end of the week, the client was furious. “You didn’t build me the table I told you to make. FAIL.” And the ticket returned to the to-do column.
By the way, I skimmed the ticket as well. All of the features the client wanted were completed except for this one table. All of the visual work and functionality existed and worked perfectly.
The table was to be a bizarro intersection between service data, document data, and a mixed bag of other information. No sense of “like with like,” but hey, we accomplished the mission without some non-sensical, misaligned, never-planned table that was to be “temporary” in a massive enterprise application that we’ve been building for years.
I was happy with the work.
Until the day the sprint ended.
Client said: “I need this table.”
“Oh, why is that. We met all the requirements.”
“Because we are uploading thousands of rows of data from an external system using that table next week.”
A courtroom lawyer would state that those were facts not yet in evidence. I lost my temper because one does on the last day of a sprint when new and urgent requirements are invented at 7 in the morning.
I should never lose my temper. I must always be a database developer and an analyst.
“What is this for?”
“What is the shape of the data we are bringing in?”
“How soon do you need this?”
For the client, the answers were correct and perfect. For me, the answers were wrong and out of sync.
I reviewed our tables and sitting in Plato’s ideal plane was the exact table the client needed. Wait, I am wrong, it wasn’t in Plato’s ideal plane, it was in our existing table structure. At the conception of the project, we drafted robust tables to meet 90% of the client’s needs based on the experience of building related systems.
We were asked to create a table that bridged the data between a field service order and the resulting report created by the technician. Here’s the report data. Here are the service order details. We recognized years ago, we need a table that matrixed and related service order to report result to a report document (PDF). We knew it needed to be authored and signed by a technician and linked with precisely one service order. It needed a pass/fail score.
It was intuitive, existing, and not-a-surprise requirement given it had been there since the first draft of our data table structure. It existed not because we had sample data or because the client stated it as a requirement, but because to develop a complete and robust data model, this piece needed to exist.
An experienced eye can look at the data saying: this service order report table needs to exist because its data are unlike other data. These data are more like to each other, and they don’t belong other places. Oh, and these data need hooks (keys) to all sort of related data such as client, service order, service order line, employee, the report, and the PDF document.
The developer tasked with planning then importing these new data didn’t understand our structure or he didn’t research. The client listening to one development team told another development team to execute on a specific task while obscuring or not revealing the motivation for the new table.
What is it to see the shape of data and interpret it?
Is the hero, the one who had the vision in the ideal plane? No, not if that vision wasn’t shared well enough. What if that table already existed and was named in accordance with standards? No, not if other’s didn’t see what they were looking for?
Data fits within a model and that model, when well executed, has a shape. Within that structure, there should be no missing pieces. It ought to resemble a completed jigsaw puzzle except in multiple dimensions.
My answer, now knowing a new business/technical requirement was to add a series of fields:
- Data Source (imported, native, etc)
- Import date
- Imported source key/ID
I put a few database comment fields there like breadcrumbs. Maybe, possibly, when the other team sees these, they will populate these data points during import so we can provide an audit backwards into the legacy data.
These structures to data provide a common framework for construction of an enterprise system. It is an abstract blueprint.