The Shape of Data

I can picture data. For me, a table is typically cuboidal.

On the left side of the face are columns and columns of keys. The first column is the primary key. The second column is the most important foreign key. We have document storage tables that have 30 foreign keys, one key for each shape that the data takes. Then we have various columns for metadata and interesting stuff. Somewhere on the far left are the numbers. At the very far right of the face of my cube are the fundamental audit columns, answering who created or updated the data row and when.

The data is dimensional. Even inter-dimensional. It is visual to me. When looking at the document table, I see only the invoice and customer data when I am studying invoices. Then, if the task relates to inventory, different columns stand in the mental foreground. This is more obvious when you create an entity relationship diagram for a mature and robust database system that has been fully normalized and has like data consolidated with like data. For example, regardless of the nature of a document, put all documents in one table. That is not necessarily guidance for others, but it is our best practice. “Like with Like” is a good database rule. It is the first step in normalizing data. Put your client profile data together, but don’t fill it up with phone numbers and emails. If you have a lot of phone numbers, then clearly, to place “like with like,” you need a phone number table, or an address table, or an email table.

There is a process of identifying and recognizing the shape of data when talking with clients.

Invoice data looks a certain way, and it has for decades. At a minimum it requires two tables. One table represents the header and footer seen on paper documents. This invoice is for someone, valued at some amount, issued on some day, due on some other day. The taxes are X and the discount for paying on time is Y. The lines normally go into a second table that tracks the specifics of what is being invoiced: line number, quantity, product, description, unit price, extended price, taxes, and total.
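Here is a minimal sketch of that two-table shape, written in Python with an in-memory SQLite database standing in for a production system. Every table name, column name, and sample row is hypothetical, and prices are stored as integer cents purely as an assumption to keep the arithmetic exact.

```python
import sqlite3

# Hypothetical schema: one header row per invoice, many line rows per header.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces keys when asked
conn.executescript("""
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    customer   TEXT NOT NULL,
    issued_on  TEXT NOT NULL,
    due_on     TEXT NOT NULL
);
CREATE TABLE invoice_line (
    invoice_id       INTEGER NOT NULL REFERENCES invoice(invoice_id),
    line_number      INTEGER NOT NULL,
    quantity         INTEGER NOT NULL,
    description      TEXT NOT NULL,
    unit_price_cents INTEGER NOT NULL,
    PRIMARY KEY (invoice_id, line_number)
);
""")

# Made-up sample data: one header with two lines.
conn.execute(
    "INSERT INTO invoice VALUES (1, 'Acme Farm Supply', '2024-03-01', '2024-03-31')"
)
conn.executemany(
    "INSERT INTO invoice_line VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, "Hydraulic filter", 1250), (1, 2, 1, "Service call", 9500)],
)


def invoice_total_cents(conn, invoice_id):
    """Sum the extended price (quantity * unit price) of every line on an invoice."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity * unit_price_cents), 0) "
        "FROM invoice_line WHERE invoice_id = ?",
        (invoice_id,),
    ).fetchone()
    return row[0]


total = invoice_total_cents(conn, 1)
```

The header carries the facts that are true of the whole document; the lines carry the facts that repeat. A real invoice model would also hold taxes, discounts, and product keys, which this sketch leaves out.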

Like Plato and his ideal plane, these abstracted forms exist within the mind’s eye. When we, database developers, listen to civilian clients we compare their expectations with the ideal data model for each structure: client, contact people, concert ticket, airline ticket, inventory part, tractor repair service call (which is just like a car service).

In a recent quarter, my team encountered a request (demand) from the client that read: Create this table for these data. This is a temporary solution that we will shortly replace.

I was out of the office that day. I didn’t read the ticket and I wasn’t in a meeting. My team said: “That’s stupid. The client is wrong.” By the end of the week, the client was furious. “You didn’t build me the table I told you to make. FAIL.” And the ticket returned to the to-do column.

By the way, I skimmed the ticket as well. All of the features the client wanted were completed except for this one table. All of the visual work and functionality existed and worked perfectly.

The table was to be a bizarro intersection between service data, document data, and a mixed bag of other information. No sense of “like with like,” but hey, we accomplished the mission without some nonsensical, misaligned, never-planned table that was to be “temporary” in a massive enterprise application that we’ve been building for years.

I was happy with the work.

Until the day the sprint ended.

Client said: “I need this table.”

“Oh, why is that? We met all the requirements.”

“Because we are uploading thousands of rows of data from an external system using that table next week.”

A courtroom lawyer would state that those were facts not yet in evidence. I lost my temper because one does on the last day of a sprint when new and urgent requirements are invented at 7 in the morning.

I should never lose my temper. I must always be a database developer and an analyst.

“What is this for?”

“What is the shape of the data we are bringing in?”

“How soon do you need this?”

For the client, the answers were correct and perfect. For me, the answers were wrong and out of sync.

I reviewed our tables and sitting in Plato’s ideal plane was the exact table the client needed. Wait, I am wrong, it wasn’t in Plato’s ideal plane, it was in our existing table structure. At the conception of the project, we drafted robust tables to meet 90% of the client’s needs based on the experience of building related systems.

We were asked to create a table that bridged the data between a field service order and the resulting report created by the technician. Here’s the report data. Here are the service order details. We recognized years ago that we needed a table that related a service order to a report result and to a report document (PDF). We knew it needed to be authored and signed by a technician and linked with precisely one service order. It needed a pass/fail score.

It was an intuitive, existing, and unsurprising requirement, given that it had been there since the first draft of our data table structure. It existed not because we had sample data or because the client stated it as a requirement, but because this piece needed to exist for the data model to be complete and robust.

An experienced eye can look at the data and say: this service order report table needs to exist because its data are unlike other data. These data are more like each other, and they don’t belong in other places. Oh, and these data need hooks (keys) to all sorts of related data such as client, service order, service order line, employee, the report, and the PDF document.

The developer tasked with planning and then importing these new data either didn’t understand our structure or didn’t research it. The client, listening to one development team, told another development team to execute a specific task while obscuring or not revealing the motivation for the new table.

What is it to see the shape of data and interpret it?

Is the hero the one who had the vision in the ideal plane? No, not if that vision wasn’t shared well enough. What if that table already existed and was named in accordance with standards? No, not if others didn’t see what they were looking for.

Data fits within a model and that model, when well executed, has a shape. Within that structure, there should be no missing pieces. It ought to resemble a completed jigsaw puzzle except in multiple dimensions.

My answer, now that I knew of the new business/technical requirement, was to add a series of fields:

  • Data Source (imported, native, etc)
  • Import date
  • Imported source key/ID

I put a few database comment fields there like breadcrumbs. Maybe, possibly, when the other team sees these, they will populate these data points during import so we can provide an audit backwards into the legacy data.
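A rough sketch of those three fields and the backwards audit they make possible, again in Python with SQLite as a stand-in. The table name, column names, constraint, and the `LEGACY-0042` key are all hypothetical illustrations, not our production structure.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE service_order_report (
    report_id        INTEGER PRIMARY KEY,
    service_order_id INTEGER NOT NULL,
    passed           INTEGER NOT NULL CHECK (passed IN (0, 1)),
    -- The three provenance fields (names are assumptions):
    data_source      TEXT NOT NULL DEFAULT 'native'
                     CHECK (data_source IN ('native', 'imported')),
    import_date      TEXT,
    source_key       TEXT,
    -- Imported rows must say when they arrived and where they came from.
    CHECK (data_source = 'native'
           OR (import_date IS NOT NULL AND source_key IS NOT NULL))
)
""")


def import_report(conn, service_order_id, passed, source_key, imported_on):
    """Insert a row that arrived from an external system, keeping its legacy key."""
    cur = conn.execute(
        "INSERT INTO service_order_report "
        "(service_order_id, passed, data_source, import_date, source_key) "
        "VALUES (?, ?, 'imported', ?, ?)",
        (service_order_id, int(passed), imported_on.isoformat(), source_key),
    )
    return cur.lastrowid


def find_by_legacy_key(conn, source_key):
    """Audit backwards: locate our row from the legacy system's identifier."""
    return conn.execute(
        "SELECT report_id, service_order_id, import_date "
        "FROM service_order_report "
        "WHERE data_source = 'imported' AND source_key = ?",
        (source_key,),
    ).fetchone()


new_id = import_report(conn, 501, True, "LEGACY-0042", date(2024, 6, 3))
row = find_by_legacy_key(conn, "LEGACY-0042")
```

The check constraint is one possible design choice: it refuses an imported row that arrives without its breadcrumbs, instead of hoping the importing team remembers to leave them.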

These data structures provide a common framework for the construction of an enterprise system. They are an abstract blueprint.

Video killed the Radio Star

During my career, I have witnessed so many changes and happily ridden the wave of innovation. I bought my first IBM PC in the year they were released. First mobile phone: late 1980s. I worked as a field engineer at Cisco Systems during the massive expansion of the internet in the early 2000s. I published 4 technical books about software development before my 30th birthday. I wrote my first database application in 1985. I designed and built my first enterprise software application with Oracle in 1996.

That’s me.

My grandfather, a nationally syndicated radio guy, once described television as a passing fad. His colleagues on the NBC Red Network (yeah, dials and tubes in a box) each signed contracts to join NBC television. These guys went on to be the black-and-white faces of the 1950s and 1960s. We celebrate these TV pioneers in movies and books.

My grandfather faded into obscurity with AM radio. My grandfather was a radio star. During WWII, he had unlimited gas rations to facilitate his role as a news guy reporting on a global war. He was dead before I was born, if you’re trying to do that math.

Daily, weekly, I ask myself whether I am a pioneer or a radio star. My sistah-by-another-mistah bought a Compaq luggable the week it got released. We bought the first laptops, the first mobile phones. There is a scene of me sitting on a carpeted floor, manually keying an IP address and mask into my computer before dialing into an IP-based bulletin board. In the late 1990s and early 2000s, I was using the internet protocol to exchange tsunami data between satellites and buoys floating in the mid-Pacific. My team built and deployed the first telemedicine system in Alaska during that same time. That earns me full marks as a pioneer.

Then I have these voices in my head shying away from newer tech. Am I suddenly the Radio Star?

Until that moment when I have to stop my pandemic-era truck and reboot it repeatedly to keep it going. I have to reboot my truck to get the nonsense on my dashboard working. Weekly, when I drive the thing, it tells me that it is doing a software patch... Oops, the internet connection dropped in the rural mountains where I live. “Please do not disconnect.” Right, the truck assumes that mobile signal is ubiquitous. It isn’t. I served as a rural paramedic for decades. 90% of my town of 40 square miles has no mobile phone signal. In fact, I think 70% of the two southern counties in my state have no or limited mobile coverage. I drive a truck that presumes that the internet is a viable, sustainable, always-available element of public infrastructure.

Why do I need my truck to reboot? Ever? I have a 20-year-old diesel farm tractor. It is happy with some tax-free diesel and a few hours of maintenance per year. I plug it in during cold winters so it starts with one or two turns of the ignition key.

I now live in a world where our own tech competes against itself. My truck desperately wants me to drive down the mountain and park in a lot for an hour, allowing it to actually update its software. I don’t care to go down the hill and, frankly, I don’t care if my truck gets a software upgrade. That is, until I tried to back towards a trailer in a busy farmyard. It engaged the auto-brakes when tall grasses moved in the breeze. It freaked with alarms when I tried to back between 2 other trailers to get to the one I was connecting. Then did it ever freak when the trailer I plugged into did not meet the specifications within the existing digital definitions of trailers. Yes, my truck auto-braked and screamed with alarms when the weight of the trailer dramatically changed. I had to hack the system by disconnecting and reattaching the trailer’s wiring. Oh, right, I had unloaded a tractor. My truck panicked and thought the load was unsafe.

My truck broke a sensor while driving through a farm field in 4WD to check on a friend’s hogs (not motorcycles but pre-bacon critters). It spent 2, nearly 3, weeks at the shop because that failure generated a cascade of electronic failures that cost someone a lot of money to fix (not me, warranty).

When did I become the anti-tech person? I am a tech pioneer. I am that soldier that says ‘follow me.’

There is a specific place for human, real, intelligence in our approach to and employment of technology. My friend and neighbor, whose husband was a Nobel laureate and Pulitzer winner, finally quit teaching creative writing at a prestigious New England university when 100% of the papers she had to grade were generated by AI. I commiserated with her with genuine but distant sympathy.

Then I started encountering AI-generated PL/SQL code in our application development. Geez, man, I get it: doing a rollup summary report without code references or a library can be challenging. But then it fails. After decades of teaching and leading teams of developers, suddenly, I am being told that this code is good because it was AI generated.

And yet, it fails. Sure, AI-generated PL/SQL code will improve. We publish basic rules and standards within our guidelines. Hey, validate your parameters. Who knows what other procedures or pages will call that code? Evaluate the quality of the data before, during, and following processing. Data quality is a variable. And what Oracle programmer hasn’t had to learn the challenge of nulls within our data? Or had to manage the regional assumptions of NLS settings? Suddenly, you get errors about “mask terminated” when a date format doesn’t match the format of the date data presented.
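Those rules are easier to show than to preach. What follows is a small sketch in Python, not PL/SQL, so it is only an analogy for what our guidelines ask of database code. The function names, the date format, and the half-up rounding rule are assumptions made for illustration.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

# Assumed format. State it explicitly instead of trusting a session default.
DATE_FORMAT = "%Y-%m-%d"


def parse_invoice_date(raw, fmt=DATE_FORMAT):
    """Return a date, or None when the input is missing or does not match fmt."""
    if raw is None or not str(raw).strip():
        return None  # null or blank input: report "no date" and let the caller decide
    try:
        return datetime.strptime(str(raw).strip(), fmt).date()
    except ValueError:
        return None  # wrong format, or trailing text the format does not cover


def extended_price(quantity, unit_price):
    """Validate both parameters, then return quantity * unit_price to the cent."""
    if quantity is None or unit_price is None:
        raise ValueError("quantity and unit_price are required")
    try:
        qty = Decimal(str(quantity))
        price = Decimal(str(unit_price))
    except InvalidOperation as exc:
        raise ValueError("quantity and unit_price must be numeric") from exc
    if not (qty.is_finite() and price.is_finite()):
        raise ValueError("quantity and unit_price must be finite")
    if qty < 0 or price < 0:
        raise ValueError("quantity and unit_price must not be negative")
    # Decimal arithmetic avoids binary floating-point drift in money math.
    return (qty * price).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


good = parse_invoice_date("2024-03-01")
bad = parse_invoice_date("03/01/2024")           # does not match the stated format
trailing = parse_invoice_date("2024-03-01 10:00")  # format ends before the input does
line_total = extended_price(3, "0.335")
```

Whether a bad date should come back as a null or raise an error is a judgment call for each system. The point is that the code makes the decision on purpose.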

In recent years, we have written and supported an enterprise application that also generates invoices. As of last month, we exceeded 100K invoices since going live 18 months ago. Our error rate is below 0.01%. These errors are 100% due to poor-quality data coming via an API from a legacy system. We are not permitted a math error. We cannot ball up the tax calculations. There are legal and regulatory impacts to mistakes at that level. My own firm has a commercial time/expense billing system that has significantly over 1M rows. And during our time supporting hurricane recovery in Puerto Rico, we used an Oracle database application to manage $5B in federal grant funding and over 400K PDF documents.

Know who is not allowed to make mistakes? Us.

Oracle makes these jobs a joy to work with. The power and the prestige of the Oracle brand and Oracle tech permit us to operate within these parameters.

A colleague offered great wisdom to me a decade ago when I called for help with a problem. He said, “Christina, there is no random code generator in your application. If your data is doing something, it is happening because you told it so.”

That’s right! I have the expertise to know, understand, and audit 100% of the data manipulations. I can be wrong in my thinking, my approach, or even my typing. Data we pull in from legacy systems can be wrong, incomplete, or just plain screwy. It is our job to fix or reject that data. But once in our system, the tools that manipulate our data were written by skilled software developers with years of training and skills.

There I am, positioning myself as a Luddite. Yet these are the words of a data pioneer, a tech pioneer.

The strength of an Oracle database is predicated on the legacy of database pioneering from the 1970s. The power, strength, and capabilities of PL/SQL sit in the very boring, very real, very stable world of generating invoices and purchase orders, running inventory systems, issuing airline tickets, and tracking and balancing bank and credit card balances. I recognize the Radio Star in me when I question the role of AI in the mundane development of enterprise application systems that are used to manage public funds, public trusts, and public documents.

I try to recruit people into our world of Oracle database development and writing code in PL/SQL. It isn’t sexy. It isn’t glamorous. It is barely even in full color. It is a trade. We are the modern plumbers and modern carpenters. Our work is precision. Our work is elegant. Our work succeeds or fails based on our own skill in comprehending business data and modeling it within a digital framework. Then, when needed, we generate results with an error rate that is as close to zero as possible.

Did I become my own grandfather, who died before I was born? Did I become the crusty old guy that says, “back in my day”? Am I the radio star?

There is a database-centric way of thinking that we teach and that young developers need to learn. We need to instantly look at data and evaluate its fitness, quality, and normalization. We need to write code that works beyond the narrow scope of perfect data, code that gracefully tolerates poor-quality data, null data, and data presented in unexpected formats.

Like the Fortran and COBOL programmers before me, I know that Oracle PL/SQL is both contemporary and legacy at the same time. We are building massive systems that will live for decades and decades, pushing dollars, documents, and data at the behest of humans. We must build systems that survive every audit. We must preserve data that can take the beating placed on it by certified or chartered accountants.

A PL/SQL developer stepping into the market today will be employed through their entire career if they honestly develop the technical skills to write outstanding, but boring, code that manages and manipulates massive data sets. I celebrate this.

Pioneer? Or Radio Star? You decide.

A note about the title:

“Video killed the Radio Star” was the very first music video that played on MTV at 12:01AM on 01 AUG 1981. In February of 2000, when I was at Cisco Systems, it became the One Millionth video to have played on MTV. Today, there are those who might ask: What is MTV?

About my grandfather

He wasn’t a complete loser, guys! He published 20 novels and made numerous films, although most of his films are lost to history due to the nature of cellulose film. My father published over 20 novels as well. And during the autumn of 2024, my first novel is being published. That will be my 5th book.