PUID

In computer systems, data that is stored onto static storage such as a flash drive or hard drive is referred to as “data-at-rest”. In contrast, data that is sent from point A to point B such as chat messages or network traffic is referred to as “data-in-transit”. The mindsets and toolsets that address these two states of data are very different especially when it comes to securing the data.

Data-at-rest techniques usually involve some form of encrypting the data prior to saving it onto the storage mechanism. This is usually called file encryption. Data-in-transit techniques usually rely on utilizing secure protocols which are methods of creating a secure pathway between the two endpoints thus anything sent between the two points are deemed secure. Nowadays, there are hybrids of these methodologies called End-To-End Encryption (E2EE) where messages may be encrypted prior to sending between the two points via a secure communication session. Today’s E2EE solutions offer some semblance of security but are often non-standard, hard to integrate, hard to manage and/or fall short of securing both states of data in a seamless, cohesive fashion.

In Data Centric Design (DCD), there is only one state of data: transmission. Data is generated, transmitted, and then consumed. The NUTS paradigm augments that DCD sequence to: data is generated, secured, transmitted, authenticated, and then consumed.

Storing data is an act of transmitting the data to the future.

For example, to perform data-in-transit securely usually requires two participants, Alice and Bob, to form a communications channel between them using a secure protocol and move data through the channel at time T_NOW. When this is done properly, Alice can send Bob a message in near real-time securely. If we replace Bob with Alice and change the time to T_LATER, this can become data-at-rest: Alice is securely sending a message to her future self.

Why does that matter? It simplifies what is traditionally considered two separate methods of securing data down to one unifying view. Thus, we are left to solve only the single problem of securing data for transmission. In this definition, the distinctions between data storage and transmission are blurred to mean the same thing, it’s just a matter of timing. In the T_NOW case, the consumption of the data is done nearly instantaneously by Bob, but in the T_LATER case, the consumption of the data is done at a later time by Alice. The example can be expanded to allow Bob or anyone else to consume the transmitted data at a later time.

Time = T_NOW	To Alice	To Bob
From Alice	Data-in-transit	Data-in-transit
From Bob	Data-in-transit	Data-in-transit

Time = T_LATER	To Alice	To Bob
From Alice	Data-at-rest	Data-at-rest
From Bob	Data-at-rest	Data-at-rest

A functional expression:

transmit(m, s, t_s, r, t_r)

where

m message

s sender

t_s send time

r receiver

t_r receive time

therefore

send_message(m, Alice, Bob) ≈ transmit(m, Alice, t₀, Bob, t₀)

save_to_disk(filename, m) ≈ transmit(m, Alice, t₀, filename, t₀)

read_from_file(filename) ≈ transmit(m, filename, t₁, Alice, t₁)

There are many ways to secure messages before transmission but very few offer a secure container that can be used for both states of data in a consistent, simplified and independent manner. What I’ve deduced over the years is that many problems that we have with our digital systems can be traced back to inadequately designed data containers. NUTS provides the technology to solve these inadequacies. In a later post, we’ll examine concepts called Strong and Weak Data Models.

This illustrates a core technique that Data Centric Design applies to problems big and small regardless of technical domains: root cause analysis. Finding the root cause of some problems may require you to re-frame the questions with a new perspective so if you solve the root cause then many of the symptoms never appear. The hard part is collecting a set of symptoms that appear unrelated and then looking for possible relationships.

Last week, DC CyberWeek presented very informative events, and many opportunities to network with some very smart people in the cybersecurity industry. The CyberScoop folks did an incredible job of organizing the whole affair. My deepest thanks to Julia Avery-Shapiro from CyberScoop for accepting our event idea, and for guiding us through hosting our first ever cyber security event. I cannot forget our attorney Jim Halpert for graciously offering the use of their offices at DLA Piper, and for the coordination wizardry of Susan Owens. I hope the folks who attended were rewarded with new information and ideas from our presentation.

This is the NUTS version of Data Centric Design (DCD):

A data centric model of computer software design is where user data may be prioritized over applications. A data centric software design may allow for data to be secured at the point of storage. The containerization of data may be an embodiment of a data centric design.

After searching for years, I did not find a definition of DCD I felt was appropriate for the term so I created one. I will introduce the thought process and the approach I took to get here. This segment will concentrate on the unique identity aspect of data.

DNA is a marvel of Nature, and Nature is a master of data management. Nature stores and manipulates its key data in organic form. I decided to examine it from a digital design perspective and see what lessons could be learned from this complex data structure that has been organically developed over billions of years. Nature’s development cycle is a bit more time consuming than the edit-compile-exec cycle, but it still does things that we can only dream about. Every day, geneticists and molecular biologists are gaining more knowledge and techniques for manipulating DNA. Many Nobel prizes have been awarded, animals have been cloned, the human genome mapped, and we have experimented with gene therapy. But yet, we are just beginning to understand the monumental task of figuring out Nature’s higher-level design patterns.

I’m not a geneticist or molecular biologist. I’m not trying to duplicate DNA in bits. In the end, bits are a simple form of data storage implemented in electromagnetic or optical devices and media. I wanted to ask some questions based on our understanding of DNA such as:

What are the characteristics of DNA that makes it useful to Nature? To us?
Which of those characteristics are useful for digital data?
How do I express it in digital form?
How can it affect digital data?

At the outset, these seemed like simple, straightforward questions, and five years later, I share my findings with you in the form of NUTS (eNcrypted Userdata Transit & Storage).

The first characteristic that popped out from DNA is identity. We have all seen one too many episodes of NCIS, or its crime drama equivalent, to know that DNA is constantly used to identify people. For the most part, DNA can be considered a unique identifier for a person. There may be exceptions, but let’s not ruin my party here. As I said, I’m not trying to replicate DNA in bits.

In the digital world, a unique identity, or identifier, is usually referred to as a Universally Unique identifier (UUID), or a Globally Unique Identifier (GUID). These IDs are usually a long digital number represented by up to 128 bits in length. There are suggested formats and methods for coming up with this unique number, and the probability of any two UUID’s being the same is small, but it is not zero. This is referred to as an ID collision. Note that the terms UUID and GUID have the words Universal and Global, yet everyone in the know recognizes that it is not so. A tad misleading, wouldn’t you agree? Look at what happened with the Y2K circus. Yes, I get it, different times and different constraints, but keep in mind that our current constraints are really limited by our imagination. We live in an age where a top of the line smart phone is now offered with 256GB of storage.

In NUTS, I have decided to call it a NutID, and my suggested starting size is 512 bits (there are many reasons why I like the acronym NUTS). The NutID is a rather large identifier by any measure, but it matches its large ambitions. This is big, HUUGE!! Trump’s vernacular is catchy to say the least.

The source information to create a NutID begins with a combination of environmental factors and other randomized components to make it as unique as possible. It will then apply a 512 bit SHA2 hash on this source data to generate the NutID. A SHA2 hash is a function that can derive a fixed length representation of variable length data, therefore a hash can be used for mapping tables and integrity purposes of the source data. Another reason for a hash rather than some standard format is to provide anonymity. If there is a well-known format, there will be implied information embedded in the identifier. A SHA2 512 hash makes that task a tad more difficult. Computer scientists consider a hash function like SHA2 512 to be essentially irreversible.

Such an identifier is not guaranteed to be unique, it’s the nature of hashes and the source entropy that we provide for it to chew on. Therefore, I gave it a more modest technical term as in Practically Unique Identifier or PUID. If there is one thing that you learn here is that if anyone ever lets on that a hash is guaranteed to be unique, walk out the door. Periodically, I will introduce new terms which I plucked out of thin air to describe various aspects of Data Centric Design.

What do I want to do with such a large number as a NutID?

Like “Harold and the Purple Crayon”, I want to stamp everything with it! I want to stamp every piece of storable data I (you) create with a NutID. Why?

Why not? It’s my (your) computer and I (you) can do what I (you) want with it.
Who said only institutions and companies are allowed to create serial numbers?
My data is more important to me than any other data.
If it’s good enough for Nature, it’s good enough for me.
Identifying data at the point of creation (you and your device) with a NutID allows it to be referenced forever.
Computers don’t process pathnames very well.

A NutID is meant to be created in a massively distributed and completely independent way by you and your computer without ever talking to anyone else. The NutID lays the groundwork for how a Nut will behave in its ecosystem. It will not replace existing conventions, it will coexist and enhance it. It has a few features that we will go over in later posts that you will never see anywhere else. We will continue this investigation of learning from DNA and Nature’s mastery of data management.

To every little piece of data out there,
you matter,
you have a name,
it’s your NutID.
Go forth unafraid.

NUTS Technologies will be hosting an event at Cyberweek in Washington D.C. on Tuesday, October 17 at 2pm. This will be a small discussion group with limited seating.

Internet of Data™

Simplifying Privacy

To rest, or not to rest, that is the question

What is Data Centric Design?