This is the NUTS version of Data Centric Design (DCD):
A data centric model of computer software design is where user data may be prioritized over applications. A data centric software design may allow for data to be secured at the point of storage. The containerization of data may be an embodiment of a data centric design.
After searching for years, I did not find a definition of DCD I felt was appropriate for the term so I created one. I will introduce the thought process and the approach I took to get here. This segment will concentrate on the unique identity aspect of data.
DNA is a marvel of Nature, and Nature is a master of data management. Nature stores and manipulates its key data in organic form. I decided to examine it from a digital design perspective and see what lessons could be learned from this complex data structure that has been organically developed over billions of years. Nature’s development cycle is a bit more time consuming than the edit-compile-exec cycle, but it still does things that we can only dream about. Every day, geneticists and molecular biologists are gaining more knowledge and techniques for manipulating DNA. Many Nobel prizes have been awarded, animals have been cloned, the human genome mapped, and we have experimented with gene therapy. But yet, we are just beginning to understand the monumental task of figuring out Nature’s higher-level design patterns.
I’m not a geneticist or molecular biologist. I’m not trying to duplicate DNA in bits. In the end, bits are a simple form of data storage implemented in electromagnetic or optical devices and media. I wanted to ask some questions based on our understanding of DNA such as:
- What are the characteristics of DNA that makes it useful to Nature? To us?
- Which of those characteristics are useful for digital data?
- How do I express it in digital form?
- How can it affect digital data?
At the outset, these seemed like simple, straightforward questions, and five years later, I share my findings with you in the form of NUTS (eNcrypted Userdata Transit & Storage).
The first characteristic that popped out from DNA is identity. We have all seen one too many episodes of NCIS, or its crime drama equivalent, to know that DNA is constantly used to identify people. For the most part, DNA can be considered a unique identifier for a person. There may be exceptions, but let’s not ruin my party here. As I said, I’m not trying to replicate DNA in bits.
In the digital world, a unique identity, or identifier, is usually referred to as a Universally Unique identifier (UUID), or a Globally Unique Identifier (GUID). These IDs are usually a long digital number represented by up to 128 bits in length. There are suggested formats and methods for coming up with this unique number, and the probability of any two UUID’s being the same is small, but it is not zero. This is referred to as an ID collision. Note that the terms UUID and GUID have the words Universal and Global, yet everyone in the know recognizes that it is not so. A tad misleading, wouldn’t you agree? Look at what happened with the Y2K circus. Yes, I get it, different times and different constraints, but keep in mind that our current constraints are really limited by our imagination. We live in an age where a top of the line smart phone is now offered with 256GB of storage.
In NUTS, I have decided to call it a NutID, and my suggested starting size is 512 bits (there are many reasons why I like the acronym NUTS). The NutID is a rather large identifier by any measure, but it matches its large ambitions. This is big, HUUGE!! Trump’s vernacular is catchy to say the least.
The source information to create a NutID begins with a combination of environmental factors and other randomized components to make it as unique as possible. It will then apply a 512 bit SHA2 hash on this source data to generate the NutID. A SHA2 hash is a function that can derive a fixed length representation of variable length data, therefore a hash can be used for mapping tables and integrity purposes of the source data. Another reason for a hash rather than some standard format is to provide anonymity. If there is a well-known format, there will be implied information embedded in the identifier. A SHA2 512 hash makes that task a tad more difficult. Computer scientists consider a hash function like SHA2 512 to be essentially irreversible.
Such an identifier is not guaranteed to be unique, it’s the nature of hashes and the source entropy that we provide for it to chew on. Therefore, I gave it a more modest technical term as in Practically Unique Identifier or PUID. If there is one thing that you learn here is that if anyone ever lets on that a hash is guaranteed to be unique, walk out the door. Periodically, I will introduce new terms which I plucked out of thin air to describe various aspects of Data Centric Design.
What do I want to do with such a large number as a NutID?
Like “Harold and the Purple Crayon”, I want to stamp everything with it! I want to stamp every piece of storable data I (you) create with a NutID. Why?
- Why not? It’s my (your) computer and I (you) can do what I (you) want with it.
- Who said only institutions and companies are allowed to create serial numbers?
- My data is more important to me than any other data.
- If it’s good enough for Nature, it’s good enough for me.
- Identifying data at the point of creation (you and your device) with a NutID allows it to be referenced forever.
- Computers don’t process pathnames very well.
A NutID is meant to be created in a massively distributed and completely independent way by you and your computer without ever talking to anyone else. The NutID lays the groundwork for how a Nut will behave in its ecosystem. It will not replace existing conventions, it will coexist and enhance it. It has a few features that we will go over in later posts that you will never see anywhere else. We will continue this investigation of learning from DNA and Nature’s mastery of data management.
To every little piece of data out there,
you matter,
you have a name,
it’s your NutID.
Go forth unafraid.
NUTS Technologies will be hosting an event at Cyberweek in Washington D.C. on Tuesday, October 17 at 2pm. This will be a small discussion group with limited seating.