What is Data Centric Design?

This is the NUTS version of Data Centric Design (DCD):

A data centric model of computer software design is where user data may be prioritized over applications. A data centric software design may allow for data to be secured at the point of storage. The containerization of data may be an embodiment of a data centric design.

After searching for years, I did not find a definition of DCD I felt was appropriate for the term so I created one.  I will introduce the thought process and the approach I took to get here. This segment will concentrate on the unique identity aspect of data.

DNA is a marvel of Nature, and Nature is a master of data management.  Nature stores and manipulates its key data in organic form.  I decided to examine it from a digital design perspective and see what lessons could be learned from this complex data structure that has been organically developed over billions of years. Nature’s development cycle is a bit more time consuming than the edit-compile-exec cycle, but it still does things that we can only dream about. Every day, geneticists and molecular biologists are gaining more knowledge and techniques for manipulating DNA. Many Nobel prizes have been awarded, animals have been cloned, the human genome has been mapped, and we have experimented with gene therapy. Yet we are just beginning to understand the monumental task of figuring out Nature’s higher-level design patterns.

I’m not a geneticist or molecular biologist. I’m not trying to duplicate DNA in bits. In the end, bits are a simple form of data storage implemented in electromagnetic or optical devices and media.  I wanted to ask some questions based on our understanding of DNA such as:

  • What are the characteristics of DNA that make it useful to Nature? To us?
  • Which of those characteristics are useful for digital data?
  • How do I express it in digital form?
  • How can it affect digital data?

At the outset, these seemed like simple, straightforward questions. Five years later, I share my findings with you in the form of NUTS (eNcrypted Userdata Transit & Storage).

The first characteristic that popped out from DNA is identity. We have all seen one too many episodes of NCIS, or its crime drama equivalent, to know that DNA is constantly used to identify people. For the most part, DNA can be considered a unique identifier for a person. There may be exceptions, but let’s not ruin my party here. As I said, I’m not trying to replicate DNA in bits.

In the digital world, a unique identity, or identifier, is usually referred to as a Universally Unique Identifier (UUID) or a Globally Unique Identifier (GUID).  These IDs are usually long numbers, 128 bits in the standard formats.  There are suggested formats and methods for coming up with this unique number, and the probability of any two UUIDs being the same is small, but it is not zero. When two IDs do come out the same, it is referred to as an ID collision. Note that the terms UUID and GUID have the words Universal and Global, yet everyone in the know recognizes that they are not truly so. A tad misleading, wouldn’t you agree? Look at what happened with the Y2K circus. Yes, I get it, different times and different constraints, but keep in mind that our current constraints are really limited by our imagination. We live in an age where a top-of-the-line smartphone is now offered with 256GB of storage.
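
To put a rough number on "small, but not zero", here is a minimal sketch using the standard birthday-bound approximation. It assumes every bit of the identifier is uniformly random, which is an idealization (real UUID formats reserve some bits for version information). The function name and the sample counts are made up for illustration.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that at least two of n_ids random IDs collide.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    Assumes all `bits` bits are uniformly random (an idealization).
    """
    exponent = -n_ids * (n_ids - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for bits in (128, 512):
        for n in (10**9, 10**18):
            p = collision_probability(n, bits)
            print(f"{bits}-bit IDs, {n:.0e} of them: p ~= {p:.3e}")
```

Under this idealized model, widening the identifier from 128 to 512 bits pushes the collision probability down by many orders of magnitude for the same number of IDs, but it never reaches zero.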

In NUTS, I have decided to call it a NutID, and my suggested starting size is 512 bits (there are many reasons why I like the acronym NUTS).  The NutID is a rather large identifier by any measure, but it matches its large ambitions. This is big, HUUGE!! Trump’s vernacular is catchy to say the least.

The source information to create a NutID begins with a combination of environmental factors and other randomized components to make it as unique as possible. A 512-bit SHA-2 hash (SHA-512) is then applied to this source data to generate the NutID.  A SHA-2 hash is a function that derives a fixed-length representation of variable-length data, so a hash can be used as a key in mapping tables and to check the integrity of the source data.  Another reason for using a hash rather than some standard format is to provide anonymity. If there is a well-known format, there will be implied information embedded in the identifier. A SHA-512 hash makes extracting such information a tad more difficult. Computer scientists consider a hash function like SHA-512 to be essentially irreversible.
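
The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not say which environmental factors are used, so the inputs below (OS randomness, a timestamp, a hardware node number, and the process id) and the function name make_nut_id are hypothetical stand-ins, not the actual NUTS implementation.

```python
import hashlib
import os
import time
import uuid


def make_nut_id() -> str:
    """Return a 512-bit identifier as 128 hexadecimal characters.

    The choice of source inputs here is an assumption for illustration.
    """
    source = b"".join([
        os.urandom(64),                      # randomized component from the OS
        time.time_ns().to_bytes(16, "big"),  # environmental factor: current time
        uuid.getnode().to_bytes(8, "big"),   # environmental factor: hardware node number
        os.getpid().to_bytes(8, "big"),      # environmental factor: process id
    ])
    # SHA-512 maps the variable-length source to a fixed 512-bit digest.
    return hashlib.sha512(source).hexdigest()


if __name__ == "__main__":
    print(make_nut_id())
```

Because only the digest is kept, the identifier itself has no readable fields such as a timestamp or a machine address, which is the anonymity point made above.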

Such an identifier is not guaranteed to be unique; that is the nature of hashes and of the source entropy that we provide for them to chew on. Therefore, I gave it a more modest technical term: Practically Unique Identifier, or PUID. If there is one thing you learn here, it is this: if anyone ever lets on that a hash is guaranteed to be unique, walk out the door. Periodically, I will introduce new terms which I plucked out of thin air to describe various aspects of Data Centric Design.

What do I want to do with such a large number as a NutID?

Like “Harold and the Purple Crayon”, I want to stamp everything with it! I want to stamp every piece of storable data I (you) create with a NutID. Why?

  • Why not? It’s my (your) computer and I (you) can do what I (you) want with it.
  • Who said only institutions and companies are allowed to create serial numbers?
  • My data is more important to me than any other data.
  • If it’s good enough for Nature, it’s good enough for me.
  • Identifying data at the point of creation (you and your device) with a NutID allows it to be referenced forever.
  • Computers don’t process pathnames very well.
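
One way to picture the last two points is an index in which the identifier, not the pathname, is the durable handle. The following is only a toy sketch: the NutIndex class and its methods are hypothetical names invented for illustration, and the identifier here is simply a SHA-512 hash of random bytes rather than the fuller recipe described earlier.

```python
import hashlib
import os


class NutIndex:
    """Toy in-memory index: data is keyed by ID, paths are just labels."""

    def __init__(self) -> None:
        self._by_id = {}   # identifier -> payload bytes
        self._paths = {}   # pathname -> identifier

    def create(self, path: str, payload: bytes) -> str:
        # Stamp the data with an identifier at the point of creation.
        nut_id = hashlib.sha512(os.urandom(64)).hexdigest()
        self._by_id[nut_id] = payload
        self._paths[path] = nut_id
        return nut_id

    def rename(self, old_path: str, new_path: str) -> None:
        # Only the label changes; the identifier stays the same.
        self._paths[new_path] = self._paths.pop(old_path)

    def id_for_path(self, path: str) -> str:
        return self._paths[path]

    def get(self, nut_id: str) -> bytes:
        return self._by_id[nut_id]


if __name__ == "__main__":
    index = NutIndex()
    nut_id = index.create("notes/draft.txt", b"hello")
    index.rename("notes/draft.txt", "archive/final.txt")
    print(index.get(nut_id))
```

A reference that was made by identifier keeps working after the rename, whereas a reference made by the old pathname would have broken.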

A NutID is meant to be created in a massively distributed and completely independent way by you and your computer without ever talking to anyone else. The NutID lays the groundwork for how a Nut will behave in its ecosystem. It will not replace existing conventions; it will coexist with and enhance them. It has a few features that we will go over in later posts that you will never see anywhere else. We will continue this investigation of learning from DNA and Nature’s mastery of data management.

To every little piece of data out there,
you matter,
you have a name,
it’s your NutID.
Go forth unafraid.

NUTS Technologies will be hosting an event at Cyberweek in Washington D.C. on Tuesday, October 17 at 2pm. This will be a small discussion group with limited seating.

 

The God Key Problem: Digitizing the Dynamic Nature of Trust

By Yoon Auh,  Founder of NUTS Technologies, Inc., the world leader in Data Centric Designs of secure data systems.

Snowden. We only have to mention this single name to conjure up a plethora of views on the matter that plastered the headlines in 2013. Regardless of your personal views on the matter between the US Government and its ex-consultant, there are two main issues that this incident highlights and should be addressed objectively: 1) the dynamic nature of trust  and 2) the God Key problem.

Trust is serious business. We depend on trust in each other, in social systems, in courts, in childcare, in policing, and in many other societal foundations in order to live relatively secure and carefree lives. Our trust in the sanctity of contracts, in the law and its enforcement mechanisms, and in the continuity of that trust enables the average person to earn a living and plan their future decades in advance. But trust changes over time. It is dynamic. This is why there are laws, and the enforcement of those laws, to keep everyone in a position of trust in check.

The dynamic nature of trust is something everyone learns and understands over time when growing up. It changes a lot, especially when it comes to interpersonal relationships. If this were not the case, why would our national divorce rate be close to 50%? This happens even more frequently in our professional lives and is generally considered a healthy thing. The senior engineer you entrusted your next big project to is jumping ship to a competitor to get more responsibility and better pay. Your top portfolio manager is leaving after establishing a track record at your firm to become a partner at a fast-growing hedge fund. Even in the brutal world of drug trafficking, the betrayal of trust is dealt with by terminal violence.

The nature of trust is mutual, unilateral and exquisitely temperamental. Both parties rely on the passage of future events to determine their level of trust in one another; therefore trust is mutual between the two parties. Trust is built over time by many trustful deeds and events between the two entities. The first instance of an untrustworthy event may nullify the entire history of trust in the relationship. The intended trustful relationship is unilateral because each party may independently violate the trust in the relationship. Whether or not the violating party alerts the other party to this change in the relationship is what produces the complicated saga of betrayals.

The trust in a relationship is further complicated by adding in the self-interest priorities of each participant. It’s a quagmire of chess-like strategies with imperfect information.

The God Key problem is an age-old computer science issue and is a principal culprit in most cybersecurity scandals and hacks. Most computers designed and manufactured today have an administrative mode of operation which gives the user unfettered access to everything within the computer’s domain or physical hardware. There are some exceptions to this but this is the predominant model. Most companies that rely on computer processing for their business operations will have a group of administrators who have the God Key to all the business systems in the company. This is a necessary evil since computer systems do not administer themselves and they are in constant need of maintenance in both software and hardware. The God Key is not just one all-access key; it is the combination of access credentials given to administrators to allow them unfettered access to all the systems within their domain.

In most corporate settings I’ve been in, it has been routine to see smart, young, technically adept individuals being given ready access to the God Key of corporate systems and networks. Much of advanced technology is tamed by young people who are fresh out of school with the latest knowledge and techniques. Sometimes the best ones are self-taught, nerdy renegades answering the siren call of large paychecks for performing tasks they would do for free on their own time just for the hell of it. We sometimes call these people hackers, both white hat and black hat.

The most frustrating thing about this situation is the quandary that managers face when pressed for time and talent in a crisis: who do you give the God Keys to so the job can get done in a hurry? You give them to the most skilled operator. In the world of IT, the most skilled does not correlate to the most experienced, the most knowledgeable, the most seasoned, nor the most trustworthy. The operator who is given the God Key may have been deemed trustworthy at one point but that trust may have changed over time. How is an institution supposed to measure that? How do they keep track of that? What if the operator hides from his employer the fact that he cannot be trusted anymore?

Snowden. This is exactly the scenario that played out between the NSA and Snowden. The dynamic nature of trust sucks in a digital world, even for the NSA, which is in the business of trust. The point is that this problem exists in every institutional computer system. The NSA has plenty of company. Everyone deals with these problems in one way or another, but most conventional approaches are inadequate and do not address the issues related to the dynamic nature of trust.

To solve this thorny problem, you need to be able to separate the ability to administer the system from the ability to read everything within its domain. Most complex systems are designed to be centrally controlled. They may have distributed access, distributed storage or distributed processing, but administration is usually tightly controlled in a central manner. What company doesn’t like control? In fact, most institutions in the free world are hierarchically structured like little dictatorships. But these structures work and people adapt to them naturally so it is the predominant organizing mode of mass productivity.

All the sensitive data of the company needs to be secured to block out the curious gaze of the system administrators. We have many systems to do such containment, but most of them require central management and some version of authorization-token-based access control. You see the circular logic problem here: central management requires an administrator with the God Key to that particular system, which raises the same issues.

The solution lies in data containers that can act as their own reference monitors, working with a truly independent and distributed key management system.

This problem took me over 5 years to solve. Along the way, the solution set that was crafted solved many other nagging issues. The approach that I came up with is called Data Centric Design. It is unrelated to any definition of Data Centric Design on the web today. This is a new technology. It is a radical technology. It is an adaptive technology.

It forges Applied Cryptography in new ways to construct a framework where Data can grow up and do some things for itself. That’s right. Data is personified in that last sentence. It’s about time that Data got smarter and learned a few tricks of its own rather than depending on applications to wipe its butt every time, because we are learning that the butt wipers are neither all that trustworthy nor all that competent in this computerized world of ours.

Welcome to the world of eNcrypted Userdata Transit & Storage or just NUTS!