IoD™: The Internet of Data™

By Yoon Auh,  Founder of NUTS Technologies, August 27th, 2022.

The Internet of Data™ (IoD™) enables direct access to data across any network. Accessing data directly is possible when data has at least two characteristics: identification and privacy.

Identification of Data

Identification of objects is conventionally implemented within narrow scopes such as VIN, SSN, cell number, MEID, MAC, IPv4, IPv6, etc. For IoD, any data is eligible for permanent identification using a Practically Unique ID (PUID) by any capable device.

Identification of data is as narrow or broad as required and easily implemented with an identifier: the larger the identifier, the bigger the possible universe of data. A large enough identifier can identify the data across space and time. The size of the data identifier can start small and grow as needed; as implemented, this identifier is defined as the NutID. NutIDs are unstructured identifiers to maximize anonymity; a consequence is that guessing the identifier of a particular piece of data by brute force becomes very expensive.
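To give a feel for how identifier size drives the cost of blind guessing, here is a small arithmetic sketch. It assumes identifiers are uniformly random and that an attacker guesses at random; the function name and the figure of one trillion issued identifiers are illustrative assumptions, not part of NUTS.

```python
def expected_guesses(bits: int, issued_ids: int) -> float:
    """Average number of uniformly random guesses needed to hit any one of
    `issued_ids` identifiers drawn from a space of 2**bits values."""
    # Each guess succeeds with probability issued_ids / 2**bits, so the
    # expected number of guesses is the reciprocal of that probability.
    return (2 ** bits) / issued_ids


# Hypothetical scenario: a trillion identifiers already exist.
for bits in (64, 128, 256, 512):
    print(f"{bits:>3} bits: about {expected_guesses(bits, 10**12):.3e} guesses")
```

Every added bit doubles the expected work, which is why an identifier that starts small can grow as the universe of data grows.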

Privacy of Data

Privacy of data is conventionally controlled through access gateways and obfuscations (ciphering), each of which presents numerous technical challenges when done at scale. To secure data for IoD, it is convenient to have a compact, fine-grained access control system that is cryptologically implemented as data; we define data secured by a compact, independent, portable access control system as Zero Trust Data (ZTD). Other equivalent terms for ZTD are “Secure Container” and “Security at the Data Layer”.

A Secure Container implemented as an encapsulation can be used to envelop the payload to accommodate a wide variety of object and file formats. Within the secure container, any metadata is also protected (immutable).

We define a simple nut container as a secure container with an immutable NutID as metadata and the data to be protected as the payload where the access controls are expressed as sets of cryptographic keys configured into progressively revealing data structures.

Thus, data in a nut can be accessed directly by any key holder(s) on the Internet of Data.

Independent Data

Data in a nut is independent data. An IoD ecosystem can provide transport and locate services for independent data across any network. The nut can be addressed directly by its NutID. Further, the nut can address other nuts by their respective NutIDs; therefore, a NutID is a permanent reference to a nut, in contrast to impermanent URL paths. In essence, the secure container protects and identifies its payload. A nut with a payload of NutIDs is an example of data directly addressing other data.
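A rough sketch of data addressing other data might look like the following. The Nut and NutStore classes, the short example IDs, and the choice of a JSON list as the payload format are all assumptions made for illustration; encryption is left out so that only the addressing is shown.

```python
import json
from dataclasses import dataclass


@dataclass
class Nut:
    nut_id: str     # permanent identifier (shortened here for readability)
    payload: bytes  # opaque data; a real nut would protect this


class NutStore:
    """Hypothetical locate service: finds a nut by NutID wherever it lives."""

    def __init__(self) -> None:
        self._nuts: dict[str, Nut] = {}

    def put(self, nut: Nut) -> None:
        self._nuts[nut.nut_id] = nut

    def locate(self, nut_id: str) -> Nut:
        return self._nuts[nut_id]

    def follow(self, nut_id: str) -> list[Nut]:
        # Treat the payload as a list of NutIDs and fetch each referenced nut.
        return [self.locate(ref) for ref in json.loads(self.locate(nut_id).payload)]


store = NutStore()
store.put(Nut("a1", b"chapter one"))
store.put(Nut("b2", b"chapter two"))
# An index nut whose payload is nothing but NutIDs of other nuts.
store.put(Nut("c3", json.dumps(["a1", "b2"]).encode()))

chapters = [nut.payload for nut in store.follow("c3")]
```

Because the references are NutIDs rather than paths, the index nut stays valid no matter where the chapter nuts are moved or copied.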

It is nontrivial for other web servers to store and forward URLs, whereas a nut can be handled by any relay mechanism due to its intrinsic security and portability. URIs and URLs are ever changing and may not be the same the next time you visit them. Documents not visible to web searches are very difficult to track down. Document authentication is even more difficult. In contrast, the NutID of a document will never change, and it references the actual document, which will self-authenticate upon presentation of a valid key.

IoD defines a new data plane where users are allowed to access independent data directly rather than through reference monitors requiring centralized registrations and/or administrations.

Independent data in a nut protects its payload and only the key holder(s) can easily access it; therefore, a nut can be safely stored anywhere on the IoD. Expressed as a generic file format such as JSON-base64, a nut is independent of most Operating Systems, File Systems and Cloud Systems. Getting a copy of a random nut is not enough to access its contents; one must present a valid credential in the form of a cryptographic key. Since the access controls for the nut are embedded within its container material as cryptographic data elements, the payload is consistently protected in any environment, independent of reference monitors.
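As a minimal sketch of the idea, and not the actual NUTS format, the following packs a NutID and a protected payload into JSON with base64 fields and refuses to open without the right key. Everything here is an assumption for illustration: the field names, the seal_nut and open_nut functions, and above all the toy cipher (SHA-256 in counter mode), which stands in for a vetted authenticated cipher only so the example runs on the Python standard library.

```python
import base64
import hashlib
import hmac
import json
import secrets


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Derive separate encryption and authentication keys from the holder's key.
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _keystream_xor(enc_key: bytes, nonce: bytes, data: bytes) -> bytes:
    # TOY cipher for illustration only: XOR with SHA-256 in counter mode.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(enc_key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def seal_nut(nut_id: str, payload: bytes, key: bytes) -> str:
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    body = _keystream_xor(enc_key, nonce, payload)
    # The tag covers the NutID too, so the metadata cannot be changed unnoticed.
    tag = hmac.new(mac_key, nut_id.encode() + nonce + body, hashlib.sha256).digest()
    return json.dumps(
        {"nut_id": nut_id, "nonce": _b64(nonce), "payload": _b64(body), "tag": _b64(tag)}
    )


def open_nut(text: str, key: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nut = json.loads(text)
    nonce, body, tag = (base64.b64decode(nut[f]) for f in ("nonce", "payload", "tag"))
    expected = hmac.new(mac_key, nut["nut_id"].encode() + nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("invalid key, or the nut was altered")
    return _keystream_xor(enc_key, nonce, body)


key = secrets.token_bytes(32)
nut_text = seal_nut("example-nut-id", b"my private note", key)
recovered = open_nut(nut_text, key)
```

The sealed text is ordinary JSON, so it can be copied to any file system or relay; the check that decides who may read it travels inside the container rather than living in an external gatekeeper.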

Key Management

A secure system requires the safekeeping of secrets such as cryptographic keys in a systematic way on behalf of the user, which is a nontrivial problem. In the evolving world of data security, the ownership of cryptographic keys is an important factor in establishing the ownership of ciphered data. Both concerns can be addressed effectively and simply within an IoD implementation using nut containers.

IoD as implemented by the NUTS ecosystem provides each user installation with a key management system (KMS) built on nut containers. Since a nut identifies and protects a payload of any storable digital data, a cryptographic key or digital credential stored as payload in a nut is private and identified in a universal way.

A nut container can be configured with an arbitrary number of keyholes each of which is identified by a NutID (or within this context, a KeyID). This presents the raw building blocks to construct a secure, robust and modular KMS using nuts as key carriers: a simple, elegant, logical and massively scalable design.
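One way to picture keyholes, offered as a sketch under stated assumptions rather than as the NUTS design, is a table keyed by KeyID in which each entry holds the same payload key wrapped under a different holder's key. The function names, the KeyID strings, and the toy XOR-based key wrap are hypothetical; a real system would use a standard key-wrap algorithm.

```python
import hashlib
import hmac
import secrets


def _pad(holder_key: bytes, key_id: str) -> bytes:
    # TOY 32-byte wrapping pad for illustration only.
    return hashlib.sha256(holder_key + key_id.encode()).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def add_keyhole(keyholes: dict, key_id: str, holder_key: bytes, payload_key: bytes) -> None:
    """Add one keyhole: the 32-byte payload key wrapped under one holder's key."""
    keyholes[key_id] = {
        "wrapped": _xor(payload_key, _pad(holder_key, key_id)),
        # Check value so that a wrong holder key is detected instead of
        # silently yielding a garbage payload key.
        "check": hmac.new(payload_key, key_id.encode(), hashlib.sha256).digest(),
    }


def unlock(keyholes: dict, key_id: str, holder_key: bytes) -> bytes:
    hole = keyholes[key_id]
    candidate = _xor(hole["wrapped"], _pad(holder_key, key_id))
    expected = hmac.new(candidate, key_id.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, hole["check"]):
        raise PermissionError("this key does not fit the keyhole")
    return candidate


payload_key = secrets.token_bytes(32)
alice_key, bob_key = secrets.token_bytes(32), secrets.token_bytes(32)
keyholes: dict = {}
add_keyhole(keyholes, "keyid-alice", alice_key, payload_key)
add_keyhole(keyholes, "keyid-bob", bob_key, payload_key)
```

Adding or removing a holder touches only that holder's keyhole entry, which is the modularity the paragraph above points to.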

Conclusion

Our research shows that when properly engineered, IoD can provide an individual user with features often associated with the most sophisticated IT organizations, such as hybrid cloud data management, data resiliency, ransomware mitigation, insider threat mitigation, automatic backup, hot backup, secure data sharing, automated synchronization, cipher agnostic cryptography, on device key management, and key ownership. All of these features are delivered in one ecosystem, in an integrated fashion, expressed as protected, identifiable data storage units.

Conventional approaches may provide most of the features listed above but may require many solutions to be configured simultaneously by knowledgeable people, with integration as a secondary concern. Insider threat mitigation is only attempted by organizations with deep pockets and the need, whereas the NUTS ecosystem delivers insider threat mitigation within every nut container in an independent way: the epitome of Zero Trust Data.

The Internet of Data establishes a new abstraction layer for the way we can interact with data and the way data can interact with other data (Fig. 1). IoD puts forth an environment where Operating Systems, File Systems and Networks are commoditized and your Data is prioritized.

In all honesty, does any user authenticate into a system for the pure pleasure of logging into a system? No, because, in the end, it’s all about the Data.

To rest, or not to rest, that is the question

In computer systems, data that is stored onto static storage such as a flash drive or hard drive is referred to as “data-at-rest”. In contrast, data that is sent from point A to point B such as chat messages or network traffic is referred to as “data-in-transit”. The mindsets and toolsets that address these two states of data are very different especially when it comes to securing the data.

Data-at-rest techniques usually involve some form of encrypting the data prior to saving it onto the storage mechanism. This is usually called file encryption. Data-in-transit techniques usually rely on secure protocols, which are methods of creating a secure pathway between two endpoints so that anything sent between the two points is deemed secure. Nowadays, there are hybrids of these methodologies called End-To-End Encryption (E2EE), where messages may be encrypted prior to sending between the two points via a secure communication session. Today’s E2EE solutions offer some semblance of security but are often non-standard, hard to integrate, hard to manage and/or fall short of securing both states of data in a seamless, cohesive fashion.

In Data Centric Design (DCD), there is only one state of data: transmission. Data is generated, transmitted, and then consumed. The NUTS paradigm augments that DCD sequence to: data is generated, secured, transmitted, authenticated, and then consumed.

Storing data is an act of transmitting the data to the future.

For example, securing data-in-transit usually requires two participants, Alice and Bob, to form a communications channel between them using a secure protocol and move data through the channel at time TNOW. When this is done properly, Alice can send Bob a message securely in near real-time. If we replace Bob with Alice and change the receive time to TLATER, this becomes data-at-rest: Alice is securely sending a message to her future self.

Why does that matter? It simplifies what is traditionally considered two separate methods of securing data down to one unifying view. Thus, we are left to solve only the single problem of securing data for transmission. In this definition, the distinction between data storage and transmission blurs until they mean the same thing; it’s just a matter of timing. In the TNOW case, the consumption of the data is done nearly instantaneously by Bob, but in the TLATER case, the consumption of the data is done at a later time by Alice. The example can be expanded to allow Bob or anyone else to consume the transmitted data at a later time.

Time = TNOW     To Alice          To Bob
From Alice      Data-in-transit   Data-in-transit
From Bob        Data-in-transit   Data-in-transit

Time = TLATER   To Alice          To Bob
From Alice      Data-at-rest      Data-at-rest
From Bob        Data-at-rest      Data-at-rest

 

A functional expression:

transmit(m, s, ts, r, tr)

where

m     message
s     sender
ts    send time
r     receiver
tr    receive time

therefore

send_message(m, Alice, Bob) ≈ transmit(m, Alice, t0, Bob, t0)

save_to_disk(filename, m) ≈ transmit(m, Alice, t0, filename, t0)

read_from_file(filename) ≈ transmit(m, filename, t1, Alice, t1)
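The functional expression above can be sketched in Python as follows. This is only an illustration of the unifying view: the Transmission record, the extra owner, reader and time parameters on the helper functions, and the end_to_end helper are assumptions added so the example is self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transmission:
    m: bytes   # message
    s: str     # sender
    ts: float  # send time
    r: str     # receiver
    tr: float  # receive time


def transmit(m: bytes, s: str, ts: float, r: str, tr: float) -> Transmission:
    if tr < ts:
        raise ValueError("a message cannot be received before it is sent")
    return Transmission(m, s, ts, r, tr)


def send_message(m: bytes, sender: str, receiver: str, t0: float) -> Transmission:
    return transmit(m, sender, t0, receiver, t0)


def save_to_disk(filename: str, m: bytes, owner: str, t0: float) -> Transmission:
    return transmit(m, owner, t0, filename, t0)


def read_from_file(filename: str, stored: Transmission, reader: str, t1: float) -> Transmission:
    return transmit(stored.m, filename, t1, reader, t1)


def end_to_end(first: Transmission, last: Transmission) -> Transmission:
    # Collapse a save followed by a read into one transmission through time.
    return transmit(first.m, first.s, first.ts, last.r, last.tr)


chat = send_message(b"hello", "Alice", "Bob", t0=0.0)
saved = save_to_disk("note.nut", b"hello", "Alice", t0=0.0)
loaded = read_from_file("note.nut", saved, "Alice", t1=5.0)
to_future_self = end_to_end(saved, loaded)
```

Seen this way, a save followed by a later read is a single transmission from Alice at t0 to Alice at t1, with the file acting as the relay in between.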

There are many ways to secure messages before transmission but very few offer a secure container that can be used for both states of data in a consistent, simplified and independent manner. What I’ve deduced over the years is that many problems that we have with our digital systems can be traced back to inadequately designed data containers. NUTS provides the technology to solve these inadequacies. In a later post, we’ll examine concepts called Strong and Weak Data Models.

This illustrates a core technique that Data Centric Design applies to problems big and small regardless of technical domains: root cause analysis. Finding the root cause of some problems may require you to re-frame the questions with a new perspective so if you solve the root cause then many of the symptoms never appear. The hard part is collecting a set of symptoms that appear unrelated and then looking for possible relationships.

Last week, DC CyberWeek presented very informative events, and many opportunities to network with some very smart people in the cybersecurity industry. The CyberScoop folks did an incredible job of organizing the whole affair. My deepest thanks to Julia Avery-Shapiro from CyberScoop for accepting our event idea, and for guiding us through hosting our first ever cyber security event. I cannot forget our attorney Jim Halpert for graciously offering the use of their offices at DLA Piper, and for the coordination wizardry of Susan Owens. I hope the folks who attended were rewarded with new information and ideas from our presentation.

What is Data Centric Design?

This is the NUTS version of Data Centric Design (DCD):

A data centric model of computer software design is where user data may be prioritized over applications. A data centric software design may allow for data to be secured at the point of storage. The containerization of data may be an embodiment of a data centric design.

After searching for years, I did not find a definition of DCD I felt was appropriate for the term so I created one.  I will introduce the thought process and the approach I took to get here. This segment will concentrate on the unique identity aspect of data.

DNA is a marvel of Nature, and Nature is a master of data management.  Nature stores and manipulates its key data in organic form.  I decided to examine it from a digital design perspective and see what lessons could be learned from this complex data structure that has been organically developed over billions of years. Nature’s development cycle is a bit more time consuming than the edit-compile-exec cycle, but it still does things that we can only dream about. Every day, geneticists and molecular biologists are gaining more knowledge and techniques for manipulating DNA. Many Nobel prizes have been awarded, animals have been cloned, the human genome mapped, and we have experimented with gene therapy. But yet, we are just beginning to understand the monumental task of figuring out Nature’s higher-level design patterns.

I’m not a geneticist or molecular biologist. I’m not trying to duplicate DNA in bits. In the end, bits are a simple form of data storage implemented in electromagnetic or optical devices and media.  I wanted to ask some questions based on our understanding of DNA such as:

  • What are the characteristics of DNA that make it useful to Nature? To us?
  • Which of those characteristics are useful for digital data?
  • How do I express it in digital form?
  • How can it affect digital data?

At the outset, these seemed like simple, straightforward questions, and five years later, I share my findings with you in the form of NUTS (eNcrypted Userdata Transit & Storage).

The first characteristic that popped out from DNA is identity. We have all seen one too many episodes of NCIS, or its crime drama equivalent, to know that DNA is constantly used to identify people. For the most part, DNA can be considered a unique identifier for a person. There may be exceptions, but let’s not ruin my party here. As I said, I’m not trying to replicate DNA in bits.

In the digital world, a unique identity, or identifier, is usually referred to as a Universally Unique Identifier (UUID) or a Globally Unique Identifier (GUID). These IDs are usually large numbers, 128 bits in length. There are suggested formats and methods for coming up with this unique number, and the probability of any two UUIDs being the same is small, but it is not zero. When two IDs do come out the same, it is referred to as an ID collision. Note that the terms UUID and GUID have the words Universal and Global, yet everyone in the know recognizes that it is not so. A tad misleading, wouldn’t you agree? Look at what happened with the Y2K circus. Yes, I get it, different times and different constraints, but keep in mind that our current constraints are really limited by our imagination. We live in an age where a top of the line smart phone is now offered with 256GB of storage.
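How small is "small, but not zero"? A back-of-the-envelope sketch using the standard birthday-bound approximation gives a feel for it. The function name is mine, and the calculation assumes identifiers are drawn uniformly at random; a random (version 4) UUID carries 122 random bits, since 6 of its 128 bits are fixed by the format.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation of the chance that at least two of n
    uniformly random identifiers of `bits` random bits are identical."""
    # p is approximately 1 - exp(-n(n-1) / (2 * 2**bits)); expm1 keeps
    # precision when the probability is extremely small.
    return -math.expm1(-(n * (n - 1)) / (2 * 2 ** bits))


for bits in (122, 256, 512):
    p = collision_probability(10**12, bits)
    print(f"{bits} random bits, one trillion IDs: p = {p:.3e}")
```

The probability never reaches zero for any finite size, but each additional bit roughly halves it, which is the argument for starting with a larger identifier.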

In NUTS, I have decided to call it a NutID, and my suggested starting size is 512 bits (there are many reasons why I like the acronym NUTS).  The NutID is a rather large identifier by any measure, but it matches its large ambitions. This is big, HUUGE!! Trump’s vernacular is catchy to say the least.

The source information to create a NutID begins with a combination of environmental factors and other randomized components to make it as unique as possible. The generator then applies a 512-bit SHA-2 hash (SHA-512) to this source data to produce the NutID. A SHA-2 hash is a function that derives a fixed-length representation of variable-length data; this is why hashes are used in lookup tables and for checking the integrity of source data. Another reason for a hash rather than some standard format is to provide anonymity. If there is a well-known format, there will be implied information embedded in the identifier. A SHA-512 hash makes extracting such information a tad more difficult. Computer scientists consider a hash function like SHA-512 to be essentially irreversible.
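A minimal sketch of that recipe, using only the Python standard library, could look like this. The particular environmental factors chosen here (time, process ID, host name) and the function name new_nut_id are my assumptions for illustration; the post does not specify which factors NUTS uses.

```python
import hashlib
import os
import secrets
import socket
import time


def new_nut_id() -> str:
    """Return a 512-bit practically unique identifier as 128 hex characters."""
    source = b"|".join(
        [
            secrets.token_bytes(64),         # randomized component
            str(time.time_ns()).encode(),    # environmental: current time
            str(os.getpid()).encode(),       # environmental: process ID
            socket.gethostname().encode(),   # environmental: host name
        ]
    )
    # Hashing hides the structure of the inputs, so the resulting ID does not
    # reveal when, where, or by which process it was created.
    return hashlib.sha512(source).hexdigest()


first_id, second_id = new_nut_id(), new_nut_id()
```

Nothing in this function talks to a registry or a server; the device creates the identifier entirely on its own.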

Such an identifier is not guaranteed to be unique; that is the nature of hashes and of the source entropy that we provide for them to chew on. Therefore, I gave it a more modest technical term: Practically Unique Identifier, or PUID. If there is one thing you learn here, let it be this: if anyone ever lets on that a hash is guaranteed to be unique, walk out the door. Periodically, I will introduce new terms which I plucked out of thin air to describe various aspects of Data Centric Design.

What do I want to do with such a large number as a NutID?

Like “Harold and the Purple Crayon”, I want to stamp everything with it! I want to stamp every piece of storable data I (you) create with a NutID. Why?

  • Why not? It’s my (your) computer and I (you) can do what I (you) want with it.
  • Who said only institutions and companies are allowed to create serial numbers?
  • My data is more important to me than any other data.
  • If it’s good enough for Nature, it’s good enough for me.
  • Identifying data at the point of creation (you and your device) with a NutID allows it to be referenced forever.
  • Computers don’t process pathnames very well.

A NutID is meant to be created in a massively distributed and completely independent way by you and your computer without ever talking to anyone else. The NutID lays the groundwork for how a Nut will behave in its ecosystem. It will not replace existing conventions; it will coexist with them and enhance them. It has a few features that we will go over in later posts that you will never see anywhere else. We will continue this investigation of learning from DNA and Nature’s mastery of data management.

To every little piece of data out there,
you matter,
you have a name,
it’s your NutID.
Go forth unafraid.

NUTS Technologies will be hosting an event at Cyberweek in Washington D.C. on Tuesday, October 17 at 2pm. This will be a small discussion group with limited seating.

 

The God Key Problem: Digitizing the Dynamic Nature of Trust

By Yoon Auh,  Founder of NUTS Technologies, Inc., the world leader in Data Centric Designs of secure data systems.

Snowden. We only have to mention this single name to conjure up a plethora of views on the matter that plastered the headlines in 2013. Regardless of your personal views on the matter between the US Government and its ex-consultant, there are two main issues that this incident highlights and should be addressed objectively: 1) the dynamic nature of trust  and 2) the God Key problem.

Trust is serious business. We depend on trust in each other, in social systems, in courts, in childcare, in policing, and in many other societal foundations in order to live relatively secure and carefree lives. Our trust in the sanctity of contracts, in the law and its enforcement mechanisms, and in the continuity of that trust enables the average person to earn a living and plan their future decades in advance. But trust changes over time. It is dynamic. This is why there are laws, and enforcement of those laws, to keep everyone in a position of trust in check.

The dynamic nature of trust is something everyone learns and understands over time when growing up. It changes a lot, especially when it comes to interpersonal relationships. If this were not the case, why would our national divorce rate be close to 50%? This happens more frequently in our professional lives and is generally considered a healthy thing. The senior engineer you entrusted your next big project to is jumping ship to a competitor to get more responsibility and better pay. Your top portfolio manager is leaving after establishing a track record at your firm to become a partner at a fast growing hedge fund. Even in the brutal world of drug trafficking, the betrayal of trust is met with terminal violence.

The nature of trust is mutual, unilateral and exquisitely temperamental. Both parties rely on the passage of future events to determine their level of trust in one another; therefore trust is mutual between the two parties. Trust is built over time by many trustful deeds and events between the two entities. The first instance of an untrustworthy event may nullify the entire history of trust in the relationship. The intended trustful relationship is unilateral because each party may independently violate the trust in the relationship. Whether the violating party decides to alert the other party to this change in the relationship results in the complicated saga of betrayals.

The trust in a relationship is further complicated by adding in the self-interest priorities of each participant. It’s a quagmire of chess-like strategies with imperfect information.

The God Key problem is an age old computer science issue and is a principal culprit of most cybersecurity scandals and hacks. Most computers designed and manufactured today have an administrative mode of operation which gives the user unfettered access to everything within the computer’s domain or physical hardware. There are some exceptions to this but this is the predominant model. Most companies that rely on computer processing for their business operations will have a group of administrators who have the God Key to all the business systems in the company. This is a necessary evil since computer systems do not administer themselves and they are in constant need of maintenance in both software and hardware. The God Key is not just one all access key, but it’s the combination of access credentials that are given to administrators to allow them unfettered access to all the systems within their domain.

In most corporate settings I’ve been in, it has been common to see smart, young, technically adept individuals given ready access to the God Key of corporate systems and networks. Much of advanced technology is tamed by young people who are fresh out of school with the latest knowledge and techniques. Sometimes the best ones are self-taught, nerdy renegades answering the siren call of large paychecks for performing tasks they would do for free on their own time just for the hell of it. We sometimes call these people hackers, both white hat and black hat.

The most frustrating thing about this situation is the quandary that managers face when pressed for time and talent in a crisis: who do you give the God Keys to so the job can get done in a hurry? You give them to the most skilled operator. In the world of IT, the most skilled does not correlate to the most experienced, the most knowledgeable, the most seasoned, nor the most trustworthy. The operator who is given the God Key may have been deemed trustworthy at one point, but that trust may have changed over time. How is an institution supposed to measure that? How do they keep track of that? What if the operator hides from his employer the fact that he can no longer be trusted?

Snowden. This is exactly the scenario that played out between the NSA and Snowden. The dynamic nature of trust sucks in a digital world, even for the NSA, which is in the business of trust. The point is that this problem exists in every institutional computer system. The NSA has plenty of company. Everyone deals with these problems in one way or another, but most conventional ways are inadequate and do not address the issues related to the dynamic nature of trust.

To solve this thorny problem, you need to be able to separate the ability to administer the system from the ability to read everything within its domain. Most complex systems are designed to be centrally controlled. They may have distributed access, distributed storage or distributed processing, but administration is usually tightly controlled in a central manner. What company doesn’t like control? In fact, most institutions in the free world are hierarchically structured like little dictatorships. But these structures work and people adapt to them naturally so it is the predominant organizing mode of mass productivity.

All the sensitive data of the company needs to be secured to block out the curious gaze of the system administrators. We have many systems to do such containment, but most of them require central management and some version of authorization-token-based access control. You can see the circular logic problem here: central management requires an administrator with the God Key to that particular system, which raises the same issues.

The solution lies in data containers that can act as their own reference monitors, working with a truly independent and distributed key management system.

This problem took me over 5 years to solve. Along the way, the solution set that was crafted solved many other nagging issues. The approach that I came up with is called Data Centric Design. It is unrelated to any definition of Data Centric Design on the web today. This is a new technology. It is a radical technology. It is an adaptive technology.

It forges Applied Cryptography in new ways to construct a framework where Data can grow up and do some things for itself. That’s right. Data is personified in that last sentence. It’s about time that Data got smarter and learned a few tricks of its own rather than depending on applications to wipe its butt every time, because we are learning that the butt wipers are neither all that trustworthy nor all that competent in this computerized world of ours.

Welcome to the world of eNcrypted Userdata Transit & Storage or just NUTS!