Your company has data. Whether that data is email, chats, files, databases, or more complex structures, you and your users produce data, and at least some of that data must be protected against accidental or malicious loss or disclosure.
A critical decision in choosing any platform or solution, whether in the cloud, on your local device, or installed in a data center, is what protections are available to safeguard your data. However, not all data is created equal, and not all companies have the same data protection needs. I've spent my career watching organizations pour countless hours (and dollars) into solutions that offer protections but stall in implementation.
Those stalls result in real and significant risk to organizational data, which is becoming increasingly important to manage as companies seek to reason over data with generative AI solutions.
So let’s talk about data protection. What does it mean, where does it exist, what gaps need to be covered, and how do we deliver that coverage in a way that doesn’t disrupt work?
Data protection should be composed of, at a minimum:
1. Storage system protection. Ensures that data at rest is protected against loss and theft. It is redundancy in the event a disk fails, a rack is destroyed, or even a data center goes offline. It is encryption of the disk or device that protects you if the device itself is stolen. It can include backup and restore capabilities native to the system or layered in by external solutions. Storage system protections exist at both the server/service side and the client side.
2. In-transit protection. Ensures that data is not exposed when traveling between the server/service and the client endpoint, whether that's a web browser or an application.
3. File-structure security. This layers on top of storage system protections, typically at the server/service side, to ensure the right folks hold the correct permissions and/or rights for the right access to data. Permissions are assigned to a folder or container and inherited by the contents of that container. When data is removed from the file structure, it typically loses the protections of the container.
4. File and data level security. This adds additional controls to the content itself; it can be intrinsically tied to the data and travel with the data as it moves.
In a traditional on-premises datacenter environment, your data would be stored on a server; for files, that's a file server. Servers offer robust options for data storage and protection: redundant, hot-swappable hardware and data backups ensure availability; disk-level encryption and file-system security (good ol' NTFS) secure data at rest; and TLS and SSL capabilities secure data in transit. Server management also affords admins direct visibility into how that security is structured, and Active Directory groups ensure that structure works at scale.
When a user puts a file into a drive mapped from a Windows server, they can generally trust that all of those things are working to ensure only the right people have access to the document, and that the server itself is relatively well-protected from malicious actors. But, as I've discussed with hundreds of clients over the years, the protections offered are inherited, not assigned. Those protections do not necessarily convey if the document is moved or copied: simply moving the file to a different folder, sometimes even within the same mapped drive, can result in a permissions change.
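If you want to see that behavior for yourself, here's a minimal sketch, assuming a Windows host and hypothetical paths, that wraps the built-in icacls tool to compare a file's ACL before and after a move:

```python
import subprocess

def get_acl(path: str) -> str:
    """Return the raw icacls output (the file's ACL entries)."""
    result = subprocess.run(
        ["icacls", path], capture_output=True, text=True, check=True
    )
    return result.stdout

# Hypothetical paths on a mapped drive.
before = get_acl(r"F:\Finance\budget.xlsx")
# ... move the file (via Explorer or shutil.move) to F:\Public\budget.xlsx ...
after = get_acl(r"F:\Public\budget.xlsx")

if before == after:
    # Depending on the Windows version and how the move was performed, an
    # intra-volume move can keep the original ACL, meaning the file no
    # longer matches what the destination folder would have granted it.
    print("ACL carried over unchanged; it may not match the destination folder.")
else:
    print("Permissions changed when the file moved:")
    print(after)
```

Run against a real share, a diff like this makes the "inherited, not assigned" problem visible in seconds.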
It is exceedingly rare that I encounter an on-premises file server deployment that meets all 4 data protection requirements. Most cannot satisfy #4 at all, and many have gaps across the entire range.
One of the biggest challenges in this data-protection schema is inconsistent configuration and sprawl, where one set of servers (or shares, or NTFS permissions, or groups, or GPOs, you name it) is maintained by multiple people over years of change. One group of servers with a policy to enforce drive encryption was well-managed and placed in the proper Servers OU, while a hastily deployed solution saw servers dropped into the default Computers container, added to the DefaultFirstSite in a complex AD, or simply never processing policy correctly. Or a new file server is deployed and the admin leaves the NTFS defaults in place, managing only the share permissions.
We've also discussed the challenges of default communications protocols in on-premises environments, but NTLM deserves another shout-out as an obsolete technology still heavily used on-premises, particularly in orgs that reference servers by IP address, use DNS aliases for servers, or run legacy operating systems; each of those patterns can prevent Kerberos from matching a service principal name, silently falling back to NTLM. NTLM should not be considered a robust mechanism for protecting communications between systems.
Finally, on-prem admins could assign and manage BitLocker or equivalent configurations on end-user devices to encrypt users' hard drives. This is very powerful for ensuring a lost laptop doesn't expose your company's secrets, but the back-end management can be complex, so it's often not deployed.
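As a quick illustration of an encryption spot-check, here's a minimal sketch using the built-in manage-bde tool. It assumes a Windows host, an elevated prompt, and English-language output:

```python
import subprocess

def bitlocker_protection_on(volume: str = "C:") -> bool:
    """Parse 'manage-bde -status <volume>' for the Protection Status line."""
    result = subprocess.run(
        ["manage-bde", "-status", volume],
        capture_output=True, text=True, check=True,
    )
    # manage-bde reports a "Protection Status" of "Protection On" or "Protection Off".
    return "Protection On" in result.stdout

if __name__ == "__main__":
    state = "encrypted and protected" if bitlocker_protection_on("C:") else "NOT protected"
    print(f"System drive is {state}.")
```

Scripts like this work fine one machine at a time; fleet-wide enforcement and key escrow are where on-prem BitLocker management gets hard.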
And then, no matter how good the server configurations, if a user attaches a server-protected document to an email and sends it to someone outside the organization, it’s no longer protected, and it is no longer your file. This is going to be a recurring theme.
So let’s leave the on-premises world and look at a common cloud deployment scenario.
This is where we really see a lot of different capabilities across cloud vendors, and lots of pricing options based on just how robust you want your data protections to be. Microsoft generally makes it easy to meet the first 3 goals of data protection, with all cloud storage encrypted at rest, all transfers requiring TLS and SSL by default, and file-structure security in Teams and SharePoint aligning document libraries to the Microsoft 365 group with which they were created.
Intune makes it easy to manage BitLocker from the cloud. In fact, it's easier to manage in Intune than it is on-premises, so much so that any organization using Intune should strongly consider a device encryption policy to ensure end-to-end data security aligned to items 1-3 above.
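As a sketch of what that looks like programmatically, the snippet below creates a bare-bones endpoint protection profile requiring BitLocker via Microsoft Graph. The property set reflects my reading of the beta deviceManagement schema and is an illustration, not a complete production policy; token acquisition is out of scope here:

```python
import requests

GRAPH = "https://graph.microsoft.com/beta"
token = "<access-token-from-your-auth-flow>"  # placeholder

# Minimal illustration only; a real BitLocker policy carries many more settings.
profile = {
    "@odata.type": "#microsoft.graph.windows10EndpointProtectionConfiguration",
    "displayName": "Require device encryption",
    "bitLockerEncryptDevice": True,  # require BitLocker on the device
}

resp = requests.post(
    f"{GRAPH}/deviceManagement/deviceConfigurations",
    headers={"Authorization": f"Bearer {token}"},
    json=profile,
    timeout=30,
)
resp.raise_for_status()
print("Created profile:", resp.json()["id"])
```

The profile still has to be assigned to a device group before anything happens, and Intune's newer endpoint security disk encryption policies are another valid route; the point is simply that the entire configuration is API-driven and scriptable.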
The fourth goal, however, is file and data level security, and it still often goes unaddressed. We can have data encryption and multi-geo replication topologies that would make an on-prem admin weep, but the second that document goes into an email, we run into the same challenges as before.
And once we layer in generative AI, not only is the document gone, but the data inside it is at risk before it even leaves the back-end cloud storage.
To remedy this, we must address file and data level security. While this is typically a user-disruptive technology, it is quite literally the only way we can protect corporate data once it leaves our systems. Implemented correctly, it’s a mild disruption and an opportunity to involve the workforce in understanding and managing risk. And it is a key foundational requirement to a safe and secure deployment of generative AI solutions.
File and data security in Microsoft 365 can be user-driven or admin/system-driven. I'm going to start with user-driven protection, available with any license that grants app (not web) access. A user can apply it by selecting the "Protect Document" button within the File -> Info menu in the core Microsoft 365 apps.
Further, extending the reach of these capabilities to web-only users, sharing of a document can be managed by selecting "Share," then "Manage Access," and choosing the users and groups within the organization who should have access to the document.
Access can then be granted to named users or groups at differing levels, from edit to view to restricting download permissions.
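Under the hood, this "Manage Access" experience maps to the Microsoft Graph invite action on the document's driveItem. Here's a minimal sketch, with a placeholder token and item id and a hypothetical recipient, that grants a colleague view-only access:

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
token = "<access-token>"      # placeholder
item_id = "<drive-item-id>"   # placeholder for the document's driveItem id

body = {
    "recipients": [{"email": "colleague@contoso.com"}],  # hypothetical recipient
    "roles": ["read"],         # "read" = view; use ["write"] for edit
    "requireSignIn": True,     # no anonymous access
    "sendInvitation": False,   # grant access quietly, no notification mail
}

resp = requests.post(
    f"{GRAPH}/me/drive/items/{item_id}/invite",
    headers={"Authorization": f"Bearer {token}"},
    json=body,
    timeout=30,
)
resp.raise_for_status()

# The response is the set of permission objects now present on the item.
for perm in resp.json().get("value", []):
    print(perm.get("roles"), perm.get("grantedToV2"))
```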
Users can do these things, but they cannot be asked to tweak all the dials and buttons every time they create a file. We need to give them ‘easy buttons’ to manage these settings in repeatable, scalable ways. For that, we turn to Microsoft Purview Information Protection’s Sensitivity Labels.
Sensitivity Labels give users a 'one-stop shopping' single click experience to manage document and email protections that can include encryption, watermarking, expiration, rights management (sharing, printing, emailing, etc.), and more. Sensitivity labels offer revocation and tracking of document access even when the document is no longer in your data estate, so you don't have to worry about a user looking through downloaded copies of your corporate data after resigning.
And critical to this generative AI era, Sensitivity Labels are the only prescriptive mechanism to ensure users don’t expose sensitive corporate data to AI.
Admins create the labels, publish the labels to users, and the users simply select the appropriate label for the document. And Microsoft has integrated this one-stop-shop into the default “save” experience for new data AND given us the ability to pre-define default sensitivity for documents to minimize the user disruption.
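If you want to verify which labels have actually been published to a given user, Microsoft Graph exposes them (via a beta endpoint at the time of writing). A minimal sketch, assuming a delegated token with the InformationProtectionPolicy.Read permission:

```python
import requests

token = "<access-token>"  # placeholder; delegated InformationProtectionPolicy.Read

resp = requests.get(
    "https://graph.microsoft.com/beta/me/informationProtection/policy/labels",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()

# Each label published to this user, with its ordering and tooltip.
for label in resp.json().get("value", []):
    print(f'{label.get("sensitivity")}: {label.get("name")} - {label.get("tooltip")}')
```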
In a well-defined and orchestrated deployment, most users will not even realize that new protections are in place, and only those users who produce the most sensitive content will need any training, and even that is minimal, to navigate the options. In this example, an organization has defined and published 5 color-coded labels that are integrated directly into the 'save' experience. The user can see that a document is encrypted by the presence of the lock icon, and each label provides a simple tooltip (not shown) explaining its purpose. Users can change the label and optionally be prompted to explain why the previous label was incorrect. And when we place this document into an email, the email itself can be configured to inherit the same label, so you don't risk exposing even an explanation of what the content is.
In the image above, my user can see only the labels that are available to her, and only one, "Internal," offers encryption. I will cover the purposes of these labels in a later blog post, but there are additional labels in the organization that she cannot use because they are inappropriate to her role in the company. By default, her content is created, saved, and shared at "General," and she doesn't have to make any choices at all to create and work with data.
The other labels in the organization are what we might consider "above her pay grade," and the content protected with those labels will not be available to her or to any Copilot prompts she asks.
If she receives a file or message marked with a higher-level Sensitivity Label than she can access, she'll be denied access. If she loses access to a given label at a future date, she will also lose access to the content protected by that label. If she shares an "Internal" document with an external vendor, the encryption settings and permissions of the "Internal" label will deny access to the external user. If she places the file in Dropbox or Egnyte or any other file storage, the label and its protections will remain in place.
Once we have users protecting new content, we can then leverage those same Sensitivity Labels to discover and protect content that's already in our secure storage solutions through auto-labeling policies, and extend coverage to on-premises file servers, Microsoft Fabric, and other Azure storage solutions.
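To make the auto-labeling idea concrete, here's a toy sketch of the kind of pattern matching those policies perform. Purview's real engine uses managed sensitive information types with confidence thresholds, not hand-rolled regexes, and the label names here are just the hypothetical ones from the example above:

```python
import re

# Hypothetical mapping from content detectors to the labels defined earlier.
DETECTORS = {
    "Internal": [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # SSN-shaped number
        re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # credit-card-shaped digits
    ],
}

def propose_label(text: str) -> str:
    """Return the first label whose detectors match, else the default."""
    for label, patterns in DETECTORS.items():
        if any(p.search(text) for p in patterns):
            return label
    return "General"  # the org's default label from the example above

print(propose_label("Customer card 4111 1111 1111 1111 on file."))  # -> Internal
print(propose_label("Lunch menu for Thursday"))                     # -> General
```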
This set of capabilities enables us to meet the 4 core requirements of data protection, from storage systems to file-level persistent revocable security that travels with our data.
Join me for part 2, where I will explore different strategies for managing Sensitivity Labels, including layering in data lifecycle management and how these solutions play a critical role in governance and compliance scenarios.