The Key to Data Security: Choosing the Right Content Labeling Strategy

In part 1 of this series, I covered the importance of file- and data-level security: how it fills the gaps created by user interactions with data that is otherwise easily protected at rest and in transit.

In this entry, we’re going to discuss various strategies that exist for applying that security with Microsoft’s sensitivity labels, the principal component of Microsoft Purview Information Protection. We’ll look at some key considerations for picking the right schema for your data, along with some implications that come with those choices.

Table of Contents

Guiding Principles for Labeling Strategies
Choosing the Right Scope and Strategy
Final Thoughts on Labeling Strategy

Guiding Principles for Labeling Strategies

As a quick, high-level review, sensitivity labels are ‘easy buttons’ that let us assign encryption, permissions, and watermarks to files, emails, groups, meetings, and other assets in our digital estate. When a label applies encryption, the only people who can access the content are those defined in the label’s permissions, and the protections travel with the document. So when someone emails out your quarterly budget numbers by accident, the content is not exposed: the external recipient isn’t on the permissions list and can’t open the file.

Let Compliance Drive Your Labeling Strategy

The first factor in choosing a labeling strategy is to assess which legal and internal requirements constrain your design. These span both compliance and governance, which I distinguish as follows:

Compliance is the laws and regulations your org has to follow.
Governance is the rules you impose on your organization and data.

You may have data compliance requirements that explicitly demand file and document-level security, and/or you may have corporate controls or industry expectations that define similar constraints. Follow those as your North Star.

Look to Legacy Policies for Guidance

The next thing I encourage any organization to do when looking at a labeling strategy is to see if there is a legacy document policy governing paper files. I’ve worked with organizations whose paper policies dated back to the 1970s and 80s and were still perfectly valid in the digital age, and we were able to reproduce them in the cloud with little to no change. Many of these policies also satisfy the compliance requirements covered above. And because, as I mentioned in Part 1, this is a disruptive technology, aligning that disruption to existing controls can help in getting buy-in from corporate stakeholders. Need some documents labeled “Executive Content” with red 10-point type in the top left-hand corner? Easy. Need a big splashy watermark of “****FOUO****” in blue 100-point type across the page? We can do that, too, and so much more.

Start Simple to Scale Successfully

The next thing I like to stress is that you do not need to boil the ocean with your initial implementation. More security is good, but less complexity is better. The less complexity, the more likely your users will be to engage with the tools, and the more scalable the solution will be. You can always decide to add more complexity later, and provide appropriate training to users as you scale your solution.

We’ve seen ample evidence in our 12+ years of deploying labeling projects (back to the early days of Azure RMS!) that offering a user too many labels results in confusion, resistance, and improperly marked content. This is a well-documented phenomenon called “choice overload”, where people presented with too many choices will often make no choice at all, or make the simplest available choice to allow them to move on with their day.

In practice, this principle generally means minimizing the number of labels that a user can interact with—typically to 5 or fewer. You can have dozens of labels for myriad purposes, but only present users the ones relevant to their work.

Avoid Mirroring Your Organizational Chart in Labels

Unless there is an overwhelming justification to do so, I strongly advise organizations to not recreate their entire organization structure in labels. There may be some odd exceptions, like labeling HR content as ‘HR’, but in general you do not need a label for every department, every team, or even every division. Projects that attempt this breadth of scope rarely succeed in production.

Design Labels with Licensing in Mind

It's also worth noting that licensing can play a role in your design. While a license that includes sensitivity labels (EM+S E3 & E5, M365 E3 & E5, M365 E5 Compliance Step-up, etc.) is required to label data, no license is required to consume that data. So if you have 10,000 front-line employees and 100 back-office folks who create corporate data, and you want to send out a labeled internal policy document, you may only need 100 licenses for this deployment.

Choosing the Right Scope and Strategy

We have our guiding principles outlined: follow the law, cover as little new territory as possible, and less is more (including on the licensing side!). Let’s look at some options for our scope and strategy.

Exploring Labeling Scheme Options

Labeling schemes range from extremely simple to incredibly complex, from just a single label and a single publishing policy to dozens of labels and sub-labels, with multiple publishing policies and auto-labeling rules, protection policies, and deep integrations with Fabric, Entra groups, meetings, and more.

At the simplest end of the spectrum, you can define that content is either labeled or unlabeled. Labeled data can be encrypted, and rights on labeled data can be configured so that only internal members of the organization can access and modify it.

Content produced in core Microsoft 365 apps can be automatically assigned the label, or the choice can be presented to the user: should this file be protected?

In this simple schema, the label is typically only applied to files and emails, and no automatic protections are configured (no detection of sensitive info types like SSNs, credit card numbers, etc.).

PROS: Simple, quick to deploy, easy to train users. Satisfies legal and compliance requirements to identify and encrypt corporate data.

CONS: Not scalable as the business grows. No automatic detection of sensitive content. No granularity or flexibility. No differentiation between personal and corporate data. All data treated equally, irrespective of risk.

While you might be able to meet minimum business and legal requirements with a single label, it will not scale as your business grows. Users will likely be confused as to why there’s an INTERNAL label, but no corresponding EXTERNAL designation. It’s a light-switch that only shows ON, and “off” is simply implied.

Adding a little more complexity can yield more fruit: we’ve seen organizations begin with this basic approach, but add the implied EXTERNAL as an additional label, and sometimes even an optional PERSONAL label. In this scenario, all content intended to be maintained internally is still marked with the INTERNAL label, which enforces encryption and sets permissions, but now content that can be shared outside the organization is marked EXTERNAL.

Users can also mark content PERSONAL in this scenario. This content would generally not be encrypted or restricted, and by marking it PERSONAL, we can filter out file downloads and transfers that match this label so as to not trigger any insider risk thresholds.

Users choosing among 2 or 3 labels will see the label names, their associated colors, a lock icon on those that add encryption, and an optional tooltip defined for each label. These cues make the picker more visually engaging and make it easy to choose the right label for any given item.
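As a rough illustration of the three-label scheme above, the sketch below models the labels as plain Python data and drops PERSONAL-labeled transfers from a list of events before they would count toward an insider risk threshold. Everything in it is hypothetical: the `Label` and `TransferEvent` types, the tooltip text, and the `events_for_risk_review` helper are names invented for this example and are not part of Purview or any Microsoft API. In a real deployment this filtering would be configured in the product rather than coded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    name: str
    encrypted: bool  # does the label apply encryption and permissions?
    tooltip: str     # short guidance shown to the user


# Hypothetical three-label scheme: only INTERNAL adds encryption.
LABELS = {
    "INTERNAL": Label("INTERNAL", True, "Corporate content; stays inside the organization."),
    "EXTERNAL": Label("EXTERNAL", False, "Content that may be shared outside the organization."),
    "PERSONAL": Label("PERSONAL", False, "Non-business content belonging to the user."),
}


@dataclass(frozen=True)
class TransferEvent:
    user: str
    file_name: str
    label: Optional[str]  # None means the file was unlabeled


def events_for_risk_review(events):
    """Drop PERSONAL-labeled transfers; keep everything else, including unlabeled files."""
    return [e for e in events if e.label != "PERSONAL"]


events = [
    TransferEvent("alex", "budget-q3.xlsx", "INTERNAL"),
    TransferEvent("alex", "vacation-photos.zip", "PERSONAL"),
    TransferEvent("alex", "notes.txt", None),
]
for event in events_for_risk_review(events):
    print(event.file_name, event.label)
```

Unlabeled files are deliberately kept in the review set here, on the assumption that only an explicit PERSONAL label should exempt a transfer.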

Using TLP as a Model for Sensitivity Labels

A more scalable approach is the publicly available Traffic Light Protocol (TLP 2.0). TLP originated with the UK government’s National Infrastructure Security Co-ordination Centre as a way to simplify the sharing of sensitive information, is now maintained by FIRST (the Forum of Incident Response and Security Teams), and was officially adopted in its 2.0 form by CISA in 2022. TLP aligns data to the intended scope of its audience. I extrapolate this to the reputational and financial risk that the data would expose your organization to if it were to be released publicly.

TLP adds the benefit of aligning that risk to pre-defined colors, and users LOVE colors. When Microsoft first added colors to labels, I got really excited because it made it much easier to gain traction with users. TLP starts with colors, and those colors were explicitly chosen to help accommodate visually impaired users.

TLP also has the benefit of being a public protocol. The documentation isn’t up to any one explicit vendor to maintain, and best practices are easy to navigate. CISA’s explainer includes many examples of types of data, sharing, and the appropriate protections for any given transaction.

As a public protocol, its stakeholders are able to suggest, promote, and publish updates. The protocol currently sits at version 2.0 and will continue to evolve as more sharing scenarios are identified, and best practices will be easy to track because they won’t be buried in product documentation.

TLP 2.0 defines four labels plus the AMBER+STRICT variant of TLP:AMBER, for five designations in all, which I list in reverse order to align with prioritization in Purview:

TLP:CLEAR – no restrictions on sharing. This is the classic “Public” or “external” label.
TLP:GREEN – limited disclosure within “the community”. I think of this as general content that might need to be shared with customers and vendors. Typically not encrypted or restricted through permissions.
TLP:AMBER – limited “need-to-know” disclosure. This is internal content where exposure begins to carry risk, and where I typically begin to assign encryption and permissions.
TLP:AMBER+STRICT – content must not be shared outside the organization. Risk rises considerably if this confidential data is exposed. I use this to replace departmental labels.
TLP:RED – for the eyes and ears of the recipient only. This data represents irreparable financial or reputational risk for the company and must be protected with encryption and tightly constrained permissions and access rights.
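To make the ordering concrete, here is a minimal sketch that models the five TLP designations as an ordered Python enum, least restrictive first, and records where encryption starts under the approach described above (TLP:AMBER and higher). The `TlpLabel` enum, its numeric values, and the `is_encrypted` helper are assumptions made for illustration; they are not Purview identifiers or settings.

```python
from enum import IntEnum


class TlpLabel(IntEnum):
    """TLP designations, ordered from least to most restrictive."""
    CLEAR = 0
    GREEN = 1
    AMBER = 2
    AMBER_STRICT = 3
    RED = 4


# Assumption taken from the descriptions above: encryption begins at TLP:AMBER.
ENCRYPTION_FLOOR = TlpLabel.AMBER


def is_encrypted(label: TlpLabel) -> bool:
    """True when the label sits at or above the encryption floor."""
    return label >= ENCRYPTION_FLOOR


for label in TlpLabel:
    print(f"{label.name:<13} priority={label.value} encrypted={is_encrypted(label)}")
```

Keeping the labels in a single ordered type makes "higher" and "lower" sensitivity comparisons trivial, which is the same idea behind ordering labels by priority.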

When I leverage this protocol in a Microsoft Purview Information Protection deployment, I do not make all of these labels available to all users—there’s little chance that a front-line worker needs to label content TLP:RED. To facilitate this, I deploy label policies to ensure the right users have the right labels for their access. While this adds administrative complexity, it keeps things simple for users. Those who need the higher, more restrictive labels can reasonably be expected to go through more training to use those labels appropriately.

In my standard demo tenant, I have 5 label policies that present different labels and default actions to different users. The overwhelming majority of users are covered by a single label policy that presents 3 managed TLP labels and 1 personal label. The policy assigns TLP:GREEN to all content by default. Users are authorized to lower or remove the label, but they must provide a justification to do so, and that justification is captured in a permanent log.

A user in HR, or someone sharing executive content, will be targeted with the additional appropriate label policy, and defaults may be similarly more restrictive.
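The default-label and justification behavior described above can be sketched as follows. This is a simplified model, not how Purview is configured: the `LabelPolicy` class, the `PRIORITY` table, and `needs_justification` are hypothetical names. The choice of TLP:CLEAR, TLP:GREEN, and TLP:AMBER as the three managed labels, and of PERSONAL as the lowest priority, is also an assumption for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical priority table: a higher number means a more restrictive label.
PRIORITY = {"PERSONAL": 0, "TLP:CLEAR": 1, "TLP:GREEN": 2, "TLP:AMBER": 3}


@dataclass(frozen=True)
class LabelPolicy:
    name: str
    labels: tuple            # labels this policy presents to its users
    default_label: str       # applied to new content automatically
    justify_downgrade: bool  # require a reason to lower or remove a label


def needs_justification(policy: LabelPolicy, old: Optional[str], new: Optional[str]) -> bool:
    """Return True when changing old -> new lowers or removes a label."""
    if not policy.justify_downgrade or old is None:
        return False  # policy does not require it, or there is nothing to downgrade from
    if new is None:
        return True   # removing a label entirely
    return PRIORITY[new] < PRIORITY[old]


general = LabelPolicy(
    name="General users",
    labels=("PERSONAL", "TLP:CLEAR", "TLP:GREEN", "TLP:AMBER"),
    default_label="TLP:GREEN",
    justify_downgrade=True,
)

print(needs_justification(general, general.default_label, "TLP:CLEAR"))  # lowering
print(needs_justification(general, general.default_label, "TLP:AMBER"))  # raising
print(needs_justification(general, general.default_label, None))         # removing
```

A more restrictive policy for HR or executive content would be another `LabelPolicy` instance with a different label set and a higher default.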

PROS: Publicly supported and maintained. Used by private enterprises and government agencies alike, so it should map well onto most compliance standards. Scalable. Aligned to risk and sharing scenarios. Accessibility built into the design.

CONS: More complex. Requires more training. Change will require re-training. The standard may offer too many options for a small business.

I’m such a big fan of the TLP framework that I intend to dedicate an entire future blog post to the specifics of my configuration, and tweaks that I would recommend to tailor the design to any organization’s environment. But we’re not out of design schemas yet!

So far we’ve started with the simplest schema and built toward granular, flexible public standards, but folks ask all the time: what does Microsoft do or recommend? Until 2024, Microsoft used a framework very similar to TLP, but with different nomenclature and fewer defined sharing scenarios. It had 5 labels ranging from Public to General to Highly Confidential, and I initially built my demo environment to mirror that configuration.

Applying Microsoft's Default Labeling Framework

But in 2024, Microsoft went through a global overhaul and announced a framework that would be made available to E5 customers by default if they have not already chosen a labeling scheme. To activate this new default sensitivity schema, E5 customers can use the Recommendations page within Data Security Posture Management for AI in the Purview portal.

The new structure defines 12 labels, 7 of which are sub-labels, only 2 of which include restricted permissions and encryption by default. All labels are published to all users in a single publishing policy, but that can be easily changed by adding additional policies.

The sharing scenarios in this schema largely mirror the TLP scenarios, but they do so by presenting more choices to the user, which will require additional training for all users in the default configuration. Additionally, automatic detection and recommended labeling of sensitive info types is included at the Confidential sub-labels. And once these have been created, you can tweak them as much as you’d like. Add encryption, sensitive info-type detection, restrict access, change watermarks, whatever your heart desires.

I think it’s great that Microsoft has gone to the trouble to build this out, and the labels themselves likely cover about as many scenarios as you’d ever support in production. I would caution, though, that half of the job of deploying a successful labeling strategy is in the presentation and usage side, and that’s entirely controlled by label policies. Before rolling out a schema with 12 labels, I would want to build out a robust set of policies to ensure that no single user sees more than 5, maybe 6, labels, and that those labels are aligned to the work they produce, not necessarily to the content they consume.
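One way to sanity-check a policy design before rollout is to compute the labels each user would actually see. The sketch below does that over plain Python data. The policy names, group names, label names, and the `labels_visible_to` and `users_over_limit` helpers are all hypothetical, and the limit of 5 is the rule of thumb from this article rather than a product constraint. It also assumes that a user sees the union of labels from every policy that targets them; in practice the inputs would have to be exported from your own tenant.

```python
MAX_LABELS_PER_USER = 5  # rule of thumb from the article, not a product limit

# Hypothetical policies: which groups each targets and which labels it publishes.
POLICIES = [
    {"name": "Everyone", "groups": {"all-staff"},
     "labels": {"Personal", "Public", "General", "Confidential"}},
    {"name": "HR", "groups": {"hr"},
     "labels": {"Confidential", "HR Restricted"}},
    {"name": "Executives", "groups": {"exec"},
     "labels": {"Highly Confidential", "Board Only"}},
]

# Hypothetical users and their group memberships.
USERS = {
    "frontline-worker": {"all-staff"},
    "hr-partner": {"all-staff", "hr"},
    "cfo": {"all-staff", "hr", "exec"},
}


def labels_visible_to(user_groups, policies):
    """Union of labels from every policy that targets at least one of the user's groups."""
    visible = set()
    for policy in policies:
        if policy["groups"] & user_groups:
            visible |= policy["labels"]
    return visible


def users_over_limit(users, policies, limit=MAX_LABELS_PER_USER):
    """Map each user who would see more than `limit` labels to that label count."""
    counts = {name: len(labels_visible_to(groups, policies)) for name, groups in users.items()}
    return {name: count for name, count in counts.items() if count > limit}


print(users_over_limit(USERS, POLICIES))
```

Any user who shows up in the result is a candidate for a narrower policy or a trimmed label set before the schema goes live.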

PROS: Microsoft built it so you don’t have to! Pulls on all the features available within Sensitivity Labels in Purview.

CONS: Paywalled behind E5 licensing. Default label policy presents too many options to users. Will require tweaking to label behaviors for your business needs.

Final Thoughts on Labeling Strategy

These are just a few of the possible labeling scenarios, but as I said at the start of this treatise, less is more when introducing a disruptor to your workforce. I do not recommend a department-by-department hyper-compartmentalized design, or one that triggers exotic actions through arcane workload interactions (which I will cover in rather explicit detail in the next entry in this series!).

You want users to feel empowered and successful in using the tools, so a good design delivers as few labels as possible to the right people while still providing the right security for your data, no matter where it travels.

Part 3 of this series will look at some of the deeper options enabled through labels and policies to govern application interactions, like controlling at what point a piece of content is ready for consumption by AI solutions, how to restrict sensitive data from being accessed on a non-compliant device, and how to track label usage and traction within your organization.
