Episode 21 — Identify Information Types Processed, Stored, and Transmitted With Confidence

When people first hear the phrase identify information types, it can sound like paperwork that exists only to satisfy an auditor, but it is actually one of the most practical skills you can develop in governance, risk, and compliance work. A system has security controls because it touches information, and different kinds of information need different kinds of protection. If you misidentify what information is involved, everything downstream becomes shaky, from security objectives to control selection to assessment results. In this lesson, we build the habit of looking at a system and confidently stating what information it processes, what it stores, and what it transmits, without guessing and without getting lost in jargon. The goal is to leave you with a method you can repeat, even when the system is messy, the documentation is incomplete, and different stakeholders describe the same data in different ways.

Before we continue, a quick note: this audio course accompanies our two course companion books. The first book is about the exam and provides detailed information on how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

A helpful starting point is to separate the idea of an information type from the idea of a file format, a database table, or a specific record. An information type is a category of information that has a similar meaning and similar protection needs, like student records, employee payroll details, customer contact information, or internal financial forecasts. You can think of it as answering the question: what is this information about, and why would someone care if it were exposed, altered, or unavailable? Beginners often jump straight to the technology and say things like SQL data or emails, but those are containers and transport methods, not the information itself. The same information type might appear in multiple places and formats, like a person’s name appearing in a customer database, a help desk ticket, an email thread, and an exported report. Identifying information types means you are naming what the organization is handling, not just where it happens to live today.

Next, the phrase processed, stored, and transmitted is close to saying used, saved, and sent, but each word carries a more precise meaning than the everyday version. Processed is any time the system actively uses the information to do work, such as validating it, calculating with it, making decisions based on it, or displaying it to a user. Stored is any time the information sits somewhere the system can retrieve later, including databases, file shares, backups, logs, caches, and even temporary storage that lasts longer than a moment. Transmitted is any time the information moves from one place to another, whether that movement is across a network, between system components, to a third party, or to a user’s device. The reason this triad matters is that each state can create different risks, and different controls can apply at different points. A system might process sensitive information but not store it, or store it but never transmit it externally, and those details change what protection is reasonable.
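For listeners following along in text, here is a minimal sketch of how the processed, stored, and transmitted triad could be recorded per information type. The class name, field names, and sample entries are illustrative assumptions, not part of any standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationType:
    """One information type and the states in which the system handles it."""
    name: str
    processed: bool = False
    stored: bool = False
    transmitted: bool = False

    def states(self) -> list[str]:
        """Return the handling states that apply, in a fixed order."""
        labels = []
        if self.processed:
            labels.append("processed")
        if self.stored:
            labels.append("stored")
        if self.transmitted:
            labels.append("transmitted")
        return labels


# Hypothetical inventory entries, made up for illustration.
inventory = [
    InformationType("payment card details", processed=True, transmitted=True),
    InformationType("customer contact information",
                    processed=True, stored=True, transmitted=True),
    InformationType("security event records", stored=True),
]

for item in inventory:
    print(f"{item.name}: {', '.join(item.states())}")
```

Keeping the three states as separate flags makes it easy to express cases like processed but never stored, which a single "handled" flag would hide.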

To do this confidently, you need to develop a habit of defining the system boundary before you hunt for information types. The boundary is simply what you are calling the system for the purpose of this work, and it includes components, people, and connections that are in scope. If the boundary is too small, you might miss crucial information flows, like exports to a reporting tool or synchronization to a mobile app. If the boundary is too large, you can drown in unrelated data and end up writing a vague description that helps nobody. A practical way to think about the boundary is to ask what the system is responsible for delivering as a service or capability, and what components are needed to deliver that service. Once you can say what is inside and outside the boundary, you can start mapping where information enters, where it rests, where it moves, and where it exits.

One common misconception is that you can identify information types by reading the system name, the project charter, or the vendor brochure, and then calling it done. Those documents can help, but they often describe the business purpose in broad terms and may not mention the gritty details like audit logs, diagnostic dumps, or attachments that users upload. Another misconception is that information types are only the sensitive things, like Social Security numbers or medical records, and everything else is just normal data. In reality, many information types that look harmless by themselves become meaningful when combined, like a person’s name plus their email plus their schedule plus their location. Also, operational information types matter, such as configuration data, system credentials, access control lists, and security event records, because compromising those can undermine the entire environment. Confidence comes from being systematic, not from assuming you already know what the system handles.

A simple, repeatable approach is to start from the major functions the system performs and ask what information each function needs in order to work. If a system onboards users, it will likely process identity and contact information and it might store account attributes. If it performs payments, it might process billing data and store transaction histories. If it supports customer service, it might transmit conversation records, attachments, and status updates to other systems. Thinking in functions helps because systems are built to do jobs, and jobs require information. It also helps you discover information types that do not show up in database schema names, because the function might involve human behavior, like uploading documents or leaving comments. You are essentially tracing the story of the system and asking what information appears in each chapter.

Once you have likely information types, you verify them by looking for real evidence rather than relying on memory or assumptions. Evidence can include data dictionaries, interface control documents, report templates, sample screenshots, ticket categories, or even the names of forms and fields in the user interface. It can also include integration specs, because data often leaves a system through application programming interfaces, file exports, email notifications, or message queues. Another strong source of evidence is a privacy notice or a data retention policy, because those often list categories of personal data and how it is used. You do not need every artifact to be perfect, but you do want at least one solid reference per major information type so that your identification is defensible. Confidence is the feeling you get when you can point to something concrete and say, here is why I believe this information type exists in this system.
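The idea of at least one solid reference per major information type can be checked mechanically. The following is a rough sketch, assuming you keep a simple mapping from information type to the artifacts that support it; the type names and artifact names are invented examples.

```python
# Hypothetical mapping from information type to supporting evidence artifacts.
evidence = {
    "customer contact information": ["data dictionary", "privacy notice"],
    "transaction history": ["interface control document"],
    "support ticket attachments": [],
}


def missing_evidence(evidence_by_type):
    """Return the information types that have no supporting artifact yet."""
    return sorted(name for name, refs in evidence_by_type.items() if not refs)


print(missing_evidence(evidence))
```

Anything this returns is an information type you believe exists but cannot yet defend with a concrete reference.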

It also helps to separate business information types from technical and security-related information types, because both are real and both can drive control needs. Business information types are the ones tied to the mission of the organization, like student grades, patient appointments, customer orders, or research results. Technical information types include configuration settings, system architecture documentation, and operational runbooks that could be valuable to an attacker or critical to recovery. Security information types include authentication data, authorization rules, cryptographic keys, and security logs, which can be extremely sensitive even if they do not look like customer data. Beginners sometimes focus only on the business side and forget that a compromise of administrator credentials or audit records can be just as damaging. By deliberately asking what technical and security data the system handles, you reduce the chance of blind spots that show up later during assessment.

Another important skill is understanding that the same information type can exist in different lifecycle states, and each state can change how you treat it. For example, a document that includes a customer’s identity information might be uploaded, scanned, indexed, stored for retrieval, and then archived for retention. During scanning and indexing, the system processes the document and may create extracted text that is stored separately. During retrieval, the document is transmitted to a user, potentially across networks you do not directly control. During archiving, it may be stored in a different environment with different access patterns. If you only say the system stores customer documents, you miss the fact that it also processes them and transmits them, and those details can affect confidentiality, integrity, and availability concerns. Thinking in lifecycle terms helps you spot where additional copies or derivatives of data may exist.

Be careful with the tendency to describe information types at the wrong level of granularity, because that can create confusion later. If you are too broad and you say the system handles personal data, you have not actually helped anyone decide what protections are appropriate. If you are too narrow and you list every single field as its own information type, you will produce a document that is impossible to maintain and hard to use. A balanced approach is to group information into categories that share similar sensitivity and similar handling requirements, like contact information, government identifiers, financial account details, authentication secrets, health information, and internal operational logs. Then, within each category, you can note a few examples of common elements to show what you mean without turning the document into a database schema. The goal is for someone else to read your list and say, yes, I understand what data we are talking about, and I can see why it matters.
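One way to picture the balanced level of granularity is a small lookup that rolls individual fields up into categories. This is only a sketch; the field names and the mapping are assumptions chosen for illustration, and a real system would need its own mapping.

```python
from collections import defaultdict

# Hypothetical mapping from field name to information category.
FIELD_CATEGORIES = {
    "email": "contact information",
    "phone": "contact information",
    "ssn": "government identifiers",
    "password_hash": "authentication secrets",
    "iban": "financial account details",
}


def group_fields(fields):
    """Group field names by category; unknown fields land in 'unclassified'."""
    grouped = defaultdict(list)
    for field_name in fields:
        category = FIELD_CATEGORIES.get(field_name, "unclassified")
        grouped[category].append(field_name)
    return dict(grouped)


print(group_fields(["email", "phone", "ssn", "nickname"]))
```

The category names are what go into your information type list, and the fields under each one serve as the few examples that show what you mean.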

It is also crucial to distinguish where information lives versus where it passes through, because not every system that sees information is the system of record for that information. A system might transmit data from an upstream source to a downstream destination without keeping it, or it might store a cached copy for performance reasons even though another system is considered authoritative. Some systems generate derived data, like analytics summaries, risk scores, or anomaly alerts, which can become their own information types even though they are based on other data. When you document processed, stored, and transmitted, you are clarifying these roles, including whether the system is the originator, a consumer, a broker, or a repository. This matters because controls like retention, backup, deletion, and access approvals can differ depending on whether the system is the official home of the data or just a transit point. Clear language here prevents arguments later when someone says, we thought the other team was responsible for that.

Another place people lose confidence is when stakeholders use different words for the same thing, or the same word for different things. One person might say customer, another might say account holder, and a third might say user, and all three might refer to the same population. Or the term record might mean a database row to a developer and a legal document to compliance, and those are not the same. A practical way to handle this is to keep your information types grounded in meaning, not in departmental vocabulary. You can say the system handles customer identity and contact information, even if the database calls it party, the user interface calls it profile, and the business calls it client. As long as your identification is consistent and backed by evidence, you can map the synonyms later without rewriting the whole analysis. Confidence often comes from refusing to get dragged into terminology fights and instead focusing on what the information is and how it is used.

You should also develop an instinct for hidden or accidental information types, because systems frequently capture more than they intend. Logs may contain usernames, email addresses, file names, or even portions of content if errors are verbose. Support tickets may include screenshots that contain sensitive data, because users love to paste whatever helps them explain the problem. Analytics tools may store identifiers that can be linked back to individuals, even if they claim to be anonymous. Temporary files, caches, and backups can extend the life of information long past what people think is happening. When you ask what is processed, stored, and transmitted, include these secondary paths, because they are common sources of real-world incidents. Identifying them early makes later control decisions feel less like guesswork and more like proactive engineering.
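To make the point about logs concrete, here is a small sketch that looks for email addresses in log lines. The regular expression is a deliberately simple approximation and will miss some valid addresses and other identifiers; the log lines are invented samples.

```python
import re

# Simplified email pattern; an approximation, not a full address validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails_in_logs(lines):
    """Return (line_number, email) pairs for every email-like string found."""
    hits = []
    for line_number, line in enumerate(lines, start=1):
        for match in EMAIL_PATTERN.findall(line):
            hits.append((line_number, match))
    return hits


# Made-up sample log lines for illustration only.
sample_log = [
    "INFO login ok user_id=42",
    "ERROR password reset failed for jane.doe@example.com",
]

print(find_emails_in_logs(sample_log))
```

A hit here suggests that contact information is being stored in logs, which means the log store belongs in your list of places where that information type lives.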

As you wrap up the identification step, a strong final check is to walk through a handful of typical user journeys and confirm that your list of information types matches what would realistically occur. Imagine a new user registers, submits information, uploads a document, receives a notification, and later requests a report. Then imagine an administrator approves access, reviews logs, and exports data for oversight. Each journey should naturally touch the information types you identified, and if something appears in your mental walkthrough that is not on your list, that is a cue to revisit your evidence. This is not about being perfect on the first try, but it is about being deliberate and traceable, so that changes can be absorbed without chaos. When you can perform these walkthroughs and your documentation still holds up, that is what confidence looks like in practice.
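The walkthrough check amounts to a set comparison: everything a journey touches should already be on your list. A minimal sketch follows, assuming the inventory and each journey are recorded as sets of information type names; all names are hypothetical.

```python
# Hypothetical documented inventory of information types.
inventory = {
    "identity and contact information",
    "uploaded documents",
    "security logs",
}

# Hypothetical journeys and the information types each one touches.
journeys = {
    "new user": {
        "identity and contact information",
        "uploaded documents",
        "notification content",
    },
    "administrator": {
        "security logs",
        "exported oversight data",
    },
}


def uncovered_types(documented, journeys_by_name):
    """Return types touched by some journey but absent from the inventory."""
    touched = set().union(*journeys_by_name.values())
    return sorted(touched - documented)


print(uncovered_types(inventory, journeys))
```

Each name this returns is the cue described above to go back to your evidence and decide whether the inventory needs a new entry.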

By the time you can name information types with clarity and then state whether each one is processed, stored, transmitted, or some combination, you have built the foundation for everything that comes next in a compliance-driven security program. You are no longer relying on vague labels or assumptions about what the system does, and you are no longer hoping that your control selection will somehow align with reality. Instead, you are doing the simplest form of risk reasoning: what information is involved, where does it go, and what could happen if it is mishandled. That habit is what makes later steps like defining security objectives and determining impact levels feel straightforward instead of intimidating. If you keep practicing this method, you will find that the hardest part is not the analysis itself, but simply getting people to slow down and be specific about the information they handle, and that is exactly why this skill is valuable.
