When it comes to applying security controls, many organizations have become good at selecting and implementing technologies that create defense in depth. Network segmentation, authorization and access control, and vulnerability management are all fairly well understood and generally practiced by companies these days. However, many organizations are still at risk because they can’t answer a simple question: where is the sensitive data? It should go without saying, but if a company can’t identify the locations where sensitive data is stored, processed, or transmitted, it will have a hard time implementing controls that effectively protect that data.

Two effective methods for identifying sensitive data repositories and transmission channels are data flow mapping and automated data discovery. A comprehensive and accurate approach will include both. Note that both methods assume you have already defined which types of data are considered sensitive; if you have not, you will first need to go through a data classification exercise and create a data classification policy.

Data flow mapping is exactly what it sounds like: a table-top exercise to identify how sensitive data enters the organization and where it goes once inside. Data flow mapping is typically interview-centric, because you will need to dig into the business processes that manipulate, move, and store sensitive data. Depending on the size and complexity of your organization, data flow mapping could be very straightforward or extremely complicated. However, it is the only reliable way to determine the actual path that sensitive data takes through your organization. As you conduct your interviews, remember that you want to identify all the ways that sensitive data is input into a business process, where it is stored and processed, who handles it and how, and what the outputs are.
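
One way to keep interview findings consistent is to record every business process in the same structure. The following is a minimal sketch under stated assumptions: the `ProcessRecord` type, its field names, and the two sample processes are hypothetical illustrations, not part of any particular methodology or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessRecord:
    """Interview notes for one business process (hypothetical structure)."""
    name: str
    inputs: List[str] = field(default_factory=list)    # how sensitive data enters
    storage: List[str] = field(default_factory=list)   # where it is stored or processed
    handlers: List[str] = field(default_factory=list)  # who handles it
    outputs: List[str] = field(default_factory=list)   # what leaves the process


def storage_locations(records: List[ProcessRecord]) -> Dict[str, List[str]]:
    """Map each storage location to the processes that keep sensitive data there."""
    locations: Dict[str, List[str]] = {}
    for record in records:
        for location in record.storage:
            locations.setdefault(location, []).append(record.name)
    return locations


# Hypothetical sample data, as it might be captured during interviews.
RECORDS = [
    ProcessRecord(
        name="order_entry",
        inputs=["web order form"],
        storage=["orders database"],
        handlers=["sales staff"],
        outputs=["nightly billing export"],
    ),
    ProcessRecord(
        name="billing",
        inputs=["nightly billing export"],
        storage=["orders database", "finance file share"],
        handlers=["accounts receivable"],
        outputs=["customer invoices"],
    ),
]

if __name__ == "__main__":
    for location, processes in storage_locations(RECORDS).items():
        print(f"{location}: {', '.join(processes)}")
```

Grouping the records by storage location gives a first draft of the list of repositories that will need controls.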
Make sure that you get multiple perspectives on individual business processes as validation, and match up the outputs of one process with the inputs of another. It is not uncommon for employees in one business unit or area to misunderstand other processes; your goal is to piece together the entire puzzle.

Automated data discovery does a poor job of shedding light on the mechanisms that move sensitive data around an organization, but it can be very valuable for validating assumptions, identifying exceptions, and revealing the true size of certain data repositories. There are a number of free and commercial tools that can be used for data discovery (one of the most popular free tools is Cornell University’s Spider tool), but they all aim to accomplish the same objective: provide you with a list of files and repositories that contain data you have defined as sensitive. Good places to start your discovery include network shares, databases, portal applications, home drives on both servers and workstations, and email inboxes. Be aware that most discovery tools require you to provide or select a regular expression that matches the format of particular data fields. However, some more advanced commercial tools also provide signature learning features.

Ultimately, your data discovery exercise should result in a much improved understanding of how sensitive data passes through your organization and where it is stored. The next step is to determine how to apply controls based on where data is stored, processed, and transmitted. Where necessary, business processes may also need to be adjusted in order to consolidate data and meet data protection requirements. While identifying sensitive data is only the first phase in a process that will result in better data security and reduced risk, it is a critical step if security controls are to be applied effectively.
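
To make the regular-expression matching that most discovery tools rely on more concrete, here is a minimal sketch in Python. It is not how Spider or any commercial product works internally. The pattern names, the two example formats (US Social Security numbers and 16-digit card numbers), and the function names are assumptions chosen for illustration; substitute the formats defined in your own data classification policy.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration only.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 16 digits, optionally separated by single spaces or hyphens.
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_text(text: str) -> dict:
    """Count matches per pattern; card candidates must also pass Luhn."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if name == "card_number":
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            hits[name] = len(matches)
    return hits


def scan_tree(root: str) -> dict:
    """Scan every readable file under root; return {path: hit counts}."""
    report = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # skip files that cannot be read
        hits = scan_text(text)
        if hits:
            report[str(path)] = hits
    return report
```

The Luhn check on card candidates shows why a bare regular expression is rarely enough on its own: many 16-digit strings are not card numbers, and a cheap validation step cuts down on false positives. A sketch like this only reads plain files; databases and email inboxes would need their own connectors.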