How To Scan for Data Discovery
This article applies to: Data Discovery
Important! Before scanning, you need to understand your department’s practices around confidential data. You'll need to know what to do with different types of confidential data you uncover.
The following steps provide an overview of the data discovery process. More information is provided on the pages for individual scan tools.
- Choose a Data Discovery/Scan Tool: Your local technical support provider should be able to help you select and install a data discovery tool. Cornell currently recommends Identity Finder for Windows and Macintosh, and Find SSNs for Linux and other Unix variants.
- Install the Data Discovery Tool: Some departments and units have configured custom versions of the data discovery tools, so some features may be unavailable or some defaults may have been changed. Check with your local technical support provider to confirm which version of the tool you should install and where to find it. If you have questions, always consult your technical support provider before changing settings or scanning with different settings. See the Identity Finder site for installation instructions and other details.
- Pre-scan Housekeeping: Some basic computer housekeeping practices will eliminate many of the places confidential data likes to hide. Taking these steps before scanning will reduce the time it takes to scan your computer and cut down on distracting false positives. See Before You Scan on the Identity Finder website.
- Run Identity Finder: A scan can take a very long time on some computers. If you can, let the tool run overnight or over a weekend. Step-by-step instructions are available for each scan type:
- Scan Windows
- Scan External Drives (run a separate scan on external drives, CDs, etc.)
- Outlook or Exchange on Mac: Identity Finder on a Mac does not scan Entourage or Outlook mailbox files. If your role requires you to handle confidential data, you are strongly encouraged to review your mailbox folders manually. If you find confidential data in your mail folders, delete the message and then empty the Trash. Policy 5.10 states that you should not send confidential data in email unless it is encrypted.
- Handle the Results: When the scan finishes, you'll be presented with a list of possible matches. Check each one to determine whether it is valid, and then take whatever action is appropriate for your department and work. In most cases, this means opening the file, finding the data the tool identified, and determining what it really is and whether you need it. The page for your scan tool describes the specific options available.
The number of results on an average machine can be overwhelming. Here are some tips:
- Confidential data follows people, not computers. It most often lurks in Office documents, spreadsheets, electronic mail, and PDF documents, so start your review there. Don't ignore other files, though; they can contain real confidential data, too.
- Unintelligible files may be false positives, or they may have been created by an application your computer can't open. This is where your understanding of your work comes in. Do you recognize the data? Does the file contain other attributes, such as names, addresses, or readable text, that can help identify it? If you're not sure, consult your local technical support provider before deleting files.
- False positives are a fact of life. If you’re satisfied the result isn’t real confidential data, most data discovery tools let you ignore it.
- False negatives are a fact of life, too. Examine other files in the same directories where valid matches were identified by the search. More confidential data may be hiding there that wasn’t detected.
- Don’t try to tackle it all at once. If your data discovery tool lets you save your work, do so and pick it up again later.
- Post-scan Housekeeping: After you clean up your computer, you may need to take a few more steps to ensure you haven't simply moved the files somewhere else. See After You Scan.
- Ongoing Data Security: After completing this process, you should have secured all of the confidential data remaining in your care. Repeat the data discovery process regularly so that any new data is identified and secured. Your department should have guidelines for how often to do this.
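Pattern matching helps explain why false positives and false negatives are a fact of life. The Python sketch below is a deliberately simplified, hypothetical detector for Social Security number-like strings. It is not how Identity Finder or Find SSNs works internally. The sketch looks for nine digits grouped 3-2-4, optionally separated by a hyphen or a space. It then discards numbers in ranges the Social Security Administration never issues: area 000, 666, or 900 to 999; group 00; or serial 0000. The names `SSN_PATTERN`, `plausible_ssn`, and `find_candidates` are illustrative assumptions.

```python
from __future__ import annotations

import re

# Hypothetical pattern: 3 digits, 2 digits, 4 digits, optionally separated
# by a hyphen or a space. Real tools use their own, more elaborate rules.
SSN_PATTERN = re.compile(r"\b(\d{3})[- ]?(\d{2})[- ]?(\d{4})\b")


def plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject number ranges the SSA never issues."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    if group == "00" or serial == "0000":
        return False
    return True


def find_candidates(text: str) -> list[str]:
    """Return every substring that looks like an SSN under the rules above."""
    hits = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        if plausible_ssn(area, group, serial):
            hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    sample = (
        "Employee SSN: 123-45-6789\n"
        "Invoice part no. 512-34-8890\n"   # not an SSN, but has the same shape
        "Tracking code 900-12-3456\n"      # area 9xx is never issued
        "SSN typed with dots: 234.56.7890\n"  # dots are not a recognized separator
    )
    print(find_candidates(sample))  # prints ['123-45-6789', '512-34-8890']
```

Here the part number is flagged even though it is not an SSN (a false positive). The dotted number is missed because the sketch doesn't treat dots as separators (a false negative). Widening the pattern would catch more real data but would also produce more false positives. Every discovery tool has to balance that trade-off, which is why human review of the results is still needed.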
Tip: You may need to map or mount a drive to scan files on a server. The How To page for each tool includes step-by-step instructions on scanning external drives. Instructions for mapping a drive are also available.
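The false-negatives tip says to examine other files in directories where valid matches turned up. That step can be partly scripted. The Python sketch below is a hypothetical helper, not a feature of Identity Finder or Find SSNs. Given a list of file paths you have already confirmed as valid matches, it lists the other files in those same directories. It puts likely Office, PDF, and mail files first, following the tip that confidential data most often lurks there. The function name `neighbors_to_review` and the `PRIORITY_EXTENSIONS` set are assumptions; adjust the extensions to fit your work.

```python
from __future__ import annotations

from pathlib import Path

# Assumed extensions for office documents, spreadsheets, PDFs, and mail files.
PRIORITY_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx", ".csv", ".ppt", ".pptx",
    ".pdf", ".msg", ".eml", ".pst",
}


def neighbors_to_review(confirmed: list[str]) -> list[Path]:
    """List unflagged files that share a directory with a confirmed match.

    Files with a priority extension sort first, then everything is
    ordered alphabetically by full path.
    """
    confirmed_paths = {Path(p).resolve() for p in confirmed}
    directories = {p.parent for p in confirmed_paths}
    candidates: set[Path] = set()
    for directory in directories:
        for entry in directory.iterdir():
            resolved = entry.resolve()
            # Skip subdirectories and the files already confirmed.
            if entry.is_file() and resolved not in confirmed_paths:
                candidates.add(resolved)
    # False sorts before True, so priority extensions come first.
    return sorted(
        candidates,
        key=lambda p: (p.suffix.lower() not in PRIORITY_EXTENSIONS, str(p)),
    )
```

For example, `neighbors_to_review(["/home/me/reports/roster.xlsx"])` would list every other file in `/home/me/reports`, with PDFs and spreadsheets ahead of plain text files. The helper only suggests where to look. You still need to open each file and judge its contents yourself.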