How To Scan for Data Discovery
This article applies to: Data Discovery
Important! Before scanning, make sure you understand your department’s practices for handling confidential data, including what to do with each type of confidential data you might uncover.
The following steps provide an overview of the data discovery process. More information is provided on the pages for individual scan tools.
- Choose a Data Discovery/Scan Tool: Your local technical support provider should be able to help you select and install a data discovery tool. The tools currently recommended for use at Cornell are Spirion for Windows and Macintosh, and Find SSNs for Linux and other Unix variants.
- Install the Data Discovery Tool: Some departments and units have configured custom versions of the data discovery tools, which may have some features disabled or some defaults changed. Check with your local technical support provider to be sure you know which version of the tool to install and where to find it. If you have questions, always consult them before changing the tool’s configuration or scanning with different settings. See the Spirion site for installation instructions and other details.
- Run Spirion: Scanning can take a very long time on some computers. If you can, let the tool run overnight or over a weekend. Step-by-step instructions are available for each scan type:
  Scan: Windows | Mac
  Scan External Drives (run a separate scan on external drives, CDs, etc.): Windows | Mac
- Outlook email on Mac: Spirion on a Mac does not scan Outlook mailbox files. If your role requires you to handle confidential data, you are strongly encouraged to inspect your mailbox folders visually. If you find confidential data in your mail folders, delete the message and then empty the Trash. Cornell University Policy 5.10, Information Security, states that you should not send confidential data in email unless it is encrypted.
- Handle the Results: When the scan finishes, you’ll see a list of possible matches. Check each one to determine whether it is valid, then take whatever action is appropriate for your department and work. In most cases, this means opening the file, finding the data the tool identified, and determining what it really is and whether you need it. See the specific options available:
The number of results on an average machine can be overwhelming. Here are some tips:
- Confidential data follows people, not computers. It most often lurks in Office documents, spreadsheets, electronic mail, and PDF documents, so start your work there. Keep in mind that other files may also contain real confidential data.
- Unintelligible files may be false positives, or they may be in a format created by an application your computer can’t open. This is where your understanding of your work comes in. Do you recognize the data? Does it contain other attributes, such as names, addresses, or readable text, that can help identify it? If you’re not sure, consult your local technical support provider before deleting files.
- False positives are a fact of life. If you’re satisfied that a result isn’t real confidential data, most data discovery tools let you ignore it.
- False negatives are a fact of life, too. Examine the other files in any directory where the search found valid matches. More confidential data may be hiding there undetected.
- Don’t try to tackle it all at once. If your data discovery tool lets you save your work, do so and pick it up again later.
- Post-scan Housekeeping: After you clean up your computer, you may need to take some additional steps to ensure you haven’t simply moved the files somewhere else. See After You Scan.
- Ongoing Data Security: After completing this process, you should have secured all of the confidential data remaining in your care. Repeat the data discovery process regularly to make sure any new data is identified and secured. Your department should have guidelines for how often this needs to be done.
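If you are comfortable with a little scripting, the tips under Handle the Results can be applied mechanically to a long result list. The Python sketch below is a hypothetical helper, not part of Spirion or Find SSNs. It assumes you already have the result file paths as plain strings (for example, copied from your scan tool; whether your tool can export them is something to confirm with your technical support provider). The function names `triage` and `directories_to_review` and the extension list are illustrative assumptions. `triage` puts Office documents, spreadsheets, email, and PDFs first, and `directories_to_review` groups confirmed matches by folder so you can check neighboring files for missed data.

```python
# Hypothetical triage helpers for a list of scan-result file paths.
# Not part of Spirion or Find SSNs; assumes the result paths are plain strings.
import os
from collections import defaultdict

# Assumed extensions for Office documents, spreadsheets, email, and PDFs.
# Adjust this set to match the applications your department uses.
PRIORITY_EXTENSIONS = {
    ".doc", ".docx", ".rtf",          # word processing documents
    ".xls", ".xlsx", ".csv",          # spreadsheets
    ".eml", ".msg", ".mbox", ".pst",  # email messages and mailboxes
    ".pdf",                           # PDF documents
}


def triage(result_paths):
    """Return result paths with likely confidential-data file types first.

    sorted() is stable, so paths keep their original order within each group.
    """
    return sorted(
        result_paths,
        key=lambda p: os.path.splitext(p)[1].lower() not in PRIORITY_EXTENSIONS,
    )


def directories_to_review(confirmed_paths):
    """Map each directory containing a confirmed match to the matched file names.

    Other files in these directories are worth checking by hand, because the
    scan may have missed confidential data there (false negatives).
    """
    by_directory = defaultdict(list)
    for p in confirmed_paths:
        by_directory[os.path.dirname(p)].append(os.path.basename(p))
    return dict(by_directory)


if __name__ == "__main__":
    results = [
        "/Users/me/Library/Caches/blob.dat",
        "/Users/me/Documents/roster.xlsx",
        "/Users/me/Mail/archive.mbox",
    ]
    print(triage(results))
    # ['/Users/me/Documents/roster.xlsx', '/Users/me/Mail/archive.mbox', '/Users/me/Library/Caches/blob.dat']
    print(directories_to_review(["/Users/me/Documents/roster.xlsx"]))
```

Sorting rather than filtering keeps every result in the list, which matches the advice that other files might contain real confidential data too.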