2.6 Data Use Agreements and Confidential Data

2.6.1 The process for establishing a Data Use Agreement or Non-Disclosure Agreement

  • At project launch, the project manager and the rest of the project team should determine if a Data Use Agreement (DUA; common) or Non-Disclosure Agreement (NDA; not common) is necessary. Any project that will involve the use and sharing of data that is not publicly available should establish a DUA or NDA.
  • Start this process right away.
  • A DUA is preferred where possible; pursue an NDA only if necessary or if the partner requests one.
  • These agreements should go through UCSB’s Office of Technology & Industry Alliances (TIA).
    • This page provides guidelines for establishing a DUA. The first step in establishing a DUA is to fill out a DUA Request Form and send it to Jenna Nakano.
    • This page provides guidelines for establishing an NDA. The first step in establishing an NDA is to fill out an NDA Request Form and send it to Jenna Nakano.
    • CC emLab’s Amanda Kelley on all emails relating to the DUA or NDA.
    • Once a request form has been sent, TIA will help produce a standardized agreement based on the answers in the form. Alternatively, TIA can review DUAs or NDAs that partners provide.

2.6.2 Data storage options

  • emLab is generally happy to work with partners to determine the most appropriate method for data storage. Some partners may have specific data storage requirements that will be laid out in the DUA or NDA.
  • If the DUA or NDA does not have specific data storage requirements, we recommend one of the following three approaches depending on how sensitive the data are:
    • For data that are not confidential or sensitive, data should be stored in a project-specific directory on the emLab Team Drive. Only emLab PIs and full-time emLab staff will have default access to this data directory. Any additional access for postdocs, students, or other external collaborators will only be granted on an as-needed basis and only after the collaborator has read the DUA or NDA. This option is used for the vast majority of emLab projects.
    • For confidential or sensitive data, the primary recommended approach is the UCSB Knot Cluster through the Center for Scientific Computing.
      • We recommend using this approach if your DUA or NDA allows for it.
      • To set this up, use the request form here. Nathan (Fuzzy) Rogers (Research Computing Administrator) and Paul Weakliem (CNSI Research Computing Support) are good resources for questions.
      • Anyone storing sensitive data on the Knot Cluster should ensure that UCSB locks the data so that they remain private.
    • For confidential or sensitive data, the second option is storing data on the Secure Compute Research Environment (SCRE) at UCSB’s North Hall Data Center. The SCRE “is a private, secure, virtual environment for researchers to remotely analyze sensitive data, create research results, and output results and analyses.”
      • We recommend this approach only if your DUA or NDA requires that data be stored in a secure facility like the North Hall Data Center.
      • Setting up an SCRE gives you access to a secure virtual desktop that comes pre-loaded with applications such as R and RStudio.
      • You can make a request for an SCRE using the request form here.
      • UCSB IT will help set this up. You can follow up with questions at
      • Requests are usually fulfilled within one week.
      • Further information can be found in the SCRE user guide. Jennifer Mehl (Information Security Analyst) is another good resource for questions.
      • The SCRE has a number of important limitations: it is relatively slow; it is very difficult for non-UCSB collaborators to access; it can only be set up after the NDA/DUA is established; Stata can be difficult to install and may require an individual license; and any non-standard R packages must be installed manually by the UCSB staff member who manages the SCRE.

2.6.3 Other best practices

Regardless of the data storage option chosen, we recommend several additional best practices:

  • High-level metadata for all datasets should be added to the _emlab_data_directory. See this section in the emLab SOP for further description of the emLab data directory. This will help emLab be internally transparent in how we are using data for different projects, even if the entire group doesn’t have access to the data.
    • For datasets that are confidential, ensure that the “Permissions” column is set to “Secure = confidential data and likely involves a DUA or NDA.”
  • Researchers should consider anonymizing individual-level data before publicly releasing the data (see this R package as one example for how to do this).
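
As one illustration of the anonymization step above, the following is a minimal Python sketch of pseudonymizing individual-level records before release. It is not the method used by the R package mentioned above, and every name in it (the pseudonymize function, the vessel_id, owner_name, and catch_kg fields, and the example records) is hypothetical. It assumes that dropping direct identifiers and replacing the record ID with a keyed hash is acceptable under your DUA or NDA. Replacing IDs in this way does not by itself guarantee anonymity, since combinations of the remaining columns may still identify individuals.

```python
import hashlib
import hmac
import secrets


def pseudonymize(records, id_field, drop_fields, key):
    """Return copies of the records with direct identifiers removed.

    Fields listed in drop_fields are deleted, and the value in id_field is
    replaced by a keyed hash (HMAC-SHA256), so rows belonging to the same
    individual can still be linked without exposing the original ID.
    """
    cleaned = []
    for record in records:
        # Copy every field except the ones that must not be released.
        row = {k: v for k, v in record.items() if k not in drop_fields}
        digest = hmac.new(
            key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # A truncated digest is easier to read; keep more characters
        # if the dataset contains a very large number of individuals.
        row[id_field] = digest[:16]
        cleaned.append(row)
    return cleaned


# The key must stay secret and must never be released with the data.
# Here it is generated at random, so pseudonyms differ between runs.
key = secrets.token_bytes(32)

# Hypothetical example records.
records = [
    {"vessel_id": "A123", "owner_name": "Jane Doe", "catch_kg": 410},
    {"vessel_id": "A123", "owner_name": "Jane Doe", "catch_kg": 385},
    {"vessel_id": "B456", "owner_name": "John Roe", "catch_kg": 120},
]

released = pseudonymize(records, "vessel_id", {"owner_name"}, key)
```

The keyed hash is used rather than a plain hash because, without the secret key, someone who knows the format of the original IDs cannot recompute the pseudonyms and reverse the mapping. If pseudonyms need to be stable across releases, the same key would have to be stored alongside the confidential data rather than regenerated.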