Research Guides: Data Management: Best Practices

File Naming & Structure

Why is file naming important?

Think of a file name as a unique identifier for each of your files. Following a naming convention makes your files simpler to organize and easier to locate, and it helps others understand and reuse your data. This is particularly important when you are working on a collaborative project.

How should you name your file?

Here are some recommended best practices for naming your files:

Use names that are brief but descriptive
Avoid spaces and special characters (such as *, #, and %)
Agree on a naming convention that everyone using the files follows
Identify versions of files using dates and version numbers in the file name
Use three-letter file extensions to ensure backwards compatibility (e.g., .doc, .tif, .txt)
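
As an illustration of these practices, a naming convention can be written down as a small function so that every collaborator produces names the same way. The sketch below is only one possible convention: the function name build_file_name, the date_description_version ordering, and the two-digit version number are assumptions made for this example, not a standard.

```python
import re
from datetime import date


def build_file_name(description, collected_on, version, extension):
    """Assemble a name such as 20120615_blood_ID0234_v02.txt (hypothetical convention)."""
    # Replace spaces and special characters with underscores.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", description.strip()).strip("_")
    # A YYYYMMDD date prefix sorts chronologically in a file listing.
    stamp = collected_on.strftime("%Y%m%d")
    # Normalize the extension to lower case without a leading dot.
    ext = extension.lstrip(".").lower()
    return f"{stamp}_{safe}_v{version:02d}.{ext}"


print(build_file_name("blood ID0234", date(2012, 6, 15), 2, ".txt"))
```

Putting the date first keeps related files in chronological order; a project that browses by subject instead might prefer to lead with the description.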

How should files be structured?

Folder structure for your files can assist in the unique identification of the files contained within them. Consider the structure of the folders containing your data files before you begin to collect your data. Ideas for how to organize your folders include:

Data type (text, images, models, etc.)
Time (year, month, session, etc.)
Subject characteristic (species, age grouping, etc.)
Research activity (interview, survey, experiment, etc.)

Consider these examples of file naming and folder structure:

File001.txt vs.
201206blood_ID0234.txt

MyDocuments\Research\Sample12.jpg vs.
C:\NEHGrant01234\WWI\Images\London_001.jpg
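
A folder structure can likewise be decided once and then generated consistently. The following is a minimal sketch, assuming a hypothetical ordering of levels (research activity, then year, then data type); the function name data_path and the folder names are illustrative only, and a real project should pick the levels that match how its data will be searched.

```python
from pathlib import PurePosixPath


def data_path(project, activity, year, data_type, file_name):
    """Build a relative path: project/activity/year/data_type/file_name."""
    # The order of levels here is an assumption made for this example.
    return PurePosixPath(project) / activity / str(year) / data_type / file_name


path = data_path("NEHGrant01234", "interview", 2012, "images", "London_001.jpg")
print(path)
```

PurePosixPath only builds the path as text and does not touch the file system, so the sketch can be tried without creating any folders.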

Data Organization

Why should you organize your data?

The organizational structure of your data can help secondary users of your data find, identify, select, and obtain the data they require.

How do you organize your data?

For best results, data structure should be fully modeled top-to-bottom/beginning-to-end in the planning phase of a project.
You'll want to devise ways to express the following:

the context of data collection: project history, aim, objectives, and hypothesis
data collection methods: sampling, data collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage, and secondary data sources used
dataset structure: data files, study cases, and relationships between files
data validation, checking, proofing, cleaning, and quality assurance procedures carried out
changes made to data over time since their original creation, and identification of different versions of data files
information on access and use conditions or data confidentiality

(adapted from UKDA)

How to Make a Data Dictionary

Credit: OSF Best Practices: https://help.osf.io/m/bestpractices/l/618767-how-to-make-a-data-dictionary

A data dictionary is critical to making your research more reproducible because it allows others to understand your data. The purpose of a data dictionary is to explain what all the variable names and values in your spreadsheet really mean.

Variable names

The first column should contain your variable names exactly as they appear in your spreadsheet.

Readable variable name

This column should contain short but human-readable variable names.

For instance, if ‘VAR1’ is a variable name referring to weight, then an appropriate readable variable name for VAR1 is ‘weight’.
You can use spaces, special characters, and capital letters.
This is the name that you would use to label graphs and other figures.

Measurement units

This column should contain the measurement units for the variable.

For instance, if a column contains measurements of time, it should be clear whether they are measured in hours, minutes, or seconds.

Allowed values

This column should contain the range of values or accepted values for the variable.

This helps identify data entry errors.
Minimum and maximum values should be included.
Chosen values (e.g., “male”, “female”) should be included and detailed, if needed, in the description column (see below).
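
To show how allowed values help catch data entry errors, here is a minimal sketch of a check against a minimum, a maximum, or a set of chosen values. The function name find_invalid, the sample rows, and the weight limits of 30 and 300 are all hypothetical and chosen only for illustration.

```python
def find_invalid(rows, variable, minimum=None, maximum=None, choices=None):
    """Return (row_index, value) pairs that fall outside the allowed values."""
    problems = []
    for index, row in enumerate(rows):
        value = row[variable]
        if choices is not None and value not in choices:
            problems.append((index, value))
        elif minimum is not None and value < minimum:
            problems.append((index, value))
        elif maximum is not None and value > maximum:
            problems.append((index, value))
    return problems


# Made-up sample data for the example.
rows = [
    {"weight": 71.5, "sex": "female"},
    {"weight": 715.0, "sex": "male"},   # likely a misplaced decimal point
    {"weight": 64.2, "sex": "femal"},   # typo in a chosen value
]

weight_problems = find_invalid(rows, "weight", minimum=30, maximum=300)
sex_problems = find_invalid(rows, "sex", choices={"male", "female"})
print(weight_problems)
print(sex_problems)
```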

Definition of the variable

This column should contain a definition of the variable.

The variable definition reflects the way you use the term and intend the term to be used by others who wish to understand your work.
While there are many kinds of definition, where possible, please provide a definition with the following genus-differentia form:

“A is a B that Cs.”

For instance, “An a) attitude is a b) disposition c) to think or feel that is about something or someone, typically one that is reflected in a person's behavior.”
Avoid circular definitions (e.g. “A baseball is a ball used in baseball.”)

Synonyms for the variable name (optional)

This column should contain, if relevant, one or more words that could be substituted for the variable name.
These synonyms should reflect the meaning of the variable name as you use it, and not merely as the variable name might be used in a different context.
Again, the purpose is to convey the meaning of the variable term you use in your data.

Description of the variable (optional)

The final column should contain, where needed, a longer explanation of the variable.

This is a human-readable description with enough information for others to understand what the variable refers to.
It should also explain terms in the variable’s definition in more depth if needed. For instance, a description of the variable might clarify what is intended by ‘disposition’ in the above definition.
It could provide sources for definitions if those definitions are not the researcher’s own.
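
The columns described above could be kept as a small CSV file alongside the data. The sketch below writes one entry using Python's standard csv module. The column headers, the unit "kg", the range "30-300", and the wording of the definition are assumptions made up for this example; only the variable name VAR1 and its readable name "weight" come from the guidance above.

```python
import csv
import io

# Hypothetical header names, one per column described in this guide.
COLUMNS = [
    "variable_name", "readable_name", "measurement_units",
    "allowed_values", "definition", "synonyms", "description",
]

# A single made-up entry for illustration.
entries = [
    {
        "variable_name": "VAR1",
        "readable_name": "weight",
        "measurement_units": "kg",
        "allowed_values": "30-300",
        "definition": "Weight is a measurement that records body mass at enrollment.",
        "synonyms": "body mass",
        "description": "",
    },
]


def write_dictionary(entries, handle):
    """Write the data dictionary rows, with a header line, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(entries)


# An in-memory buffer stands in for a real file in this sketch.
buffer = io.StringIO()
write_dictionary(entries, buffer)
print(buffer.getvalue())
```

To save the dictionary to disk instead, the same function can be given a file opened with open("data_dictionary.csv", "w", newline=""), where the file name is again only an example.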

Other resources

The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. Learn more at: https://www.ddialliance.org/
Example data dictionaries provided by the USGS: https://www2.usgs.gov/datamanagement/describe/dictionaries.php