GEO Metadata Validation Rules
To improve GEO's processing rate and maintain a high standard of metadata collection, GEO has implemented an automated pre-checking service that checks the metadata spreadsheet for completeness, formatting, and content. After the FTP transfer of raw and processed data files is complete, upload the completed metadata file on the Submit Metadata page.
Upon upload, the metadata file is scanned within seconds for formatting and content. For example, if a required section (STUDY, SAMPLES, PROTOCOLS, or, for paired-end studies, PAIRED-END EXPERIMENTS) is missing, you will receive the error message "Uploaded file is missing mandatory section" and a table will appear with the name of the missing section. If you receive an error message, correct the indicated fields of your metadata file and upload the file again. Uploading a complete metadata file returns the message "Your metadata file has been successfully uploaded". A successful upload places your submission into GEO's processing queue, and you will receive an email notification with your submission summary.
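Before uploading, it can help to run a quick local check for the two problems mentioned most often: a wrong file extension and a missing section. The following is a minimal sketch and not GEO's validator. The function names `has_xlsx_extension` and `find_missing_sections` are hypothetical, and the sketch assumes that you have already read the cells of the metadata worksheet's first column into a list and that each section title appears as a stand-alone cell value.

```python
# Hypothetical local pre-check; GEO's actual validation logic is not public here.
REQUIRED_SECTIONS = ("STUDY", "SAMPLES", "PROTOCOLS")
PAIRED_END_SECTION = "PAIRED-END EXPERIMENTS"


def has_xlsx_extension(filename):
    """Return True if the file name ends in .xlsx (case-insensitive by assumption)."""
    return filename.lower().endswith(".xlsx")


def find_missing_sections(first_column_cells, paired_end=False):
    """Return the required section titles that are absent from the given cells.

    first_column_cells: cell values from the metadata worksheet (None for empty cells).
    paired_end: set to True if the study contains paired-end sequencing data.
    """
    # Assumption: a section title is matched exactly after trimming whitespace.
    present = {str(cell).strip() for cell in first_column_cells if cell is not None}
    required = list(REQUIRED_SECTIONS)
    if paired_end:
        required.append(PAIRED_END_SECTION)
    return [section for section in required if section not in present]
```

For instance, `find_missing_sections(["STUDY", "title", None, "SAMPLES"])` returns `["PROTOCOLS"]`, which corresponds to the situation that would trigger the "missing mandatory section" message.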
error name | error message that you will receive | explanation and how to fix |
---|---|---|
excel_parse_failure | Uploaded file cannot be read. The file must be in Excel version 2007 or higher with .xlsx extension. | The file is not an Excel version 2007 or higher file with .xlsx extension. GEO cannot process metadata files submitted with extension .txt, .csv, or .tsv. Do not compress the metadata Excel spreadsheet. A compressed metadata Excel spreadsheet cannot be read. |
discontinued_template | It appears that you have used a discontinued version of the metadata spreadsheet. Please use the above link to download the newest version and resubmit. | Old versions of the metadata spreadsheet are not supported. Please download, complete, and submit the newest version of the metadata spreadsheet. |
missing_worksheet | Uploaded file is missing required worksheet named "Metadata". Please make sure you are using our newest metadata template. | The Excel tab (also called a worksheet) containing the metadata information must be named "Metadata" or "2. Metadata Template". Any other tab name will produce the "missing_worksheet" error. For example, do not rename the tab "RNAseq" or "ChIPseq". Do not include multiple tabs with metadata for separate studies in the same file. GEO needs one metadata file per study. |
missing_section | Uploaded file is missing mandatory section: | The metadata tab must have sections titled STUDY, SAMPLES and PROTOCOLS. If it is a paired-end sequencing study, the metadata file must also contain a PAIRED-END EXPERIMENTS section. |
empty_samples_section | SAMPLES section does not list any samples. Please make sure that library names do not start with "#" symbol since such lines are treated as comments and ignored. | Samples must be listed in the SAMPLES section. |
missing_mandatory_info | Uploaded file is missing mandatory information in the STUDY or PROTOCOLS sections: | Required fields in STUDY and PROTOCOLS sections are: title, summary (abstract), experimental design, extract protocol, library construction protocol, library strategy, data processing description, assembly or genome build, and processed data files format and content. Library strategy refers to the experiment type such as RNA-seq, ATAC-seq, or Hi-C. A table will be provided that lists the fields in STUDY and/or PROTOCOLS sections that are empty. |
missing_sample_header | SAMPLES section is missing required headers for the table: | Deleting columns from the metadata template in the SAMPLES section is not allowed and will produce the "missing_sample_header" error. A table will be provided which lists the missing headers in the SAMPLES section. You can add columns to the SAMPLES section for additional characteristics appropriate for your samples. For example, you could use the header "overall survival" and provide survival data for each sample. |
empty_library_name | At least one of the samples has empty library name. | At least one sample in the SAMPLES section has an empty library name. This error is sometimes caused by stray non-empty cells in the SAMPLES section that do not belong to any of the listed samples. |
missing_sample_info | SAMPLES section is missing required information: | Every sample in the SAMPLES section must include information for library name, title, organism, molecule, single or paired-end, and instrument model. A table will be provided which lists the missing field for each library name. |
duplicate_library_names | Identical library names were found. Library names must be unique. This check is case insensitive, meaning that "Control1" and "control1" will be considered identical. Identical names are: | Every library name in the SAMPLES section must be unique. A table will be provided which lists the non-unique library name and the number of times it was found (occurrences) in the SAMPLES section. |
duplicate_sample_titles | Identical sample titles were found. Sample titles must be unique. This check is case insensitive, meaning that "Control1" and "control1" will be considered identical. Identical titles are: | Every title in the SAMPLES section must be unique. A table will be provided which lists the non-unique title and the number of times (occurrences) it was found in the SAMPLES section. |
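The case-insensitive uniqueness rule for library names and sample titles, and the rule that names starting with "#" are treated as comments, can be checked locally before upload. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, lowercasing stands in for whatever case folding GEO applies, and the input is assumed to be a plain list of strings taken from the relevant SAMPLES column.

```python
from collections import Counter


def find_case_insensitive_duplicates(values):
    """Return {lowercased value: occurrences} for values that appear more than once.

    "Control1" and "control1" count as the same value, mirroring the
    case-insensitive check described in the table above.
    """
    counts = Counter(value.lower() for value in values)
    return {value: n for value, n in counts.items() if n > 1}


def commented_library_names(library_names):
    """Return library names that start with "#", which are treated as comments."""
    return [name for name in library_names if name.startswith("#")]
```

Running `find_case_insensitive_duplicates` on both the library name column and the title column gives a rough local equivalent of the occurrence tables described for `duplicate_library_names` and `duplicate_sample_titles`.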