Social Science and Humanities Metadata field information
Post-Uploading Activity
Where the Social Science and Humanities Metadata fields have been added to your dataset and are available for editing, use the following guidance to populate them with appropriate metadata content.
All of these fields may need updating after ADA Staff have completed quality assurance checks on the dataset. ADA Staff will identify any issues through the Processing Report. Where changes are required, you may be asked to make them directly in the dataset, although in some cases ADA Staff may make the change with your approval if that is easier.
ADA Dataverse DEPOSIT Social Science and Humanities Metadata Fields
Unit of Analysis: The basic unit of analysis or observation that the dataset describes, such as individuals, families, households, groups, institutions, organisations, administrative units and more. A DDI controlled vocabulary exists for ‘Analysis Unit’ elements at http://www.ddialliance.org/controlled-vocabularies.
Universe: Provide a description of the population covered by the data in the file(s); the group of people or other elements that are the object of the study and to which the results refer. Age, nationality and residence commonly help to delineate a given universe, but any number of other factors may be used, such as age limits, sex, marital status, race, ethnic group, income, veteran status, criminal convictions, and more. The universe may also consist of elements other than persons, such as housing units, court cases, deaths, countries, and so on. In general, it should be possible to tell from the universe description whether a given individual or element is a member of the population under study. The universe is also known as the universe of interest, population of interest, or target population.
Time Method: The time method or time dimension of the data collection, such as panel, cross-sectional, trend, time-series, or other.
Data Collector: This field is not used by the ADA as the information is contained within the general Non-standard metadata field section under the Contributor heading.
Collector Training: Enter the type of training provided to the data collector here, particularly if specialist training will be required for secondary users to understand the data.
Frequency: If the data was collected at more than one point in time, indicate the frequency of collection, for example monthly, quarterly, yearly or other.
Sampling Procedure: Type of sample and sample design used to select the survey respondents to represent the population. May include reference to the target sample size and the sampling fraction.
Target Sample Size: Specific information regarding the target sample size, actual sample size, and the formula used to determine this should be entered where applicable.
- Actual: Enter the actual sample size, as an integer.
- Formula: Enter the formula used to determine the final target sample size.
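As an illustration only of the kind of formula that might be recorded in the Formula field, the sketch below uses Cochran's formula for estimating a proportion, with an optional finite population correction. The choice of formula, the function name and the input values are assumptions made for this example and are not prescribed by the ADA; the formula actually used in your study is what should be entered.

```python
import math


def cochran_sample_size(z, p, e, population=None):
    """Illustrative target sample size using Cochran's formula.

    z: z-score for the chosen confidence level (1.96 for roughly 95%)
    p: expected proportion in the population (0.5 is the most conservative)
    e: acceptable margin of error, as a proportion (0.05 means 5 points)
    population: optional population size for the finite population correction
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if population is not None:
        # Finite population correction reduces the required sample size.
        n0 = n0 / (1 + (n0 - 1) / population)
    # Round up, because a sample size must be a whole number of units.
    return math.ceil(n0)


# Hypothetical inputs: 95% confidence, p = 0.5, margin of error of 5 points.
print(cochran_sample_size(1.96, 0.5, 0.05))                    # 385
print(cochran_sample_size(1.96, 0.5, 0.05, population=10000))  # 370
```

In this hypothetical case the Formula field could record the formula and its inputs, while the Actual field records the number of units obtained.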
Major Deviations for Sample Size: Show where the obtained sample corresponds to, and where it differs from, available statistics for the population as a whole (age, sex ratio, marital status, etc.).
Collection Mode: Method used to collect the data; instrumentation characteristics (for example, telephone interview, mail questionnaire, web-based self-completion, or other).
Type of Research Instrument: Type of data collection instrument used.
- Structured indicates an instrument in which all respondents are asked the same questions/tests, possibly with pre-coded answers. If a small portion of such a questionnaire includes open-ended questions, provide appropriate comments to support the answers.
- Semi-structured indicates that the research instrument contains mainly open-ended questions.
- Unstructured indicates that in-depth interviews were conducted.
Characteristics of Data Collection Situation: Description of noteworthy aspects of the data collection situation. Includes information on factors such as cooperativeness of respondents, duration of interviews, number of call backs, or similar relevant information.
Actions to Minimize Losses: Summary of actions taken to minimise data loss. Include information on actions such as follow-up visits, supervisory checks, historical matching, estimation, and so on.
Control Operations: Methods used by the primary investigator to facilitate data control when conducting the study. This concerns any control of extraneous variables within the study, such as minimising differences between participants (e.g. their stage of development by controlling the age of participants, or ability by controlling IQ), and standardising researcher variables (such as removing gender bias).
Weighting: The use of sampling procedures might make it necessary to apply weights to produce accurate statistical results. In this section, describe where applicable the criteria for using weights in the analysis of a collection. If a weighting formula or coefficient was developed, provide the formula, define its elements, and indicate how the formula was applied to the data. Where this information would make the data identifiable and therefore lead to disclosure, the depositor may choose to describe the methods used to determine the weighting but not reveal which elements the weighting was applied to. Where applied to all elements, the methodology may be described fully but the actual formula may need to be masked.
Finally, if the weighting criteria are complex and need to be recorded in a formal document, it may be easier to upload this document as Supporting Documentation with appropriate File Tags and Description Notes. The weighting document should then be referenced in this field.
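To make the idea of a weighting formula with defined elements concrete, the sketch below computes simple post-stratification weights, where each stratum's weight is its population share divided by its sample share. This is one common approach chosen purely for illustration; the strata, counts and function name are hypothetical, and your study may use a different weighting scheme.

```python
def poststratification_weights(population_counts, sample_counts):
    """Return one weight per stratum: population share / sample share."""
    pop_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    weights = {}
    for stratum, pop_n in population_counts.items():
        pop_share = pop_n / pop_total
        sample_share = sample_counts[stratum] / sample_total
        weights[stratum] = pop_share / sample_share
    return weights


# Hypothetical age strata: known population counts and achieved sample counts.
population = {"18-34": 3000, "35-64": 5000, "65+": 2000}
sample = {"18-34": 200, "35-64": 500, "65+": 300}

for stratum, weight in poststratification_weights(population, sample).items():
    print(stratum, round(weight, 3))
```

Here the under-represented 18-34 group receives a weight above 1 and the over-represented 65+ group a weight below 1. A description for the Weighting field would name the strata, the source of the population counts, and how the weights were applied.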
Cleaning Operations: Enter here any methods used to clean the data collection, such as consistency checking, wildcard checking, checks for missing variable/value labels, out of range values, logic inconsistencies, confidentiality issues, or other. This may also include details of coding rules that were developed to support the data collection, since such rules standardise the data and so contribute to cleaning.
Study Level Error Notes: This is a note element used for any information that annotates or clarifies the methodology and processing of the study.
Response Rate: The percentage of sample members who provided information.
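The definition above can be sketched as a simple calculation. The function name and the figures are hypothetical, and studies that use a more detailed definition (for example, one that excludes ineligible sample members) should describe that definition in the field.

```python
def response_rate(respondents, sample_size):
    """Percentage of sample members who provided information."""
    if sample_size <= 0:
        raise ValueError("sample_size must be a positive number")
    if respondents < 0 or respondents > sample_size:
        raise ValueError("respondents must be between 0 and sample_size")
    return 100 * respondents / sample_size


# Hypothetical study: 412 of 500 sampled members provided information.
print(round(response_rate(412, 500), 1))  # 82.4
```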
Estimates of Sampling Error: Enter here a measure of how precisely one can estimate a population value from a given sample.
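One common measure of this kind is the standard error of an estimated proportion, with the associated margin of error. The sketch below assumes simple random sampling and a large population; the function names and figures are illustrative only, and studies with complex sample designs will need design-based estimates instead.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a sample proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval; z = 1.96 gives roughly 95% confidence."""
    return z * proportion_standard_error(p, n)


# Hypothetical estimate: 50% of 400 respondents gave a particular answer.
print(round(proportion_standard_error(0.5, 400), 4))  # 0.025
print(round(margin_of_error(0.5, 400), 4))            # 0.049
```

In this hypothetical case the field could state that estimates of proportions carry a margin of error of about 4.9 percentage points at the 95% confidence level.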
Other Forms of Data Appraisal: Enter here other issues pertaining to the data appraisal. You can describe issues such as response variance, non-response rate and testing for bias, interviewer and response bias, confidence levels, question bias, or similar.
Notes: General notes about the dataset should be entered using the following fields. These notes are not presented as part of the dataset Header Box and therefore can be more detailed than the Copyright and Ethics Note information field discussed earlier.
- Type: Type of Note. This could be an "Information Note" entered to support a secondary user in understanding the dataset.
- Subject: Note Subject. A general topic or subject that the note is about. This could be "Instructions for Use" for example.
- Text: Free text field for the detailed entry of the note information.