scientificName
Darwin Core dataset element, REQUIRED for checklist datasets
The full scientific name, including authorship and, if applicable, year of the name. In the context of a checklist, the scientific name is the basic data element of the list of taxa or hierarchy that the dataset is intended to compile and publish (see Darwin Core Terms: A quick reference guide).

Depending on the purpose of the checklist, scientific names can be of any rank, although they are usually at species rank or below, for example in regional flora or fauna checklists, Red List rankings, or thematic inventories such as marine organisms or taxonomic revisions of species groups. If you want the checklist to publish a hierarchy (tree structure), add separate entries for the relevant higher taxonomic ranks, such as kingdom, class, and family, and link them into a hierarchy using parentNameUsageID (see below) to support a clear interpretation of the checklist entries.

Valid scientific names are Latin names that follow the nomenclatural rules for the respective group of taxa (e.g. botanical nomenclature). Working names («Mallomonas sp.4»), common names («fruit fly»), and names containing identification qualifiers («Anemone cf. nemorosa»), among others, are not allowed. If common names are used, they must be given in addition to scientific names using the vernacularName field (see below).

projectID
Dataset metadata EML, REQUIRED for some checklist datasets
Unique identifier of the project from which a dataset is derived. The identifier is a GUID or another identifier that is close to globally unique.

The projectID field is REQUIRED for datasets funded by GBIF-managed programs. In that case, the projectID is the ID of the funded project as given in the contract document, e.g. «IDB-AF2016-0001-REG».

title
Dataset metadata EML, REQUIRED for checklist datasets
The title under which the dataset is published on gbif.org. Recommended: a short but descriptive title that characterizes the dataset in an international context and distinguishes it from similar datasets at other institutions, for example «Four new genera and 14 new specific synonyms in Pholcidae and transfer from Pholcoides Roewer to Filistatidae (Araneae)». Not recommended: «Araneae (Part 1) part.». Among other things, the title forms part of the citation used for the dataset.

For large datasets, data load is the biggest performance bottleneck. In general, at least for large datasets, you need a 4×2×1 topology. The hardware characteristics of front-end Web servers and application servers can generally remain the same as those recommended for small and medium-sized datasets. However, because the SQL Server tier will be the bottleneck, it may limit how far you can scale out with additional front-end Web servers and application servers.

If data loading is your bottleneck, you may find that additional front-end Web servers and application servers do not improve throughput.

citation
Dataset metadata EML, HIGHLY RECOMMENDED for checklist datasets
Text that specifies how your dataset should be cited in publications that use your data. To ensure that your dataset is cited as you wish, you can state the requested citation explicitly. This text appears on the dataset page and is provided to data users along with downloads that contain records from your dataset. If no text is provided, GBIF automatically generates a citation in a standard format that includes the title of the dataset and the name of the publishing institution, as well as the date of download and a reference to gbif.org.

If a dataset belongs to an organization, it is useful, whenever possible, to check the original source of the information. If the data come from NASA, the Census Bureau, CE, or another origin, it is desirable to obtain them from the original entity with as little intermediate handling as possible. In many cases there are several sources for the same dataset, and each source may differ in the quantity or quality of its information. For example, the landslide dataset on Kaggle has 1,693 rows, whereas the same dataset from NASA (the original source is the GLC, NASA Goddard Center) has 11,033 rows.

taxonRank
Darwin Core dataset element, REQUIRED for checklist datasets
The taxonomic rank of the given scientific name (see Darwin Core Terms: A quick reference guide). The taxon rank supports interpretation of the scientific name during indexing and supports matching checklist records to the backbone taxonomy, especially for names at genus rank or higher (monomials). Although the form of higher-taxon names in some groups indicates their rank, this is not consistent between groups, or even within them, and cannot be relied on for interpretation. For the correct placement of names, an explicit taxon rank is an important criterion in addition to the information about the higher taxonomy. For practical reasons, the ranks used must be major Linnaean ranks: kingdom, phylum, class, order, family, genus, species. Latin and English terms are accepted.

license
Dataset metadata EML, REQUIRED for checklist datasets
A machine-readable statement of the rights associated with the published dataset. Use CC0 or CC BY. Note: all datasets funded under the BID and BIFA programs must be published under a Creative Commons CC0 rights waiver or a CC BY attribution license. Datasets without a valid license statement will not be accepted for publication. Machine-readable licenses enable automated data filters that give users clear guidance on the permitted use of records, thereby encouraging the use and citation of data.

Note: If the dataset is produced under a program operated by GBIF (e.g.

BID, BIFA, CESP), two additional fields are required:

publisher
Dataset metadata EML, REQUIRED for checklist datasets
Title of the institution or organization listed as the data publisher on gbif.org. The publishing body is the institution that owns or holds the dataset and is responsible for its content and maintenance. The title given should be the official title of the organization as registered with the competent authorities, as shown on its websites and, where applicable, as stated in the project contract.

contact
Dataset metadata EML, REQUIRED for checklist datasets
Contact data (minimum: name and email address) for at least one administrative contact for the dataset. Contact information will be publicly available on gbif.org. This information is needed so that communication about the dataset is possible. The administrative contact is the person or role that users and central services (GBIFS) consult on content, quality, and legal issues concerning the dataset. If no personal contact data can be provided, it is possible to provide a functional contact via a role name (e.g.

«curator») and an e-mail address (collections@myhouse.com). In that case, however, responsibility for handling incoming communications needs to be clearly defined and monitored internally.

For example, when generating a single record that analyzes sales from all branches of a company, two different datasets may contain different information about the same branch. This is a referential integrity issue: the correct data must be validated and corrected before the single record can be generated.

kingdom
Darwin Core dataset element, HIGHLY RECOMMENDED for checklist datasets
The full scientific name of the kingdom in which the scientific name is classified (see Darwin Core Terms: A quick reference guide), along with other higher taxonomy if possible. For scientific names, there are many cases where the match between a given name and the backbone taxonomy is uncertain or ambiguous. This is the case, for example, for homonyms (identical names that exist for different organisms, usually in different groups), newly described names that are not yet part of the existing taxonomic tree, and spelling variants (typos, hyphenation, etc.).
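The record-level rules described for scientificName, taxonRank, kingdom and parentNameUsageID can be sketched as a small pre-publication check. This is a minimal illustration under stated assumptions, not GBIF's own validation: the regular expressions are rough heuristics for working names and identification qualifiers (the "aff." qualifier is an added assumption, since the text only shows "cf."), taxonID is assumed to be the row identifier that parentNameUsageID points to, only English rank terms are checked, and the sample rows are made up for demonstration.

```python
import re

# Major Linnaean ranks (English terms only, for brevity).
MAJOR_RANKS = {"kingdom", "phylum", "class", "order", "family", "genus", "species"}

# Illustrative heuristics, not an official rule set.
QUALIFIER = re.compile(r"\b(cf|aff)\.", re.IGNORECASE)              # e.g. "Anemone cf. nemorosa"
WORKING_NAME = re.compile(r"\bspp?\.(\s*\d+)?\s*$", re.IGNORECASE)  # e.g. "Mallomonas sp.4"


def check_row(row, known_ids):
    """Return a list of problems found in one checklist row (a dict of strings)."""
    problems = []

    name = (row.get("scientificName") or "").strip()
    if not name:
        problems.append("scientificName is missing")
    elif QUALIFIER.search(name):
        problems.append("scientificName contains an identification qualifier")
    elif WORKING_NAME.search(name):
        problems.append("scientificName looks like a working name")

    rank = (row.get("taxonRank") or "").strip().lower()
    if rank not in MAJOR_RANKS:
        problems.append("taxonRank is missing or not a major Linnaean rank")

    # kingdom is highly recommended rather than required, so treat this as a warning.
    if not (row.get("kingdom") or "").strip():
        problems.append("kingdom is empty (highly recommended)")

    # A non-empty parent reference should resolve to another row in the checklist.
    parent = (row.get("parentNameUsageID") or "").strip()
    if parent and parent not in known_ids:
        problems.append("parentNameUsageID does not match any taxonID")

    return problems


# Hypothetical sample rows: a small hierarchy plus two names the text disallows.
rows = [
    {"taxonID": "1", "scientificName": "Animalia", "taxonRank": "kingdom",
     "kingdom": "Animalia", "parentNameUsageID": ""},
    {"taxonID": "2", "scientificName": "Felidae", "taxonRank": "family",
     "kingdom": "Animalia", "parentNameUsageID": "1"},
    {"taxonID": "3", "scientificName": "Puma concolor (Linnaeus, 1771)",
     "taxonRank": "species", "kingdom": "Animalia", "parentNameUsageID": "2"},
    {"taxonID": "4", "scientificName": "Anemone cf. nemorosa", "taxonRank": "species",
     "kingdom": "Plantae", "parentNameUsageID": ""},
    {"taxonID": "5", "scientificName": "Mallomonas sp.4", "taxonRank": "",
     "kingdom": "", "parentNameUsageID": "9"},
]

known_ids = {r["taxonID"] for r in rows}
report = {r["taxonID"]: check_row(r, known_ids) for r in rows}

for taxon_id, problems in report.items():
    print(taxon_id, problems or "ok")
```

Including kingdom on every row is what lets an indexer tell homonyms apart, which is why the sketch reports an empty kingdom even though the field is not strictly required. A real checklist with infraspecific names would need a wider rank list than the one assumed here.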