2 Wikidata Data Model

Jere Odell; Mairelys Lemus-Rojas; and Lucille Brys

The data in Wikidata are built from entities whose statements map to RDF (Resource Description Framework) triples, each containing a subject, a predicate, and an object (in Wikidata terminology, an item, a property, and a value). These data can be retrieved using SPARQL (SPARQL Protocol and RDF Query Language). Entities in Wikidata are items, properties, or lexemes. For the purpose of this resource, however, we will begin by focusing on items and properties. Each has its own namespace: items are in the main namespace and properties are in the property namespace (Help:Namespaces – Wikidata, n.d.).

All items and properties are automatically assigned unique identifiers: a Q for items or a P for properties, followed by a sequential number. Identifiers allow the disambiguation of entities, which is particularly valuable in Wikidata’s multilingual environment. Statements in Wikidata consist of property-value pairs in which the value can often be another item. When these connections are made, they create a data and grammatical structure that describes an item. For example, in Table 1, the item for Lauren Berlant (Q12237573) has the property field of work (P101) with the value gender studies (Q1662673). To put it in a simple sentence, Lauren Berlant’s field of work is gender studies. The value, gender studies, is also an item with its own defined statements.
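The identifier scheme described above can be checked mechanically. The following is a minimal sketch in Python; the function name `classify` and the regular expression are illustrative assumptions rather than part of any Wikidata tooling, and lexeme identifiers are left out because this chapter focuses on items and properties.

```python
import re
from typing import Tuple

# A Q or P prefix followed by a positive integer, as described in the text.
ENTITY_ID = re.compile(r"^([QP])([1-9][0-9]*)$")
KINDS = {"Q": "item", "P": "property"}


def classify(entity_id: str) -> Tuple[str, int]:
    """Return the entity kind and numeric part of a Wikidata identifier."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not an item or property identifier: {entity_id!r}")
    prefix, number = match.groups()
    return KINDS[prefix], int(number)


print(classify("Q12237573"))  # Lauren Berlant, an item
print(classify("P101"))       # field of work, a property
```

Because the identifier carries no language, the same check works regardless of the language in which an entry's label is displayed.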

Table 1. Example of an RDF triple with labels and unique identifiers.
Item (subject) | Property (predicate) | Value (object)
Lauren Berlant | field of work | gender studies
Q12237573 | P101 | Q1662673
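The triple in Table 1 can also be written out in code. The sketch below is one possible representation, not an official client: the class name `Triple` and its methods are hypothetical, while the two URI prefixes are the standard Wikidata namespaces that the Wikidata Query Service abbreviates as `wd:` and `wdt:`.

```python
from dataclasses import dataclass

# Standard Wikidata RDF namespaces for entities (wd:) and direct claims (wdt:).
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


@dataclass(frozen=True)
class Triple:
    """One item-property-value statement, held as Wikidata identifiers."""
    item: str   # subject, e.g. "Q12237573"
    prop: str   # predicate, e.g. "P101"
    value: str  # object, e.g. "Q1662673"

    def as_ntriple(self) -> str:
        """Render the statement as one N-Triples line with full URIs."""
        return f"<{WD}{self.item}> <{WDT}{self.prop}> <{WD}{self.value}> ."

    def as_sparql_ask(self) -> str:
        """Build an ASK query that tests whether the statement exists."""
        return f"ASK {{ wd:{self.item} wdt:{self.prop} wd:{self.value} }}"


berlant = Triple("Q12237573", "P101", "Q1662673")
print(berlant.as_ntriple())
print(berlant.as_sparql_ask())
```

The ASK query string can be pasted into the Wikidata Query Service, which should answer true for as long as the statement remains on the item.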

Items are used to represent topics, concepts, or objects, and can be created by anyone. For instance, Figure 4 illustrates the Wikidata data model using Marie Curie’s entry as an example.

Labeled screen shot displaying different features of a Wikidata entry using Marie Curie as an example.
Figure 4. Wikidata’s data model representing Marie Curie’s entity, which includes a statement with qualifiers and references (w.wiki/32q).

Q7186, shown next to Marie Curie’s name, is the unique identifier assigned to this item. There is also an area for providing human-friendly descriptions and alternate names, or aliases, for the entry. While multiple items in Wikidata can share the same label or the same description, no two items can have the same combination of label and description (Lemus-Rojas & Pintscher, 2017). This restriction makes the human-readable label, description, and aliases useful for disambiguating concepts. In this figure, the subject is described as a “Polish-French physicist and chemist.”

The triple forming the statement contains Marie Curie (Q7186) as the subject, award received (P166) as the predicate, and Nobel Prize in Physics (Q38104) as the object. This statement also includes qualifiers, which provide context and scope for the claim. Here, they offer more specific information, including the point in time when the award was received, the names of the other awardees, and the monetary compensation received.

All claims can be supported by references, which facilitate the verifiability of the data. References play a critical role in Wikidata because the project allows multiple, and often conflicting, data points to coexist. Continuing with Marie Curie’s example, the first reference shown includes a reference URL (P854) for the source of information, a retrieved (P813) date recording when the information was retrieved, and the publisher (P123), language of work or name (P407), and title (P1476) of the source.
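To make the parts of this entry concrete, here is a small Python sketch of a statement with qualifiers and references, plus a check for the label-and-description rule. It is an illustration only: the `Statement` class, the `label_description_clashes` function, the `item-a` style identifiers, and the reference values are all hypothetical, and point in time (P585) is a property identifier that the text above does not name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Statement:
    """A claim plus optional qualifiers and references."""
    subject: str  # item identifier
    prop: str     # property identifier
    value: str    # item identifier or literal
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[Dict[str, str]] = field(default_factory=list)


# The award statement from Figure 4. Reference values are placeholders.
award = Statement(
    subject="Q7186",             # Marie Curie
    prop="P166",                 # award received
    value="Q38104",              # Nobel Prize in Physics
    qualifiers={"P585": "1903"}, # point in time (P585 not named in the text)
    references=[{
        "P854": "https://example.org/source",  # reference URL (hypothetical)
        "P813": "2022-05-17",                  # retrieved (hypothetical date)
    }],
)


def label_description_clashes(
    entries: List[Tuple[str, str, str]]
) -> List[Tuple[str, str]]:
    """Find pairs of entries sharing both label and description.

    Each entry is (identifier, label, description).
    """
    seen: Dict[Tuple[str, str], str] = {}
    clashes = []
    for identifier, label, description in entries:
        key = (label, description)
        if key in seen:
            clashes.append((seen[key], identifier))
        else:
            seen[key] = identifier
    return clashes


# Hypothetical entries: the first two clash; the third differs in description.
entries = [
    ("item-a", "Marie Curie", "Polish-French physicist and chemist"),
    ("item-b", "Marie Curie", "Polish-French physicist and chemist"),
    ("item-c", "Marie Curie", "ship"),
]
print(label_description_clashes(entries))
```

Keeping qualifiers and references attached to the statement, rather than to the item, mirrors how the entry page groups them under each claim.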

Properties serve as connectors between items and their values. Unlike items, which anyone can create, properties must first be proposed and are defined in an open discussion that any user with a Wikidata account can join. The proposal process establishes each property’s utility and scope (Wikidata:Property creation – Wikidata, n.d.). As a result, a property has constraints. When a property is used outside of its constraints, the statement is flagged on the Wikidata entry. For instance, the property award received (P166) (see Figure 5) requires a reference to support the claim. When no reference is added, the statement is flagged with a notice so that Wikidata contributors can address it as needed. When the discussion of a property proposal approaches consensus, the property is created by a Wikidata user with either a property creator or an administrator role. Because properties need to be reliable and the process depends on consensus, there are far fewer properties than items. In April 2022, Wikidata included over 97 million items but only 9,953 properties.

Screen shot showing a flag on a statement that violates a property constraint.
Figure 5. Example of a statement from Marie Curie’s entry using the award received (P166) property to illustrate a property constraint triggered by the lack of references to support the statement.
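The constraint shown in Figure 5 amounts to a simple rule: a statement that uses a given property must carry at least one reference. The Python sketch below imitates that rule on local data. It is not how Wikidata implements constraints; the function name, the dictionary layout, and the one-element constraint set are assumptions made for illustration.

```python
from typing import Dict, List

# Properties treated here as requiring a reference. Only P166 comes from
# the text; the real constraints are recorded on each property's own page.
REQUIRES_REFERENCE = {"P166"}


def flag_unreferenced(statements: List[Dict]) -> List[Dict]:
    """Return statements that use a reference-requiring property without one."""
    return [
        s for s in statements
        if s["property"] in REQUIRES_REFERENCE and not s.get("references")
    ]


statements = [
    # award received with no reference: should be flagged
    {"item": "Q7186", "property": "P166", "value": "Q38104", "references": []},
    # instance of human: outside the assumed constraint set
    {"item": "Q7186", "property": "P31", "value": "Q5", "references": []},
]
for s in flag_unreferenced(statements):
    print(f"{s['item']} {s['property']} {s['value']}: missing reference")
```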

In the context of this introduction to Wikidata for scholarly communication library professionals, common items include, among others, authors, works, journals, publishers, universities, academic degrees, and subjects.

Exploring Properties in Wikidata

Many of the thousands of properties available for use in Wikidata are relevant to the scholarly communication community. Properties can be searched for on the Wikidata properties page or through tools such as the Wikidata Property Explorer or WDProp. In addition, the tables below list some core properties to use when describing authors (see Table 2) and their works (see Table 3). For a more complete list, we recommend visiting the IUPUI University Library’s WikiProject page.

Properties for Authors

Table 2. List of properties for authors.
Property | Value(s)
instance of (P31) | human (Q5)
sex or gender (P21) | sex or gender a person identifies as or is publicly known as (see discussion in Chapter 4: Wikidata and Gender Equity in Publishing)
given name (P735) | given name
family name (P734) | family name
languages spoken, written or signed (P1412) | language, as appropriate
occupation (P106) | occupation information, as applicable
field of work (P101) | academic discipline
employer (P108) | employment information
educated at (P69) | education information
ORCID iD (P496) | identifier
Google Scholar author ID (P1960) | identifier
LinkedIn personal profile ID (P6634) | identifier
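Table 2 can double as a checklist when reviewing an author entry. The sketch below, which assumes an entry has already been loaded into a plain dictionary keyed by property identifier, reports which core properties are still empty. The function name and the sample entry are hypothetical; the sample is not a copy of any live Wikidata item.

```python
from typing import Dict, List

# Core author properties from Table 2 (identifier -> label).
AUTHOR_PROPERTIES: Dict[str, str] = {
    "P31": "instance of",
    "P21": "sex or gender",
    "P735": "given name",
    "P734": "family name",
    "P1412": "languages spoken, written or signed",
    "P106": "occupation",
    "P101": "field of work",
    "P108": "employer",
    "P69": "educated at",
    "P496": "ORCID iD",
    "P1960": "Google Scholar author ID",
    "P6634": "LinkedIn personal profile ID",
}


def missing_author_properties(entry: Dict[str, List[str]]) -> List[str]:
    """List the Table 2 properties with no value in the given entry."""
    return [
        f"{label} ({pid})"
        for pid, label in AUTHOR_PROPERTIES.items()
        if not entry.get(pid)
    ]


# Hypothetical, partially described author entry.
sample_entry = {"P31": ["Q5"], "P101": ["Q1662673"], "P106": []}
for name in missing_author_properties(sample_entry):
    print("missing:", name)
```

Not every property applies to every author, so a reported gap is a prompt to look for a source rather than an error.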

Properties for Scholarly Works

Table 3. List of properties for scholarly works.
Property | Value(s)
instance of (P31) |
title (P1476) | title of the work
subtitle (P1680) | subtitle of the work
author name string (P2093) | string to capture the names of authors that are not yet established
author name string (P2093), qualifier: series ordinal (P1545) | author’s position in the order of authorship
author (P50) | established authors’ names
language of work or name (P407) | language of the work
publication date (P577) | date or point in time when the work was first published or released
number of pages (P1104) | number of pages in a written work
published in (P1433) | larger work that a given work was published in, like a book, journal, or music album
DOI (P356) | identifier
PubMed ID (P698) | identifier
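Because a work can list some authors as items (P50) and others as name strings (P2093), the series ordinal (P1545) qualifier is what lets the byline be reassembled in order. The following Python sketch shows one way to do that. It assumes the ordinal is attached to both kinds of author statement and arrives as text, so it is converted to an integer before sorting; the function name and the two example names are hypothetical.

```python
from typing import List, Tuple

# Each claim is (value, series ordinal); the ordinal is kept as text here.
AuthorClaim = Tuple[str, str]


def ordered_authors(
    author_items: List[AuthorClaim],
    author_strings: List[AuthorClaim],
) -> List[Tuple[str, str]]:
    """Merge P50 and P2093 claims into one byline ordered by series ordinal."""
    combined = [(int(ordinal), "P50", value) for value, ordinal in author_items]
    combined += [
        (int(ordinal), "P2093", value) for value, ordinal in author_strings
    ]
    return [(prop, value) for _, prop, value in sorted(combined)]


# Hypothetical article: one established author and two name strings.
items = [("Q12237573", "2")]
strings = [("A. Example", "1"), ("B. Example", "10")]
for prop, value in ordered_authors(items, strings):
    print(prop, value)
```

Sorting on the integer matters: compared as text, "10" would sort before "2" and the tenth author would jump ahead of the second.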

Exploring Qualifiers in Wikidata

As the tables above suggest, statements can also include qualifiers. With qualifiers, contributors can provide more precise information about the claim being made in the statement (Help:Qualifiers – Wikidata, n.d.). The educated at (P69) statement for Lauren Berlant (Q12237573) illustrates the use of qualifiers (highlighted in green in Figure 6).

Screen shot showing how qualifiers describe a statement in Wikidata.
Figure 6. Examples of qualifiers in use for Lauren Berlant’s education statements with the educated at (P69) property and supporting references.

In this example, the qualifiers academic degree (P512), academic major (P812), and end time (P582) describe Berlant’s time as a student. This additional information provides users with a better understanding of the author’s trajectory.
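Qualifiers can be retrieved with SPARQL by going through the statement node rather than the direct claim. The sketch below builds such a query as a string; it uses the `p:`, `ps:`, and `pq:` prefixes that the Wikidata Query Service predefines, while the function name and the choice to name each variable after its property identifier are assumptions made for illustration. It does not send the query anywhere.

```python
from typing import List


def qualifier_query(item: str, prop: str, qualifiers: List[str]) -> str:
    """Build a SPARQL query returning a statement's value and qualifiers."""
    variables = " ".join(f"?{q}" for q in qualifiers)
    lines = [
        f"SELECT ?value {variables} WHERE {{",
        f"  wd:{item} p:{prop} ?statement .",    # item -> statement node
        f"  ?statement ps:{prop} ?value .",      # statement node -> main value
    ]
    for q in qualifiers:
        # Qualifiers are optional so statements without them still appear.
        lines.append(f"  OPTIONAL {{ ?statement pq:{q} ?{q} . }}")
    lines.append("}")
    return "\n".join(lines)


# Berlant's educated at (P69) statements with degree, major, and end time.
print(qualifier_query("Q12237573", "P69", ["P512", "P812", "P582"]))
```

The printed query can be pasted into the Wikidata Query Service to list each school alongside whatever qualifiers the statement carries.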

Supporting Statements with References

Most statements on Wikidata should include a reference to a source. Wikidata contributors should aim to add a reference to any statement of fact that is not common knowledge about the subject of the entry they are editing. Some statements, however, are self-referential: an ORCID iD, for example, provides a link that resolves to the author’s ORCID profile. Identifiers pointing to an external data source, like ORCID, are either correct for the entry they describe or they are not, but they do not need a reference. In contrast, adding an author’s undergraduate degree and institution to their Wikidata entry records a fact that should be sourced. In 1979, Lauren Berlant completed a bachelor’s degree at Oberlin College. A fact like this could be sourced from an author’s ORCID profile or from a CV (curriculum vitae) that the author posted to a website. In the case of Lauren Berlant, the reference is a link to a University of Chicago news story that memorializes Berlant’s life and work (see Figure 6).

References can also cite books and other sources using the property stated in (P248). When appropriate, these can be supplemented with further details, such as a quotation (P1683) or the retrieved (P813) date. If using a website as a reference, consider using the Internet Archive’s Wayback Machine to find or archive a copy of the cited page. A Wikidata reference can then be supplemented with archive URL (P1065) and archive date (P2960). In this way, the statement can be verified even if the original website content changes or moves (see the area highlighted in blue in Figure 6).
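A website reference with an archived copy can be assembled as a small record keyed by the properties named above. The sketch below is a hedged illustration: the helper names, the dictionary layout, and the example URL and dates are hypothetical, and `wayback_url` only formats an address in the Wayback Machine's timestamped pattern without checking that a capture exists.

```python
from datetime import date, datetime
from typing import Dict


def wayback_url(original_url: str, captured_at: datetime) -> str:
    """Format a Wayback Machine address for a capture at the given time."""
    return f"https://web.archive.org/web/{captured_at:%Y%m%d%H%M%S}/{original_url}"


def web_reference(
    url: str, retrieved: date, captured_at: datetime
) -> Dict[str, str]:
    """Assemble the reference properties for a web source with an archive."""
    return {
        "P854": url,                               # reference URL
        "P813": retrieved.isoformat(),             # retrieved
        "P1065": wayback_url(url, captured_at),    # archive URL
        "P2960": captured_at.date().isoformat(),   # archive date
    }


# Hypothetical source and dates, for illustration only.
ref = web_reference(
    "https://example.org/news/story",
    retrieved=date(2022, 5, 17),
    captured_at=datetime(2022, 5, 16, 12, 30, 0),
)
for prop, value in ref.items():
    print(prop, value)
```

Recording the archive date separately from the retrieved date keeps it clear when the snapshot was taken versus when the contributor consulted the source.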

Activities

  • Identify a journal article authored by a scholar from your current or former university. Use the Wikidata search bar to try to find the article and the author in Wikidata. If you are having trouble finding an article, use one of these articles with the subject of gender studies (Q1662673) retrieved from the Wikidata Query Service. What are the missing bibliographic elements (if any)?
  • Visit the author’s entry. Do any statements need a reference? Add a missing statement or reference to the article or the author’s entry.
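For the first activity, a query can help surface articles that lack a given bibliographic element. The Python sketch below builds one such query for the Wikidata Query Service. It is an assumption-laden starting point: main subject (P921) is a property this chapter does not name, the function name is hypothetical, and the query string is only printed, not executed.

```python
def missing_property_query(subject: str, missing_prop: str, limit: int = 10) -> str:
    """Build a query for works on a subject that lack the given property."""
    return "\n".join([
        "SELECT ?article ?articleLabel WHERE {",
        f"  ?article wdt:P921 wd:{subject} .",  # main subject (assumed property)
        f"  FILTER NOT EXISTS {{ ?article wdt:{missing_prop} ?any . }}",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
        f"LIMIT {limit}",
    ])


# Works about gender studies (Q1662673) with no DOI (P356) recorded.
print(missing_property_query("Q1662673", "P356"))
```

Swapping in another identifier from Table 3, such as publication date (P577), changes which gap the query looks for.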

Additional Resources

As you continue building your knowledge of Wikidata, other resources worth exploring include Introduction to Wikidata, offered by Wiki Education, which provides an overview of the knowledge base while focusing on key concepts; the article Wikidata: A free collaborative knowledgebase, which provides a perspective on the development of Wikidata and what it originally set out to accomplish; and Wikidata in Brief, a one-page quick guide to Wikidata.

References

Help:Namespaces – Wikidata. (n.d.). Wikidata. Retrieved May 17, 2022, from https://www.wikidata.org/wiki/Help:Namespaces

Help:Qualifiers – Wikidata. (n.d.). Wikidata. Retrieved May 16, 2022, from https://www.wikidata.org/wiki/Help:Qualifiers

Lemus-Rojas, M., & Pintscher, L. (2017). Wikidata and Libraries: Facilitating Open Knowledge. In M. Proffitt (Ed.), Leveraging Wikipedia: Connecting Communities of Knowledge (pp. 143–158). ALA Editions. https://scholarworks.iupui.edu/handle/1805/16690

Wikidata:Property creation – Wikidata. (n.d.). Wikidata. Retrieved May 17, 2022, from https://www.wikidata.org/wiki/Wikidata:Property_creation

License

Wikidata for Scholarly Communication Librarianship Copyright © 2022 by Jere Odell; Mairelys Lemus-Rojas; and Lucille Brys is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.

Digital Object Identifier (DOI)

https://doi.org/10.7912/1e00-ef15
