2 Wikidata Data Model
Jere Odell; Mairelys Lemus-Rojas; and Lucille Brys
The data in Wikidata are built around entities and can be expressed as RDF (Resource Description Framework) triples containing a subject, a predicate, and an object (in Wikidata terminology, an item, a property, and a value). These data can be retrieved using SPARQL (SPARQL Protocol and RDF Query Language). Entities in Wikidata are items, properties, or lexemes. For the purpose of this resource, however, we will begin by focusing on items and properties. Each has its own namespace: items in the main namespace and properties in the property namespace (Help:Namespaces – Wikidata, n.d.).
All items and properties are automatically assigned unique identifiers consisting of a letter prefix (Q for items, P for properties) followed by a sequential number. Identifiers allow the disambiguation of entities, which is particularly valuable in Wikidata’s multilingual environment. Statements in Wikidata consist of property-value pairs where the value can often be another item. When these connections are made, they create a data and grammatical structure that describes an item. For example, in Table 1, the item for Lauren Berlant (Q12237573) has the property field of work (P101) with the value gender studies (Q1662673). To put it in a simple sentence, Lauren Berlant’s field of work is gender studies. The value, gender studies, is also an item with its own defined statements.
Item (subject) | Property (predicate) | Value (object) |
Lauren Berlant | field of work | gender studies |
Q12237573 | P101 | Q1662673 |
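The item-property-value structure in Table 1 can be sketched in a few lines of Python. This is only an illustration: the `Triple` class, the `LABELS` lookup, and both helper functions are hypothetical names invented here, and the `wd:` and `wdt:` prefixes are the ones the Wikidata Query Service predefines for items and direct property values.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Wikidata statement viewed as a subject-predicate-object triple."""
    item: str   # subject, a Q identifier
    prop: str   # predicate, a P identifier
    value: str  # object, here another Q identifier


# Labels copied by hand from Table 1; a real application would look them up.
LABELS = {
    "Q12237573": "Lauren Berlant",
    "P101": "field of work",
    "Q1662673": "gender studies",
}


def to_sentence(triple: Triple) -> str:
    """Render the triple as a plain-English sentence using LABELS."""
    return f"{LABELS[triple.item]}'s {LABELS[triple.prop]} is {LABELS[triple.value]}."


def to_sparql_pattern(triple: Triple) -> str:
    """Render the triple as a SPARQL triple pattern with Wikidata prefixes."""
    return f"wd:{triple.item} wdt:{triple.prop} wd:{triple.value} ."


berlant = Triple("Q12237573", "P101", "Q1662673")
print(to_sentence(berlant))
print(to_sparql_pattern(berlant))
```

The same three identifiers produce both the human-readable sentence and the machine-readable pattern, which is the point of pairing labels with language-neutral identifiers.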
Items are used to represent topics, concepts, or objects, and can be created by anyone. For instance, Figure 4 illustrates the Wikidata data model using Marie Curie’s entry as an example.
Q7186, shown next to Marie Curie’s name, is the unique identifier assigned to this item. There is also an area for providing human-friendly descriptions and alternate names, or aliases, for the entry. While multiple items in Wikidata can share the same label or the same description, no two items can share the same combination of label and description (Lemus-Rojas & Pintscher, 2017). This restriction makes the human-readable label, description, and aliases useful when trying to disambiguate concepts. In this figure, the subject is described as a “Polish-French physicist and chemist.”
The triple forming the statement contains Marie Curie (Q7186) as the subject, award received (P166) as the predicate, and Nobel Prize in Physics (Q38104) as the object. This statement also includes qualifiers, which provide context and scope for the claim. Here, the qualifiers offer more specific information, including the point in time when the award was received, the names of the other awardees, and the monetary compensation received.
All claims can be supported by references, which facilitate the verifiability of the data. References play a critical role in Wikidata because the project allows multiple, and often conflicting, data points to coexist. Continuing with Marie Curie’s example, the first reference shown includes a reference URL (P854) for the source of information, a retrieved (P813) date recording when the information was retrieved, and the publisher (P123), language of work or name (P407), and title (P1476) of the source.
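One way to picture how a claim, its qualifiers, and its references fit together is as a small nested record. The sketch below is a simplification and not Wikidata's actual storage format: the `Statement` class is a hypothetical name, the point in time qualifier is written with the property point in time (P585) and the year 1903 from general knowledge rather than copied from the figure, and the reference URL and retrieved date are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A claim about an item, plus optional qualifiers and references."""
    item: str
    prop: str
    value: str
    # Qualifiers: property ID -> value, adding context to this one claim.
    qualifiers: dict[str, str] = field(default_factory=dict)
    # References: each reference is its own set of property-value pairs.
    references: list[dict[str, str]] = field(default_factory=list)

    def is_referenced(self) -> bool:
        """True when at least one reference supports the claim."""
        return len(self.references) > 0


nobel = Statement(
    item="Q7186",    # Marie Curie
    prop="P166",     # award received
    value="Q38104",  # Nobel Prize in Physics
    qualifiers={"P585": "1903"},  # point in time (assumed property ID)
    references=[
        {
            "P854": "https://example.org/source",  # placeholder reference URL
            "P813": "2022-05-17",                  # placeholder retrieved date
        }
    ],
)

bare_claim = Statement(item="Q7186", prop="P166", value="Q38104")

print(nobel.is_referenced())       # the claim with a reference
print(bare_claim.is_referenced())  # the same claim without one
```

Keeping qualifiers and references attached to the individual statement, rather than to the item as a whole, is what lets conflicting claims coexist with their own context and sources.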
Properties serve as connectors between items and their values. Unlike items, which anyone can create, properties are proposed to accommodate the needs of the global community and are defined in an open discussion that any user with a Wikidata account can join. The proposal process defines each property's utility and scope (Wikidata:Property creation – Wikidata, n.d.). As a result, a property will have constraints. When a property is used outside of its constraints, it will be flagged on the Wikidata entry. For instance, the property award received (P166) (see Figure 5) requires the inclusion of a reference to support the claim. When no reference is added, the statement is flagged with a notice so that Wikidata contributors can address it, as needed. When the discussion of a property proposal approaches consensus, the property is created by a Wikidata user with either a property creator or an administrator role. Given that properties need to be reliable and that the process depends on consensus, there are far fewer properties than there are items. In April 2022, Wikidata included over 97 million items, but only 9,953 properties.
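The reference requirement on award received (P166) can be imitated with a short check. This is a toy sketch under stated assumptions: Wikidata defines constraints on the property pages themselves, whereas `REQUIRES_REFERENCE` here is a hand-written, hypothetical rule table, and the statements are simplified dictionaries with a placeholder reference URL.

```python
# Hypothetical rule table: properties whose statements should carry a reference.
REQUIRES_REFERENCE = {"P166"}  # award received


def flag_unreferenced(statements: list[dict]) -> list[dict]:
    """Return the statements that break the 'needs a reference' rule."""
    return [
        s for s in statements
        if s["property"] in REQUIRES_REFERENCE and not s.get("references")
    ]


statements = [
    # Award statement with no reference: should be flagged.
    {"item": "Q7186", "property": "P166", "value": "Q38104", "references": []},
    # Same claim with a (placeholder) reference: passes.
    {"item": "Q7186", "property": "P166", "value": "Q38104",
     "references": [{"P854": "https://example.org/source"}]},
    # field of work is not in the rule table, so it is not flagged here.
    {"item": "Q12237573", "property": "P101", "value": "Q1662673",
     "references": []},
]

flagged = flag_unreferenced(statements)
for s in flagged:
    print(f"{s['item']} {s['property']} {s['value']}: reference missing")
```

As on Wikidata, the check only flags the statement; it does not block or remove it, leaving the fix to contributors.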
In the context of this introduction to Wikidata for scholarly communication library professionals, common items (among others) include: authors, works, journals, publishers, universities, academic degrees, and subjects.
Exploring Properties in Wikidata
Many of the thousands of properties available for use in Wikidata are relevant to the scholarly communication community. Properties can be searched for either on the Wikidata properties page or through tools like the Wikidata Property Explorer or WDProp. In addition, the lists below contain some core properties to use specifically when describing authors (see Table 2) and their works (see Table 3). For a more complete list, we recommend visiting the IUPUI University Library’s WikiProject page.
Properties for Authors
Property | Value(s) |
instance of (P31) | human (Q5) |
sex or gender (P21) | sex or gender a person identifies as or is publicly known as (see discussion in Chapter 4: Wikidata and Gender Equity in Publishing) |
given name (P735) | given name |
family name (P734) | family name |
languages spoken, written or signed (P1412) | language, as appropriate |
occupation (P106) | occupation information, as applicable |
field of work (P101) | academic discipline |
employer (P108) | employment information |
educated at (P69) | education information |
ORCID iD (P496) | identifier |
Google Scholar author ID (P1960) | identifier |
LinkedIn personal profile ID (P6634) | identifier |
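The author properties in Table 2 can double as a checklist when reviewing an entry. The sketch below assumes you have already collected the property IDs present on an author's item (the `present` set is made-up sample input, not data from any real entry), and `missing_author_properties` is a hypothetical helper name.

```python
# Core author properties, copied from Table 2 (property ID -> label).
AUTHOR_PROPERTIES = {
    "P31": "instance of",
    "P21": "sex or gender",
    "P735": "given name",
    "P734": "family name",
    "P1412": "languages spoken, written or signed",
    "P106": "occupation",
    "P101": "field of work",
    "P108": "employer",
    "P69": "educated at",
    "P496": "ORCID iD",
    "P1960": "Google Scholar author ID",
    "P6634": "LinkedIn personal profile ID",
}


def missing_author_properties(present: set[str]) -> list[str]:
    """List the Table 2 properties that an author's item does not yet use."""
    return [
        f"{label} ({pid})"
        for pid, label in AUTHOR_PROPERTIES.items()
        if pid not in present
    ]


# Made-up sample: an entry that has everything except the three identifiers.
present = {"P31", "P21", "P735", "P734", "P1412", "P106", "P101", "P108", "P69"}
for entry in missing_author_properties(present):
    print("Missing:", entry)
```

Not every property applies to every author, so the output is a prompt for review rather than a list of errors.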
Properties for Scholarly Works
Exploring Qualifiers in Wikidata
As seen in the lists above, many statements can also include qualifiers. With qualifiers, contributors can provide more precise information about the claim being made in the statement (Help:Qualifiers – Wikidata, n.d.). The educated at (P69) statement for Lauren Berlant (Q12237573) illustrates the use of qualifiers, which are highlighted in green in Figure 6.
In this example, the qualifiers academic degree (P512), academic major (P812), and end time (P582) describe Berlant’s time as a student. This additional information provides users with a better understanding of the author’s trajectory.
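Qualifiers can also be retrieved with SPARQL. The sketch below builds a query string for the educated at (P69) statement and its three qualifiers; it does not send the query anywhere. It assumes the prefixes that the Wikidata Query Service predefines (`p:` for the statement node, `ps:` for the statement's main value, and `pq:` for qualifiers), and `qualifier_query` is a hypothetical helper name.

```python
def qualifier_query(item: str, prop: str, qualifiers: list[str]) -> str:
    """Build a SPARQL query returning a statement's value and its qualifiers."""
    variables = " ".join(f"?{q}" for q in qualifiers)
    lines = [
        f"SELECT ?value {variables} WHERE {{",
        f"  wd:{item} p:{prop} ?statement .",   # reach the full statement node
        f"  ?statement ps:{prop} ?value .",     # the statement's main value
    ]
    for q in qualifiers:
        # OPTIONAL keeps statements that lack one of the qualifiers.
        lines.append(f"  OPTIONAL {{ ?statement pq:{q} ?{q} . }}")
    lines.append("}")
    return "\n".join(lines)


# Lauren Berlant, educated at, with academic degree, academic major, end time.
query = qualifier_query("Q12237573", "P69", ["P512", "P812", "P582"])
print(query)
```

The printed query can be pasted into the Wikidata Query Service. Going through the statement node (`p:`) rather than the direct value (`wdt:`) is what makes the qualifiers reachable.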
Supporting Statements with References
Most statements on Wikidata should include a reference to a source. Wikidata contributors should aim to add a reference to any statement of fact that is not common knowledge about the subject of the entry they are editing. Some statements, however, are self-referential: an ORCID iD, for example, provides a link that resolves to the author’s ORCID profile. Identifiers pointing to an external data source, like ORCID, are either correct for the entry they describe or they are not, but they do not need a reference. In contrast, adding an author’s undergraduate degree and institution to their Wikidata entry records a fact that should be sourced. In 1979, Lauren Berlant completed a bachelor’s degree at Oberlin College. A fact like this could be sourced from an author’s ORCID profile or from a CV (Curriculum Vitae) that the author posted to a website. In the case of Lauren Berlant, the reference is provided with a link to a University of Chicago news story that memorializes Berlant’s life and work (see Figure 6).
References can also cite books and other sources using the property stated in (P248). When appropriate, these can be supplemented with further details, such as a quotation (P1683) or the retrieved (P813) date. If using a website as a reference, consider using Internet Archive’s Wayback Machine to find or archive a copy of the cited page. A Wikidata reference can then be supplemented with archive URL (P1065) and archive date (P2960). In this way, the statement can be verified even if the original website content changes or moves (see the area highlighted in blue in Figure 6).
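A website reference with an archived copy can be sketched as a set of property-value pairs. The `build_reference` helper is a hypothetical name, and every URL and date below is a placeholder rather than the actual reference on Berlant's entry; only the property IDs come from the text above.

```python
from typing import Optional


def build_reference(
    url: str,
    retrieved: str,
    archive_url: Optional[str] = None,
    archive_date: Optional[str] = None,
) -> dict[str, str]:
    """Assemble a website reference, adding archive details when available."""
    reference = {
        "P854": url,        # reference URL
        "P813": retrieved,  # retrieved
    }
    # Record the archived copy only when both the URL and its date are known.
    if archive_url and archive_date:
        reference["P1065"] = archive_url   # archive URL
        reference["P2960"] = archive_date  # archive date
    return reference


# Placeholder values for illustration only.
archived = build_reference(
    url="https://example.org/news-story",
    retrieved="2022-05-16",
    archive_url="https://web.archive.org/web/2022/https://example.org/news-story",
    archive_date="2022-05-16",
)
plain = build_reference("https://example.org/news-story", "2022-05-16")

print(sorted(archived))
print(sorted(plain))
```

Treating the archive pair as optional mirrors practice: a reference is still valid without it, but the archived copy keeps the statement verifiable if the original page moves.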
Activities
- Identify a journal article authored by a scholar from your current or former university. Use the Wikidata search bar to try to find the article and the author in Wikidata. If you are having trouble finding an article, use one of these articles with the subject of gender studies (Q1662673) retrieved from the Wikidata Query Service. What are the missing bibliographic elements (if any)?
- Visit the author’s entry. Do any statements need a reference? Add a missing statement or reference to the article or the author’s entry.
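For the first activity, a query along the following lines could list articles about gender studies. The exact query behind the linked list is not shown in this chapter, so this is an assumed reconstruction: it uses instance of (P31) with scholarly article (Q13442814) and main subject (P921), neither of which is named above. The code only builds the request URL for the public endpoint and does not contact it.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed query: scholarly articles whose main subject is gender studies.
ARTICLE_QUERY = """
SELECT ?article ?articleLabel WHERE {
  ?article wdt:P31 wd:Q13442814 ;
           wdt:P921 wd:Q1662673 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query into a GET URL that asks for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


request_url = build_request_url(ARTICLE_QUERY)
print(request_url)
```

The query text can also be pasted directly into the Wikidata Query Service interface, which is the easier route if you only want to browse the results.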
Additional Resources
As you continue building your knowledge of Wikidata, other resources worth exploring include: Introduction to Wikidata, offered by Wiki Education, which provides an overview of the knowledge base while focusing on key concepts; the article Wikidata: A free collaborative knowledgebase, which offers a perspective on the development of Wikidata and what it originally set out to accomplish; and Wikidata in Brief, a one-page quick guide to Wikidata.
References
Help:Namespaces – Wikidata. (n.d.). Wikidata. Retrieved May 17, 2022, from https://www.wikidata.org/wiki/Help:Namespaces
Help:Qualifiers – Wikidata. (n.d.). Wikidata. Retrieved May 16, 2022, from https://www.wikidata.org/wiki/Help:Qualifiers
Lemus-Rojas, M., & Pintscher, L. (2017). Wikidata and Libraries: Facilitating Open Knowledge. In M. Proffitt (Ed.), Leveraging Wikipedia: Connecting Communities of Knowledge (pp. 143–158). ALA Editions. https://hdl.handle.net/1805/16690
Wikidata:Property creation – Wikidata. (n.d.). Wikidata. Retrieved May 17, 2022, from https://www.wikidata.org/wiki/Wikidata:Property_creation