5 Selected Tools for Using Wikidata
Jere Odell; Mairelys Lemus-Rojas; and Lucille Brys
Introduction
As a project that invites online users to contribute to the documentation of human knowledge, Wikidata continues to evolve. As the site grows, so does the need for programmatic ways to clean, enhance, and contribute data, as well as for mechanisms to display and browse it. Wikidata follows the FAIR guiding principles (Findable, Accessible, Interoperable, Reusable) and as such has a positive impact on the ecosystem of open knowledge. Among other things, this openness has allowed community members to develop tools and workflows for the benefit of the global Wikidata community.
People interested in scholarly communication have a unique opportunity to contribute scholarly content to Wikidata, where it can be reused by external tools and web-based applications to generate scholarly profiles. This work aligns with WikiCite’s goal of building a linked, open bibliographic database. Descriptions of scholarly works and their creators can be added manually, directly in the knowledge base, or semi-automatically with external tools and applications. A growing number of tools created by Wikimedia contributors are available to all users. Some facilitate editing, querying, and visualizing data; others expedite specific tasks in Wikidata’s interface; and some are geared toward more advanced users (Wikidata:Tools – Wikidata, n.d.).
Three main tools can currently facilitate the contribution of scholarly communication data to the site: QuickStatements, SourceMD, and Author Disambiguator. They help build biographical and bibliographic datasets more programmatically. While they can greatly reduce the time spent on each entry, human curation is still needed to ensure that concepts are accurately represented, described, and linked. A fourth tool, Scholia, is highly effective at rendering scholarly profiles from Wikidata’s data.
Tools for Contributing Data
QuickStatements
The QuickStatements tool, developed by long-time Wikimedia contributor Magnus Manske, received the Coolest Tool Award in the Editor category during the 2019 Wikidata conference. QuickStatements lets users add new content to Wikidata as well as enhance or remove existing statements.
How to use it?
Users can interact with QuickStatements in several ways, but all of them require logging in with Wikimedia credentials. By logging in, users authorize the tool to make changes in Wikidata on their behalf.
Once logged in, users can click “New Batch” to open a page for creating a batch of work. Commands can either be typed directly into the editor area or pasted in after being prepared in an external spreadsheet or text editor. Users then click “Import V1 commands” or “Import CSV commands” to import them into the tool. The next screen displays each property with its label, which offers a chance to review the data before clicking either “Run” or “Run in background” to execute the commands. The first option runs the commands from the user’s web browser, whereas the second runs them from a Wikimedia server. Running a batch in the background is advantageous because it allows users to revert batches, if needed, and to run multiple batches at a time.
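Commands in the V1 format are plain text, one command per line, with tab-separated fields: the item, a property, and a value, optionally followed by qualifier pairs and by reference (source) pairs whose property IDs begin with S instead of P. A line that starts with a minus sign removes a statement. The Python sketch below shows how such lines might be generated by a script before being pasted into the editor. The syntax summary is drawn from the tool’s Help page but is not exhaustive, and the item ID Q123456 and the example URL are placeholders.

```python
# Sketch: assembling QuickStatements V1 command lines (syntax as summarized above).


def quote(text):
    """String values are wrapped in double quotes."""
    return '"' + text + '"'


def qs_line(item, prop, value, qualifiers=(), sources=(), remove=False):
    """Return one tab-separated V1 command line.

    qualifiers and sources are sequences of (property, value) pairs.
    Source properties are written with "S" in place of "P" (P854 -> S854).
    A leading "-" turns the line into a removal command.
    """
    fields = [item, prop, value]
    for q_prop, q_value in qualifiers:
        fields += [q_prop, q_value]
    for s_prop, s_value in sources:
        fields += ["S" + s_prop[1:], s_value]
    line = "\t".join(fields)
    return "-" + line if remove else line


# Q123456 is a hypothetical item ID used only for illustration.
batch = [
    # Author name string (P2093) with its position as a series ordinal (P1545).
    qs_line("Q123456", "P2093", quote("Jane Doe"), qualifiers=[("P1545", quote("1"))]),
    # Instance of (P31) scholarly article (Q13442814), referenced with a URL (P854).
    qs_line("Q123456", "P31", "Q13442814",
            sources=[("P854", quote("https://example.org/record"))]),
    # Remove an outdated name string.
    qs_line("Q123456", "P2093", quote("J. Doe"), remove=True),
]
print("\n".join(batch))
```

The printed lines can be pasted into the editor area and imported with “Import V1 commands.”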
When a batch enhances existing entries in Wikidata, the tool checks each command against the knowledge base and skips statements that exactly match existing ones. However, if an existing statement lacks part of the data in a command, such as a reference, that missing piece is added. The tool also links to a Help page with documentation and examples to help new users become familiar with its functionality.
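The matching behavior described above can be modeled in a few lines. The sketch below is a deliberately simplified illustration, not QuickStatements’ actual code: it treats each reference as a single property-value pair (Wikidata actually groups reference snaks into reference blocks) and ignores qualifiers.

```python
def plan_command(existing, prop, value, refs=()):
    """Simplified model of how a batch command relates to an item's current data.

    existing maps (property, value) to the set of reference snaks already
    attached, each written as a (source_property, value) tuple.
    Returns one of:
      ("add", refs)           the statement is new
      ("skip", ())            the statement and all its references already exist
      ("add_refs", missing)   the statement exists; only missing references are added
    """
    key = (prop, value)
    if key not in existing:
        return ("add", tuple(refs))
    missing = tuple(r for r in refs if r not in existing[key])
    return ("add_refs", missing) if missing else ("skip", ())


# Placeholder data: one existing statement that already has a reference URL.
existing = {("P31", "Q13442814"): {("S854", "https://example.org/a")}}

print(plan_command(existing, "P31", "Q13442814", [("S854", "https://example.org/a")]))
print(plan_command(existing, "P31", "Q13442814", [("S854", "https://example.org/b")]))
print(plan_command(existing, "P577", "+2019-05-16T00:00:00Z/11"))
```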
SourceMD
With the SourceMD (Source Metadata) tool, also developed by Magnus Manske, users can add publications to Wikidata using their persistent identifiers. The tool accepts three types of identifiers: PMIDs (PubMed IDs), DOIs (Digital Object Identifiers), and PMCIDs (PubMed Central IDs).
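Before pasting a list into SourceMD, it can help to check which kind of identifier each line holds and which Wikidata property stores it: DOI (P356), PubMed ID (P698), or PMCID (P932). The hypothetical pre-check below is a sketch, not part of SourceMD. Two details are assumptions worth verifying against Wikidata’s property documentation: that DOIs are stored in upper case by convention, and that P932 values omit the “PMC” prefix. The example identifiers are illustrative only.

```python
import re

# Wikidata properties for the three identifier types SourceMD accepts.
PROPERTY_FOR = {"doi": "P356", "pmid": "P698", "pmcid": "P932"}


def classify_identifier(raw):
    """Return (kind, property, normalized_value) for one input line, or None."""
    s = raw.strip()
    # Strip a resolver prefix such as https://doi.org/ or doi: if present.
    s = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", s, flags=re.IGNORECASE)
    if re.fullmatch(r"10\.\d{4,9}/\S+", s):
        # Assumed convention: DOIs are stored in upper case on Wikidata.
        return ("doi", PROPERTY_FOR["doi"], s.upper())
    if re.fullmatch(r"PMC\d+", s, flags=re.IGNORECASE):
        # Assumption: P932 values are the digits without the "PMC" prefix.
        return ("pmcid", PROPERTY_FOR["pmcid"], s[3:])
    if re.fullmatch(r"\d+", s):
        return ("pmid", PROPERTY_FOR["pmid"], s)
    return None


lines = """10.1371/journal.pone.0012345
PMC3250341
21206514
not-an-id"""
for line in lines.splitlines():
    print(line, "->", classify_identifier(line))
```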
An enhanced version of SourceMD also exists, although it was not working at the time of writing. Its additional functions include adding authors to publications already in the knowledge base, adding metadata about researchers who have an ORCID (Open Researcher and Contributor ID) profile, creating or amending entries for papers by authors with ORCID profiles, and creating entries for books based on their ISBNs (International Standard Book Numbers).
How to use it?
Using the SourceMD tool is straightforward and does not currently require any credentials. Users enter the identifiers for the resources they want to represent in Wikidata, one per line. The tool then checks the identifiers against CrossRef, retrieves the associated metadata, and transforms it into a series of commands that can be executed with the QuickStatements tool. These commands use each property’s unique identifier (P number) followed by its value. Clicking “Open in QuickStatements” opens a QuickStatements page with the data, showing each property with its label alongside the value for the statement. QuickStatements requires authentication; once users have logged in, the “Run” and “Run in background” buttons become available at the bottom of the page.
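To illustrate the kind of output SourceMD produces, the sketch below turns a CrossRef-style metadata record into QuickStatements commands that create a new item (CREATE) and add statements to it (LAST). It is not SourceMD’s code, and its output will differ in detail. The choice of statements is an assumption based on common practice for scholarly articles in Wikidata: instance of (P31) scholarly article (Q13442814), title (P1476), DOI (P356), publication date (P577), and author name strings (P2093) with series ordinal (P1545) qualifiers. The sample record is invented.

```python
def qs_date(date_parts):
    """Convert CrossRef-style date parts, e.g. [2019, 5], to QuickStatements time syntax.

    Missing month/day are written as 00 and the precision suffix is set to
    11 (day), 10 (month), or 9 (year).
    """
    year, month, day = (list(date_parts) + [0, 0])[:3]
    precision = 11 if day else 10 if month else 9
    return f"+{year:04d}-{month:02d}-{day:02d}T00:00:00Z/{precision}"


def crossref_to_quickstatements(msg):
    """Approximate the commands for a new scholarly-article item.

    msg follows the shape of a CrossRef REST API "message" object
    (title list, author list with given/family names, issued date-parts, DOI).
    Double quotes inside titles are not escaped in this sketch.
    """
    title = msg["title"][0]
    cmds = [
        "CREATE",
        f'LAST\tLen\t"{title}"',                      # English label
        "LAST\tP31\tQ13442814",                      # instance of: scholarly article
        f'LAST\tP1476\ten:"{title}"',                 # title (monolingual text)
        f'LAST\tP356\t"{msg["DOI"].upper()}"',        # DOI
        "LAST\tP577\t" + qs_date(msg["issued"]["date-parts"][0]),  # publication date
    ]
    for n, author in enumerate(msg.get("author", []), start=1):
        name = f'{author.get("given", "")} {author.get("family", "")}'.strip()
        # Authors are recorded as name strings with their position (P1545).
        cmds.append(f'LAST\tP2093\t"{name}"\tP1545\t"{n}"')
    return cmds


# Hypothetical record; the DOI and names are placeholders.
record = {
    "DOI": "10.1234/example.5678",
    "title": ["An example article"],
    "issued": {"date-parts": [[2019, 5]]},
    "author": [{"given": "Jane", "family": "Doe"}, {"given": "Ravi", "family": "Kumar"}],
}
print("\n".join(crossref_to_quickstatements(record)))
```

Because the authors arrive as name strings rather than links to author items, they are natural candidates for the Author Disambiguator tool.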
A page with instructions on how to use the tool and screenshots illustrating the process is available for users to consult at the Wikidata SourceMD Instructions page.
Author Disambiguator
The Author Disambiguator tool was developed by Wikimedia contributor Arthur P. Smith to help link authors to their works in Wikidata. When articles are added to Wikidata by bots or in batch uploads, it is difficult to disambiguate their authors and link them to the correct items. Instead, each author’s name is often stored as text in the author name string (P2093) property, to be disambiguated and linked later. Author Disambiguator presents candidate matches based on the data currently in Wikidata; when a match is confirmed, the tool removes the author name string (P2093) value and adds the author under the author (P50) property. Any qualifiers or references on the author name string (P2093) statement are moved over to the new author (P50) statement.
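The core transformation can be sketched as a data operation on a simplified, hypothetical representation of a work’s statements (plain dictionaries rather than Wikidata’s own JSON). It only illustrates the idea of swapping P2093 for P50 while keeping qualifiers and references; it is not Author Disambiguator’s implementation. The item IDs are placeholders.

```python
import copy


def link_author(statements, name_string, author_qid):
    """Turn the matching author name string (P2093) into an author (P50) statement.

    statements is a list of dicts with "property", "value", "qualifiers",
    and "references" keys. Qualifiers (such as series ordinal P1545) and
    references are carried over unchanged. A new list is returned.
    """
    result = copy.deepcopy(statements)
    for st in result:
        if st["property"] == "P2093" and st["value"] == name_string:
            st["property"] = "P50"
            st["value"] = author_qid
            return result
    raise ValueError(f"no author name string {name_string!r} on this work")


# Hypothetical work with two unlinked authors; Q999999 stands in for Jane Doe's item.
work = [
    {"property": "P2093", "value": "Jane Doe",
     "qualifiers": {"P1545": "1"}, "references": [{"P248": "Q888888"}]},
    {"property": "P2093", "value": "Ravi Kumar",
     "qualifiers": {"P1545": "2"}, "references": []},
]
linked = link_author(work, "Jane Doe", "Q999999")
print(linked[0])
```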
How to use it?
To use the tool, users can log in with their Wikimedia credentials by clicking “Log in to your Wikimedia account to use OAuth instead of Quickstatements for updates.” The help text in the author search field suggests entering the name in the order First Last. Once a name is entered, three search options are available: “Fuzzy match,” “Wikibase search,” and “Specify name strings.” A fuzzy match finds the largest number of works to match, as it parses the full name to search for different variations. A Wikibase search looks for strings that include all parts of the name as entered, plus some additional variations, where applicable, that still include the full name. Finally, the option to specify name strings displays a text box with a number of variations of the entered name, which can be adjusted as needed. This offers more flexibility to search for all possible variants of a name.
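The idea of searching several spellings of one name can be illustrated with a small generator of variants. The rules below (the full name, first and last name only, and initials with and without periods) are hypothetical choices for this example and do not reproduce the tool’s own matching logic.

```python
def name_variants(full_name):
    """Generate common spellings of a 'First [Middle] Last' name for string searches."""
    parts = full_name.split()
    if len(parts) < 2:
        return [full_name]
    first, middles, last = parts[0], parts[1:-1], parts[-1]
    initials = [p[0] + "." for p in [first] + middles]
    variants = [
        full_name,                                             # Jane Q. Doe
        " ".join([first, last]),                               # Jane Doe
        " ".join(initials + [last]),                           # J. Q. Doe
        " ".join([i.rstrip(".") for i in initials] + [last]),  # J Q Doe
        " ".join([initials[0], last]),                         # J. Doe
    ]
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(variants))


print(name_variants("Jane Q. Doe"))
```

A list like this resembles what one might review and edit in the “Specify name strings” text box.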
After one of these options is selected, if Wikidata contains works by the author that still need to be linked, the tool displays a list of potential publications. Manual curation is needed at this point to confirm that each publication is indeed connected to the author. If you are not familiar with the author’s work, one way to check authorship is to open the publication through the identifier shown for it. The publisher’s or aggregator’s site usually provides metadata about the item, which may include the author’s affiliation (useful when connecting faculty or researchers with their works). This information can also help when choosing the correct author from the list of potential author items presented at the bottom of the page. Once the works and the author are selected, clicking “Link selected works to author” links the items, and the selected publications are edited to connect them to their author. For a more detailed explanation of the tool’s features, visit the Author Disambiguator page in Wikidata.
Tool for Visualizing Data
Scholia
Developed by Finn Årup Nielsen—an active Wikimedia contributor—Scholia is an open source web-based application that can bring together scholarly data and present them in a way that is easier for users to understand and navigate. The application makes live SPARQL queries to Wikidata which means that it renders scholarly profiles based on the most recent data available in the knowledge base. It serves as a venue for scholars and other users to explore the scholarly record available in Wikidata while also facilitating the identification of missing pieces of information.
Since Scholia renders profiles from data stored in Wikidata, any enhancements to a profile must be made directly in Wikidata. In other words, for a profile to display a person’s scholarly publications, both the publications and an entry for their author need to be in Wikidata. The works and their authors also need to be connected through the author (P50) property in each publication’s entry. The tools described earlier, QuickStatements, SourceMD, and Author Disambiguator, are excellent options for contributing bibliographic data to Wikidata and for establishing the connections between works and their authors.
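Scholia’s aspects are built from live SPARQL queries. The sketch below, which is not taken from Scholia’s source, shows two queries relevant to this point. The first lists works connected to an author through author (P50), which a publication list depends on; the second finds works that still carry only an exact author name string (P2093), that is, candidates for Author Disambiguator. Q42 is merely an example item ID. The script only builds the query text and the request URL for the Wikidata Query Service; actually sending the request needs an HTTP client, and Wikimedia asks clients to identify themselves with a descriptive User-Agent header.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def check_qid(qid):
    """Reject anything that is not an item ID such as Q42."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return qid


def works_by_author_query(author_qid):
    """Works linked to an author through author (P50), newest first."""
    return f"""SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P50 wd:{check_qid(author_qid)} .
  OPTIONAL {{ ?work wdt:P577 ?date . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?date)"""


def unlinked_works_query(name_string):
    """Works that still carry an exact author name string (P2093)."""
    # Escape backslashes and double quotes for a SPARQL string literal.
    literal = name_string.replace("\\", "\\\\").replace('"', '\\"')
    return f"""SELECT ?work ?workLabel WHERE {{
  ?work wdt:P2093 "{literal}" .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def query_url(sparql):
    """GET URL returning SPARQL JSON results from the Wikidata Query Service."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


print(works_by_author_query("Q42"))
print(query_url(unlinked_works_query("Jane Doe")))
```

Either query can also be pasted into the Wikidata Query Service interface to inspect the results directly.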
The value that Scholia brings to the ecosystem of open knowledge is evident in the adoption of its template across Wikimedia projects. The Scholia template points users to the profile that the application generates for a given subject (see Figure 8), providing an opportunity to explore that subject’s scholarly output.
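Like the template, a script or web page can link to Scholia profiles by building URLs from Wikidata item IDs. The sketch below assumes Scholia’s URL pattern of a profile type followed by the item ID on the scholia.toolforge.org host, and that a bare item ID lets Scholia choose a suitable profile; these paths are assumptions to check against the live site.

```python
SCHOLIA_BASE = "https://scholia.toolforge.org"

# Profile types mentioned in this chapter; the path names are assumptions.
PROFILE_TYPES = {"author", "work", "topic", "organization", "venue", "award", "event"}


def scholia_url(qid, profile=None):
    """Link to a Scholia profile for a Wikidata item.

    With no profile type, the bare item-ID path is used and Scholia is
    assumed to pick a suitable profile itself.
    """
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if profile is None:
        return f"{SCHOLIA_BASE}/{qid}"
    if profile not in PROFILE_TYPES:
        raise ValueError(f"unknown profile type: {profile!r}")
    return f"{SCHOLIA_BASE}/{profile}/{qid}"


print(scholia_url("Q42", "author"))
```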
How can a scholarly communication program use Scholia?
Scholia is a powerful tool for interacting with bibliographic data stored in Wikidata. In the context of scholarly communication, Scholia facilitates analysis, comparisons, and interpretations with the rendering of scholarly profiles. It also identifies areas where the data can be improved for a particular subject and provides links out to the tools that can be used to facilitate the contributions.
The tool has benefited from the support of other Wikimedia contributors who actively report bugs and suggest enhancements to improve the discovery and rendering of scholarly profiles. Currently, Scholia can render profiles for researchers, topics, publications, organizations, awards, events, and other concepts. A search for a researcher brings up the information Wikidata holds about the person, as well as a snippet of their Wikipedia article, if one exists. A researcher profile includes, for instance, the following aspects:
- List of publications (number of publications per year, number of pages per year)
- Topics (topic scores, topics of authored works, topics-works matrix)
- Venue statistics
- Review statistics
- Co-author graph
- Co-author map
- Other locations (includes associated locations)
- Timeline (includes information on education, employment, first published work, latest published work, most cited work)
- Academic tree
- Citation statistics (most cited works, citations by year, citing authors)
- Associated images
- Events
Every aspect is queried live, providing the most recent information available in Wikidata. While each aspect has a default way of presenting the data, most can be changed within the tool, much as one would when running a query in the Wikidata Query Service. For instance, the Citations by year aspect, which by default presents the data as a bar chart, can be switched to any of the other available views (table, graph builder, line chart, scatter chart, area chart, bubble chart, tree map, dimensions), as appropriate. Similarly, some aspects can be edited to change their results, and the data can be downloaded for further analysis or embedded in another application.
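To give a sense of what an aspect such as the number of publications per year computes, and of what can be done with downloaded query results, the sketch below tallies publications by year from results in the standard SPARQL JSON format returned by the Wikidata Query Service. It is an illustration, not Scholia’s implementation; the sample data and item IDs are invented, and the year parsing assumes CE dates.

```python
from collections import Counter


def publications_per_year(sparql_json):
    """Count works per publication year in Wikidata Query Service JSON results.

    Expects ?work and ?date bindings (for example from a P577 query).
    Rows without a date are skipped, and a work is counted at most once per year.
    """
    counts = Counter()
    seen = set()
    for row in sparql_json["results"]["bindings"]:
        if "date" not in row:
            continue
        year = int(row["date"]["value"][:4])  # values look like 2019-05-16T00:00:00Z
        key = (row["work"]["value"], year)
        if key not in seen:
            seen.add(key)
            counts[year] += 1
    return dict(sorted(counts.items()))


def uri(value):
    return {"type": "uri", "value": value}


def literal(value):
    return {"type": "literal", "value": value}


# Invented sample in the W3C SPARQL JSON results format (placeholder item IDs).
sample = {"results": {"bindings": [
    {"work": uri("http://www.wikidata.org/entity/Q1001"), "date": literal("2019-05-16T00:00:00Z")},
    {"work": uri("http://www.wikidata.org/entity/Q1002"), "date": literal("2019-11-01T00:00:00Z")},
    {"work": uri("http://www.wikidata.org/entity/Q1003"), "date": literal("2021-01-01T00:00:00Z")},
    {"work": uri("http://www.wikidata.org/entity/Q1004")},
]}}
print(publications_per_year(sample))
```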
Profiles rendered for individual publications provide a different set of aspects that help users better understand the impact of the works. These include a List of authors, Topic scores, Timeline, Related works (related works from co-citation analysis, related works from knowledge graph embedding), Citations (citations to the work, cited works, authors of cited works, citation graph, citations per year), Wikipedia mentions, and Supports the following statement(s).
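The idea behind related works from co-citation analysis is that two works are related when other works cite them together. A minimal sketch of that counting is shown below, assuming the citation links have already been retrieved from the cites work (P2860) property; Scholia computes its related works through its own queries and may score relatedness differently. The identifiers are placeholders.

```python
from collections import Counter


def cocited_with(target, cites):
    """Rank works by how many citing works cite them together with target.

    cites maps each citing work to the set of works it cites (its P2860 values).
    Returns (work, count) pairs, most frequently co-cited first.
    """
    scores = Counter()
    for cited in cites.values():
        if target in cited:
            for other in cited:
                if other != target:
                    scores[other] += 1
    return scores.most_common()


# Placeholder identifiers: W1 to W3 each cite some of X, Y, and Z.
cites = {"W1": {"X", "Y", "Z"}, "W2": {"X", "Y"}, "W3": {"Y", "Z"}}
print(cocited_with("X", cites))  # [('Y', 2), ('Z', 1)]
```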
Other Useful Tools Worth Exploring
OpenRefine
OpenRefine (formerly called Google Refine), a tool for cleaning up data, is actively used by the Wikimedia community because it provides a Wikidata reconciliation service. It can also contribute data to Wikidata directly, or users can prepare a Wikidata schema and then upload the resulting edits with an external tool. Documentation for this service can be found in the OpenRefine user manual.
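OpenRefine handles reconciliation through its interface, but the underlying protocol, the Reconciliation Service API, is simple JSON. The sketch below builds a batch of queries for person names in that format, restricted to humans (Q5) and optionally hinted with property values such as employer (P108). The endpoint such a batch would be sent to (assumed here to be the public Wikidata service at https://wikidata.reconci.link/en/api), the placeholder employer item, and the parameter values are assumptions for illustration, and no request is sent.

```python
import json


def reconciliation_queries(names, type_qid="Q5", limit=3, properties=()):
    """Build the batch of queries sent to a reconciliation service.

    Each name becomes one query keyed q0, q1, and so on; type_qid restricts
    matches (Q5 is "human"), and properties are optional (pid, value) hints.
    """
    return {
        f"q{i}": {
            "query": name,
            "type": type_qid,
            "limit": limit,
            "properties": [{"pid": pid, "v": v} for pid, v in properties],
        }
        for i, name in enumerate(names)
    }


# P108 (employer) with a placeholder university item as a matching hint.
payload = reconciliation_queries(["Jane Doe", "Ravi Kumar"], properties=[("P108", "Q123456")])
form_data = {"queries": json.dumps(payload)}  # sent as a form field in a POST request
print(form_data["queries"])
```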
Zotero’s Utilities
Wikidata Translator
The Wikidata Translator serves two purposes: reading publication data from Wikidata so that it can be pulled into the user’s Zotero library, and working with the QuickStatements tool to create Wikidata entries for publications stored in the user’s Zotero library. More information can be found on the Wikidata Zotero page.
Cita
Cita is a Zotero add-on that provides citation support. It uses data from the cites work (P2860) property in Wikidata and also allows users to contribute citation data back to the knowledge base. More information can be found on the Wikidata Zotero/Cita page.
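Citation data of this kind is stored as cites work (P2860) statements on each publication’s item. As a rough illustration of where that data lives, the sketch below reads P2860 values from Wikidata’s entity JSON, as served by Special:EntityData. The sample is a trimmed, invented fragment, and this is not how Cita itself is implemented.

```python
def cited_works(entity_json, qid):
    """Return the item IDs a work cites through cites work (P2860).

    entity_json follows the layout of Wikidata's entity JSON
    (for example Special:EntityData/<QID>.json). Statements with
    "unknown value" or "no value" have no item ID and are skipped.
    """
    claims = entity_json["entities"][qid].get("claims", {})
    return [
        claim["mainsnak"]["datavalue"]["value"]["id"]
        for claim in claims.get("P2860", [])
        if claim["mainsnak"].get("snaktype") == "value"
    ]


# Trimmed, invented example of the entity JSON structure.
example = {"entities": {"Q1001": {"claims": {"P2860": [
    {"mainsnak": {"snaktype": "value", "property": "P2860",
                  "datavalue": {"type": "wikibase-entityid",
                                "value": {"entity-type": "item", "id": "Q2002"}}}},
    {"mainsnak": {"snaktype": "somevalue", "property": "P2860"}},
]}}}}
print(cited_works(example, "Q1001"))  # ['Q2002']
```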
Activities
Use Scholia to find an author in Wikidata from your current or former university.
- Does Scholia show a work authored by this researcher? Compare the Scholia record to another profile for the author (e.g., Google Scholar, ORCID, a personal webpage, a posted CV).
- If the author’s profile is missing an article that includes an identifier (DOI or PMID), use the SourceMD tool to add this article to Wikidata.
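For the second activity, comparing lists of DOIs by hand is error-prone because the same DOI can be written with different capitalization or with a resolver prefix, and DOIs are case-insensitive. The small helper sketched below assumes you have already copied the DOIs from the external profile and from Scholia’s publication list; it prints the missing ones one per line, the format SourceMD expects. The DOIs shown are placeholders.

```python
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/",
                     "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi):
    """Strip a resolver prefix and upper-case the DOI (DOIs are case-insensitive)."""
    d = doi.strip()
    for prefix in RESOLVER_PREFIXES:
        if d.lower().startswith(prefix):
            d = d[len(prefix):]
            break
    return d.upper()


def missing_from_wikidata(profile_dois, wikidata_dois):
    """DOIs listed on another profile (e.g. ORCID or a CV) but absent from Wikidata."""
    have = {normalize_doi(d) for d in wikidata_dois}
    return sorted({normalize_doi(d) for d in profile_dois} - have)


# Placeholder DOIs: one is already in Wikidata under different casing.
cv = ["https://doi.org/10.1234/abc.1", "10.1234/ABC.2"]
wikidata = ["10.1234/ABC.1"]
print("\n".join(missing_from_wikidata(cv, wikidata)))  # one per line, ready for SourceMD
```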
Additional Resources
To better understand how Scholia can be used to generate profiles and how it interacts with external tools to improve Wikidata’s data, consider watching the presentation “Curating bibliographic metadata of scholarly publications: demo of Wikidata-based Scholia workflows.” In addition, an OpenRefine tutorial, the book chapter “Structuring Bibliographic References: Taking the Journal Anais do Museu Paulista to Wikidata” (which focuses on the use of the Zotero translator), and a Cita presentation workshop all provide additional information on how these tools are used.
References
Wikidata:Tools – Wikidata (n.d.). Wikidata. Retrieved May 16, 2022, from https://www.wikidata.org/wiki/Wikidata:Tools