Workflow Overview
Introduction to the Conversion & Publication Workflow
At LINCS, the process of creating Linked Open Data (LOD) is called the conversion workflow. There are various paths through this workflow, and the path you take depends on the structure and size of your dataset, your timeline, and your research goals. At the end of the workflow, the data is published and made available to the public.
LINCS has a team of experts to help your Research Team through this process.
For each step, we outline what communication you are likely to have with LINCS. Often, there will be a check-in with us at the start of a step, especially if you require personalized advice, and at the end of each step to double-check that your output is correct.
Although these steps are presented as an ordered list, this is an iterative process. Your Research Team can expect to have regular meetings with LINCS team members to discuss the conversion process and work collaboratively.
1. Export Your Existing Data
Prepare a version of the data that is easy to share and work with. Learn about exporting data.
2. Clean Your Data
Data cleaning is an important step in data management regardless of whether you are making LOD. It ensures your data is internally consistent, correctly formatted, and complete. Learn about cleaning data.
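As a rough illustration of what "internally consistent, correctly formatted, and complete" can mean in practice, the Python sketch below trims stray whitespace, normalizes dates to ISO 8601, flags missing values, and drops exact duplicates. The field names, sample records, and accepted date formats are hypothetical assumptions chosen for the example; this is not a LINCS tool or a LINCS requirement.

```python
from datetime import datetime
from typing import Optional

# Hypothetical records exported from a source database; field names are illustrative only.
RECORDS = [
    {"name": "  Margaret Laurence ", "birth_date": "18/07/1926"},
    {"name": "Margaret Laurence", "birth_date": "1926-07-18"},
    {"name": "Gabrielle Roy", "birth_date": ""},
]

# Date formats assumed to occur in this (hypothetical) source data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(value: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if missing or unparseable."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize each record, report problems, and drop exact duplicates."""
    cleaned, problems, seen = [], [], set()
    for i, rec in enumerate(records):
        name = " ".join(rec["name"].split())  # trim and collapse whitespace
        date = normalize_date(rec["birth_date"])
        if date is None:
            problems.append((i, "missing or unparseable birth_date"))
        key = (name, date)
        if key in seen:
            problems.append((i, "duplicate record"))
            continue
        seen.add(key)
        cleaned.append({"name": name, "birth_date": date})
    return cleaned, problems


cleaned, problems = clean(RECORDS)
print(cleaned)
print(problems)
```

The sketch flags problems rather than silently guessing a fix, leaving the Research Team to decide, for example, whether a missing date should be researched or left blank.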
3. Prepare Your Metadata: Name the Dataset & Create Keywords
The dataset title is the name LINCS uses to refer to a project's dataset in LINCS tools and documentation. Keywords help users explore the data. Learn about naming the dataset and creating keywords.
4. Reconcile Entities
Entity reconciliation, also called entity linking or named entity disambiguation, is the step where we add unique identifiers in the form of URIs to your data to represent each unique entity. The goal is to use the same identifier every time the same real-world thing is mentioned in your data, in other LINCS data, and, ideally, in linked data elsewhere on the web. Learn about the reconciliation process.
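The core idea of reconciliation, that every mention of the same real-world entity resolves to one URI, can be sketched in a few lines of Python. In this minimal sketch, the authority table, the variant forms, and the example.org URIs are all hypothetical placeholders rather than real identifiers, and matching is a simple exact lookup after normalization.

```python
import unicodedata
from typing import Optional

# Hypothetical authority table: normalized name -> URI. The example.org URIs are
# placeholders, not real identifiers from any authority file.
AUTHORITY = {
    "margaret laurence": "http://example.org/person/1",
    "gabrielle roy": "http://example.org/person/2",
    "emile nelligan": "http://example.org/person/3",
}

# Hypothetical variant forms (e.g., inverted names) mapped to a canonical key.
VARIANTS = {
    "laurence, margaret": "margaret laurence",
}


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def reconcile(name: str) -> Optional[str]:
    """Return the URI for a mention, or None if it needs manual review."""
    key = normalize(name)
    key = VARIANTS.get(key, key)
    return AUTHORITY.get(key)


mentions = ["Margaret Laurence", "Laurence, Margaret", "Émile  Nelligan", "Unknown Person"]
for mention in mentions:
    print(mention, "->", reconcile(mention))
```

Exact lookup is the simplest possible strategy. Real reconciliation generally also involves fuzzy matching against external authorities and human review of candidate matches, which this sketch does not attempt; mentions that return None are the ones that would need that review.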
5. Map Existing Data: Develop & Implement Conceptual Mapping
Every incoming dataset starts with its own structure and use of terms. To connect all of this data as Linked Open Data (LOD), each dataset needs to use the same ontology. In this step, LINCS develops a conceptual mapping: a set of instructions specifying how each relationship in the original data should be expressed as Resource Description Framework (RDF) triples. Learn about conceptual mapping.
Next, we implement the conceptual mapping. In this step, we convert your data from its original structure and ontology into LINCS RDF. Learn about implementing conceptual mapping.
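To make the idea of a mapping as "instructions" concrete, the Python sketch below treats a conceptual mapping as a table from source columns to predicate URIs and implements it by turning one source row into RDF triples serialized as N-Triples. The column names, the example.org predicates, and the one-column-one-predicate design are hypothetical simplifications; they are not the ontology LINCS actually maps to, and real mappings are usually richer than this.

```python
# Hypothetical conceptual mapping: source column -> predicate URI.
# The example.org predicates are placeholders, not the ontology LINCS maps to.
MAPPING = {
    "birth_date": "http://example.org/ontology/birthDate",
    "birth_place": "http://example.org/ontology/birthPlace",
}

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def row_to_ntriples(row: dict) -> list[str]:
    """Apply MAPPING to one source row, producing N-Triples lines."""
    subject = f"<{row['uri']}>"  # URI assigned during reconciliation
    triples = [f"{subject} <{RDFS_LABEL}> {literal(row['name'])} ."]
    for column, predicate in MAPPING.items():
        value = row.get(column)
        if not value:
            continue  # skip empty cells instead of emitting empty literals
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"  # reconciled entity: URI object
        else:
            obj = literal(value)  # plain value: string literal
        triples.append(f"{subject} <{predicate}> {obj} .")
    return triples


row = {
    "uri": "http://example.org/person/1",
    "name": "Margaret Laurence",
    "birth_date": "1926-07-18",
    "birth_place": "http://example.org/place/neepawa",
}
print("\n".join(row_to_ntriples(row)))
```

For simplicity, dates are emitted as plain string literals here; a production mapping would more likely use a typed literal such as `xsd:date`. Note how the reconciled place becomes a URI object, which is what lets it connect to other datasets that use the same identifier.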
6. Publish Your Data
Congratulations! You have transformed your data into Linked Open Data!
After any errors have been identified and corrected, the final version of the dataset is uploaded to the LINCS triplestore and becomes publicly accessible via ResearchSpace as published LOD. Except in limited, mutually agreed-upon circumstances, this data can be used in publications and shared with others who want to use and connect to it. Learn about setting up and publishing your data in ResearchSpace.
7. Edit Your Data & Transform More Data
After publishing your data, you can still edit it and add more data. Your Research Team can make changes directly in ResearchSpace. These changes update the version of the dataset in the LINCS triplestore, so the conversion workflow does not need to be repeated.
If your Research Team wants to add more data after the conversion process, LINCS can rerun the data conversion without repeating the consultation process, provided the new data has exactly the same structure as the initial data. If the new data has a different structure, the conversion process will need to be altered and repeated. In either case, steps like reconciliation will need to be redone for any new entities in the new data. The new data can then be merged with the existing project or made into a new, separate project. Learn about converting additional data and editing data.
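The rule of thumb above (same structure: rerun the conversion; new entities: reconcile them) can be checked mechanically before new data is handed over. This Python sketch treats "exact same structure" as identical column headers in identical order, which is a simplifying assumption; LINCS may consider more than headers, such as value formats. The CSV snippets and the `known_names` set are hypothetical.

```python
import csv
import io

# Hypothetical CSV exports; in practice these would be files from your source system.
ORIGINAL = "name,birth_date\nMargaret Laurence,1926-07-18\n"
NEW_SAME = "name,birth_date\nGabrielle Roy,1909-03-22\nMargaret Laurence,1926-07-18\n"
NEW_CHANGED = "name,birth_date,death_date\nGabrielle Roy,1909-03-22,\n"


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV string."""
    return next(csv.reader(io.StringIO(csv_text)))


def same_structure(original_csv: str, new_csv: str) -> bool:
    """Approximate 'same structure' as identical column names in identical order."""
    return read_header(original_csv) == read_header(new_csv)


def unreconciled(new_csv: str, column: str, known: set) -> set:
    """Return values in `column` that have not been reconciled yet."""
    rows = csv.DictReader(io.StringIO(new_csv))
    return {row[column] for row in rows if row[column] not in known}


known_names = {"Margaret Laurence"}
print(same_structure(ORIGINAL, NEW_SAME))     # True
print(same_structure(ORIGINAL, NEW_CHANGED))  # False
print(unreconciled(NEW_SAME, "name", known_names))  # {'Gabrielle Roy'}
```

In this example, the first new file could go through the existing conversion unchanged, with only "Gabrielle Roy" needing reconciliation, while the second file adds a column and would need the mapping revisited.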
Timelines
The time needed to complete the full conversion and publication process varies dramatically based on these factors:
- How clean is your source data?
- How many entities are in your data? How many entities need to be de-duplicated internally or reconciled externally?
- What is the structure of your source data? The TEI and natural language workflows can be faster because they are more automated and less customized than the structured and semi-structured workflows.
- How many unique types of relationships are represented in your data?
- How much time does your team have to dedicate to the process? Is this continuous or will there be pauses based on the academic calendar?
- How many projects is LINCS supporting at the same time as yours?
A small dataset being converted by an experienced team with dedicated time could get through the whole conversion process in a few weeks. However, most projects working with LINCS take between 6 and 12 months, factoring in time to learn tools, busy schedules, and consultation between the Research Team and LINCS.
We have included time estimates for each step in the conversion workflow documentation. Remember that the process does not have to be completed all at once. There is value in completing many of the steps on their own, like cleaning or reconciling your data, and gradually working towards all of the benefits of LOD.
Next Step
The process for transforming data into Linked Open Data differs by data type. Whether you are starting with structured or semi-structured data, TEI data, or natural language data, there is a customized workflow. Review the workflow that matches your needs.