Incorporating provenance in social knowledge collection
We are exploring the use of provenance in social knowledge collection in the context of science.
Scientific datasets need metadata descriptions that provide key documentation about what the data are and how to use them, such as the location where they were collected and the type of sensor used. Scientists prefer to define for themselves the metadata they want to specify, rather than being required to provide the fields that traditional scientific data catalogs pre-define as mandatory. Often, scientists have no incentive to invest the effort required to describe their datasets formally and fully.
We are investigating the use of semantic wikis as a platform where scientific communities can create metadata properties and converge organically on those that suit their needs. We are also investigating "organic data publishing" as a paradigm that supports data sharing as a collective activity integrated with other activities in scientific research, such as jointly formulating shared science questions and pursuing them through shared workflows for data analysis. This approach is consistent with recent trends toward making scientific software and data more open and broadly accessible across disciplines, as well as open to volunteer contributors and citizen scientists.
Key aspects of this work are giving credit to contributors through provenance and developing proactive mechanisms that encourage structure and convergence. Our organic data science approach can benefit other Semantic MediaWiki projects for social knowledge collection, particularly those focusing on big data integration and analysis.
This work is reported in the following publications:
* “Organic Data Sharing: A Novel Approach to Scientific Data Sharing.” Gil, Y.; Ratnakar, V.; and Hanson, P. In Second International Workshop on Linked Science: Tackling Big Data (LISC), held in conjunction with the International Semantic Web Conference (ISWC), Boston, MA, 2012. Available as a preprint.