Jisc gateway to text and data-mining – status update
« Over the past few months, we have been investigating the opportunities for a possible Jisc-delivered text and data mining (TDM) service and analysing options for how we might deliver it, initially across two existing Jisc services: CORE and Journal Archives. These two platforms already deliver significant amounts of digital content in the form of scholarly articles, and applying text mining techniques to their combined corpus could immediately open up new research opportunities and lines of enquiry.
The scoping phase of the potential service, currently referred to as the Jisc Gateway to Text and Data Mining (or JGTDM), is now complete, so this seems like a good point to take stock and to report on what we have been doing.
…
The way ahead
The goal of Jisc Gateway to Text and Data Mining is to provide a solution which supports both experienced and novice practitioners of TDM – a service that is intuitive and easy to understand, yet with capabilities sufficient to be of real value to the majority of users.
Whilst the service is still in its formative stages, a broad approach has been identified that seeks to combine three key elements:
– A mechanism allowing users to define and create their own, task-specific corpora from a ‘pool’ dataset created from CORE and Journal Archives.
– A workflow environment in which corpora can be interrogated and processed using TDM components from an available toolkit. To begin with, the range of tools will be limited, concentrating on those most widely used.
– Training materials and helpdesk assistance, as well as mechanisms to share experiences and best practices, in order to encourage and support a community of practitioners.
… »