Data Stewardship

The role of the Data Steward (custodian or manager) is to help improve understanding of data, discover relationships among the data, consolidate metadata, and transform data into information. The steward's main goal is to support excellent research with excellent data.

Exploratory/Exploitative Data Analysis (E/EDA)
We manage data requests through our Exploratory/Exploitative Data Analysis (E/EDA) method, which is based on the exploratory data analysis framework described in Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining. The E/EDA steps are:

  1. Problem definition
    1. define the objectives
    2. define the deliverables
    3. define roles and responsibilities
    4. assess the current situation
    5. define the timetable
    6. perform a cost/benefit analysis
  2. Data preparation
  3. Implementation of the analysis
  4. Deployment of results

Open Source + Open Archive + Open Innovation
We aim to share the benefits of openness with our partners. Whenever a project or client does not explicitly define ownership, we will select a Creative Commons license for the data. Our partners are encouraged to store their datasets with the Dataverse Network Project (DVN). DVN increases scholarly recognition, facilitates data access and analysis, and ensures long-term preservation whether or not the data are in the public domain.

Data catalogue
Our ever-growing data "collection" consists of Amadeus, Company.Info, Compustat, COMETS, COMTRADE, Corporate Affiliations, CorpWatch, Datastream, EEE-PPAT, Eurostat, NBER, NGAGNS, OECD HAN, OECD REGPAT, OECD TPF, PATSTAT, PubMed, Scopus, SDC Platinum, UNdata, USPTO, and Web of Science.

The Global Business Register (GBR) deserves special mention. GBRDirect provides a single access point to official information on companies around Europe. The information is sourced directly from the official national business registers of each European country. GBR has graciously allowed me access to its API, and I believe that its data will be pivotal in merging databases such as PATSTAT and SDC.
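
To illustrate why an official register could help with such a merge, here is a minimal Python sketch. It assumes that company names from two sources can each be resolved to one register identifier, so that records are joined on that identifier instead of on free-text names. Everything in it is hypothetical: the company names, the identifiers, the field names (`applicant`, `target`, `patent_id`, `deal_id`), the suffix list, and the in-memory `register` lookup. It does not call the GBR API, whose interface is not described here, and it does not reflect the actual schemas of PATSTAT or SDC.

```python
import re

# Hypothetical, non-exhaustive list of legal-form suffixes to strip.
LEGAL_FORMS = {"bv", "nv", "gmbh", "ag", "ltd", "plc", "sa", "inc", "corp"}


def normalize(name):
    """Lowercase, drop punctuation, and remove trailing legal-form tokens."""
    cleaned = name.lower().replace(".", "")  # "B.V." becomes "bv"
    tokens = re.sub(r"[^a-z0-9 ]", " ", cleaned).split()
    while tokens and tokens[-1] in LEGAL_FORMS:
        tokens.pop()
    return " ".join(tokens)


def link(patent_rows, deal_rows, register):
    """Join two record lists on the register ID resolved from each name."""
    deals_by_id = {}
    for row in deal_rows:
        reg_id = register.get(normalize(row["target"]))
        if reg_id is not None:
            deals_by_id.setdefault(reg_id, []).append(row)

    linked = []
    for row in patent_rows:
        reg_id = register.get(normalize(row["applicant"]))
        # Names that do not resolve (reg_id is None) match nothing.
        for deal in deals_by_id.get(reg_id, []):
            linked.append((reg_id, row["patent_id"], deal["deal_id"]))
    return linked


# Hypothetical stand-in for a register lookup: normalized name -> register ID.
register = {"acme widgets": "REG-001", "globex": "REG-002"}

patents = [
    {"patent_id": "P1", "applicant": "Acme Widgets B.V."},
    {"patent_id": "P2", "applicant": "Unknown Co"},
]
deals = [
    {"deal_id": "D1", "target": "ACME WIDGETS BV"},
    {"deal_id": "D2", "target": "Globex GmbH"},
]

matches = link(patents, deals, register)
print(matches)
```

In this sketch the two spellings of the same hypothetical company resolve to one identifier, while names absent from the lookup are simply left unmatched. Real data would need a far more careful normalization step and a review of ambiguous matches.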