Frequently Asked Questions

Why is Datamaran different?

Datamaran is the only analytics platform to analyse unstructured data on regulatory, competitive, and reputational risks related to the latest economic, environmental, social, and corporate governance (ESG) issues – all in one view. We’ve combined our diverse expertise – legal, financial, investment, ESG, and data science – with Natural Language Processing (NLP) techniques to extract unstructured data from a variety of sources (corporate reports, websites, regulations, and news), derive meaning from the narrative, and ensure that the data is accurate and complete. This is what we call “automating human expertise.” The end user is and has always been the focus of our work. That’s why we built Datamaran with a global network of end users – large-cap companies, advisory firms, and industry experts.

Who uses Datamaran?

Datamaran is a multi-user platform that delivers relevant data for business executives, their advisors, and researchers via annual web-based subscriptions.
Our main users are business executives from a variety of departments, including: public and governmental affairs; legal and compliance; sustainability; procurement; business development; marketing and communications; human resources; and investor relations.
Multiple users can immediately share insights via the platform, which is accessible from your browser at any time. Each user has their own login to easily customize their analyses on the go.

Where does the data come from?

Datamaran extracts unstructured data from a variety of publicly available sources, including:

  • Corporate annual reports
  • Corporate websites
  • News sites
  • Government websites, regulatory databases, and related sites

Our database moves at the speed of your markets, capturing the latest corporate disclosure, regulatory developments, and news to deliver consistent and relevant data. These sources cover a growing universe of:

  • 7000+ companies worldwide, based on largest market capitalization
  • 4500+ regulations that impact corporate disclosure on ESG issues
  • Multiple online media channels

What does the data tell me?

Datamaran is a comprehensive business intelligence platform that informs decision-making on regulatory, reputational, and competitive risks. It enables:

  • Market and competitor analyses
  • Supply chain monitoring
  • Stakeholder engagement
  • Media sentiment analysis
  • Regulation tracking
  • Collaborative decision-making

How does the technology work?

We have built a system to process thousands of company reports, using NLP to determine how much a report discusses various non-financial topics that could potentially be material. This list of topics is based on current reporting frameworks, regulations, and the news, and is continually updated as new issues arise.
Our platform allows users to look in detail at the regulations dealing with these issues and at mentions in news media and on Twitter, to provide a broader picture around each issue: what types of disclosure might be required, and how often it is being mentioned alongside particular companies in the news.
This data is updated daily so that companies can continuously monitor non-financial issues that are relevant to them and benchmark themselves against peer companies in their sector. This means that corporate decision-makers can ensure that their strategy is based on current, forward-looking and objective data.
Using the storage capacity, processing speed, and power of computers to complement human intelligence is fundamental to the rise of computing as a field and the so-called fourth industrial revolution. As its benefits come to be understood by a wider audience, using data to aid decision-making is becoming ubiquitous across all business sectors.
The chances are that it is not a question of if, but of how, data can be leveraged to help your business do what it does better.

How does Datamaran automate human expertise?

Our team of legal, industry, and ESG experts worked closely with our data scientists on developing and refining the analytic process behind Datamaran – a process we call “automating human expertise.” This process is guided by our proprietary ontology, which is a dictionary of topics and related key terms that cover economic, environmental, social, and corporate governance issues; it includes a growing collection of topics and terms that reflect current and emerging trends in corporate disclosure, industry standards, regulatory initiatives, and public opinion. The engine behind Datamaran searches continuously for these topics and terms across a variety of sources – corporate reports, websites, regulations, and news – using techniques such as NLP, semantic analysis, and machine learning.

What is the Ontology?

The first exercise our team of experts went through was to build the Datamaran Ontology. The Ontology is a dictionary of topics and related key terms that the engine searches for. It consists of financial, economic, environmental, social, employment, and corporate governance topics. As an example, anti-corruption is a topic in our Ontology. To give the most complete overview, the engine also searches for a multitude of related key terms, such as “corruption” and “bribery.”

The current Ontology searches for over 100 different topics consisting of 6,000+ key terms and a combination of their related terms. Our experts built this Ontology by manually annotating a large number of sources (e.g. sustainability reports, financial reports, SEC filings, corporate websites, regulations, and social/online media) and by analyzing which topics appear in financial and sustainability reporting frameworks. Both HTML and PDF sources are analyzed by the Datamaran engine.

We ensure our ontologies are mapped against the main reporting frameworks and guidelines, including the Global Reporting Initiative, United Nations Global Compact, International Integrated Reporting Council, and Sustainability Accounting Standards Board. The engine searches for these topics, their key terms, and the relationships between terms across the above-mentioned sources, using techniques such as Natural Language Processing (NLP), semantic analysis, and machine learning.
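
As a simplified illustration only – the topics, terms, and structure below are invented for this sketch and are not Datamaran’s actual ontology or code – an ontology-driven search can be thought of as a dictionary mapping each topic to its key terms, with the engine counting how often those terms appear in a document:

```python
# Illustrative sketch of an ontology-driven term search (hypothetical data).
# An ontology maps each topic to the key terms the engine looks for.
ONTOLOGY = {
    "anti-corruption": ["anti-corruption", "corruption", "bribery"],
    "greenhouse gases": ["greenhouse gas", "ghg", "co2 emissions"],
}

def count_topic_hits(text: str) -> dict:
    """Count how often each topic's key terms appear in a document."""
    lowered = text.lower()
    return {
        topic: sum(lowered.count(term) for term in terms)
        for topic, terms in ONTOLOGY.items()
    }

report = "Our bribery and corruption policy also addresses GHG reporting."
print(count_topic_hits(report))
# {'anti-corruption': 2, 'greenhouse gases': 1}
```

The real engine layers semantic analysis and machine learning on top of this kind of term matching; the sketch shows only the dictionary-lookup idea.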

What do High, Medium and Low mean?

High, Medium and Low are the emphasis scores applied to each topic as it appears in a company’s report. The emphasis takes into account the number of times the topic is mentioned, as well as a number of other variables, including:

  • number of hits per topic
  • location of the topic (e.g. the CEO letter, or a particular section of an SEC filing)
  • number of topic mentions per sentence

For each assessment of H/M/L, grammar rules are applied. For instance, if a company states that it does not disclose information on executive compensation, that particular topic is not counted in the analysis.
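
A minimal sketch of such a grammar rule, assuming a simple regular-expression approach (Datamaran’s actual grammar engine is more sophisticated): a sentence that denies disclosure should not be counted as a mention.

```python
import re

# Hypothetical negation rule: a sentence that negates disclosure of a
# topic is excluded from the hit count.
NEGATION_PATTERN = re.compile(
    r"\b(?:does not|do not|no)\b.*\bdisclose", re.IGNORECASE
)

def is_counted(sentence: str) -> bool:
    """Return False when the sentence denies disclosure, True otherwise."""
    return NEGATION_PATTERN.search(sentence) is None

print(is_counted("We disclose executive compensation annually."))  # True
print(is_counted("We do not disclose executive compensation."))    # False
```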

For each assessment of a topic, the definition provided in the topic description is followed closely. Topics can differ in their degree of granularity: for example, the topic greenhouse gases is less granular than greenhouse gases management. Greenhouse gases covers general references to gases that cause the greenhouse effect of rising atmospheric temperatures, while greenhouse gases management covers processes, activities, and/or operations that increase or decrease GHG emissions.

Simplified summary of what H/M/L means:

  • High topics are topics that are found a high number of times in a source and/or in key sections of a source.
  • Medium topics are topics that are found a moderate number of times in a source, or rarely but in a key section.
  • Low topics are topics that appear, but only rarely, in a source.
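
Put together, a toy version of this scoring might look as follows. The thresholds and section weight here are purely hypothetical, chosen only to illustrate how overall hit counts and key-section placement can combine into a High/Medium/Low label:

```python
# Illustrative emphasis scoring (thresholds and weights are hypothetical;
# the real model uses more variables, such as mentions per sentence).
KEY_SECTION_WEIGHT = 3  # hits in key sections (e.g. the CEO letter) count more

def emphasis(hits: int, key_section_hits: int = 0) -> str:
    """Map weighted topic hits to a High/Medium/Low emphasis score."""
    score = hits + KEY_SECTION_WEIGHT * key_section_hits
    if score >= 20:
        return "High"
    if score >= 5:
        return "Medium"
    return "Low"

print(emphasis(hits=30))                     # High: many hits overall
print(emphasis(hits=2, key_section_hits=2))  # Medium: rare, but in key sections
print(emphasis(hits=3))                      # Low: appears only rarely
```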

What is NLP?

NLP stands for Natural Language Processing. It sits at the intersection of Artificial Intelligence (AI), computational linguistics and computer science. It is the process of making a computer understand the structure and the meaning of language as used by humans.

What’s the difference between machine learning, NLP and AI?

NLP addresses a set of problems within the field of AI that have to do with understanding language.
AI is the process of teaching a machine to do intelligent things – you give it rules and teach it how to play.
Machine Learning is the process of making computers learn from past experience and previous examples.

Why is it important to monitor social media?

Research from the University of Cambridge has shown that predictive models based on Facebook “likes” are able to predict someone’s personality with more accuracy than their colleagues, friends or family. The only person marginally better at predicting their personality was their spouse! This sort of predictive power can be useful to advertising companies trying to personalise ads or to insurance companies trying to predict how high or low risk you are in terms of a car insurance premium. High power predictive models usually rely on having large amounts of data on many customers, to build up a better picture of the entire personality landscape for example. This is why companies like Netflix and Amazon are able to build successful recommendation systems.

There is sometimes debate around whether bigger data is always better. If the data points you are collecting are relevant to your model and the cost of storing and processing the data is no object then the answer is definitely yes, quite simply because you will reduce the standard error of your model the more data points you sample. If you are trying to build up a picture of the distribution of something you need to sample as many data points as possible, in order to capture more granular details about the lay of the land.

Is AI Outperforming Humans?

For certain tasks, we are hearing increasing reports of trained computer programs outperforming human experts, due to their objectivity and ability to analyse data in minute detail and in huge quantities. Researchers from the Stanford University School of Medicine used machine learning to train a computer to accurately differentiate between two types of lung cancer.

They used 2,186 images of lung cancer tissues and due to the system’s pixel level specificity it was able to identify nearly 10,000 traits of the two cancer types, more than can be detected with the human eye. The system also improves on human performance because of its objectivity; detection by humans is inherently subjective and two highly skilled pathologists will often only agree 60% of the time.

Our system can certainly achieve results more quickly than a human performing the same tasks manually. When completing data analysis tasks, it is an extremely effective tool for increasing productivity – we no longer spend the majority of our time on data collection, but on data analysis and interpreting results into forward-looking decision-making.

The most powerful business intelligence tool on the market!
