What is the difference between a catalogue and a dataset?
The term Catalogue can be defined in two ways, depending on whether it refers to 1. data or 2. metadata.
1. Data: a data Catalogue (or Catalog) is a curated collection of datasets and data services that are managed and published by an organisation or entity.
2. Metadata: a data catalogue is a curated collection of metadata about datasets that are managed and published by an organisation or entity.
A Dataset is a collection of data, published or curated by a single source, and available for access or download in one or more formats.
Within the context of the EHDS (European Health Data Space) Regulation proposal [EUR-Lex 52022PC0197 (Art.44)], access to such datasets must adhere to principles of data minimisation and purpose limitation. This ensures that only the data relevant and necessary for specific processing purposes is provided. This data can be in either anonymised or pseudonymised format depending on the feasibility of achieving the processing objectives.
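The definitions above can be illustrated with a small data model. This is only a sketch under stated assumptions: the class and field names (`Catalogue`, `Dataset`, `title`, `publisher`, `formats`) and the example values are hypothetical and are not taken from any EHDS tool or schema. It shows a catalogue holding several datasets, each published by a single source and available in at least one format.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A collection of data from a single source, in one or more formats."""
    title: str
    publisher: str           # the single source that publishes or curates it
    formats: list[str]       # hypothetical field: e.g. ["CSV"]

    def __post_init__(self) -> None:
        # The definition requires "one or more formats".
        if not self.formats:
            raise ValueError("a dataset needs at least one format")


@dataclass
class Catalogue:
    """A curated collection of datasets managed by an organisation or entity."""
    title: str
    publisher: str
    datasets: list[Dataset] = field(default_factory=list)

    def add(self, dataset: Dataset) -> None:
        # One catalogue can comprise many datasets.
        self.datasets.append(dataset)


# Illustrative values only.
catalogue = Catalogue(title="Example catalogue", publisher="Example organisation")
catalogue.add(Dataset("Registry extract", "Example organisation", ["CSV"]))
catalogue.add(Dataset("Imaging study", "Example organisation", ["DICOM", "CSV"]))
print(len(catalogue.datasets))
```

In the metadata sense of the term, the catalogue would hold descriptions of the datasets (such as the fields above) rather than the data itself.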
Related Articles
In which order should I insert datasets and catalogues?
The catalogue needs to be created first; afterwards, a dataset can be created and linked to a catalogue. The relationship between a catalogue and a dataset is “one to many”: a catalogue can comprise multiple datasets but a dataset can ...
How should 'time to access' be calculated for a dataset with multiple components that have different access times (e.g., a mix of instantly available open data and restricted data that requires approval)?
For a complex dataset with components that have different access times, the reported "time to access" should always be the longest time required. In other words, the metric must represent the total time until all requested components of the dataset ...
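The "longest time required" rule above amounts to taking the maximum over the components. A minimal sketch, assuming access times are expressed in days; the function name `time_to_access`, the component names, and the figures are hypothetical and serve only as illustration:

```python
def time_to_access(component_days: dict[str, float]) -> float:
    """Return the reported time to access: the longest component access time."""
    if not component_days:
        raise ValueError("at least one component is required")
    return max(component_days.values())


# Illustrative values: open data is available instantly,
# restricted data is assumed to need 45 days for approval.
components = {"open_data": 0, "restricted_data": 45}
print(time_to_access(components))  # prints 45
```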
Will we be provided with a dataset to test the QUANTUM quality tool, or will we use a dataset from our organization?
No, a dataset will not be provided; you need to use one from your own organisation.
Is there any difference between the online tool and the docker one?
No. Which one you use depends on your knowledge and experience.
Is the Quality Label agnostic to data type, or is it different depending on the type of dataset (tabular, image, text, genomic...)?
It is agnostic to data type, data usage, and data users.