An enterprise may wish to offer some of the services from the business catalog in an on-demand, self-service model.

Catalogs: Defining Standardized Database

Data Catalog Vocabulary (DCAT)
W3C Recommendation 16 January 2014

Please refer to the errata, a list of issues with this document discovered after publication.

The English version of this specification is the only normative version. Non-normative translations may also be available.

© 2012-2014 W3C®. All Rights Reserved. W3C liability, trademark and document use rules apply.

Abstract
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.

Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index.

This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
The original DCAT vocabulary was developed at DERI, refined by the eGov Interest Group, and then finally standardized by the Government Linked Data (GLD) Working Group. DCAT incorporates terms from pre-existing vocabularies, where stable terms with appropriate meanings could be found, such as foaf:homepage and dct:title. Informal summary definitions of these terms are included here for convenience, while complete definitions are available in the provided authoritative references. Changes to definitions in those references, if any, will supersede the summaries given in this specification. Note that conformance to DCAT (Section 3) concerns usage of only the terms in the DCAT namespace itself, so possible changes to the external definitions will not affect conformance of DCAT implementations.

This document was published by the Government Linked Data Working Group as a Recommendation. If you wish to make comments regarding this document, please send them to the Working Group's public comments list. All comments are welcome. Please see the Working Group's implementation report.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with the W3C Patent Policy.

1. Introduction
This section is non-normative.

Data can come in many formats, ranging from spreadsheets over XML and RDF to various speciality formats. DCAT does not make any assumptions about the format of the datasets described in a catalog. Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used to express various statistics about a DCAT-described dataset if that dataset is in RDF format.

This document does not prescribe any particular method of deploying data expressed in DCAT.
DCAT is applicable in many contexts, including RDF accessible via SPARQL endpoints, embedded in HTML pages as RDFa, or serialized as, for example, RDF/XML or Turtle. The examples in this document use Turtle simply because of Turtle's readability.

2. Namespaces
The namespace for DCAT is http://www.w3.org/ns/dcat#. However, it should be noted that DCAT makes extensive use of terms from other vocabularies, in particular Dublin Core. DCAT itself defines a minimal set of classes and properties of its own. A full set of namespaces and prefixes used in this document is shown in the table below.

Prefix   Namespace
dcat     http://www.w3.org/ns/dcat#
dct      http://purl.org/dc/terms/
dctype   http://purl.org/dc/dcmitype/
foaf     http://xmlns.com/foaf/0.1/
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs     http://www.w3.org/2000/01/rdf-schema#
skos     http://www.w3.org/2004/02/skos/core#
vcard    http://www.w3.org/2006/vcard/ns#
xsd      http://www.w3.org/2001/XMLSchema#

3. Conformance
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in RFC 2119.

A data catalog conforms to DCAT if:
• It is organized into datasets and distributions.
• An RDF description of the catalog itself and its datasets and distributions is available (but the choice of RDF syntaxes, access protocols, and access policies is not mandated by this specification).
• The contents of all metadata fields that are held in the catalog, and that contain data about the catalog itself and its datasets and distributions, are included in this RDF description, expressed using the appropriate classes and properties from DCAT, except where no such class or property exists.
• All classes and properties defined in DCAT are used in a way consistent with the semantics declared in this specification.

DCAT-compliant catalogs MAY include additional non-DCAT metadata fields and additional RDF data in the catalog's RDF description.

A DCAT profile is a specification for data catalogs that adds additional constraints to DCAT. A data catalog that conforms to the profile also conforms to DCAT.
Additional constraints in a profile MAY include:
• A minimum set of required metadata fields
• Classes and properties for additional metadata fields not covered in DCAT
• Controlled vocabularies or URI sets as acceptable values for properties
• Requirements for specific access mechanisms (RDF syntaxes, protocols) to the catalog's RDF description

4. Vocabulary Overview
This section is non-normative.

DCAT is an RDF vocabulary well-suited to representing government data catalogs such as Data.gov and data.gov.uk. DCAT defines three main classes:
• dcat:Catalog represents the catalog.
• dcat:Dataset represents a dataset in a catalog.
• dcat:Distribution represents an accessible form of a dataset, for example a downloadable file, an RSS feed or a web service that provides the data.

Notice that a dataset in DCAT is defined as a 'collection of data, published or curated by a single agent, and available for access or download in one or more formats'. A dataset does not have to be available as a downloadable file. For example, a dataset that is available via an API can be defined as an instance of dcat:Dataset and the API can be defined as an instance of dcat:Distribution. DCAT itself does not define properties specific to the description of APIs. These are considered out of the scope of this version of the vocabulary. Nevertheless, they can be defined in a profile of the DCAT vocabulary.

Another important class in DCAT is dcat:CatalogRecord, which describes a dataset entry in the catalog. Notice that while dcat:Dataset represents the dataset itself, dcat:CatalogRecord represents the record that describes a dataset in the catalog. The use of dcat:CatalogRecord is optional. It is used to capture provenance information about dataset entries in a catalog. If this distinction is not necessary, then dcat:CatalogRecord can be safely ignored.

All RDF examples in this document are written in Turtle syntax.

4.1 Basic Example
This example provides a quick overview of how DCAT might be used to represent a government catalog and its datasets.
First, the catalog description:

    :catalog
        a dcat:Catalog ;
        dct:title 'Imaginary Catalog' ;
        rdfs:label 'Imaginary Catalog' ;
        foaf:homepage ;
        dct:publisher :transparency-office ;
        dct:language ;
        dcat:dataset :dataset-001 , :dataset-002 , :dataset-003 ;
        .

The publisher of the catalog has the relative URI :transparency-office. Further description of the publisher can be provided as in the following example:

    :transparency-office
        a foaf:Organization ;
        rdfs:label 'Transparency Office' ;
        .

The catalog lists each of its datasets via the dcat:dataset property. In the example above, an example dataset was mentioned with the relative URI :dataset-001. A possible description of it using DCAT is shown below:

    :dataset-001
        a dcat:Dataset ;
        dct:title 'Imaginary dataset' ;
        dcat:keyword 'accountability' , 'transparency' , 'payments' ;
        dct:issued '2011-12-05'^^xsd:date ;
        dct:modified '2011-12-05'^^xsd:date ;
        dcat:contactPoint ;
        dct:temporal ;
        dct:spatial ;
        dct:publisher :finance-ministry ;
        dct:language ;
        dct:accrualPeriodicity ;
        dcat:distribution :dataset-001-csv ;
        .

To express the frequency of update in the example above, we chose to use an instance from the Content-Oriented Guidelines developed as part of the W3C Data Cube Vocabulary efforts. Additionally, we chose to describe the spatial and temporal coverage of the example dataset using URIs from Geonames and from data.gov.uk, respectively. A contact point is also provided where comments and feedback about the dataset can be sent. Further details about the contact point, such as email address or telephone number, can be provided using VCard.

The dataset distribution :dataset-001-csv can be downloaded as a 5 KB CSV file. This information is represented via an RDF resource of type dcat:Distribution.

    :dataset-001-csv
        a dcat:Distribution ;
        dcat:downloadURL ;
        dct:title 'CSV distribution of imaginary dataset 001' ;
        dcat:mediaType 'text/csv' ;
        dcat:byteSize '5120'^^xsd:decimal ;
        .

4.2 Classifying datasets
The catalog classifies its datasets according to a set of domains represented by the relative URI :themes.
SKOS can be used to describe the domains used:

    :catalog dcat:themeTaxonomy :themes .

    :themes
        a skos:ConceptScheme ;
        skos:prefLabel 'A set of domains to classify documents' ;
        .

    :dataset-001 dcat:theme :accountability .

Notice that this dataset is classified under the domain represented by the relative URI :accountability. It is recommended to define the concept as part of the concept scheme identified by the URI :themes that was used to describe the catalog domains. An example SKOS description:

    :accountability
        a skos:Concept ;
        skos:inScheme :themes ;
        skos:prefLabel 'Accountability' ;
        .

4.3 Describing catalog records metadata
If the catalog publisher decides to keep metadata describing its records (i.e. the records containing metadata describing the datasets), dcat:CatalogRecord can be used. For example, while :dataset-001 was issued on 2011-12-05, its description on Imaginary Catalog was added on 2011-12-11. This can be represented by DCAT as in the following:

    :catalog dcat:record :record-001 .

    :record-001
        a dcat:CatalogRecord ;
        foaf:primaryTopic :dataset-001 ;
        dct:issued '2011-12-11'^^xsd:date ;
        .

4.4 A dataset available only behind some Web page
:dataset-002 is available as a CSV file. However, :dataset-002 can only be obtained through some Web page where the user needs to click some links, provide some information and check some boxes before accessing the data:

    :dataset-002
        a dcat:Dataset ;
        dcat:landingPage ;
        dcat:distribution :dataset-002-csv ;
        .

    :dataset-002-csv
        a dcat:Distribution ;
        dcat:accessURL ;
        dcat:mediaType 'text/csv' ;
        .

Notice the use of dcat:landingPage and the definition of the dcat:Distribution instance.

4.5 A dataset available as download and behind some Web page
On the other hand, :dataset-003 can be obtained through some landing page but can also be downloaded from a known URL.

    :dataset-003
        a dcat:Dataset ;
        dcat:landingPage ;
        dcat:distribution :dataset-003-csv ;
        .

    :dataset-003-csv
        a dcat:Distribution ;
        dcat:downloadURL ;
        dcat:mediaType 'text/csv' ;
        .
Notice that we used dcat:downloadURL with the downloadable distribution and that the other distribution through the landing page does not have to be defined as a separate dcat:Distribution instance.

5. Vocabulary specification
The definitions (including domain and range) of terms outside the dcat namespace are provided here only for convenience and must not be considered normative. The authoritative definitions of these terms are in the corresponding specifications for Dublin Core, FOAF, RDF Schema, SKOS, XML Schema and vCard.

5.1 Class: Catalog
The following properties are recommended for use on this class: title, description, release date, update/modification date, language, homepage, publisher, spatial/geographic, themes, license, rights, dataset, catalog record.

RDF Class: dcat:Catalog
Definition: A data catalog is a curated collection of metadata about datasets.
Usage note: Typically, a web-based data catalog is represented as a single instance of this class.

Property: title
RDF Property: dct:title
Definition: A name given to the catalog.
Range: rdfs:Literal

Property: description
RDF Property: dct:description
Definition: A free-text account of the catalog.
Range: rdfs:Literal

Property: release date
RDF Property: dct:issued
Definition: Date of formal issuance (e.g., publication) of the catalog.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype

Property: update/modification date
RDF Property: dct:modified
Definition: Most recent date on which the catalog was changed, updated or modified.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype

Property: language
RDF Property: dct:language
Definition: The language of the catalog. This refers to the language used in the textual metadata describing titles, descriptions, etc. of the datasets in the catalog.
Range: dct:LinguisticSystem. Resources defined by the Library of Congress (ISO 639-1, ISO 639-2) SHOULD be used. If an ISO 639-1 (two-letter) code is defined for the language, then its corresponding IRI SHOULD be used; if no ISO 639-1 code is defined, then the IRI corresponding to the ISO 639-2 (three-letter) code SHOULD be used.
Usage note: Multiple values can be used.
The publisher might also choose to describe the language on the dataset level (see the dataset language property).

Property: homepage
RDF Property: foaf:homepage
Definition: The homepage of the catalog.
Range: foaf:Document
Usage note: foaf:homepage is an inverse functional property (IFP), which means that it should be unique and precisely identify the catalog. This allows smushing various descriptions of the catalog when different URIs are used.

Property: publisher
RDF Property: dct:publisher
Definition: The entity responsible for making the catalog available online.
Usage note: Resources of type foaf:Agent are recommended as values for this property.

Property: spatial/geographic
RDF Property: dct:spatial
Definition: The geographical area covered by the catalog.
Range: dct:Location

Property: themes
RDF Property: dcat:themeTaxonomy
Definition: The knowledge organization system (KOS) used to classify the catalog's datasets.
Domain: dcat:Catalog
Range: skos:ConceptScheme

Property: license
RDF Property: dct:license
Definition: This links to the license document under which the catalog, and not the datasets, is made available. Even if the license of the catalog applies to all of its datasets and distributions, it should be replicated on each distribution.
Range: dct:LicenseDocument

Property: rights
RDF Property: dct:rights
Definition: This describes the rights under which the catalog, and not the datasets, can be used or reused. Even if these rights apply to all the catalog's datasets and distributions, they should be replicated on each distribution.
Range: dct:RightsStatement

Property: dataset
RDF Property: dcat:dataset
Definition: A dataset that is part of the catalog.
Sub property of: dct:hasPart
Domain: dcat:Catalog
Range: dcat:Dataset

Property: catalog record
RDF Property: dcat:record
Definition: A catalog record that is part of the catalog.
Domain: dcat:Catalog
Range: dcat:CatalogRecord

5.2 Class: Catalog record
The following properties are recommended for use on this class: title, description, listing date, update/modification date, primary topic.

RDF Class: dcat:CatalogRecord
Definition: A record in a data catalog, describing a single dataset.
Usage note: This class is optional and not all catalogs will use it. It exists for catalogs where a distinction is made between metadata about a dataset and metadata about the dataset's entry in the catalog.
For example, the publication date property of the dataset reflects the date when the information was originally made available by the publishing agency, while the publication date of the catalog record is the date when the dataset was added to the catalog. In cases where both dates differ, or where only the latter is known, the publication date should only be specified for the catalog record. Notice that the W3C PROV Ontology allows describing further provenance information such as the details of the process and the agent involved in a particular change to a dataset.

If a catalog is represented as an RDF Dataset with named graphs (as defined in SPARQL), then it is appropriate to place the description of each dataset (consisting of all RDF triples that mention the dcat:Dataset, dcat:CatalogRecord, and any of its dcat:Distributions) into a separate named graph. The name of that graph should be the IRI of the catalog record.

Property: title
RDF Property: dct:title
Definition: A name given to the record.
Range: rdfs:Literal

Property: description
RDF Property: dct:description
Definition: A free-text account of the record.
Range: rdfs:Literal

Property: listing date
RDF Property: dct:issued
Definition: The date of listing the corresponding dataset in the catalog.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype
Usage note: This indicates the date of listing the dataset in the catalog and not the publication date of the dataset itself.

Property: update/modification date
RDF Property: dct:modified
Definition: Most recent date on which the catalog entry was changed, updated or modified.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype
Usage note: This indicates the date of last change of a catalog entry, i.e. the catalog metadata description of the dataset, and not the date of the dataset itself.

Property: primary topic
RDF Property: foaf:primaryTopic
Definition: Links the catalog record to the dcat:Dataset resource described in the record.
Usage note: The foaf:primaryTopic property is functional: each catalog record can have at most one primary topic, i.e. it describes one dataset.

5.3 Class: Dataset
The following properties are recommended for use on this class: title, description, release date, update/modification date, language, publisher, frequency, identifier, spatial/geographical coverage, temporal coverage, theme/category, keyword/tag, contact point, dataset distribution, landing page.

RDF Class: dcat:Dataset
Definition: A collection of data, published or curated by a single agent, and available for access or download in one or more formats.
Sub class of: dctype:Dataset
Usage note: This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter.

Property: title
RDF Property: dct:title
Definition: A name given to the dataset.
Range: rdfs:Literal

Property: description
RDF Property: dct:description
Definition: A free-text account of the dataset.
Range: rdfs:Literal

Property: release date
RDF Property: dct:issued
Definition: Date of formal issuance (e.g., publication) of the dataset.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype
Usage note: This property should be set using the first known date of issuance.

Property: update/modification date
RDF Property: dct:modified
Definition: Most recent date on which the dataset was changed, updated or modified.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype
Usage note: The value of this property indicates a change to the actual dataset, not a change to the catalog record. An absent value may indicate that the dataset has never changed after its initial publication, or that the date of last modification is not known, or that the dataset is continuously updated.

Property: language
RDF Property: dct:language
Definition: The language of the dataset.
Range: dct:LinguisticSystem. Resources defined by the Library of Congress (ISO 639-1, ISO 639-2) SHOULD be used.
If an ISO 639-1 (two-letter) code is defined for the language, then its corresponding IRI SHOULD be used; if no ISO 639-1 code is defined, then the IRI corresponding to the ISO 639-2 (three-letter) code SHOULD be used.
Usage note:
• This overrides the value of the catalog language in case of conflict.
• If the dataset is available in multiple languages, use multiple values for this property. If each language is available separately, define an instance of dcat:Distribution for each language and describe the specific language of each distribution using dct:language (i.e. the dataset will have multiple dct:language values and each distribution will have one of these languages as the value of its dct:language property).

Property: publisher
RDF Property: dct:publisher
Definition: An entity responsible for making the dataset available.
Usage note: Resources of type foaf:Agent are recommended as values for this property.

Property: frequency
RDF Property: dct:accrualPeriodicity
Definition: The frequency at which the dataset is published.
Range: dct:Frequency (A rate at which something recurs)

Property: identifier
RDF Property: dct:identifier
Definition: A unique identifier of the dataset.
Range: rdfs:Literal
Usage note: The identifier might be used as part of the URI of the dataset, but still having it represented explicitly is useful.

Property: spatial/geographical coverage
RDF Property: dct:spatial
Definition: Spatial coverage of the dataset.
Range: dct:Location (A spatial region or named place)

Property: temporal coverage
RDF Property: dct:temporal
Definition: The temporal period that the dataset covers.
Range: dct:PeriodOfTime (An interval of time that is named or defined by its start and end dates)

Property: theme/category
RDF Property: dcat:theme
Definition: The main category of the dataset. A dataset can have multiple themes.
Sub property of: dct:subject
Domain: dcat:Dataset
Range: skos:Concept
Usage note: The set of skos:Concepts used to categorize the datasets is organized in a skos:ConceptScheme describing all the categories and their relations in the catalog.

Property: keyword/tag
RDF Property: dcat:keyword
Definition: A keyword or tag describing the dataset.
Domain: dcat:Dataset
Range: rdfs:Literal

Property: contact point
RDF Property: dcat:contactPoint
Definition: Links a dataset to relevant contact information, which is provided using VCard.
Domain: dcat:Dataset
Range: vcard:Kind

Property: dataset distribution
RDF Property: dcat:distribution
Definition: Connects a dataset to its available distributions.
Domain: dcat:Dataset
Range: dcat:Distribution

Property: landing page
RDF Property: dcat:landingPage
Definition: A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.
Sub property of: foaf:page
Domain: dcat:Dataset
Range: foaf:Document
Usage note: If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link SHOULD be duplicated as accessURL on a distribution (see Section 4.4).

5.4 Class: Distribution
The following properties are recommended for use on this class: title, description, release date, update/modification date, license, rights, access URL, download URL, byteSize, media type, format.

RDF Class: dcat:Distribution
Definition: Represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed.
Usage note: This represents a general availability of a dataset; it implies no information about the actual access method of the data, i.e. whether it is a direct download, an API, or access through some Web page. The use of the dcat:downloadURL property indicates directly downloadable distributions.

Property: title
RDF Property: dct:title
Definition: A name given to the distribution.
Range: rdfs:Literal

Property: description
RDF Property: dct:description
Definition: A free-text account of the distribution.
Range: rdfs:Literal

Property: release date
RDF Property: dct:issued
Definition: Date of formal issuance (e.g., publication) of the distribution.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype
Usage note: This property should be set using the first known date of issuance.

Property: update/modification date
RDF Property: dct:modified
Definition: Most recent date on which the distribution was changed, updated or modified.
Range: rdfs:Literal encoded using the relevant ISO 8601 Date and Time compliant string and typed using the appropriate XML Schema datatype

Property: license
RDF Property: dct:license
Definition: This links to the license document under which the distribution is made available.
Range: dct:LicenseDocument

Property: rights
RDF Property: dct:rights
Definition: Information about rights held in and over the distribution.
Range: dct:RightsStatement
Usage note: dct:license, which is a sub-property of dct:rights, can be used to link a distribution to a license document. However, dct:rights allows linking to a rights statement that can include licensing information as well as other information that supplements the license, such as attribution.

Property: access URL
RDF Property: dcat:accessURL
Definition: A landing page, feed, SPARQL endpoint or other type of resource that gives access to the distribution of the dataset.
Domain: dcat:Distribution
Range: rdfs:Resource
Usage note:
• Use accessURL, and not downloadURL, when it is definitely not a download or when you are not sure whether it is.
• If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link SHOULD be duplicated as accessURL on a distribution (see Section 4.4).

Property: download URL
RDF Property: dcat:downloadURL
Definition: A file that contains the distribution of the dataset in a given format.
Domain: dcat:Distribution
Range: rdfs:Resource
Usage note: dcat:downloadURL is a specific form of dcat:accessURL. Nevertheless, DCAT does not define dcat:downloadURL as a subproperty of dcat:accessURL, so as not to enforce this entailment: DCAT profiles may wish to impose a stronger separation where they only use accessURL for non-download locations.

Property: byteSize
RDF Property: dcat:byteSize
Definition: The size of a distribution in bytes.
Domain: dcat:Distribution
Range: rdfs:Literal typed as xsd:decimal.
Usage note: The size in bytes can be approximated when the precise size is not known.

Property: media type
RDF Property: dcat:mediaType
Definition: The media type of the distribution as defined by IANA.
Sub property of: dct:format
Domain: dcat:Distribution
Range: dct:MediaTypeOrExtent
Usage note: This property SHOULD be used when the media type of the distribution is defined by IANA; otherwise dct:format MAY be used with different values.

Property: format
RDF Property: dct:format
Definition: The file format of the distribution.
Range: dct:MediaTypeOrExtent
Usage note: dcat:mediaType SHOULD be used if the type of the distribution is defined by IANA.

5.5 Class: Concept scheme
RDF Class: skos:ConceptScheme
Definition: The knowledge organization system (KOS) used to represent themes/categories of datasets in the catalog.

5.6 Class: Concept
RDF Class: skos:Concept
Definition: A category or a theme used to describe datasets in the catalog.
Usage note: It is recommended to use either skos:inScheme or skos:topConceptOf on every skos:Concept used to classify datasets, to link it to the concept scheme it belongs to. This concept scheme is typically associated with the catalog using dcat:themeTaxonomy.

5.7 Class: Organization/Person
RDF Classes: foaf:Person for people and foaf:Organization for government agencies or other entities.
Usage note: FOAF provides sufficient properties to describe these entities.

Acknowledgements
This document contains a significant contribution from Richard Cyganiak. Richard Cyganiak is one of the initiators of the DCAT work and significantly contributed to the work on this specification as it made its way through the W3C process. The editors would like to thank Vassilios Peristeras for his comments and support for the original DCAT work. Vassilios Peristeras is also one of the initiators of the DCAT work. We would also like to thank Rufus Pollock for his significant input and comments. This document has benefited from inputs from many members of the Government Linked Data Working Group. Specific thanks are due to Ghislain Atemezing, Martin Alvarez and Makx Dekkers.

Change history
Changes since the previous version: None.

This article covers the provisioning and cataloging of new tenants in a multi-tenant sharded database model or pattern.
This article has two major parts:
• A conceptual discussion of the provisioning and cataloging of new tenants.
• A tutorial that highlights the PowerShell script code that accomplishes the provisioning and cataloging. The tutorial uses the Wingtip Tickets SaaS application, adapted to the multi-tenant sharded database pattern.

Database pattern
This section, plus a few more that follow, discusses the concepts of the multi-tenant sharded database pattern.

In this multi-tenant sharded model, the table schemas inside each database include a tenant key in the primary key of tables that store tenant data. The tenant key enables each individual database to store 0, 1, or many tenants. The use of sharded databases makes it easy for the application system to support a very large number of tenants. All the data for any one tenant is stored in one database. The tenants are distributed across the many sharded databases. A catalog database stores the mapping of each tenant to its database.

Isolation versus lower cost
A tenant that has a database all to itself enjoys the benefits of isolation. The tenant can have the database restored to an earlier date without being restricted by the impact on other tenants. Database performance can be tuned to optimize for the one tenant, again without having to compromise with other tenants. The tradeoff is that an isolated database costs more than a share of a database used by several tenants.

When a new tenant is provisioned, it can share a database with other tenants, or it can be placed into its own new database. Later you can change your mind and move the tenant to the other kind of database.

Databases with multiple tenants and databases with a single tenant can be mixed in the same SaaS application, to optimize cost or isolation for each tenant.

Tenant catalog pattern
When you have two or more databases that each contain at least one tenant, the application must have a way to discover which database stores the tenant of current interest. A catalog database stores this mapping.
Tenant key
For each tenant, the Wingtip application can derive a unique key, which is the tenant key. The app extracts the tenant name from the webpage URL. The app hashes the name to obtain the key. The app uses the key to access the catalog. The catalog cross-references information about the database in which the tenant is stored. The app uses the database info to connect. Other tenant key schemes can also be used.

Using a catalog allows the name or location of a tenant database to be changed after provisioning without disrupting the application. In a multi-tenant database model, the catalog accommodates moving a tenant between databases.

Tenant metadata beyond location
The catalog can also indicate whether a tenant is offline for maintenance or other actions. The catalog can also be extended to store additional tenant or database metadata, such as the following items:
• The service tier or edition of a database.
• The version of the database schema.
• The tenant name and its SLA (service level agreement).
• Information to enable application management, customer support, or devops processes.

The catalog can also be used to enable cross-tenant reporting, schema management, and data extract for analytics purposes.

Elastic Database Client Library
In Wingtip, the catalog is implemented in the tenantcatalog database. The tenantcatalog is created using the Shard Management features of the Elastic Database Client Library (EDCL). The library enables an application to create, manage, and use a shard map that is stored in a database. A shard map cross-references the tenant key with its shard, meaning its sharded database.

During tenant provisioning, EDCL functions can be used from applications or PowerShell scripts to create the entries in the shard map. Later the EDCL functions can be used to connect to the correct database. The EDCL caches connection information to minimize the traffic on the catalog database and speed up the process of connecting.
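The flow described above (derive a key from the tenant name, look the key up in the catalog, cache the result) can be sketched in a few lines. This is only an illustration under stated assumptions, not the Wingtip or EDCL implementation: the hash (SHA-256 truncated to 32 bits), the URL layout, and the names ShardLocation, Catalog, tenant_key and tenant_name_from_url are all hypothetical, and an in-memory dictionary stands in for the catalog database.

```python
import hashlib
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class ShardLocation:
    """Where a tenant's data lives (hypothetical record shape)."""
    server: str
    database: str


def tenant_name_from_url(url: str) -> str:
    """Assume the tenant name is the last path segment of the page URL."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def tenant_key(name: str) -> int:
    """Normalize the name, then hash it to a 32-bit key (illustrative hash choice)."""
    normalized = name.replace(" ", "").lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


@dataclass
class Catalog:
    """In-memory stand-in for the catalog database plus a client-side cache."""
    mappings: dict = field(default_factory=dict)  # tenant key -> ShardLocation
    cache: dict = field(default_factory=dict)     # cached lookups
    catalog_reads: int = 0                        # counts trips to the "catalog"

    def register(self, name: str, location: ShardLocation) -> int:
        key = tenant_key(name)
        if key in self.mappings:
            raise ValueError(f"tenant key {key} is already registered")
        self.mappings[key] = location
        return key

    def resolve(self, key: int) -> ShardLocation:
        if key in self.cache:
            return self.cache[key]        # served from cache, no catalog traffic
        self.catalog_reads += 1
        location = self.mappings[key]
        self.cache[key] = location
        return location

    def move(self, key: int, new_location: ShardLocation) -> None:
        """Repoint a tenant at another database and drop the stale cache entry."""
        self.mappings[key] = new_location
        self.cache.pop(key, None)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register("Bushwillow Blues", ShardLocation("example-server", "tenants1"))
    key = tenant_key(tenant_name_from_url("http://events.example.com/bushwillowblues"))
    print(catalog.resolve(key))
```

Because the application only ever holds the key, moving a tenant changes the catalog entry and nothing else, which is the property the catalog pattern relies on. A real cache would also need an invalidation strategy for mappings changed by other processes; the sketch ignores that.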
Important
Do not edit the data in the catalog database through direct access! Direct updates are not supported due to the high risk of data corruption. Instead, edit the mapping data by using EDCL APIs only.

Tenant provisioning pattern

Checklist
When you want to provision a new tenant into an existing shared database, you must ask the following questions about that shared database:
• Does it have enough space left for the new tenant?
• Does it have tables with the necessary reference data for the new tenant, or can the data be added?
• Does it have the appropriate variation of the base schema for the new tenant?
• Is it in the appropriate geographic location close to the new tenant?
• Is it at the right service tier for the new tenant?

When you want the new tenant to be isolated in its own database, you can create the database to meet the specifications for the tenant.

After the provisioning is complete, you must register the tenant in the catalog. Finally, the tenant mapping can be added to reference the appropriate shard.

Template database
Provision the database by executing SQL scripts, deploying a bacpac, or copying a template database. The Wingtip apps copy a template database to create new tenant databases.

Like any application, Wingtip will evolve over time. At times, Wingtip will require changes to the database. Changes may include the following items:
• New or changed schema.
• New or changed reference data.
• Routine database maintenance tasks to ensure optimal app performance.

With a SaaS application, these changes need to be deployed in a coordinated manner across a potentially massive fleet of tenant databases. For these changes to be present in future tenant databases, they need to be incorporated into the provisioning process. This challenge is explored further in a separate tutorial.

Scripts
The tenant provisioning scripts in this tutorial support both of the following scenarios:
• Provisioning a tenant into an existing database shared with other tenants.
• Provisioning a tenant into its own database.

Tenant data is then initialized and registered in the catalog shard map. In the sample app, databases that contain multiple tenants are given a generic name, such as tenants1 or tenants2. Databases that contain a single tenant are given the tenant's name. The specific naming conventions used in the sample are not a critical part of the pattern, because the use of a catalog allows any name to be assigned to the database.

Tutorial begins

In this tutorial, you learn how to:
• Provision a tenant into a multi-tenant database
• Provision a tenant into a single-tenant database
• Provision a batch of tenants into both multi-tenant and single-tenant databases
• Register a database and tenant mapping in a catalog

Prerequisites

To complete this tutorial, make sure the following prerequisites are completed:
• Azure PowerShell is installed.
• The Wingtip Tickets SaaS Multi-tenant Database app is deployed. Deployment takes less than five minutes.
• The Wingtip scripts and source code are downloaded:
  • The Wingtip Tickets SaaS Multi-tenant Database scripts and application source code are available in the GitHub repo.
  • The Wingtip scripts must be downloaded and unblocked before use.

Provision a tenant into a database shared with other tenants

In this section, you see a list of the major actions for provisioning that are taken by the PowerShell scripts. Then you use the PowerShell ISE debugger to step through the scripts to see the actions in code.

Major actions of provisioning

The following are key elements of the provisioning workflow you step through:
• Calculate the new tenant key: A hash function is used to create the tenant key from the tenant name.
• Check if the tenant key already exists: The catalog is checked to ensure the key has not already been registered.
• Initialize tenant in the default tenant database: The tenant database is updated to add the new tenant information.
• Register tenant in the catalog: The mapping between the new tenant key and the existing tenants1 database is added to the catalog.
• Add the tenant's name to a catalog extension table: The venue name is added to the Tenants table in the catalog. This addition shows how the catalog database can be extended to support additional application-specific data.
• Open Events page for the new tenant: The Bushwillow Blues events page is opened in the browser.

Debugger steps

To understand how the Wingtip app implements new tenant provisioning in a shared database, add a breakpoint and step through the workflow:
• In the PowerShell ISE, open ...\Learning Modules\ProvisionTenants\Demo-ProvisionTenants.ps1 and set the following parameters:
  • $TenantName = Bushwillow Blues, the name of a new venue.
  • $VenueType = blues, one of the pre-defined venue types: blues, classicalmusic, dance, jazz, judo, motorracing, multipurpose, opera, rockmusic, soccer (lowercase, no spaces).
  • $DemoScenario = 1, to provision a tenant in a shared database with other tenants.
• Add a breakpoint by putting your cursor anywhere on line 38, the line that says: New-Tenant `, and then press F9.
• Run the script by pressing F5.
• After script execution stops at the breakpoint, press F11 to step into the code.
• Trace the script's execution using the Debug menu options, F10 and F11, to step over or into called functions.

Provision a tenant in its own database

Major actions of provisioning

The following are key elements of the workflow you step through while tracing the script:
• Calculate the new tenant key: A hash function is used to create the tenant key from the tenant name.
• Check if the tenant key already exists: The catalog is checked to ensure the key has not already been registered.
• Create a new tenant database: The database is created by copying the basetenantdb database using a Resource Manager template.
  The new database name is based on the tenant's name.
• Add database to catalog: The new tenant database is registered as a shard in the catalog.
• Initialize tenant in the default tenant database: The tenant database is updated to add the new tenant information.
• Register tenant in the catalog: The mapping between the new tenant key and the sequoiasoccer database is added to the catalog.
• Add the tenant's name to the catalog: The venue name is added to the Tenants extension table in the catalog.
• Open Events page for the new tenant: The Sequoia Soccer events page is opened in the browser.

Debugger steps

Now walk through the script process when creating a tenant in its own database:
• Still in ...\Learning Modules\ProvisionTenants\Demo-ProvisionTenants.ps1, set the following parameters:
  • $TenantName = Sequoia Soccer, the name of a new venue.
  • $VenueType = soccer, one of the pre-defined venue types: blues, classicalmusic, dance, jazz, judo, motorracing, multipurpose, opera, rockmusic, soccer (lowercase, no spaces).
  • $DemoScenario = 2, to provision a tenant into its own database.
• Add a new breakpoint by putting your cursor anywhere on line 57, the line that says: & $PSScriptRoot\New-TenantAndDatabase `, and press F9.
• Run the script by pressing F5.
• After the script execution stops at the breakpoint, press F11 to step into the code. Use F10 and F11 to step over and step into functions to trace the execution.

Provision a batch of tenants

This exercise provisions a batch of 17 tenants. It's recommended that you provision this batch of tenants before starting other Wingtip Tickets tutorials so there are more databases to work with.
• In the PowerShell ISE, open ...\Learning Modules\ProvisionTenants\Demo-ProvisionTenants.ps1 and change the $DemoScenario parameter to 4:
  • $DemoScenario = 4, to provision a batch of tenants into a shared database.
• Press F5 and run the script.
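The major actions listed for the shared-database and own-database scenarios can be sketched in a single function. This is a rough illustration only, not the Wingtip PowerShell code: the InMemoryCatalog class, the provision_tenant function, the dictionaries standing in for databases, and the SHA-256-based key are all hypothetical, and opening the events page is left out.

```python
import hashlib
import struct
from dataclasses import dataclass, field


def tenant_key(name: str) -> int:
    """Assumed key scheme: SHA-256 of the normalized name, first 4 bytes."""
    normalized = name.replace(" ", "").lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return struct.unpack("<i", digest[:4])[0]


@dataclass
class InMemoryCatalog:
    """Hypothetical stand-in for the tenantcatalog database."""
    shards: set = field(default_factory=set)      # registered database names
    mappings: dict = field(default_factory=dict)  # tenant key -> database name
    tenants: dict = field(default_factory=dict)   # extension table: key -> name


def provision_tenant(catalog, databases, name, venue_type,
                     own_database=False, shared_db="tenants1"):
    # 1. Calculate the new tenant key.
    key = tenant_key(name)
    # 2. Check that the key has not already been registered.
    if key in catalog.mappings:
        raise ValueError(f"tenant '{name}' is already registered")
    if own_database:
        # 3a. Create a database named after the tenant (stands in for
        #     copying the template database) and add it as a shard.
        db_name = name.replace(" ", "").lower()
        databases[db_name] = {}
    else:
        # 3b. Reuse the existing shared database.
        db_name = shared_db
        databases.setdefault(db_name, {})
    catalog.shards.add(db_name)
    # 4. Initialize the tenant inside the chosen database.
    databases[db_name][key] = {"name": name, "venue_type": venue_type}
    # 5. Register the key-to-database mapping in the catalog.
    catalog.mappings[key] = db_name
    # 6. Record the tenant name in the catalog extension table.
    catalog.tenants[key] = name
    return db_name
```

A batch run then amounts to calling provision_tenant in a loop over a list of tenant names. The only difference between the two scenarios is step 3, which is why the catalog mapping, not the database name, is what the application depends on.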
Verify the deployed set of tenants

At this stage, you have a mix of tenants deployed into a shared database and tenants deployed into their own databases. The Azure portal can be used to inspect the databases created. In the Azure portal, open the tenants1-mt- server by browsing to the list of SQL servers. The SQL databases list should include the shared tenants1 database and the databases for the tenants that are in their own database.

While the Azure portal shows the tenant databases, it doesn't let you see the tenants inside the shared database. The full list of tenants can be seen in the Events Hub webpage of Wingtip, and by browsing the catalog.

Using Wingtip Tickets events hub page

Open the Events Hub page in the browser (http://events.wingtip-mt.trafficmanager.net).

Using catalog database

The full list of tenants and the corresponding database for each is available in the catalog. A SQL view is provided that joins the tenant name to the database name. The view demonstrates the value of extending the metadata that is stored in the catalog.
• The SQL view is available in the tenantcatalog database.
• The tenant name is stored in the Tenants table.
• The database name is stored in the Shard Management tables.

To browse the view:
• In SQL Server Management Studio (SSMS), connect to the tenants server at catalog-mt.database.windows.net, with Login = developer, and Password = P@ssword1
• In the SSMS Object Explorer, browse to the views in the tenantcatalog database.
• Right-click the view TenantsExtended and choose Select Top 1000 Rows. Note the mapping between tenant name and database for the different tenants.

Other provisioning patterns

This section discusses other provisioning patterns.

Pre-provisioning databases in elastic pools

The pre-provisioning pattern exploits the fact that when using elastic pools, billing is for the pool, not the databases. Thus databases can be added to an elastic pool before they are needed without incurring extra cost.
This pre-provisioning significantly reduces the time taken to provision a tenant into a database. The number of databases pre-provisioned can be adjusted as needed to keep a buffer suitable for the anticipated provisioning rate.

Auto-provisioning

In the auto-provisioning pattern, a dedicated provisioning service is used to provision servers, pools, and databases automatically as needed. This automation includes the pre-provisioning of databases in elastic pools. If databases are decommissioned and deleted, the gaps this creates in elastic pools can be filled by the provisioning service as desired.

This type of automated service could be simple or complex. For example, the automation could handle provisioning across multiple geographies, and could set up geo-replication for disaster recovery. With the auto-provisioning pattern, a client application or script would submit a provisioning request to a queue to be processed by a provisioning service. The script would then poll to detect completion. If pre-provisioning is used, requests would be handled quickly, while a background service would manage the provisioning of a replacement database.
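The interplay between the request queue, the pre-provisioned buffer, and the polling client in the auto-provisioning pattern can be sketched as below. This is a simplified, single-process illustration under stated assumptions: the ProvisioningService class, its method names, and the spare-db-N database names are hypothetical, and a real service would run the worker and the replenishment in the background against actual elastic pools.

```python
from collections import deque


class ProvisioningService:
    """Hypothetical provisioning service with a pre-provisioned buffer."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size
        self._next_id = 0
        self.pool = deque()      # databases created ahead of demand
        self.requests = deque()  # pending provisioning requests
        self.completed = {}      # tenant name -> assigned database
        self.replenish()

    def _create_database(self) -> str:
        # Stands in for the slow step of creating a database in a pool.
        self._next_id += 1
        return f"spare-db-{self._next_id}"

    def replenish(self) -> None:
        """Background task: top the buffer back up to its target size."""
        while len(self.pool) < self.buffer_size:
            self.pool.append(self._create_database())

    def submit(self, tenant_name: str) -> None:
        """Client side: enqueue a provisioning request."""
        self.requests.append(tenant_name)

    def process_next(self):
        """Worker side: serve one request, preferring a spare database."""
        if not self.requests:
            return None
        tenant = self.requests.popleft()
        database = self.pool.popleft() if self.pool else self._create_database()
        self.completed[tenant] = database
        return database

    def is_complete(self, tenant_name: str) -> bool:
        """Client side: poll for completion of a submitted request."""
        return tenant_name in self.completed
```

In this sketch a request is served immediately from the buffer when a spare database exists, and replenish() restores the buffer afterwards, so the slow creation step stays off the request path as long as the buffer matches the provisioning rate.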