Network Working Group                                         K. Davies
Internet-Draft                                                    ICANN
Intended status: Informational                               A. Freytag
Expires: January 10, 2014                                    ASMUS Inc.
                                                           July 9, 2013

Representing Label Generation Rulesets using XML
draft-davies-idntables-03

Abstract

This memo describes a method of representing the domain name registration policy for a zone administrator using Extensible Markup Language (XML). These policies, known as "Label Generation Rulesets" (LGRs), are particularly used for the implementation of Internationalised Domain Names (IDNs). The rulesets are used to implement and share policy on which specific Unicode codepoints are permitted for registrations, which alternative codepoints are considered variants, and what actions may be performed on those variants.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 10, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Davies & Freytag        Expires January 10, 2014               [Page 1]
Internet-Draft     Label Generation Rulesets in XML           July 2013
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
2. Design Goals
3. Requirements
4. LGR Format
   4.1. Namespace
   4.2. Basic structure
   4.3. Metadata
      4.3.1. The version element
      4.3.2. The date element
      4.3.3. The language element
      4.3.4. The domain element
      4.3.5. The description element
      4.3.6. The validity-start and validity-end elements
      4.3.7. The unicode-version element
   4.4. Codepoint Rules
      4.4.1. Sequences
      4.4.2. Variants
      4.4.3. Result tagging
   4.5. Whole Label Evaluation Rules
      4.5.1. Basic concepts
      4.5.2. Character Classes
      4.5.3. Context rules
      4.5.4. Action elements
   4.6. Example table
5. Processing a label against an LGR
   5.1. Determining eligibility for a label
   5.2. Determining variants for a label
6. Conversion between other formats
7. IANA Considerations
8. Security Considerations
9. References
Appendix A. RelaxNG Schema
Appendix B. Acknowledgements
Appendix C. Editorial Notes
   C.1. Known Issues and Future Work
   C.2. Sample tables and running code
   C.3. Change History
Authors' Addresses

1. Introduction

This memo describes a method of using Extensible Markup Language (XML) to describe the algorithm used to determine whether a given domain label is permitted, and under which circumstances. These algorithms consist of a list of permissible codepoints, variants, and a number of conditions where certain relationships are applied. These algorithms form part of a zone administrator's policies, and can be referred to as Label Generation Rulesets (LGRs), or IDN tables.

Administrators of the zones for top-level domain registries have historically published their LGRs using ASCII text or HTML. The formatting of these documents has been loosely based on the format used for the Language Variant Table in [RFC3743]. [RFC4290] also provides a "model table format" that describes a similar set of functionality.

Through the first decade of IDN deployment, experience has shown that LGRs derived from these formats are difficult to implement and compare consistently because the formats differ.
A universal format, such as one using a structured XML format, will assist by improving machine-readability, consistency, reusability and maintainability of LGRs. It also provides for more complex conditional implementation of variants that reflects the known requirements of current zone administrator policies.

While the predominant usage of this specification is to represent IDN label policy, the format may also be used for describing ASCII domain name label rulesets.

2. Design Goals

The following items are explicit design goals of this format:

o  MUST be in a format that can be implemented in a reasonably straightforward manner in software;

o  The format SHOULD be able to be checked for formatting errors, such that common mistakes can be caught;

o  An LGR MUST be able to express the set of valid codepoints that are allowed for registration under a specific zone administrator's policies;

o  MUST be able to express computed alternatives to a given domain name based on a one-to-one, or one-to-many relationship. These computed alternatives are commonly known as "variants";

o  Variants SHOULD be able to be tagged with specific categories, such that the categories can be used to support registry policy (such as whether to list the computed variant in the zone, or to merely block it from registration);

o  Variants MUST be able to be stipulated based on contextual information. For example, specific variants may only be applicable when they follow another specific codepoint, or when the codepoint is displayed in a specific presentation form;

o  The data contained within an LGR MUST be unambiguous, such that independent implementations that utilise the contents will arrive at the same results;

o  LGRs SHOULD be suitable for comparison and re-use, such that one could easily compare the contents of two or more to see the differences, to merge them, and so on.
o  As many existing IDN tables as practicable SHOULD be able to be migrated to the LGR format with all applicable logic retained.

It is explicitly NOT the goal of this format to stipulate what codepoints should be listed in an LGR by a zone administrator. Which registration policies are used for a particular zone is outside the scope of this memo.

3. Requirements

To be able to fulfil the known utilisation of LGRs, the existing corpus of published IDN tables was reviewed to prepare this specification. In addition, the requirements of ICANN's work to implement an LGR for the DNS Root Zone [LGR-PROCEDURE] were also considered. In Section B of that document, five specific requirements for an LGR methodology were identified:

o  The ability to identify a set of codepoints that are permitted.

o  The ability to represent a list of variants, if any, for each codepoint.

o  A method of identifying codepoints that are related, using a tag.

o  The ability to describe rules regarding the possible actions that may be performed on the resulting label (such as blocked, allocatable, etc.)

o  The ability to describe rules that check for ill-formed combinations across the whole label.

4. LGR Format

An LGR is expressed as a well-formed XML Document [XML].

4.1. Namespace

The XML Namespace URI is [TBD].

4.2. Basic structure

The basic XML framework of the document is as follows:

    ...

Within the "lgr" element rest several sub-elements. First is a "meta" element that contains all meta-data associated with the IDN table, such as its authorship, what it is used for, implementation notes and references. This is followed by a "data" element that contains the substantive codepoint data.
Finally, an optional "rules" element contains information on whole-label evaluation rules, if any, along with any specific rules regarding the disposition of computed variants.

    <lgr>
      <meta>...</meta>
      <data>...</data>
      <rules>...</rules>
    </lgr>

A document should contain exactly one "lgr" element, and within that optionally one "meta" element, exactly one "data" element, and optionally one "rules" element.

4.3. Metadata

The "meta" element is used to express meta-data associated with the LGR. It can be used to identify the author or relevant contact person, explain what the usage of the IDN table is, and provide implementation notes as well as references. The data contained within is not required by software consuming the LGR in order to calculate valid labels, or to calculate variants.

4.3.1. The version element

The "version" element is used to uniquely identify each version of the LGR being represented. No specific format is required, but it is RECOMMENDED that it be a positive integer, which is incremented with each revision of the file.

An example of a typical first edition of a document:

    <version>1</version>

A common alternative is to use a major-minor number scheme, where two decimal numbers are used to represent major and minor changes to the LGR. For example, "1.0" would be the first major release, "1.1" would be a minor update to that, and "2.0" would represent a major revision.

4.3.2. The date element

The "date" element is used to identify the date the LGR was written. The contents of this element MUST be a valid ISO 8601 date string as described in [RFC3339].

Example of a date:

    <date>2009-11-01</date>

4.3.3. The language element

The "language" element signals that the LGR is associated with a specific language or script. The value of the language element must be a valid language tag as described in [RFC5646]. The tag may simply refer to a script if the LGR is not referring to a specific language.
There may be multiple "language" elements for an LGR if it spans multiple languages and/or scripts.

Example of an English language LGR:

    <language>en</language>

If the LGR applies to a specific script, rather than a language, the "und" language tag should be used followed by the relevant [RFC5646] script subtag. For example, for a Cyrillic script LGR:

    <language>und-Cyrl</language>

4.3.4. The domain element

This optional element refers to a domain to which this policy is applied.

    <domain>example.com</domain>

There may be multiple "domain" elements used to reflect a list of domains.

4.3.5. The description element

The "description" element is a free-form element that contains any additional information that is useful to the user of the LGR in its interpretation. Typically, this field contains authorship information, as well as additional context on how the LGR was formulated (such as its history, rationale and reference sources) and how it has been applied.

The element has an optional "type" attribute, which specifies the media type of the enclosed data. The attribute should be a valid MIME type. If supplied, the contents are assumed to be of that type. Typical types would be "text/plain" or "text/html". "text/plain" will be assumed if no "type" attribute is specified.

4.3.6. The validity-start and validity-end elements

The "validity-start" and "validity-end" elements are optional elements that describe the time at which the contents of the LGR become valid (i.e. are used in registry policy), and the time at which the contents of the LGR cease to be used.
The times should conform to the format described in Section 5.6 of [RFC3339]. Each may consist of a date, or a date and time stamp.

4.3.7. The unicode-version element

If a given table is dependent on certain characters or functionality from a given version of the Unicode standard, the minimum version number MUST be listed. If any software processing the table does not have the minimum requisite version, it MUST NOT perform any operations relating to whole-label evaluation. This is because the Unicode properties for the codepoints may have changed in subsequent versions.

    <unicode-version>6.2</unicode-version>

4.4. Codepoint Rules

The bulk of a label generation ruleset is a description of which set of codepoints are eligible for a given label. For rulesets that perform operations that result in potential variants, the codepoint-level relationships between variants also need to be described.

The codepoint data is collected within a "data" element. Within this element, a series of "char" and "range" elements describe eligible codepoints, or ranges of codepoints, respectively.

Discrete permissible codepoints or codepoint sequences may be stipulated with a "char" element.

Ranges of permissible codepoints may be stipulated with a "range" element. The range is inclusive of the first and last codepoints.

Codepoints must be expressed in hexadecimal, i.e. according to the standard Unicode convention without the prefix "U+". The rationale for not allowing other encoding formats, including native Unicode encoding in XML, is explored in [UAX42]. The XML conventions used in this format, including the element and attribute names, mirror that document where practical and reasonable to do so.

4.4.1. Sequences

A sequence of two or more codepoints may be specified in an LGR when the exact sequence of codepoints is required to occur in order for the constituent elements to be eligible.
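The codepoint conventions of Section 4.4 (hexadecimal without the "U+" prefix, ranges inclusive of both ends) and the notion of a sequence can be illustrated with a minimal Python sketch. It is not part of the format. The function names are hypothetical, and the assumption that the codepoints of a sequence are separated by whitespace is an illustration choice rather than something this memo states.

```python
import re
from typing import FrozenSet, Tuple

# Four to six hexadecimal digits, following the usual Unicode convention.
_CP_PATTERN = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_cp(text: str) -> int:
    """Parse one codepoint written in hexadecimal without the "U+" prefix."""
    if not _CP_PATTERN.fullmatch(text):
        raise ValueError(f"not a bare hexadecimal codepoint: {text!r}")
    value = int(text, 16)
    if value > 0x10FFFF:
        raise ValueError(f"outside the Unicode codespace: {text!r}")
    return value


def parse_sequence(text: str) -> Tuple[int, ...]:
    """Parse a codepoint sequence.

    Assumption (not from the memo): codepoints are separated by whitespace.
    """
    return tuple(parse_cp(part) for part in text.split())


def expand_range(first: str, last: str) -> FrozenSet[int]:
    """Return every codepoint in a range, inclusive of both end points."""
    low, high = parse_cp(first), parse_cp(last)
    if low > high:
        raise ValueError("range start exceeds range end")
    return frozenset(range(low, high + 1))
```

Under these assumptions, `parse_sequence("006C 00B7 006C")` yields the MIDDLE DOT flanked by two LATIN SMALL LETTER L codepoints, and `expand_range("0030", "0039")` yields the ten ASCII digits.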
This approach allows representation of policy where a specific codepoint is only eligible when preceded or followed by another codepoint. For example, a sequence can represent the eligibility of the MIDDLE DOT (U+00B7) only when both preceded and followed by the LATIN SMALL LETTER L (U+006C).

4.4.2. Variants

While most LGRs typically only determine codepoint eligibility, others additionally specify a mapping of codepoints to other codepoints, known as "variants". What constitutes a variant is a matter of policy, and varies for each implementation.

4.4.2.1. Basic variants

Variants are specified as one or more children of a "char" element. For example, LATIN SMALL LETTER V (U+0076) can be mapped as a variant of LATIN SMALL LETTER U (U+0075).

A sequence of multiple codepoints can be specified as a variant of a single codepoint. For example, the sequence of LATIN SMALL LETTER O (U+006F) then LATIN SMALL LETTER E (U+0065) can be specified as a variant for a LATIN SMALL LETTER O WITH DIAERESIS (U+00F6).

Variants are specified in only one direction. For symmetric variants, the inverse of the variant must be explicitly specified.

Both the source and target of a variant mapping may be sequences. It is not possible to specify variants for ranges.

4.4.2.2. Null variants

To specify a null variant, which is a variant string that maps to no codepoint, use an empty "cp" attribute. For example, this can map a string with a ZERO WIDTH NON-JOINER (U+200C) to the same string without the ZERO WIDTH NON-JOINER.

4.4.2.3. Conditional variants

Fundamentally, variants are mappings between two sequences of codepoints. However, in some instances for a variant relationship to exist, some context external to the codepoint sequence must be considered.
For example, in Arabic script the positional context may determine whether two codepoint sequences are variants of each other, because Arabic characters can have different forms based on position. This positional context cannot be derived solely from the codepoint, as the codepoint is the same for the various forms.

To specify a conditional variant relationship the "when" attribute is used. The variant relationship exists when the condition in the "when" attribute is satisfied.

arabic-initial   The codepoint is in a context where it would be presented in its Arabic Initial form.

arabic-isolated  The codepoint is in a context where it would be presented in its Arabic Isolated form.

arabic-medial    The codepoint is in a context where it would be presented in its Arabic Medial form.

arabic-final     The codepoint is in a context where it would be presented in its Arabic Final form.

For example, ARABIC LETTER ALEF WITH WAVY HAMZA BELOW (U+0673) can be marked as a variant of ARABIC LETTER ALEF WITH HAMZA BELOW (U+0625), but only when it appears in isolated or final forms.

Only a single "when" attribute can be applied to any "var" element; however, multiple "var" elements using the same mapping but different "when" attributes may be specified.

4.4.3. Result tagging

Typically, LGRs are used to explicitly designate allowable codepoints, with any label with a codepoint not explicitly listed in the LGR being considered an ineligible label according to the ruleset.

For more complex registry rules, there may be a need to discern codepoints and variants of certain types. This can be accomplished by applying a "tag" attribute, and then filtering on results based on the tag using whole label evaluation.

A tag may be of any value, but the following tags are pre-defined to encourage common conventions in their application.
If these tags can represent registry policy, they SHOULD be used.

4.5. Whole Label Evaluation Rules

4.5.1. Basic concepts

The codepoints in a label sometimes need to satisfy context-based rules in order for the label to be considered valid. Whole Label Evaluation Rules (WLE) can be specified to support this validation. The same validation can be applied to variants created by applying the variant mapping.

The whole label evaluation rules are contained in a "wle" element, which contains character class, rule and action elements. These are described below.

A Whole Label Evaluation Rule describes a complete label. The elements of the "rule" element are:

o  character classes, which define sets of codepoints to be used for context comparisons;

o  context operators, which define when character classes may appear; and

o  actions, which define what actions to take based on the context.

4.5.2. Character Classes

Character classes are named sets of characters that share a particular property. They can be defined in several ways.

1. Define the property via matching a tag in the codepoint data. All characters with the same tag attribute are part of the same class.

2. Reference one of the Unicode character properties defined in the Unicode Character Database (UCD).

3. Explicitly list all the codepoints in the class.

4. Define a class as a combination of any number of these definitions or other classes.

4.5.2.1. Tag-based classes

If tags are defined using the "tag" attribute, classes are defined based upon the names of the tags used. From these classes, further operations may be performed by context operators and actions.

4.5.2.2. Unicode property based classes

A class is defined in terms of Unicode properties by giving the Unicode property alias and the property value or property value alias.
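As a rough illustration of a property-based class, the following Python sketch tests whether a codepoint has a given canonical combining class. It is only a sketch: the function name and argument layout are hypothetical, only the "ccc" alias is handled, and the answer reflects whichever UCD version is bundled with the Python interpreter, which may differ from the Unicode version an LGR declares.

```python
import unicodedata

# UCD version bundled with this interpreter; compare it with the Unicode
# version the LGR declares before relying on property lookups.
UCD_VERSION = unicodedata.unidata_version


def in_property_class(cp_hex: str, alias: str, value: int) -> bool:
    """Report whether a codepoint belongs to a Unicode-property class.

    cp_hex is a bare hexadecimal codepoint such as "094D". Only the "ccc"
    (canonical combining class) alias is handled in this sketch.
    """
    if alias != "ccc":
        raise NotImplementedError(f"property alias not handled here: {alias}")
    return unicodedata.combining(chr(int(cp_hex, 16))) == value
```

For instance, `in_property_class("094D", "ccc", 9)` is true for DEVANAGARI SIGN VIRAMA, whereas the same test on "0061" (LATIN SMALL LETTER A) is false.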
For example, a class may select all characters for which the Unicode canonical combining class (ccc) value is 9. This value of the ccc is assigned in the UCD to all characters that are viramas. The string "ccc" is the short alias for the canonical combining class, as defined in the file PropertyAliases.txt in the UCD.

[[Possibly change those to the labels used by the XML format of the UCD -- per UAX42]]

Unicode properties may, in principle, change between versions of the Unicode Standard. However, the values assigned for a given version are fixed. If Unicode properties are used, they MUST be declared in the header, and the Unicode version must be defined. (Note: some Unicode properties are stable across versions and do not change once assigned. Nevertheless, in order to make sure the UCD version covers all the characters in the codepoint tables, it is necessary to give the version number in the header.)

4.5.2.3. Explicitly declared classes

A class of codepoints may also be declared by listing the codepoints that are members of the class. This is useful when tagging cannot be used because the codepoints are not part of the eligible set of codepoints for the given LGR.

For example, a class named "abc" may be defined in terms of an explicit list of codepoints, containing the codepoints for the characters "a", "b" and "c". The ordering of the codepoints is not material, but it is RECOMMENDED to list them in ascending order.

Range operators may also be used to represent a series of consecutive codepoints, so the same class can be declared as a single range.

4.5.2.4. Combined classes

Classes may be combined using logical operators for inversion, union, intersection and exclusive-or.

4.5.3. Context rules

Context rules consist of a series of logical conditions that must be satisfied in order to determine that a label meets a given context.
These rules relate to the appearance of character classes defined elsewhere in the table.

4.5.3.1. The rule element

A matching rule is defined by a "rule" element, which combines character classes with context operators.

A simple rule might match a label where all characters are members of the class "preferred".

To provide more specificity on the number of times a specific character class may appear, the "count" attribute specifies the number of occurrences. This number should be an integer of 0 or higher. If it is followed by a plus character (+), the number of occurrences may be higher than the number stated. Therefore, "1" would mean exactly one occurrence, whereas "1+" would indicate one or more occurrences.

For cases where several alternatives could be chosen, a rule can encode a list of choices.

For cases when a match may occur against any codepoint, use an "any" element.

By default, Whole Label Evaluation Rules always match the entire label. Use the attribute "match" with values "start", "anywhere" and "end" to define rules that need to match in specific positions of the label.

Rules are named and can be nested by reference. For example, a rule may require that all labels consist of letters (optionally followed by combining marks) and possibly digits.

4.5.4. Action elements

The purpose of a rule is to trigger a specific action. Often, the action simply results in blocking a label that does not match a rule.

An action may contain precisely one "match" or "not-match" attribute, but not both. Because rules may be compound rules that contain other rules, only a single rule may be named as the value of the "match" or "not-match" attribute.

The precise action taken and the name of the corresponding "action" attribute are not defined here.
It is strongly RECOMMENDED to use the following actions only with their conventional sense.

block     The resulting string should be blocked from registration. This would typically apply for a derived variant that has no practical use, such as blocking confusingly similar but undesirable variants.

allocate  The resulting string should be reserved for use by the same operator of the origin string, but not automatically allocated for use.

activate  The resulting string should be activated for use. (This is the typical default action if no tagging is used, and is known as a "preferred" variant in [RFC3743].)

4.6. Example table

A sample complete XML LGR is as follows.

[Sample LGR: version 1; date 2010-01-01; language sv; domain example; description ending "Swedish examples institute."]

5. Processing a label against an LGR

5.1. Determining eligibility for a label

In order to use a table to test a specific domain label for membership in the LGR, a consumer of the LGR must iterate through each codepoint within a given U-label, and test that each codepoint is a member of the LGR. If any codepoint is not a member of the LGR, the label shall be deemed not eligible in accordance with the table.

A codepoint is deemed a member of the table when it is listed with the "char" element, and all necessary conditions listed in "when" attributes are correctly satisfied.

5.2. Determining variants for a label

For a given eligible label, the set of variants is deemed to be each possible permutation of "var" elements, whereby all "when" attributes are correctly satisfied for each codepoint in the given permutation.

6. Conversion between other formats

Both [RFC3743] and [RFC4290] provide different grammars for IDN tables.
These formats are unable to fully cater for the increased requirements of contemporary IDN variant policies.

This specification is a superset of functionality provided by these IDN table formats, thus any table expressed in those formats can be expressed in this format. Automated conversion can be conducted between tables conformant with the grammar specified in each document.

7. IANA Considerations

This document does not specify any IANA actions.

8. Security Considerations

There are no security considerations for this memo.

9. References

[LGR-PROCEDURE]  Internet Corporation for Assigned Names and Numbers, "Procedure to Develop and Maintain the Label Generation Rules for the Root Zone in Respect of IDNA Labels".

[RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.

[RFC3743]  Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint Engineering Team (JET) Guidelines for Internationalized Domain Names (IDN) Registration and Administration for Chinese, Japanese, and Korean", RFC 3743, April 2004.

[RFC4290]  Klensin, J., "Suggested Practices for Registration of Internationalized Domain Names (IDN)", RFC 4290, December 2005.

[RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman, "Linguistic Guidelines for the Use of the Arabic Language in Internet Domains", RFC 5564, February 2010.

[RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying Languages", BCP 47, RFC 5646, September 2009.

[UAX42]  Unicode Consortium, "Unicode Character Database in XML".

[XML]  "Extensible Markup Language (XML) 1.0".
Appendix A. RelaxNG Schema

Appendix B. Acknowledgements

This format builds upon the work on documenting IDN tables by many different registry operators. Notably, a comprehensive language table for Chinese, Japanese and Korean was developed by the "Joint Engineering Team" [RFC3743] that is the basis of many registry policies; and a set of guidelines for Arabic script registrations [RFC5564] was published by the Arabic-language community.

Contributions that have shaped this document have been provided by Francisco Arias, Mark Davis, Nicholas Ostler, Thomas Roessler, Steve Sheng and Andrew Sullivan.

Appendix C. Editorial Notes

This appendix to be removed prior to final publication.

C.1. Known Issues and Future Work

o  A default set of actions should be defined if they are not explicitly accounted for in the table.

o  A method of specifying the origin URI for a table, and an expiration or refresh policy, as meta-data may be a useful way to declare how the table will be updated.

C.2. Sample tables and running code

Some sample tables using this format, as well as a basic implementation of this specification, are posted at https://github.com/kjd/idntables

C.3. Change History

-00  Initial draft.
-01  Add an XML Namespace, and fix other XML nits. Add support for sequences of codepoints. Improve on consistently using Unicode nomenclature.

-02  Add support for validity periods.

-03  Incorporate requirements from the Label Generation Ruleset Procedure for the DNS Root Zone. These requirements include a detailed grammar for specifying whole-label variants, and the ability to explicitly declare the actions associated with a specific variant. The document also consistently applies the term "Label Generation Ruleset", rather than "IDN table", to reflect the policy term now being used to describe these.

Authors' Addresses

Kim Davies
Internet Corporation for Assigned Names and Numbers
12025 Waterfront Drive
Los Angeles, CA 90094
US

Phone: +1 310 301 5800
Email: kim.davies@icann.org
URI:   http://www.iana.org/

Asmus Freytag
ASMUS Inc.

Email: asmus@unicode.org