Unified Scripture Format XML (USFX)

What is USFX?
What is different between USFM, USX, and USFX?
Why create yet another Bible file format?
What is the USFX philosophy?
How do USFM, USFX, and OSIS differ?
What are the USFX tags?
Where is the USFX schema?
USFX Schema documentation
Copyright and Permissions
Who should I contact with comments on USFX?

What is USFX?

Unified Scripture Format XML (USFX) is derived from USFM, a file format for publishing and interchanging basic Scripture texts in multiple languages. It is XML, and has an XML schema. It is not intended to be used by itself for all aspects of Bible layout and publishing, but just for representation of the Scripture text itself, and a small amount of accompanying material. USFX exists primarily to bring the advantages of XML to USFM, with minimal additional changes. USFX can be quickly and easily converted to and from USFM. Like USFM, USFX is not designed to be used for general books, theological works, dictionaries, or any other type of data. USFX is not intended to totally replace other Bible formats, but since it is XML, it can be converted to other formats with tools like XSLT.

What is different between USFM, USX, and USFX?

USFX is derived from USFM, but it is not USFM. It can, however, encode everything in a USFM document.

The most obvious difference between USFM and USFX is that USFM is based on backslash codes, while USFX is XML. Another difference is that elements representing things like the words of Jesus Christ and Old Testament quotations in the New Testament must be explicitly closed with their own corresponding XML closing tags. Furthermore, these elements may be nested in any way that can actually occur in Scripture, provided that the resulting XML is well-formed. Unlike the way Paratext interprets USFM, USFX does not assume that such a character style is closed when a verse marker is encountered or when another style starts.
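
To illustrate the nesting rule, here is a minimal Python sketch that uses only the standard library XML parser. The element names p, v, wj, and qt are assumed to mirror the USFM markers of the same names, and the sample text is invented; neither has been checked against the USFX schema. The sketch only shows that properly nested styles parse, while overlapping styles are rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: tag names follow USFM marker names (assumption).
# In NESTED, the quotation (qt) opens and closes inside the words of Jesus (wj).
NESTED = (
    '<p><v id="1"/>He answered, <wj>It is written, '
    '<qt>quoted Old Testament text</qt>.</wj></p>'
)
# In OVERLAPPING, qt opens inside wj but closes after it, which XML forbids.
OVERLAPPING = (
    '<p><v id="1"/>He answered, <wj>It is written, '
    '<qt>quoted Old Testament</wj> text</qt></p>'
)


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def element_order(xml_text):
    """List element tags in document order, outermost first."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    print(is_well_formed(NESTED))
    print(is_well_formed(OVERLAPPING))
    print(element_order(NESTED))
```

Because every style must be closed explicitly, a generic XML parser can catch this kind of error without knowing anything about Scripture markup.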

There is another related format called USX, which was devised after USFX. It is used by Paratext and by the ETEN Digital Bible Library. That format is similar to USFX, but not exactly the same. USX can encode everything that USFM can encode, and vice versa. USX retains some of the quirks of USFM, like the need for special syntax for nested character styles. It lacks some of the extensions that USFX has, like the ref tag. Haiola can read any of USFM, USX, and USFX, and can create both USFM and USFX output. Paratext can create USX from USFM. Therefore, properly encoded Bibles in any one of these three formats can be converted to the other two formats with no loss of Scripture information or formatting, as long as the USFX extensions not found in the other two formats are not used. (Even then, only the extended formatting is lost.)

Why Create Yet Another Bible File Format?

“Necessity is the mother of invention.” The need for USFX was first felt in the process of converting Scriptures from one “standard” format to another, and in editing some Scriptures in the process of Bible translation work. The first application is to embed a simple XML schema in a Microsoft Word 2003 (or later) XML document that is both easy to work with in Microsoft Word and easy to convert back to USFM. There are several other XML Bible schemas in existence that I'm aware of, but these don’t map very cleanly to USFM. USFX is very easy to convert to and from USFM, because it is based on USFM. It is also much simpler to embed in WordML than complex schemas like OSIS, which saves me a great deal of time and makes possible some applications that would otherwise be impossible.

What is the USFX Philosophy?

  1. Keep it simple: simple to generate, parse, use, and understand.
  2. Take full advantage of the rich history and experience in handling Scripture texts in USFM and previous SFM dialects, but add the advantages of XML.
  3. Preserve the (admittedly overly terse) tag names of USFM to simplify conversion.
  4. Make it stable but flexible: arrange it so that future enhancements can be added without invalidating old data where practical (especially in places where USFM is still under development), but plan so that extensions are rarely needed.
  5. Maximize the use of milestones instead of containers for anything that could overlap, and for optional tags that can be ignored by some processes.
  6. Support lossless conversion from USFM to USFX and back, with no loss of Scripture text, punctuation (including quotation punctuation), supplementary text, or metadata.
  7. Use Unicode (UTF-8) character encoding.
  8. Optimize this XML format for Scripture authoring, editing, and conversion to USFM. Make it at least possible to convert to other Scripture file formats with differing primary purposes, such as search, display on demand, and manipulation of the text (for example, machine-assisted translation or punctuation generation).
  9. Some applications are easier with a strict book/chapter/verse hierarchy, while USFX is aimed more at print publication applications and Scripture editing and authoring. USFX makes no attempt to directly support a strict book/chapter/verse hierarchy. Rather, we recommend converting USFX via XSLT or other means to an XML schema with this sort of layout, when needed, rather than trying to build that functionality into USFX.
  10. Instead of making USFX all things to all people, just make it easy to convert to formats that will satisfy other needs, like HTML, OSIS, WordML, various Bible study program file formats, etc.
  11. The underlying assumptions on the nature of Scripture text made by USFM are retained in USFX.
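
As a rough illustration of points 5 and 9, the following Python sketch derives a book/chapter/verse view from milestone markers. It assumes that chapters and verses are empty c and v elements carrying their number in an id attribute; that assumption, the sample text, and the function name verses_by_reference are all illustrative and not taken from the schema. Note how the second verse spans two paragraphs, which a strict container hierarchy could not express directly.

```python
import xml.etree.ElementTree as ET

# Invented sample. The c and v elements are assumed to be empty milestones
# that carry their number in an "id" attribute.
SAMPLE = (
    '<book id="GEN"><c id="1"/>'
    '<p><v id="1"/>First verse text. <v id="2"/>Second verse starts</p>'
    '<p>and continues in a new paragraph. <v id="3"/>Third verse.</p>'
    '</book>'
)


def verses_by_reference(book_xml):
    """Group text under (book, chapter, verse) keys by following milestones."""
    book = ET.fromstring(book_xml)
    verses = {}
    position = {"chapter": None, "verse": None}

    def add_text(text):
        # Text that appears before the first verse milestone is ignored.
        if text and position["verse"] is not None:
            key = (book.get("id"), position["chapter"], position["verse"])
            verses[key] = verses.get(key, "") + text

    def walk(elem):
        if elem.tag == "c":
            position["chapter"], position["verse"] = elem.get("id"), None
        elif elem.tag == "v":
            position["verse"] = elem.get("id")
        add_text(elem.text)
        for child in elem:
            walk(child)
            add_text(child.tail)
        if elem.tag == "p":
            add_text(" ")  # keep words apart across a paragraph break

    walk(book)
    # Collapse runs of whitespace left over from paragraph boundaries.
    return {key: " ".join(text.split()) for key, text in verses.items()}


if __name__ == "__main__":
    for reference, text in verses_by_reference(SAMPLE).items():
        print(reference, text)
```

This sketch does not skip footnotes, headings, or other supplementary material; a real conversion would need to decide how to treat each of those.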

More on the philosophy of USFX and how it compares with other Scripture file formats is available here.

How do USFM, USFX, and OSIS Differ?

USFM is an attempt to unify the many variations in usage of backslash (\) codes to mark Scripture texts. It is not XML. There are many Scriptures encoded in some form near to this format, mostly for minority languages. USFM is preferable to the many similar, but slightly different, implementations of SFM codes used by different organizations to represent Scriptures, both because it is well thought out and because it is easier to support one standard way to mark Scripture files with backslash codes than many ways. A single standard makes these files more portable among organizations and branches, and makes software support for them easier, less error-prone, and less costly. USFM is currently the format that I recommend for practical Bible translation work.

USFX is primarily an expression of what USFM would look like as proper XML instead of a set of backslash codes. Every USFM backslash code has a corresponding USFX XML tag. USFX is more verbose than USFM, as that is the nature of XML, but it is easier to parse with XML software libraries and XSL transformations. Because USFX and USFM are so similar, it is very easy to convert between the two.
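
The one-to-one correspondence between backslash codes and XML tags can be sketched in Python as follows. This is a toy, not a real converter: it handles only the \id, \c, \p, and \v markers, and it assumes a usfx root element, a book element with an id attribute, and empty c and v elements with id attributes. Those names and the function usfm_to_usfx_sketch are illustrative assumptions, so the output should not be expected to validate against the schema.

```python
import re
import xml.etree.ElementTree as ET

# Invented USFM sample using four common markers.
USFM_SAMPLE = r"""\id GEN
\c 1
\p
\v 1 First verse text.
\v 2 Second verse text.
"""


def usfm_to_usfx_sketch(usfm_text):
    """Map a tiny subset of USFM markers to same-named XML elements."""
    root = ET.Element("usfx")  # root element name is an assumption
    book = para = None
    # With one capture group, re.split returns
    # [text before first marker, marker, content, marker, content, ...].
    parts = re.split(r"\\(\w+)", usfm_text)
    for marker, content in zip(parts[1::2], parts[2::2]):
        words = content.split()
        if marker == "id":
            book = ET.SubElement(root, "book", id=words[0])
        elif marker == "c":
            ET.SubElement(book, "c", id=words[0])
        elif marker == "p":
            para = ET.SubElement(book, "p")
        elif marker == "v":
            # The verse is a milestone; its text follows it as tail text.
            verse = ET.SubElement(para, "v", id=words[0])
            verse.tail = " ".join(words[1:]) + " "
        else:
            raise ValueError("marker not handled in this sketch: " + marker)
    return root


if __name__ == "__main__":
    print(ET.tostring(usfm_to_usfx_sketch(USFM_SAMPLE), encoding="unicode"))
```

Character styles with closing markers, footnotes, and headings are deliberately left out; handling them is where most of the work in a real converter lies.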

OSIS is another proposed XML Scripture interchange standard. The OSIS XML schema and documentation view Scriptures differently than USFM and USFX, so a fully automatic and lossless transformation between the two is currently not possible. Not only are the metadata sections of OSIS different, but to be fully compliant with the OSIS standard, some punctuation in the Bible text itself must be converted to markup in such a way that it cannot be recovered without language- and style-dependent processing. This conversion is language-dependent and labor-intensive. Because of differences in the kinds of things that are encoded and the ways they are encoded, the current version of OSIS is not suitable for many applications that USFM works well for. However, an improper subset of OSIS that I call Modified OSIS comes close. It is generated by Haiola software from USFX source.

Where is the USFX Schema?

The alias http://eBible.org/usfx.xsd points to the latest schema, which (as I write this) is documented here. Currently the USFX schema is not as restrictive as it might be. Validation of a USFX document against the schema is necessary, but not sufficient, to ensure compliance with the USFX standard.

Copyright and Permissions

The USFX Schema is copyright © 2005-2014 SIL International, EBT, eBible.org, and Michael Paul Johnson. It is released under the GNU Lesser General Public License or the Common Public License, as explained in LICENSING.txt.

This documentation of USFX is licensed under a Creative Commons Attribution 2.5 License. Copies and modified copies must contain a pointer back to the "official" release at http://eBible.org/usfx/.

Who should I contact with comments on USFX?

Comments on USFX should be directed to Kahunapule Michael Johnson. You may use his secure web contact form or standard web contact form.
