General Bible Format Tagging Specification
Purpose
This specification is intended as an aid to preparing Bible Texts for use with various Bible search programs. While it is possible to parse and use these files directly, these files may be further processed by a Bible search program by indexing and/or converting to another format, such as the XML Scripture Encoding Model (XSEM) or OSIS. Software tools to do this are available. GBF is not XML, because it doesn't use the same conventions for tags and because strict nesting is not required. Conversions to other formats should be relatively simple. This format is designed to leave most of the detailed formatting up to the Bible search program, which may be running in DOS, Windows, Linux, or some other platform.
Caution
Since this format is still under development, some things might change, especially with respect to footnote and cross reference handling. There are still some things that are a bit ambiguous in this specification, and which may be clarified later with a reference implementation of a GBF reader. Check the master copy of this document before writing programs to take advantage of Bibles distributed in this format. This document was last updated 1 February 2004. Recent changes are marked in bold font.
General Format
General Bible Format (GBF) files are plain ASCII text files, with lines limited to no more than 254 characters. Lines are ended with the DOS standard CR-LF pair of characters. GBF files consist of a series of tokens, which may be tags, words, spaces, or punctuation. These tokens may not cross line boundaries. When read, line endings are the same as a single space, except when they immediately follow the <CM> token, in which case they are ignored. It is up to the program displaying or converting the text to another format to insert soft line breaks at the appropriate places. For ASCII values greater than 127, the U. S. Windows ANSI character set is assumed. (Currently, the only use for these characters is for typographic quotation marks.) GBF readers targeting DOS or Unix should convert characters in this range to the closest equivalents on those platforms.
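The line-ending rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name tokenize_gbf and the regular expression are assumptions of this example, and punctuation is left attached to the neighbouring word rather than split into its own token.

```python
import re

# A tag is "<...>", a word is a run of non-space, non-bracket characters,
# and a run of spaces or tabs is kept as its own token.
# Assumption: punctuation stays attached to the word it touches.
TOKEN_RE = re.compile(r"<[^<>]*>|[^\s<>]+|[ \t]+")

def tokenize_gbf(text):
    """Split GBF text into tokens, applying the line-ending rule."""
    tokens = []
    lines = text.split("\r\n")  # GBF lines end with CR-LF
    for i, line in enumerate(lines):
        line_tokens = TOKEN_RE.findall(line)
        tokens.extend(line_tokens)
        is_last = i == len(lines) - 1
        # A line ending counts as one space, unless it immediately
        # follows <CM>, in which case it is ignored.
        if not is_last and not (line_tokens and line_tokens[-1] == "<CM>"):
            tokens.append(" ")
    return tokens
```

Because tokens may not cross line boundaries, each line can be tokenized on its own and the results concatenated.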
Tags
There are several kinds of tags in GBF files, indicated by the first character after the "<" character. These include: file header tags, text type tags, file tail tags, font attribute tags, paragraph attribute tags, informational tags, sync marks, special characters, etc. All tags start with the less-than symbol (<), followed by two characters that identify the tag, and end with the greater-than symbol (>). Tag identifiers are case sensitive, and may be extended in the future to include additional letters, numbers, and other characters. Some tags (such as sync marks) take a parameter, which is inserted just before the ending ">". Unrecognized tags (not in this specification) should be ignored by GBF readers aimed at this version of the GBF specification, preferably with a warning. In tags with start and stop versions, the second letter of the tag is upper case in the start tag, and lower case in the stop tag.
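The tag layout described above (two identifying characters, an optional parameter, and the case of the second letter distinguishing start from stop) might be decoded as follows. The names Tag and parse_tag are hypothetical, chosen for this sketch only; the is_stop flag is only meaningful for tags that actually come in start/stop pairs.

```python
from typing import NamedTuple, Optional

class Tag(NamedTuple):
    kind: str      # first character, e.g. "F" for font attribute tags
    ident: str     # two-character identifier, e.g. "FI"
    param: str     # text between the identifier and ">", possibly empty
    is_stop: bool  # True when the second character is lower case

def parse_tag(token: str) -> Optional[Tag]:
    """Return a Tag for a "<..>" token, or None if it is not a tag."""
    if len(token) < 4 or token[0] != "<" or token[-1] != ">":
        return None
    ident = token[1:3]
    return Tag(kind=ident[0], ident=ident, param=token[3:-1],
               is_stop=ident[1].islower())
```

A reader could compare ident against the identifiers it knows and emit a warning for anything else, as the paragraph above recommends.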
Text Type Tags
The text type tags form a mutually exclusive group: each tag specifies the logical contents of the following text and implies a default set of character and paragraph settings. The actual fonts and paragraph attributes assigned to each of these kinds of text are determined by the displaying program, preferably with some allowance for customization by the user.
File header tags are presented in the following order (except allowed repetitions noted below), starting at the beginning of the file.
Type | Tag | Purpose |
File header start | <H0vv> | GBF file version information. (vv = 00 for initial version, 02 for this version). This must be the first token of the file, if present. This is where a shift to Unicode may happen in the future. Any text before this tag is ignored. |
Long translation title | <H1> | Set the long title of the translation for printing at the beginning and for long window titles or headings. Should precede the main text of the Bible (e.g., The Holy Bible, God’s Living Word Translation; or American Standard Version). |
Translation abbreviation | <H2> | Set the short translation title or abbreviation (e.g., GLW, ASV, NASB 1995, NJB, Amplified, etc.) |
Copyright notice (short) | <H3> | Short copyright notice (minimal) if copyrighted or something like "KJV text is in the Public Domain." |
Copyright notice (long) | <H4> | Copyright and permissions notice (full text). The copyright notice ends when the next field (probably <TI>) starts. |
Place | <HP> | Geographic location(s) translation is targeted at. |
Book title | <HT> | "Holy Bible" in English, or the equivalent in another language. |
Language Ethnologue code | <HE> | Three-letter code for this language, as listed in the SIL Ethnologue. |
Language name | <HL> | Language of this translation, expressed in that language. |
Chapter specifier | <HS> | "Chapter" in the language of the translation. |
Psalm specifier | <HU> | "Psalm" in the language of the translation. |
Verse specifier | <HV> | "Verse" in the language of the translation. |
Contact name | <HC> | Translator or translation sponsor name. |
Contact address line | <HA> | One line of the contact address. Multiple instances of this line are allowed (e.g., for street, city & state, etc.). |
Copyright work | <HW> | optional -- may be something like "Text," "helps," "New Testament," "maps," etc. |
Copyright year | <HY> | One 4-digit year. May be repeated. |
Copyright year range | <HR> | Two 4-digit years, separated by a hyphen, such as 2001-2003. |
Copyright owner | <HO> | Corporate or personal name of copyright owner. The sequence from <HW> through <HO> may be repeated as appropriate to describe copyright claims on different works within the same volume. Omit the tags <HW> through <HO> on Public Domain works. |
File body tags are used as necessary to mark the type of content of the text.
Type | Tag | Purpose |
Apocrypha | <BA> | Text of the Apocrypha/Deuterocanonical books |
Commentary | <BC> | Commentary (not normally used in Bible texts but in separate files with sync marks). |
Introduction to translation | <BI> | Notes to the reader, translation history, etc. |
New Testament Text | <BN> | Text of the 27 books of the New Testament |
Old Testament Text | <BO> | Text of the 39 books of the Old Testament |
Book Preface | <BP> | Preface or introduction to a Bible translation. |
File tail tags must be in the following order:
Type | Tag | Purpose |
Check value | <ZW> | OBSOLETE (Replaced by external check value from SHASUM or SAPPSUM. Was defined as: Check value (SHA-1 hash of all lines from the beginning of the file to just before this line). Followed by 32 hexadecimal digits. GBF readers should reject any GBF file that fails this validation check. The hash value may optionally be followed by a DSS digital signature.) |
Digital Signature | <ZX> | OBSOLETE (Replaced by external PGP digital signatures. Was defined as: DSS Signature of file from <H0vv> to before <ZW>) |
Registered user ID. | <ZY> | OBSOLETE (Not used. Was defined as: The user ID, name, organization, and check value of the registered user are encoded in this section.) |
End of file | <ZZ> | Stop reading here. |
Type | Start Tag | Stop Tag | Purpose |
Psalm Book Title | <TB> | <Tb> | Mark the beginnings of the 5 books of Psalms |
Comment | <TC> | <Tc> | Ignore text in this section for normal display or conversion use. Field containing revision status or comments pertaining to editing or proofreading of this electronic edition of the Bible text. |
Hebrew Title | <TH> | <Th> | Hebrew titles of psalms. The Hebrew title should come right after the sync marker for verse 1. |
Section header | <TS> | <Ts> | Translator’s or publisher’s section headers |
Book title | <TT> | <Tt> | Full title of the current book as it is to be displayed, e.g. "The Good News According to John" or "John's First Letter" |
Short book name | <TN> | <Tn> | Short title of the current book for headers and references, e.g. "John" or "1 John" |
Vernacular Book abbreviation | <TA> | <Ta> | Abbreviation for this book in the language of the translation. |
Preface | <TP> | <Tp> | Preface or introductory material for a book. |
Font Attributes
Text attribute tag pairs are inserted into the text as necessary to indicate text attributes like italics. These may or may not be properly represented in all Bible viewers due to platform limitations. All of these text attributes are assumed to be off at the beginning of each book unless the start tag is explicitly repeated after the chapter sync mark. The stop tag for each font attribute should be explicitly inserted before the end of each book, and if appropriate, the start tag reasserted after the next book begins.
Attribute | Start tag | Stop tag |
Bold (used to indicate titles in Preface or book introductions) | <FB> | <Fb> |
Small Caps (unused) | <FC> | <Fc> |
Italics | <FI> | <Fi> |
Font name (unused) | <FNname> | <Fn> |
Old Testament quote (unused) | <FO> | <Fo> |
Red (words of Jesus) | <FR> | <Fr> |
Superscript (unused) | <FS> | <Fs> |
Underline (unused) | <FU> | <Fu> |
Subscript (unused) | <FV> | <Fv> |
Paragraph Attributes
These tags describe attributes of paragraphs, like justification, indenting, etc. All of these attributes are assumed to be in the default state (non-indented prose, left justified, direction left-to-right) at the beginning of each chapter unless the appropriate start tag is repeated after the chapter sync mark. Justification tags are mutually exclusive, as are direction tags.
Attribute | Start Tag | Stop Tag | Comment |
Direction left-to-right (default) (unused) | <DL> | | European & American languages |
Direction right-to-left (unused) | <DR> | | Hebrew, Arabic, etc. |
Direction top-to-bottom (unused) | <DT> | | Mandarin, etc. |
Justify Center (unused) | <JC> | | Useful for titles. |
Justify Full (unused) | <JF> | | Might map to left justification. |
Justify Left | <JL> | | The default method of justification. |
Justify Right | <JR> | | For "selah" |
Indented quote (unused) | <PI> | <Pi> | Indented extended quotation. |
Poetry | <PP> | <Pp> | Describes poetry (like in Psalms). |
Information Tags
These tags provide various additional pieces of information about the text that are intended to be displayed on demand with a right mouse click, in a separate window, or at the bottom of a page. The start and stop tags indicate the range of words to which the footnote or reference applies. The sequence indicators shown as "seq" below are used to match the start and stop tags in case of overlapping references; each is a short string of numbers or letters that is guaranteed to be unique in the range of text it covers.
Type | Start Tag | Stop Tag | Purpose |
Text with an embedded footnote. | <RB> | <RF> | The text between <RB> and <RF> is further described or has a comment or translator's note between <RF> and <Rf>. The text between <RB> and <RF> may be marked as having a hyperlink for a footnote pop-up, or may be marked with more conventional superscript indicators in printed text. |
Footnote text | <RF> | <Rf> | Embedded footnote text is between the <RF> and <Rf> tags. <RF> may or may not be preceded by <RB>. |
Parallel Passage (unused) | <RPseq Book ch:vs> | <Rpseq> | Book is a number or abbreviation without spaces. |
Cross reference (unused) | <RXseq Book ch:vs> | <Rxseq> | Book is a number or abbreviation without spaces. |
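As an illustration of how a reader might separate embedded footnotes from the running text, the sketch below walks a list of tokens and collects whatever lies between <RF> and <Rf>. The function name split_footnotes is hypothetical, and the choice to simply drop the <RB> marker (rather than record the annotated range) is a simplification made for this example.

```python
def split_footnotes(tokens):
    """Return (main_text, footnotes) from a list of GBF tokens."""
    main, notes = [], []
    current = None  # collects tokens while inside <RF> ... <Rf>
    for tok in tokens:
        if tok == "<RF>":
            current = []
        elif tok == "<Rf>":
            if current is not None:
                notes.append("".join(current))
                current = None
        elif current is not None:
            current.append(tok)
        elif tok != "<RB>":
            # Simplification: the <RB> range marker is discarded.
            main.append(tok)
    return "".join(main), notes
```

A display program would more likely keep the position of each <RB> ... <RF> range so that the annotated words can be rendered as a hyperlink or given a superscript marker.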
Word information tags:
Type | Tag | Purpose |
Strong’s Greek Number | <WGnnn> | Ordinal number of Greek lexicon entry for previous word. (Optional) |
Strong’s Hebrew Number | <WHnnn> | Ordinal number of Hebrew lexicon entry for previous word. (Optional) |
Interlinear translation | <WIword(s)> | words to be placed under the current word. (Optional) |
Original Language word information | <WTxxxx> | xxxx are one or more characters with specific meanings that apply to the previous word’s tense, gender, number, etc.: A = aorist P = plural S = singular [to be expanded] (Not used) |
Form of address | <WTf> | The proper name immediately preceding this tag is being addressed directly (2nd person) as opposed to referred to (3rd person). This tag is used to aid the automated conversion of God's proper name from the conventions used in the World English Bible to those used in the HNV. |
Sync Marks
Verse sync marks identify the current book, chapter, and verse. Each kind of sync mark is required at the beginning of the section (book, chapter, or verse) that it identifies. Sync marks may optionally be repeated within a section. If no specific number is specified in a sync mark, the number is assumed to be one more than the previous sync mark of the same kind.
Kind | Tag | Example or comment |
Book | <SBxxx> | For John, <SBJohn> or <SB67> |
Chapter | <SCxxx> | For chapter 3, <SC3> or (if the last chapter was chapter 2), <SC> |
Verse | <SVxxx> | For verse 16 (following verse 15), <SV16> or <SV>. For verse bridges (i.e., 2 or more verses translated as a unit), the first and last (and optionally all intervening) verse markers are put in order with NOTHING in between them, like <SV5><SV6>, before the contents of the combined verse. |
Date (unused) | <SDmmdd> | For April 30, <SD0430>. Not normally used in Bible texts, but may be in commentaries arranged by daily reading. If mmdd are omitted, assume next day. |
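The implicit-numbering rule for sync marks can be sketched as a small state update. The function name apply_sync and the (book, chapter, verse) tuple are assumptions of this example. Two further assumptions are not stated in the specification and are labeled in the code: a new book resets the chapter and verse counters, and a new chapter resets the verse counter. Date marks (<SD>) are ignored here.

```python
def apply_sync(state, token):
    """Return the (book, chapter, verse) position after one token.

    Tokens that are not <SB>, <SC>, or <SV> sync marks leave the
    state unchanged.
    """
    if not (token.startswith("<") and token.endswith(">")):
        return state
    book, chapter, verse = state
    ident, param = token[1:3], token[3:-1]
    if ident == "SB":
        if param:
            book = param  # name, abbreviation, or number, kept as text
        elif book is not None and book.isdigit():
            book = str(int(book) + 1)  # only numeric books can increment
        # Assumption: a new book restarts chapter and verse counting.
        return (book, 0, 0)
    if ident == "SC":
        # Assumption: a new chapter restarts verse counting.
        return (book, int(param) if param else chapter + 1, 0)
    if ident == "SV":
        return (book, chapter, int(param) if param else verse + 1)
    return state
```

Starting from (None, 0, 0), the marks <SBJohn>, <SC3>, <SV16>, <SV> leave the reader at John 3:17.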
Book markers may be either numeric or an English name or abbreviation as follows. The STEP # is not used in this format, but is listed for reference in conversions.
Old Testament Book Name | Abbreviations | STEP # | Book Number |
Genesis | Ge, Gn | 1 | 1 |
Exodus | Ex | 2 | 2 |
Leviticus | Lev, Lv | 3 | 3 |
Numbers | Nu | 4 | 4 |
Deuteronomy | De, Dt | 5 | 5 |
Joshua | Jos | 6 | 6 |
Judges | Judg, Jdg | 7 | 7 |
Ruth | Ru | 8 | 8 |
1 Samuel | 1 Sa, 1Sa | 9 | 9 |
2 Samuel | 2 Sa, 2Sa | 10 | 10 |
1 Kings | 1 Ki, 1Ki | 11 | 11 |
2 Kings | 2 Ki, 2Ki | 12 | 12 |
1 Chronicles | 1 Ch, 1Ch | 13 | 13 |
2 Chronicles | 2 Ch, 2Ch | 14 | 14 |
Ezra | Ezr | 15 | 15 |
Nehemiah | Ne | 16 | 16 |
Esther | Es | 17, 95, 100 | 17 |
Job | Job | 18 | 18 |
Psalms | Ps | 19, 91, 92 | 19 |
Proverbs | Pr | 20 | 20 |
Ecclesiastes | Ec | 21 | 21 |
Song of Solomon | Song, SS | 22 | 22 |
Isaiah | Isa | 23 | 23 |
Jeremiah | Je | 24 | 24 |
Lamentations | La | 25 | 25 |
Ezekiel | Eze | 26 | 26 |
Daniel | Da | 27, 89 | 27 |
Hosea | Ho | 28 | 28 |
Joel | Joe | 29 | 29 |
Amos | Am | 30 | 30 |
Obadiah | Ob | 31 | 31 |
Jonah | Jon | 32 | 32 |
Micah | Mi | 33 | 33 |
Nahum | Na | 34 | 34 |
Habakkuk | Hab | 35 | 35 |
Zephaniah | Zep | 36 | 36 |
Haggai | Hag | 37 | 37 |
Zechariah | Zec | 38 | 38 |
Malachi | Mal | 39 | 39 |
Apocrypha Book name | Abbreviation | STEP # | Book # |
Tobit | Tob | 69, 93, 98 | 40 |
Judith | Judi, Jdt | 70 | 41 |
Esther (Greek) | GrkEs | 71 (additions only); 88 (complete) | 42 |
Wisdom | Wis | 72 | 43 |
Sirach | Sir | 73, 94, 99 | 44 |
Baruch | Bar | 74, 90 | 45 |
Letter of Jeremiah | Let | 75 | 46 |
Prayer of Azariah and the Song of the Three Jews | Azar | 76 | 47 |
Susanna | Sus | 77 | 48 |
Bel and the Dragon | Bel | 78 | 49 |
1 Maccabees | 1Mac | 79, 96 | 50 |
2 Maccabees | 2Mac | 80, 97 | 51 |
1 Esdras | 1Esd | 81 | 52 |
Prayer of Manasseh | Man | 82 | 53 |
Psalm 151 | P1 | 86 | 54 |
3 Maccabees | 3Mac | 84 | 55 |
2 Esdras | 2Esd | 83, 87 | 56 |
4 Maccabees | 4Mac | 85 | 57 |
Daniel (Greek, complete with additions) | AddDan | | 58 |
New Testament Book Name | Abbreviation | STEP # | Book # |
Matthew | Mat, Mt | 40 | 64 |
Mark | Mar, Mk | 41 | 65 |
Luke | Lu, Lk | 42 | 66 |
John | Joh | 43 | 67 |
Acts | Ac | 44 | 68 |
Romans | Ro, Rm | 45 | 69 |
1 Corinthians | 1 Co, 1Co | 46 | 70 |
2 Corinthians | 2 Co, 2Co | 47 | 71 |
Galatians | Ga | 48 | 72 |
Ephesians | Ep | 49 | 73 |
Philippians | Phili, Php | 50 | 74 |
Colossians | Col | 51 | 75 |
1 Thessalonians | 1 Th, 1Th | 52 | 76 |
2 Thessalonians | 2 Th, 2Th | 53 | 77 |
1 Timothy | 1 Ti, 1Ti | 54 | 78 |
2 Timothy | 2 Ti, 2Ti | 55 | 79 |
Titus | Tit | 56 | 80 |
Philemon | Phile, Phm | 57 | 81 |
Hebrews | He | 58 | 82 |
James | Ja | 59 | 83 |
1 Peter | 1 Pe, 1Pe | 60 | 84 |
2 Peter | 2 Pe, 2Pe | 61 | 85 |
1 John | 1 Jo, 1Jo | 62 | 86 |
2 John | 2 Jo, 2Jo | 63 | 87 |
3 John | 3 Jo, 3Jo | 64 (14 vs.); 67 (15 vs.) | 88 |
Jude | Jude | 65 | 89 |
Revelation | Re | 66; 68 (18 vs. in Ch. 12) | 90 |
Special Character Tags
These tags indicate just a single character in the text.
Meaning | Tag | Comment |
ASCII character (unused) | <CAxx> | xx is a hexadecimal value |
> (unused) | <CG> | Literal greater-than sign. |
< (unused) | <CT> | Literal less-than sign. |
End of paragraph | <CM> | Ends paragraph or line of poetry. In prose, may cause blank line and/or indent. |
End of Line | <CL> | Ends line without ending paragraph -- used in the first line of a poetic couplet. In poetry, the next line will be indented. |
Unicode character (unused) | <CUxxxx> | xxxx is a hexadecimal value. |
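A reader could map the single-character tags to text along these lines. The function name decode_special is hypothetical. Decoding <CAxx> values above 127 through Python's cp1252 codec is an assumption based on the "U. S. Windows ANSI" character set named in the General Format section; a few byte values are undefined in that codec and would raise UnicodeDecodeError. The layout tags <CM> and <CL> are left to the caller.

```python
def decode_special(token):
    """Return the character for a <CA>, <CG>, <CT>, or <CU> tag, else None."""
    if not (token.startswith("<") and token.endswith(">")):
        return None
    ident, param = token[1:3], token[3:-1]
    if ident == "CG":
        return ">"
    if ident == "CT":
        return "<"
    if ident == "CA":
        # Assumption: values above 127 follow Windows code page 1252.
        return bytes([int(param, 16)]).decode("cp1252")
    if ident == "CU":
        return chr(int(param, 16))
    return None  # <CM>, <CL>, and all other tokens are not handled here
```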
External Note and Highlight Files
External note and highlight files may either be prepared commentaries or notations made by the individual Bible student as he or she studies the Bible using a Bible search program. When a note and highlight file is open along with a Bible text, the notes are made available to the reader by a pop-up mechanism or a separate window, or printed at the bottom of a page as footnotes. The highlights are applied to the Bible text background as it is displayed. The note and highlight files are plain ASCII text, using the same tags as the Bible text for font and paragraph characteristics, plus the following:
Meaning | Tag | Comment |
Start of note file | <HN> | This tag must be first. |
Note (unused) | <NNref-ref> | The text following is a note pertaining to the Bible text included in the references indicated. References are of the form Book ch:vs wd, where Book may be an abbreviation (without spaces) or a number, and ch and vs are numbers. The number wd is the number of the word within the verse. If wd is omitted, then the whole verse is assumed. If vs is omitted, then the whole chapter is assumed. If ch is omitted, then the whole book is assumed. If the second reference is omitted, then the reference is assumed to cover only the first reference. The beginning and ending words are considered part of the range. |
Color (unused) | <NC rrr ggg bbb ref-ref> | Background highlight color expressed as three numbers from 0 to 255 for red (rrr), green (ggg), and blue (bbb) covering the reference indicated. The reference is interpreted just like it is for the note tag. For example, to highlight all of John 3:16 in green, the tag would be <NC 0 255 0 John 3:16>. To highlight "In the beginning" in John 1:1 with a shade of greenish blue, the tag would be <NC 0 64 255 John 1:1 1-John 1:1 3> |
End of file | <ZZ> | Last token of the file. Anything after this token is ignored. |
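The two <NC ...> examples in the table can be parsed with a short sketch like the one below. The names parse_ref and parse_highlight, the regular expression, and the returned tuple layout are all assumptions of this example. It also assumes that book names and abbreviations contain no hyphen, so that the first hyphen separates the two references.

```python
import re

# Book, then optional chapter, optional ":verse", optional word number.
REF_RE = re.compile(r"(\S+)(?:\s+(\d+)(?::(\d+)(?:\s+(\d+))?)?)?")

def parse_ref(text):
    """Parse "Book ch:vs wd" into (book, ch, vs, wd); omitted parts are None."""
    m = REF_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError("unrecognized reference: %r" % text)
    book, ch, vs, wd = m.groups()
    return (book,) + tuple(int(x) if x else None for x in (ch, vs, wd))

def parse_highlight(token):
    """Parse an <NC rrr ggg bbb ref-ref> tag into ((r, g, b), start, end)."""
    if not (token.startswith("<NC") and token.endswith(">")):
        raise ValueError("not a color tag: %r" % token)
    r, g, b, refs = token[3:-1].split(None, 3)
    # Assumption: book names contain no hyphen.
    first, _, second = refs.partition("-")
    start = parse_ref(first)
    end = parse_ref(second) if second else start
    return (int(r), int(g), int(b)), start, end
```

When the second reference is omitted, the end of the range is set equal to the start, matching the rule given for the note tag.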