General Bible Format Tagging Specification

Purpose

This specification is intended as an aid to preparing Bible Texts for use with various Bible search programs. While it is possible to parse and use these files directly, these files may be further processed by a Bible search program by indexing and/or converting to another format, such as the XML Scripture Encoding Model (XSEM) or OSIS. Software tools to do this are available.  GBF is not XML, because it doesn't use the same conventions for tags and because strict nesting is not required.  Conversions to other formats should be relatively simple. This format is designed to leave most of the detailed formatting up to the Bible search program, which may be running in DOS, Windows, Linux, or some other platform.

Caution

Since this format is still under development, some things might change, especially with respect to footnote and cross reference handling. There are still some things that are a bit ambiguous in this specification, and which may be clarified later with a reference implementation of a GBF reader. Check the master copy of this document before writing programs to take advantage of Bibles distributed in this format. This document was last updated 1 February 2004. Recent changes are marked in bold font.

General Format

General Bible Format (GBF) files are plain ASCII text files, with lines limited to no more than 254 characters. Lines end with the DOS standard CR-LF pair of characters. GBF files consist of a series of tokens, which may be tags, words, spaces, or punctuation. These tokens may not cross line boundaries. When a file is read, each line ending is treated as a single space, except when it immediately follows the <CM> token, in which case it is ignored. It is up to the program displaying or converting the text to another format to insert soft line breaks at the appropriate places. For character values greater than 127, the U. S. Windows ANSI character set is assumed. (Currently, the only use for these characters is for typographic quotation marks.) GBF readers targeting DOS or Unix should convert characters in this range to the closest equivalents on those platforms.
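The reading rules above can be sketched in Python as follows. This is only an illustration, not a reference implementation: the function name tokenize_gbf and the token pattern are assumptions, and the spec does not define exactly how words and punctuation are separated. Bytes above 127 are decoded with Python's cp1252 codec, which is taken here to correspond to the U. S. Windows ANSI character set.

```python
import re

# Assumed token pattern: a complete tag, a run of whitespace, a word,
# or a single punctuation character (tried in that order).
TOKEN_RE = re.compile(r"<[^<>]*>|\s+|\w+|[^\w\s]")

def tokenize_gbf(raw: bytes) -> list[str]:
    """Split raw GBF bytes into tags, words, spaces, and punctuation."""
    text = raw.decode("cp1252", errors="replace")
    lines = text.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final CR-LF
    tokens: list[str] = []
    for line in lines:
        line_tokens = TOKEN_RE.findall(line)
        tokens.extend(line_tokens)
        # A line ending counts as one space unless the line ends with <CM>.
        if not line_tokens or line_tokens[-1] != "<CM>":
            tokens.append(" ")
    return tokens
```

Because tokens may not cross line boundaries, each line can be tokenized on its own before the line ending is handled.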

Tags

There are several kinds of tags in GBF files, indicated by the first character after the "<" character. These include: file header tags, text type tags, file tail tags, font attribute tags, paragraph attribute tags, informational tags, sync marks, special characters, etc. All tags start with the less-than symbol (<) followed by two characters that identify the tag, and end with the greater-than symbol (>). Tag identifiers are case sensitive, and may be extended in the future to include additional letters, numbers, and other characters. Some tags take a parameter (like sync marks). This parameter is inserted just before the ending ">". Unrecognized tags (not in this specification) should be ignored by GBF readers aimed at this version of the GBF specification, preferably with a warning. In tags with start and stop versions, the second letter of the tag is upper case in the start tag, and lower case in the stop tag.
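As a rough illustration of this tag layout, the following Python sketch splits a tag token into its two-character identifier and its parameter. The names Tag and parse_tag are hypothetical. The is_stop flag is only meaningful for tags that have start and stop versions; for other tags it simply reports the case of the second character.

```python
from typing import NamedTuple

class Tag(NamedTuple):
    ident: str     # two-character, case-sensitive identifier, e.g. "SV"
    param: str     # text between the identifier and ">", possibly empty
    is_stop: bool  # True when the second identifier character is lower case

def parse_tag(token: str) -> Tag:
    """Split a token such as "<SV16>" into identifier and parameter."""
    if len(token) < 4 or not (token.startswith("<") and token.endswith(">")):
        raise ValueError(f"not a GBF tag: {token!r}")
    ident = token[1:3]
    return Tag(ident, token[3:-1], ident[1].islower())
```

A reader could look up ident in its table of known tags and, as the text recommends, skip anything unrecognized with a warning.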

Text Type Tags

The text type tags form a mutually exclusive group: each one specifies the logical content of the following text and implies a default set of character and paragraph settings. The actual fonts and paragraph attributes assigned to each of these kinds of text are determined by the displaying program, preferably with some allowance for customization by the user.

File header tags are presented in the following order (except allowed repetitions noted below), starting at the beginning of the file.

Type Tag Purpose
File header start <H0vv> GBF file version information (vv = 00 for the initial version, 02 for this version). This must be the first token of the file, if present. This is where a shift to Unicode may happen in the future. Any text before this tag is ignored.
Long translation title <H1> Set the long title of the translation for printing at the beginning and for long window titles or headings. Should precede the main text of the Bible (e.g., The Holy Bible, God’s Living Word Translation; or American Standard Version).
Translation abbreviation <H2> Set the short translation title or abbreviation (e.g., GLW, ASV, NASB 1995, NJB, Amplified).
Copyright notice (short) <H3> Short copyright notice (minimal) if copyrighted or something like "KJV text is in the Public Domain."
Copyright notice (long) <H4> Copyright and permissions notice (full text). The copyright notice ends when the next field (probably <TI>) starts.
Place <HP> Geographic location(s) translation is targeted at.
Book title <HT> "Holy Bible" in English, or the equivalent in another language.
Language Ethnologue code <HE> Three-letter code for this language, as listed in the SIL Ethnologue.
Language name <HL> Language of this translation, expressed in that language.
Chapter specifier <HS> "Chapter" in the language of the translation.
Psalm specifier <HU> "Psalm" in the language of the translation.
Verse specifier <HV> "Verse" in the language of the translation.
Contact name <HC> Translator or translation sponsor name.
Contact address line <HA> One line of the contact address. Multiple instances of this line are allowed (e.g., for street, city & state).
Copyright work <HW> optional -- may be something like "Text," "helps," "New Testament," "maps," etc.
Copyright year <HY> One 4-digit year. May be repeated.
Copyright year range <HR> Two 4-digit years, separated by a hyphen, such as 2001-2003.
Copyright owner <HO> Corporate or personal name of copyright owner. The sequence from <HW> through <HO> may be repeated as appropriate to describe copyright claims on different works within the same volume. Omit the tags <HW> through <HO> on Public Domain works.
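One possible way to collect the header fields is sketched below in Python. It assumes that the value of each header field is the text that follows its tag up to the next tag; the spec states this only for <H4>, so treat that as an assumption. It also assumes the header ends at the first tag that does not begin with "H". The function name read_header is hypothetical, and the input is a list of tokens that has already been split into tags, words, spaces, and punctuation.

```python
def read_header(tokens: list[str]) -> dict[str, list[str]]:
    """Collect header field values, keyed by two-character tag identifier."""
    fields: dict[str, list[str]] = {}
    current = None
    for tok in tokens:
        if tok.startswith("<") and tok.endswith(">"):
            if not tok.startswith("<H"):
                break  # assumed: the first non-header tag ends the header
            current = tok[1:3]
            # Any in-tag parameter (such as vv in <H0vv>) starts the value.
            fields.setdefault(current, []).append(tok[3:-1])
        elif current is not None:
            fields[current][-1] += tok  # text before the first tag is ignored
    return {key: [v.strip() for v in values] for key, values in fields.items()}
```

Values are kept in lists because some tags, such as <HA> and <HY>, may be repeated.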

 

File body tags are used as necessary to mark the type of content of the text.

Type Tag Purpose
Apocrypha <BA> Text of the Apocrypha/Deuterocanonical books
Commentary <BC> Commentary (not normally used in Bible texts but in separate files with sync marks).
Introduction to translation <BI> Notes to the reader, translation history, etc.
New Testament Text <BN> Text of the 27 books of the New Testament
Old Testament Text <BO> Text of the 39 books of the Old Testament
Book Preface <BP> Preface or introduction to a Bible translation.

 

File tail tags must be in the following order:

Type Tag Purpose
Check value <ZW> OBSOLETE (Replaced by external check value from SHASUM or SAPPSUM. Was defined as: Check value (SHA-1 hash of all lines from the beginning of the file to just before this line). Followed by 32 hexadecimal digits. SBF readers should reject any SBF File that fails this validation check. The hash value may optionally be followed by a DSS digital signature.)
Digital Signature <ZX> OBSOLETE (Replaced by external PGP digital signatures. Was defined as: DSS Signature of file from <H0vv> to before <ZW>)
Registered user ID. <ZY> OBSOLETE (Not used. Was defined as: The user ID, name, organization, and check value of the registered user are encoded in this section.)
End of file <ZZ> Stop reading here.

 

Type Start Tag Stop Tag Purpose
Psalm Book Title <TB> <Tb> Mark the beginnings of the 5 books of Psalms
Comment <TC> <Tc> Ignore text in this section for normal display or conversion use. Field containing revision status or comments pertaining to editing or proofreading of this electronic edition of the Bible text.
Hebrew Title <TH> <Th> Hebrew titles of psalms. The Hebrew title should come right after the sync marker for verse 1.
Section header <TS> <Ts> Translator’s or publisher’s section headers
Book title <TT> <Tt> Full title of the current book as it is to be displayed, e.g., "The Good News According to John" or "John's First Letter"
Short book name <TN> <Tn> Short title of the current book for headers and references, e.g., "John" or "1 John"
Vernacular Book abbreviation <TA> <Ta> Abbreviation for this book in the language of the translation.
Preface <TP> <Tp> Preface or introductory material for a book.

 

Font Attributes

Text attribute tag pairs are inserted into the text as necessary to indicate text attributes like italics. These may or may not be properly represented in all Bible viewers due to platform limitations. All of these text attributes are assumed to be off at the beginning of each chapter book unless the start tag is explicitly repeated after the chapter sync mark. The stop tag for each font attribute should be explicitly inserted before the end of each book, and if appropriate, the start tag reasserted after the next book begins.

 

Attribute Start tag Stop tag
Bold (used to indicate titles in Preface or book introductions) <FB> <Fb>
Small Caps (unused) <FC> <Fc>
Italics <FI> <Fi>
Font name (unused) <FNname> <Fn>
Old Testament quote (unused) <FO> <Fo>
Red (words of Jesus) <FR> <Fr>
Superscript (unused) <FS> <Fs>
Underline (unused) <FU> <Fu>
Subscript (unused) <FV> <Fv>

 

Paragraph Attributes

These tags describe attributes of paragraphs, like justification, indenting, etc. All of these attributes are assumed to be in the default state (non-indented prose, left justified, direction left-to-right) at the beginning of each chapter unless the appropriate start tag is repeated after the chapter sync mark. Justification tags are mutually exclusive, as are direction tags.

Attribute Start Tag Stop Tag Comment
Direction left-to-right (default) (unused) <DL>   European & American languages
Direction right-to-left (unused) <DR>   Hebrew, Arabic, etc.
Direction top-to-bottom (unused) <DT>   Mandarin, etc.
Justify Center (unused) <JC>   Useful for titles.
Justify Full (unused) <JF>   Might map to left justification.
Justify Left <JL>   The default method of justification.
Justify Right <JR>   For "selah"
Indented quote (unused) <PI> <Pi> Indented extended quotation.
Poetry <PP> <Pp> Describes poetry (like in Psalms).

Information Tags

These tags provide various additional pieces of information about the text that are intended to be displayed on demand with a right mouse click, in a separate window, or at the bottom of a page. The start and stop tags indicate the range of words to which the footnote or reference applies. The sequence indicators shown as "seq" below are used to match the start and stop tags in case of overlapping references; each is a short string of numbers or letters that is guaranteed to be unique within the range of text it covers.

Type Start Tag Stop Tag Purpose
Text with an embedded footnote. <RB> <RF> The text between <RB> and <RF> is further described or has a comment or translator's note between <RF> and <Rf>. The text between <RB> and <RF> may be marked as having a hyperlink for a footnote pop-up, or may be marked with more conventional superscript indicators in printed text.
Footnote text <RF> <Rf> Embedded footnote text is between the <RF> and <Rf> tags. <RF> may or may not be preceded by <RB>.
Parallel Passage (unused) <RPseq Book ch:vs> <Rpseq> Book is a number or abbreviation without spaces.
Cross reference (unused) <RXseq Book ch:vs> <Rxseq> Book is a number or abbreviation without spaces.
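To illustrate the embedded footnote tags, here is a small Python sketch that separates footnote text (between <RF> and <Rf>) from the running text. The function name split_footnotes is hypothetical, the input is assumed to be an already tokenized list, and the <RB> marker is left in the body so a viewer can still decide how to mark the annotated words.

```python
def split_footnotes(tokens: list[str]) -> tuple[list[str], list[str]]:
    """Return (body tokens, footnote texts) from a GBF token list."""
    body: list[str] = []
    notes: list[str] = []
    current = None  # tokens of the footnote being collected, if any
    for tok in tokens:
        if tok == "<RF>":
            current = []
        elif tok == "<Rf>":
            if current is not None:
                notes.append("".join(current))
            current = None
        elif current is not None:
            current.append(tok)
        else:
            body.append(tok)
    return body, notes
```

An unmatched <Rf> is silently ignored here; a stricter reader might prefer to warn.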

 

Word information tags:

Type Tag Purpose
Strong’s Greek Number <WGnnn> Ordinal number of Greek lexicon entry for previous word. (Optional)
Strong’s Hebrew Number <WHnnn> Ordinal number of Hebrew lexicon entry for previous word. (Optional)
Interlinear translation <WIword(s)> words to be placed under the current word. (Optional)
Original Language word information <WTxxxx> xxxx are one or more characters with specific meanings that apply to the previous word’s tense, gender, number, etc.: A = aorist; P = plural; S = singular; [to be expanded]. (Not used)

Form of address <WTf> The proper name immediately preceding this tag is being addressed directly (2nd person) as opposed to referred to (3rd person). This tag is used to aid the automated conversion of God's proper name from the conventions used in the World English Bible to those used in the HNV.

Sync Marks

Verse sync marks identify the current book, chapter, and verse. Each kind of sync mark is required at the beginning of the section (book, chapter, or verse) that it identifies. Sync marks may optionally be repeated within a section. If no specific number is specified in a sync mark, the number is assumed to be one more than the previous sync mark of the same kind.

 

Kind Tag Example or comment
Book <SBxxx> For John, <SBJohn> or <SB67>
Chapter <SCxxx> For chapter 3, <SC3> or (if the last chapter was chapter 2), <SC>
Verse <SVxxx> For verse 16 (following verse 15), <SV16> or <SV>.
For verse bridges (i.e., two or more verses translated as a unit), the first and last (and optionally all intervening) verse markers are placed in order with NOTHING between them, like <SV5><SV6>, before the contents of the combined verse.
Date (unused) <SDmmdd> For April 30, <SD0430>. Not normally used in Bible texts, but may be in commentaries arranged by daily reading. If mmdd are omitted, assume next day.
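The implicit numbering rule for sync marks can be sketched in Python as below. The function name apply_sync and the (book, chapter, verse) tuple are hypothetical. One assumption goes beyond the text: a new book resets the chapter and verse counters to 0, and a new chapter resets the verse counter to 0, so that a bare <SV> after a chapter mark means verse 1. A bare <SB> can only be incremented when the previous book marker was numeric.

```python
import re

SYNC_RE = re.compile(r"<S([BCV])([^<>]*)>")

def apply_sync(pos: tuple[str, int, int], token: str) -> tuple[str, int, int]:
    """Return the position after reading token; non-sync tokens leave it unchanged."""
    m = SYNC_RE.fullmatch(token)
    if m is None:
        return pos
    kind, param = m.group(1), m.group(2).strip()
    book, chapter, verse = pos
    if kind == "B":
        # int(book) raises ValueError if the previous book was given by name.
        book = param if param else str(int(book) + 1)
        chapter, verse = 0, 0  # assumed reset, not stated in the spec
    elif kind == "C":
        chapter = int(param) if param else chapter + 1
        verse = 0  # assumed reset, not stated in the spec
    else:
        verse = int(param) if param else verse + 1
    return (book, chapter, verse)
```

Starting from ("0", 0, 0) and feeding every token through apply_sync keeps a running reference for the text that follows each mark.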

 

Book markers may be either numeric or an English name or abbreviation as follows. The STEP # is not used in this format, but is listed for reference in conversions.

Old Testament Book Name Abbreviations STEP # Book Number
Genesis Ge, Gn 1 1
Exodus Ex 2 2
Leviticus Lev, Lv 3 3
Numbers Nu 4 4
Deuteronomy De, Dt 5 5
Joshua Jos 6 6
Judges Judg, Jdg 7 7
Ruth Ru 8 8
1 Samuel 1 Sa, 1Sa 9 9
2 Samuel 2 Sa, 2Sa 10 10
1 Kings 1 Ki, 1Ki 11 11
2 Kings 2 Ki, 2Ki 12 12
1 Chronicles 1 Ch, 1Ch 13 13
2 Chronicles 2 Ch, 2Ch 14 14
Ezra Ezr 15 15
Nehemiah Ne 16 16
Esther Es 17, 95, 100 17
Job Job 18 18
Psalms Ps 19, 91, 92 19
Proverbs Pr 20 20
Ecclesiastes Ec 21 21
Song of Solomon Song,  SS 22 22
Isaiah Isa 23 23
Jeremiah Je 24 24
Lamentations La 25 25
Ezekiel Eze 26 26
Daniel Da 27, 89 27
Hosea Ho 28 28
Joel Joe 29 29
Amos Am 30 30
Obadiah Ob 31 31
Jonah Jon 32 32
Micah Mi 33 33
Nahum Na 34 34
Habakkuk Hab 35 35
Zephaniah Zep 36 36
Haggai Hag 37 37
Zechariah Zec 38 38
Malachi Mal 39 39

 

 

Apocrypha Book name Abbreviation STEP # Book #
Tobit Tob 69, 93, 98 40
Judith Judi, Jdt 70 41
Esther (Greek) GrkEs 71 (additions only); 88 (Complete) 42
Wisdom Wis 72 43
Sirach Sir 73, 94, 99 44
Baruch Bar 74, 90 45
Letter of Jeremiah Let 75 46
Prayer of Azariah and the Song of the Three Jews Azar 76 47
Susanna Sus 77 48
Bel and the Dragon Bel 78 49
1 Maccabees 1Mac 79, 96 50
2 Maccabees 2Mac 80, 97 51
1 Esdras 1Esd 81 52
Prayer of Manasseh Man 82 53
Psalm 151 P1 86 54
3 Maccabees 3Mac 84 55
2 Esdras 2Esd 83, 87 56
4 Maccabees 4Mac 85 57
Daniel (Greek, complete with additions) AddDan (none) 58

 

 

New Testament Book Name Abbreviation STEP # Book #
Matthew Mat, Mt 40 64
Mark Mar, Mk 41 65
Luke Lu, Lk 42 66
John Joh 43 67
Acts Ac 44 68
Romans Ro, Rm 45 69
1 Corinthians 1 Co, 1Co 46 70
2 Corinthians 2 Co, 2Co 47 71
Galatians Ga 48 72
Ephesians Ep 49 73
Philippians Phili, Php 50 74
Colossians Col 51 75
1 Thessalonians 1 Th, 1Th 52 76
2 Thessalonians 2 Th, 2Th 53 77
1 Timothy 1 Ti, 1Ti 54 78
2 Timothy 2 Ti, 2Ti 55 79
Titus Tit 56 80
Philemon Phile, Phm 57 81
Hebrews He 58 82
James Ja 59 83
1 Peter 1 Pe, 1Pe 60 84
2 Peter 2 Pe, 2Pe 61 85
1 John 1 Jo, 1Jo 62 86
2 John 2 Jo, 2Jo 63 87
3 John 3 Jo, 3Jo 64 (14 vs.); 67 (15 vs.) 88
Jude Jude 65 89
Revelation Re 66; 68 (18 vs. in Ch. 12) 90

 

Special Character Tags

 

These tags indicate just a single character in the text.

 

Meaning Tag Comment
ASCII character (unused) <CAxx> xx is a hexadecimal value
> (unused) <CG> Literal greater-than sign.
< (unused) <CT> Literal less-than sign.
End of paragraph <CM> Ends paragraph or line of poetry. In prose, may cause blank line and/or indent.
End of Line <CL> Ends line without ending paragraph -- used in the first line of a poetic couplet. In poetry, the next line will be indented.
Unicode character (unused) <CUxxxx> xxxx is a hexadecimal value.
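A reader might translate the single-character tags roughly as in the Python sketch below. The function name decode_special is hypothetical. Decoding <CAxx> through the cp1252 codec is an assumption based on the character set named under General Format, and <CM> and <CL> are deliberately left to the layout code, so the function returns None for them.

```python
import re
from typing import Optional

SPECIAL_RE = re.compile(r"<C(A[0-9A-Fa-f]{2}|U[0-9A-Fa-f]{4}|G|T)>")

def decode_special(token: str) -> Optional[str]:
    """Return the character a special-character tag stands for, else None."""
    m = SPECIAL_RE.fullmatch(token)
    if m is None:
        return None
    body = m.group(1)
    if body == "G":
        return ">"
    if body == "T":
        return "<"
    value = int(body[1:], 16)
    if body[0] == "A":
        # Assumed: single-byte values use the Windows ANSI (cp1252) set.
        return bytes([value]).decode("cp1252", errors="replace")
    return chr(value)  # <CUxxxx>: Unicode code point in hexadecimal
```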

 

External Note and Highlight Files

External note and highlight files may either be prepared commentaries or notations made by the individual Bible student as he or she studies the Bible using a Bible search program. When a note and highlight file is open along with a Bible text, the notes are made available to the reader by a pop-up mechanism or a separate window, or printed at the bottom of a page as footnotes. The highlights are applied to the Bible text background as it is displayed. The note and highlight files are plain ASCII text, using the same tags as the Bible text for font and paragraph characteristics, plus the following:

Meaning Tag Comment
Start of note file <HN> This tag must be first.
Note (unused) <NNref-ref> The text following is a note pertaining to the Bible text included in the references indicated. References are of the form Book ch:vs wd, where Book may be an abbreviation (without spaces) or a number, and ch and vs are numbers. The number wd is the number of the word within the verse. If wd is omitted, then the whole verse is assumed. If vs is omitted, then the whole chapter is assumed. If ch is omitted, then the whole book is assumed. If the second reference is omitted, then the reference is assumed to cover only the first reference. The beginning and ending words are considered part of the range.
Color (unused) <NC rrr ggg bbb ref-ref> Background highlight color expressed as three numbers from 0 to 255 for red (rrr), green (ggg), and blue (bbb) covering the reference indicated. The reference is interpreted just like it is for the note tag. For example, to highlight all of John 3:16 in green, the tag would be <NC 0 255 0 John 3:16>. To highlight "In the beginning" in John 1:1 with a shade of greenish blue, the tag would be <NC 0 64 255 John 1:1 1-John 1:1 3>.
End of file <ZZ> Last token of the file. Anything after this token is ignored.
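The two highlight examples above can be parsed with a sketch like the following Python code. The names Ref, parse_ref, and parse_highlight are hypothetical. The sketch assumes that book names contain no spaces or hyphens and that omitted parts of a reference are represented as None.

```python
import re
from typing import NamedTuple, Optional

class Ref(NamedTuple):
    book: str
    chapter: Optional[int]  # None means the whole book
    verse: Optional[int]    # None means the whole chapter
    word: Optional[int]     # None means the whole verse

REF_RE = re.compile(r"(\S+)(?:\s+(\d+)(?::(\d+)(?:\s+(\d+))?)?)?")

def parse_ref(text: str) -> Ref:
    """Parse a reference of the form "Book ch:vs wd" with optional parts."""
    m = REF_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"bad reference: {text!r}")
    book, ch, vs, wd = m.groups()
    as_int = lambda s: int(s) if s is not None else None
    return Ref(book, as_int(ch), as_int(vs), as_int(wd))

def parse_highlight(tag: str) -> tuple[tuple[int, int, int], Ref, Ref]:
    """Parse <NC rrr ggg bbb ref-ref> into ((r, g, b), start, end)."""
    if not (tag.startswith("<NC") and tag.endswith(">")):
        raise ValueError(f"not a color tag: {tag!r}")
    parts = tag[3:-1].split(None, 3)
    if len(parts) != 4:
        raise ValueError(f"incomplete color tag: {tag!r}")
    r, g, b = (int(p) for p in parts[:3])
    first, _, second = parts[3].partition("-")
    start = parse_ref(first)
    # An omitted second reference covers only the first reference.
    end = parse_ref(second) if second else start
    return (r, g, b), start, end
```

The same parse_ref helper would apply to the ref-ref part of the <NN> note tag, since the text says both are interpreted the same way.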
