% copyright: Copyright (C) 2000, 2004, 2017 Anton Zinoviev
% title: Hyphenation patterns for Bulgarian
% version: 21 October 2017
% language:
%     name: Bulgarian
%     tag: bg
% notice: >
%     This file is part of the hyph-utf8 package.
%     See http://www.hyphenation.org/tex for more information.
% authors:
%     -
%         name: Anton Zinoviev
%         contact: anton:lml.bas.bg
% licence:
%     text: >
%         This software may be used, modified, copied, distributed, and sold,
%         both in source and binary form provided that the above copyright
%         notice and these terms are retained. The name of the author may not
%         be used to endorse or promote products derived from this software
%         without prior permission.  THIS SOFTWARE IS PROVIDES "AS IS" AND
%         ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.  IN NO EVENT
%         SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT
%         OF THE USE OF THIS SOFTWARE.
% hyphenmins:
%     typesetting:
%         left: 2
%         right: 2
% changes: See below
% ==========================================
% Copyright (C) 2000,2004,2017 by Anton Zinoviev <anton@lml.bas.bg>
%
% This software may be used, modified, copied, distributed, and sold,
% both in source and binary form provided that the above copyright
% notice and these terms are retained. The name of the author may not
% be used to endorse or promote products derived from this software
% without prior permission.  THIS SOFTWARE IS PROVIDES "AS IS" AND
% ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.  IN NO EVENT
% SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT
% OF THE USE OF THIS SOFTWARE.
%
% Bulgarian hyphenation patterns
%
% Generated by ./hyph-bg.sh --safe-morphology --standalone-tex
%
% Both left and right hyphenmins should be set to 2.
%
% % Automated Bulgarian Hyphenation
% % Anton Zinoviev
% % 21 October 2017
% 
% Principles of the Bulgarian hyphenation
% =======================================
% 
% Bulgarian words are, on average, longer than English words, so
% hyphenation matters more when typesetting Bulgarian text than when
% typesetting English text.  Knuth's line-breaking algorithm is such
% that most English paragraphs need no hyphenation at all.  With
% Bulgarian text, however, even Knuth's algorithm will use hyphenation
% in most paragraphs.  Hyphenation becomes an absolute necessity if we
% want to obtain nice, justified paragraphs in software with a simple
% line-breaking algorithm, such as LibreOffice.
% 
% According to Decree 936 of the Council of Ministers promulgated on 27
% November 1950, the Institute for Bulgarian Language at the Bulgarian
% Academy of Sciences is authorised to publish the rules of the
% orthography of the Bulgarian language (within certain limits).
% 
% Hyphenation rules between 1945 and 1983
% ---------------------------------------
% 
% Between 1945 and 1983 Bulgarian used syllable hyphenation with two
% morphological exceptions: hyphenation is preferred between a prefix
% and a stem and at the boundary of compound words.  The following were
% the rules governing the hyphenation:
% 
% 1. One letter does not stay alone.  Words of one syllable can not be
%    hyphenated.
% 2. No hyphenation before or after ь.
% 3. In a sequence of vowels at least one vowel stays before the
%    hyphen.
% 4. A single consonant between two vowels links with the second vowel.
%    For example по-ле /po-le/, ра-бо-та /ra-bo-ta/.
% 5. In a sequence of consonants between two vowels, at least one
%    consonant stays with the second vowel.  For example те-сто /te-sto/
%    or тес-то /tes-to/.[^b]
% 6. In a sequence of consonants between two vowels, if the first
%    consonant is sonorant (й /y/, л /l/, м /m/, н /n/, р /r/), then it
%    stays with the first vowel.  For example гер-дан /ger-dan/, сен-ки
%    /sen-ki/.
% 7. The hyphenation separates two successive equal consonants. For
%    example времен-но /vremen-no/, пролет-та /prolet-ta/.
% 8. When the letters дж /dzh/ and дз /dz/ denote a single consonant,
%    then they are not separated.  For example боя-джия /boya-dzhiya/
%    but not бояд-жия /boyad-zhiya/.  When these letters denote two
%    consonants, then the normal rules apply: над-живявам
%    /nad-zhivyavam/.
% 9. Word prefixes may not be broken.  Compound words are hyphenated
%    either at the boundary of the components or the hyphenation rules
%    are applied to each of the components separately.  For example:
%    пред-упреждавам /pred-uprezhdavam/ (not пре-дупреждавам
%    /pre-duprezhdavam/), пред-известие /pred-izvestie/ (not
%    пре-дизвестие /pre-dizvestie/), за-движвам /za-dvizhvam/ (not
%    зад-вижвам /zad-vizhvam/), авто-клуб /avto-klub/ (not авток-луб
%    /avtok-lub/), вакуум-апарат /vakuum-aparat/ (not вакуу-мапарат
%    /vakuu-maparat/).
% 
% In some rare cases the proper application of rule 9 depends on the
% semantics of the word.  For example пре-дреша /pre-dresha/ 'change
% clothes' but пред-реша /pred-resha/ 'predetermine' or прес-пите
% /pres-pite/ 'the snow-drifts' but пре-спите /pre-spite/ 'sleep for a
% while/overnight'.
% 
% [^b]: In several publications this rule is formulated with the
%     additional restriction that the sequence of consonants begins with
%     an obstruent.  I believe this restriction is unintentional.  It
%     makes no sense to forbid a hyphenation of the form AB-A but to
%     permit ABB-A (A denotes a vowel and B – a consonant).
% 
% Hyphenation rules between 1983 and 2012
% ---------------------------------------
% 
% The Orthographic dictionary published by the Institute for Bulgarian
% Language in 1983 introduced new hyphenation rules.  The complexity of
% the previous rules was the main reason for the change.  The new rules
% aimed at two objectives: simplicity and unambiguity.
% 
% The new rules are:
% 
% 1. A consonant between two vowels links with the second vowel.  For
%    example ви-со-чи-на /vi-so-chi-na/.
% 2. In a sequence of two or more consonants between two vowels, at
%    least one consonant stays with the first vowel and at least one
%    with the second vowel.  For example сес-тра /ses-tra/ and сест-ра
%    /sest-ra/.
% 3. Two equal consonants are separated.  For example плен-ник
%    /plen-nik/.
% 4. In a sequence of two or more vowels, the first vowel stays before
%    the hyphen.  For example пре-одолея /pre-odoleya/ and прео-долея
%    /preo-doleya/.
% 5. In a sequence of three or more vowels, the last vowel stays after
%    the hyphen.  For example мао-изъм /mao-izam/ but not маои-зъм
%    /maoi-zam/.
% 6. The letter й /y/ between a vowel and a consonant stays with the
%    vowel.  For example май-ка /may-ka/.
% 7. When a sequence of two or more consonants follows й /y/ then at
%    least one consonant links with й /y/.  For example айс-берг
%    /ays-berg/ (not ай-сберг /ay-sberg/).
% 8. The letter й /y/ between two vowels links with the second vowel.
%    For example ма-йор /ma-yor/.
% 9. No hyphenation before or after ь.
% 10. When the letters дж /dzh/ denote a single consonant, then they are
%     not separated.  For example су-джук /su-dzhuk/ (not суд-жук
%     /sud-zhuk/) but над-живея /nad-zhiveya/.
% 11. There must be at least one vowel before and after the hyphen.
% 12. One letter does not stay alone.
% 
% The total disregard of the morphology by these rules leads to some
% strange results.  For example пре-дизвестие /pre-dizvestie/ is
% permitted and пред-известие /pred-izvestie/ is forbidden, зад-вижвам
% /zad-vizhvam/ is permitted and за-движвам /za-dvizhvam/ is forbidden,
% авток-луб /avtok-lub/ is permitted and авто-клуб /avto-klub/ is
% forbidden, вакуу-мапарат /vakuu-maparat/ is permitted and
% вакуум-апарат /vakuum-aparat/ is forbidden.  Because of this, the new
% rules were not universally accepted.  The old rules are still
% mentioned in various places on the Internet; they are even included
% in some grammar books published by the publishing houses of the
% Ministry of Education and of Sofia University.  Software developers,
% however, soon fell in love with the new hyphenation rules.
% 
% Hyphenation rules after 2012
% ----------------------------
% 
% In 2012 new rules came into force.  There are two differences with
% respect to the previous rules:
% 
% 1. Rule 5 of the previous rules is revoked.  For example маои-зъм
%    /maoi-zam/ becomes a valid hyphenation.
% 2. The new rules permit morphologically based hyphenation (however it
%    is not obligatory).  For example пред-известие /pred-izvestie/,
%    за-движвам /za-dvizhvam/, авто-клуб /avto-klub/, вакуум-апарат
%    /vakuum-aparat/ are valid hyphenations.
% 
% Good hyphenation is a complex matter and it seems the linguists at the
% Institute for Bulgarian Language have recognised this.  They no longer
% attempt to provide universal rules about everything.  Instead, they
% provide some very permissive rules, while the good application of
% these rules is left to the discretion and experience of printers and
% of the developers of hyphenation software.
% 
% It makes sense to use at least two different sets of hyphenation rules
% for Bulgarian.  In most cases a more restrictive version should be
% used, one which attempts to eliminate the controversial cases of
% hyphenation.  When typesetting a Bulgarian text in a narrow newspaper
% column, however, it will be appropriate to use more liberal
% hyphenation rules.  It should be noted that one of the reasons for the
% hyphenation reform in 1983 was the desire to fix the chaotic
% hyphenation in the Bulgarian newspapers at that time.
% 
% Computer implementations
% ========================
% 
% Mathematical analysis of the Bulgarian hyphenation
% --------------------------------------------------
% 
% The earliest mathematical analysis of the Bulgarian hyphenation rules
% belongs to Veska Noncheva.[^1] In 1988 she proposed a mathematical
% formalisation of the hyphenation rules in a table with 22 rows.[^2]
% 
% [^1]: <http://www.researchgate.net/profile/Veska_Noncheva>
% 
% [^2]: Нончева В. Алгоритъм за автоматично пренасяне на думи в
%     българския език. Математика и математическо
%     образование. Сб. доклади на 17. ПК на СМБ. С., БАН, 1988, 479-482.
% 
% In the same year Eugene Belogay[^3] proposed an alternative
% formalisation with only 9 rules.[^4] Belogay proved that his rules are
% consistent and that they form a minimal set.  The rules of Belogay
% are negative in character – every hyphenation which is not forbidden
% by a rule is a possible hyphenation.
% 
% [^3]: <http://www.linkedin.com/in/belogay>
% 
% [^4]: Белогай Е. Алгоритъм за автоматично пренасяне на думи. Компютър
%     за вас (1988) 3, 12-14.
% 
% The following are the first 7 rules, as formulated by Belogay:
% 
% 1. Б-А
% 2. А-ББ
% 3. Б-ТТ, ТТ-Б
% 4. ААА-Б
% 5. й-ББ
% 6. Б-ь
% 7. д-ж
% 
% Here А denotes an arbitrary vowel letter, Б denotes an arbitrary
% consonant letter (including ь and й), ТТ denotes a sequence of two
% equal consonant letters and the letters й, ь, д and ж denote
% themselves.  For example the rule "Б-А" says that we are not permitted
% to separate a consonant letter from an immediately following vowel
% letter.
% 
% The eighth rule of Belogay says that hyphenation is forbidden before
% the first and after the last vowel letter.  The ninth rule of Belogay
% says that hyphenation is forbidden immediately after the first or
% immediately before the last letter of the word.
% 
% Notice that it is very easy to translate the rules of Belogay into
% the form required by the hyphenation algorithm of Knuth and Liang
% used in TeX.[^a] Recall that this algorithm matches the word against
% a set of string patterns in which odd numbers say that hyphenation is
% permitted at a position and even numbers say that it is forbidden.
% When two patterns give conflicting numbers for the same position,
% the greater number wins.
% 
% First, since the rules of Belogay are negative (they say where
% hyphenation is forbidden, not where it is permitted), we have to
% permit the hyphenation everywhere:
% 
% 1. А1
% 2. Б1
% 
% Then, the first seven rules of Belogay obtain the form:
% 
% 1. Б2А
% 2. А2ББ
% 3. Б2ТТ ТТ2Б
% 4. ААА2Б
% 5. й2ББ
% 6. Б2ь
% 7. д2ж
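% 
% As a worked illustration (not part of Belogay's paper), the generic
% patterns above can be instantiated for the word сестра /sestra/.
% The instantiation is an assumption made for this sketch: each letter
% X gives a pattern X1, Б2А gives с2е and р2а, and А2ББ gives е2ст.
% Writing the greatest number found at each position between the
% letters yields:
% 
% ```
% patterns:  с1 е1 с1 т1 р1 а1   с2е   е2ст   р2а
% word:      с 2 е 2 с 1 т 1 р 2 а
% result:    сес-тра, сест-ра
% ```
% 
% Only the odd positions remain as break points (the trailing 1 of а1
% falls after the last letter and has no effect), which agrees with the
% two hyphenations permitted by rule 2 of the 1983 rules.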
% 
% Since no Bulgarian word starts with more than four consonants and no
% Bulgarian word ends with more than three consonants, the eighth rule
% of Belogay can be translated in the following way:
% 
% 1. .Б2
% 2. .ББ2
% 3. .БББ2
% 4. 2Б.
% 5. 2ББ.
% 
% The ninth rule of Belogay means that left and right hyphen mins should
% be set to 2.
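% 
% To see what these five patterns add, consider the one-syllable word
% страх /strah/, an illustrative case chosen for this sketch.  Without
% them, only Б2А (here р2а) forbids anything; with them, the instances
% .с2, .ст2, .стр2 and 2х. also match:
% 
% ```
% without the eighth rule:  с 1 т 1 р 2 а 1 х   (с-т-ра-х)
% with the eighth rule:     с 2 т 2 р 2 а 2 х   (no break points)
% ```
% 
% In plain TeX the ninth rule corresponds to the following settings of
% the two standard integer parameters; language packages usually set
% them on the user's behalf:
% 
% ```tex
% \lefthyphenmin=2
% \righthyphenmin=2
% ```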
% 
% The work of Eugene Belogay was not limited to merely a mathematical
% analysis of the Bulgarian hyphenation rules.  In his paper he
% published a short algorithm in Pascal which implements these rules.
% It didn't take long for this algorithm to be used in various text
% processing software.  The algorithm of Belogay was famous for many
% years.  Even as late as 1997 in one book about TeX, the author didn't
% care to give any explanations but simply wrote about "the algorithm of
% Belogay" as something well known to the reader.[^5]
% 
% [^a]: Liang, Franklin Mark. Word Hy-phen-a-tion by
%     Com-put-er (Doctoral Dissertation). Stanford University, 1983
% 
% [^5]: Василев В. Ултимативният ТеХ.  Удоволствието да правим
%     предпечатна подготовка сами. София, Интела, 1997, 36
% 
% Bulgarian hyphenation in TeX
% ----------------------------
% 
% One unfortunate design decision of Knuth was that the hyphenation
% algorithm of TeX applied the hyphenation patterns not to the input
% character codes but to the internal codes of the glyphs in the font.
% This created a problem for the Cyrillic languages because in TeX the
% Cyrillic fonts did not have a standardised encoding.  Perhaps this is
% one of the reasons why the earliest implementations of the Bulgarian
% hyphenation in TeX did not rely on the internal hyphenation algorithm
% of TeX.  Instead, external tools were used to insert soft hyphens in
% all Bulgarian words.  For example such a tool would replace the word
% сричкопренасяне /srichkoprenasyane/ with
% срич\\-коп\\-ре\\-на\\-ся\\-не /srich\\-kop\\-re\\-na\\-sya\\-ne/.
% The saying "To every disadvantage there is a corresponding advantage"
% is true – since Cyrillic and Latin letters use different character
% codes, an external tool could easily insert soft hyphens in all
% Bulgarian words while leaving the TeX commands intact.
% 
% The earliest known attempt to use the hyphenation algorithm of TeX for
% Bulgarian was made by Ognyan Tonev in 1990.[^6] He described his work
% as "a not very good translation of the rules.  I work in this
% direction.  But I don't have a 100% working complect of patterns.  So,
% the copy I send to you[^7] is only a beta-version."  The hyphenation
% patterns of Tonev don't work correctly and it seems he never completed
% his work.
% 
% [^6]: The author of this text was unable to find current information
%     about Ognyan Tonev on the Internet.  Apparently in 1990 he worked
%     in the Center of Informatics and Computer Technology of the
%     Bulgarian Academy of Sciences.
% 
% [^7]: To Yannis Haralambous,
%     <http://perso.telecom-bretagne.eu/yannisharalambous>
% 
% The first usable Bulgarian hyphenation patterns for TeX were developed
% by Georgi Boshnakov[^8] in 1994.  In order to solve the encoding
% problem, Boshnakov had developed TeX fonts supporting the MIK encoding
% (the prevalent encoding at that time in Bulgaria).  This allowed him
% to introduce a fully working implementation only a few months after
% LaTeX2e became the official LaTeX version.  Later Boshnakov adapted
% his work to the Babel system.  The hyphenation patterns of Boshnakov
% did their job well enough that for almost a quarter of a century
% after their initial creation they remained the only Bulgarian
% hyphenation patterns in the standard distributions of TeX and CTAN.
% 
% [^8]: <http://www.maths.manchester.ac.uk/~gb/>
% 
% There are some similarities between the patterns of Boshnakov and the
% patterns of Belogay.  The following are the main differences.
% 
% First, Boshnakov used an ingenious and more compact implementation of
% the second and the third rule.  Instead of {А2ББ, Б2ТТ, ТТ2Б}, or
% 8×22×22+22×22+22×22=4840 patterns in total, Boshnakov has patterns of
% the form 2Б3Б2 and 4Т3Т4, or only 22×22=484 in total, with the same
% effect.
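% 
% The counts can be checked as follows, assuming (as the figures
% suggest) 8 vowel letters and 22 consonant letters, the latter
% including ь and й:
% 
% $$8\cdot 22\cdot 22 + 22\cdot 22 + 22\cdot 22 = 3872 + 484 + 484 = 4840$$
% 
% $$22\cdot 22 = 484$$
% 
% As a sketch of why the effect is the same, take the word сестра
% /sestra/ with the illustrative instances 2с3т2 and 2т3р2 of 2Б3Б2,
% together with с2е and р2а (instances of Б2А) and a pattern X1 for
% every letter X:
% 
% ```
% word:    с 2 е 2 с 3 т 3 р 2 а
% result:  сес-тра, сест-ра
% ```
% 
% The leading 2 of 2с3т2 forbids the break before the consonant
% sequence, which is the job of А2ББ, while the 3 keeps the break
% inside the sequence permitted.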
% 
% The second main difference between the patterns of Boshnakov and the
% patterns of Belogay concerns the letter combination дж /dzh/.  In
% Bulgarian this letter combination can denote either a single
% consonant or a sequence of two consonants, and the hyphenation rules
% change accordingly.  Unfortunately, it is impossible to know the
% meaning of дж /dzh/ without a vocabulary.  The solution of Belogay was
% a cautious one – his rules do the hyphenation in a way which will be
% correct regardless of whether дж /dzh/ is a single consonant or a
% sequence of two consonants.  On the other hand, the approach of
% Boshnakov is a bold one – since дж /dzh/ is more often a single
% consonant, his rules assume that it is always a single consonant.  The
% number of the cases when this decision leads to bad hyphenations is
% insignificant in comparison with the cases in which we obtain improved
% hyphenation.
% 
% The third main difference between the patterns of Boshnakov and the
% patterns of Belogay concerns the eighth rule – its implementation in
% the rules of Boshnakov is rather limited which leads to wrong
% hyphenations like бри-дж /bri-dzh/.  A full implementation of this
% rule would require 11660 patterns in total and this would be too much
% for the computers in 1994.
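% 
% The figure of 11660 is consistent with counting one pattern for every
% combination of consonant letters in the five pattern shapes of the
% eighth rule (.Б2, .ББ2, .БББ2, 2Б. and 2ББ.), assuming 22 consonant
% letters:
% 
% $$22 + 22^2 + 22^3 + 22 + 22^2 = 22 + 484 + 10648 + 22 + 484 = 11660$$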
% 
% Later developments
% ------------------
% 
% In 1995 Atanas Topalov defended a Master's thesis in the Faculty of
% Mathematics and Informatics at Sofia University titled "Algorithms and
% software about text processing".[^9] One of the main topics in his
% thesis was the Bulgarian hyphenation.  Topalov criticised vehemently
% the official hyphenation rules and their total disregard of the
% morphology.  He wrote:
% 
% > If we look at the history of the problems of the hyphenation, we
% > will discover something very strange.  Instead of the expected
% > involvement with the depths and aspiration for more admissible and
% > satisfactory style, we can find a growing tendency for
% > simplification.  One unpleasant discovery is that the development of
% > the hyphenation software stays firmly on the principle "let us do
% > the easiest thing".  The earliest works which have been studied are
% > from 1978.  It turned out that they present the best approach
% > concerning the automated hyphenation.  The authors have chosen the
% > most difficult but the most correct (from literary point of view)
% > method for hyphenation, namely the morphological approach.
% 
% Topalov proposed his own hyphenation algorithm.  The hyphenation it
% generated was smooth and easy to read.  One obvious defect of the
% algorithm of Topalov was that it contradicted the official hyphenation
% rules at that time.  One can argue, however, that his algorithm is
% compatible with the current hyphenation rules.
% 
% [^9]: The thesis of Atanas Topalov can be accessed at the author's
%     website <http://www.mind-print.com>
% 
% In 1999 Svetla Koeva[^10] wrote a paper about the automated Bulgarian
% hyphenation.[^11] At that time she was a junior member of the
% Department of Computational Linguistics at the Institute for Bulgarian
% Language, but now she is the director of the whole institute.  The paper
% of Koeva contains a list of hyphenation patterns which can be used as
% a basis of automated hyphenation.  In 2004 with the help of Stoyan
% Mihov[^12] the rules of Koeva were formalised with regular relations
% and rewriting rules.  They were implemented in a software product
% named ItaEst which provided Bulgarian hyphenation and grammar checking
% for various software products of Microsoft and Apple.
% 
% [^10]: <http://dcl.bas.bg/svetla_koeva/>
% 
% [^11]: Коева, Светла. Правила за пренасяне на части от думите на нов
%     ред. Български език. 1999/2000, 1, 84-86
% 
% [^12]: <http://lml.bas.bg/~stoyan/>
% 
% The main difference between the hyphenation of Koeva and the official
% hyphenation rules effective after 2012 is that the separation of a
% long sequence of consonants between two vowels is done according to
% the rules valid before 1983.  For example се-стра /se-stra/ and
% ай-сберг /ay-sberg/ are permitted.  The main difference between the
% hyphenation of Koeva and the official hyphenation rules effective
% before 1983 is that the rules of Koeva disregard the morphology of the
% words.  The following rule of Koeva is specific: in a sequence of two
% sonorant consonants between two vowels, we are permitted to separate
% the first vowel from the first consonant, for example материа-лна
% /materia-lna/.
% 
% In 2000 Anton Zinoviev[^13] created new hyphenation patterns for TeX.
% He didn't know about the previous work of Boshnakov and he didn't
% bother to make his work available in the various TeX distributions and
% CTAN.  His work was used mostly by the local Linux enthusiasts and the
% colleagues of Zinoviev.  In 2001 Radostin Radnev[^14] created a free
% grammar dictionary of Bulgarian[^15] where he used the hyphenation
% patterns of Zinoviev.  From there the work of Zinoviev propagated to
% OpenOffice, LibreOffice and various online dictionaries, including
% <http://bg.wiktionary.org> and <http://rechnik.chitanka.info>.
% 
% [^13]: The author of this text.
% 
% [^14]: <http://bg.linkedin.com/in/radostinradnev>
% 
% [^15]: <http://bgoffice.sourceforge.net/>
% 
% The following are the main differences between the hyphenation of
% Zinoviev and the hyphenation of Boshnakov.
% 
% First, the eighth rule of Belogay is fully implemented.
% 
% Second, the rules of Zinoviev try to detect when the letters дж /dzh/
% (and дз /dz/) denote a single consonant and when they denote a
% sequence of two consonants.  By default, however, Zinoviev (like
% Boshnakov) assumes that дж /dzh/ is a single consonant and hyphenates
% accordingly.
% 
% Third, the rules of Zinoviev disable some cases of unpleasant
% hyphenations:
% 
% 1. In a consonant sequence like тст /tst/, the two equal consonants т
%    /t/ are separated.  For example братст-во /bratst-vo/ is forbidden
%    while братс-тво /brats-tvo/ and брат-ство /brat-stvo/ are
%    permitted.
% 2. The hyphenation is forbidden after a sonorant consonant following
%    an obstruent consonant.  For example отм-ра /otm-ra/ is forbidden
%    and от-мра /ot-mra/ is permitted.
% 3. The hyphenation separates two consecutive kindred voiced/voiceless
%    consonants.  For example субп-родукт /subp-rodukt/ is forbidden and
%    суб-продукт /sub-produkt/ is permitted.
% 
% At the start of his work on the Bulgarian hyphenation, Zinoviev had
% the opportunity to discuss the hyphenation with Svetla Koeva.  He
% remembers that some cases of unpleasant hyphenation were suggested to
% him by Koeva.  Unfortunately, he did not take notes, so he no longer
% knows which cases were suggested by Koeva and which are his own
% findings.
% 
% The present work
% ================
% 
% Motivation
% ----------
% 
% The present work was carried out on the initiative of the leader of
% the Bulgarian localisation team of Mozilla,[^16] who contacted
% Zinoviev, Boshnakov and the maintainers of the TeX hyphenation
% patterns.[^17]
% This work pursues the following main objectives:
% 
% 1. to update the hyphenation patterns in accordance with the current
%    hyphenation rules;
% 2. to generate the hyphenation patterns by a publicly available
%    script;
% 3. to make the hyphenation patterns customisable;
% 4. to provide documentation for the future developers.
% 
% [^16]: <http://mozillians.org/en-US/u/stoyan/>
% 
% [^17]: <http://hyphenation.org>
% 
% The current official hyphenation rules for Bulgarian are rather
% liberal.  Very often, in a long sequence of consonants we are
% permitted to split the word at any position, for example аген-т-с-т-во
% /agen-t-s-t-vo/.  This is prone to many unusual and unexpected results
% that interrupt the attention of the reader or deceive his expectations
% during the movement of his eyes to the next line.  On the other hand,
% in order to produce nice justified paragraphs there is no need for so
% many hyphenation possibilities.  It would be sufficient even if only
% one possible separation between any two syllables was permitted.
% 
% Therefore, it makes sense to use a more restrictive version of the
% Bulgarian hyphenation, one which eliminates the controversial cases of
% hyphenation.  Only when typesetting a Bulgarian text in a very narrow
% newspaper column will it be appropriate to use a more liberal version.
% It should be noted that some specialised English dictionaries also
% separate the word-division positions into two categories – preferred
% positions and less recommended positions.
% 
% There are two methods to determine the optimal division within a
% sequence of consonants between two vowels:
% 
% * we can hyphenate according to the syllables in the word or
% * we can hyphenate morphologically.
% 
% Hyphenation according to the syllables in the word
% --------------------------------------------------
% 
% Let us look at the properties of the Bulgarian syllables.  All
% syllables have the following structure:
% 
% > onset - nucleus - coda
% 
% The nucleus in Bulgarian is always a vowel.  Both the onset and the
% coda are (possibly empty) sequences of consonants.
% 
% The Bulgarian syllables adhere to the Sonority Sequencing Principle.
% According to this principle, the consonants within the onset have
% rising sonority and the consonants within the coda have decreasing
% sonority.
% 
% Several grammar books agree that the following sonority scale is valid
% for Bulgarian:
% 
% > voiceless obstruent < voiced obstruent < sonorant consonant < vowel
% 
% According to the investigations of the author, the only exception to
% this law is due to the letter в /v/, which is a voiced obstruent but
% can also be used as a voiceless obstruent.  This exception is due to a
% spelling particularity of the Bulgarian language.  Whenever the letter
% в /v/ seemingly violates the Sonority Sequencing Principle, in the
% spoken language this letter is read as ф /f/, that is as a voiceless
% obstruent (for example the word отвсякъде /otvsyakade/ is read as
% отфсякъде /otfsyakade/).[^18]
% 
% [^18]: No Primitive Slavonic word contains the phoneme ф /f/.
% Therefore, we can safely assume that in the Primitive Slavonic
% language the consonant ф /f/ was a positional variant of the consonant
% в /v/.
% 
% The author has found that the sonorant consonants in Bulgarian have
% their own sonority scale:
% 
% > м /m/ < н /n/ < л /l/ < р /r/ < й /y/
% 
% Only a few words such as жанр /zhanr/ and химн /himn/ violate this
% scale.  Such words are always loan-words and their pronunciation is
% somewhat problematic for the native Bulgarian speakers.
% 
% In addition to the Sonority Sequencing Principle, the consonant
% clusters within the Bulgarian syllable adhere to the following
% additional principles:
% 
% 1. Both in the onset and in the coda, the labial and dorsal plosives
%    precede the coronal plosives and affricates.
% 2. If the onset or the coda contains two plosives or affricates, then
%    there are no fricatives between them.  A few words with the Latin
%    root 'text' are exceptions: контекст /kontekst/.
% 3. If the onset or the coda contains two fricatives other than в /v/,
%    then there are no plosives or affricates between them.
% 4. If the onset or the coda contains two plosives or affricates, then
%    they both have equal sonority (both are voiced, or both are
%    voiceless).
% 5. If the onset or the coda contains two fricatives other than в /v/,
%    then they both have equal sonority (both are voiced, or both are
%    voiceless).
% 6. Neither the onset nor the coda may contain two labial plosives,
%    two coronal plosives or affricates, or two dorsal plosives.
% 7. Neither the onset nor the coda may contain two equal consonants,
%    with the exception of в /v/ (for example втвърди /vtvardi/).[^19]
% 
% [^19]: Actually, the letter в /v/ is not a real exception because in
% all such cases this letter denotes two different consonants – в /v/
% and ф /f/.  Only in the Russian loan-word взвод /vzvod/ the two
% letters в /v/ denote a repeating consonant в /v/.
% 
% From all these properties of the Bulgarian syllable we can deduce the
% following hyphenation rules:
% 
% 1. In a sequence МК where М is a consonant with higher sonority than
%    K, we are not permitted to hyphenate before М.  Exception: when М
%    is в /v/ and К is a voiceless consonant.
% 2. In a sequence КМ where М is a consonant with higher sonority than
%    K, we are not permitted to hyphenate after М.
% 3. In a sequence KBT where K and T are plosives or affricates and B is
%    fricative, we separate K from T.
% 4. In a sequence CKB where K is a plosive or affricate and C and B are
%    fricatives other than в /v/, we separate C from B.
% 5. If in a consonant sequence a coronal plosive or affricate Т is
%    followed by a labial or dorsal plosive К, then we separate Т from К.
% 6. If a consonant sequence contains two plosives or affricates, one
%    voiced and one voiceless, then we separate them.
% 7. If a consonant sequence contains two fricatives other than в /v/,
%    one voiced and one voiceless, then we separate them.
% 8. If a consonant sequence contains two labial plosives or two coronal
%    plosives or affricates or two dorsal plosives then they are
%    separated.
% 9. If a consonant sequence contains two equal consonants (not
%    necessarily consecutive), then they are separated.
% 
% With so many prohibitive rules, a question arises: if we apply all
% these rules, aren't we going to eliminate too many hyphenation
% possibilities?  The answer is no.  It can be demonstrated that between
% any two consecutive syllables at least one separation point will be
% permitted.
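% 
% As an illustration of rule 2, consider the word отмра /otmra/, where
% the sonorant м /m/ follows the voiceless obstruent т /t/.  The
% pattern тм2 below is a hypothetical instance written for this sketch
% and is not necessarily one of the patterns in this file; о2тм and
% р2а are instances of the generic patterns А2ББ and Б2А, and every
% letter X also contributes X1:
% 
% ```
% patterns:  о1 т1 м1 р1   о2тм   тм2   р2а
% word:      о 2 т 1 м 2 р 2 а
% result:    от-мра
% ```
% 
% The only remaining break point keeps the cluster мр /mr/ as the onset
% of the second syllable, in line with its rising sonority.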
% 
% 
% Hyphenation according to the morphology
% ---------------------------------------
% 
% Between 1983 and 2012 the official orthographic rules of the
% Bulgarian language forbade morphologically based hyphenation.  After
% 2012 such hyphenation is permitted (but not obligatory).
% 
% The most important case when it is very desirable to use
% morphologically based hyphenation is the case of the compound words.
% Divisions such as авток-луб /avtok-lub/ and вакуу-мапарат
% /vakuu-maparat/ are extremely irritating even if they are formally
% correct.  Unfortunately, we do not have a vocabulary of the compound
% Bulgarian words that would permit us to produce rules for automated
% hyphenation.  Therefore, the current Bulgarian hyphenation patterns do
% not attempt to apply morphological hyphenation to such words.
% 
% Second in importance (but far more significant in terms of numbers) is
% the case with the word prefixes.  While the eyes of the reader still
% look at the start of the word, the word is still unknown to him.  At
% this point, it is very important not to deceive his expectations.  For
% example, when the reader sees над- /nad-/ at the end of the line, he
% will expect that this is the prefix над- /nad-/ with semantics 'attain
% more than'.  This expectation will be frustrated if this is not
% really a prefix but a deceptive (though formally correct) hyphenation
% of the word надремя /nadremya/ 'have dozed enough', where the real
% prefix is not над- /nad-/ but на- /na-/ with semantics 'achieve a
% state after accumulation'.  Such hyphenation distracts the reader and
% makes the reading more difficult.
% 
% Third in importance is the case with the word suffixes.  With respect
% to the hyphenation rules we can divide the suffixes into three
% categories:
% 
% 1. Suffixes starting with a vowel, for example -ар /-ar/.  It is not
%    appropriate to follow the morphology with such suffixes because
%    this will contradict the whole hyphenation tradition of the
%    Bulgarian language.  For example крав-ар /krav-ar/ is unwarranted.
% 2. Suffixes starting with one consonant, for example -ка /-ka/.
%    Usually with such suffixes the syllable boundary in the word
%    coincides with the morpheme boundary, so no special care is
%    necessary, for example кравар-ка /kravar-ka/.  The exceptions are
%    rare, for example: обек-тната /obek-tnata/ instead of обект-ната
%    /obekt-nata/.
% 3. Suffixes starting with more than one consonant (-ски /-ski/, -ство
%    /-stvo/).  It is possible to use morphological hyphenation rules
%    with such suffixes.
% 
% Although morphological hyphenation is possible with the suffixes of
% the third category, it turns out to be less useful than it is with
% prefixes.  When the eyes of the reader have
% reached this part of the word, the word is already more or less known
% to the reader.  Therefore, at this point the morphological hyphenation
% does not provide any significant advantages in comparison to the
% simpler hyphenation based only on the syllables in the word.  Consider
% for example the word геройс-тво /geroys-tvo/ with suffix -ство
% /-stvo/.  When the reader sees геройс- /geroys-/ at the end of the
% line this will give him an early clue that the suffix of the word is
% -ство /-stvo/.  Such non-morphological hyphenation does not deceive
% the expectations of the reader.  On the contrary, it makes the reading
% easier because it gives clues to the reader about what follows on the
% next line.
% 
% Because of these considerations, the current Bulgarian hyphenation
% patterns do not attempt to use morphological hyphenation with respect
% to the suffixes of the words.  It would nevertheless be useful to
% implement rules for the suffixes of the second category.  Hopefully,
% some future version will have such rules.
% 
% Occasionally,[^20] a fourth morphological requirement is stated: that
% hyphenation should respect the boundary between the word and the
% definite articles -та /-ta/ and -те /-te/ (postfixed in Bulgarian).
% There is no need to pay attention to this rule because it appears to
% be satisfied automatically.  The author has searched a dictionary
% with over 860,000 Bulgarian words for cases where the hyphenation
% rules would hyphenate badly with respect to the definite article.  He
% was unable to find even one such case with the hyphenation rules valid
% after 1983 and only about 10 cases with the rules valid before 1983
% (one of them is живопи-ста /zhivopi-sta/ instead of живопис-та
% /zhivopis-ta/).
% 
% One unavoidable characteristic of any morphologically based automated
% hyphenation is that it can create wrong hyphenations.  Because of
% this, one useful option is to use the morphology in a safe way – to
% use it in order to forbid bad hyphenations but to create no new
% hyphenation possibilities solely on the basis of the morphology.
% 
% Take for example the word дозрея /dozreya/ 'ripen fully'.  According
% to the phonological rules, we should hyphenate it as доз-рея
% /doz-reya/.  According to the morphology, however, we should hyphenate
% as до-зрея /do-zreya/ because this word is formed with the prefix до-
% /do-/ with semantics 'complete or supplement' and this semantics would
% be lost if the reader sees доз- /doz-/ at the end of the line.
% Therefore, there are three methods to hyphenate this word:
% 
% 1. доз-рея /doz-reya/ when morphology is not used;
% 2. до-зрея /do-zreya/ when morphology is fully used;
% 3. дозрея /dozreya/ (no hyphenation) when morphology is used in a safe
%    way.
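% 
% The three methods above can be sketched as operations on sets of
% break positions.  The Python fragment below is only an illustration
% under stated assumptions: the names `break_positions` and `show`, the
% mode names, and the hard-coded position sets are hypothetical and are
% not taken from `hyph-bg.sh`, which works with hyphenation patterns
% rather than with explicit sets.
% 
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.python}
% def break_positions(phonological, forbidden, created, mode):
%     """Return the set of permitted break positions for one word.
%
%     phonological -- positions permitted by the syllable-based rules
%     forbidden    -- phonological positions that contradict the morphology
%     created      -- positions justified only by the morphology
%     All three sets are assumed inputs for this illustration.
%     """
%     if mode == "none":          # morphology is not used
%         return set(phonological)
%     kept = set(phonological) - set(forbidden)
%     if mode == "safe":          # morphology may only remove positions
%         return kept
%     if mode == "full":          # morphology may also add positions
%         return kept | set(created)
%     raise ValueError("unknown mode: " + mode)
%
%
% def show(word, positions):
%     """Insert '-' before every character whose index is a break position."""
%     return "".join(("-" if i in positions else "") + ch
%                    for i, ch in enumerate(word))
%
%
% # Hypothetical position sets for the word discussed above.
% word = "дозрея"
% for mode in ("none", "full", "safe"):
%     print(mode, show(word, break_positions({3}, {3}, {2}, mode)))
% # none доз-рея
% # full до-зрея
% # safe дозрея
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% 
% In this model the safe mode can only remove positions, which is why it
% cannot introduce a wrong hyphenation that the phonological rules
% would not already have permitted.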
% 
% The option to use the morphology in a safe way is very attractive when
% the software uses a smart line-breaking algorithm which can produce
% good results even with fewer hyphenation possibilities.  TeX is one
% such program.  It should be noted that this option does not eliminate
% too many hyphenation possibilities because the morpheme boundaries
% most of the time are also syllable boundaries.
% 
% [^20]: Правописен и правоговорен наръчник. Състав. Иван Хаджов,
%     Цв. Минков; Ред. Ив. Хаджов и др. София, Бълг. кн., 1945
% 
% The following statistics describe the quality of the morphological
% rules (the number after the ± sign is the expected standard deviation
% of our estimates):
% 
% With the option `--morphology`:
% 
% * in 0.1% ±0.3% of the dictionary words the morphological patterns
%   create very wrong hyphenation;
% * in 89.8% ±0.1% of the dictionary words the morphological patterns
%   hyphenate identically with the case when no morphology patterns are
%   used;
% * in 0.3% ±0.2% of the dictionary words the morphological patterns
%   hyphenate differently in comparison to the case when no morphology
%   patterns are used and the word is hyphenated in a way which
%   contradicts the morphology;
% * in 0.6% ±0.1% of the dictionary words the morphological patterns
%   hyphenate differently in comparison to the case when no morphology
%   patterns are used and there is a possible hyphenation which is
%   compatible with the word morphology but which is nevertheless
%   forbidden by the morphology patterns.
%   
% With the option `--safe-morphology`:
% 
% * in 0% of the dictionary words the morphological patterns create very
%   wrong hyphenation;
% * in 90.0% ±0.1% of the dictionary words the morphological patterns
%   hyphenate identically with the case when no morphology patterns are
%   used;
% * in 0.3% ±0.2% of the dictionary words the morphological patterns
%   hyphenate differently in comparison to the case when no morphology
%   patterns are used and the word is hyphenated in a way which
%   contradicts the morphology;
% * in 0.6% ±0.1% of the dictionary words the morphological patterns
%   hyphenate differently in comparison to the case when no morphology
%   patterns are used and there is a possible hyphenation which is
%   compatible both with the word morphology and with the syllable
%   boundaries but which is nevertheless forbidden by the morphology
%   patterns.
%   
% Notice that the morphological patterns create a different hyphenation
% in only about 10% of the words.  The following explanation can be
% given for this surprising fact.  First, the natural evolution of
% human languages tends to simplify complex sequences of consonants.
% Therefore, no morpheme contains a complex sequence of consonants.
% Second, the Bulgarian orthography is morphological.  This means that
% the morphemes are written according to their actual pronunciation,
% but the simplifications in the spoken language which take place
% at the morpheme boundaries are not taken into account in the
% orthography.  The independent operation of these two factors leads to
% the result that most of the time the morpheme boundaries coincide with
% the conventional syllable boundaries.  The main exception is when a
% morpheme starts with a vowel: in this case its syllable will include
% one or more consonants of the preceding morpheme.  The second
% exception is when a morpheme ends with a vowel and the next morpheme
% starts with a sequence of two or more consonants.
% 
% Usage of the script `hyph-bg.sh`
% --------------------------------
% 
% `hyph-bg.sh` is an all-in-one script which can generate both the
% documentation (this text) and the Bulgarian hyphenation patterns.  When
% given the option `--help` the script prints short usage instructions:
% 
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% hyph-bg.sh --help
%           Show this info
% hyph-bg.sh [--doc-html | --doc-latex | --doc-txt]
%           Print documentation in various formats
% hyph-bg.sh [other options]
%           Generate Bulgarian hyphenation patterns
% 
% Options when generating hyphenation patterns:
% 
%  --standalone-tex
%           Produce hyphenation patterns for TeX with \patterns{ ... }.
% 
%  --no-hyphen-mins
%           Hyphenation patterns which do not require hyphen mins.
%           Otherwise: both left and right hyphen mins should be set to 2.
% 
%  --safe-dz
%           Do not try to guess whether DZ is a single consonant or not.
%           Only use hyphenation which will be correct in both cases.
% 
%  --permissible
%           Permit any formally correct hyphenation, including unnatural
%           divisions, such as studen-tstvo.  Useful for educational tools
%           or when typesetting Bulgarian text in a very short column.
% 
%  --morphology
%           Apply morphology when hyphenating, for example: za-dvizhvam.
%           May hyphenate incorrectly in some cases.
% 
%  --safe-morphology
%           Apply morphology when hyphenating.  Never hyphenates incorrectly
%           but may prohibit some correct hyphenations.
% 
%  --no-morphology
%           Disregard the morphology.  Default.
% 
%  --1945
%           Hyphenate according to the rules effective between 1945 and 1982
% 
%  --1983
%           Hyphenate according to the rules effective between 1983 and 2011
% 
%  --2012
%           Hyphenate according to the rules effective after 2012.  Default.
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% 
% The following are the recommended ways to generate hyphenation
% patterns by this script:
% 
% `hyph-bg.sh --standalone-tex --safe-morphology`
% :   For TeX.  Apply the morphology in a safe way when the software
%     uses a smart line-breaking algorithm.
% 
% `hyph-bg.sh`
% :   For most other software.
% 
% `hyph-bg.sh --no-hyphen-mins`
% :   The current versions of Mozilla (as of 2017) seem to ignore the
%     hyphen mins in words that contain a dash.
% 
% `hyph-bg.sh --morphology`
% :   For professional typography with a human proofreader.
% 
% `hyph-bg.sh --permissible`
% :   For educational tools and online dictionaries which can show only one
%     kind of hyphenation.
% 
% Notice that some specialised English dictionaries separate the
% word-division positions into two categories: preferred positions and
% less recommended positions.  It would be best if the Bulgarian online
% dictionaries could do the same.  For example, a hyphen "-" can be used
% to display the preferred positions and a dot "." the less recommended
% positions.  If a word-division position is permitted only by the
% patterns of `hyph-bg.sh --permissible`, then this position is less
% recommended.
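% 
% A dictionary could derive such a display from two sets of break
% positions, one obtained with the default patterns and one with the
% `--permissible` patterns.  The following Python fragment is a minimal
% sketch under that assumption; the function name `mark_positions` and
% the position sets in the example are hypothetical and are not actual
% output of the patterns.
% 
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ {.python}
% def mark_positions(word, default_breaks, permissible_breaks):
%     """Mark preferred breaks with '-' and less recommended ones with '.'.
%
%     A position is preferred if the default patterns permit it, and less
%     recommended if only the permissible patterns permit it.
%     """
%     out = []
%     for i, ch in enumerate(word):
%         if i in default_breaks:
%             out.append("-")
%         elif i in permissible_breaks:
%             out.append(".")
%         out.append(ch)
%     return "".join(out)
%
%
% # Hypothetical position sets, chosen only to illustrate the notation.
% print(mark_positions("studentstvo", {3, 7}, {3, 6, 7}))
% # stu-den.t-stvo
% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~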
% 

\message{Bulgarian hyphenation patterns (options: --safe-morphology --standalone-tex, version 21 October 2017)}
