#1 2013-06-19 09:35:54

ball
Member
From: Germany
Registered: 2011-12-23
Posts: 164

Create custom dictionary viewer

Hello there!

I've got a digital version of a very old but very good German-Latin dictionary (German users may know the Digitale Bibliothek). However, the GUI, which I have to run under Wine, is anything but KISS and doesn't fit into my workflow in a tiling WM. I just discovered that I can export the whole dictionary into different formats, notably XHTML, which should make it possible to create a more user-friendly interface.

The situation is as follows:

A file which contains all the German and Latin lemmata, sorted alphabetically (Latin and German mixed); sample:

abhinc#A139
abhold#C60768
abholen#C60769
Abholen#D60769
abholzen#C60770
abhorchen#C60771
abhorreo#A140
abhorresco#A142
abhorride#A143

A file which contains all the data; sample:

<p><a id="p0000139" /><strong>ab-hinc</strong>, Adv. I) <em>räumlich =</em> <em><strong>von hier,</strong></em> aufer abhinc lacrimas, Lucr. 3, 953 (954): toto abhinc orbe, Apul. flor. 16. – II) <em>in der Zeit:</em> 1) <em>von der gegenwärtigen Zeit rückwärts gerechnet,</em> a) <em>absol. =</em> <em><strong>von jetzt an,</strong></em> dies abhinc quintus an sextus est, cum etc., Apul.: centesimo usque abhinc saeculo, Fronto: anno abhinc tertio, Gell. – b) <em>mit</em> Acc. = <em><strong>seit nun, vor,</strong></em> abhinc annos sedecim, Caes.: abhinc annos prope viginti, Cic.: abhinc triennium, Cic., <em>Komik. u.</em> Apul. – c) <em>mit</em> Abl. = <em><strong>vor, vor ungefähr,</strong></em> qui abhinc sexaginta annis occisus foret, Plaut.: comitiis iam abhinc diebus triginta factis, Cic. <em>Vgl. (über</em> no. a <em>u.</em> b) <em>Madvig Bemerkgg. S. 65 f. Lorenz zu</em> Plaut. most. 479. – 2)<em> selten von der gegenwärtigen Zeit vorwärts gerechnet, absol. =</em> <em><strong>von jetzt an, von hier</strong></em> <em>od.</em> <em><strong>da ab,</strong></em> Pallad. 4, 13, 9. Symm. ep. 4, 59: <em>verb.</em> inde abhinc, Pacuv. 21.</p>
<div class="db5PageBreak"> </div>
<p> </p>
<p><a id="p0000140" /><strong>ab-horreo</strong>, uī, ēre, I) <em>vor etwas</em> <em><strong>zurückschaudern,</strong></em><em> etwas</em> <em><strong>verabscheuen,</strong></em> <em>gegen etwas</em> <em><strong>eine starke Abneigung haben,</strong></em> <em>von etwas</em> <em><strong>aus Abscheu oder Abneigung fernbleiben, -nichts wissen wollen,</strong></em> <em>jmdm. od. einer Sache</em> <em><strong>abhold sein,</strong></em> <em>gegen jmd. od. etwas</em><em><strong> eingenommen sein,</strong></em> <em>m.</em> ab <em>u. Abl.,</em> ab hac domo, Titin.: a pace, Caes.: ab re uxoria, Ter.: a ducenda uxore, Cic.: a Berenice, Tac.: <em>mit bl.</em> Abl. <em>(s. Nipperd. zu</em> Tac. ann. 14, 21), tanto facinore procul animo, Curt.: non abh. spectaculorum oblectamentis, Tac. – <em>nachaug. m.</em> Acc., cadaverum tabem, Suet.: pumilos, Suet.: exemplum huius modi, Dict.: <em>m.</em> Infinit., Augustin. serm. 184, 3. Porphyr. Hor. carm. 1, 1, 16. – <em>absol.,</em> sin plane abhorrebit et absurdus erit,<em> sollte er aber dazu gar keine Neigung u. Fähigkeit haben,</em> Cic. de or. 2, 85: omnes aspernabantur, omnes abhorrebant, Cic. Clu. 41: ut aut cupiant (sc. reo) aut abhorreant, Cic. de or. 2, 185: postquam abhorrere eos videt, Auct. b. Afr. 73, 5. – II) <em>übtr., gleichs. von Natur mit etwas</em> <em><strong>nicht im Einklang-, im Widerspruch stehen, unverträglich-,</strong></em> <em>ihm</em> <em><strong>zuwider sein, nicht zusagen, zuwiderlaufen,</strong></em> <em>zu etwas</em> <em><strong>nicht passen,</strong></em> <em>von etw.</em> <em><strong>abweichen,</strong></em> <em>von etw.</em> <em><strong>verschieden-, fern-,</strong></em> <em>ihm</em> <em><strong>fremd sein,</strong></em> ab oculorum auriumque approbatione, <em>den Augen und Ohren anstößig sein,</em> Cic.: oratio abhorret a persona hominis gravissimi, Cic.: abh. 
a fide, <em>unglaublich sein,</em> Liv.: consilium abhorret a tuo scelere, Cic.: spes ab effectu haud abhorrens, <em>Hoffnung der Ausführbarkeit,</em> Liv.: temeritas tanta, ut non procul abhorreat ab insania, Cic.: longe ab ista suspicione abhorrere debet, Cic.: a quo (vitae statu) mea longissime ratio voluntasque abhorret, Cic.: orationes abhorrent inter se, <em>widersprechen einander,</em> Liv.: <em>m. bl.</em> Abl. <em>(s. Nipperd. zu</em> Tac. ann. 14, 21), abhorrens peregrinis auribus carmen, Curt.: neque abhorret vero, Tac.: nec abhorrebat moribus uxor, Flor.: <em>u.m. bl.</em> Dat., huic tam pacatae profectioni abhorrens mos, Liv.: nec abhorret a veritate <em>m.</em> folg. Acc. <em>u.</em> Infin., Suet. Cal. 12, 3. – <em>Dah.</em> abhorrens, <em><strong>unpassend, unstatthaft,</strong></em> carmen nunc abhorrens, Liv.: vestrae istae absurdae atque abhorrentes lacrimae, Liv. – ☞ Abl. abhorrenti, Gell. 10, 12, 10.</p>
<div class="db5PageBreak"> </div>
<p> </p>
<p><a id="p0000142" /><strong>ab-horrēsco</strong>, ere, <em><strong>einen Abscheu bekommen,</strong></em> Eccl.<em> u.</em> Gloss.</p>
<div class="db5PageBreak"> </div>

Note how the ids of the anchor tags correspond to the numbers after the lemmata (the uppercase letter after the # may be ignored; it indicates the language or section of the lemma in the original binary file).
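The correspondence between an index entry and its anchor can be sketched as a small bash function. This is only an illustration: the name `anchor_id` is made up here, and the 7-digit zero padding is an assumption read off the samples above (p0000139, p0000140), not something the export format is known to guarantee.

```shell
#!/bin/bash
# anchor_id ENTRY: print the anchor id for an index entry such as "abhinc#A139".
# Assumption: the anchor is "p" followed by the number zero-padded to 7 digits,
# as in the samples; this is not confirmed for the whole dictionary.
anchor_id() {
    local entry=$1
    local num=${entry##*\#[A-Z]}       # drop everything up to and including "#<letter>"
    printf 'p%07d\n' "$((10#$num))"    # 10# forces base 10 if the number has leading zeros
}

anchor_id 'abhinc#A139'    # prints p0000139
```

Matching with a pattern such as `p0*139` instead sidesteps the padding assumption altogether.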

As a quick-and-dirty approach I hacked together a bash script (I am a complete beginner) that makes use of dmenu and lynx:

#!/bin/bash
# show all the lemmata in dmenu with the ids stripped (the lemmata are in the file "data/lemmata")
lemma="$(sed -e 's/#[A-Z][0-9]*$//' data/lemmata | dmenu)"
# nothing selected (dmenu was cancelled): stop here
[ -n "$lemma" ] || exit 1
# look the selected lemma up again to get the whole line including the id
str="$(grep "^${lemma}[ .0-9()]*#[A-Z][0-9]*$" data/lemmata)"
# strip everything up to and including "#<letter>", leaving only the number
id="${str#*\#[A-Z]}"
# finally extract the definition from the XML and pipe the output to lynx
awk "/<a id=\"p0*${id}\" \/>/,/<div class=\"db5PageBreak\">/{if (!/<div class=\"db5PageBreak\">/)print}" < data/dictionary.xml | lynx -stdin

The script fires up dmenu, which displays all the lemmata with the identifiers (#<letter><number>) stripped. It then searches the lemmata file for the selected lemma to recover the identifier, and finally the corresponding paragraph(s) are extracted and piped to lynx. (Using the tee program I could bind the script to a keybinding and open/close a terminal window displaying lynx.)

However, this is not robust: there are identical Latin and German lemmata:

ab#B43
ab#C60601

So the approach above must fail for such lemmata, because I don't pipe the identifiers to dmenu and the lookup returns more than one line. I could alter the list that gets piped to dmenu to

ab [de]
ab [lat]

and could then get back to the identifier...
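A minimal sketch of that labelling idea follows. The function names `label_entries` and `lookup_id` are hypothetical, and the mapping of sections A and B to Latin and C and D to German is only inferred from the samples in this post; it would need checking against the real index.

```shell
#!/bin/bash
# label_entries: read "lemma#<letter><number>" lines on stdin and print
# "lemma [lang]<TAB>number" for each of them.
# Assumption: sections A/B are Latin and C/D are German (inferred from the
# samples only); a lemma is assumed not to contain a "#" itself.
label_entries() {
    awk -F'#' '{
        sect = substr($2, 1, 1)
        lang = (sect == "A" || sect == "B") ? "lat" : "de"
        printf "%s [%s]\t%s\n", $1, lang, substr($2, 2)
    }'
}

# lookup_id LABEL: read the table produced by label_entries on stdin and
# print the number belonging to LABEL (exact string match, no regex).
lookup_id() {
    awk -F'\t' -v sel="$1" '$1 == sel { print $2; exit }'
}

# Intended use (needs dmenu and the index file from the post):
#   table=$(label_entries < data/lemmata)
#   choice=$(printf '%s\n' "$table" | cut -f1 | dmenu)
#   id=$(printf '%s\n' "$table" | lookup_id "$choice")
```

Because the lookup compares whole strings rather than building a grep pattern from the selection, lemmata containing dots or parentheses need no escaping.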

But the long-term goal would be a more suitable custom console interface. I am thinking of:

  • a terminal window which is divided into two parts, the left showing the lemmata, the right showing the definition of the lemma currently selected (think of something like a console music player, e.g. moc)

  • the possibility of an on-the-fly, dmenu-like search of the lemmata while typing

  • for the viewer part: automatic line wrapping, colored text for <strong> and <em> tags, stripping of the HTML, scrolling (a lemma definition/translation may be very long) and display of UTF-8
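Part of the viewer wish list can be approximated with standard tools before writing any C. The sketch below is an assumption-laden illustration, not a finished viewer: `render_entry` is a made-up name, the colors are arbitrary, HTML entities are not decoded, and word-aware wrapping is left out.

```shell
#!/bin/bash
# render_entry: read dictionary XHTML on stdin, turn <strong> into bold and
# <em> into cyan text using ANSI escape sequences, and strip all other tags.
render_entry() {
    local bold_on=$'\033[1m' bold_off=$'\033[22m'
    local em_on=$'\033[36m'  em_off=$'\033[39m'
    sed -e "s|<strong>|${bold_on}|g" -e "s|</strong>|${bold_off}|g" \
        -e "s|<em>|${em_on}|g"       -e "s|</em>|${em_off}|g" \
        -e 's/<[^>]*>//g'            # remove whatever tags are left
}

# Intended use: pipe an extracted entry through the filter into a pager, e.g.
#   ... | render_entry | less -R
```

`less -R` passes the color sequences through and provides scrolling; it breaks long lines at the screen edge rather than at word boundaries. Separate "off" codes are used for bold and color so that closing one tag does not cancel the other when the tags are nested.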

I am inclined to hack this in C, as I lack knowledge of any other language (as an Emacs user I also thought of using Emacs and displaying the definitions in w3m, but I don't know any Lisp other than setting variables...).

So my questions are:
Is this doable? If yes, which libraries should I consider (ncurses, libhtml, ...)?
Does anyone know a program which does something similar?
This would be my first program that makes use of menus, so any suggestions are very welcome! Any other ideas on how to access lemmata and their definitions in a simple manner?

Last edited by ball (2013-06-19 09:44:33)

#2 2013-07-04 19:10:49

deepsoul
Member
From: Earth
Registered: 2012-12-23
Posts: 67

Re: Create custom dictionary viewer

As you may have seen from the lack of replies, this is a bit hard to advise on. Things that occur to me:

  • Rather than a library, you could use regular expressions for parsing, as you already did with awk. They are available in C via POSIX functions; see the regex(3) manual page.

  • If all input you need is a decision between a small number of duplicate words, curses seems overkill.  Why not print all of them?  If you want more interaction, using curses is probably the right thing.
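The "print all of them" suggestion could look roughly like the following bash sketch. The function name `show_all` and its arguments are invented for illustration, and it assumes that all entries live in one dictionary file and are delimited by db5PageBreak lines, as in the samples in the first post.

```shell
#!/bin/bash
# show_all LEMMA INDEX DICT: print the definition of every index entry whose
# lemma equals LEMMA exactly, each preceded by a short header line.
# Assumption: every entry starts at its <a id="p..."> line and ends before
# the next db5PageBreak line, as in the samples.
show_all() {
    local lemma=$1 index=$2 dict=$3 id
    # collect the numbers of all entries with this lemma (exact string match)
    awk -F'#' -v l="$lemma" '$1 == l { print substr($2, 2) }' "$index" |
    while read -r id; do
        printf -- '--- %s (entry %s) ---\n' "$lemma" "$id"
        awk -v pat="<a id=\"p0*${id}\" />" '
            $0 ~ pat                     { on = 1 }   # entry starts here
            /<div class="db5PageBreak">/ { on = 0 }   # entry ends here
            on                                        # print while inside
        ' "$dict"
    done
}

# Intended use with the files from the first post:
#   show_all ab data/lemmata data/dictionary.xml | lynx -stdin
```

With this, a duplicate such as "ab" simply yields two definitions one after the other, and no extra selection step is needed.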

Personally I would probably use Perl for this task, which is the most productive programming language I know and is especially suited for text processing.  Seeing the rather advanced shell script you wrote (definitely more than beginner level), you may like it, as it was inspired by the shell scripts using sed, awk and the like.


Officer, I had to drive home - I was way too drunk to teleport!
