A verb conjugator in C (was: Ideas for an open source project in C)

wuischke · 2007-12-26 20:29:42

Hi,

I have to write a relatively complex program in C for my university programming course. We can either use a given topic (hotel management software - quite some work, but not very challenging) or pick something of our own, provided it is not too simple.

Being a good open source citizen, I want to do something from which other people might profit.

My first thought was an xmms2 client, based on an idea I've had for a long time, but that's no fun without objects, particularly because xmms2 uses boost as well. I'm not sadistic enough to do this. (I like object-oriented programming... :( )

We can use external libraries (e.g. ncurses, GTK2), but we should use ANSI C (that's C89) and the program should run on Windows as well. I might be able to talk my way out of these two requirements if necessary.

I'll do a lot of brainstorming myself, but I would appreciate it a lot if you could give me ideas. It would be a pity to waste this opportunity on useless hotel management software if I could create something useful instead.

Thanks a lot for any ideas.

kind regards

Last edited by wuischke (2007-12-27 11:54:57)

Eradest · 2007-12-26 20:59:50

If I could just have what I want... I really miss an audio player for Linux that's similar to foobar2000 on Windows (www.foobar2000.org). It supports things like flexible tagging (i.e. adding your own custom tags to audio files, very handy e.g. for classical music), ReplayGain, mass tagging (guessing tags from filenames and vice versa) etc.

But I think this would indeed be a very large project, and it would probably be easier to do in an OOP language like C++ than in plain C.
I'm not sure about the sound support, but you could write the GUI in e.g. Qt, which would also make it portable to Windows.

just an idea...

Allan · 2007-12-26 21:49:09

Really, you need to find something you are interested in doing yourself. Do you have a task you think should be automated? How about some software you think is inadequate? Without your own motivation, coding a big project can just get boring.

Anyway, being selfish, how about a full-on GTK-based LaTeX IDE? AmyEdit is just not full-featured enough.

wuischke · 2007-12-27 09:05:37

I've got an idea, although it will have a limited user base.

If I write a verb conjugator for Italian (if I have the time I can write Spanish or German data files as well), I can make both my C prof and my Italian teacher happy. I want it to act like a local webserver that is controlled from a browser, but the actual front-end could use a terminal or GTK2 as well; I haven't decided yet.

I'm busy reading about garbage collection (read: simplified memory management), multi-platform TCP-sockets and similar stuff anyway, so if anyone has a good idea, please keep it coming.

Both of the ideas above have one big problem: not using existing code would be crazy, but with the amount of code already available it would be almost like piecing a puzzle together, and that's not exactly what our prof expects.

Last edited by wuischke (2007-12-27 09:17:31)

Mantaar · 2007-12-27 10:22:38

wuischke wrote:

I've got an idea, although it will have a limited user base.

Why limited user base? It's a very nice idea - basically what I'm doing over here all day (OK, I'm more interested in parsing/semantics and word/neural nets, but morphology is still in there somewhere).

Basically, you'll be happy to hear that doing this doesn't really require OOP. I think you should have heard about FSA theory (finite-state automata) - that's pretty much the holy grail of morphology in NLP for now - though Karttunen, the guy who actually proposed that back in the seventies or eighties, already acknowledged it's not quite adequate for Finnish, his native language. But for Italian you should be OK with FSAs. You can then wade through a text and even recognize those forms. (Basically, just producing them is quite boring... at least for a computational linguist.)

If you need pointers to theory about that topic, just ask me, I can provide you with plenty (hopefully).

ibendiben · 2007-12-27 11:19:38

It would be great if you could extend that idea to become some sort of Wikipedia for languages. Dictionary, verbs, but also frequency lists, for anyone to access and add to... (well, with some buffer to keep it accurate):
http://wordsgalore.com/
It's been an idea of mine lately. Imagine having the 2000 most commonly used words in each language available with pronunciation, images, everything...
Add to that internet TV:
http://nl.wwitv.com/
And this becomes monstrous... honestly, I learned Spanish like that in a couple of months and beat everyone else who was taking classes costing thousands of euros!

wuischke · 2007-12-27 11:43:12

I'm still not very familiar with the Italian conjugation, but judging from Spanish it's a pretty simple thing to do, at least when you only think about outputting all the conjugated forms for an infinitive. I'll make an example with Spanish, but it's almost the same for Italian:

procedure:
Step 1: Check for the suffix -ar/-er/-ir, split off the root of the verb and apply the default rules as defined in the data file (e.g. the endings -o, -as, -a, -amos, -áis, -an for -ar verbs in the present tense, and so on).
Step 2: Check the data file for an entry for this verb and replace any irregular forms. (This may mean replacing all forms of an irregular verb or maybe only the participle - imho the most efficient way while still KISS.)

Example: comer
Step 1: suffix -er, root com; use rules in data file for -er
Step 2: verb is not in database, regular

Example: escribir
Step 1: suffix -ir, root escrib; use rules for -ir
Step 2: verb is in database, the only exception to regular conjugation is the participle "escrito"

Example: ir
Step 1: suffix -ir, no root; apply rules for -ir (Doesn't make sense with an empty root, but the verb is irregular anyway)
Step 2: verb is in database, about everything is replaced
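
The two steps can be sketched in C89 roughly like this. It is only a minimal sketch: the struct layout, the names (`rule_set`, `exception`, `conjugate`) and the hard-coded tables are made up for illustration and stand in for whatever the data file would contain. It covers just the present tense plus the past participle, and the source is assumed to be saved as UTF-8.

```c
#include <assert.h>
#include <string.h>

#define NFORMS 7            /* six present-tense persons + past participle */
#define PARTICIPLE 6        /* index of the participle in the form arrays */
#define MAXLEN 64

/* Step 1 data: default endings per infinitive suffix (hypothetical layout). */
struct rule_set {
    const char *suffix;
    const char *endings[NFORMS];
};

static const struct rule_set rules[] = {
    { "ar", { "o", "as", "a", "amos", "áis", "an", "ado" } },
    { "er", { "o", "es", "e", "emos", "éis", "en", "ido" } },
    { "ir", { "o", "es", "e", "imos", "ís",  "en", "ido" } }
};

/* Step 2 data: irregular overrides; NULL means "keep the regular form". */
struct exception {
    const char *infinitive;
    const char *forms[NFORMS];
};

static const struct exception exceptions[] = {
    { "escribir", { NULL, NULL, NULL, NULL, NULL, NULL, "escrito" } },
    { "ir",       { "voy", "vas", "va", "vamos", "vais", "van", "ido" } }
};

/* Fills out[] with all forms; returns 0 on success, -1 if no rule matches. */
static int conjugate(const char *infinitive, char out[NFORMS][MAXLEN])
{
    size_t len, root_len, i, j;
    const struct rule_set *rs = NULL;

    assert(infinitive != NULL);
    len = strlen(infinitive);
    if (len < 2 || len >= MAXLEN - 8)
        return -1;
    root_len = len - 2;

    /* Step 1: pick the default rules by suffix and build the regular forms. */
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (strcmp(infinitive + root_len, rules[i].suffix) == 0)
            rs = &rules[i];
    if (rs == NULL)
        return -1;
    for (j = 0; j < NFORMS; j++) {
        memcpy(out[j], infinitive, root_len);
        strcpy(out[j] + root_len, rs->endings[j]);
    }

    /* Step 2: overwrite any forms listed as irregular for this verb. */
    for (i = 0; i < sizeof exceptions / sizeof exceptions[0]; i++) {
        if (strcmp(infinitive, exceptions[i].infinitive) != 0)
            continue;
        for (j = 0; j < NFORMS; j++)
            if (exceptions[i].forms[j] != NULL)
                strcpy(out[j], exceptions[i].forms[j]);
    }
    return 0;
}
```

Calling `conjugate("escribir", forms)` builds the regular -ir forms first and then replaces only the participle, which is the "replace as little as needed" idea from step 2.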

Recognizing the forms, i.e. doing a reverse search is a bit harder this way, but actually really interesting.
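
To show why the reverse search is harder, here is a rough sketch that strips every known present-tense ending from a form and proposes candidate infinitives. The names (`guess_infinitives`, `ends_with`) and the hard-coded ending table are hypothetical, and irregular forms are ignored completely; a real version would also have to search the irregular entries and check the candidates against a list of known verbs.

```c
#include <assert.h>
#include <string.h>

#define NPERSONS 6
#define MAXLEN 64
#define MAXCAND 8

/* Default present-tense endings per infinitive suffix (illustrative only). */
struct rule_set {
    const char *suffix;
    const char *endings[NPERSONS];
};

static const struct rule_set rules[] = {
    { "ar", { "o", "as", "a", "amos", "áis", "an" } },
    { "er", { "o", "es", "e", "emos", "éis", "en" } },
    { "ir", { "o", "es", "e", "imos", "ís",  "en" } }
};

/* True if s ends with tail and something is left over to act as the root. */
static int ends_with(const char *s, const char *tail)
{
    size_t ls = strlen(s), lt = strlen(tail);
    return ls > lt && strcmp(s + ls - lt, tail) == 0;
}

/* Collect infinitives that could regularly produce `form`; returns the count. */
static int guess_infinitives(const char *form, char cand[MAXCAND][MAXLEN])
{
    size_t i, j, root_len;
    int n = 0;

    assert(form != NULL);
    if (strlen(form) >= MAXLEN - 2)
        return 0;
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        for (j = 0; j < NPERSONS && n < MAXCAND; j++) {
            if (!ends_with(form, rules[i].endings[j]))
                continue;
            /* Re-attach the infinitive suffix to the stripped root. */
            root_len = strlen(form) - strlen(rules[i].endings[j]);
            memcpy(cand[n], form, root_len);
            strcpy(cand[n] + root_len, rules[i].suffix);
            n++;
        }
    }
    return n;
}
```

For "comemos" only the -er rules match, so the single candidate is "comer". For "como" all three rule sets match and the sketch proposes "comar", "comer" and "comir", so the ambiguity has to be resolved some other way.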

ibendiben: I like the idea, but I wouldn't implement this in C. A follow-up project written in PHP is very well possible, though. I'll have the data files and algorithms already, so it won't be too much work.
But I wanted to do some work on the aMule skin code as well or we might actually release 2.2.0 before I'm finished...

Edit: I think I'll use a webfrontend (I'll worry about windows later...): http://www.ibm.com/developerworks/syste … -nweb.html

Last edited by wuischke (2007-12-27 11:56:14)

finferflu · 2007-12-27 12:27:26

I wonder if this could be ported to Ancient Hebrew and Ancient Greek (even though there would be a lot to tweak)... That would be *veeeeery* useful for me, and all Biblical Studies students and scholars all around the globe

By the way, Italian is my native language, so if you need any tip, just ask me, either here, or via email

Last edited by finferflu (2007-12-27 12:36:28)

wuischke · 2007-12-27 15:20:59

Thanks a lot for your offer, I'll be sure to ask you once I've made enough progress to work on the data files.

I don't know about Ancient Greek or Hebrew, but if they are similar to Latin it should be possible. Could you point me to a quick overview of how conjugation works in these languages?

finferflu · 2007-12-27 15:46:38

I have not studied any Latin, but as far as I know, of Ancient Greek and Ancient Hebrew, the former is closer to Latin, though I'm not sure how close. Ancient Hebrew is far from both.

If you still think it's manageable for you, I'm going to see if I can find anything online. There are plenty of sources for Ancient Greek, but not so many for Hebrew (in fact, I am more interested in the latter).

Thanks

wuischke · 2007-12-27 18:09:47

I can't promise anything (especially with the non-Latin characters that Hebrew uses, if I'm not mistaken), but if it follows root+suffix rules, it should be possible.

Mantaar · 2007-12-27 20:31:57

Hebrew might present you with quite a difficulty, as virtually every text in Hebrew you'll find will contain only a few or no vowels at all. As in Arabic, they don't usually write the vowels, but just the consonants (both are Semitic languages, btw). This will force you to infer much of the conjugation from context - which isn't exactly easy.

Again, this is a problem for recognition and parsing, not so much for the pure generation of verb forms.

alex_anthony · 2008-01-19 23:47:01

Bit late but I thought this might be useful if you are considering Greek

I do Latin and have in the past learnt a little Ancient Greek. I think that if the database was built up properly, the main functions could be portable (unless there were problems with the different alphabet).

wuischke · 2008-01-20 08:37:00

Hi,

I don't know how conjugation works for Ancient Greek, but the data files for my program use a simple syntax: (It is a stupid program after all.)

tense (pronoun) rule
The rule might be +suffix; in this case the program will look for a rule to create the stem and add the suffix to the stem. If it's an irregular form, you just write the irregular form itself.

An example:

$avere
stem -ere
ppre +ente
ppas avuto
geru +endo

The Italian verb "avere" (to have) has irregular forms as well as regular forms. The stem is (verb)-ere, i.e. "av", and to create the participio presente (present participle) I use the stem and add "+ente", i.e. "avente". The participio passato (past participle), on the other hand, is the irregular "avuto".

Displaying the characters is no problem. The source files are encoded in UTF-8 and so is the resulting HTML page. Just use the same encoding for both and you'll be fine.
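
For anyone curious how such rules could be applied, here is a minimal C89 sketch. It is not the code from my repository: the function names (`make_stem`, `split_line`, `apply_rule`) are made up for this example, and it assumes two-field lines like the ones above, without the optional pronoun.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLEN 64

/* Derive the stem from a "stem -ere" style rule: strip the suffix after '-'. */
static int make_stem(const char *verb, const char *stem_rule, char *stem)
{
    size_t vlen, slen;

    assert(verb != NULL && stem_rule != NULL && stem != NULL);
    if (stem_rule[0] != '-')
        return -1;
    vlen = strlen(verb);
    slen = strlen(stem_rule + 1);
    if (slen > vlen || vlen - slen >= MAXLEN)
        return -1;
    if (strcmp(verb + vlen - slen, stem_rule + 1) != 0)
        return -1;                  /* verb does not end in that suffix */
    memcpy(stem, verb, vlen - slen);
    stem[vlen - slen] = '\0';
    return 0;
}

/* Split a line such as "ppre +ente" into its tense key and its rule. */
static int split_line(const char *line, char tense[16], char rule[MAXLEN])
{
    assert(line != NULL);
    return sscanf(line, "%15s %63s", tense, rule) == 2 ? 0 : -1;
}

/* Apply one rule: "+suffix" appends to the stem, anything else is literal. */
static int apply_rule(const char *stem, const char *rule, char *out)
{
    assert(stem != NULL && rule != NULL && out != NULL);
    if (rule[0] == '+') {
        if (strlen(stem) + strlen(rule + 1) >= MAXLEN)
            return -1;
        strcpy(out, stem);
        strcat(out, rule + 1);
    } else {
        if (strlen(rule) >= MAXLEN)
            return -1;
        strcpy(out, rule);          /* irregular form, used as written */
    }
    return 0;
}
```

With the stem "av" derived from "avere" and "-ere", the rule "+ente" gives "avente" and the rule "avuto" is copied through unchanged.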

btw: See http://code.google.com/p/verbconf/ for my SVN repository, if you are interested in this project.

avoulk · 2008-01-20 11:55:06

Although I am Greek and would definitely like to see the proposed Ancient Greek project, the extension and optimization of AmyEdit sounds great to me!

Arch Linux
