
#1 2008-10-17 10:21:01

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Comparing two files

I am blatantly exploiting the forum right now, without doing proper googling, due to pressure. I am trying to memorize several lists for an exam, and would like to know if there was a way I could make text files out of these lists, compare them, and have a program spit out the words that all these lists have in common, to make learning easier.

Desperate right now. :(

Cheers.


#2 2008-10-17 10:23:54

sessy
Member
Registered: 2006-01-20
Posts: 104

Re: Comparing two files

diff, vimdiff, kdiff3?


#3 2008-10-17 10:34:47

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: Comparing two files

If they aren't text files what are they?

(Maybe you need catdoc?)

Then, if all entries are on separate lines, you can use this to get duplicate entries (items that appear in 2 or more dicts, assuming no dict repeats an entry within itself):
cat dict1 dict2 dict3 | sort | uniq -d

To get only the entries that appear in all dictionaries, I think you can start with 2 and then gradually cat in more dicts:
( (cat dict1 dict2 | sort | uniq -d; cat dict3) | sort | uniq -d; cat dict4) | sort | uniq -d

Not sure about that though...
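
As a hedged alternative for the "in all dictionaries" case, here is a minimal sketch that does not chain uniq -d. It assumes one entry per line, and common_lines is a made-up helper name, not an existing tool. Each list is de-duplicated first, so an entry repeated inside one file is not mistaken for one shared between files.

```shell
# common_lines FILE...: print the lines that occur in every given file.
# Hypothetical helper; assumes one entry per line.
common_lines() {
    n=$#
    # 1. sort -u removes duplicates inside each list.
    # 2. sort | uniq -c counts in how many lists each line occurs.
    # 3. awk keeps the lines whose count equals the number of lists
    #    and strips the count that uniq -c put in front.
    for f in "$@"; do
        sort -u "$f"
    done | sort | uniq -c |
        awk -v n="$n" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

It would be called like: common_lines dict1 dict2 dict3 dict4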


#4 2008-10-17 10:37:45

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

The writeups are made in tomboy; it is my personal note-wiki for exams (so far a huge collection of more than 1000 notes).
Where do I copy-paste the text to be able to use vi the way it is described here?


#5 2008-10-17 11:09:32

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: Comparing two files

If it's over 1000 notes, then you need a bit more work to find what you want. You can't perform 1000 vimdiff operations.

But first see if you can export the data to plain text.

If you can't mass-export it, see if you can find the data you want in ~/.tomboy or ~/.config/tomboy.


#6 2008-10-17 13:37:31

zhuqin
Member
Registered: 2008-01-31
Posts: 61

Re: Comparing two files

how about meld?


#7 2008-10-17 15:58:31

max.bra
Member
From: Bologna - Italy
Registered: 2008-06-02
Posts: 93

Re: Comparing two files

zhuqin wrote:

how about meld?

very nice


#8 2008-10-17 16:26:11

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

Can you paste an example of a Tomboy note file so that I can see the format? (I don't feel like installing 30 packages just to check myself.)

If it's as simple as I expect, I might be able to write a script to do what you want.


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone


#9 2008-10-17 16:54:56

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879
Website

Re: Comparing two files

Yeah I think tomboy saves in some non-text format.

One important thing to note about using 'diff' is that order matters to diff. So what you'd want to do is sort things first.

This is going to be a PITA if it's binary and multi-line.

Can tomboy export each note to a separate file? That is, if you dump things to a dir, like so:

ls MyNotes1/
noteA   noteB   noteC

ls MyNotes2/
noteA   noteC   noteD

You can use "diff -uNr MyNotes1 MyNotes2" to get a diff including new files...
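
To illustrate the "sort things first" point for plain word lists, here is a small sketch. It swaps diff for comm, which expects sorted input, and assumes one entry per line. only_in_first is a hypothetical helper name.

```shell
# only_in_first A B: print the entries that are in list A but not in list B,
# regardless of the order of the lines in either file.
# Hypothetical helper; assumes one entry per line.
only_in_first() {
    cmp_dir=$(mktemp -d)
    sort -u "$1" > "$cmp_dir/a"
    sort -u "$2" > "$cmp_dir/b"
    # -2 hides lines only in B, -3 hides lines common to both.
    comm -23 "$cmp_dir/a" "$cmp_dir/b"
    rm -rf "$cmp_dir"
}
```

Swapping the two arguments gives the entries that are only in the second list.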


#10 2008-10-19 15:19:14

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

I left you guys out in the cold with my question, and I apologize for that, after having involved you. I'll get back to this on Tuesday; I can't justify playing around with my computer today, as there will be some cramming nights at this point. :(

@Xyne, could you specify what I should do with the tomboy file? Upload one file from .tomboy to the forum, or open it in gedit and then copy/paste?

later, and thanx so far.


#11 2008-10-19 15:58:45

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

@ Reploid

Try opening it with gedit or some other text editor. If it only contains text data, copy it and paste it inside some code tags on the forum (create a small example file instead of uploading something huge). If it's not purely a text file, I'm not sure I'll be able to help, but you can still try uploading the file somewhere and posting a link for me to take a look at it.



#12 2008-10-20 14:04:06

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

Yay! Exam over, went like a breeze! :D So let's get back to computer stuff, more fun anyway.

When highlighting the file in Thunar, it says that it is an HTML document, and it has a .note ending.

Irritatingly enough, the files are given names that I can't figure out with a quick glance. So I wouldn't know which two files to compare, unless I figure out what the relation is between the notename and the filename.

Here is one file:

<?xml version="1.0" encoding="utf-8"?>
<note version="0.3" xmlns:link="http://beatniksoftware.com/tomboy/link" xmlns:size="http://beatniksoftware.com/tomboy/size" xmlns="http://beatniksoftware.com/tomboy">
  <title>Quetiapine</title>
  <text xml:space="preserve"><note-content version="0.1">Quetiapine

SEROQUEL is indicated for the treatment of both:

    * depressive episodes associated with <link:internal>bipolar</link:internal> disorder
    * acute manic episodes associated with <link:internal>bipolar</link:internal> I disorder as either monotherapy or adjunct therapy to <link:internal>lithium</link:internal> or divalproex.


SEROQUEL is an antagonist at multiple neurotransmitter receptors in the brain: 
serotonin 5HT1A 
serotonin 5HT2 (IC50s=717 & 148nM respectively), 
dopamine D1
dopamine D2 (IC50s=1268 & 329nM respectively), 
histamine H1 (IC50=30nM)
adrenergic α1 and α2 receptors (IC50s=94 & 271nM, respectively). 
SEROQUEL has no appreciable affinity at cholinergic muscarinic and benzodiazepine receptors (IC50s > 5000 nM).

It has been proposed that the efficacy of SEROQUEL <link:internal>in schizophrenia</link:internal> and its <link:internal>mood</link:internal> stabilizing properties in <link:internal>bipolar</link:internal> <link:internal>depression</link:internal> and mania are mediated through a combination of dopamine type 2 (D2) and serotonin type 2 (5HT2) antagonism. Antagonism at receptors other than dopamine and 5HT2 with similar receptor affinities may explain some of the other effects of SEROQUEL.

SEROQUEL's strong antagonism of histamine H1 receptors may explain the somnolence observed with this drug.
SEROQUEL's antagonism of adrenergic α1 receptors may explain the orthostatic hypotension observed with this drug.

Higher binding affinity for 5HT2 receptors than for dopamine receptors.
Strong antihistaminic effect, very sedating. Zzzz zzzz zzzeroquel.
mild weight gain
very low rate of EPS
no prolactin elevation
short half life, twice daily dosing.
eye examination because of cataract in beagles. 

</note-content></text>
  <last-change-date>2008-08-21T18:09:28.0935410+02:00</last-change-date>
  <last-metadata-change-date>2008-08-21T18:09:28.0935410+02:00</last-metadata-change-date>
  <create-date>2008-07-17T23:02:22.4148960+02:00</create-date>
  <cursor-position>250</cursor-position>
  <width>450</width>
  <height>360</height>
  <x>439</x>
  <y>130</y>
  <tags>
    <tag>system:notebook:psych</tag>
  </tags>
  <open-on-startup>False</open-on-startup>
</note>


#13 2008-10-21 19:09:04

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

Ok, the xml format means that it will be easy to parse the files; now I just need to know exactly what you want to do. You said that you had lists of words, yet the note that you posted is just a collection of info about Quetiapine (i.e. not a list). Do you want a script that can take several notes, each of which describes a single molecule and its actions, and find words in the descriptions that they have in common? I'm guessing that you really want something that could show a list of common receptor targets for a group of drugs or list common applications. The problem is that the script would be agnostic of the current file layout as there are no tags to specify target receptors or indicated treatments. Finding words in common would spit out lists with words like "either", "or", "to", etc.

The ideal would be to have true xml files with sections like this:

<target_receptors>
     <receptor>serotonin 5HT1A</receptor>
     <receptor>histamine H1</receptor>
</target_receptors>

but with 1000s of notes, that would just be an unnecessary pain for you.

If the example that you posted is pretty much a template for all of the notes, just give me a description of what you want to compare exactly and I'll see what I can do. One thing that would be easy would be to return all the titles for a given search term, e.g. a search for "serotonin 5HT1A" would return a list with "Quetiapine" and any others that match.
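
The title search could look roughly like this. It is only a sketch: it assumes that each note keeps its <title> element on a single line, as in the posted example, and that the files end in .note. titles_matching is a hypothetical helper name, and the match is a plain case-insensitive substring search.

```shell
# titles_matching TERM DIR: print the titles of all notes in DIR that
# mention TERM anywhere in the file.
# Hypothetical helper; assumes <title>...</title> sits on one line.
titles_matching() {
    term=$1
    dir=$2
    # grep -l lists the matching files, -i ignores case, -F takes TERM literally.
    grep -liF -- "$term" "$dir"/*.note | while IFS= read -r f; do
        sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$f"
    done | sort
}
```

For example: titles_matching "serotonin 5HT1A" ~/.tomboy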

Last edited by Xyne (2008-10-21 19:12:12)



#14 2008-10-21 20:46:29

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

Xyne wrote:

I'm guessing that you really want something that could show a list of common receptor targets for a group of drugs or list common applications.

You get the idea. Although, I was actually thinking more of taking a few files that are about illnesses (uterine cancer, endometriosis, ovarian cancer, appendicitis, etc) and identifying which symptoms they have in common (abdominal pain, cramping, cyclic pain, etc). I wouldn't have to learn every symptom for every disease; instead, I would sort conditions in my head according to symptom groups. But tomboy currently doesn't allow me to do this. That is why I wondered if I could manually compare these files in any way.

I used the quetiapine note just randomly, as the note files aren't named in a way that allows me to identify them when browsing through the .tomboy folder. That makes it pretty hard for me to compare files manually, as I can't just open 1000 files randomly to find out which one is endometriosis, ovarian cancer, etc. :(

The problem is that the script would be agnostic of the current file layout as there are no tags to specify target receptors or indicated treatments. Finding words in common would spit out lists with words like "either", "or", "to", etc.

Ah, I understand. The comparison would be extremely long to browse through, then.

The ideal would be to have true xml files with sections like this:

<target_receptors>
     <receptor>serotonin 5HT1A</receptor>
     <receptor>histamine H1</receptor>
</target_receptors>

but with 1000s of notes, that would just be an unnecessary pain for you.

Let's forget about the current notes, xyne. Let's think about how I could make new notes, that would be better suited to my needs. It doesn't even have to be in tomboy. How would I go about it, to create such a personal wiki, from information that I find in my books and on the net? That would be extremely handy, especially if those notes could be transferred to a linuxOS pda sometime or something similar.


#15 2008-10-21 20:51:52

creslin
Member
Registered: 2008-10-04
Posts: 241

Re: Comparing two files

Reploid wrote:

How would I go about it, to create such a personal wiki, from information that I find in my books and on the net? That would be extremely handy, especially if those notes could be transferred to a linuxOS pda sometime or something similar.

While I've never used it and really don't know too much about it, "personal wiki" reminded me of something I've read about called Zim.  It may or may not even apply to what you want, but I figured I'd throw it out there just in case.


ARCH|awesome3.0 powered by Pentium M 750 | 512MB DDR2-533 | Radeon X300 M
The journey is the reward.


#16 2008-10-21 21:42:15

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

Reploid wrote:

You get the idea. Although, I was actually thinking more of taking a few files that are about illnesses (uterine cancer, endometriosis, ovarian cancer, appendicitis, etc) and identifying which symptoms they have in common (abdominal pain, cramping, cyclic pain, etc). I wouldn't have to learn every symptom for every disease; instead, I would sort conditions in my head according to symptom groups. But tomboy currently doesn't allow me to do this. That is why I wondered if I could manually compare these files in any way.

With the current file layout, that would be difficult but not impossible. You could go through the notes and create a semi-formal layout, e.g. create a section that's headed by "SYMPTOMS" with each symptom on a separate line immediately following it. That would make it easy to grab them and find which symptoms a given set of illnesses have in common.

Reploid wrote:

Let's forget about the current notes, xyne. Let's think about how I could make new notes, that would be better suited to my needs. It doesn't even have to be in tomboy. How would I go about it, to create such a personal wiki, from information that I find in my books and on the net? That would be extremely handy, especially if those notes could be transferred to a linuxOS pda sometime or something similar.

Well, in this case, I would probably just create xml documents for the diseases, e.g.

<illness>
     <name>uterine cancer</name>
     <description>Some brief description</description>
     <symptoms>
          <symptom>abdominal pain</symptom>
          <symptom>cramping</symptom>
     </symptoms>
     <treatments>
          <treatment>substance_a</treatment>
          <treatment>substance_b</treatment>
     </treatments>
</illness>

then use xslt so that I could view them with a nice layout in my browser and use some scripts to collate information. The thing is that if you haven't done something like that before, you could find yourself wasting precious time on trying to figure out xslt syntax.
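
As a rough idea of the "scripts to collate information" part, the sketch below lists the illnesses whose document contains a given symptom. It assumes one file per illness with an .xml extension and one element per line, as in the layout above. illnesses_with_symptom is a hypothetical name, and it relies on grep/sed text matching rather than a real XML parser.

```shell
# illnesses_with_symptom SYMPTOM DIR: print the <name> of every illness
# document in DIR that lists SYMPTOM.
# Hypothetical helper; assumes one XML element per line.
illnesses_with_symptom() {
    symptom=$1
    dir=$2
    # Match the whole element so "pain" does not match "abdominal pain".
    grep -lF -- "<symptom>$symptom</symptom>" "$dir"/*.xml | while IFS= read -r f; do
        sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$f"
    done | sort
}
```

For example: illnesses_with_symptom "abdominal pain" ~/illnesses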

All you would really need to do though is stick to some standard layout of your notes so that they can be easily parsed programmatically, e.g.

the name of the disease

a description of the disease here

Symptoms
symptom A
symptom B
symptom C

Treatments
treatment A
treatment B
treatment C

Just make sure that you're consistent in naming the symptoms, treatments, etc, e.g. don't write "mild weight gain" in one place then "slight weight gain" somewhere else (probably a good idea to keep a list of symptoms that you've used and re-copy them from the list when creating a new file).
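
To show why a fixed layout helps, here is a minimal sketch that pulls one section out of a note written in the plain-text layout above. It assumes the heading ("Symptoms", "Treatments") stands alone on its line and that a blank line ends the section. section_items is a hypothetical helper name.

```shell
# section_items HEADING FILE: print the lines that follow HEADING in FILE,
# up to the next blank line or the end of the file.
# Hypothetical helper for the plain-text layout sketched above.
section_items() {
    awk -v heading="$1" '
        $0 == heading          { in_section = 1; next }  # start after the heading
        in_section && /^[ \t]*$/ { exit }                # a blank line ends the section
        in_section             { print }
    ' "$2"
}
```

For example, "section_items Symptoms endometriosis.txt" would print one symptom per line, ready to be fed to sort and uniq.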

Another approach would be to create a database, but that depends on programming knowledge (not difficult to do with Python, for example, but you probably don't want to spend time going through the tutorials when you should be turning your brain into a pharmacological encyclopaedia).

You can also look for applications which might be suited to this, such as Zim as creslin mentioned above (never used it).



#17 2008-10-21 22:39:09

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

Xyne wrote:

With the current file layout, that would be difficult but not impossible. You could go through the notes and create a semi-formal layout, e.g. create a section that's headed by "SYMPTOMS" with each symptom on a separate line immediately following it. That would make it easy to grab them and find which symptoms a given set of illnesses have in common.

All you would really need to do though is stick to some standard layout of your notes so that they can be easily parsed programmatically, e.g.

Just make sure that you're consistent in naming the symptoms, treatments, etc, e.g. don't write "mild weight gain" in one place then "slight weight gain" somewhere else (probably a good idea to keep a list of symptoms that you've used and re-copy them from the list when creating a new file).

Sounds like the easiest option for a non-programmer. I think I want to try this.

Couple of questions:
1) Editing the files: manually open each file in vi/gedit or can I do it from within tomboy?
2) How do I create the script that allows me to group notes according to these symptoms/treatments, or where do I start learning in order to do it?
3) If I have to manually load files from the .tomboy folder, how do I figure out which files are my desired ones, when they are named like this: "ae324bb0ewtr4454.note" and not "ovarian cancer?"

Well, in this case, I would probably just create xml documents for the diseases, then use xslt so that I could view them with a nice layout in my browser and use some scripts to collate information.

When I create xml documents, can I then have files that I can view in a program, which links to other documents, and can be read without having to overlook all the code tags? What software would I use to read my new xml files then?

The thing is that if you haven't done something like that before, you could find yourself wasting precious time on trying to figure out xslt syntax.

That is so true, which is why I am now thinking in terms of a quick fix and a long-term learning fix. Learning programming stuff and databases like you point out could be a cool long-term project. People need hobbies anyway, right? :D

Last edited by Reploid (2008-10-21 22:39:32)


#18 2008-10-21 22:43:40

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

Ok, here's something for now to work with the notes that you already have:

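#!/usr/bin/perl
# Print the targets (e.g. symptoms) from a list file that are mentioned in
# every note whose title matches one of the names given on the command line.
use strict;
use warnings;

# Directory that contains the tomboy .note files.
my $path_to_notes_directory = '/path/to/notes/directory/';

# First argument: the target list file. Remaining arguments: note titles.
my ($path_to_target_list_file,@illnesses) = @ARGV;
die "Usage: $0 target_list_file title [title ...]\n" unless @illnesses;

# Maps each target to the number of matching notes that mention it.
my %target_hash;
build_target_hash();

# Find the note files whose <title> matches one of the given names.
# The names are pasted into a shell command, so they must not contain quotes.
my $egrep = 'egrep -lir \'<title>('.(join '|', @illnesses).')</title>\' '.$path_to_notes_directory;
my @files = split /\n/, `$egrep`;
foreach my $file (@files)
{
    my ($title, $content) = get_file_contents($file);
    next unless defined $content;    # skip files that don't look like notes
    foreach my $target (keys %target_hash)
    {
        if ($content =~ m/\Q$target\E/i)
        {
            $target_hash{$target}++;
        }
    }
}

# A target is common to all illnesses if every one of their notes mentions it.
my $n = scalar @illnesses;
print "$_\n" foreach sort grep {$target_hash{$_} == $n} keys %target_hash;


# Read a note file and return (title, note text), or an empty list if the
# file doesn't contain the expected <title> and <note-content> elements.
sub get_file_contents
{
    my ($file) = @_;
    my $content = '';
    open(my $fh, '<', $file) or die "Unable to open $file: $!\n";
    while (defined(my $line = <$fh>))
    {
        $content .= $line;
    }
    close $fh;
    # Collapse whitespace so that multi-word targets match across line breaks.
    $content =~ s/\s+/ /g;
    return $content =~ m/<title(?:\s+[^>]*)*>(.*?)<\/title>.*?<note-content(?:\s+[^>]*)*>(.*?)<\/note-content>/s;
}

# Load the target list (one target per line) with every count set to 0.
sub build_target_hash
{
    open(my $fh, '<', $path_to_target_list_file) or die "Unable to open $path_to_target_list_file: $!\n";
    while (defined(my $line = <$fh>))
    {
        chomp $line;
        $target_hash{$line} = 0;
    }
    close $fh;
}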

Change

/path/to/notes/directory/

to point to the directory that contains your tomboy notes, save the script as "find_common.pl" (or whatever you want to call it) and make it executable (chmod 755 find_common.pl).
Next, create a file named "symptoms.txt" with a simple list format of possible symptoms, one on each line (any order):

abdominal pain
cramping
cyclic pain

To find the common symptoms (if any) of uterine cancer, endometriosis, ovarian cancer, and appendicitis, invoke the script as follows:

find_common.pl /path/to/symptoms.txt "uterine cancer" endometriosis  "ovarian cancer" appendicitis

That will spit out a list of all symptoms they have in common (actually, it spits out a list of symptoms that were mentioned in each file). If you want to find common treatments, create a "treatments.txt" file with the same format as the symptoms.txt file and pass it as the first argument instead of symptoms.txt. For anything else, rinse and repeat.

This also assumes that the illnesses passed in on the command line correspond directly to the titles of the notes. Also, this should work with pharmaceuticals or anything else as it's just comparing the notes for given files against the target list to determine which targets they have in common. The efficiency of this approach depends on the layout of the notes and the consistency of the target list.



#19 2008-10-21 22:54:59

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

Xyne wrote:

to point to the directory that contains your tomboy notes, save the script as "find_common.pl" (or whatever you want to call it) and make it executable (chmod 755 find_common.pl).
Next, create a file named "symptoms.txt" with a simple list format of possible symptoms, one on each line (any order):

abdominal pain
cramping
cyclic pain

To find the common symptoms (if any) of uterine cancer, endometriosis, ovarian cancer, and appendicitis, invoke the script as follows:

find_common.pl /path/to/symptoms.txt "uterine cancer" endometriosis  "ovarian cancer" appendicitis

That will spit out a list of all symptoms they have in common (actually, it spits out a list of symptoms that were mentioned in each file). If you want to find common treatments, create a "treatments.txt" file with the same format as the symptoms.txt file and pass it as the first argument instead of symptoms.txt. For anything else, rinse and repeat.

This also assumes that the illnesses passed in on the command line correspond directly to the titles of the notes. Also, this should work with pharmaceuticals or anything else as it's just comparing the notes for given files against the target list to determine which targets they have in common. The efficiency of this approach depends on the layout of the notes and the consistency of the target list.

Man, that is so nice of you. Really, really, superthx.

But what do I do about the fact that the files in tomboy aren't saved as .txt files with names like "endometriosis," "ovarian cancer," etc? Should I just copy/paste my tomboy notes, and create text files from them? Taking notes as plain text files is a bit limited, as I can't click on highlighted links and stuff...

Do I save the find_common.pl in /usr/local/bin?

Last edited by Reploid (2008-10-21 22:55:44)


#20 2008-10-21 22:55:25

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

Reploid wrote:

Couple of questions:
1) Editing the files: manually open each file in vi/gedit or can I do it from within tomboy?

As far as I can tell, everything between the note-content tags is a verbatim copy of what you've typed in, so you should be able to edit it directly in tomboy.

Reploid wrote:

2) How do I create the script that allows me to group notes according to these symptoms/treatments, or where do I start learning in order to do it?

Perl and Python are the most suitable for this imo. I'm happy to provide some simple scripts once you've decided on a format.

Reploid wrote:

3) If I have to manually load files from the .tomboy folder, how do I figure out which files are my desired ones, when they are named like this: "ae324bb0ewtr4454.note" and not "ovarian cancer?"

Just grep them, e.g.

grep -lir "ovarian cancer" /path/to/notes
grep -lir "<title>ovarian cancer</title>" /path/to/notes

The second line should return the note about ovarian cancer, the first line will return that and any other notes that contain the phrase "ovarian cancer".
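
If you'd rather see the whole mapping at once, a small sketch like this prints every note title next to its file name. It assumes the <title> element is on a single line, as in the note you posted. note_index is a hypothetical helper name.

```shell
# note_index DIR: print "title<TAB>file" for every .note file in DIR,
# sorted by title.
# Hypothetical helper; assumes <title>...</title> sits on one line.
note_index() {
    for f in "$1"/*.note; do
        [ -e "$f" ] || continue    # no .note files in the directory
        title=$(sed -n 's:.*<title>\(.*\)</title>.*:\1:p' "$f")
        printf '%s\t%s\n' "$title" "$f"
    done | sort
}
```

For example: note_index ~/.tomboy | grep -i "ovarian cancer"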

Reploid wrote:

When I create xml documents, can I then have files that I can view in a program, which links to other documents, and can be read without having to overlook all the code tags? What software would I use to read my new xml files then?

If I've understood the question, it would be possible to include links in the xml files to other xml files. You could view them in firefox (or whatever browser you use)... they can be viewed like local web pages, basically.

Reploid wrote:

That is so true, which is why I am now thinking in terms of a quick fix and a long-term learning fix. Learning programming stuff and databases like you point out could be a cool long-term project. People need hobbies anyway, right? :D

Definitely. That's where my programming knowledge comes from. Just keep setting yourself fun little objectives that force you to learn new tricks.



#21 2008-10-21 23:01:02

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

Reploid wrote:

Man, that is so nice of you. Really, really, superthx.

Np, I find little tasks like this fun, especially when other people find the results useful.

Reploid wrote:

But what do I do about the fact that the files in tomboy aren't saved as .txt files with names like "endometriosis," "ovarian cancer," etc? Should I just copy/paste my tomboy notes, and create text files from them? Taking notes as plain text files is a bit limited, as I can't click on highlighted links and stuff...

The script doesn't care about the file names, it will find "endometriosis" regardless. You can use the grep command in my previous post to find the file if you need it yourself.

Reploid wrote:

Do I save the find_common.pl in /usr/local/bin?

You could if you want to be able to run it anywhere, but I personally wouldn't because I like to avoid clutter on the file system (if you really want it there, stick it in a package and install it with pacman). I would just dump it in a "scripts" directory and run it locally when I need it. You could put that directory on your path in .bashrc so that you can run it from anywhere.
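
For the path part, the line in ~/.bashrc would look something like this, assuming the directory is ~/scripts (the name is only an example):

```shell
# Put the personal scripts directory in front of the existing search path.
# "$HOME/scripts" is an example location; use wherever the script lives.
export PATH="$HOME/scripts:$PATH"
```

New shells pick this up automatically; in an already open terminal, run "source ~/.bashrc" first.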



#22 2008-10-22 16:00:01

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

Quick feedback, I had the script working just fine, yay! :D

I have to admit though, that your idea of making xml documents is likely to be more valuable in the long run. The current script is a quick way of finding common entries in known diseases, but it would be more valuable to search among all my disease notes according to symptoms.

I have had this idea for a while now, that it would be insanely cool if there was an online medical wiki, editable by everyone, that could be downloaded to your computer and searched the way you described, according to categories like <symptoms> list </symptoms>. This would result in an extremely versatile and free tool that could be used for reference anywhere, anytime. As people already more than happily put excellent articles on wikipedia.org, there wouldn't be a problem getting people to contribute to such a project if it is GPL.

What would I need to learn to implement such an idea? XML, HTML and so on? There is an "export to HTML" plugin for tomboy. If I have understood everything correctly, could I just start writing in XML from within tomboy right away and then use these notes later as web pages?

Last edited by Reploid (2008-10-22 16:01:13)


#23 2008-10-22 17:17:33

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

I've had similar ideas too for general medical knowledge. Most routine work by doctors involves recognizing symptoms and connecting them to possible causes. A general database that relates the two would provide some valuable insights and serve as an aetiological aid, especially in cases of rare disorders (of course, for medical applications, such a database would need to be strictly checked for accuracy by several qualified individuals... perhaps using a system whereby accurate pages in the public wiki are transferred into a separate academic one)

You don't need any HTML for XML. The basics of XML can be learned in 20 minutes (if even that). All XML is is a way to logically structure information. The learning curve comes in when you want to start parsing that information, but with the XML format, that's not really difficult. Perl and Python are both very adept at this and have the added bonus of being relatively easy to learn (Python probably more so, but to me Perl feels more versatile when using regexes, which come up quite often when parsing text data). For a serious database application capable of handling a large load, a faster language would probably be needed (I haven't needed that yet, so I don't have much insight to share).

It's possible to format XML files to be displayed directly on a site, but a better approach would probably be to have a display page that parses the file and reformats the information in an appropriate way. Going that way though, an even better approach would most likely be to ditch the XML files and start building databases to store the information (faster and smaller).

I have several ideas about how to organize such a medical wiki (mostly conceptual, but also programmatical, e.g. how to standardize input to prevent disparities in describing symptoms). If you're serious about implementing it, I would like to get involved to whatever degree my time permits, provided that my input would be useful. I've tinkered with enough html/php/asp/databases/javascript to be able to get a number of things working well and I'm always happy to learn more.

Last edited by Xyne (2008-10-22 17:28:39)



#24 2008-10-22 18:15:22

Reploid
Member
From: Cold Country up North
Registered: 2008-03-27
Posts: 110

Re: Comparing two files

There are already many medical applications that use databases, like drug lists, dictionaries, etc. What I was thinking of was something along the lines of wikipedia. I'm a total n00b when it comes to computers, but I found that this personal tomboy wiki has been very helpful in editing notes by means of copy/paste from research articles, emedicine and wikipedia. With some better searching tools, and some tools for creating notes on a community basis, you would have a very powerful utility.

I don't know anything about databases, or if you could get a community to build up databases as a joint project. How would a computer n00b like me take notes from textbooks and fill into those databases? That is half the idea, having ppl contribute. There is such an insane amount of medical information, to have a comprehensive and always up-to-date tool, you would need the aid of many students.

I have several ideas about how to organize such a medical wiki (mostly conceptual, but also programmatical, e.g. how to standardize input to prevent disparities in describing symptoms). If you're serious about implementing it, I would like to get involved to whatever degree my time permits, provided that my input would be useful. I've tinkered with enough html/php/asp/databases/javascript to be able to get a number of things working well and I'm always happy to learn more.

If I can work with this for fun a few months onward, I might have to take a raincheck on that one. It would definitely be better than me trying to think through all the practical and difficult parts with a shallow knowledge base. And to you and anyone else, I'd be more than happy if you would grab whatever you want from the wishlist I coughed up, and make such a wiki/program/community project yourself. I just started reading the tutorials on the w3school page you gave me, so this could take a while, seeing as I have to learn a wee bit as well :P

Last edited by Reploid (2008-10-22 18:23:26)


#25 2008-10-22 18:30:30

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: Comparing two files

That's where the interface comes in. You need to have a (nearly) foolproof way of entering the information through some well-planned editing form. This would take the user's input, structure it and load it into the database. Having the information in such a structured form would be the very thing that enables advanced searches etc. The community would still be the source of the information, the site would just provide a way of structuring it. Along with the editor, there would be standard display pages that present the information in a wiki-esque layout for viewing purposes and advanced search pages that are able to present more complex information. I would even throw in APIs for external programs that want to interact with the site.

