#1 2009-03-18 03:27:56

estex198
Member
Registered: 2009-02-17
Posts: 39

using grep to search for http in json file

I need to extract my old bookmark links from a JSON file left by Firefox 3. Unfortunately I no longer have Firefox 3 installed, so I can't just open it and export my bookmarks as HTML. My goal is to get the bookmark links from the last JSON backup file into Opera 9.64. Every time I open the JSON file with the bookmark manager, I can't see the links. Can I use grep or something similar to search for "http*", including the quotes, and extract every substring matching this pattern? That is, the pattern starts with a double quote ("), followed by http, and ends at the next double quote (").
I'm not very familiar with regex, but I'm open to learning. Sorry, I'm not sure what to search Google for, so I'm asking here. Thanks in advance!


- Rusty

Last edited by estex198 (2009-03-18 03:29:47)

Offline

#2 2009-03-18 03:48:49

fumbles
Member
Registered: 2006-12-22
Posts: 246

Re: using grep to search for http in json file

try

$ grep -oP 'http[\w\.\/\:]+' bookmarks-xxxxx.json

Last edited by fumbles (2009-03-18 03:49:59)

Offline

#3 2009-03-18 04:29:28

cactus
Taco Eater
From: t͈̫̹ͨa͖͕͎̱͈ͨ͆ć̥̖̝o̫̫̼s͈̭̱̞͍̃!̰
Registered: 2004-05-25
Posts: 4,622

Re: using grep to search for http in json file

There are lots of Python libraries for reading JSON files.

Something like:

#!/usr/bin/env python
# foobar.py
import sys
import json

def parse(treeish):
    if hasattr(treeish, 'keys'):
        if 'children' in treeish:
            parse(treeish['children'])
        if 'uri' in treeish:
            if 'place:' not in treeish['uri']:
                print(treeish['uri'])
    else:
        for x in treeish:
            parse(x)

bookmarks = json.loads(open(sys.argv[1], 'r').read())
parse(bookmarks)

$ python foobar.py bookmarks-2009-03-15.json

That should print most of the URLs (with some extra Firefox junk).

EDIT: copy in case the forum munges the Python spacing

Last edited by cactus (2009-03-18 04:33:56)


"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍

Offline

#4 2009-03-18 05:53:30

estex198
Member
Registered: 2009-02-17
Posts: 39

Re: using grep to search for http in json file

Thanks for such speedy replies. Unfortunately, the method fumbles provided wouldn't print links starting with http://en-
I think adding a hyphen to the regex's character class would solve this issue. I should really read up on regexes. Thanks a lot for your input, fumbles!
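
For reference, here is a minimal Python sketch of the quoted-string match described in the first post. Rather than listing allowed characters one by one, it captures everything between a double quote followed by http and the next double quote, so hyphens and any other URL characters are included. The function name extract_quoted_urls and the sample text are made up for illustration, and the pattern assumes the URLs contain no escaped quotes.

```python
import re

# Match a double quote, then "http" followed by any run of characters
# that are not a double quote, then the closing double quote.
# Assumption: URLs in the file contain no escaped quotes (\").
QUOTED_URL = re.compile(r'"(http[^"]*)"')

def extract_quoted_urls(text):
    """Return every double-quoted substring that starts with http."""
    return QUOTED_URL.findall(text)

if __name__ == "__main__":
    # Hypothetical sample, not taken from a real bookmarks file.
    sample = '{"title":"Wiki","uri":"http://en-us.example.org/a_b?x=1"}'
    for url in extract_quoted_urls(sample):
        print(url)
```

Note that any quoted string beginning with http is returned, so a bookmark title that happens to start with "http" would show up too.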
cactus, I saved the Python script as foobar.py and ran it with the Python interpreter, but got this:

[estex@myhost hyperlink_extractor]$ python foobar.py bookmarks.json
Traceback (most recent call last):
  File "foobar.py", line 17, in <module>
    bookmarks = json.loads(open(sys.argv[1], 'r').read())
  File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 183, in JSONObject
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib/python2.6/json/decoder.py", line 219, in JSONArray
    raise ValueError(errmsg("Expecting object", s, end))
ValueError: Expecting object: line 1 column 15239 (char 15239)
[estex@myhost hyperlink_extractor]$
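
An "Expecting object" error raised while parsing an array is consistent with a comma sitting directly before a closing bracket, which strict JSON does not allow. Assuming that is the problem here (a guess, since the file itself is not shown), a rough sketch that drops such commas before parsing might look like this. The name loads_lenient is made up for illustration.

```python
import json
import re

# Assumption: the only defect is a stray comma right before ] or }.
# Caveat: this naive pattern would also alter a string value that
# happens to contain ",]" or ",}".
TRAILING_COMMA = re.compile(r',\s*([\]}])')

def loads_lenient(text):
    """Parse JSON after dropping commas that directly precede ] or }."""
    return json.loads(TRAILING_COMMA.sub(r'\1', text))

if __name__ == "__main__":
    # Hypothetical broken input, not taken from the real backup file.
    broken = '{"children":[{"uri":"http://example.org/"},]}'
    print(loads_lenient(broken))
```

If the file is broken in some other way, this will still raise ValueError, and the column number in the message is the place to look.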

So I did a bit of tinkering with Java and got it to work by using the Scanner class with the double quote as the delimiter. I printed each token found between quotes on its own line, piped the output through grep, and then through cat to redirect it to a file. Maybe it was a bit too much work, but hey, I got to write some code, so I'm happy. :)
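
The same split-on-quotes idea can be sketched in Python (this is not the Java code that was actually written, just an illustration of the approach): split the text on double quotes, keep the pieces that start with http, and write them one per line. The names links_between_quotes and save_links are made up for this example.

```python
def links_between_quotes(text):
    """Split on double quotes and keep the pieces that start with http."""
    return [piece for piece in text.split('"') if piece.startswith("http")]

def save_links(text, out_path):
    """Write one link per line to out_path and return the links."""
    links = links_between_quotes(text)
    with open(out_path, "w") as out:
        for link in links:
            out.write(link + "\n")
    return links

if __name__ == "__main__":
    # Hypothetical sample, not taken from a real bookmarks file.
    sample = '{"title":"Arch","uri":"http://en-x.example.org/"}'
    print(links_between_quotes(sample))
```

To run it on a real file, read the file's contents into a string and pass that to save_links along with an output filename.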

I suppose I could have used tee in place of cat... either way. Now I have to figure out a way to put the links into a file that Opera can interpret.


Thanks again everyone!

- Rusty

Offline

#5 2009-03-18 07:27:20

fumbles
Member
Registered: 2006-12-22
Posts: 246

Re: using grep to search for http in json file

.

Last edited by fumbles (2020-09-26 11:52:48)

Offline
