Since Google is no longer providing the info, I have to do this myself. Using takeout.google.com you can back up the whole thing and get an emailed link to a .zip file that contains the lot. When you unzip it you'll find a "feed.atom" file that looks like a fairly simple XML file.
I figured BFI was the way to go, so I settled on making a copy of the text contents, one line per post, in a file I call "searchable". Searching it is then something like
grep -i horse searchable | grep -i boy | grep -i lewis
which gives me the full text, along with a lot of clutter like
The Last Battle</U>.</P> <P>I think
that is trivial to clean up.
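One rough way to do that cleanup, as a sketch only: pipe the grep output through sed and replace anything that looks like a tag with a space. The name striptags is made up for illustration, and the pattern assumes no tag contains a literal ">" inside it.

```shell
# Hypothetical helper: turn anything shaped like <...> into a space,
# then squeeze runs of spaces down to one.
# Assumes tags never contain a literal '>' inside an attribute.
striptags() {
    sed -e 's/<[^>]*>/ /g' -e 's/  */ /g'
}

# Try it on the clutter sample from above.
printf '%s\n' 'The Last Battle</U>.</P> <P>I think' | striptags
```

In use it just goes on the end of the pipeline: grep -i horse searchable | grep -i boy | striptags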
If you're interested in trying it too, I used an awk script (no error handling or sanity checking, sorry) saved as makesearch.awk. The command comes first, then the script:
awk -f makesearch.awk feed.atom > searchable
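BEGIN { ready = 0 }
{
    # An <id> line starts the record for a post.
    if (index($0, "<id>") > 0) printf("%s ", $0)
    if (index($0, "<content") > 0) {            # body starts
        printf("%s ", $0); ready = 1
    } else if (index($0, "content>") > 0) {     # body ends, close the line
        printf("%s\n", $0); ready = 0
    } else if (ready == 1) {                    # inside a body
        printf("%s ", $0)
    }
}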
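Since the script does no sanity checking, here is a small check one could bolt on afterwards. It is a sketch under the same assumption the script makes: every post's content opens on one line and closes on a later one, so the number of lines in searchable should equal the number of lines in feed.atom that contain "<content". The function name checkfeed is hypothetical.

```shell
# Hypothetical check: compare lines that open a <content> element in the
# feed with the number of one-line records in the searchable file.
# Usage: checkfeed feed.atom searchable
checkfeed() {
    local posts lines
    posts=$(awk 'index($0, "<content") > 0 { n++ } END { print n + 0 }' "$1")
    lines=$(awk 'END { print NR }' "$2")
    if [ "$posts" -eq "$lines" ]; then
        echo "ok: $lines posts"
    else
        echo "mismatch: $posts <content> lines, $lines records" >&2
        return 1
    fi
}
```

A mismatch most likely means some post opens and closes its content on the same line, which the script as written does not handle.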
Then I put the file somewhere I'll remember it and put a reference to it in .bashrc, and I should be good, at least WRT searching old items in my blog.
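For what it's worth, the .bashrc reference could look something like the sketch below. The names BLOGSEARCH and blogsearch, and the path, are invented for illustration and are not what the post actually used; the function just chains one case-insensitive grep per word, like the grep pipeline earlier.

```shell
# Hypothetical .bashrc snippet. BLOGSEARCH, blogsearch and the path are
# made-up names; point BLOGSEARCH at wherever the searchable file lives.
BLOGSEARCH="${BLOGSEARCH:-$HOME/blog/searchable}"

# Print the posts that contain every word given, ignoring case.
# Returns non-zero when nothing matches, as grep does.
blogsearch() {
    local result word
    if [ "$#" -eq 0 ]; then
        echo "usage: blogsearch word [word ...]" >&2
        return 2
    fi
    result=$(cat "$BLOGSEARCH") || return
    for word in "$@"; do
        result=$(printf '%s\n' "$result" | grep -i -e "$word") || return
    done
    printf '%s\n' "$result"
}
```

With that in place, blogsearch horse boy lewis does the same job as the three chained greps.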
UPDATE: Yes, I know about the search feature in the blogger home page, but I am not always logged into my gmail account when composing and searching--nor should I be. Why should random cookies for random sites have any extra information about me?
2 comments:
Early on in my experience of Google, I found that I could get to the information I wanted much more quickly by using boolean search operators, many of which work on other search engines as well.
So, for example, if I wanted to search something I remember you writing about Liberia, I would start with a search with the string: liberia site:idontknowbut.blogspot.com
It sure looks like that's broken, however, as I'm only getting 8 results -- at least they are all really from your blog.
Yep, that's why I decided to do this. Even when I specify that I want to look at only the one site, Google seems to have instituted a limit on the number of items to return.
Or perhaps I should say "number of items to keep track of," since even a very specific search for a phrase I know to be present returns nothing.