Eric Hartwell's InfoDabble
 Thursday, May 05, 2005
Infoscraper update
  • RSS Bandit is a desktop news aggregator written in C# and .NET under active development at SourceForge. See Revamping the RSS Bandit Application for a 2003 MSDN article about RSS Bandit.
     
  • Creating a generic Site-To-RSS tool [9/29/2003] describes a generic HTML-to-RSS scraper tool that uses regular expressions with VB.NET.
     
  • Template Based Scraping [10/28/2002] is a quick overview of screen scraping.
     
  • RSSxl is a free online service that generates an RSS feed from almost any HTML web page, with no need to edit the source HTML first.
Now to put it all together ...
3:12:34 PM    

 Tuesday, May 03, 2005
Data integration can be a hoot with OWL [SearchWebServices.com 2/17/2005] The movement to standards-based computing that XML and Web services herald is eerily analogous to the work done in the first half of the twentieth century to establish international long-distance telephone standards. Semantic integration technologies such as the Web Ontology Language (OWL) can solve the problem of data composition. Using ontologies as an abstraction layer that enables automated information exchange is analogous to using service contracts to abstract the implementation of service providers from their consumers. However, just as SOA requires an up-front investment in architecture, creating ontologies is quite time-consuming and requires a leap of faith by implementers before they can realize their value.
7:41:52 PM    

Beating the RSS crunch with aggregation/bloglines [SearchWebServices.com 10/20/2004] Bloglines has created a freely available, simple and straightforward set of APIs that developers can use to access their aggregated blog database and relieve congestion problems. What Bloglines does for RSS feeds is very much like what Google and Yahoo do for popular Web pages and information: they compile this content into their databases, so that accesses to frequently requested pages are satisfied from a local cache, instead of requiring the original server to handle yet another update or access request.
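A minimal Python sketch of the caching idea, assuming a simple time-to-live policy; FeedCache, fetch and ttl_seconds are hypothetical names, and this is not how Bloglines itself is implemented. Only the first request for a feed within the TTL window reaches the origin server:

```python
import time
from typing import Callable, Dict, Tuple


class FeedCache:
    """Answer repeated requests for a feed URL from a local copy (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 1800,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch              # downloads a feed from its origin server
        self._ttl = ttl_seconds          # how long a cached copy counts as fresh
        self._clock = clock              # injectable clock, handy for testing
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.origin_hits = 0             # requests that actually reached the origin

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]             # fresh copy: the origin server is not touched
        body = self._fetch(url)
        self.origin_hits += 1
        self._entries[url] = (now, body)
        return body


# A thousand readers asking for the same feed cost the origin one request.
cache = FeedCache(fetch=lambda url: f"<rss><!-- {url} --></rss>", ttl_seconds=60)
for _ in range(1000):
    cache.get("http://example.com/index.xml")
print(cache.origin_hits)  # 1
```

The trade-off is freshness: readers may see a copy up to ttl_seconds old in exchange for sparing the original server.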
7:35:14 PM    

 Wednesday, April 20, 2005
Target Remakes the Pill Bottle: sensibly and beautifully [New York Metro 4/18/2005 via Gizmodo, Boing Boing 4/19/2005] The standard-issue amber-cast pharmacy pill bottle has remained virtually unchanged since the Second World War. An overhaul is finally coming, courtesy of Deborah Adler, a 29-year-old graphic designer whose ClearRx prescription-packaging system debuts at Target pharmacies May 1.
  1. Easy I.D. The name of the drug is printed both on the top and side.
  2. Code red. The bottle is Target’s signature red color, which is also a symbol for caution.
  3. Information hierarchy. The most important information (drug name, dosage, intake instructions) is above the line; less important data is below.
  4. Flat sides for readability; upside down to save paper.
  5. Green is for Grandma. Different colored rubber rings for each family member.
  6. An info card that’s hard to lose, tucked behind the label.
  7. Take “daily.” The label avoids the word “once,” since it means “eleven” in Spanish.
  8. Clear warnings. Revamped the 25 most important warning symbols.

3:44:07 PM    

 Thursday, March 24, 2005
43 Folders "Life Hacks" Wiki. [Street Tech 3/24/2005; 2:53:38 PM] The Getting Things Done gurus over at 43 Folders have just put up a wiki for collecting "life hacks," cool little tips and tricks for making life easier and more manageable. Right now the hacks are uncategorized (it's just a dump list), but they plan on ...
5:54:56 PM    

 Sunday, March 13, 2005
Perspective Wiki is a .NET-based, GPL-licensed wiki that uses IIS and the Indexing Service. Features include: user login and control over what can be edited, with support for transparent logins in Windows; full version history of pages and their attachments, so it is clear who did what; WYSIWYG formatting and editing; and easy attachments, including embedded images and searchable documents (search covers MS Office documents too). Currently under active development.
8:30:40 AM    

 Tuesday, March 08, 2005

Using Google Desktop Search engine to scan mapped network drives

I tried to find out whether there was a way to change some of the Google Desktop Search settings to allow indexing of network drives. According to the FAQ, the tool will not index a network drive, but with a few registry changes we can have the Google Desktop Search engine scan mapped network drives. First, locate the following registry key:

[HKEY_CURRENT_USER\Software\Google\Google Desktop\HistoricalCapture\Crawler]
CRAWL_DIRS=
CRAWL_FILE=DONE

By entering !C:<TAB>!M:<TAB> (where C: and M: are the drives to index) into CRAWL_DIRS and removing DONE from CRAWL_FILE, we can instruct the engine to actually index remote drives. Note that <TAB> stands for the TAB character itself. The easiest way to enter it is with Notepad: type !C: and press the TAB key (repeat for each drive), then press CTRL-A (Select All) and CTRL-C (Copy), and paste the result into the CRAWL_DIRS value.

Using Google Desktop Search as a network search server [Geekzone Blog, 15-JAN-2005]
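The CRAWL_DIRS format is easy to get wrong by hand, so here is a small Python sketch that builds the string and, on Windows, writes it with the standard winreg module. It assumes both values are plain REG_SZ strings, which the tip above doesn't actually state; crawl_dirs_value and apply_settings are hypothetical helper names. Back up the registry before trying it.

```python
def crawl_dirs_value(drives):
    """Build the CRAWL_DIRS string: each drive written as "!<drive>" plus a TAB."""
    return "".join(f"!{drive}\t" for drive in drives)


def apply_settings(drives):
    """Write CRAWL_DIRS and clear CRAWL_FILE (Windows only).

    Assumption: both values are REG_SZ strings; the original tip does not say.
    """
    import winreg  # standard library, but only available on Windows

    key_path = r"Software\Google\Google Desktop\HistoricalCapture\Crawler"
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "CRAWL_DIRS", 0, winreg.REG_SZ,
                          crawl_dirs_value(drives))
        winreg.SetValueEx(key, "CRAWL_FILE", 0, winreg.REG_SZ, "")  # "DONE" removed


print(repr(crawl_dirs_value(["C:", "M:"])))  # '!C:\t!M:\t'
# apply_settings(["C:", "M:"])  # uncomment on Windows, after a registry backup
```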


11:10:43 AM    

 Sunday, February 06, 2005
Advanced Security for Outlook. Use Advanced Security for Outlook to see which programs are trying to access Outlook and to permanently allow or deny access for each one. The next time a program requests access, the action you chose is carried out automatically, so Outlook Security no longer annoys you with warnings about programs trying to access the e-mail addresses you have stored in Outlook. Freeware, available in English, German and Russian. Version 1. [Slipstick - Outlook and Exchange News 12/7/2004]
10:47:21 AM    

 Tuesday, February 01, 2005
New York Times weblog-safe link generator. Mark Frauenfelder: Jon Lasser let me know about the NYT Permalink generator, which generates a non-decaying link to New York Times stories. When readers click on one of these links, they don't have to sign into the NYT site, and they won't have to pay a fee to read the story, even if it's in the archives.  [Boing Boing 1/25/2005]
8:53:22 AM    

 Sunday, January 02, 2005
RS3: The RSS Aggregator Killer? I've been trying to get the word out (specifically to Adam Curry) about a project I've created on SourceForge called RS3, which takes a set of RSS feeds, crawls and scrapes the _original_ article linked to in each feed item, summarizes that page, and then optionally converts that text summary to speech and a playlist. The result is a set of Ogg files wrapped in an M3U playlist, which can be copied to your favorite media player. So, in essence, you can have your RSS feeds read aloud to you, as a podcast, on any media player or PC. [Doc Searls' IT Garage - News, ideas and real world stories about how IT folks solve their own problems 11/11/2004]
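The last step of that pipeline, wrapping the audio files in a playlist, is easy to sketch in Python. RS3's exact output isn't described beyond "M3U playlist", so this assumes the common extended M3U variant (#EXTM3U header, #EXTINF:seconds,title lines); write_m3u and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def write_m3u(tracks, playlist_path):
    """Write an extended M3U playlist.

    tracks: list of (file_name, title, duration_seconds) tuples,
    e.g. one Ogg file per summarized article (hypothetical names).
    """
    lines = ["#EXTM3U"]
    for file_name, title, seconds in tracks:
        lines.append(f"#EXTINF:{seconds},{title}")  # duration in seconds, then title
        lines.append(file_name)                      # path relative to the playlist
    Path(playlist_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


playlist = Path(tempfile.mkdtemp()) / "feeds.m3u"
lines = write_m3u([("item01.ogg", "First summary", 95),
                   ("item02.ogg", "Second summary", 120)], playlist)
print(playlist.read_text(encoding="utf-8"))
```

Relative file names keep the playlist working after the folder is copied to a player.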
11:43:21 AM    

 Saturday, December 04, 2004
The feedmesh group at Yahoo is working on a distributed system for weblog pinging. It wasn't started in a very nice way, but now they seem to have turned the corner. If you're interested in pinging systems like weblogs.com, check it out. At some point we're going to need a distributed system, and this may be a good way to get there. I monitor the list through its RSS feed. [Scripting News 12/4/2004; 8:53:06 AM.]
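For readers curious about what a ping actually is: weblogs.com accepts an XML-RPC call named weblogUpdates.ping with two string parameters, the weblog's name and its URL. This Python sketch only builds the request body with the standard xmlrpc.client module; the endpoint in the commented-out lines is an assumption, and the name and URL are placeholders.

```python
import xmlrpc.client


def build_ping(weblog_name: str, weblog_url: str) -> str:
    """Return the XML-RPC request body for weblogUpdates.ping(name, url)."""
    return xmlrpc.client.dumps((weblog_name, weblog_url),
                               methodname="weblogUpdates.ping")


body = build_ping("Example Weblog", "http://example.com/blog/")
print(body)

# Sending it for real (not run here; the endpoint URL is an assumption):
# server = xmlrpc.client.ServerProxy("http://rpc.weblogs.com/RPC2")
# reply = server.weblogUpdates.ping("Example Weblog", "http://example.com/blog/")
# print(reply["flerror"], reply["message"])  # reply expected to be a struct
```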
10:28:34 AM    

And thanks for all the interest in helping with the weblogs.com rewrite. I got over a hundred emails from qualified developers. That's totally amazing, and totally appreciated. We were able to fix the existing system, so we can now easily handle the flow of pings (there was an unobvious performance bug). If you want to participate, I recommend joining the group, above. [Scripting News 12/4/2004; 8:53:06 AM.]
10:27:31 AM    

 Friday, December 03, 2004
It's worth noting that in the last three weeks we've gotten some remarkable support from Microsoft. First, Halo 2 shipped with RSS 2.0 support. Who thought of RSS for games? It wasn't even on our radar. Then yesterday MSN Spaces, a blogging system, shipped with RSS 2.0 support. And the third bit was pinging weblogs.com. Now I'm sure we'll be able to turn the corner for two reasons. First, we got a huge response from serious developers, and several credible projects started today, in a variety of environments. Since eventually this will have to be a distributed system, like DNS, it's important to have a variety of compatible implementations. Second, Andre and I did a back and forth on this over a few hours. Andre used to be responsible for the kernel at UserLand years ago, and now is back in the loop after the Frontier open source release. He asked me some questions, I sent him the source, we put in some diagnostics, tested a theory and boom, all of a sudden the server is performing beautifully. We still have the scaling issue but we got some breathing room today. Anyway, thanks to Microsoft for trusting us and using our formats and protocols. [Scripting News] 12/2/2004; 8:53:00 PM.
4:31:43 PM    

I've been getting lots of mail about the programming project described below. My challenge will be to try to organize the energy to actually create the needed software. People ask if C# or Java would be okay, and the answer is, of course. I basically meant "compiled code" as opposed to interpreted code. Static instead of dynamic. We have to cut to the metal. I also need to write up a spec that explains what the software does. Anyway, let's give it a couple of days to gestate. In the meantime you might start writing code. ;-> [Scripting News] 12/2/2004; 2:53:20 PM.
4:30:40 PM    

Weblogs.com needs a rewrite.

With TypePad, MSN Spaces, Blogger and a gazillion other blogs pinging weblogs.com, the server, which is written in scripts, has met its match. It has needed a rewrite in C for some time; now it really needs one.

I've been trying to get help with this privately; I personally don't have the requisite skills to write the code. If this were 1994 and I had Think C (a development environment I was expert in), the project would take a couple of days. Today, in a modern environment with even deeper libraries, it might take even less time.

I'd be happiest if this could be done as an open source project, the lots-of-eyeballs thing is particularly suited to this kind of project. It has to scale well, obviously, from Day One. No time to ramp up.

I have a full modern server to host this application, with no other apps running on the machine. Right now it's running Windows 2000, but we could switch over to any other operating system.

What I don't want: Offers from companies to buy weblogs.com. It's important that this resource stay independent. The only reason companies would want to pay so much for this service is if they planned to take it private.

Anyway, please send me an email if you're a skilled C programmer who would like to work on such a project to help out the weblogs community.

Thanks!

[Scripting News] 12/2/2004; 8:53:09 AM.
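To make the shape of the problem concrete, here is a toy Python version of the server side of weblogUpdates.ping, using the standard library's SimpleXMLRPCServer. It is emphatically not the compiled, scalable rewrite asked for above: it keeps pings in a single in-memory dictionary keyed by URL and handles one request at a time. The reply text, the dictionary design and changed_since are assumptions for illustration.

```python
import time
from xmlrpc.server import SimpleXMLRPCServer

# Most recent ping per weblog URL: url -> (weblog name, time of ping).
# Keying by URL means a blog that pings repeatedly occupies one slot.
recent_changes = {}


def ping(weblog_name, weblog_url):
    """Handle weblogUpdates.ping(name, url) and record when the blog changed."""
    recent_changes[weblog_url] = (weblog_name, time.time())
    return {"flerror": False, "message": "Thanks for the ping."}


def changed_since(seconds_ago):
    """List URLs pinged within the last seconds_ago seconds, newest first."""
    cutoff = time.time() - seconds_ago
    recent = [(t, url) for url, (_, t) in recent_changes.items() if t >= cutoff]
    return [url for t, url in sorted(recent, reverse=True)]


def serve(host="localhost", port=8080):
    """Expose ping() under the XML-RPC method name weblogUpdates.ping."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(ping, "weblogUpdates.ping")
    server.serve_forever()  # blocks; single-threaded, which is the scaling problem


ping("Example A", "http://a.example.com/")
ping("Example A", "http://a.example.com/")
ping("Example B", "http://b.example.com/")
print(len(recent_changes))  # 2
```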
4:30:04 PM    

 Sunday, October 10, 2004

AimAtSite IE Toolbar, billed as "the complete search solution," provides History search, Favorites search, current-page search and word highlighting, and it helps you query search engines, dictionaries, encyclopedias and other Web search sites. The toolbar is a free download, with no feature limitations during a 30-day trial period. It uses the Microsoft Indexing Service to index the content of the Web pages you visit, for fast History and Favorites search. Compared with Internet Explorer's built-in history search, AimAtSite IE Toolbar offers better performance, sorting by rank, title or date of visit, page abstracts and flexible queries. Internet Explorer has no Favorites search at all; the toolbar adds one. With AimAtSite IE Toolbar you will never lose important information!
11:28:55 AM    

 Monday, September 27, 2004
Create a randomized RSS feed: XML developers can leverage their existing knowledge to use RSS in a novel way, such as creating a randomized RSS feed. Rather than showing the top content items as ranked by your third-party RSS sources, you can randomize the selections to create a new listing of RSS content for every page refresh, giving site visitors a reason to keep coming back. [builder.com]
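Here is one way to do that in Python: a sketch that parses an RSS 2.0 feed with the standard ElementTree module and keeps a random sample of its items. The function name and the seed parameter are my own; the builder.com article's code may well differ.

```python
import random
import xml.etree.ElementTree as ET


def randomize_feed(rss_xml: str, count: int, seed=None) -> str:
    """Return the feed with `count` randomly chosen items instead of the top ones."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    items = channel.findall("item")
    rng = random.Random(seed)            # seed=None: a fresh order on every call
    chosen = rng.sample(items, min(count, len(items)))
    for item in items:                   # drop the original ordering...
        channel.remove(item)
    for item in chosen:                  # ...and append the random selection
        channel.append(item)
    return ET.tostring(root, encoding="unicode")


feed = ('<rss version="2.0"><channel><title>Links</title>'
        + "".join(f"<item><title>Story {i}</title></item>" for i in range(1, 6))
        + "</channel></rss>")
print(randomize_feed(feed, 3))
```

Calling it on every page request, with no seed, gives the "new listing on every refresh" effect.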
7:49:13 PM    

 Friday, September 24, 2004
TabTag. TabTag turns Outlook into a flexible SQL database solution, adding a range of features to classify, link and edit data as required. Easy to install, use and manage, TabTag saves valuable resources and improves productivity. Free single user version now available. [Slipstick - Outlook and Exchange News 4/22/2004]
6:14:23 PM    

DateLens: Zoom in / out on Outlook's calendar. What makes this application different is its so-called fisheye representation of dates: just click on the days, months or icons to zoom in or out of the calendar. DateLens has a Mondrian skin (not the default) to give it character. If you're already using Outlook, this is a great add-on feature. There is also a Pocket PC version, though it is not free, and DateLens supports Tablet PC Ink. Requires .NET Framework v1.1 and Outlook. [via Lockergnome Windows Fanatics 3/17/2004]
12:30:39 PM    

 Thursday, September 23, 2004
CSS Creator helps generate styles and layouts for your projects. You just choose from a number of values, and the code is written for you. The site features links to other sites that will be of benefit to you as well. You'll find resources that contain articles, tools, templates, and much more.
10:23:29 PM    

Building Your Own ASP.NET Feed Parser. John Crocker writes in an informative article on The Code Project: “After seeing numerous applications available on the Internet to download and view RSS Feeds off the Internet, I wondered what would be needed to develop a .NET component to Read, Parse and Display RSS Feeds in aspx Pages. The Component development is beyond the scope of this article, which will cover the base class that is used by the component to render the detail…” By dhenry@howdev.com (Daniel Henry). [Lockergnome’s RSS & Atom Tips 4/21/2004]
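The article's component is ASP.NET; as a rough Python analogue of the read, parse and display steps (not the article's code), this sketch parses RSS 2.0 items with ElementTree and renders them as an HTML list, escaping titles and links so feed content can't inject markup. parse_rss and render_html are hypothetical names.

```python
import html
import xml.etree.ElementTree as ET


def parse_rss(rss_xml: str):
    """Parse RSS 2.0 items into dicts with title, link and description."""
    channel = ET.fromstring(rss_xml).find("channel")
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]


def render_html(items) -> str:
    """Render parsed items as an HTML list, escaping text for safe display."""
    rows = [
        f'<li><a href="{html.escape(i["link"])}">{html.escape(i["title"])}</a></li>'
        for i in items
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


sample = """<rss version="2.0"><channel><title>Demo</title>
<item><title>A &amp; B</title><link>http://example.com/a</link></item>
</channel></rss>"""
items = parse_rss(sample)
page = render_html(items)
print(page)
```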
5:32:24 PM    

Replace MSNBC's Outlook Today page. Outlook MVP Diane Poremsky provides a replacement for the now-discontinued MSNBC customized Outlook Today page. [Slipstick - Outlook and Exchange News 12/11/2003]
1:19:46 PM    

 Wednesday, September 22, 2004
Creating a Generic Site-To-RSS Tool. This in-depth article provides you with all of the technical information, examples and references you need to taste the flavour of true site scraping. [via Lockergnome's RSS & Atom Tips 5/30/2004]
10:40:23 AM    

 Tuesday, September 21, 2004
Turn FedEx tracking into RSS. Ben Hammersley has hacked a way to turn the tracking data from your FedEx package into an RSS feed. [via Boing Boing 7/5/2004]
11:00:50 AM    

 Monday, September 20, 2004
Details on my link blog
Yes, I'm addicted to reading RSS. It's 2:33 a.m. and I just got done with looking through my 808 NewsGator feeds (representing about 2000 blogs). OPML file here. Feeds.Scripting list here. Just uploaded a bunch of good stuff to my link blog. I haven't talked about how I do my link blog lately so my newer readers might have missed that.
First, I'm reading all 808 feeds in Outlook. They come into 808 folders. If someone updates their blog, their folder turns bold. I click on the folder. The new items are also bold. I'm reading Techdirt right now and three items just were posted in the past hour or so. So, I read those.
If I like an article, I drag it to a folder named "Blog This." For instance, I just read the Techdirt article titled "Who Do You Trust, The Wiki Or The Reporter?" I think that article belongs on my link blog. So, I drag it to my Blog This folder. Then a tool named "OutlookMT" takes over. It is a .NET app that watches the Blog This folder, and posts anything dropped in it. Now, notice that's all I do. Just drag-and-drop. No editing. No commenting. No linking. OutlookMT does it all.
OutlookMT can either repost the entire original post, or it can try to quote a little bit of the post. I used to have it quote the entire post, but people complained that I was stealing their content. So, Kunal Das (the guy who wrote OutlookMT) rewrote his tool to pull only a portion of the original post and put that up there. Either way, this lets me scour a large number of weblogs and pick the best stuff and put it up on that blog. I call this my "magic folder." It's totally changed how I blog and lets me share my favorite stuff with you in a very efficient way. [Scobleizer: Microsoft Geek Blogger 8/27/2004]
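The "quote only a portion" behavior is simple to approximate. This is a hypothetical Python sketch, not OutlookMT's actual code (which is .NET): it strips tags with a crude regular expression, decodes HTML entities and keeps the first few words.

```python
import html
import re


def excerpt(post_html: str, max_words: int = 40) -> str:
    """Strip tags from a post and keep only its first max_words words."""
    text = re.sub(r"<[^>]+>", " ", post_html)  # crude tag stripper, fine for a sketch
    text = html.unescape(text)                 # &amp; -> &, etc.
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


print(excerpt("<p>Who Do You Trust, <b>The Wiki</b> Or The Reporter?</p>", max_words=4))
# Who Do You Trust, ...
```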
7:39:18 PM    

RSSCalendar is an exciting new way for individuals and organizations to share their calendars with family, friends, and colleagues, utilizing the latest in RSS technology, including RSS channel creation and aggregation. Not only is RSSCalendar easy to use, but it is also easy to administer, and setup is a snap. RSSCalendar is well-suited for a variety of uses, including individual, company, school, organization, team and city calendars…
6:38:16 PM    

 Tuesday, September 14, 2004
An Overview of Web Browser Express. Build your own Web browser that supports tabs and an integrated link to a search engine. [MSDN Just Published 7/20/2004]
12:39:02 PM    

 Saturday, September 11, 2004
Calendar Updates TV Listings. Calendar Updates provides TV Listings that can be viewed directly from within Microsoft Outlook. Each listing in the TV grid includes a calendar icon that allows you to add your favorite shows directly to your calendar. Just click the calendar icon on the TV grid and the show is added to your calendar with a convenient reminder. Free, requires .NET Framework. [Slipstick - Outlook and Exchange News 8/23/2004]
8:39:02 PM    

 Monday, March 15, 2004
Jeff moves to Das Blog: My mentor and co-worker Jeff Sandquist moved his weblog from Radio UserLand to Clemens Vasters' "Das Blog" (which was written in .NET) over the weekend and writes about the experience. [Scobleizer: Microsoft Geek Blogger 2/23/2004]
7:38:18 PM    

 Saturday, March 13, 2004
The Future of Blog Tools. Lisa Williams has been going through all the notes of ideas that people left on Dave Winer's blog about "the future of blog tools." She's written up this awesome summary. Thanks to Amy Wohl for pointing to this. [Scobleizer: Microsoft Geek Blogger 3/13/2004]
7:31:30 AM    

Ian Hanschen: "Presenting BlogNavigator. The ultimate in RSS experience." Very cool looking. Anyone try this yet? Ian's stuff always looks so cool. [Scobleizer: Microsoft Geek Blogger 3/13/2004]
7:30:44 AM    

 Friday, March 12, 2004
New version of SharpReader ships. Luke Hutteman has released a new version of SharpReader (an RSS News Aggregator done in .NET). [Scobleizer: Microsoft Geek Blogger 2/29/2004]
5:51:39 PM    

 Wednesday, March 03, 2004
Creating a Generic Site to RSS Tool. Roy Osherove writes, “I’ll show how to use regular expressions to parse a Web page’s HTML text into manageable chunks of data. That data will be converted and written as an RSS feed for the whole world to consume. Finally, I’ll show how to create a generic tool that enables you to automatically generate an RSS feed from any website, given a small group of parameters. At the end of the day we will have a working RSS feed.” [3/1/2004]
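A compact Python take on the same idea, assuming one regular expression with named groups describes each item on the page. The group names (title, link, optional description), site_to_rss and the sample page are this sketch's own conventions, not the article's; regular expressions over HTML are brittle, so the pattern must match the target page's markup closely.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def site_to_rss(page_html, item_pattern, feed_title, feed_link):
    """Turn every regex match in page_html into an RSS 2.0 <item>.

    item_pattern must define named groups "title" and "link";
    a "description" group is optional.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Scraped from {feed_link}"
    for match in re.finditer(item_pattern, page_html, re.DOTALL | re.IGNORECASE):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = match.group("title").strip()
        # Resolve relative hrefs like "/news/1" against the site's address.
        ET.SubElement(item, "link").text = urljoin(feed_link, match.group("link"))
        description = match.groupdict().get("description")
        if description:
            ET.SubElement(item, "description").text = description.strip()
    return ET.tostring(rss, encoding="unicode")


page = """<ul>
<li><a href="/news/1">First story</a> <span>Summary one</span></li>
<li><a href="/news/2">Second story</a> <span>Summary two</span></li>
</ul>"""
pattern = (r'<a href="(?P<link>[^"]+)">(?P<title>.*?)</a>\s*'
           r'<span>(?P<description>.*?)</span>')
result = site_to_rss(page, pattern, "Example news", "http://example.com/")
print(result)
```

Keeping the per-site rule down to one pattern plus a title and link is what makes the tool generic: a new site needs a new regex, not new code.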
8:56:20 AM    

 Monday, February 23, 2004
.NET Screen Scraping in depth, by Damian Manifold (30 Oct 2003). Everything you need to know about screen scraping, from simply pulling down a page to more complex issues like submitting forms and handling cookies. Here you will learn how to use the WebClient and HttpWebResponse classes, and which is better for which task.
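The article covers .NET's WebClient and HttpWebRequest/HttpWebResponse; as a rough Python counterpart (not the article's code), urllib.request plus http.cookiejar handles the same two harder cases, keeping cookies between requests and posting a form. make_session, build_form_post, the URLs and the field names are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that keeps cookies between requests, as a browser would."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; scraper-sketch)")]
    return opener, jar


def build_form_post(url, fields):
    """Build a POST request that submits HTML form fields, URL-encoded."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Usage (not executed here; the URLs and field names are hypothetical):
# opener, jar = make_session()
# opener.open(build_form_post("http://example.com/login", {"user": "me", "pass": "x"}))
# page = opener.open("http://example.com/members").read().decode("utf-8")
req = build_form_post("http://example.com/login", {"user": "me", "pass": "s e"})
print(req.get_method(), req.data)  # POST b'user=me&pass=s+e'
```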
6:16:37 PM    

 Saturday, February 14, 2004
RSS 2.0 Framework. The RSS 2.0 Framework enables .NET programmers to add syndication to their apps. [The Scobleizer Weblog 12/16/2003]
6:35:54 PM    

 Friday, February 13, 2004
RSS self-defense. Now that I'm accumulating my inbound feeds as XHTML, in order to database and search them, I find myself in the aggregator business, where I never planned to be. The tools I'm using to XHTML-ize my feeds are Mark Pilgrim's incredibly useful ultra-liberal feed parser and the equally useful HTML Tidy, invented by Dave Raggett, and maintained by folks like Charlie Reitzel, one of CMS Watch's Twenty Leaders to Watch in 2004 (along with yours truly). ... [Jon's Radio 2/8/2004]
9:08:30 AM    

 Friday, January 09, 2004
Eclipse RSS Reader. Publishing web-based news of all kinds via a summary format (RSS) is becoming increasingly popular. The applications include regular headline news (Yahoo! News), web logs (Slashdot.org), professional bulletins (IBM developerWorks), and project updates (SourceForge.net). A variety of RSS formats currently exist, which increases application complexity. Often, a reader capable of understanding one format cannot handle another.... [Lockergnome's RSS Resource]
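Much of the format problem comes down to looking at the root element. This Python sketch, using the standard ElementTree module, distinguishes RSS 0.9x/2.0, RSS 1.0 (RDF) and Atom and pulls out the item titles. It assumes the Atom 1.0 namespace; the Atom 0.3 feeds common in 2004 used http://purl.org/atom/ns# instead and would need another branch. feed_titles is a hypothetical name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RSS1_NS = "http://purl.org/rss/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def feed_titles(feed_xml: str):
    """Return (format_name, item_titles) for an RSS 0.9x/2.0, RSS 1.0 or Atom feed."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":                        # un-namespaced <rss> root
        return "RSS 0.9x/2.0", [t.text for t in root.findall("channel/item/title")]
    if root.tag == f"{{{RDF_NS}}}RDF":           # RSS 1.0: items sit beside <channel>
        path = f"{{{RSS1_NS}}}item/{{{RSS1_NS}}}title"
        return "RSS 1.0", [t.text for t in root.findall(path)]
    if root.tag == f"{{{ATOM_NS}}}feed":         # Atom 1.0: <entry> elements
        path = f"{{{ATOM_NS}}}entry/{{{ATOM_NS}}}title"
        return "Atom", [t.text for t in root.findall(path)]
    raise ValueError(f"unrecognized feed root element: {root.tag}")


atom = ('<feed xmlns="http://www.w3.org/2005/Atom"><title>F</title>'
        '<entry><title>E1</title></entry></feed>')
print(feed_titles(atom))  # ('Atom', ['E1'])
```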
5:44:18 PM    

 Saturday, November 15, 2003
intraVnews vs NewsGator. In short: intraVnews competes directly with NewsGator as a desktop RSS reader, free for private use and licensed for corporate use. It is 100% .NET-based, developed in C# using Microsoft's Interop libraries, and works with Outlook XP and 2003 on Windows 98SE and up (.NET Framework v1.1). They are trying to take a fresh approach based on the following principles: (1) RSS is what counts, NNTP is not interesting; (2) the user must never be hampered when they are using Outlook for real work (things other than RSS);...
7:04:09 PM    

 Sunday, October 19, 2003
Generic Site-To-RSS Tool. Via the MSDN Academic Alliance: "I got to thinking: 'All the data on the site that’s important to me seems to be arranged in an orderly and predictable manner. I should be able to parse it in a fairly easy manner and make it into an RSS feed.' So I started trying. It worked out pretty well. So well that I’ve come up with a way to let you do your own site scraping using a generic tool, providing it with only simple rules expressed as a single regular expression."... [Lockergnome's RSS Resource]
6:09:30 PM    

