History

Posted by Mike Wed, 16 Apr 2008 13:34:00 GMT

Here I give in to a Silly Internet Meme (Tim made me do it).


mpierson@macbook:~$  history|\
awk '{a[$2]++} END{for(i in a){printf "%5d\t%s \n",a[i],i}}'|\
sort -rn|head
  195	ls 
  121	g 
   83	svn 
   75	cd 
   71	sudo 
   50	less 
   32	grep 
   28	xbacklight 
   22	ssh 
   20	./update.sh 

“g” is an alias for gvim and “update.sh” is an IDM build script.

... and I've added Tim's “lh” to the macbook
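
For anyone without awk to hand, here is a rough Python sketch of the same tally. It assumes each line of `history` output looks like "<number> <command> <args...>"; the function name `top_commands` and the sample lines are made up for illustration.

```python
from collections import Counter

def top_commands(history_lines, n=10):
    """Tally the second field of each line, like a[$2]++ in the awk script."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        # field 1 is the history number, field 2 is the command itself
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts.most_common(n)

# Invented sample input, not real shell history
sample = ["  1  ls -l", "  2  cd /tmp", "  3  ls", "  4  svn up"]
for cmd, count in top_commands(sample, 3):
    print(f"{count:5d}\t{cmd}")
```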


Web 2.0

Posted by Mike Mon, 03 Oct 2005 19:24:00 GMT

As Tim O'Reilly and Tim Bray say: 'there's still a huge amount of disagreement about just what Web 2.0 means'. Herewith, my summary of O'Reilly's piece What Is Web 2.0.

O'Reilly describes principles shared by 'Web 1.0' successes and interesting recent applications. See the meme map that came out of a brainstorming session at a FOO Camp conference.

  1. The Web As Platform

    Web as platform is an old idea, but its implementation has been refined. See Netscape vs. Google, DoubleClick vs. AdSense, Akamai vs. BitTorrent.

  2. Harnessing Collective Intelligence

    Open Source software, open content, collaborative categorization and viral marketing all rely on a collective intelligence. Site attributes such as extensive (permanent) hyperlinks, low barriers to participation, organized content and metadata facilitate or enhance the effect of collective intelligence. Blogs are a special case of collective intelligence (and RSS a special attribute) in that the collective intelligence only emerges from a critical mass of blogs/articles.

  3. Data is the Next Intel Inside

    Based on the way they approached their databases, MapQuest is a Web 1.0 story and Amazon is a Web 2.0 story. MapQuest licensed map data from Tele Atlas, but did not enhance (e.g. user annotations) or control the data. Amazon licensed ISBN data from R.R. Bowker and enhanced the data with data from publishers and customers. MapQuest was soon joined in the marketplace by competing services (Yahoo, Google, MSN) and Amazon is the standard source for bibliographic data.

  4. End of the Software Release Cycle

    In Web 2.0 software is delivered as a service not a product.

    O'Reilly suggests a number of fundamental changes to the business model of software companies.

    • Operations must become a core competency. Google has become expert at managing the servers that deliver its web services, and that expertise is closely guarded.
    • Users must be treated as co-developers. Release early and often (daily, hourly) and/or stay in perpetual beta. Monitor user behaviour in real time.
  5. Lightweight Programming Models

    Simple, lightweight service interfaces appear to be successful with the masses (i.e. the intelligent collective). (One assumes that housingmaps.com enhances the value of Google Maps?)

    Three lessons identified:

    • Support lightweight programming models that allow for loosely coupled systems
    • Think syndication, not coordination
    • Design for 'hackability' and remixability
  6. Software Above the Level of a Single Device

    iTunes, TiVo, BlackBerry...

  7. Rich User Experiences

    Google/Flickr/Basecamp are at the forefront, but Yahoo and others have made AJAX the basis for major product releases.

O'Reilly finishes with a summary of the core competencies of a Web 2.0 company:

  • Services, not packaged software, with cost-effective scalability
  • Control over unique, hard-to-recreate data sources that get richer as more people use them
  • Trusting users as co-developers
  • Harnessing collective intelligence
  • Leveraging the long tail through customer self-service
  • Software above the level of a single device
  • Lightweight user interfaces, development models, AND business models


McGrath on Documentation

Posted by mop Tue, 12 Apr 2005 15:30:00 GMT

Go read this short article by Sean McGrath on the subject of test-driven documentation. Using unit tests as documentation is not what Knuth had in mind when he coined the phrase Literate Programming, but it’s a step in the right direction.
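
To make the idea concrete, here is a small Python sketch in which the examples in a docstring are both the documentation and the tests. This is not taken from McGrath's article; `slugify` and `run_examples` are hypothetical names, and the standard `doctest` module is just one way to get the effect.

```python
import doctest

def slugify(title):
    """Turn a post title into a URL slug.

    The examples below are documentation and tests at the same time:

    >>> slugify("McGrath on Documentation")
    'mcgrath-on-documentation'
    >>> slugify("  Web   2.0 ")
    'web-2.0'
    """
    return "-".join(title.lower().split())

def run_examples(func):
    """Execute the examples in func's docstring; returns (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    return doctest.DocTestRunner(verbose=False).run(test)

results = run_examples(slugify)
print(results.failed, "failed of", results.attempted)
```

If the code drifts away from the documented behaviour, the examples start failing, which is the whole point.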


XDoclet code generation

Posted by mop Tue, 30 Nov 2004 01:43:00 GMT

X, as in eXtreme, not XML. XDoclet leverages metadata encoded within Java classes as Javadoc tags, generating content (Java classes, JSPs, etc.) as part of a build process. The model is well suited to EJBs, Struts, as well as mixed content (generated plus hand crafted) files. XDoclet is also easy to apply in ad-hoc situations.

The premise is simple enough: put a custom javadoc tag in a Java source file then apply an XDoclet transform to produce a helper class, a JSP, a unit test, whatever. The transform can be as simple as an XDt template that references the custom javadoc tag, or a custom Java-based processor that applies complex logic to the tag-encoded metadata.
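
The premise can be mimicked in a few lines of Python. To be clear, this is not XDoclet and not an XDt template: the tag `@demo.helper`, the template text and the function name are all invented for illustration. It only shows the shape of the idea, which is to find a custom tag in a source file and feed its metadata to a template.

```python
import re
from string import Template

# Hypothetical tag; real XDoclet tags and XDt templates look different.
TAG = re.compile(r'@demo\.helper\s+name="(?P<name>\w+)"')

# Hypothetical output template for the generated helper class
HELPER = Template(
    "public class ${name}Helper {\n"
    "    // generated from ${source}\n"
    "}\n"
)

def generate_helpers(java_source, source_name):
    """Return one generated helper class per tag found in the source text."""
    return [
        HELPER.substitute(name=m.group("name"), source=source_name)
        for m in TAG.finditer(java_source)
    ]

java = '''
/**
 * @demo.helper name="Account"
 */
public class AccountBean {}
'''
print(generate_helpers(java, "AccountBean.java")[0])
```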


Web quickies

Posted by mop Mon, 01 Nov 2004 23:22:00 GMT

Some tidbits from my Bloglines RSS subscriptions.

Lint4J

"Lint4j ("Lint for Java") is a static Java source code analyzer that detects locking and threading issues, performance and scalability problems, and checks complex contracts such as Java serialization by performing type, data flow, and lock graph analysis."

JotSpot

It’s Wiki++. Typical intranet functionality is available to Wiki users. As seen on Jon Udell’s blog. (I’ve added Jon’s blog to my roll.)

Blogging Your Build

Blogs aren’t just for people; your processes should be blogging too. Oh yeah.

Debian on Dell Servers

ISOs and pointers for those brave enough to run Dell servers.


Google-like searches with Lucene

Posted by mop Wed, 20 Oct 2004 16:30:00 GMT

Lucene is a Java system for "high-performance, full-featured text search". The software appears to be mature, and the community has produced a fair bit of documentation. A replacement for RDBMS-based searches?

No doubt that the searching is more intuitive, and would make it easier for users to perform keyword searches. Not sure that a Google-like engine could match RDBMS for field-based searching and fancy list navigation.

  • Phonetix integrates phonetic algorithms into Lucene
  • Luke provides a high level interface (Java and GUI) to Lucene’s generated indexes
  • limited benchmarks are available
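
For a feel of what keyword search means underneath, here is a toy inverted index in Python. It is only a sketch of the general technique, not Lucene's API or its index format; `build_index`, `search` and the sample documents are made up.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

# Invented sample documents
docs = {
    1: "Debian on Dell servers",
    2: "Lucene text search",
    3: "full text search with Lucene",
}
index = build_index(docs)
print(search(index, "lucene search"))
```

Note there is no notion of fields here at all, which is roughly the worry about field-based searching raised above.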


HTML quickies

Posted by mop Fri, 10 Sep 2004 15:47:00 GMT

Some clever solutions to file for a rainy day...

Javascript popup object

Matt Kruse seems to have done a good job creating a flexible Javascript object for browser popups. It supports tool-tip style boxes, as well as traditional pop-up windows.

Norm Walsh, the don of DocBook, mentioned Matt’s work in his discussion of DocBook annotations.

Tag Soup

John Cowan wrote a SAX-compatible parser for ‘nasty and brutish’ HTML, called Tag Soup. This lenient parser takes poorly formatted HTML snippets and parses them into a valid tree. Seems like a must-have for any web application that allows users to enter HTML mark-up.

Norm Walsh uses Tag Soup to parse comments authored by visitors to his blog. Interesting that even the comments to Norm’s blog are syndicated.


Converting a MS SQL Server database to PostgreSQL

Posted by mop Wed, 11 Aug 2004 16:46:00 GMT

Herewith some notes from my attempt to migrate a database instance from Microsoft SQL Server 7 to PostgreSQL 7.3. My journey began with Ian Harding’s how-to, and it’s a good place to start.

export from SQL Server

The bcp utility is a quick and flexible command line utility that extracts raw table data (or a query result) to a file. It works pretty much as advertised, with the only tricky parts being the treatment of nulls and character encoding. Ian suggested using the -k parameter, which forces bcp to use a null character (\x00) to represent an empty field; it’s probably the right thing to do. Unfortunately bcp does not distinguish between empty fields (i.e. value is null) and fields containing an empty string. Character encoding can be dealt with in two ways: the -c parameter will force all text data into ASCII text, or the -w parameter will encode text as UTF-16. The two-byte representation would be a no-brainer, except that the data swells to (almost) twice the original size.

Here’s what I used for each table in the database:

 bcp dbname..tablename out 'filename' -w -k -t "<f-end>" -r "<record-end>" -b 1000

where -b is the number of rows per transaction, and the -t and -r parameters indicate the field and record delimiters. The key when choosing delimiters is to avoid conflicts with field values.

mangle the exported data

Here’s what I did to the export of each table (after moving files to a Linux box):

 # transform to 8 bit encoding
 recode utf-16..utf-8 "$1"

 # TODO check for literal '\n', '\t'

 # replace back slash with forward slash
 perl -pi -e 's!\\!/!g' "$1"

 # replace tabs with literal '\t'
 perl -pi -e 's/\t/\\t/g' "$1"
 # replace line breaks with literal '\n'
 perl -pi -e 's/\n/\\n/g' "$1"

 # replace field delimiter with tabs
 perl -pi -e 's/<f-end>/\t/g' "$1"
 # replace record delimiters with line break
 perl -pi -e 's/<record-end>/\n/g' "$1"

 # remove Windoze carriage returns
 perl -pi -e 's/\r//g' "$1"

 # remove nulls
 perl -pi -e 's/\x00//g' "$1"

Here’s the step by step explanation:

  • bcp exports Unicode as UTF-16 but PostgreSQL expects UTF-8, and UTF-8 is easier to move around via SCP
  • the backslash character is significant when importing into PostgreSQL, and I couldn’t think of a reason to keep them in a field value
  • PostgreSQL uses the backslash to encode tabs and line breaks within field values
  • obviously tabs and line breaks are used as delimiters
  • just housekeeping, I don’t think the carriage returns cause a problem
  • nulls seem to confuse PostgreSQL’s import process

Notes: next time around I’ll use sed instead of perl, but I was too lazy to check the syntax of recode for stream ops. You’ll see that the null characters inserted by bcp to represent empty fields are being stripped; it could be that we don’t need the nulls in the exports, or that we should keep them in the export and convince PostgreSQL that they are significant.
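
The same sequence of substitutions can be sketched as one Python function, which avoids rewriting the file once per step. This is an untested-against-real-data sketch, not what I actually ran: `mangle` and `convert_file` are hypothetical names, the delimiters default to the ones passed to bcp above, and it assumes Python's utf-16 codec can decode the export as written by bcp.

```python
def mangle(raw, field_end="<f-end>", record_end="<record-end>"):
    """Apply the substitutions in the same order as the perl steps."""
    text = raw.replace("\\", "/")           # backslash is significant on import
    text = text.replace("\t", "\\t")        # tabs in values become literal \t
    text = text.replace("\n", "\\n")        # line breaks in values become literal \n
    text = text.replace(field_end, "\t")    # real tabs now separate fields
    text = text.replace(record_end, "\n")   # real line breaks now separate records
    text = text.replace("\r", "")           # drop carriage returns
    text = text.replace("\x00", "")         # drop the nulls bcp wrote for empty fields
    return text

def convert_file(src, dst):
    """Decode a UTF-16 export, mangle it, and write it back out as UTF-8."""
    # newline="" stops Python from translating line endings on read or write
    with open(src, encoding="utf-16", newline="") as f:
        raw = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(mangle(raw))

# Invented two-record sample with a tab, a backslash, a CRLF and a null
raw = "1<f-end>a\tb<f-end>C:\\tmp<record-end>2<f-end>line1\r\nline2<f-end>\x00<record-end>"
print(repr(mangle(raw)))
```

Working on the whole file in memory is fine for small tables; a big export would want to be processed in chunks split on the record delimiter.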

create PostgreSQL schema

I used brute force. It would be nice to build a schema.sql script with ant and makedata.

import into PostgreSQL

The COPY command allows table data to be imported from a local file. The only tricky part is the interpretation of null field values; following Ian’s lead I’ve specified the empty string:

 COPY tablename FROM 'filename' WITH NULL AS '';

This approach worked for all tables except those that contained empty strings in columns defined as ’NOT NULL’. I kluged these tables by altering the schema: "... ALTER COLUMN xxx DROP NOT NULL".
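
Why the NOT NULL columns trip up is easier to see with a simplified model of how one line of the import file gets split once the null marker is the empty string. This is a rough Python sketch, not PostgreSQL's actual parser: `parse_copy_line` is a hypothetical name, and it only understands the \t and \n escapes produced by the mangling step.

```python
def parse_copy_line(line, null=""):
    """Split one tab-delimited line into values, mapping the null marker to None."""
    values = []
    for field in line.rstrip("\n").split("\t"):
        if field == null:
            # an empty string and a real NULL look identical at this point
            values.append(None)
        else:
            # undo the literal \t and \n escapes inside the value
            values.append(field.replace("\\t", "\t").replace("\\n", "\n"))
    return values

# Invented sample line: id, an empty field, and a value with an escaped line break
print(parse_copy_line("1\t\tfoo\\nbar\n"))
```

Every empty field comes back as None, so a column that held empty strings in SQL Server arrives as NULL and violates the constraint.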

update

It’s probably also a good idea to run the maintenance.sql script to clean up some tables before extracting. Smaller is better.

