A brief survey of Java IDEs

Posted by mop Thu, 29 Jul 2004 03:47:00 GMT

Herewith a record of my reaction to four Java IDEs.

Eclipse

I’ve used Eclipse at various times to do some code quality analysis (managing import statements mostly). In the past it has been cumbersome to use, especially for our Ant/CVS based projects. It’s getting better. Of course my hardware is better than it used to be; IDEs certainly benefit from a beefy CPU and big monitor. The CVS integration is pretty slick even when dealing with branches and remote repositories. The integration with Ant is still a bit tricky, although I’m not sure there is a nice way to mesh Ant builds with a built-in compiler. The built-in compiler appears similar to Jikes (maybe same code base?), perhaps there are plug-ins that offer a more in-depth analysis. I’ve yet to give the debugger in Eclipse a run-through.

Overall rating: 7/10.

JBuilder

Note that I was prejudiced against JBuilder because it’s a commercial product without providing an obvious advantage over the open source alternatives. After filling out several pledge-my-first-born registration forms and downloading 100MB of files, the installation process was painless. Creating projects is intuitive and Borland is the only IDE I know of that is explicit with respect to character encoding -- smart. Unfortunately I could not get JBuilder to check out the example project from CVS, so I did limited testing. The web site implied that the IDE would integrate with Borland’s OptimizeIt tool -- I couldn’t verify this because of the CVS issue, but I did try OptimizeIt outside the IDE. Profiling is probably a good idea if you have oodles of time and a lot of patience (I have neither). The standard version of JBuilder is $500/seat, the enterprise version (EJBs etc) is $3500.

Overall rating: 5/10.

IntelliJ IDEA

I’ve heard IntelliJ referred to as the Cadillac of Java IDEs. It has the nicest visuals of any of the other candidates -- nice widgets, decent sized icons, and pleasant colours are important if you’re staring at them all day. IntelliJ integrates will with Ant and CVS, although there is a bug in the CVS client that corrupts jar files on checkout. The big selling feature of IntelliJ are the code inspection tools: detection of unused methods and members, complexity measurements, scope analysis, etc.. Most of the issues identified by the inspection tools are presented with suggested fixes. Impressive but overwhelming. IntelliJ IDEA is $500/seat.

Overall rating: 8/10

NetBeans

The IDE backed by Sun, and I hear there are some mind-altering plug-ins on the way from Tim Bray, but it’s not ready for prime time yet. The file system paradigm used for projects was not a good fit with our CVS project, and the compiler was unable to build for lack of memory. I have heard that the NetBeans debugger is head and shoulders above the rest.

Overall rating: 5/10.

Posted in  | no comments | no trackbacks

Modelling Application Statistics

Posted by mop Fri, 23 Jul 2004 19:28:00 GMT

I recall one of my stats profs saying once: ’not all random phenomena can be modelled with a normal distribution, but if your measurements are not normal then you are measuring the wrong thing’. He said it in jest, me thinks, but he had a point.

What happened was, I have been measuring response times for a busy web application. Forgetting most of what I learned in school, I’ve been recording the mean average and peak response times, i.e. average time and peak time it takes to service HTTP requests. (I use a fixed size sample, not samples from a fixed period, see Calculating a moving average.)

Recognizing that peak and average values are not entirely representative of server behaviour, I started by examining the distribution of response times for an arbitrary time period (1000 requests, about 10 minutes worth). It’s obviously not a traditional normal distribution, eh? If it were a normal distribution, we could estimate the mean average and variation at be confident that we’ve accurately characterized the distribution of response times. But it’s not normal so the fun begins.

Like any good engineer, I know just enough to bluff my way through a cocktail party, and enough to know where to look. The NIST has a great online resource: NIST/SEMATECH e-Handbook of Statistical Methods. Skip to the Tools and Aids section and you’ll find a gallery of probability distributions. If I squint a little, any one of Weibull, Lognormal, Power Lognormal could be an appropriate match for our distribution. The Power Lognormal curve is probably the closest, so on to the next step.

Remember, we’re trying to model our response times in a way that will allow us to characterize the distribution at any time with simple metrics like mean average and variance. The characteristic formula for Power Lognormal distributions is

which is expressed in terms of the normal distribution. So, describing our distribution would involve mean, variance plus the power and shape parameters. Four metrics, hmmm.

Here’s where I realize that I’m probably measuring the wrong thing.

Posted in  | no comments | no trackbacks

Calculating a moving average

Posted by mop Sun, 18 Jul 2004 17:41:00 GMT

What I need is an efficient generalized approach to calculating a moving average. I hope to measure the time required to complete an operation (many ops per second) and record a moving average of these times for at least two periods. I expect that the periods may be sufficiently large to cause storage issues if all measurements for the period are stored.

As an example, lets say that 100 operations are performed per second, and that I’ll be calculating a 30 minute moving average. That’s 180,000 data points. Of course, I don’t need to store all the measurements in order to calculate a simple mean average, just keep a running sum and a count. Now let’s say I want to record this moving average once every ten minutes -- storing a sum and a count is no longer enough.

There are a few ways I can think of to deal with the problem:

  • change the definition of the moving average to include a fixed number of measurements instead of a fixed period
  • make an assumption about the number of measurements in a thirty minute period, then the average could be approximated for each new data point using μ = (μ⋅n + x)/(n + 1)
  • again, make an assumption about the number of measurements in a period, but adjust the assumption as the rate of operations changes
  • store all the data points for at least thirty minutes ;-)

The first option is appealing for a number of reasons: it is simple, accurate, and allows for more more in-depth statistical measurements. It does however call for a judicious choice of the sample size. In our example, we likely would not choose a sample size equivalent to 30 minutes of measurements. Assuming that 30 minutes was an arbitrary measure (it is) I’ll use an equally arbitrary sample size.

Posted in  | no comments | no trackbacks

In-Touch Makedata

Posted by mop Thu, 08 Jul 2004 03:01:00 GMT

Not sure where I’d like this type of documentation to exist, but it should be preserved for posterity. So... here’s some background on In-Touch’s object-relational framework: Makedata.

Makedata is a Java app that creates *.java files that encapsulate the RDBMS <--> Java mapping. Each object/table is defined in a simple text file () which Makedata transforms into three files:

table.sql
a SQL script that creates the table and indices
tableRawData.java
a java object representing the raw data (members plus accessor methods, very bean-like)
tableRawRowBuffer.java
a java object that knows how to read and write the raw data to/from the database

The generated classes are generally found in the ./target/generated-datafiles/ directory. The Ant task ’datac’ invokes makedata.

Each *RawData class is a sub-class of intouch.data.Data. Each *RawRowBuffer class is a sub-class of intouch.sql.RowBuffer. Generally we extend each of the generated classes and these sub-classes of *RawData and *RawRowBuffer become containers for business logic.

Another important class is intouch.data.View which defines a bit-field representing each fields in a data object. A view of the table/data is defined by turning bits on and off.

We also use these classes to represent table joins. There are two abstract classes intouch.data.ViewData and intouch.data.ViewDataRowBuffer which are extended to define joins. Again, for each object-rdbms map there are two classes, one for data and one for db read/write.

Posted in ,  | no comments | no trackbacks

measuring application performance

Posted by mop Sun, 04 Jul 2004 13:46:00 GMT

I’m thinking a lot about measuring performance these days. Not benchmarks, stress testing or profiling, but real time performance. The largest web site we support expects to grow substantially in the coming months and predicting the hardware and software requirements is tough without real time metrics. What I’ve quickly realized is that measuring is the easy part (relative, eh).

We’re running multiple application servers (Apache + Java) behind a single load balancer and in front of a single instance of SQL Server. It’s not difficult to extract metrics from each component in the stack: Apache stats are available via mod_status, the load balancer uses Linux Virtual Server so ipvsadm -l yields lots of info, and database metrics are as easy as select count(*) from .... For now, I’m using MRTG to manage these metrics.

There are still some important metrics missing (running average of response time, SQL TPS) but already it’s apparent that we’ll need a way to distill all the data. ISPs must have the same problem, many servers, many graphs. And what Google must have to do with all their servers, yikes. MRTG does have the ability to trigger an alarm when a metric exceeds a pre-defined threshold, might be worth pursuing.

Posted in  | no comments | no trackbacks

a first look at Jython

Posted by mop Thu, 01 Jul 2004 22:42:00 GMT

I’ve had my first in-depth look at Python and the Java implementation of the Python syntax: Jython. The goal was to create some scripts to generate simple reports on the status of a Java web app and back-end database.

A good thing: Python supports explicit character encoding of the program source. A bad thing: still not sure if there are any Python boolean primitives (Update: found the grammar, no boolean primitives yet).

Python’s syntax comes reasonably natural -- significant white space is less awkward than I expected. I have not explored the object model yet; instead I’ve been relying on (existing) Java classes to do the hard work.

Combining a well-defined scripting language with Java is a powerful idea. The scripts are short and readable, and the process was very efficient. There was re-use within the scripts themselves and they are not particularly fast, but that’s ok when writing quick and dirty reports.

If you’re interested, there is a lengthy comparison of the Java apple and Python orange, probably out of date. I stumbled on this while looking for the Python equivalent of javadoc (G: python + javadoc). I think the answer is pydoc.

Posted in  | no comments | no trackbacks

fun with character encoding

Posted by mop Tue, 29 Jun 2004 03:45:00 GMT

More fun with character encoding... this time with Perl. I’ve been down the encoding path with Java, discovering on the way some of the flaws in the IO libraries (see esp. FileWriter and FileReader).

Similar to Java, Perl does support Unicode in it’s string representation, UTF-8 actually. It’s nice, ’cause Perl regular expressions can act on Unicode strings -- as long as you’re consistent in using UTF-8 characters. The problems start when using modules that don’t handle anything other than 7-bit ASCII. Same mistakes as you’ll find in Java libraries. To be fair, the Perl modules are centrally organized via CPAN, but not designed/written/QA’d by the owners of Java, so we should cut the Perl module authors some slack.

But you’d think that a Perl module related to XML would be different. The XML:CSV module is a handy tool that uses the Text::CSV_XS library to parse comma delimited (whatever that is) records and spit out proper XML. Unfortunately the CSV_XS module silently fails when it encounters non-ascii data.

Perl itself does a decent job of explicit handling of character ecoding when doing IO. Perl started as a text processing engine, and text based IO is still the bread and butter for the Perl sandwich. There is even an extensive description of Perl’s Unicode and encoding if you go looking. Specify the encoding for any file or stream, before or after it’s been opened, and Perl transparently converts as required.

Ok, so we know that Perl uses Unicode internally, and that some modules expect ASCII only. My immediate problem involves XML so I’m quite happy to use 7-bit ASCII plus entities for the extended characters. Turns out this is easily done with the Encode module. Problem solved, I think.

Update: of course, all this applies to recent versions of Java/Perl. In particular, the clever conversion from ISO-8859-1 to ASCII doesn’t work with the version of Perl in Debian woody. yuck.

Posted in ,  | no comments | no trackbacks

Older posts: 1 2