Tuesday, September 29, 2009

Browsing open source code history

There is a killer characteristic of open source that often seems to be overlooked: the ability to browse the source code history.

This will sound trivial to many, but I see it overlooked a lot in questions asked on stackoverflow.com and other forums. People ask: "Why has feature X changed?" or "Why am I getting exception P since I upgraded project Q to version R?"

The first thing you need to do is assume there is a good reason for that change and set out to find it, instead of considering it a bug in the OSS. Stay humble: remember that the people working on OSS are generally very smart and really know what they're doing.

And the really generic answer to such questions would be (without any intention of being harsh): you can usually find out yourself.

An example: this stackoverflow question (here summarized):

In the previous version of NHibernate (2.0.1) the following property will validate:

internal virtual BusinessObject Parent
{
  get { /*code*/ }
}

However, in 2.1 it errors saying that the types should be 'public/protected virtual' or 'protected internal virtual'. Why is this requirement now there?

Now I've been using NHibernate since 0.83 but I'm no expert in NHibernate internals. Despite that, I was able to answer the question by doing this:

  1. Check out NHibernate's source.
  2. Grep the source code for "public/protected virtual". The only result is NHibernate.Proxy.DynProxyTypeValidator.
  3. Go to NHibernate's Fisheye, browse to DynProxyTypeValidator.
  4. Browse the file's history, starting from the latest diff (at the time of writing it's this one), looking for changes in the proxy validation exception message.
  5. Only two diffs back I find the relevant commit.
  6. The commit message says:
    - Fix NH-1515 (BREAKING CHANGE)
    - Minor refactoring of proxy validation stuff
  7. Go to NHibernate's JIRA. Browse to NH-1515. Read the issue description.

That's it. No special knowledge needed. This process could be further simplified by using Fisheye's search instead of grepping locally (though I never seem to get good results from Fisheye), or by getting NHibernate from GitHub instead of svn and then searching with git-grep or gitk. Browsing history with svn is so painfully slow that I prefer to do it with Fisheye.
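If grep isn't at hand (say, on a bare Windows box), step 2 can be approximated in a few lines of C#. This is only a minimal sketch: the SourceGrep class and its Find method are names made up for illustration, and the checkout path is whatever you pass in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SourceGrep {
    // Yields "path:line: text" for every line under root that contains needle.
    public static IEnumerable<string> Find(string root, string needle, string filePattern) {
        foreach (var file in Directory.EnumerateFiles(root, filePattern, SearchOption.AllDirectories)) {
            var lineNumber = 0;
            foreach (var line in File.ReadLines(file)) {
                lineNumber++;
                if (line.Contains(needle))
                    yield return string.Format("{0}:{1}: {2}", file, lineNumber, line.Trim());
            }
        }
    }

    public static void Main(string[] args) {
        // Hypothetical usage: pass the path of the checkout as the first argument.
        var root = args.Length > 0 ? args[0] : ".";
        foreach (var hit in Find(root, "public/protected virtual", "*.cs"))
            Console.WriteLine(hit);
    }
}
```

This only finds where the message lives today. The history browsing in steps 3 to 5 still needs the version control tooling.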

By the way, this is the same approach I use on closed code at work when my boss asks me "Hey, this feature used to behave differently! When and why did we change it?" (and this happens a lot).

This is where best practices like specific commit messages, atomic commits and good issue tracking really pay off.

Documentation? I save that for stuff that this approach can't possibly handle.

Saturday, September 26, 2009

Testing IIRF rules with MbUnit

If you're running IIS 6 like me, you know there aren't many options if you want extensionless URLs. I already had a custom URL rewriting engine in place with Windsor integration and stuff, but it couldn't handle extensionless URLs. The 404 solution seems kinda kludgy to me so after some pondering I decided to go with IIRF.

OK, now let's see how we test IIRF rules. IIRF is a regular Win32 DLL written in C, so there are no managed hook points that we can use from .NET. Fortunately, it comes with a testing tool called TestDriver, a standard .exe that takes a file with source and destination URLs and runs all source URLs against the rules, asserting that each result matches the destination.

All we need now is some glue code to integrate this into our regular tests, so that a failing route also fails the whole build. I also want to see in my TeamCity build log exactly which route failed, and to be able to add new route tests easily. We can do all this with MbUnit's [Factory] and some stdout parsing. Here's the code:

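using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using MbUnit.Framework;

[TestFixture]
public class IIRFTests {
    [Test]
    [Factory("IIRFTestFactory")]
    public void IIRFTest(string orig, string dest) {
        // One tab-separated source/destination pair per TestDriver run
        File.WriteAllText(@"SampleUrls.txt", string.Format("{0}\t{1}", orig, dest));
        var r = RunProcess(@"TestDriver.exe", @"-d .");
        if (r.ErrorLevel != 0) {
            // Extract the rewritten URL from "actual(...)" so the failure shows expected vs. actual
            var actual = Regex.Replace(r.Output.Replace("\r\n", " "), @".*actual\((.*)\).*", "$1");
            Assert.AreEqual(dest, actual);
        }
    }

    // Each yielded array becomes one (orig, dest) test case
    public IEnumerable<object[]> IIRFTestFactory() {
        yield return new object[] { "/questions/167586/visual-studio-database-project-designers", "/Question/Index.aspx?id=167586" };
        yield return new object[] { "/users/21239/mauricio-scheffer", "/User/Index.aspx?id=21239" };
    }

    public ProcessOutput RunProcess(string fileName, string arguments) {
        using (var p = new Process()) {
            p.StartInfo = new ProcessStartInfo(fileName, arguments) {
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
            };
            // Collect stderr asynchronously and drain stdout before waiting:
            // waiting first can deadlock once the child fills a pipe buffer.
            var stderr = new StringBuilder();
            p.ErrorDataReceived += (sender, e) => { if (e.Data != null) stderr.AppendLine(e.Data); };
            p.Start();
            p.BeginErrorReadLine();
            var stdout = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            return new ProcessOutput(p.ExitCode, stdout + stderr.ToString());
        }
    }

    public class ProcessOutput {
        public int ErrorLevel { get; private set; }
        public string Output { get; private set; }

        public ProcessOutput(int errorLevel, string output) {
            ErrorLevel = errorLevel;
            Output = output;
        }
    }
}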
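The stdout parsing is the least obvious part, so here is a small standalone sketch of what that regex does. The sample output string is hypothetical (I made it up to exercise the pattern, it is not copied from a real TestDriver run), and ActualUrlParser is just an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ActualUrlParser {
    // Same transformation as in the test: flatten line breaks, then keep
    // only what sits between "actual(" and the last ")" of the output.
    public static string ExtractActual(string output) {
        var singleLine = output.Replace("\r\n", " ");
        return Regex.Replace(singleLine, @".*actual\((.*)\).*", "$1");
    }

    public static void Main() {
        // Hypothetical output format, assumed only for this example.
        var sample = "ERROR: expected(/User/Index.aspx?id=21239)\r\nactual(/users/21239/mauricio-scheffer)\r\n";
        Console.WriteLine(ExtractActual(sample)); // /users/21239/mauricio-scheffer
    }
}
```

Note that the capture group is greedy, so it runs up to the last closing parenthesis on the flattened line. If the output contains no "actual(...)" at all, the whole text comes back unchanged and the assertion fails with the raw output as the actual value, which is still useful in a build log.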

Monday, September 14, 2009

SolrNet 0.2.3 beta1

Just released SolrNet 0.2.3 beta1. Here's the changelog:

  • Fixed minor date parsing bug
  • Added support for field collapsing
  • Added support for date-faceting
  • Upgraded to Ninject trunk
  • Upgraded sample app's Solr to nightly
  • Added StatsComponent support
  • Added index-time document boosting
  • Added query-time document boosting
  • Bugfix: response parsing was not fully culture-independent
  • All exceptions are now serializable
  • Fixed potential timeout issue
  • NHibernate integration
  • Fixed Not() query operator returning wrong type

These are the interesting new features:

Field collapsing

This is a very cool feature that isn't even included in the Solr trunk. It's currently only available as a patch, but hopefully it will make its way to trunk soon. It allows you to collapse query results that share the same value of a document field, which makes for flexible duplicate detection.
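To make the idea concrete, here is a purely local sketch of what collapsing on a field means. It doesn't touch Solr or SolrNet at all: the Doc class, its fields and the CollapseBy helper are invented for the example, and the real patch does this server-side with more options.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollapseSketch {
    // Hypothetical document shape, for illustration only.
    public class Doc {
        public string Id { get; set; }
        public string Manufacturer { get; set; }
    }

    // Keeps the first (i.e. best-ranked) document for each distinct field value.
    public static List<Doc> CollapseBy(IEnumerable<Doc> results, Func<Doc, string> field) {
        return results.GroupBy(field).Select(group => group.First()).ToList();
    }

    public static void Main() {
        var results = new[] {
            new Doc { Id = "1", Manufacturer = "acme" },
            new Doc { Id = "2", Manufacturer = "acme" },
            new Doc { Id = "3", Manufacturer = "globex" },
            new Doc { Id = "4", Manufacturer = "acme" },
        };
        var collapsed = CollapseBy(results, d => d.Manufacturer);
        Console.WriteLine(string.Join(",", collapsed.Select(d => d.Id))); // 1,3
    }
}
```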

StatsComponent

This one is a Solr 1.4 feature (currently only available from trunk or nightly builds). As the name says, it gives you statistics about the numeric fields in your query results. The statistics are: min, max, sum, count, missing (i.e. no value), sum of squares, mean and standard deviation. The cool thing about this is that you can facet it, thus getting separate stats for each value of a facet field.
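For reference, this is roughly what those numbers mean, computed locally over a handful of values. It's a sketch, not Solr's code: the StatsSketch names are made up, and I'm assuming the sample (n - 1) form of the standard deviation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatsSketch {
    public class Stats {
        public double Min, Max, Sum, SumOfSquares, Mean, StdDev;
        public int Count, Missing;
    }

    // A null entry stands for a document that has no value in the field.
    public static Stats Compute(IEnumerable<double?> values) {
        var all = values.ToList();
        var present = all.Where(v => v.HasValue).Select(v => v.Value).ToList();
        var s = new Stats();
        s.Count = present.Count;
        s.Missing = all.Count - present.Count;
        s.Sum = present.Sum();
        s.SumOfSquares = present.Sum(v => v * v);
        s.Min = s.Count > 0 ? present.Min() : double.NaN;
        s.Max = s.Count > 0 ? present.Max() : double.NaN;
        s.Mean = s.Count > 0 ? s.Sum / s.Count : double.NaN;
        // Sample standard deviation (assumption), expressed with the running sums.
        s.StdDev = s.Count > 1
            ? Math.Sqrt((s.Count * s.SumOfSquares - s.Sum * s.Sum) / (s.Count * (s.Count - 1.0)))
            : 0.0;
        return s;
    }

    public static void Main() {
        var s = Compute(new double?[] { 1, 2, 3, 4, null });
        Console.WriteLine("min={0} max={1} sum={2} count={3} missing={4} sumOfSquares={5} mean={6}",
            s.Min, s.Max, s.Sum, s.Count, s.Missing, s.SumOfSquares, s.Mean);
        Console.WriteLine("stddev={0:F4}", s.StdDev);
    }
}
```

Faceting the stats then amounts to grouping the documents by the facet field first and running the same computation once per group.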

Date faceting

This allows you to trigger faceting based on date ranges, e.g. you can create a facet for each day from 8/1/2009 to 9/1/2009.
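On the wire, that example boils down to Solr's facet.date request parameters. The sketch below builds them by hand rather than through SolrNet, and also enumerates the daily buckets you'd expect back. The field name "timestamp" and the DateFacetSketch helpers are assumptions for the example, and I'm reading the dates as August 1 to September 1.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class DateFacetSketch {
    // Solr expects UTC timestamps such as 2009-08-01T00:00:00Z.
    public static string ToSolrDate(DateTime utc) {
        return utc.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
    }

    // Raw request parameters asking for one facet bucket per day on the given field.
    public static string DailyFacetQuery(string field, DateTime start, DateTime end) {
        var parameters = new[] {
            new KeyValuePair<string, string>("facet", "true"),
            new KeyValuePair<string, string>("facet.date", field),
            new KeyValuePair<string, string>("facet.date.start", ToSolrDate(start)),
            new KeyValuePair<string, string>("facet.date.end", ToSolrDate(end)),
            new KeyValuePair<string, string>("facet.date.gap", "+1DAY"),
        };
        return string.Join("&", parameters.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));
    }

    // Start of each one-day bucket in [start, end).
    public static IEnumerable<DateTime> DayBuckets(DateTime start, DateTime end) {
        for (var day = start; day < end; day = day.AddDays(1))
            yield return day;
    }

    public static void Main() {
        var start = new DateTime(2009, 8, 1, 0, 0, 0, DateTimeKind.Utc);
        var end = new DateTime(2009, 9, 1, 0, 0, 0, DateTimeKind.Utc);
        Console.WriteLine(DailyFacetQuery("timestamp", start, end));
        Console.WriteLine(DayBuckets(start, end).Count()); // 31
    }
}
```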

NHibernate integration

This is similar to the NHibernate.Search project. It synchronizes a database with Solr (if the Solr document fields are similar to an NHibernate entity's fields) and it allows you to issue Solr queries from a regular NHibernate ISession (well, actually an ISession wrapper). You can see more details about its usage in the wiki.

Contributors to this release: Derek Watson, Matt Mondok, Juuso Kosonen.

Get it here:

I'll probably call this a GA release in a couple of weeks if there aren't any serious bugs and once I get the wiki updated.

P.S.: I'll take this opportunity to clear some things up about the project. SolrNet started, like many open source projects, as a way to scratch my own itch, but in the last few months it has grown beyond that. I don't have a use for many of the new features, so I'm less motivated to implement them, and I don't get a chance to test them in the wild to iron out bugs and make sure they are release-quality. This means that I need help from the community (yes, that includes you! ;-) in the form of:

  • patches for new features, bugfixes, documentation, code samples
  • bug reports
  • feature requests
  • general suggestions (e.g. "it would be cooler to do x like this instead of how it's currently done")
  • votes for the issues you consider important or useful at http://code.google.com/p/solrnet/issues/list (this might boost their priority)
  • general usage feedback (e.g. "we've been using SolrNet for 3 months now at www.example.com. The features we especially use are: facets, ninject module, more like this")

Trunk is very stable: right now there are 368 tests that cover around 80% of the code. I strongly encourage you to get new builds from the build server (see artifacts links) and let me know how it works out for you (both positive and negative constructive feedback are useful).

Finally, I could offer some basic commercial support if you need some feature urgently and don't have the resources to code it.