Keynoting RubyNation

Posted on May 08, 2008 by stu

I will be the closing keynote speaker at RubyNation. I'll be beating my current favorite drum: Ending Legacy Code In Our Lifetime. (And yes, the Ruby community has legacy code. If we don't change our ways soon, we will end up with a lot more of it.)

Intro to Ruby on Rails coming in May

Posted on April 07, 2008 by stu

I (Stu) will be heading to Washington, D.C. the week of May 5-9 to teach a public Introduction to Ruby on Rails course.

We'll start at the very beginning, exploring Ruby from irb, the interactive Ruby shell. Then we'll dive into Rails 2.0. By the end of the course we will be talking about how best-practice shops use Rails (hint: test and refactor!).

Rails is one important step on the path to a future without legacy code. Isn't it time to take that first step?

Ruby: real-world performance metrics

Posted on March 26, 2008 by stu

As people spend more time with Ruby 1.9, JRuby, and Rubinius, we are seeing a lot more benchmarks. It has been a while since we published any metrics, so I thought now would be a good time to summarize our recent experience on some real projects. However, we chose to measure some different things:

  • DPC. Developer Productivity, as judged by the Client. This is the percentage of iterations where the client felt that the developers were more productive in Ruby than they would have been with their second choice "Platform B."
  • DPR. Developer Productivity, as judged by Relevance. Like above, but in Relevance's opinion.
  • DP5. Is Developer Productivity in the Top 5 list of things we would like to have more of? In other words, a project with a high DPC and DP5 means that the client is happy with Ruby, but would be even happier if Ruby were, um, Rubyer.
  • CS5. Is Code Speed in the Top 5 list of things we would like to have more of?

Here are the numbers. The trends are not subtle.

            DPC      DPR      DP5      CS5
Project A:  100%     100%     100%     0
Project B:  100%     100%     100%     0
Project C:  100%     100%     100%     0
Project D:  100%     100%     100%     0

At this point, even caring about Ruby's language performance would be a premature optimization at the business process level. Ruby's runtime performance is a non-issue for a broad spectrum of applications. In fact, I believe that all of our customers would be even happier with a language that ran 50% slower than Ruby, if it also made the development team a mere 10% more productive.

Frozen Gems Generator

Posted on March 06, 2008 by glenn

Jay Fields blogged recently (and not for the first time) about managing gems within Rails projects. This is a problem a lot of people have wrestled with; there are close to a dozen plugins, rake tasks, uncommitted patches, and published hacks that attempt to provide a solution (and those are just the ones I know of).

FrozenGemsGenerator is the solution that we’ve been using on some projects at Relevance, and we’re happy enough with it that we’ll be using it more. It’s a Rails generator, packaged as a gem, that gives your Rails app a private gem repository, fully self-contained, and manageable just like your system-wide repository (except using script/gem instead of gem).

$ sudo gem install frozen_gems_generator
$ script/generate frozen_gems
$ script/gem install money

script/gem supports all of the subcommands that the regular gem command does.
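One plausible way a script/gem wrapper can give an app its own repository is to point RubyGems at a directory inside the app through the GEM_HOME and GEM_PATH environment variables, then run the ordinary gem command. The sketch below assumes that approach and a vendor/gems directory; both, along with the method names, are illustrative assumptions rather than a description of how FrozenGemsGenerator is actually built.

```ruby
# Hypothetical sketch of a script/gem wrapper. The vendor/gems location and
# the method names are illustrative assumptions, not FrozenGemsGenerator's code.

# Environment variables that make RubyGems install into, and load from,
# a repository inside the application instead of the system-wide one.
def private_gem_env(app_root)
  repo = File.expand_path("vendor/gems", app_root)
  { "GEM_HOME" => repo, "GEM_PATH" => repo }
end

# Replace the current process with the regular gem command, passing every
# subcommand and option through unchanged.
def run_private_gem(app_root, args)
  exec(private_gem_env(app_root), "gem", *args)
end

# script/gem itself would end with a call along the lines of:
#   run_private_gem(File.expand_path("..", File.dirname(__FILE__)), ARGV)
```

Passing ARGV straight through to gem is what would let such a wrapper support every gem subcommand without knowing about any of them, and setting GEM_PATH to the same directory keeps system-wide gems out of view.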

I haven’t yet implemented a solution for gems that install binary extensions. I’m very interested in suggestions for how best to solve that problem. Several of the other approaches have at least partial support for architecture-specific gems; the best may be Jeremy Voorhis’ CarryOn plugin, which is also the solution that’s closest in spirit to the FrozenGems approach. If you have ideas or suggestions about how architecture-specific gems should be handled, please add comments here or post them on our Trac instance.

Tarantula and TextMate

Posted on February 28, 2008 by stu

Tarantula is under active development as we use it internally to police our apps. If you grab the bits from this morning (R243 or later in the repository), you will see that stack traces in the log report now link back into TextMate.

We have a ton of features we would like to add, and I bet the community can think of plenty more. Please add comments to this post, or post into Trac, letting us know what features you would like to see next. Here are some possible choices to get you started:

  • a "Johnny Droptables" fuzzer that tries specific SQL injection attacks
  • docs detailing the kinds of errors we have been finding and how to fix them
  • an XSS fuzzer that tries to inject script tags (this is challenging because it isn't obvious how to automatically detect the symptom)
  • CSS validation
  • JS validation
  • UI features to make the reports more navigable and usable (be specific!)
  • integration with RSpec
  • blacklist of files your server should never return
  • Ajax crawling (Tarantula currently simulates plain old web requests)
  • integration with other IDEs (you'll probably have to send us a tested patch because we're happy with TextMate)

Get your votes in today and we can look at them during open source Friday.

Tarantula vs. your Rails app

Posted on February 26, 2008 by stu

The Tarantula is a fuzzy spider. It crawls your Rails app, fuzzing inputs and analyzing what comes back. We have pointed Tarantula at about 20 Rails applications, both commercial and open source, and have never failed to uncover flaws.

How does your Rails app stand up? It's easy to find out. Install the plugin, and create a Tarantula integration test: (Update: Note that Tarantula integration tests live in test/tarantula so that you can treat them separately in your cruise builds. For a substantial app or fixture set Tarantula can take a while to run!)

 
# somewhere in your test
require 'relevance/tarantula'            

# customize to match your security setup  
def test_with_login
  post '/sessions/create', :password => 'your-pass'
  assert_response :redirect
  assert_redirected_to '/'
  follow_redirect!
  t = tarantula_crawler(self)
  t.crawl '/'
end

Then run rake tarantula:test, and start looking through the Failures section of the HTML report.
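To make the "crawls your app" part concrete, here is a toy crawler over an in-memory stand-in for an application: start at a root path, follow every link once, and collect the responses that come back as server errors. It covers only the crawling half (no input fuzzing), and FakeResponse, crawl, and the sample site are hypothetical; this is not Tarantula's implementation or API.

```ruby
require 'set'

# A response is just a status code plus the links found on the page.
FakeResponse = Struct.new(:status, :links)

# Breadth-first crawl from +root+, visiting each path once and recording
# every path that answered with a 5xx status.
def crawl(app, root)
  seen = Set.new
  queue = [root]
  failures = []
  until queue.empty?
    path = queue.shift
    next if seen.include?(path)
    seen << path
    response = app.call(path)
    failures << [path, response.status] if response.status >= 500
    queue.concat(response.links)
  end
  failures
end

site = {
  "/"        => FakeResponse.new(200, ["/posts", "/about"]),
  "/posts"   => FakeResponse.new(200, ["/", "/posts/1"]),
  "/posts/1" => FakeResponse.new(500, []),
  "/about"   => FakeResponse.new(200, ["/"])
}
# Unknown paths answer 404, like a real app would.
app = lambda { |path| site.fetch(path) { FakeResponse.new(404, []) } }

crawl(app, "/")  # => [["/posts/1", 500]]
```

The seen set is what keeps the crawl finite on sites whose pages link back to each other, which real applications almost always do.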

Tarantula is just a baby now, but we plan to feed it until it is a lot bigger and meaner. Suggestions and contributions are welcome via the Relevance Open Source Trac.

Hat tip to Courtenay, whose SpiderTest plugin inspired me to go down this road. Also congrats to Mephisto, which is the best-behaved app under Tarantula to date (only three problems, all minor broken windows).

Ruby puzzler: gsub, blocks, and procs

Posted on February 13, 2008 by stu

See if you can guess what this code will do before you run it in Ruby.

upc = Proc.new {|m| $1.upcase}

puts "hello world".gsub(/([aeiou])/, &upc)
puts "hello world".gsub(/(\w)/, &upc)

def doit(str, re, blk)
  puts str.gsub(re, &blk)
end

doit "hello world", /([aeiou])/, upc
doit "hello world", /(\w)/, upc

Now try running it in JRuby. Whoa.
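Whatever $1 turns out to mean inside a proc defined somewhere else, you can sidestep the question entirely by using the matched text that gsub passes to the block. With these single-character patterns the whole match is the same as the first capture, so the sketch below (upcase_match and doit_safe are illustrative names) does not depend on any global match variable, however the proc reaches gsub.

```ruby
# The block argument is the matched text, so no $1 or $~ is involved.
upcase_match = Proc.new { |match| match.upcase }

puts "hello world".gsub(/([aeiou])/, &upcase_match)  # hEllO wOrld
puts "hello world".gsub(/(\w)/, &upcase_match)       # HELLO WORLD

def doit_safe(str, re, blk)
  str.gsub(re, &blk)
end

puts doit_safe("hello world", /([aeiou])/, upcase_match)  # hEllO wOrld
puts doit_safe("hello world", /(\w)/, upcase_match)       # HELLO WORLD
```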

Have you killed a design pattern today?

Posted on February 13, 2008 by stu

Design patterns are the enemy of agility. They introduce repetition and accidental variation to your codebase. Design patterns encourage you to create "point solutions" throughout your application, instead of cleanly isolating concerns. And they will make your code refactor-proof, no matter how cool your IDE is. But there is hope: Catch your design patterns while they are young, and teach them to be library calls instead. Here's one example:

In Ruby, we often re-open existing classes and add instance methods. One approach is simply to open the class:

class NilClass
  def blank? 
    true
  end
end

Or, you could create a new module and mix it in:

module MyNilExtensions
  def blank?
    true
  end
end
class NilClass
  include MyNilExtensions
end

There are other approaches that are similar but not quite the same. In other words, this is a design pattern. From the Wikipedia entry:

A design pattern is not a finished design that can be transformed directly into code. It is a description or template for how to solve a problem that can be used in many different situations.

The problem with design patterns is the "not a finished design" part of the definition. Rather than a DRY solution, design patterns give you repetition throughout the code. Worse yet, the repetition is not exact. It is repetition with variation, and there is often no evidence whether the variation is intentional or accidental.

I like Ruby because I can eliminate design patterns when they start to annoy me. This "Open Class Add Method" pattern annoyed me for the last time earlier today, when two different libraries defined incompatible versions of Object#metaclass. Enough is enough. Let's make a library call for reopening classes.

Here are my design goals:

  1. The syntax should be terse. I hate defining a module and including it in two separate steps; I just want to open a block and go.
  2. I should have an audit trail for where new methods came from. (This is one reason to define a named module: I can then reflect on it.)
  3. I should be protected from method collisions. This protection should be configurable. Collisions can be explicitly approved, or they can generate a warning, or an error.

These three goals are in conflict (and we could easily come up with more). This illustrates another problem with design patterns. Each time a design pattern is used, a programmer favors some design goals over others. Over time, this leads to a codebase at odds with itself. If the same pattern were captured in a reusable module, then changing design priorities could be handled from that module alone.

Here's a strawman proposal for cleanly adding methods to existing Ruby classes. The following code adds #jump to Object:

embrace{Object}.and_extend do
  def jump
    puts "jumping"
  end
end

The syntax is simple and involves just one block, meeting goal one. Behind the scenes, I define an anonymous module and use __FILE__ and __LINE__ to label it with its point of origin, which gives us auditability (goal two):

> puts Object.ancestors
Object
Anonymous module from /Users/stuart/Desktop/temp.rb 56
Kernel  

Finally, the code that mixes in the module walks the inheritance hierarchy first, printing a warning whenever a name collision is encountered (goal three).

Warning: /Users/stuart/Desktop/temp.rb 64 is attempting to redefine jump
         Originally defined in /Users/stuart/Desktop/temp.rb 56  

The complete implementation is included at the bottom of this post. I am sure it can be improved in several ways, but even in its primitive state it beats a design pattern. As long as the API is decent, we can always make the implementation suck less later.

Should I make a gem out of this? What changes would you like to see in the API? How should the handling of method collisions be specified?

require 'set'
module Embrace
  class <<self
    def check_for_collisions(clazz, module_to_include)
      new_methods = Set.new(module_to_include.instance_methods(false))
      clazz.ancestors.each do |anc|
        anc.instance_methods(false).each do |meth|
          collision(anc, module_to_include, meth) if new_methods.member?(meth)
        end
      end
    end

    def collision(clazz, module_to_include, method)
      puts "Warning: #{module_to_include} is attempting to redefine #{method}"
      puts "         Originally defined in #{clazz}"
    end
  end
end

def embrace(&clazz_block)
  m = Module.new
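  # Proc#source_location (Ruby 1.9+) reports where the block was written;
  # since Ruby 3.0, eval("__FILE__", binding) no longer does, so fall back
  # to eval only on Ruby 1.8.
  if clazz_block.respond_to?(:source_location)
    file, line = clazz_block.source_location
  else
    file = eval("__FILE__", clazz_block.binding)
    line = eval("__LINE__", clazz_block.binding)
  end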
  clazz = clazz_block.call
  meta = class << m; self; end
  meta.class_eval do
    def and_extend(&blk)
      self.class_eval(&blk)
      mixin_to_class
      self
    end
    define_method("mixin_to_class") do
      Embrace.check_for_collisions(clazz, m)
      clazz.class_eval do
        include m
      end
    end
    define_method("to_s") do
      "Anonymous module from #{file} #{line}"
    end
  end   
  m 
end

o = Object.new

embrace{Object}.and_extend do
  def jump
    puts "jumping"
  end
end

o.jump

embrace{Object}.and_extend do
  def jump
    puts "jumping higher"
  end
end

o.jump  
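On the question of how collision handling should be specified, here is one possible shape for goal three: a module-level policy that can warn, raise, or stay silent, plus a per-call list of explicitly approved collisions. CollisionPolicy, its modes (:allow, :warn, :raise), and the approved argument are hypothetical names for illustration, not part of the Embrace code in this post.

```ruby
require 'set'

# Hypothetical sketch of a configurable collision policy (goal three).
module CollisionPolicy
  class CollisionError < StandardError; end

  class << self
    attr_writer :mode

    # Default to warning, like Embrace.collision does.
    def mode
      @mode ||= :warn
    end

    # Names of methods in +mod+ already defined somewhere in +klass+'s ancestors.
    def collisions(klass, mod)
      new_methods = Set.new(mod.instance_methods(false))
      klass.ancestors.
        map { |anc| anc.instance_methods(false) }.flatten.
        select { |meth| new_methods.include?(meth) }.uniq
    end

    # Include +mod+ into +klass+, applying the policy to any collision not in
    # +approved+ (method names as symbols on Ruby 1.9+). Returns the
    # unapproved collisions; with :raise, nothing is included if there are any.
    def safe_include(klass, mod, approved = [])
      clashes = collisions(klass, mod) - approved
      unless clashes.empty?
        message = "#{mod} redefines #{clashes.join(', ')} on #{klass}"
        case mode
        when :raise then raise CollisionError, message
        when :warn  then warn("Warning: #{message}")
        end
      end
      klass.send(:include, mod)
      clashes
    end
  end
end
```

Setting CollisionPolicy.mode = :raise in a test suite would turn every unapproved collision into a failure, while the default :warn keeps the behavior shown in the post.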

How should metaclass work?

Posted on February 12, 2008 by stu

Facets defines metaclass like this:

def meta_class(&block)
  if block_given?
    (class << self; self; end).class_eval(&block)
  else
    (class << self; self; end)
  end
end
alias_method :metaclass, :meta_class

RSpec defines it this way:

def metaclass
  class << self; self; end
end

I just spent an hour figuring out why some carefully-tested code went no-op after adding RSpec to a project. As a community we need to commit to a standard definition here. What should it be?
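One way code can go no-op like this: Ruby accepts a block on any method call, even a method that never yields, so a caller written against the Facets definition keeps running silently under the RSpec one. The sketch below wraps condensed equivalents of the two quoted definitions in hypothetical modules (FacetsStyle, RSpecStyle) so they can be compared side by side. For reference, Ruby 1.9.2 and later also provide a built-in Object#singleton_class.

```ruby
# Condensed equivalents of the two definitions quoted above, wrapped in
# illustrative modules so both can be tried in one file.
module FacetsStyle
  def metaclass(&block)
    meta = (class << self; self; end)
    block ? meta.class_eval(&block) : meta
  end
end

module RSpecStyle
  def metaclass
    class << self; self; end
  end
end

facets_obj = Object.new.extend(FacetsStyle)
rspec_obj  = Object.new.extend(RSpecStyle)

# Facets style: the block is evaluated in the singleton class, defining greet.
facets_obj.metaclass { def greet; "hi"; end }

# RSpec style: Ruby accepts the block, but the method never yields to it,
# so greet is never defined and no error is raised.
rspec_obj.metaclass { def greet; "hi"; end }

facets_obj.respond_to?(:greet)  # => true
rspec_obj.respond_to?(:greet)   # => false
```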

Why they fear the meta

Posted on February 11, 2008 by stu

Giles says It's Not Meta, It's Just Programming. Darn straight! The specific example he gives is the ability to add methods to specific instances, instead of to an entire class. As he demonstrates, this is invaluable for isolation when testing.

# written this way to demonstrate eigenclass syntax
class << @response
  def body
    ({:foo => "bar"}.to_xml(:root => "thing"))
  end
end

Combine eigenclass methods with open classes, and almost any idiom can be automated. If you regularly add stubbed instance methods for testing purposes, why not write a helper for just that? For many common tasks, including this one, the work is already done. Here is a more literate version of the above code:

# exact syntax depends on your choice of mocking library
@response.stubs(:body).returns(some_canned_response)
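If your mocking library of choice didn't already provide it, a helper like that takes only a few lines of eigenclass code. The sketch below is a hypothetical, minimal version: stub_method is an illustrative name, not the API of any mocking library, and it supports only fixed return values.

```ruby
# Hypothetical helper: stub_method is an illustrative name, not the API of
# any mocking library. It defines a method on one object's eigenclass only.
class Object
  def stub_method(name, value)
    eigenclass = class << self; self; end
    # define_method was private before Ruby 2.5, hence send.
    eigenclass.send(:define_method, name) { value }
    self
  end
end

CannedResponse = Struct.new(:body)
response = CannedResponse.new("real body")
other    = CannedResponse.new("other body")

response.stub_method(:body, "<thing><foo>bar</foo></thing>")
response.body  # => "<thing><foo>bar</foo></thing>"
other.body     # => "other body" (other instances are untouched)
```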

If you are a n00b, this power is scary. Once you start treating code as data, the elegance of your code is dependent on your skill. You cannot hide behind the limitations of your programming language anymore, because there aren't any.

Troubleshooting LoadErrors in Rails tests

Posted on February 08, 2008 by stu

I am proposing a patch to help cope with the dreaded Rails LoadError:

LoadError: Expected foo.rb to define Foo

In Ruby, loading code is simple: just require it. In script/console:

>> require 'account_controller'
=> ["AccountController"]

Rails extends this to magically find classes just based on their name:

>> AccountController
=> AccountController

Most of the time, if a class cannot be loaded, you get a helpful exception:

>> AccountController
MissingSourceFile: no such file to load -- hpricot_scan

Ah, so my AccountController depends on hpricot, which isn't available for some reason. Solution: go find hpricot.

But once in a while this problem presents a different symptom:

>> AccountController
LoadError: Expected account_controller.rb to define AccountController

This is confusing, since account_controller.rb does define AccountController! Experienced Rails developers know that this cryptic message actually means "Something went wrong in application_controller.rb, but Rails swallowed the real exception." After being bitten by this on three different projects in the last two weeks, I decided to track the issue down. It turns out the problem is in how fixtures get loaded:

begin
  require_dependency file_name
rescue LoadError
  # Let's hope the developer has included it himself
end

After fixtures swallow the real MissingSourceFile for a subdependency such as hpricot, ActiveSupport raises a misleading LoadError for the original dependency (account_controller.rb) that references hpricot.

In a perfect world, I would simply have fixtures stop swallowing LoadErrors. But the comment strongly suggests that some code depends on this behavior. So weaker sauce is to at least log the problem:

def try_to_load_dependency(file_name)
  require_dependency file_name
rescue LoadError => e
  ActiveRecord::Base.logger.warn("Unable to load #{file_name}, underlying cause #{e.message} \n\n #{e.backtrace.join("\n")}")
end

If anybody knows how to distinguish the confusing LoadErrors from the expected ones, please go and improve the patch.
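On a modern Ruby, one possible way to separate the expected LoadErrors from the confusing ones is to look at which file was actually missing. The sketch below assumes Ruby 2.0 or later (for LoadError#path, which did not exist when this was written) and uses plain require rather than Rails' require_dependency. It swallows the error only when the requested file itself is absent, and re-raises when a nested dependency is what failed.

```ruby
# Hypothetical variant of try_to_load_dependency using plain require.
# LoadError#path (Ruby 2.0+) names the file that could not be found.
def try_to_load_dependency(file_name)
  require file_name
  true
rescue LoadError => e
  # The requested file is simply not there: the "expected" case, so log it.
  # Anything else (a missing hpricot_scan, say, or a nil path) is re-raised.
  raise unless e.path == file_name
  warn "Unable to load #{file_name}: #{e.message}"
  false
end
```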

Layering and platform choice

Posted on February 04, 2008 by stu

Over the last few weeks I have repeatedly linked to Ola's post about the stable layer. I didn't take the time to go into detail, and I trusted that people (if they wanted to) could follow the link and understand what Ola was talking about.

Well, that didn't work so well. Most responders clearly did not understand Ola. A few informed me that I didn't understand Ola. :-) So I am going to make a clean break, and lay out my own argument in more detail. What follows are my views about how layered architecture affects language and platform choice. First, some ideas that are hopefully uncontroversial:

  • Good design is layered.
  • Leakage between layers should be minimal.
  • Features within a layer should be orthogonal, and should not have to be re-implemented in higher layers.
  • All kinds of programs benefit from this kind of layering, including languages, libraries, frameworks, and application code.

A small leap:

  • The lowest layers are the most important.

This might seem obvious. All other layers depend on the lower layers, so a problem at the bottom affects a lot of code. But if you are working several layers higher, problems at the bottom are part of the air you breathe. The air may smell terrible, but you are acclimated and don't notice.

A big but uncontroversial leap:

  • Java, the VM, is a good VM for the bottom layer.

This is uncontroversial because the majority has chosen. And they are right to do so: the Java VM is well-specified, widely implemented, carefully optimized, and supported by a huge array of tools.

A mistake:

  • Java, the language, is a good language for the bottom layer.

Noooo! Java is a high-ceremony language. At every turn, Java enforces a high busy-work/real-work ratio. Specifically:

  1. Java's checked exceptions bloat code, make components harder to use and maintain, and lead to tons of boilerplate code, each line of which is a bug-in-waiting.
  2. Java's new operator/constructors cannot pick a return type. The amount of code that exists only to work around this is staggering. Two entire cottage industries have sprung up to deal with this single issue: factory patterns and dependency injection.
  3. Java has no metaprogramming features to automate common tasks such as field accessors, standard constructors, and simple delegation.
  4. Primitives, functions, and classes are not first-class objects, leading to huge code bloat to deal with these types specially.
  5. Java's core reflection and interception capabilities are clunky, requiring tons of bolt-on technologies to make them workable, including AOP, annotations, and code generators.
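For contrast with item 3, here is how Ruby handles the same chores with standard features: Struct generates field accessors and a constructor, attr_accessor generates a getter/setter pair, and Forwardable handles simple delegation. Point and Polyline are hypothetical classes for illustration.

```ruby
require 'forwardable'

# Struct generates the constructor, readers, writers, ==, and to_s.
Point = Struct.new(:x, :y)

class Polyline
  extend Forwardable

  # Simple delegation: Polyline#size, #first, and #each forward to @points.
  def_delegators :@points, :size, :first, :each

  # A generated getter/setter pair for name.
  attr_accessor :name

  def initialize(name, *points)
    @name = name
    @points = points
  end
end

line = Polyline.new("edge", Point.new(0, 0), Point.new(3, 4))
line.size     # => 2
line.first.y  # => 0
line.name = "diagonal"
```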

That's a pretty big stink, but if you are used to it you probably can't smell it anymore.

The net result of these problems is that bottom layer code written in the Java language will be bloated and difficult to maintain. These problems multiply if we use the Java language for higher layers as well. What should we do?

Keeping the VM, avoiding the language

For better or worse, Java is already the bottom layer for many businesses. A complete rewrite is impossible, so we need an approach that lets us continue to use our existing Java code. There are two obvious choices:

  • Use a framework to hide Java's most glaring flaws, and continue to use the Java language for development. The most popular option here is Spring. Spring provides framework-level fixes for several problems in Java: dependency injection, unchecked exception wrappers, and a powerful AOP capability, to name a few.
  • Use a better VM language. There are lots of choices, including Clojure, JRuby, Rhino, and Groovy. All of them can interoperate nicely with existing Java code.

My opinion, based on extensive experience with both options, is that the "better VM language" approach is better than the "fix Java with a framework" approach. Smaller is better.

Some advice

In the past few weeks, I have been approached by several organizations to advise them on platform decisions. Every organization is different, but here are some guidelines to consider.

  • Your team matters far more than your language. Pick a platform at random, then take your platform analysis budget and spend it finding good team members and helping them get better.
  • Your process matters far more than your language. If your team is not delivering real business value on a regular, repeating timeframe, stop worrying about your platform and start worrying about things like estimating, agility, testing, and continuous integration.
  • The static/dynamic languages debate is a red herring. The Java language's problem is ceremony, not static typing. Use whatever combination of static and dynamic typing works for you. [1]
  • There can be more than one! The Java VM simplifies the interop and deployment story. So quit trying to decide, and try a few different JVM languages.
  • The hot new JVM languages have different syntaxes, but similar features. They all solve the problems with Java that I enumerated above. Throw a dart at the wall, pick one, and get started coding.
  • Beware "Use the right tool for the job." This is true, but useless without context, and it is becoming the weapon of choice for pundits who write no code. Be a polyglot, but also be articulate about why tool X is the right fit for job Y.
  • Stop writing plain old Java code. Groovy obsoletes plain old Java. We ought to just say "Java 7 = Groovy" and move on.

Keep an open mind. Try several approaches. Judge your choices by how easy they would be to unmake or adapt. Have fun!

Notes

[1] In the past I have had a lot to say about static/dynamic typing. I realize now that I was trying to talk about ceremony. I am still worried about the same problems, but I think I now know them by more accurate names.

Rails plugin authors on OS X, beware!

Posted on January 31, 2008 by stu

This morning I was troubleshooting a production problem with the simple_localization plugin. The code worked fine in development, had 100% passing C0 coverage in test, and worked fine in production on my local box. But on the staging box, we were getting the dreaded load error:

LoadError: Expected /simple_localization/lib/cached_lang_section_proxy.rb to define CachedLangSectionProxy

If you use Rails plugins and ever see this problem, read on...

A little background

In Ruby, you can load a Ruby source file from the load path by requiring it.

require 'my_class'

This is explicit, and easy to understand. But you might get tired of spelling things out all the time. So in Rails you can also load a class implicitly when it is needed:

MyClass

This is somewhat Java-like, in that magic happens to find the code based on some naming conventions, e.g. My::Namespaced::MyClass should be in a file named my/namespaced/my_class.rb somewhere on the load path. It is also Java-like in being difficult to debug, leading to errors like the LoadError above.
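A rough sketch of that naming convention, mapping a constant name to the file implicit loading will look for. ActiveSupport's String#underscore is the real implementation and handles more cases; expected_path_for is a hypothetical, simplified stand-in.

```ruby
# Hypothetical, simplified version of the Rails naming convention:
# namespaces become directories, CamelCase becomes snake_case.
def expected_path_for(constant_name)
  constant_name.
    gsub("::", "/").
    gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2').
    gsub(/([a-z\d])([A-Z])/, '\1_\2').
    downcase + ".rb"
end

expected_path_for("My::Namespaced::MyClass")
# => "my/namespaced/my_class.rb"
expected_path_for("ArkanisDevelopment::SimpleLocalization::CachedLangSectionProxy")
# => "arkanis_development/simple_localization/cached_lang_section_proxy.rb"
```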

Workaround: ducking the issue

Knowing that the LoadError is a failed implicit load, the first step is to look at the point of failure in the file cached_lang_section_proxy.rb. Here it is, elided for clarity:

module ArkanisDevelopment
  module SimpleLocalization
    class CachedLangSectionProxy

Aha, you say. The error is right on. This file doesn't define a top-level CachedLangSectionProxy; it defines ArkanisDevelopment::SimpleLocalization::CachedLangSectionProxy. So implicit loading can't work with the code as written. But we have a workaround: we can move this file (and probably several others) into a directory structure that matches Rails conventions. I am not going to do that, because...

Solution: getting deterministic

We can get implicit loading to work, but we still haven't tackled the real problem. Why did the code ever work on my local box to begin with? We know that implicit loading can't work, so somehow my local box must be explicitly loading the files, but in a machine-dependent way that fails on the staging box.

Rails plugins include an init.rb that runs during Rails startup, and is often used to explicitly load configuration and code. Here is that code from simple_localization:

Dir[File.dirname(__FILE__) + '/lib/*.rb'].each do |lib_file|
  require File.expand_path(lib_file)
end

This is broken, but if you develop on Mac OS X you may never notice. The plugin's internal dependencies are arranged in such a way that loading the files in alphabetical order works. In all of my experiments, Ruby's directory traversal APIs on the Mac return files in alphabetical order. However, this ordering is not required by the Ruby language. On Linux, the files can come back in any order.

Given that many Rails developers work on OS X, and deploy to Linux, this leads to an amusing variant of "It works on my box": It works on all developer boxes, and fails on all production boxes.

An easy fix is to sort the files explicitly:

Dir[File.dirname(__FILE__) + '/lib/*.rb'].sort.each do |lib_file|
  require File.expand_path(lib_file)
end

Better would be to organize init.rb so that the dependencies are clear (the fact that alphabetical order happens to work is a fragile coincidence).
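A related way to make the dependencies explicit is to have each file require the files it depends on, so that the order in which init.rb happens to load them stops mattering. The sketch below writes two hypothetical files, base_proxy.rb and cached_proxy.rb, to a temporary directory and loads them in reverse alphabetical order on purpose. It uses require_relative (Ruby 1.9.2+); in 2008-era Ruby the equivalent was require File.join(File.dirname(__FILE__), 'base_proxy').

```ruby
require 'tmpdir'

Dir.mktmpdir do |lib|
  # cached_proxy.rb declares its own dependency instead of relying on load order.
  File.write(File.join(lib, "base_proxy.rb"), "class BaseProxy; end\n")
  File.write(File.join(lib, "cached_proxy.rb"),
             "require_relative 'base_proxy'\n" \
             "class CachedProxy < BaseProxy; end\n")

  # Load in reverse alphabetical order on purpose: cached_proxy.rb comes first,
  # yet BaseProxy is already defined by the time CachedProxy needs it.
  Dir[File.join(lib, "*.rb")].sort.reverse.each do |lib_file|
    require File.expand_path(lib_file)
  end
end

CachedProxy.superclass  # => BaseProxy
```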

Lessons learned

  1. If you write Rails plugins on Mac OS X, be careful how you use globbing APIs in init.rb. They will work deterministically on your box, but maybe not everywhere else.
  2. If you plan for your Ruby code to be used from Rails, follow the directory and naming conventions.
  3. Loading code is and always will be tricky. Many years ago, I thought that COM had solved many of the problems. I was so enthusiastic about Java's approach that I wrote a book about it. By the time .NET came out with yet another approach, I was a bit jaded and assumed it would have problems. (It did.) It's a hard problem.

On language aesthetics

Java and Ruby both have an explicit and implicit loading story. What is interesting is that in Java this story is implemented in the language, while in Ruby a significant part of the story is in the libraries. It is Rails, not Ruby, that implements implicit loading, and you can read much of that story in ActiveSupport's dependencies.rb. Understand this file, and you will know much of what is best and worst in Ruby.
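To see how much of that story can live in a library, here is a toy version of implicit loading built on Ruby's const_missing hook, which is the hook ActiveSupport's implementation builds on. It also reproduces the shape of the LoadError at the top of this post. AUTOLOAD_DIRS and the simplified file-name conversion are hypothetical stand-ins; this is an illustration, not ActiveSupport's code.

```ruby
# Hypothetical stand-in for Rails' load paths.
AUTOLOAD_DIRS = []

class Object
  # Called when a constant lookup fails; try to load it by naming convention.
  def self.const_missing(name)
    file = name.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    dir = AUTOLOAD_DIRS.find { |d| File.exist?(File.join(d, "#{file}.rb")) }
    # No matching file: fall back to the normal NameError.
    return super unless dir

    require File.join(dir, file)
    # The file was found and loaded, but it did not define the constant.
    unless const_defined?(name)
      raise LoadError, "Expected #{file}.rb to define #{name}"
    end
    const_get(name)
  end
end
```

Because the library only sees "a file named after the constant did not define it", it cannot tell a misnamed class from a file whose loading went wrong partway through, which is exactly why the message is so often misleading.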

EDRY, meet CoC

Posted on January 29, 2008 by stu

Jay Fields has envisioned a beautiful future for software development with his EDRY dialect of Ruby. But what is Enhanced DRY without better CoC (Convention over Configuration)?

I have modified Jay's code to rely more on convention. Why have a distinct vocabulary for fields vs. mixins, when the right thing to do can be inferred from the types involved? The result is some really tight code:

C Enumerable, :first_name, :last_name, :favorite_color do
  d.complete_info? { nd(first_name,last_name) }
  d.white?.red?.blue?.black? { |color| favorite_color.to_s == color.to_s.chop }
end

I am including the full source at the bottom of this entry. Can you make it even DRYer and more convention-driven?

class Object
  def C(*args, &block)
    attrs = args.find_all {|arg| Symbol === arg}
    includes = args.find_all {|inc| inc.instance_of?(Module)}
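    # Proc#source_location (Ruby 1.9+) reports the caller's file; since Ruby 3.0,
    # eval("__FILE__", binding) no longer does, so fall back to it only on 1.8.
    file = block.respond_to?(:source_location) ? block.source_location.first : eval("__FILE__", block.binding)
    name = File.basename(file, ".rb")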
    klass = Struct.new(name.capitalize, *attrs)
    Kernel.const_set(name.capitalize, klass)
    klass.class_eval(&block)
    klass.send :include, *includes
  end

  def s
    self
  end
end

class Class
  def ctor(&block)
    define_method :initialize, &block
  end

  def i(mod)
    include mod
  end

  def d
    DefineHelper.new(self)
  end

  def a(*args)
    attr_accessor(*args)
  end
end

class DefineHelper
  def initialize(klass)
    @klass = klass 
  end

  def method_stack
    @method_stack ||= []
  end

  def method_missing(sym, *args, &block)
    method_stack << sym
    if block_given?
      method_stack.each do |meth|
        @klass.class_eval do
          define_method meth do
            instance_exec meth, &block
          end
        end
      end
    end
    self
  end
end

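# http://eigenclass.org/hiki.rb?instance_exec
# Backport for Ruby 1.8.6. Ruby 1.8.7 and later define instance_exec natively,
# and on 1.9+ this version raises ArgumentError for zero-argument blocks such
# as complete_info?, so only define it when it is missing.
unless Object.method_defined?(:instance_exec)
  module Kernel
    def instance_exec(*args, &block)
      mname = "__instance_exec_#{Thread.current.object_id.abs}_#{object_id.abs}"
      Object.class_eval{ define_method(mname, &block) }
      begin
        ret = send(mname, *args)
      ensure
        Object.class_eval{ undef_method(mname) } rescue nil
      end
      ret
    end
  end
end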

def nd(*args)
  args.each {|x| return false unless x}
  true
end

# convention: symbols are attributes, modules are to be included
C Enumerable, :first_name, :last_name, :favorite_color do
  d.complete_info? { nd(first_name,last_name) }
  d.white?.red?.blue?.black? { |color| favorite_color.to_s == color.to_s.chop }
end

Safe by default

Posted on January 29, 2008 by stu

I met Luke Francl at Code Freeze last week, but we only had time to speak for a minute. It was enough to know we are of like mind: security should be on by default. Luke has written a new plugin, xss_terminate. It is inspired by acts_as_sanitized, but it has stricter defaults and more options. Nice.
