Distributed Filesystems Roundup (first draft)

Thursday, December 20, 2007 by Nate Murray.

I'm currently working on an article titled "Distributed Filesystems Roundup". You can find it here. I'm just learning about these types of systems, so there are a few holes in the data. In our case, we were looking for a system with the following requirements:

  • Open-source or free
  • Operates on consumer hardware over Ethernet
  • Can scale to a minimum of 10TB
  • Recovery if one of the nodes fails
  • Ability to replicate for high availability
  • Ability to add new nodes on the fly and easily increase the storage pool
  • Preferably runs on SuSE Linux
  • Preferably runs in a Xen virtual machine

I hope this article can serve as an overview and a starting point for anyone beginning to learn about these options.

convert white to transparent

Thursday, December 06, 2007 by Nate Murray.

I've had to work a lot with ImageMagick recently. Anyone who uses ImageMagick should definitely know about Anthony Thyssen's ImageMagick tutorials.

My current project was to convert a white background to transparent for thousands of product images, so that the products could be placed on a gradient background.

Note: because this article deals with transparency, I have composited the transparent images onto a checkerboard background so the transparent areas are visible. The command to do this is:

 composite  -compose Dst_Over -tile pattern:checkerboard in.png out.png

My first attempt was to simply take all the white pixels and convert them to be transparent.

convert -fuzz .2% -transparent white tool.png ugly_fuzz.png

The problem with this one is that a fuzzy fringe of off-white pixels is left around the outside of the tool. My next attempt was to improve the situation by increasing the fuzz.

convert -fuzz 10% -transparent white tool.png removed_too_much.png

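The problem with this approach is that this tool (like many others) has white highlights, and -transparent removes every pixel close enough to white, highlights included. What we need is a flood fill, which only clears the white region connected to its starting point. In the command below, a 1-pixel white border is added so the fill can flow all the way around the tool from the seed point at 1,1, and -shave 1x1 removes the border again afterward.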
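
Both failures come from the same place: -fuzz matches colors across the whole image, with no notion of where a pixel sits. The Ruby snippet below is a deliberately simplified model of that matching. It assumes the distance is plain Euclidean RGB distance, scaled so that black is 100% away from white; ImageMagick's real formula differs in its details, and the pixel values are made up for illustration.

```ruby
# Simplified model of -fuzz (an assumption, not ImageMagick's exact formula):
# a pixel matches the target when its Euclidean RGB distance, as a percentage
# of the black-to-white distance, is at most fuzz_percent.
def within_fuzz?(rgb, target, fuzz_percent)
  dist = Math.sqrt(rgb.zip(target).map { |a, b| (a - b)**2 }.inject(0, :+))
  max  = Math.sqrt(3 * 255**2)
  dist / max * 100 <= fuzz_percent
end

WHITE      = [255, 255, 255]
edge_pixel = [250, 250, 250]  # hypothetical anti-aliased, slightly off-white edge pixel
highlight  = [240, 240, 245]  # hypothetical bright highlight on the tool itself

within_fuzz?(edge_pixel, WHITE, 0.2) # => false, so the fringe survives at .2%
within_fuzz?(edge_pixel, WHITE, 10)  # => true
within_fuzz?(highlight, WHITE, 10)   # => true, so at 10% the highlight goes too
```

Because anti-aliased edge pixels fade gradually from white into the tool's colors, some of them end up as far from white as the highlights, so a single global threshold can't remove one group while keeping the other.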

convert tool.png -bordercolor white -border 1x1 -matte -fill none -fuzz 7% -draw 'matte 1,1 floodfill' -shave 1x1 white_floodfill.png

Now that's more like it! Some people may be satisfied to stop right there, but the pixel halo really bothers me. I don't mind having some halo, but I'd much rather have one that is softer. The problem is we can't just use a straight blur, because that would blur the picture as well.
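
To see why the flood fill keeps the highlights, here is a toy model in Ruby (not ImageMagick's implementation). Gray values stand in for pixels, nil stands for transparent, and the tolerance of 18 is roughly 7% of 255. For simplicity it seeds at the corner of the image instead of using the 1-pixel border trick.

```ruby
# Toy model of -draw 'matte 1,1 floodfill' (not ImageMagick's implementation):
# clear every pixel connected to the seed (4-neighbours) whose value is within
# tolerance of the seed's value. Pixels are gray values 0..255; nil = transparent.
def flood_clear(grid, seed_row, seed_col, tolerance)
  target = grid[seed_row][seed_col]
  out = grid.map(&:dup)
  queue = [[seed_row, seed_col]]
  until queue.empty?
    r, c = queue.shift
    next if r < 0 || c < 0 || r >= out.size || c >= out[r].size
    v = out[r][c]
    next if v.nil? || (v - target).abs > tolerance
    out[r][c] = nil
    queue.push([r + 1, c], [r - 1, c], [r, c + 1], [r, c - 1])
  end
  out
end

# White background, a dark tool outline, and a bright highlight inside the tool.
image = [
  [255, 255, 255, 255, 255],
  [255,  40,  40,  40, 255],
  [255,  40, 250,  40, 255],
  [255,  40,  40,  40, 255],
  [255, 255, 255, 255, 255],
]

filled = flood_clear(image, 0, 0, 18)  # 18 is roughly 7% of 255
filled[2][2]  # => 250, the highlight is not connected to the background

# For comparison, a global color match (like -transparent) clears it as well.
global = image.map { |row| row.map { |v| (v - 255).abs <= 18 ? nil : v } }
global[2][2]  # => nil
```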

ImageMagick supports masking. A mask is basically a grayscale image that specifies the transparency of each pixel. What we are going to do is create a black-and-white silhouette of the tool and remove the background based on that. If we can get just the silhouette, we can blur the mask to give us a smooth edge but keep a sharp image.
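
Here is a one-row toy model of that idea in Ruby: the image values are never blurred, only the mask is, and the blurred mask then becomes the alpha channel, which is roughly what the CopyOpacity step at the end does. The 3-tap box blur is a crude stand-in for -blur 0x1, and the pixel values are made up.

```ruby
# One row of a tool image (gray values) and its hard-edged silhouette mask:
# 255 = opaque (tool), 0 = transparent (background). Values are made up.
row  = [230, 230, 60, 60, 60, 230, 230]
mask = [0,   0,   255, 255, 255, 0,   0]

# 3-tap box blur, a crude stand-in for -blur 0x1; the ends use a shorter window.
def box_blur(values)
  (0...values.size).map do |i|
    window = values[[i - 1, 0].max..[i + 1, values.size - 1].min]
    window.inject(0, :+) / window.size
  end
end

soft = box_blur(mask)  # => [0, 85, 170, 255, 170, 85, 0]

# CopyOpacity-style step: the image values are untouched and the blurred
# mask becomes the alpha channel, so only the edge gets softer.
rgba = row.zip(soft)   # => [[230, 0], [230, 85], [60, 170], [60, 255], [60, 170], [230, 85], [230, 0]]
```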

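First, we take the difference between our image and its background color. The parenthesized clone is filled with the color of the top-left pixel (-fx 'p{0,0}'), the Difference composite turns the background near black and everything that differs from it lighter, and -modulate 100,0 removes the saturation so the result is grayscale: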

convert tool.png \( +clone -fx 'p{0,0}' \)  -compose Difference  -composite   -modulate 100,0  +matte  difference.png
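
A toy Ruby model of what this command computes per pixel, assuming 8-bit RGB values. The desaturation step approximates -modulate 100,0 by keeping the HSL lightness, (max + min) / 2; the pixel values are made up.

```ruby
# Toy model of: tool.png \( +clone -fx 'p{0,0}' \) -compose Difference -composite -modulate 100,0
# Each pixel is [r, g, b]; the background color is taken from the top-left pixel.
def difference_from_corner(pixels)
  bg = pixels[0][0]
  pixels.map do |row|
    row.map do |px|
      diff = px.zip(bg).map { |a, b| (a - b).abs }  # per-channel |pixel - background|
      (diff.max + diff.min) / 2                      # desaturate: keep HSL lightness
    end
  end
end

image = [
  [[250, 250, 250], [250, 250, 250]],
  [[250, 250, 250], [30, 60, 90]],    # one hypothetical bluish tool pixel
]
difference_from_corner(image)  # => [[0, 0], [0, 190]]
```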

Then we want to take the black portion and replace it with transparency:

convert difference.png -bordercolor white -border 1x1 -matte -fill none -fuzz 7% -draw 'matte 1,1 floodfill' -shave 1x1 removed_black.png

Then we want to extract the matte channel from this image.

convert removed_black.png -channel matte -separate  +matte matte.png

Then we negate the colors so that the mask points the right way (white where the tool is, black where the background was), and blur it slightly with -blur 0x1 to soften the edge.

convert matte.png -negate -blur 0x1 matte-negated.png

Then we are ready to compose our matte onto our original image!

composite -compose CopyOpacity matte-negated.png tool.png finished.png

Beautiful.

Here is the full script to generate all of the above:

cd images/im/masking
# doesn't work: leaves a fringe of off-white pixels around the tool
convert -fuzz .2% -transparent white tool.png ugly_fuzz.png
composite  -compose Dst_Over -tile pattern:checkerboard ugly_fuzz.png ugly_fuzz_check.png

# doesn't work: also removes the white highlights on the tool
convert -fuzz 10% -transparent white tool.png removed_too_much.png
composite  -compose Dst_Over -tile pattern:checkerboard removed_too_much.png removed_too_much_check.png

# works okay
convert tool.png -bordercolor white -border 1x1 -matte -fill none -fuzz 7% -draw 'matte 1,1 floodfill' -shave 1x1 white_floodfill.png
composite  -compose Dst_Over -tile pattern:checkerboard white_floodfill.png white_floodfill_check.png

# the real approach: difference against the top-left (background) color
convert tool.png \( +clone -fx 'p{0,0}' \)  -compose Difference  -composite   -modulate 100,0  +matte  difference.png

# remove the black, replace with transparency
convert difference.png -bordercolor white -border 1x1 -matte -fill none -fuzz 7% -draw 'matte 1,1 floodfill' -shave 1x1 removed_black.png
composite  -compose Dst_Over -tile pattern:checkerboard removed_black.png removed_black_check.png

# extract the matte (alpha) channel as a grayscale image
convert removed_black.png -channel matte -separate  +matte matte.png

# negate the colors and soften the mask's edge with a slight blur
convert matte.png -negate -blur 0x1 matte-negated.png

# you are going for: white interior, black exterior
composite -compose CopyOpacity matte-negated.png tool.png finished.png
composite  -compose Dst_Over -tile pattern:checkerboard finished.png finished_check.png
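
Since the goal was thousands of product images, here is a hedged Ruby sketch of a batch driver that runs the single-command flood-fill step over every PNG in a directory. The floodfill_command and process_dir helpers and the directory names are hypothetical, and it assumes ImageMagick's convert is on the PATH. Passing the arguments to system as separate strings skips the shell, so 'matte 1,1 floodfill' needs no extra quoting.

```ruby
# Hypothetical helper: argv for the "works okay" flood-fill command in the script above.
def floodfill_command(input, output, fuzz = "7%")
  ["convert", input,
   "-bordercolor", "white", "-border", "1x1", "-matte",
   "-fill", "none", "-fuzz", fuzz,
   "-draw", "matte 1,1 floodfill",
   "-shave", "1x1", output]
end

# Hypothetical driver: process every PNG in src_dir, writing results to dest_dir.
def process_dir(src_dir, dest_dir)
  Dir.glob(File.join(src_dir, "*.png")).sort.each do |input|
    output = File.join(dest_dir, File.basename(input))
    # system returns true on success, false on a non-zero exit, nil if convert can't run
    warn "convert failed for #{input}" unless system(*floodfill_command(input, output))
  end
end

# process_dir("images/originals", "images/transparent")
```

The masking pipeline could be driven the same way, with one system call per step.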

Custom YAML Emitter

Friday, March 23, 2007 by Nate Murray.

Rational Numbers

Just recently I needed to store a rational number in a database. YAML is perfect for this sort of thing. Unfortunately, the standard Rational class doesn't have its own to_yaml, so it doesn't survive a round trip:


  require 'yaml'

  rat = Rational(4,3)  # => Rational(4, 3)
  rat.to_s             # => "4/3"

  y = YAML.dump(rat)   # => "--- !ruby/object:Rational 4/3\n"

  back = YAML.load(y)  # => Rational(nil, nil)

Notice that rat gets emitted as a vanilla Ruby object with class Rational, but the emitter just converts rat into a string, so "4/3" is appended to the YAML output. Because the YAML parser doesn’t know what to do with the string "4/3", we get back a Rational object that doesn’t have its numerator or denominator set. We want back to be Rational(4, 3), just like the original object.

Register Your Class

What we need to do is register our Rational class with YAML so that it knows how to emit and parse our specific type of object.

We can specify our YAML type by defining a to_yaml_type method. We then register the type with YAML by calling YAML::add_domain_type and passing it a block. The YAML parser will call this block whenever it loads a node tagged with the matching type.

Notice below that YAML::add_domain_type yields two variables type and val. type is the YAML type we specified with to_yaml_type and the val is the value that was stored during the YAML creation process.



  require 'yaml'
  class Rational
    def to_yaml_type; "!pasadenarb.com,2007-03-23/rational"; end
  end

  YAML::add_domain_type( "pasadenarb.com,2007-03-23", "rational") do  |type, val|
    type                  # => "tag:pasadenarb.com,2007-03-23:rational"
    val                   # => "4/3"
  end

  rat  = Rational(4,3)    # => Rational(4, 3)
  yam  = YAML.dump(rat)   # => "--- !pasadenarb.com,2007-03-23/rational 4/3\n"
  back = YAML.load(yam)   # => "4/3" 



Simple Parsing

Notice here that YAML.load returned whatever the block we passed to add_domain_type returned. In this case that is val, the string "4/3". We are a little closer to our goal, but back is still a String, not a Rational. What we need to do is improve the block we pass to add_domain_type.

Since val holds the string "4/3", we can split it into the numerator and denominator and build a Rational from them.


  require 'yaml'
  class Rational
    def to_yaml_type; "!pasadenarb.com,2007-03-23/rational"; end
  end

  YAML::add_domain_type( "pasadenarb.com,2007-03-23", "rational") do  |type, val|
    num, den = val.split("/")        # => ["4", "3"]
    Rational(num.to_i, den.to_i)
  end

  rat  = Rational(4,3)     # => Rational(4, 3)
  yam  = YAML.dump(rat)    # => "--- !pasadenarb.com,2007-03-23/rational 4/3\n"
  back = YAML.load(yam)    # => Rational(4, 3)


Notice that back is Rational(4, 3), just as we originally wanted.

However, this approach has its limits. Here we can derive the attributes we need pretty easily, but what if we had a more complicated object whose #to_s output does not include all of its attributes? What we need is more control over the YAML creation process. Thankfully, we get that control by writing our own #to_yaml method.

Advanced Emitting

In the #to_yaml method below, we iterate through the sorted instance_variables and add a mapping entry for each one: the key is the instance variable name (without the leading @) and the value is the instance variable's value.

Then, to rebuild the Rational when loading, we just read the "numerator" and "denominator" keys out of the hash in val.


  require 'yaml'
  class Rational

    def to_yaml_type; "!pasadenarb.com,2007-03-23/rational"; end

    def to_yaml( opts = {} )
      YAML.quick_emit( self.object_id, opts ) { |out|
        out.map( taguri, to_yaml_style ) { |map|
          instance_variables.sort.each { |iv|
            map.add( iv[1..-1], instance_variable_get( iv ) )
          }
        }
      }
    end

  end

  YAML::add_domain_type( "pasadenarb.com,2007-03-23", "rational") do  |type, val|
    num, den = val["numerator"], val["denominator"]  # => [4, 3]
    Rational(num.to_i, den.to_i)
  end

  rat  = Rational(4,3)    # => Rational(4, 3)
  yam  = YAML.dump(rat)   # => "--- !pasadenarb.com,2007-03-23/rational \ndenominator: 3\nnumerator: 4\n"
  back = YAML.load(yam)   # => Rational(4, 3)
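
The hooks used in this post (to_yaml_type, YAML::add_domain_type, YAML.quick_emit) come from Syck, the YAML library in Ruby 1.8. Newer Rubies replaced Syck with Psych, which handles custom emitting differently; the usual Psych hooks are encode_with and init_with. Here is a minimal sketch using a hypothetical Fraction class, assuming Ruby 2.6 or newer for the permitted_classes: keyword of safe_load.

```ruby
require 'yaml'

# Hypothetical class for illustration; not part of the original post.
class Fraction
  attr_reader :numerator, :denominator

  def initialize(numerator, denominator)
    @numerator, @denominator = numerator, denominator
  end

  # Psych calls this when dumping: we choose exactly which keys get written.
  def encode_with(coder)
    coder["numerator"]   = numerator
    coder["denominator"] = denominator
  end

  # Psych calls this (instead of initialize) when loading.
  def init_with(coder)
    @numerator   = coder["numerator"]
    @denominator = coder["denominator"]
  end

  def ==(other)
    other.is_a?(Fraction) && numerator == other.numerator && denominator == other.denominator
  end
end

yam  = YAML.dump(Fraction.new(4, 3))                       # tagged !ruby/object:Fraction
back = YAML.safe_load(yam, permitted_classes: [Fraction])   # rebuilt through init_with
```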


Conclusion

As you can see, YAML is a powerful way to get complex objects into strings. There are a few other shortcuts for getting custom objects into YAML, such as defining #to_yaml_properties. If you are interested in doing something simple, I’d start by looking here.

Introduction to Bindings

Tuesday, March 20, 2007 by Nate Murray.

The Pickaxe book says that Binding objects:

encapsulate the execution context at some particular place in the code and retain this context for future use.

You can get a Binding for the current context by calling Kernel#binding.

A Binding stores the local variables, methods, and value of self at the point where it was created, and you can access them by passing the Binding to eval.


  class Product
    def set_title(title)
      @title = title
    end

    def get_binding
      binding
    end
  end

  p = Product.new
  p.set_title("nice and shiny")

  q = Product.new
  q.set_title("old and ugly")

  eval "@title", p.get_binding    # => "nice and shiny"
  eval "@title", q.get_binding    # => "old and ugly"

You can see here that @title gets evaluated differently depending on the binding. The first eval returns "nice and shiny" because that is the value of @title for the first Product p.

Blocks and Procs

Blocks (and the Proc objects made from them) carry their Binding with them.



  a = "inside a"
  a_block = lambda { a }

  def try_to_set_a(block)
    a = "resetting a"
    block.call
  end

  try_to_set_a(a_block)          # => "inside a"


Notice here that the result is "inside a" and not "resetting a". This is because a block is tied to the local variables of the scope where it was created. The a inside try_to_set_a is a separate local variable and does not interfere with the a in a_block.
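
To see that the block holds on to the variable itself rather than a snapshot of its value, reassign a in the same scope after creating the block:

```ruby
a = "inside a"
a_block = lambda { a }

# Reassigning a in the same scope is visible to the block,
# because the block shares that scope's local variables.
a = "changed after the block was created"
a_block.call   # => "changed after the block was created"
```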

Interestingly, you can also reassign a variable through a Binding.

  a = "inside a"
  a_block = lambda { a }

  def try_to_set_a(block)
    a = "resetting a"
    block.call
  end

  eval "a = 'something else'"
  try_to_set_a(a_block)          # => "something else"

This works because eval, when called without an explicit Binding, uses the current one: here the top-level binding, which is the same binding in which a was originally defined.
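
Every Proc also exposes its Binding through Proc#binding, so you can reach a variable through the block even when it is out of scope for you. A small sketch with a hypothetical make_counter method:

```ruby
# Hypothetical example: count is a local variable of make_counter,
# so code outside the method cannot name it directly.
def make_counter
  count = 0
  lambda { count += 1 }
end

increment = make_counter
increment.call                          # => 1
eval("count = 100", increment.binding)  # reassign count inside the lambda's binding
increment.call                          # => 101
```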

Practical Use

Bindings are often used when evaluating ERB. (For those of you who don’t know, ERB is a template system that is included in the Ruby Standard Library.)

ERB#result takes a Binding object as its argument and the variables in the ERB template are evaluated in this context.

Going back to our Product example from earlier, let's see how we can use the Product’s bindings in this fashion:

  require 'erb'

  class Product
    def set_title(title)
      @title = title
    end

    def set_cost(cost)
      @cost = cost
    end

    def get_binding
      binding
    end
  end

  p = Product.new
  p.set_title("nice and shiny")
  p.set_cost("19.95")

  q = Product.new
  q.set_title("old and ugly")
  q.set_cost("230.00")

  template = ERB.new <<-EO_ERB
    == Invoice
    Title: <%= @title %>
    Cost:  <%= @cost  %>
  EO_ERB

  template.result(p.get_binding)   # => "  == Invoice\n  Title: nice and shiny\n  Cost:  19.95\n"
  template.result(q.get_binding)   # => "  == Invoice\n  Title: old and ugly\n  Cost:  230.00\n"
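
The Product class above mostly exists to hand ERB a Binding. If all you have is a hash of values, one option is to build a Binding whose local variables come from the hash. A sketch, assuming Ruby 2.1 or newer for Binding#local_variable_set; binding_with is a hypothetical helper:

```ruby
require 'erb'

# Hypothetical helper: a fresh Binding with one local variable per hash entry.
def binding_with(locals)
  b = binding
  locals.each { |name, value| b.local_variable_set(name, value) }
  b
end

template = ERB.new("Title: <%= title %>\nCost:  <%= cost %>\n")
template.result(binding_with(title: "nice and shiny", cost: "19.95"))
# => "Title: nice and shiny\nCost:  19.95\n"
```

Ruby 2.5 and newer also ship ERB#result_with_hash, which does essentially this for you.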

Conclusion

As you can see, Binding is a very handy object, but this article is only an introduction to the subject. Here are a couple of articles that deal with bindings in more depth:

Jim Weirich’s Variable Bindings in Ruby
Pickaxe page on Binding

6 Ways to Run Shell Commands in Ruby

Tuesday, March 13, 2007 by Nate Murray.

Often we want to interact with the operating system or run shell commands from within Ruby. Ruby provides a number of ways for us to perform this task.

Exec

Kernel#exec (or simply exec) replaces the current process by running the given command. For example:


  $ irb
  >> exec 'echo "hello $HOSTNAME"'
  hello nate.local
  $

Notice how exec replaces the irb process with the echo command, which then exits. Because the Ruby process effectively ends, this method has only limited use. The major drawback is that your Ruby script never finds out whether the command succeeded or failed.

System

system operates similarly, but it runs the command in a subshell instead of replacing the current process. system gives us a little more information than exec: it returns true if the command exits with a zero status, false if it exits with a non-zero status, and nil if the command could not be run at all.


  $ irb             
  >> system 'echo "hello $HOSTNAME"'
  hello nate.local
  => true
  >> system 'false' 
  => false
  >> puts $?
  256
  => nil
  >> 

system sets the global variable $? to the exit status of the process. Notice that we have the exit status of the false command (which always exits with a non-zero code). Checking the exit code gives us the opportunity to raise an exception or retry our command.

system is great if all we want to know is “Was my command successful or not?” However, we often want to capture the output of the command and then use that value in our program.

Backticks (`)

Backticks (also called “backquotes”) run the command in a subshell and return the standard output from that command.


  $ irb
  >> today = `date`
  => "Mon Mar 12 18:15:35 PDT 2007\n" 
  >> $?
  => #<Process::Status: pid=25827,exited(0)>
  >> $?.to_i
  => 0

This is probably the most commonly used and widely known method to run commands in a subshell. As you can see, this is very useful in that it returns the output of the command and then we can use it like any other string.

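Notice that $? is not simply an integer but actually a Process::Status object, which gives us not only the exit status but also the process id. Process::Status#to_i returns the raw status as an integer; on Unix the exit code sits in the high byte, which is why a command that exits with 1 shows up as 256 (and in the Ruby 1.8 shown here, #to_s gives the same value as a string). Process::Status#exitstatus gives the exit code itself.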
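
A short sketch of the difference between the raw status and the exit code, using sh -c 'exit 3' as a stand-in command (assumes a Unix-like system with sh):

```ruby
ok = system("sh", "-c", "exit 3")  # separate arguments: Ruby passes them through unparsed
status = $?

ok                 # => false
status.exitstatus  # => 3, the exit code itself
status.success?    # => false
status.to_i        # => 768 on typical Unix systems (3 << 8, the raw wait status)

system("no-such-command-xyz")  # => nil, the command could not be run at all
```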

One consequence of using backticks is that we only get the standard output (stdout) of this command but we do not get the standard error (stderr). In this example we run a Perl script which outputs a string to stderr.


  $ irb
  >> warning = `perl -e "warn 'dust in the wind'"`
  dust in the wind at -e line 1.
  => "" 
  >> puts warning

  => nil

Notice that warning is just an empty string! When we warn in Perl, the message goes to stderr, which backticks do not capture; it is printed straight to the terminal instead.
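
If you do want stderr from backticks, one common approach is to let the shell merge it into stdout with 2>&1. A sketch, using a shell echo to stderr as a stand-in for the Perl warn (assumes a POSIX sh):

```ruby
# The parentheses run echo in a subshell, so an outer 2>&1 applies to all of its output.
cmd = "(echo 'dust in the wind' 1>&2)"  # writes only to stderr

without_merge = `#{cmd}`       # => "", the message goes straight to the terminal
with_merge    = `#{cmd} 2>&1`  # => "dust in the wind\n", stderr redirected into stdout
```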

IO.popen

IO.popen is another way to run a command in a subprocess. popen gives you a bit more control: opened in "r+" mode, the subprocess's standard input and standard output are both connected to the returned IO object (the default "r" mode connects only its standard output).


  $ irb
  >> IO.popen("date") { |f| puts f.gets }
  Mon Mar 12 18:58:56 PDT 2007
  => nil

While IO.popen is nice, I typically use Open3.popen3 when I need this level of granularity.
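
For the two-way case, IO.popen needs the "r+" mode. A small sketch feeding lines to sort (assumes a Unix-like system with sort on the PATH):

```ruby
sorted = IO.popen("sort", "r+") do |io|
  io.puts "pear"
  io.puts "apple"
  io.close_write  # send EOF so sort knows the input is complete
  io.read         # the block's value becomes the return value of IO.popen
end
sorted  # => "apple\npear\n"
```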

Open3.popen3

The Ruby standard library includes the Open3 module (require "open3" to load it). It’s easy to use and returns stdin, stdout and stderr. In this example, let's use the interactive command dc, a reverse-Polish calculator that reads from stdin. We push two numbers and an operator onto the stack, then use p to print the result of applying the operator to the two numbers. Below we push 5, 10 and + and get a response of 15\n on stdout.


  $ irb
  >> stdin, stdout, stderr = Open3.popen3('dc') 
  => [#<IO:0x6e5474>, #<IO:0x6e5438>, #<IO:0x6e53d4>]
  >> stdin.puts(5)
  => nil
  >> stdin.puts(10)
  => nil
  >> stdin.puts("+")
  => nil
  >> stdin.puts("p")
  => nil
  >> stdout.gets
  => "15\n" 

Notice that with this command we not only read the output of the command but we also write to the stdin of the command. This allows us a great deal of flexibility in that we can interact with the command if needed.

popen3 will also give us the stderr if we need it.


  # (irb continued...)
  >> stdin.puts("asdfasdfasdfasdf")
  => nil
  >> stderr.gets
  => "dc: stack empty\n" 

However, in Ruby 1.8.5 popen3 has a shortcoming: it doesn’t set the proper exit status in $?.


  $ irb
  >> require "open3" 
  => true
  >> stdin, stdout, stderr = Open3.popen3('false')
  => [#<IO:0x6f39c0>, #<IO:0x6f3984>, #<IO:0x6f3920>]
  >> $?
  => #<Process::Status: pid=26285,exited(0)>
  >> $?.to_i
  => 0

0? false is supposed to return a non-zero exit status! It is this shortcoming that brings us to Open4.
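
Newer Rubies address this. From Ruby 1.9 on, the block form of Open3.popen3 also yields a fourth value, a wait thread whose value is the child's Process::Status, and Ruby 1.9.2 added Open3.capture3 for the common case of just collecting output. A sketch (assumes a Unix-like system with false and sh available):

```ruby
require 'open3'

# The wait thread's #value is the child's Process::Status.
status = Open3.popen3("false") do |stdin, stdout, stderr, wait_thr|
  stdin.close
  wait_thr.value
end
status.exitstatus   # => 1

# capture3 collects stdout, stderr and the status in one call.
out, err, st = Open3.capture3("sh", "-c", "echo out; echo err 1>&2; exit 2")
out            # => "out\n"
err            # => "err\n"
st.exitstatus  # => 2
```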

Open4.popen4

Open4.popen4 comes from open4, a Ruby gem put together by Ara Howard. It operates similarly to Open3 except that we can get the exit status of the program: popen4 returns the process id of the subprocess, and we can get the exit status by waiting on that process. (You will need to run gem install open4 to use this.)


  $ irb
  >> require "open4" 
  => true
  >> pid, stdin, stdout, stderr = Open4::popen4 "false" 
  => [26327, #<IO:0x6dff24>, #<IO:0x6dfee8>, #<IO:0x6dfe84>]
  >> $?
  => nil
  >> pid
  => 26327
  >> ignored, status = Process::waitpid2 pid
  => [26327, #<Process::Status: pid=26327,exited(1)>]
  >> status.to_i
  => 256

A nice feature is that you can call popen4 with a block, and it will automatically wait for the process and return its status.


  $ irb
  >> require "open4" 
  => true
  >> status = Open4::popen4("false") do |pid, stdin, stdout, stderr|
  ?>            puts "PID #{pid}" 
  >>          end
  PID 26598
  => #<Process::Status: pid=26598,exited(1)>
  >> puts status
  256
  => nil

Please send comments and revision suggestions to Nate Murray
$Id$ Tue Mar 13 07:45:42 PDT 2007

Testing Private Methods

Friday, March 09, 2007 by Nate Murray.

This probably isn't news to most of you, but it might help someone. Sometimes you want to test private methods. If you want, you can make the method public from within a class_eval block and then call it in your test. For example:

def test_private_method
  product = products(:first) # grab our fixture
  product.class.class_eval do
    public :some_private_method
  end

  assert product.some_private_method
end
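
One drawback of the class_eval approach is that the method stays public for the rest of the test run. An alternative is Object#send, which ignores visibility for a single call. A self-contained sketch, with a hypothetical Product class standing in for the fixture-backed model:

```ruby
# Hypothetical class standing in for the fixture-backed Product model.
class Product
  private

  def some_private_method
    true
  end
end

product = Product.new

product.send(:some_private_method)   # => true, send bypasses visibility for this one call

begin
  product.public_send(:some_private_method)  # public_send (Ruby 1.9+) respects visibility
rescue NoMethodError
  # raised, because the method is still private
end

Product.private_method_defined?(:some_private_method)  # => true, visibility was not changed
```

In a test this becomes assert product.send(:some_private_method).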

Directory Trees

Tuesday, March 06, 2007 by Nate Murray.

Below is a code snippet for putting a directory tree into a data structure. Basically, I wanted each folder to be a hash whose key is the folder name and whose value is an array of the files and folders it contains. For example, the tree:

       content/policy
       content/policy/privacy_policy.txt
       content/policy/about_us.txt
       content/policy/mean_policy
       content/policy/mean_policy/nice_people.txt
       content/policy/mean_policy/mean_people.txt
       content/index.txt
       content/content
       content/content/misc
produces the structure:
      {"content"=>
         [{"content"=>[{"misc"=>[]}]},
           "index.txt",
          {"policy"=>
            ["about_us.txt",
            {"mean_policy"=>["mean_people.txt", "nice_people.txt"]},
             "privacy_policy.txt"]}]}
The recursive code snippet is posted below:
    # Recursively builds { "dir_name" => [entries] }, where each entry is either
    # a file name (String) or a nested Hash for a subdirectory.
    def content_files_in_dir(dir)
      return nil unless File.directory?(dir)
      # Skip hidden entries (including "." and ".."); sort so the order is stable
      entries = Dir.entries(dir).reject { |f| f =~ /\A\./ }.sort

      values = entries.map do |entry|
        full_entry = File.join(dir, entry)
        File.directory?(full_entry) ? content_files_in_dir(full_entry) : entry
      end
      { File.basename(dir) => values }
    end

Using Parameters as Default Parameters

by Nate Murray.

I noticed something interesting about default parameters today: a default value can refer to the parameters before it, even inside a data structure. For instance:

[nathan@nate ~]$ irb
>> def foo(arg1, arg2 = [arg1])
>>   puts arg1.inspect
>>   puts arg2.inspect
>> end
=> nil
>> foo 3
3
[3]
=> nil
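
Default values are evaluated each time the method is called, in order, so a default can be any expression involving the parameters before it. A small sketch with hypothetical methods:

```ruby
# The default for finish is computed from start on each call.
def range_for(start, finish = start + 10)
  start..finish
end

# The default list is built from item on each call.
def wrap(item, list = [item])
  list
end

range_for(5)     # => 5..15
range_for(5, 7)  # => 5..7

first  = wrap(1)
second = wrap(1)
first.equal?(second)  # => false: a fresh default array is built on every call
```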

Who we are:
The Pasadena Ruby Brigade is a group of programmers who are interested in Ruby. We enjoy sharing our ideas and code with anyone who is interested in this wonderful programming language.

Who can join:
Anyone! Just head over to our mailing list and drop us an email.

What we do:
We recently started a project over at RubyForge. This project is a group of programs written and maintained by our members that we thought could be beneficial to the whole community.
