Computing Set Operations on Arrays






Computing Set Operations on Arrays

Problem

You want to find the union, intersection, difference, or Cartesian product of two arrays, or the complement of a single array with respect to some universe.

Solution

Array objects have overloaded arithmetic and logical operators to provide the three simplest set operations:

	#Union
	[1,2,3] | [1,4,5]                    # => [1, 2, 3, 4, 5] 

	#Intersection
	[1,2,3] & [1,4,5]                    # => [1]

	#Difference
	[1,2,3] - [1,4,5]                    # => [2, 3]

Set objects overload the same operators, as well as the exclusive-or operator (^).If you already have Arrays, though, it's more efficient to deconstruct the XOR operation into its three component operations.

	require 'set' 
	a = [1,2,3]
	b = [3,4,5]
	a.to_set ^ b.to_set                  # => #<Set: {5, 1, 2, 4}>  
	(a | b) - (a & b)                    # => [1, 2, 4, 5]

Discussion

Set objects are intended to model mathematical sets: where arrays are ordered and can contain duplicate entries, Sets model an unordered collection of unique items. Set not only overrides operators for set operations, it provides English-language aliases for the three most common operators: Set#union, Set#intersection, and Set#difference. An array can only perform a set operation on another array, but a Set can perform a set operation on any Enumerable.

	array = [1,2,3]
	set = [3,4,5].to_s  
	array & set                      # => TypeError: can't convert Set into Array
	set & array                      # => #<Set: {3}>

You might think that Set objects would be optimized for set operations, but they're actually optimized for constant-time membership checks (internally, a Set is based on a hash). Set union is faster when the left-hand object is a Set object, but intersection and difference are significantly faster when both objects are arrays. It's not worth it to convert arrays into Sets just so you can say you performed set operations on Set objects.

The union and intersection set operations remove duplicate entries from arrays. The difference operation does not remove duplicate entries from an array except as part of a subtraction.

	[3,3] & [3,3]                        # => [3]
	[3,3] | [3,3]                        # => [3]
	[1,2,3,3] - [1]                      # => [2, 3, 3]
	[1,2,3,3] - [3]                      # => [1, 2]
	[1,2,3,3] - [2,2,3]                  # => [1]

Complement

If you want the complement of an array with respect to some small universe, create that universe and use the difference operation:

	u = [:red, :orange, :yellow, :green, :blue, :indigo, :violet]
	a = [:red, :blue]
	u - a                    # => [:orange, :yellow, :green, :indigo, :violet]

More often, the relevant universe is infinite (the set of natural numbers)or extremely large (the set of three-letter strings). The best strategy here is to define a generator and use it to iterate through the complement. Be sure to break when you're done; you don't want to iterate over an infinite set.

	def natural_numbers_except(exclude) 
	  exclude_map = {}
	  exclude.each { |x| exclude_map[x] = true }
	  x = 1
	  while true
	    yield x unless exclude_map[x] 
	    x = x.succ
	  end
	end

	natural_numbers_except([2,3,6,7]) do |x|  
	  break if x > 10  
	  puts x  
	end
	# 1
	# 4
	# 5
	# 8
	# 9
	# 10

Cartesian product

To get the Cartesian product of two arrays, write a nested iteration over both lists and append each pair of items to a new array. This code is attached to Enumerable so you can also use it with Sets or any other Enumerable.

	module Enumberable
	  def cartesian(other)
	    res = []
	    each { |x| other.each { |y| res << [x, y] } }  
	    return res
	  end
	end

	[1,2,3].cartesian(["a",5,6])
	# => [[1, "a"], [1, 5], [1, 6],
	#    [2, "a"], [2, 5], [2, 6],
	#    [3, "a"], [3, 5], [3, 6] 

This version uses Enumerable#inject to make the code more concise; however, the original version is more efficient.

	module Enumerable  
	  def cartesian(other)
	    inject([]) { |res, x| other.inject(res) { |res, y| res << [x,y] } }
	  end
	end

See Also

  • See Recipe 2.5, "Generating Random Numbers," for an example (constructing a deck of cards from suits and ranks)that could benefit from a function to calculate the Cartesian product

  • Recipe 2.10, "Multiplying Matrices"



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows