Finding the Best Match

Finding the Best Match

Let's conclude our adventures in the SQL wilderness by combining several of the techniques shown in this chapter and try to select employees on the basis of some rather complex and fuzzy conditions. We want to find, from among our employees, that one member of staff who happens to be the best candidate for a project that requires a range of skills across several different environments (for example, Java, .NET, PHP, and SQL Server). The ideal candidate is a guru in all environments; but if we issue a query asking for the highest skill level everywhere it shall probably return no row. In the absence of the ideal candidate, we are usually left with imperfect candidates, and we must identify someone who has the best competency in as many of our environments as possible and is therefore the best suited for the project. For instance, if our Java guru is a world expert, but knows nothing of PHP, that person is unlikely to be selected.

"Best suited" implies a comparison between the various employees, or, in other words, a sort, from which the winner will emerge. Since we want only one winner, we shall have to limit the output of our list of candidates to the first row. You should already be beginning to see the query as a select ... from (select ... order by) limit 1 or whatever your SQL dialect permits.

The big question is, of course, how we are going to order the employees. Who is going to get the preference between one who has a decent knowledge of three of the specified topics, and one who is an acknowledged guru of two subjects? It is likely, in a case such as we are discussing, that the width of knowledge is what matters more to us than the depth of knowledge. We can use a major sort key on the number of skills from the requirement list that are mastered, and a minor sort key on the sum of the various skill_level values by employee for the skills in the requirement list. Our inner query comes quite naturally:

select e.employee_name,
       count(ss.skill_id) as major_key,
       sum(ss.skill_level) as minor_key
from employees e,
     skillset ss,
     skills s
where s.skill_name in ('JAVA', '.NET', 'PHP', 'SQL SERVER')
  and s.skill_id = ss.skill_id
  and ss.employee_id = e.employee_id
group by e.employee_name
order by 2, 3

This query, however, doesn't tell us anything about the actual skill level of our best candidate. We should therefore combine this query with a double conversion to get an encoding of skills. I leave doing that as an exercise, assuming that you have not yet reached a semi-comatose state.

You should also note, from a performance standpoint, that we need not refer to the employees table in the inner query. The employee name is information that we need only when we display the final result. We should therefore handle only employee_id values, and do the bulk of the processing using the tables skills and skillset. You should also think about the rare situation in which two candidates have exactly the same skillsdo you really want to restrict output to one row?

To paraphrase General Robert E. Lee, "It is well that SQL is so terrible, or we should grow too fond of it."

 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows