Advanced Web Site Search with SQL

I was recently working on a custom content management system for a client and I was adding a search box to the site to allow users to search the client’s site. Because all the content is contained within a database, I can easily search through the pages table to grab any pages that have the search terms contained within their title, content, or keywords meta tag. I wanted to make it a little more robust than simply doing a select statement and grabbing any pages with the search term and I wanted to also rank the results based on a relevance factor determined by where the search words were found.

I was recently putting together a custom search form for a client, and wanted to improve on it from the basic “SELECT title FROM page WHERE title LIKE ‘%search-query%’” method of searching a site. I wanted to include some more relevance to the search so pages that have the search words in their title were a little more relevant, as well as pages with the search term in the filename, and keywords meta tag as well as the body.

I hate most search forms on web sites simply because they’re so horrible. Many of the sites I’ve worked on with pre-existing search components have just done a simple select statement that grabs any page from the database that has the search word in the body of the content. They don’t take into account the page title, keywords, filename, along with the content and it can often make for less-than-intelligent search results. This has even led to many site owners installing a google search box on their site, however, for the kind of projects I take on, I like to custom-build as much as I can for design flexibility and future maintainability.

Search is Important

Search is also very overlooked by site owners. They’ll setup something like google, or atomz, or htDig, and not really test the relevance of the search to see if the search results as relevant as they should be. Even google’s custom site search isn’t the most relevant when you’re site isn’t optimized accordingly, and when you’re clients are in control of the content you can’t ensure that the site is optimized for google – making even google site search less than favorable.

The search function in an e-commerce site is even more important than a content site as you’ll find that the search box is the primary method for users to find product on your site. That’s why Amazon’s search is so prominent on their web site.

Relevancy-based Search

Just as google considers so many factors in their search results, you should consider more than just the content of the page. In the case of my custom CMS, I’m allowing users to add keywords, descriptions, titles and other standard META tags and I’ll base my search results off each of those fields as well as the content of the page. Some of the search components I’ve seen have used multiple database queries to grab all the pages where the search query matched the title, another query to grab the pages that matched the keywords, another for the description, and finally one more for the content. Then moving those results into a bunch of arrays and applying some more logic to determine how relevant those results are and send them to the search results page. There isn’t anything particularly wrong with this, but I like to do things in the least number of database connections as possible, and to keep it as simple as possible from the code side of things.

Searching Based on Multiple Fields

Here’s the sql statement I’ve used – I’ll explain it below:

SELECT title, filename, sum(relevance)
FROM (
SELECT title, filename, 10 AS relevance FROM page WHERE title like ‘%about%’
UNION
SELECT title, filename, 7 AS relevance FROM page WHERE filename like ‘%about%’
UNION
SELECT title, filename, 5 AS relevance FROM page WHERE keywords like ‘%about%’
UNION
SELECT title, filename, 2 AS relevance FROM page WHERE description like ‘%about%’
) results
GROUP BY title, filename
ORDER BY relevance desc;

The sql is a little more complex than a basic select statement but if you’re familiar with unions then it’s pretty simple. All I’ve done is created a number of select statements that grab the pages where the field matches the search term. In this case, the search term is "about". I’ve searched through the title, filename, keywords, and description fields and left the content field off the search results. I didn’t want to include the content because I decided if the search isn’t found in the title, filename, keywords or description then it really isn’t very relevant for the search and even if it does have it in the content, it’s most likely a minor part of the content and not too helpful for the user.

I’ve manually included a relevance field in each of these select statements assigning a "point-value" to each page in those results in order to weight certain fields. These relevance points are totally up to you, and for the purposes of this example, I’ve given the results 10 points if the search word is in the title, 7 if it’s in the filename, 5 if it’s in the keywords, and 2 if it’s in the description.

Next, I union all of those search results together to create a large list of results. However, if a page appears in more than one of those select statements then we need to group them, so using a subquery, I’ve summed the relevance fields and then grouped by the filename and title then ordering the results descending by relevance.

Search Results

title filename sum(relevance)
About the Chamber about-chamber 17
About our Programs programs 10
Policies policies 2
Opportunities opportunities 2

Here the about page came up number one because it had the word about in the filename as well as in the title, and the programs page came in second since the word about was in the title, but not in the filename. There were some other pages that had the word in the description so they were only given 2 points. Obviously a search for the word about isn’t going to be very helpful, but to outline this example I think it worked well. Now I just need to simply output these results from mysql with php and the search results are done. I grabbed the title to output on the page and then filename to link to from the search results page.

Relevance Percentage

Rather than outputting the actual relevance score sometimes you’ll see a percentage of how relevant the page is. In our case that’s easy to do as well. Based on our formula of 10 points for the title, 7 for the filename, 5 for keywords, and 2 for description it’s possible to get a total of 24 points. To get the pages percentage relevance we just divide the relevance score from our query by 24 and then multiple by 100. For example, 17/24*100=70% so our about page was 70% relevant.

Tweaking for Better Search Results

You will need to tweak this formula as you test your search functionality, and adjust the weighting for each field to get the best results. You might also want to add another select statement that grabs all pages where the title is exactly the same as your search – in that case you might want to give them 100 points to ensure that that page is weighted much higher than the others. This will help you tweak your search results for the best user experience.

Full Text Search

I’ve provided the examples using the LIKE clause simply because most users will already be familiar with it, but also because the fields that I’m searching are relatively small – I wasn’t searching the entire contents field of the pages on the site. If I wanted to add the content field, I’d probably use full-text searching using the match() function, you can read up on that here.

Keyword Research Is Not Just For Search Engine Marketers

Keyword research is one of a number of skills that any decent SEO or search marketer has to be good at in order to be successful. But are they the only ones who should be aware of keyword research tools like Keyword Discovery and Wordtracker?

Information architects, content writers, and usability professionals are just a few job titles, aside from SEOs, that need to be using tools like Wordtracker and Keyword Discovery to conduct a core component of their services.

Keyword Research for Usability

A big part of usability is making it easy for users to find what they’re looking for on your site, and navigate to the content they are trying to reach. As a usability expert you need to know how users think and how they’ll navigate and "use" your site. Keyword research is arguably the best way of finding out how users think when navigating the web. How do people search for your product or service? Do they commonly use industry acronyms and slang or is that left to industry professionals? As a usability consultant you’ll need to find this information when planning the links, content, and navigation of the site. Keyword research tools are among the best tools for this sort of research.

Keyword Research for Copywriting

A vital skill in copywriting is being able to communicate and speak to your user in language they use and understand. You need to be able to connect with them and make them think while they’re reading the content on your web site. If the copy on your site is going over their head, then it won’t work. Don’t use complicated or technical language that your users won’t recognize when you can keep it simple and write your content in their words. Once you work with a keyword research tool, you may find that a lot of people do search for your product or services using technically-advanced words and slang and that means you can use those types of words in your content, but most often you’ll find that you need to keep things simple and not try to impress your users with how well your content is written, but how well it speaks to them.

A lot of writers just like to write incorporating college-level vocabulary into their work and over-using adjectives that can really lose the attention of your audience on the web. The web is a unique medium not just because of how you navigate it, but also because of the attention span of a typical user. Most web users are inpatient and need to be sold to or communicated with quickly. Don’t drag out your message with content that doesn’t connect to your users immediately, in their language.

Keyword Research for Information Architects

How do you name the different sections of your site? Everyone has an "About Us" page on their site that provides background information on the company. Sometimes this page is called "About Us", "Company", "Profile" or "Firm". How do you choose which term to use? Do some keyword research.

In the web industry, I’ve found that people search for words like "web design company" much more than "web design firm" and rarely do people search for web development companies using the word "about" in their search phrase – so maybe it’s best to use the word "company" in your main navigation and refer to yourself as a "company" as opposed to a "firm" or "agency". Doing your keyword research will point this out to you and will help you make decisions on what keywords to use on your site and how to incorporate them into your navigation.

Bringing It All Together

As you start looking at this keyword research and incorporating language into your content, your internal links, your navigation, and overall site architecture you’ll find that a large part of your optimization is complete. Your SEO tasks will then consist of crafting title and META tags, creating internal links and doing some of the technical work like setting up redirects and rewriting, removing duplicate content, and ensuring that the site is crawlable - but much of the editing and fine tuning of content will be completed. Leaving less work for you as an SEO.

This also shows the advantages of having skills in each of these areas of web development. A lot of programmers have really bad usability skills, and a lot of writers, well, write a lot. Being a great SEO means having a strong knowledge of usability, writing, and content architecture and keyword research tools are a strong asset in every aspect of developing your web site strategy.

Linux Administrators Need to Know PHP

A couple years ago I was doing some work on a client’s web site on a shared hosting environment at a small local ISP and I was setting up some file uploads with PHP and was having problems getting permissions setup and enabling them on the server. I had done it many times before but for some reason just couldn’t get it to work. I resorted to calling the ISP to ask if they had any non-standard setup or configuration on their server that would be causing me problems.

My Own Experiences With System Administrators

I was talking to their system administrator and was asking why I wasn’t able to do any file uploads. He took a few minutes and simply went to PHP.net and found their “handling file uploads in php” document and began reading it to me. I told him I had done it before and had never had any problems, but this was the first time I was trying to do so on their servers. He said to me “I’m just a system admin, I don’t know anything about PHP.” I was kind of shocked. I thought it was pretty important to have at least a basic knowledge of PHP if you’re going to be managing any number of Linux servers that run PHP.

This same thing has happened with another client’s site that has hosting elsewhere from my servers. Anytime their server is upgraded they recompile PHP and forget to enable the GD library. Then the back-end of the client’s site stops working and they contact me for help. After a simple test on the back end I can tell them that the GD library isn’t installed and they need to talk to their system administrator.

One other example was when I had to work on a large web site for a company that hosted their site internally. Their admins had been running Unix servers for years, but when I asked them if they could upgrade to PHP 5 from PHP4 they were really cautious about doing so. It’s not that they had many PHP applications on their server, but more that they were unsure how to do it and wanted to take a couple weeks to learn about what PHP was. That seemed a little odd to me.

Linux System Administrator Job Postings Not Even Requiring PHP

I even see a lot of job postings for Linux/BSD system admins and they don’t mention any requirement to have experience with PHP – sometimes I don’t even see PERL! Of course, it might be assumed to have that knowledge, but when you list things like experience with Sendmail, Apache, MySQL and SSH they should probably be including PHP and PERL in the mix as well.

Why Is It Important?

There probably isn’t much explanation necessary of why PHP should be required skill but when you need to setup PHP modules or be in charge of a server running PHP you need to have a basic understanding of the language. As a system admin, the server security is up to you, and if you aren’t aware of security exploits within PHP or at least know enough to secure your PHP installation – things like register_globals, allow_url_fopen, and magic_quotes – then you’re really not doing your proper research and shouldn’t be installing it on your server.

This goes for any module or program you install on your server. If you are in charge of the server you need to be knowledgeable of everything you put on it, and if you’re going to be setting up PHP, PERL, Sendmail, or an FTP program you need to be aware of their security implications. That’s your job as a system administrator, and it can be really frustrating having to deal with a system admin for a server you’re managing a web site on that can’t even talk to you about configuring the PHP installation because it’s over their head.

Freelance Developers Create Better Software

Jonathan Snook, who I’m a huge fan on, just wrote a post about freelance developers being a recipe for poor software. After reading about the planning and development methodologies of NASA he talks about how freelancers may not be able to create products that are of as high quality as they should be. Although I agree that some freelancers can create poor software – real programmers create higher quality code as freelancers in my mind.

Freelancers Should Create Better Code

I’ve been a freelance web developer for almost 6 years, full-time, and earning my entire living that way. The nature of freelancing means that 90% of my projects are solely completed on my own. I’ve found that my code quality has improved quite significantly over my previous work for an employer. When I worked for someone else, I didn’t take as much ownership in the project as I do now. That’s not to say I didn’t take pride in my work, but until you actually work for yourself and make all the decisions it’s hard to understand the difference from working for a pay cheque and working for yourself. I find I take much more time on improving my own skills, and learning best practices. I also keep in mind that I’m the one maintaining this product in the future which makes me document and test my work exhaustively to relieve headaches down the road as best I can.

The main point I feel Jonathan is making is that we still need to make testing a priority just like NASA but on the scale of our own projects. But my own testing, and my client’s testing and review is always more than enough and I rarely have bugs come back in projects any more than a month after they are completed, even within that month any major bugs are really rare.

To walk through the points that Jonathan makes in his post, I just think comparing NASA projects to the type of work most freelancers do is a little too much.

Planning Varies Per Project

First, Jonathan talks about the planning that is so important in the projects NASA takes on, and how it takes up about 30% of their time. I’m not surprised about that at all, when you’re shooting billion dollar rockets into space you better have it planned out pretty well. He implies that planning is left out by most freelancers, and I would have to agree about that, but the amount of planning required is really relative to the project. In my case, I’ve done significant planning on each project I take on, however, it’s probably never been more than 5-10% of the total project. That’s because I’m building web sites, not shooting rockets into outer space. I don’t mean to be harsh, but the planning aspect is solely up to the developer and client – and really is there just to clarify scope and eliminate scope creep down the road.

"Freelance" Does Not Always Mean You’re Not a Part of a Team

Next Jonathan talks about is the rivalry between the developers and the testing department as NASA and how they compete to find bugs in software. The developers want to find it first and the QA department wants to find it to "win the competition" I think this rivalry is great, and in any instance where I’ve worked with a QA department I’ve loved the "competition" that came along with it. However, 80% of my projects don’t need much testing aside from my own developer testing and QA and some client review – which isn’t to say that the projects I work on aren’t mission critical or complicated. But some of my large projects are outsourced from larger firms or companies in which I still work with a QA department who’s job is to find bugs, vulnerabilities, and other flaws in my work. Even though I’m a freelancer, I still gain the advantages of working with a QA department. Whether it’s necessary is dependent on the project.

Code Control and Bug Tracking

Jonathan mentions the version control and bug tracking databases that NASA uses, and talks about how most freelancers don’t use code control (CVS, Subversion) or bug tracking tools enough. Again, I totally agree with this, but in a lot of cases, freelancers don’t necessarily need advanced tools like subversion or bug tracking software. I don’t use any bug tracking software at all. All I use is a spreadsheet with a list of bugs – or even a pen and paper – to write down any bugs I find in my code. As far as code control goes, I use subversion, but not because I have a team that I need to correspond with and control access to code, it’s just because I want to keep a backup of any past changes I’ve made to rollback to past versions. I actually just started using it a couple months ago and haven’t really had a need for it until now that I’ve been taking on a couple really large projects. I don’t use subversion for 90% of the projects I work on. The reason is that as a freelancer, you are in charge of everything, and you don’t need to have a full blown sophisticated bug tracking system like trac or bugzilla. Even code control with subversion is totally unnecessary for a lot of my projects.

I think that it’s hard to compare a project at NASA to a project that a typical freelancer works on. I’ve worked on mission-critical intranets that run some of my client’s businesses. Things like inventory, invoicing, employee calendars, client directories and CRM systems that my clients’ rely on to run their business, and they’re nowhere near the complexity of a project NASA would be taking on. To compare their methodologies to those of a freelancer is a little overkill – I would never consider going through that process for nearly many of my projects.

The projects I take on that would require a separate QA department are projects where I’m a part of a team that includes the QA department, but those projects are not something a freelancer would take on themselves. I’m still a freelancer but I’m also working with a QA department to compete with just as the NASA developers do.

How to Sell Yourself in 60 Seconds

Sales is something that every freelancer has to deal with, unless you launch your career with a range of stable, consistent client work then you’re going to have to go out and find it yourself. There are many ways to find work, but going out to business socials or mixers, and other networking events is the best way to actually meet new business people, and to find new work. For some beginner or inexperienced freelancers who aren’t used to networking it can be really intimidating to go out to some of these events. These tips will help you become more comfortable and even help you find some business.

Just like when you have someone come to your web site, your first impression is very important. So you need to be ready when someone says to you, “so tell me what you do”. The best way to hurt your professionalism is to answer them by stumbling through your work using words like HTML, CSS, and other acronyms or being too simple and just saying “I build web sites.” That person just gave you an opportunity to sell yourself to them, whether they need a web site or not, they probably know someone that needs one, and you at least want them to tell others about you should they hear about someone else needing a web site. This leads to more word-of-mouth referrals – the most powerful referral you’ll ever receive as a business person.

Anatomy of an Elevator Pitch or 60 Second Infomercial

Most importantly, you need to keep your infomercial short, simple, and to the point, but interesting enough to ensure that you keep their attention, and remember you. Remember, the person you’re talking to is going to talk to a lot of people that night, and collect a lot of business cards.

Part One - the Introduction

The first step is to actually introduce yourself. Give your name, job title, and the name of your business. For example, “Hello I’m Jarrod Goddard, President of Net Shift Media.” That’s it.

Part Two - Services Overview

Next you want one line that describes what you do, you can then quickly explain them in a sentence or two. Don’t get too technical or descriptive because you’ll lose them, but try come up with something to hook them and pique their interest in what you do. “We’re an Internet marketing and development company but we don’t just build web sites, we help our clients plan their web strategy and create a web presence to sell their services or products online.”

Part Three - The Story

Follow that up with a story, and drop a couple names of some of your big clients. For example, “One client of ours, ABC Company, sells tens of thousands of dollars a month in widgets directly through their web site – we handle the design and implementation of their site as well as their online marketing and search engine optimization to provide them with a complete web strategy. We do the same for XYZ Corp, and National Widgets Ltd.” Depending on who you’re talking to and what sort of event you’re at, swap in and out some of your bigger clients. For example, when I’m at a local networking event, I drop the names of the local agencies I work with such as our local chamber of commerce, economic development agency, tourism marketing agency, and a few larger corporations. This shows that you have some big clients, but also a lot of local clients - which is important to local businesses.

Part Four - Ask for Business

This can be the most intimidating part of your infomercial, but let’s be honest, every one is there to make contacts and get new business. You don’t want to be too direct and ask them for business and say “do you need a web site?” but talk about the type of clients you want to work with and say something like “a great client for us is a large product manufacturing company that can ship their product worldwide, we can help them not only reach a much larger audience on the web but help them find new opportunities to sell their product.” That tells the person what type of client you are looking for, and also shows that even though a company may already have an existing web site and sell product online, you can still assist them in finding new customers through the web. The key to remember here is that as soon as you mention the type of client (large product manufacturing company) the person you talk will automaticall start thinking of anyone they know who fits that description, if you’re keeping them interested, they just may give your their name and contact information on the spot.

Part Five - Your Hook or Tag Line

Finally, you don’t want to drag on about yourself, so end your pitch with your memory hook or tag line for your business. Don’t just keep it plain like “we’re a full service web company”, but make it humorous, or witty so the person remembers it. In my case I always say “I make really cool stuff.” Probably because of my youth a lot of people get a kick out of it just because I use the word “cool”, but I’m just being honest – “I make really cool stuff, check out my web site to see!”

The Combined Pitch

Here is the script, customize it to suit your business and your style.

Hello, I’m Jarrod Goddard, President of Net Shift Media. We’re an Internet marketing and development company but we don’t just build web sites, we help our clients plan their web strategy and create a web presence to sell their services or products online. One client of ours, ABC Company, sells tens of thousands of dollars a month in widgets directly through their web site – we handle the design and implementation of their site as well as their online marketing and search engine optimization to provide them with a complete web strategy. We do the same for XYZ Corp, and National Widgets Ltd. A great client for us is a large product manufacturing company that can ship their product worldwide, we can help them not only reach a much larger audience on the web but help them find new opportunities to sell their product. We make some really cool web sites.

Say that out loud, it’s probably only about 30 seconds, but you covered every aspect of your business and explained exactly what type of client you’re looking for and gave some examples to back up your claims. If the person you are talking to even knows one of the clients you worked with they’ll probably say so and you can talk about how great they are to work with to keep the conversation going further and lead into other areas.

More Tips

Don’t be too self-involved. One big piece of advice is to always ask about what the other person does first. You don’t want to come off as arrogant and always wanting to talk about yourself. When you do introduce yourself ask “so tell me about what you do?” and ask them about their business. Most likely they’ll ask you the same once they introduce themselves and their business and then you can launch into your own business.

Bring a friend. When I first started out I had a couple other friends who had businesses and I always went to these events with at least one of them. You don’t want to go with a bunch of friends because you won’t talk to anyone else as much, but you can then work together and introduce each other to your contacts that you recognize there and even talk to a couple people at once making it a little more casual – you still want to socialize and have fun.

Always end with your business card. Finally, bring lots of business cards with you and give a card to each person you talk to. That way they have something they can pass on to someone else they might know, or keep it for themselves. Sometimes if the conversation went well enough they’ll ask for a couple cards.

It gets easier. A lot of the same people go to local networking events, so you’ll find that as you go to them month after month, you’ll start to get to know a lot of the people there. It will make it easier to talk and make conversation, and it can lead to those people introducing you to even more people.

Follow up. Always get the business card of the people you talk to so you can send them a quick e-mail saying it was nice to meet them – if they gave you a hard lead then be sure to phone them if you need more information, or just to say thanks.

There is a ton of information on the web about business networking, just google “business networking tips” and you’ll find a lot, or you can check out Freelance Folder.