Sunday, June 19, 2016

Drive Website Traffic with Google+

   


Google+ is not as popular as Facebook and Twitter, but it’s definitely something that you can use to boost your traffic.
Here’s what you can do to use Google+ to drive traffic to your blog:
Create an Impressive Profile. The idea here is to present yourself as someone your potential clients can easily relate to. List your hobbies, your professional experience, your areas of expertise, the things that you do, etc. Make sure you create an impressive profile before you start mingling with other people so you can attract more attention.
Connect with People. Once you’ve created your Google+ profile/page, you need to start connecting with other Google+ members. Search for people you know first. Then look through the people they follow to find more interesting people to follow. As you spend time on blogs and other social web destinations, look for links to connect with interesting people on Google+.
Build your Reputation. Traffic will follow if your prospects consider you an expert in your niche. Always remind yourself that your primary goal here is not to make an outright sale, so avoid blatant ads. What you want to achieve first is to get your prospects to click the link that will take them to your articles.
Publish Interesting Content. Your Google+ stream of updates should be interesting and meaningful to your audience. DO NOT spam. Instead, publish and share interesting updates and links that your target audience will find value in and appreciate seeing.
Integrate and Cross-Promote. Give people as many ways to find your blog and interact with you as possible. Use social media icons, Facebook social plugins, Twitter widgets, and so on to integrate your blog marketing efforts and cross-promote your content.
Focus your Time on being known as an Authority in your Field. Join discussions of industry leaders and talk about the most recent issues in your niche. Offer your opinions and ensure that your prospects will be amazed after reading them.
Be Active. Update your Google+ stream frequently. Make sure you spend time commenting on other members’ updates as well. Talk about their most pressing issues in detail and provide the best solutions. Provide answers to questions, offer how-to guides, share trade secrets, and offer expert tips and advice. The more active you are, the more likely people are to remember who you are, recognize what your blog is about, and click through to read your posts.

Secure Your Accounts Against Cybercrime

   

The Internet enables us to do things we'd never have dreamt about even a decade ago. We can buy and sell, learn and teach, meet new people and share experiences without ever leaving the comfort of our homes. But every coin has a flipside, and new dangers stalk us while we're peacefully buying, selling, meeting and sharing.
You've seen the headlines - data breaches at LinkedIn, celebrity Twitter accounts hacked. Even Facebook CEO Mark Zuckerberg is not immune to such attacks.
Thankfully, there are some simple precautionary measures you can take to help minimize the risks. Here are some tips.

1. Awareness

Knowledge is the best way to protect your digital identity. By knowing the dangers of the cyber world, you'll also get a better understanding of how to avoid them. Always find a couple of minutes a day to read up on the latest security concerns and remain aware of potentially dangerous situations and vulnerabilities.

2. Anti-virus/spy software

These have long been a must-have for every web surfer. Viruses have evolved beyond simply damaging systems, erasing data and similar outward mischief. They can now be used as intricate tools for spying and gathering security data, such as logins, passwords, keys, phone numbers, e-mails, etc.
Install anti-virus software, USB protection and adware blockers so that your device won't betray your interests. And at the same time, don't let such measures lull you into a false sense of security: run regular checks, and don't trust suspicious sites, even if your antivirus doesn't alert you. Scam masterminds are always working to stay a step ahead of the wardens.

3. VPN-services

If you use public Wi-Fi hotspots on a regular basis, a VPN becomes absolutely indispensable.
A VPN is the minimum protection measure you should use in order to secure your social network and e-mail accounts. It creates a secure encrypted connection between your device and the VPN's server.
A VPN, however, is not a silver bullet.
While a VPN does make it harder for hackers and other cybercriminals to spy on you, it’s always better safe than sorry, and you should always avoid sharing potentially sensitive information while using public hotspots.

4. Password generators

Are you proud of your long and sophisticated password that, for years, has been reliable and safe for your multiple online accounts? Don't get too complacent.
In order to stay secure you should change your passwords regularly – and you should avoid using the same one across multiple accounts.
The older your password is, the weaker it becomes each day. If you've run out of ideas or you simply can't memorize new security keys, use a password generator.
These online services will take care of inventing and keeping safe a number of difficult to crack passwords for you.
If you still prefer to come up with your new passwords yourself, make sure they include both capital and lower-case letters, as well as numbers.
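For readers who would rather generate a password locally than rely on an online service, here is a minimal Python sketch. The function name `generate_password` and the default length of 16 are illustrative assumptions; the only rule enforced is the one mentioned above (capital letters, lower-case letters, and numbers).

```python
import secrets
import string


def generate_password(length=16):
    """Return a random password with at least one lower-case letter,
    one capital letter, and one digit (hypothetical helper)."""
    if length < 3:
        raise ValueError("length must be at least 3 to fit all character classes")
    alphabet = string.ascii_letters + string.digits
    while True:
        # secrets.choice draws from a cryptographically secure random source
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        if (any(c.islower() for c in password)
                and any(c.isupper() for c in password)
                and any(c.isdigit() for c in password)):
            return password


if __name__ == "__main__":
    print(generate_password())
```

The sketch uses the `secrets` module rather than `random` because `secrets` is intended for security-sensitive values.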

5. 2-step authentication

This is one of the easiest and most efficient ways to protect your accounts from being hacked. 
Double-secured accounts are entered through both a password and a verification code, normally sent directly to a user’s phone. For some services (for example, online banking and payment systems) such measures are mandatory, but for most social media platforms they're optional. However, it's strongly recommended that you utilize such features for better security.
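Some services let an authenticator app generate the verification code instead of sending it to your phone. One widely used scheme for this is the time-based one-time password (TOTP, RFC 6238), which builds on HOTP (RFC 4226). The sketch below is only an illustration of how such a code can be derived from a shared secret and the current time. It is an assumption that a given service works this way, and the function names `hotp` and `totp` are chosen for this example.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a shared secret and counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, at=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over 30-second time steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


if __name__ == "__main__":
    # Example secret taken from the RFC test vectors, not a real credential
    print(totp(b"12345678901234567890"))
```

Because the server and the phone share the secret and the clock, both can compute the same short-lived code without sending it over the network.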

6. Restriction and blocking apps

This may come as a surprise, but most information breaches occur because a third party, entrusted with the information, leaks it unwittingly. “Third party” meaning your family, friends or even employees.
To be on the safe side, you should use the advanced control settings on your devices and limit the sharing of such info to the fewest people possible.

7. Privacy settings

Don't neglect the privacy settings on your social media accounts, applications and other resources which you share information through.
Sometimes the default settings are as far from private as can be and require immediate tweaking to ensure your security isn't compromised.
Make sure you share only what you mean to share, and only with those you intend to share it with.
You may also want to avoid oversharing in order to maintain a level of anonymity online. Think twice before sharing another picture or clicking “Like” – cyber criminals pick up any shred of information they can get to learn about your tastes and preferences: this makes it easier for them to guess your passwords, secret code words and other information that is supposed to be private.

8. A touch of paranoia

Be suspicious - apply critical thinking and don't trust easily.
You may be sociable, amicable and open-hearted, yet experts advise adding contacts to your friend list only if you know them personally.
Being suspicious will help you avoid the bait set by phishing e-mails and sites, and make you pause before clicking on links from your trusted accounts that prompt you to “Check THIS out!” Double check: make sure the messages really are from one of your connections, not a foe in disguise who has breached their account.
Also, think twice before sharing sensitive data through unsecured connections – you never know who else might be watching.

Saturday, June 18, 2016

Best Social Media Sites

   


The concepts behind “social networking” aren’t anything new – ever since there have been humans, we have been looking for ways to connect, network, and promote ourselves to one another – but they’ve taken on an entirely new meaning (and momentum) in the digital age. Where we used to have handshakes, word-of-mouth referrals, and stamped letters, today’s relationships are often begun and developed on LinkedIn, Google+, and Facebook.

1. Facebook – Facebook is still the king of social networking sites with more than 1.5 billion users.

2. Twitter – The micro-blogging platform for easy communication. You can communicate with celebrities too with the help of Twitter.

3. Google Plus – Google’s social networking site, useful for developing your business.

4. Pinterest – You might be surprised how popular Pinterest is. Why is Pinterest successful when people are already sharing images on Facebook or Flickr? Be a Pinterest addict.

5. LinkedIn – World’s largest professional network, where millions engage with one another.

6. Solaborate – Personally one of my favorite professional networks. Solaborate lets you collaborate with professionals.

7. YouTube – YouTube is not just a video sharing site, but also a great social media site to connect with others.

8. Flickr.com – The Facebook of photo sharing. Millions of users connecting together.

9. Instagram.com – World’s biggest photo and quick video sharing platform.

10. SoundCloud.com – World’s largest online audio distribution platform.

11. Tumblr.com – Micro-blogging and social networking platform owned by Yahoo. It hosts over 150 million blogs.

12. Keek.com – Share quick videos up to 35 seconds long with your friends.

13. Medium.com – A collaborative publishing tool launched by Twitter founders Evan Williams and Biz Stone.

14. Fancy.com – Discover and collect amazing stuff you love.

15. Foodily.com – Find the tastiest foods on the web and share with your friends.

16. GoodReads.com – Share book recommendations with your friends.

17. Quora.com – Quora is not a social network, but a question and answer site that can help you make good connections.

18. Vine.co – The Twitter-owned social platform helps you create & share beautiful looping videos.

19. Snapchat – Snapchat is an addictive video messaging app that lets you connect with a lot of friends.

20. Path.com – Social networking-enabled photo sharing and messaging service for mobile devices, launched in 2010.

Most Useful Photoshop Actions for Designers, Photographers & Digital Artists

   


For a graphic designer, photographer or digital artist, Photoshop actions can help automate hard work, such as applying a series of repetitive changes to a group of images. Here is a list of 26 most-wanted Photoshop actions that will give your images or designs stunning effects.

Animated Glitch

 Animated Glitch - Photoshop Action

This action will generate a video sequence of animated glitch effects from your photos, logo or artwork. The effects are packed in clips and are easy to edit in the Photoshop Timeline. In a few clicks you’ll be able to make many variations and then export the file as a normal image, animated GIF or video!

Retro Painting Machine

 Retro Painting Machine - Vintage Effect Action

This action gives you a quick and easy way to apply a high-quality retro / vintage / old-style painting effect to your photos and images.

Double Exposure

 Double Exposure Photoshop Action

Double exposure is a photographic technique that combines two images into one. With this action, you can create better double exposure in half the time.

Neon Maker Action Set

 Neon Maker Action Set

This set of Photoshop actions will help you create high-quality, high-resolution realistic neon effects. The action works well with any kind of object, such as vector shapes, text and art layers.

Sketch

 Sketch Photoshop Action

Simplify your life: do not spend hours trying to create effects like this manually when you can get it done in minutes with only a few clicks. Get this action and it will turn your photo into a sketch.

Square Potraits

 Square Potraits Photoshop Action

This action was made carefully using a multi-language method, and it has been tested on the English and French versions of Photoshop.

Retro Vintage Text Effects Vol. 2

 Retro Vintage Text Effects Vol. 2
You can use it on simple text, shapes and vector logos. You just need to place them into the smart object of your favourite included style. Create a great poster or flyer, a Facebook cover, a magazine title or a website banner and give them the vintage touch.

Clean Sketch – Photoshop Action


 Clean Sketch - Photoshop Action

This action is designed to give your photos a sketch look.

Materials Type Effects

 Materials Type Effects

Materials Type Effects is a Photoshop Action Set for styling Type or Custom Lettering.

Neon Styles Bundle

 Neon Styles Bundle

These Neon signs are very fun and easy to customize. Simply edit with your own text, or if you wish to create your own template I have included the ASL files.

Typo Portrait Pro Photoshop Action

 Typo Portrait Pro Photoshop Action
Typo Portrait Pro is a Photoshop action that converts your simple image into a typographic masterpiece. Results are very professional with minimal user interaction.

25 HDR Photo FX V.1 – Photoshop Action

 25 HDR Photo FX V.1 - Photoshop Action
Easily create true HDR or DRI photos from just one image or multiple exposures with these HDR Pro Photoshop Actions.

Pro 3D Text Mockups V1

 Pro 3D Text Mockups V1
3D text PSD and action files to help you create high quality, elegant 3D text easily and quickly.

Vintage Retro Text Effects

 Vintage Retro Text Effects Col 8
Create distinctive vintage typography with this retro text effect action.

Leather Badge Generator – Photoshop Actions

 Leather Badge Generator - Photoshop Actions
Generate realistic leather elements from your text, logo, shape or design in a few clicks.

Instagram Filter – Photoshop Action

 Instagram Filter - Photoshop Action
This action is designed to give your photos an Instagram filter look.

Christmas Felt Maker

 Christmas Felt Maker
This set of Photoshop actions will help you create high-quality, realistic stitched felt effects with good performance. The action works well with any kind of object, such as vector shapes, text and art layers.

Smoke Action

 Smoke Action
Photoshop actions that will allow you to create realistic “smoke photo effects”.

RainStorm Photoshop Action

 RainStorm Photoshop Action
Photoshop actions that will make your images look as if they are seen through a wet window while it rains. Make it rain in your photos without getting your camera wet!

Creative Retouch

 Creative Retouch
Make your photos look perfect with Creative Retouch photoshop action!

Minimal Town Maker

 Minimal Town Maker
This set of Photoshop actions will help you to easily create a map from any shape. The final result is a surface consisting of block squares, no matter what the edges of the original shape were. Also, the final result takes the shape contour. Once you’ve generated the surface, you can add any vector shape items that are included in the kit.

Low Poly Generator Photoshop Actions

 Low Poly Generator Photoshop Actions
Photoshop actions that will allow you to create geometric low poly effects out of any image. The result contains easy-to-edit layers with layer styles intact so that you can customize the effect.

Cartoon Text Effects

 Cartoon Text Effects
Very easy to use. Replace the text in seconds via smart object layers. Works with text, vector shapes or logos.

Electrum PS Action

 Electrum PS Action
An incredible pro Photoshop action with an electric lighting effect. Easy to change the color, contrast and gradient of the lighting.

Cinematic 3D Movie Mockups

 Cinematic 3D Movie Mockups
This 3D mockup makes it easy to create a realistic 3D effect. It is easy to change the colors of the backgrounds.

Vibrant Watercolor Effect – Photoshop Action

 Vibrant Watercolor Effect - Photoshop Action
Vibrant Watercolor Effect is a Photoshop plug-in that converts your photo into a realistic watercolor painting. Just run the action and watch your drawing come to life!

Friday, June 17, 2016

Useful Websites on the Internet

   


When we surf the internet, we can see millions of websites, blogs, and gaming sites. Websites are categorized as personal resource, travel, education, general internet, etc. The internet is an amazing resource, and it offers many useful websites. Here is a list of 50 unique websites on the internet that are useful for everyone.

1. Alexa.com – The web information company provides commercial web traffic data for everyone.

2. SpeedTest.net – Easy way to test your broadband speed.

3. Iconfinder – Free Icon search engine.

4. Archive.org – The Wayback Machine lets you see how a website looked in the past.

5. Goo.gl – Url shortener from Google. Allows you to track, in real-time, the clicks and referrers. You can see your existing links and avoid duplication.

6. StumbleUpon – Website discovery engine. Collection of best pages on the Internet.

7. About.me – A complete professional page about you.

8. Imgur.com – World's greatest image hosting and sharing service.

9. Askboth.com – One search and get results from Google, Bing and Twitter

10. Wolframalpha.com – Computation knowledge engine.

11. Evernote – Save your notes for a lifetime.

12. ResizeYourImage.com – Resize your image; it's free and easy.

13. GoogleWebDesigner – A free and easy tool to create animated, 3D HTML5 ads in minutes.

14. StatsCrop.com – Free Website Analyzer

15. RankSignals.com – Free backlink checker that can easily categorize nofollow, dofollow, and hot links for your website.

16. WeTransfer.com – Share big files securely for free. You can send up to 2GB at a time.

17. PrivNote.com – Send notes that will self-destruct after being read.

18. Xmarks.com – A powerful tool to Bookmark, Sync and Search.

19. MyFonts.com – Fonts for print, products and screens. Determine the font name from an image.

20. Chipin.com – Easy way to collect money for events etc.

21. GTMetrix.com – Check your website's speed.

22. Sleepyti.me – The bedtime calculator.

23. Snapito.com – Take full length website screenshots.

24. WordCounterTool.com – Accurate word counter that you can also use to test your typing speed.

25. SmallSEOTools.com – Perfect plagiarism checker.

26. OnlineConversion.com – Convert anything to anything.

27. Plaxo.com – Plaxo helps you organize, manage, and access your contacts in one place.

28. MiiCard.com – Your online internet identity.

29. TwitterFeed.com – Create feeds and connect with your Twitter account, Facebook profile or pages and LinkedIn profile.

30. ManageFlitter.com – Advanced Twitter profile management tool.

31. Paper.li – News curation platform. Become a news publisher with paper.li and Twitter.

32. Ustream.tv – World's best and easiest way to stream live video.

33. Join.me – Free screen sharing with anyone on the web.

34. IMDb.com – Especially for movie lovers. Find out who appeared in which film and what that actor's name is.

35. TypingWeb.com – Learn to type. Free typing tutor and lessons.

36. TwitLonger.com – Send tweets longer than 140 characters.

37. TwitterSpirit.com – Set an expiration date or time to your Twitter tweets.

38. Zamzar.com – Online file conversion site that works for hundreds of formats.

39. Google Translate – Translate text by typing it in or uploading a document.

40. Vocaroo.com – Simple online voice recorder; you can download recordings in different formats.

41. Cutmp3.net – Easily cut MP3 files online.

42. Similarsites.com – Find websites similar to the ones you like.

43. HootSuite.com – Manage multiple social networks from a single dashboard.

44. Quora.com – Source for knowledge. Question and Answer Website.

45. LeanDomainSearch.com – Domain search tool that helps users find related domains easily with one click. Lean Domain Search shows thousands of related domains that are available to register.

46. PayPal – World's fast and secure online money transfer system. Pay and get paid.

47. Who.is – Find information on any domain name or website.

48. PeekYou.com – Free people search engine. Find your friends' other social network profiles by username, first name or last name.

49. Safeweb.norton.com – Is your website safe? Look up a site and get a rating.

50. Unfurlr.com – Find the original link behind the short link.

How does a search engine work?

   


While you should always create website content geared to your customers rather than search engines, it is important to understand how a search engine works. Once you know this, you can move on to the next step, which is incorporating the elements that the search engine is looking for.

How do search engines work?

Most search engines build an index based on crawling, which is the process through which engines like Google, Yahoo and others find new pages to index. Mechanisms known as bots or spiders crawl the Web looking for new pages (1). The bots typically start with a list of website URLs determined from previous crawls. When they detect new links on these pages, through attributes like HREF and SRC, they add these to the list of pages to crawl and index. Search engines then use their algorithms to provide you with a ranked list from their index of the pages you should be most interested in, based on the search terms you used.
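The link-discovery part of crawling can be illustrated with a small Python sketch. It is not how any particular search engine's bot is implemented; the class name `LinkExtractor` and the helper `extract_links` are assumptions made for this example, and it only parses HTML that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs found in href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the URLs a crawler could add to its to-visit list (hypothetical helper)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<a href="b.html">next</a><img src="/img/c.png">'
    print(extract_links("http://example.com/a/", page))
```

A real crawler would also fetch each discovered URL, avoid revisiting pages, and respect each site's crawling rules.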

The engine returns this list of Web results ranked using its specific algorithm. On Google, other elements like personalized and universal results may also change your page ranking. In personalized results, the search engine utilizes additional information it knows about the user to return results that are directly catered to their interests. Universal search results combine video, images and Google News to create a bigger picture result, which can mean greater competition from other websites for the same keywords.

Here are the top elements to edit when designing your store for SEO:

Architecture - Make websites that search engines can crawl easily. This includes several elements, like how the content is organized and categorized and how individual websites link to one another. An XML sitemap can allow you to give a list of URLs to search engines for crawling and indexing. (2) 

Content - Great content is one of the most important elements for SEO because it tells search engines that your website is relevant. This goes beyond just keywords to writing engaging content your customers will be interested in on a frequent basis.

Links - When a lot of people link to a certain site, that alerts search engines that this particular website is an authority, which makes its rank increase. This includes links from social media sources. When your site links to other reputable platforms, search engines are more likely to rate your content as quality also. 

Keywords - The keywords you use are one of the primary methods search engines use to rank you. Using carefully selected keywords can help the right customers find you. If you run a jewelry store but never mention the word "jewelry," "necklace," or "bracelet," Google's algorithm may not consider you an expert on the topic.

Title descriptions - While it may not show up on the page itself, search engines do pay attention to the title tag in your site's HTML code, the words between <title> and </title>, because it likely describes what the website is about, like the title of a book or a newspaper headline.

Page content - Don't bury important information inside Flash and media elements like video. Search engines can't see images and video or crawl through content in Flash and Java plugins. 

Internal links - Including internal links helps search engines crawl your website more effectively, but also boosts what many SEO professionals refer to as "link juice." In other words, it has the same benefit as any link to your site: it demonstrates the value of your content.
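As an illustration of the XML sitemap mentioned under Architecture, here is a minimal Python sketch that turns a list of URLs into sitemap markup. The function name `build_sitemap` and the example store URLs are assumptions made for this example. The namespace is the one defined by the sitemaps.org protocol, and optional fields such as last-modified dates are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML listing each URL in a <url><loc> entry (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical store pages used only for illustration
    print(build_sitemap(["https://example.com/", "https://example.com/necklaces"]))
```

The resulting file is typically saved as sitemap.xml at the site root and submitted to the search engine.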

Search engine is the popular term for an information retrieval (IR) system. While researchers and developers take a broader view of IR systems, consumers think of them more in terms of what they want the systems to do — namely search the Web, or an intranet, or a database. Actually consumers would really prefer a finding engine, rather than a search engine. 
 
Search engines match queries against an index that they create. The index consists of the words in each document, plus pointers to their locations within the documents. This is called an inverted file. A search engine or IR system comprises four essential modules:
  • A document processor
  • A query processor
  • A search and matching function
  • A ranking capability
While users focus on "search," the search and matching function is only one of the four modules. Each of these four modules may cause the expected or unexpected results that consumers get when they use a search engine.
 
Document Processor
The document processor prepares, processes, and inputs the documents, pages, or sites that users search against. The document processor performs some or all of the following steps:

  1. Normalizes the document stream to a predefined format.
  2. Breaks the document stream into desired retrievable units.
  3. Isolates and metatags subdocument pieces.
  4. Identifies potential indexable elements in documents.
  5. Deletes stop words.
  6. Stems terms.
  7. Extracts index entries.
  8. Computes weights.
  9. Creates and updates the main inverted file against which the search engine searches in order to match queries to documents.

Steps 1-3: Preprocessing. While essential and potentially important in affecting the outcome of a search, these first three steps simply standardize the multiple formats encountered when deriving documents from various providers or handling various Web sites.

These steps merge all the data into a single consistent data structure that all the downstream processes can handle. The more sophisticated the later steps of document processing, the more a well-formed, consistent format matters. Step two is important because the pointers stored in the inverted file will enable a system to retrieve units of various sizes: site, page, document, section, paragraph, or sentence.
 
Step 4: Identify elements to index. Identifying potential indexable elements in documents dramatically affects the nature and quality of the document representation that the engine will search against. In designing the system, we must define the word "term." Is it the alpha-numeric characters between blank spaces or punctuation? If so, what about non-compositional phrases (phrases in which the separate words do not convey the meaning of the phrase, like "skunk works" or "hot dog"), multi-word proper names, or inter-word symbols such as hyphens or apostrophes that can denote the difference between "small business men" and "small-business men"? Each search engine depends on a set of rules that its document processor must execute to determine what action is to be taken by the "tokenizer," i.e. the software used to define a term suitable for indexing.
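To show how much the definition of a "term" matters, here is a small Python sketch with two tokenizers. Both are illustrative assumptions, not the rules of any real search engine: the first treats every run of letters and digits as a term, and the second keeps hyphens and apostrophes that sit inside a word.

```python
import re


def tokenize_simple(text):
    """Split on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def tokenize_keep_inner_marks(text):
    """Keep hyphens and apostrophes that join alphanumeric runs."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)


if __name__ == "__main__":
    phrase = "small-business men"
    print(tokenize_simple(phrase))            # ['small', 'business', 'men']
    print(tokenize_keep_inner_marks(phrase))  # ['small-business', 'men']
```

With the first rule, "small-business men" and "small business men" produce identical index terms. With the second, they stay distinct.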
 
Step 5: Deleting stop words. This step helps save system resources by eliminating from further processing, as well as potential matching, those terms that have little value in finding useful documents in response to a customer's query. This step used to matter much more than it does now that memory has become so much cheaper and systems so much faster, but since stop words may comprise up to 40 percent of text words in a document, it still has some significance.
A stop word list typically consists of those word classes known to convey little substantive meaning, such as articles (a, the), conjunctions (and, but), interjections (oh, but), prepositions (in, over), pronouns (he, it), and forms of the "to be" verb (is, are). To delete stop words, an algorithm compares index term candidates in the documents against a stop word list and eliminates certain terms from inclusion in the index for searching. 
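The comparison against a stop word list can be sketched in a few lines of Python. The list below contains only the example words given above and is far shorter than a real list; the function name `remove_stop_words` is an assumption made for this example.

```python
# Tiny illustrative stop word list built from the examples in the text
STOP_WORDS = {"a", "the", "and", "but", "oh", "in", "over", "he", "it", "is", "are"}


def remove_stop_words(tokens):
    """Drop index term candidates that appear on the stop word list."""
    return [token for token in tokens if token.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print(remove_stop_words(["The", "talks", "are", "in", "Kosovo"]))
    # ['talks', 'Kosovo']
```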
 
Step 6: Term Stemming. Stemming removes word suffixes, perhaps recursively in layer after layer of processing. The process has two goals. In terms of efficiency, stemming reduces the number of unique words in the index, which in turn reduces the storage space required for the index and speeds up the search process. In terms of effectiveness, stemming improves recall by reducing all forms of the word to a base or stemmed form.
For example, if a user asks for analyze, they may also want documents which contain analysis, analyzing, analyzer, analyzes, and analyzed. Therefore, the document processor stems document terms to analy- so that documents which include various forms of analy- will have equal likelihood of being retrieved; this would not occur if the engine only indexed variant forms separately and required the user to enter them all. Of course, stemming does have a downside. It may negatively affect precision in that all forms of a stem will match, when, in fact, a successful query for the user would have come from matching only the word form actually used in the query.
 
Systems may implement either a strong stemming algorithm or a weak stemming algorithm. A strong stemming algorithm will strip off both inflectional suffixes (-s, -es, -ed) and derivational suffixes (-able, -aciousness, -ability), while a weak stemming algorithm will strip off only the inflectional suffixes (-s, -es, -ed). 
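The difference between weak and strong stemming can be sketched with a toy suffix stripper in Python. This is a deliberately naive illustration that uses only the suffixes listed above. It is not the Porter stemmer or any production algorithm, and the function names and the minimum stem length of three letters are assumptions.

```python
INFLECTIONAL = ("es", "ed", "s")
DERIVATIONAL = ("ability", "aciousness", "able")


def strip_suffix(word, suffixes, min_stem=3):
    """Remove the longest matching suffix, keeping at least min_stem letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


def weak_stem(word):
    """Strip inflectional suffixes only."""
    return strip_suffix(word.lower(), INFLECTIONAL)


def strong_stem(word):
    """Strip inflectional suffixes, then derivational ones."""
    return strip_suffix(weak_stem(word), DERIVATIONAL)


if __name__ == "__main__":
    for word in ("analyzes", "analyzed", "readable", "readability"):
        print(word, weak_stem(word), strong_stem(word))
```

Here the weak stemmer leaves "readable" untouched, while the strong stemmer reduces both "readable" and "readability" to "read", which shows how a stronger stemmer trades precision for recall.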
 
Step 7: Extract index entries. Having completed steps 1 through 6, the document processor extracts the remaining entries from the original document. For example, the following paragraph shows the full text sent to a search engine for processing:
Milosevic's comments, carried by the official news agency Tanjug, cast doubt over the governments at the talks, which the international community has called to try to prevent an all-out war in the Serbian province. "President Milosevic said it was well known that Serbia and Yugoslavia were firmly committed to resolving problems in Kosovo, which is an integral part of Serbia, peacefully in Serbia with the participation of the representatives of all ethnic communities," Tanjug said. Milosevic was speaking during a meeting with British Foreign Secretary Robin Cook, who delivered an ultimatum to attend negotiations in a week's time on an autonomy proposal for Kosovo with ethnic Albanian leaders from the province. Cook earlier told a conference that Milosevic had agreed to study the proposal.
Steps 1 to 6 reduce this text for searching to the following:
Milosevic comm carri offic new agen Tanjug cast doubt govern talk interna commun call try prevent all-out war Serb province President Milosevic said well known Serbia Yugoslavia firm commit resolv problem Kosovo integr part Serbia peace Serbia particip representa ethnic commun Tanjug said Milosevic speak meeti British Foreign Secretary Robin Cook deliver ultimat attend negoti week time autonomy propos Kosovo ethnic Alban lead province Cook earl told conference Milosevic agree study propos.
The output of step 7 is then inserted and stored in an inverted file that lists the index entries and an indication of their position and frequency of occurrence. The specific nature of the index entries, however, will vary based on the decision in Step 4 concerning what constitutes an "indexable term." More sophisticated document processors will have phrase recognizers, as well as Named Entity recognizers and Categorizers, to ensure index entries such as Milosevic are tagged as a Person and entries such as Yugoslavia and Serbia as Countries.
 
Step 8: Term weight assignment. Weights are assigned to terms in the index file. The simplest of search engines just assign a binary weight: 1 for presence and 0 for absence. The more sophisticated the search engine, the more complex the weighting scheme. Measuring the frequency of occurrence of a term in the document creates more sophisticated weighting, with length-normalization of frequencies still more sophisticated. Extensive experience in information retrieval research over many years has clearly demonstrated that the optimal weighting comes from use of "tf/idf." This algorithm measures the frequency of occurrence of each term within a document. Then it compares that frequency against the frequency of occurrence in the entire database.
Not all terms are good "discriminators"; that is, not all terms single out one document from another very well.
A simple example would be the word "the." This word appears in too many documents to help distinguish one from another. A less obvious example would be the word "antibiotic." In a sports database when we compare each document to the database as a whole, the term "antibiotic" would probably be a good discriminator among documents, and therefore would be assigned a high weight. Conversely, in a database devoted to health or medicine, "antibiotic" would probably be a poor discriminator, since it occurs very often. The TF/IDF weighting scheme assigns higher weights to those terms that really distinguish one document from the others. 
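The weighting idea can be illustrated with a small sketch. It assumes one common tf/idf variant (relative term frequency multiplied by the logarithm of the inverse document frequency); the article does not specify a formula, and the toy "sports database" and the name `tf_idf` are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of non-empty token lists. Weight = tf * log(N / df),
    one common variant among many (an assumption, not the article's formula).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy "sports database": "the" is everywhere, "antibiotic" is rare.
sports_docs = [
    "the team won the match".split(),
    "the striker took an antibiotic before the match".split(),
    "the coach praised the team".split(),
]
weights = tf_idf(sports_docs)
print(weights[1]["the"])         # 0.0, because "the" appears in every document
print(weights[1]["antibiotic"])  # positive, because only one document contains it
```

In this toy collection "the" receives a weight of zero while "antibiotic" receives the highest weight in its document, which mirrors the discriminator argument above.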
 
Step 9: Create index. The index or inverted file is the internal data structure that stores the index information and that will be searched for each query. Inverted files range from a simple listing of every alpha-numeric sequence in a set of documents/pages being indexed along with the overall identifying numbers of the documents in which the sequence occurs, to a more linguistically complex list of entries, the tf/idf weights, and pointers to where inside each document the term occurs. The more complete the information in the index, the better the search results.
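As a rough illustration of such a structure, the sketch below builds a positional inverted file from already-processed tokens. The function name `build_inverted_index` and the two sample documents are assumptions for the example; production indexes add compression, weights, and much more.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}.

    docs maps a document id to its list of already-processed tokens.
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(position)
    return index

# Hypothetical documents, using stems like those in the reduced text above.
docs = {
    1: ["milosevic", "kosovo", "talk"],
    2: ["kosovo", "autonomy", "propos", "kosovo"],
}
index = build_inverted_index(docs)
print(index["kosovo"])  # {1: [1], 2: [0, 3]}
```

The length of each position list gives the term's frequency in that document, so position and frequency are both recoverable from the same entry.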
 

Query Processor
Query processing has seven possible steps, though a system can cut these steps short and proceed to match the query to the inverted file at any of a number of places during the processing. Document processing shares many steps with query processing. More steps and more documents make the process more expensive in terms of computational resources and responsiveness. However, the longer the wait for results, the higher the quality of results. Thus, search system designers must choose what is most important to their users: time or quality. Publicly available search engines usually choose time over very high quality, having too many documents to search against.

The steps in query processing are as follows (with the option to stop processing and start matching indicated as "Matcher"):
  • Tokenize query terms.
  • Recognize query terms vs. special operators.
    ————————> Matcher
  • Delete stop words.
  • Stem words.
  • Create query representation.
    ————————> Matcher
  • Expand query terms.
  • Compute weights.
    ————————> Matcher
Step 1: Tokenizing. As soon as a user inputs a query, the search engine, whether a keyword-based system or a full natural language processing (NLP) system, must tokenize the query stream, i.e., break it down into understandable segments. Usually a token is defined as an alpha-numeric string that occurs between white space and/or punctuation.
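A minimal sketch of this definition, assuming ASCII letters and digits only and lowercasing as an extra normalization choice (the function name `tokenize` is illustrative):

```python
import re

def tokenize(query):
    """Split a query into lowercased alphanumeric tokens."""
    # Any run of letters or digits counts as a token; white space and
    # punctuation act as separators and are discarded.
    return re.findall(r"[A-Za-z0-9]+", query.lower())

print(tokenize("Kosovo peace-talks, 1998?"))
# ['kosovo', 'peace', 'talks', '1998']
```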
 
Step 2: Parsing. Since users may employ special operators in their query, including Boolean, adjacency, or proximity operators, the system needs to parse the query first into query terms and operators. These operators may occur in the form of reserved punctuation (e.g., quotation marks) or reserved terms in specialized format (e.g., AND, OR). In the case of an NLP system, the query processor will recognize the operators implicitly in the language used no matter how the operators might be expressed (e.g., prepositions, conjunctions, ordering).
At this point, a search engine may take the list of query terms and search them against the inverted file. In fact, this is the point at which the majority of publicly available search engines perform the search.
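The parsing step for a keyword-based system might look like the sketch below. It assumes double quotes mark phrases and that uppercase AND, OR, and NOT are the only reserved terms; real engines differ in their operator syntax, and `parse_query` is a hypothetical name.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}  # assumed reserved terms

def parse_query(query):
    """Return a list of (kind, value) pairs: 'phrase', 'operator', or 'term'."""
    parsed = []
    # Match either a double-quoted phrase or a run of non-space characters.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            parsed.append(("phrase", phrase.lower()))
        elif word in OPERATORS:
            parsed.append(("operator", word))
        else:
            parsed.append(("term", word.lower()))
    return parsed

print(parse_query('"Robin Cook" AND Kosovo NOT sports'))
# [('phrase', 'robin cook'), ('operator', 'AND'), ('term', 'kosovo'),
#  ('operator', 'NOT'), ('term', 'sports')]
```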
 
Steps 3 and 4: Stop list and stemming. Some search engines will go further and stop-list and stem the query, similar to the processes described above in the Document Processor section. The stop list might also contain words from commonly occurring querying phrases, such as "I'd like information about." However, since most publicly available search engines encourage very short queries, as evidenced in the size of the query window provided, the engines may drop these two steps.
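The two steps can be sketched together as follows. The stop list and the suffix list are tiny, made-up examples, and the suffix stripper is only a crude stand-in for a real stemmer such as Porter's algorithm.

```python
# Illustrative stop list, including words from "I'd like information about".
STOP_WORDS = {"i", "d", "like", "information", "about", "the", "a", "of", "on"}
SUFFIXES = ("ing", "ies", "es", "s")  # crude, illustrative suffix list

def stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stop_and_stem(tokens):
    """Drop stop words, then stem whatever remains."""
    return [stem(t) for t in tokens if t not in STOP_WORDS]

tokens = ["i", "d", "like", "information", "about", "antibiotics"]
print(stop_and_stem(tokens))  # ['antibiotic']
```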
 
Step 5: Creating the query. How each particular search engine creates a query representation depends on how the system does its matching. If a statistically based matcher is used, then the query must match the statistical representations of the documents in the system. Good statistical queries should contain many synonyms and other terms in order to create a full representation. If a Boolean matcher is utilized, then the system must create logical sets of the terms connected by AND, OR, or NOT. 
 
An NLP system will recognize single terms, phrases, and Named Entities. If it uses any Boolean logic, it will also recognize the logical operators from Step 2 and create a representation containing logical sets of the terms to be AND'd, OR'd, or NOT'd. 
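For the Boolean case, the "logical sets" map naturally onto set operations. The sketch below assumes a hypothetical postings table that maps each term to the set of documents containing it.

```python
# Hypothetical postings: term -> set of ids of documents containing it.
postings = {
    "kosovo": {1, 2, 3},
    "autonomy": {2, 3},
    "sports": {3, 4},
}
all_docs = {1, 2, 3, 4}

def docs_for(term):
    """Return the set of documents containing term (empty if unknown)."""
    return postings.get(term, set())

# kosovo AND autonomy: intersection
and_result = docs_for("kosovo") & docs_for("autonomy")
# kosovo OR sports: union
or_result = docs_for("kosovo") | docs_for("sports")
# kosovo AND NOT sports: intersection with the complement
not_result = docs_for("kosovo") & (all_docs - docs_for("sports"))

print(sorted(and_result))  # [2, 3]
print(sorted(or_result))   # [1, 2, 3, 4]
print(sorted(not_result))  # [1, 2]
```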
 
At this point, a search engine may take the query representation and perform the search against the inverted file. More advanced search engines may take two further steps. 
 
Step 6: Query expansion. Since users of search engines usually include only a single statement of their information needs in a query, it becomes highly probable that the information they need may be expressed using synonyms, rather than the exact query terms, in the documents which the search engine searches against. Therefore, more sophisticated systems may expand the query into all possible synonymous terms and perhaps even broader and narrower terms. 
 
This process approaches what search intermediaries did for end users in the earlier days of commercial search systems. Back then, intermediaries might have used the same controlled vocabulary or thesaurus used by the indexers who assigned subject descriptors to documents. Today, resources such as WordNet are generally available, or specialized expansion facilities may take the initial query and enlarge it by adding associated vocabulary. 
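A simple form of expansion can be sketched with a lookup table. The thesaurus below is a made-up two-entry dictionary; a real system might draw on WordNet or a controlled vocabulary instead.

```python
# Hypothetical thesaurus used only for this illustration.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}

def expand_query(terms):
    """Return the original terms plus any listed synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + THESAURUS.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query(["car", "repair"]))
# ['car', 'automobile', 'vehicle', 'repair']
```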
 
Step 7: Query term weighting (assuming more than one query term). The final step in query processing involves computing weights for the terms in the query. Sometimes the user controls this step by indicating either how much to weight each term or simply which term or concept in the query matters most and must appear in each retrieved document to ensure relevance. 
 
Leaving the weighting up to the user is not common, because research has shown that users are not particularly good at determining the relative importance of terms in their queries. They can't make this determination for several reasons. First, they don't know what else exists in the database, and document terms are weighted by being compared to the database as a whole. Second, most users seek information about an unfamiliar subject, so they may not know the correct terminology. 
 
Few search engines implement system-based query weighting, but some do an implicit weighting by treating the first term(s) in a query as having higher significance. The engines use this information to provide a list of documents/pages to the user.
After this final step, the expanded, weighted query is searched against the inverted file of documents.
 

Search and Matching Function
How systems carry out their search and matching functions differs according to which theoretical model of information retrieval underlies the system's design philosophy. Since making the distinctions between these models goes far beyond the goals of this article, we will only make some broad generalizations in the following description of the search and matching function. Those interested in further detail should turn to R. Baeza-Yates and B. Ribeiro-Neto's excellent textbook on IR (Modern Information Retrieval, Addison-Wesley, 1999). 

 
Searching the inverted file for documents meeting the query requirements, referred to simply as "matching," is typically a standard binary search, no matter whether the search ends after the first two, five, or all seven steps of query processing. The computational processing required for simple, unweighted, non-Boolean query matching is far simpler than when the model is an NLP-based query within a weighted, Boolean model. It also follows, however, that the simpler the document representation, the query representation, and the matching algorithm, the less relevant the results, except for very simple queries, such as one-word, non-ambiguous queries seeking the most generally known information.
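A binary search over a sorted term dictionary might look like the following sketch. The parallel `terms` and `postings` lists are hypothetical; Python's standard `bisect` module supplies the search itself.

```python
import bisect

# Hypothetical sorted term dictionary with parallel postings lists.
terms = ["autonomy", "kosovo", "milosevic", "serbia", "talk"]
postings = [[2, 3], [1, 2, 3], [1], [1, 3], [2]]

def lookup(term):
    """Binary-search the sorted term list; return its postings or []."""
    i = bisect.bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[i]
    return []

print(lookup("kosovo"))  # [1, 2, 3]
print(lookup("sports"))  # []
```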
 
Having determined which subset of documents or pages matches the query requirements to some degree, the system computes a similarity score between the query and each document/page based on its scoring algorithm. Scoring algorithms base their rankings on the presence/absence of query term(s), term frequency, tf/idf, Boolean logic fulfillment, or query term weights. Some search engines use scoring algorithms based not on document contents but on relations among documents or the past retrieval history of documents/pages.
 
After computing the similarity of each document in the subset of documents, the system presents an ordered list to the user. The sophistication of the ordering of the documents again depends on the model the system uses, as well as the richness of the document and query weighting mechanisms. For example, search engines that only require the presence of any alpha-numeric string from the query occurring anywhere, in any order, in a document would produce a very different ranking than one by a search engine that performed linguistically correct phrasing for both document and query representation and that utilized the proven tf/idf weighting scheme. 
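One widely used similarity measure is the cosine of the angle between weighted term vectors. The article does not name a particular measure, so the sketch below is only one possible choice, and the weights are invented for illustration.

```python
import math

def cosine(query_vec, doc_vec):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

# Hypothetical weights, e.g. tf/idf values computed at indexing time.
doc_vectors = {
    "doc1": {"kosovo": 0.8, "talk": 0.3},
    "doc2": {"kosovo": 0.2, "sports": 0.9},
    "doc3": {"sports": 1.0},
}
query = {"kosovo": 1.0, "talk": 0.5}

# Order the documents from most to least similar to the query.
ranked = sorted(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]), reverse=True)
print(ranked)  # ['doc1', 'doc2', 'doc3']
```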
 
However the search engine determines rank, the ranked results list goes to the user, who can then simply click and follow the system's internal pointers to the selected document/page. 
 
More sophisticated systems will go even further at this stage and allow the user to provide some relevance feedback or to modify their query based on the results they have seen. If either of these is available, the system will then adjust its query representation to reflect this value-added feedback and re-run the search with the improved query to produce either a new set of documents or a simple re-ranking of documents from the initial search.
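One classic way to fold relevance feedback into the query is a Rocchio-style update, which the article does not name; it is used here only as an example technique. The sketch is simplified to positive feedback, and the `alpha` and `beta` defaults and the sample vectors are assumptions.

```python
def rocchio(query, relevant_docs, alpha=1.0, beta=0.75):
    """Move the query vector toward the centroid of documents marked relevant."""
    updated = {t: alpha * w for t, w in query.items()}
    for doc in relevant_docs:
        for term, weight in doc.items():
            # Add this document's share of the relevant-document centroid.
            updated[term] = updated.get(term, 0.0) + beta * weight / len(relevant_docs)
    return updated

# Hypothetical query and two documents the user marked as relevant.
query = {"madonna": 1.0}
relevant = [
    {"madonna": 0.5, "painting": 0.8},
    {"madonna": 0.4, "renaissance": 0.6},
]
new_query = rocchio(query, relevant)
print(sorted(new_query))  # ['madonna', 'painting', 'renaissance']
```

The adjusted query now carries terms drawn from the documents the user approved, so re-running the search can surface or re-rank documents the original one-word query would have scored poorly.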
 

What Document Features Make a Good Match to a Query
We have discussed how search engines work, but what features of a document make for a good match to a query? Let's look at the key features and consider some pros and cons of their utility in helping to retrieve a good representation of documents/pages.

• Term frequency: How frequently a query term appears in a document is one of the most obvious ways of determining a document's relevance to a query. While most often true, several situations can undermine this premise. First, many words have multiple meanings — they are polysemous. Think of words like "pool" or "fire." Many of the non-relevant documents presented to users result from matching the right word, but with the wrong meaning.
Also, in a collection of documents in a particular domain, such as education, query terms such as "education" or "teaching" occur so frequently that an engine's ability to distinguish the relevant from the non-relevant in a collection declines sharply. Search engines that don't use a tf/idf weighting algorithm do not appropriately down-weight the overly frequent terms, nor do they assign higher weights to appropriate distinguishing (and less frequently occurring) terms, e.g., "early-childhood."
• Location of terms: Many search engines give preference to words found in the title or lead paragraph or in the metadata of a document. Some studies show that the location in which a term occurs in a document or on a page indicates its significance to the document. Terms occurring in the title of a document or page that match a query term are therefore frequently weighted more heavily than terms occurring in the body of the document. Similarly, query terms occurring in section headings or the first paragraph of a document may be more likely to be relevant.

• Link analysis: Web-based search engines have introduced one dramatically different feature for weighting and ranking pages. Link analysis works somewhat like bibliographic citation practices, such as those used by Science Citation Index. Link analysis is based on how well-connected each page is, as defined by Hubs and Authorities, where Hub documents link to large numbers of other pages (out-links), and Authority documents are those referred to by many other pages, or have a high number of "in-links" (J. Kleinberg, "Authoritative Sources in a Hyperlinked Environment," Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 668-77).

• Popularity: Google and several other search engines add popularity to link analysis to help determine the relevance or value of pages. Popularity utilizes data on the frequency with which a page is chosen by all users as a means of predicting relevance. While popularity is a good indicator at times, it assumes that the underlying information need remains the same.

• Date of Publication: Some search engines assume that the more recent the information is, the more likely that it will be useful or relevant to the user. The engines therefore present results beginning with the most recent and ending with the least current.

• Length: While length per se does not necessarily predict relevance, it is a factor when used to compute the relative merit of similar pages. So, in a choice between two documents both containing the same query terms, the document that contains a proportionately higher occurrence of the term relative to the length of the document is assumed more likely to be relevant.

• Proximity of query terms: When the terms in a query occur near to each other within a document, it is more likely that the document is relevant to the query than if the terms occur at greater distance. While some search engines do not recognize phrases per se in queries, some search engines clearly rank documents in results higher if the query terms occur adjacent to one another or in closer proximity, as compared to documents in which the terms occur at a distance.

• Proper nouns sometimes have higher weights, since so many searches are performed on people, places, or things. While this may be useful, if the search engine assumes that you are searching for a name instead of the same word as a normal everyday term, then the search results may be peculiarly skewed. Imagine getting information on "Madonna," the rock star, when you were looking for pictures of madonnas for an art history class.
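The Hubs and Authorities idea from the link analysis item above can be sketched as a short iterative computation. This is a simplified illustration on a made-up four-page graph, not the ranking code of any particular engine; the function name `hits` and the iteration count are assumptions.

```python
def hits(links, iterations=20):
    """Iteratively compute hub and authority scores for a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in.
        auths = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auths[target] += hubs[source]
        # Hub: sum of the authority scores of the pages linked to.
        hubs = {p: sum(auths[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        a_norm = sum(v * v for v in auths.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hubs.values()) ** 0.5 or 1.0
        auths = {p: v / a_norm for p, v in auths.items()}
        hubs = {p: v / h_norm for p, v in hubs.items()}
    return hubs, auths

# Hypothetical four-page web: A and B link out; C collects the most in-links.
links = {"A": ["C", "D"], "B": ["C"], "C": [], "D": []}
hubs, auths = hits(links)
print(max(auths, key=auths.get))  # C
print(max(hubs, key=hubs.get))    # A
```

Page C ends up as the strongest authority because both linking pages point to it, and page A is the strongest hub because it links to the most authoritative pages.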