Friday, May 15, 2009

The undocumented truth about SharePoint Search

Now that the new corporate Intranet has gone live, I decided to test some of the Search functionality to make sure it was working correctly. I searched on a keyword that I knew should return results from the Intranet, but was only getting back results from the file share I had indexed. I logged into the Search Server 2008 administration site and found that my Local Office SharePoint Server sites content source had no recent successes and was reporting an error. So I looked at the crawl log and found the error "Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository...."



I knew that the account I was using could access the content because I logged into the site with that account and was able to get to all of the pages without any problems. The next place I looked was the SharePoint logs, and I did not find anything out of the ordinary in them, so I moved on to the Event logs. In there were some warnings from Office Server Search that said "The start address <http://intranet/> cannot be crawled. Context: Application 'SharedServices', Catalog 'Portal_Content'



Details: Access is denied. Verify that either....."



This was repeated in the events every time I tried to run a Full Crawl. I knew that many things affect the crawling of a SharePoint site, including Alternate Access Mappings, content account privileges, and content rules or scopes.



I started looking into all of these things and they all looked OK to me, and I knew that search had been working on this web application before. The only thing I had changed since then was extending the application to our production domains for internal and external access. I started searching and found that there were just as many answers as there appeared to be questions when it comes to this error. Many of them boiled down to "Check to make sure that the content crawl account has access to the site."



The answer ended up being a combination of many articles, one of them being this Event ID site. I scrolled down to the area where people had added their comments and found one from Ionut Marin saying that you should extend the web app with a new site. I then followed the Google Groups link at the bottom of that site, which took me to another person who also said to extend the web app. I ended up following his instructions to the letter.




  • Go to Central Administration -> Application Management.

  • Select Create or extend web application and make sure you pick the web application that you want to extend.

  • Set all of the parameters on the Extend Web Application page as needed, but set the host header to the name of the server.

  • From the zone list pick anything other than Default.

  • When you are done with that, go to Authentication providers and make sure that your newly created web application is set to use Windows integrated authentication.

  • Then go to Operations -> Alternate Access Mappings for the web application.

  • Click Edit Public URLs and swap the new URL you just created into the Default zone, replacing whatever is currently there. The URL that was in the Default zone can be moved to any other zone.

  • Wait a few minutes, then change your Local Office SharePoint Server sites content source to crawl the new web address. In my example the server name is web2, and I created a new extended web application with a URL of http://web2:1234/. The port is not really important, as you will never browse the site using this URL. I then set the start address in the content source to http://web2:1234/.

  • I started a Full Crawl and noticed that items were showing back up in the index.
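To make the zone swap concrete: using the example server name web2 and port 1234 from the steps above (both are just placeholders from my setup; Intranet is one possible choice of alternate zone), the Alternate Access Mappings should end up looking something like this, with the crawl-only URL in the Default zone and the user-facing URL moved to another zone:

```
Internal URL         Zone        Public URL for Zone
http://web2:1234/    Default     http://web2:1234/
http://intranet/     Intranet    http://intranet/
```

The crawler always uses the Default zone URL, which is why putting a host-header URL that resolves directly to the server in that zone lets the crawl authenticate, while users keep reaching the site through the other zone.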

Another thing that I noticed after the crawling was working again was that I was getting warnings in the crawl log for all ASPX pages that said "Content for this URL is excluded by the server because a no-index attribute." This seems to be a generic warning covering a number of things that could go wrong, and it did not point to my issue specifically. So I browsed to the URL that was having the issue and got back a more meaningful error: "Code blocks are not allowed in this file."


This made sense, because the crawler was indexing the new web application, and I had added a Page Parser Path to the web.config of my other web applications in order to run some custom code. After I placed the Page Parser Path in the new web app's web.config file, I could hit the page through the new URL, and I ran another Full Crawl of the site. This time the number of items in the index increased and the warnings decreased dramatically. I looked in the crawl log and there were no more warnings about the ASPX pages.
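For reference, this setting lives in the SafeMode section of the web application's web.config. A minimal sketch of what I mean (keep your existing SafeMode attributes; the VirtualPath of "/*" is deliberately broad, and you may want to scope it to a specific page or folder instead):

```xml
<SharePoint>
  <SafeMode> <!-- existing SafeMode attributes omitted -->
    <PageParserPaths>
      <!-- Allow inline server-side code in pages under this path -->
      <PageParserPath VirtualPath="/*" CompilationMode="Always"
                      AllowServerSideScript="true" IncludeSubFolders="true" />
    </PageParserPaths>
  </SafeMode>
</SharePoint>
```

Note that AllowServerSideScript="true" loosens SharePoint's safe-mode protections for the matched pages, so apply it as narrowly as your custom code allows.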


Although this resolution worked for me, it may not work for you. As I said, Search has many areas of configuration that can cause errors or prevent content from being indexed.


2 Comments:

At June 19, 2009 at 10:58 AM , Anonymous Anonymous said...

I had the same problem with my site.

Can you please give an example of the code that you added to the web.config file in your new web apps? Great blog!

> After I placed the Page Parser Path in the new web apps web.config file I could hit the page through the new URL and I ran another Full crawl of the site.

 
At June 22, 2009 at 8:29 AM , Blogger Michael Markel said...

The line in the web.config to allow code to be run on a page goes in the PageParserPaths section and looks like this: <PageParserPath VirtualPath="/*" CompilationMode="Always" AllowServerSideScript="true" IncludeSubFolders="true" />

 
