Thursday, April 25, 2013

Manage deletion of index items (SharePoint Server 2010)

While reconfiguring search and validating its functionality, I discovered around 15,000 errors in the crawl log.
The reason was that I had changed the architecture of a specific site collection. In my first setup, I had a site collection with about 55 sub-sites, each with its own document library and security profile.

I had already migrated about 15,000 documents into some of these sub-sites when I realized this structure wasn't going to work for me.
The maintenance and template-creation overhead of the sub-sites was too high, and SharePoint itself gave me problems with them: I could not get the import server to see the sub-sites to import data into them, and some users complained they could not see the sub-sites even though they had access.

So, I decided to delete all the sub-sites (thank god for PowerShell) and created 55 document libraries directly under the main site collection.
Of course, the search crawlers had already crawled the previous content (with sub-sites), so when the next full crawl ran, it could not find the 15K documents, resulting in 15K errors!
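For what it's worth, the clean-up itself can be done in a few lines of PowerShell. This is a sketch under assumptions, not my exact script: the URL and library names below are placeholders, and it must run in the SharePoint 2010 Management Shell on a farm server.

```powershell
# Hypothetical site collection URL -- replace with the real one.
$siteUrl = "http://portal/sites/projects"

# Remove every sub-site under the root web (deepest URLs first, so
# child webs are gone before their parents are deleted).
Get-SPSite $siteUrl | Get-SPWeb -Limit All |
    Where-Object { -not $_.IsRootWeb } |
    Sort-Object { $_.Url.Length } -Descending |
    Remove-SPWeb -Confirm:$false

# Re-create one document library per former sub-site, directly in the root web.
$web = Get-SPWeb $siteUrl
foreach ($name in @("Company-A", "Company-B")) {   # 55 names in the real run
    $web.Lists.Add($name, "", [Microsoft.SharePoint.SPListTemplateType]::DocumentLibrary) | Out-Null
}
$web.Dispose()
```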

Thanks to this great TechNet article I found the correct properties to change in the SharePoint 2010 Search Service Application. It was quite a challenge because of the many other documented problems with SP2010 Search (404 Not Found -> loopback check, and so on), but after a while I found the relevant information:

Delete policy for access denied or file not found

ErrorDeleteCountAllowed: Default value: 30
ErrorDeleteIntervalAllowed: Default value: 720 hours (30 days)

So it could have taken SharePoint 30 + 30 days (two months!) before it removed the references and resolved the errors automatically...
I lowered the numbers a bit, and this morning I saw that the crawler had deleted the 15K references :)

Next: finding out why the SearchContentAccess account can't access 2 site collections in a web application that grants this account Full Read policy access (and works for the other 50+ site collections...).
Cheers, Jeroen

Command to change the policy-parameter values:

$SearchApplication = Get-SPEnterpriseSearchServiceApplication -Identity "<SearchServiceApplicationName>"
$SearchApplication.SetProperty("<PropertyName>", <NewValue>)
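As a concrete illustration, here is what lowering the two deletion thresholds could look like. The service application name and the values 5 and 24 are my own picks for this example, not the ones from my farm; tune them to your crawl schedule. `GetProperty` is the documented counterpart to `SetProperty` for checking the result.

```powershell
# "Search Service Application" is a placeholder name -- use your own SSA's name.
$SearchApplication = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"

# Allow deletion after 5 failed crawls instead of the default 30 ...
$SearchApplication.SetProperty("ErrorDeleteCountAllowed", 5)
# ... and after 24 hours instead of the default 720 (30 days).
$SearchApplication.SetProperty("ErrorDeleteIntervalAllowed", 24)

# Verify the new values.
$SearchApplication.GetProperty("ErrorDeleteCountAllowed")
$SearchApplication.GetProperty("ErrorDeleteIntervalAllowed")
```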

Tuesday, April 16, 2013

SharePoint 2010 "File already exists" backup error

Yesterday I asked the community to help me with a #wickedProblem regarding the SharePoint production farm backup. It gave me 8 errors, all caused by one main error:

InvalidOperationException: Job "Search Backup and Restore (first phase backup) application b46e25e7-2307-4...restofid..-query-0 server [servername]" failed with error: The file '\\server\share$\spbr0000\b46e25e7-2307-4..sameasabove-query-0\projects\portal_content\indexer\cifiles\' already exists.

I created a blog post for this, and thanks to Cornelius J. van Dyk I found out that my 'emptying the backup folder' before starting the backup wasn't really effective, because of the default "Don't show hidden files, folders, or drives" Windows Explorer setting. Yup, changing the setting to actually 'see' all files allowed me to 'delete' all files on the share :)
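An alternative to toggling the Explorer setting is to clear the share from PowerShell, where `-Force` makes hidden and system files visible to `Get-ChildItem`. The path below is a placeholder (modeled on the one in the error message), and this deletes everything under it, so aim carefully:

```powershell
# Placeholder path -- point this at the real backup share.
# -Force returns the hidden/system files that Explorer hides by
# default, so nothing is left behind to trigger "file already exists".
Get-ChildItem "\\server\share$\spbr0000" -Recurse -Force |
    Remove-Item -Recurse -Force
```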

Ran the backup again and this time it finished:

Finished with 0 warnings.
Finished with 0 errors.

Another problem solved :)

Tuesday, April 09, 2013

How I added 250,000+ documents and 3.5 million pieces of metadata to 152 #SP2010 document libraries

The approach I took was the following:

First, I created a separate site collection to hold the 8,000+ documents; from a content-database point of view this made the most sense.

After enabling all branding features on this site collection, I created a template document library with the appropriate content type enabled, the correct view, and managed metadata navigation configured.

Next, I used a tool called DocKIT from Vyapin Software to create a basic metadata Excel document from the file share that contained the documents that needed to go to SharePoint.

This basic .XLS contains all file references (paths), and per record (file) you can add the metadata needed to fill the content type in the destination library. Because it's all in Excel (the tool reads from this document later), it's easy to copy and paste the metadata and complete it in bulk, offline.
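To give an idea of the shape of such a sheet: one row per file, a "Path" column pointing at the source file, a destination library column, and one column per content type field. The column names other than "Path" are hypothetical here, not DocKIT's actual ones:

```
Path                                    Destination Library    Company    Document Type   Year
\\fileserver\docs\CompanyA\report.pdf   Company-A Documents    Company A  Annual Report   2012
\\fileserver\docs\CompanyB\invoice.pdf  Company-B Documents    Company B  Invoice         2011
```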

Next, I created one document library per company's documents; this was done mainly for security reasons. (People who have rights to company A's documents don't necessarily have rights to company B's documents; to manage these rights in bulk, separate document libraries are used.)

I had started off by creating a separate site (sub-site) per company (152 in total), using a PowerShell script, but I abandoned this idea after some sites seemed to be inaccessible (they simply did not exist) for some users and for the import tool. Very strange behaviour from SP2010, but I didn't have time to research in depth why this was happening.

The metadata.xls document contained the destination library for each document (again, a copy and paste within Excel makes this very fast), and after completing all metadata for the 8,000+ documents, the file was ready to use as import data for the DocKIT tool.
The tool reads each file from its "Path" location, uploads it to the destination library on SharePoint, attaches the right content type to the file, and applies the configured metadata, all automatically.

After completion, 8,000+ documents had been uploaded into 152 document libraries, mostly automated :)
Each document carries 14 pieces of metadata, more than 112,000 metadata values in SharePoint 2010 in total.

I created the document libraries automatically, based on the data in the Excel document (the company name, available as a piece of metadata, was used as the document library name), using the library template as one of the parameters.
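That step might look roughly like the sketch below, which is my reconstruction rather than the original script: the URL, file name, template name, and "Company" column are all placeholder assumptions. It saves the template library as a custom list template and stamps out one library per distinct company found in the sheet (exported to CSV so PowerShell can read it).

```powershell
# Hypothetical URL, file name, and template name.
$web = Get-SPWeb "http://portal/sites/companies"
$template = $web.Site.GetCustomListTemplates($web)["Company Library Template"]

# One library per distinct company name in the exported metadata sheet.
Import-Csv ".\metadata.csv" |
    Select-Object -ExpandProperty Company -Unique |
    ForEach-Object { $web.Lists.Add($_, "", $template) | Out-Null }

$web.Dispose()
```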

I'm happy to share more details, if need be, on the configuration of DocKIT or any of the other items described here :)

Update: using the same principle, I also created a "Public Domain Documents Silo" for this client, uploading 220,000 documents (mostly OCR'd PDFs), each with around 15 pieces of metadata attached. (I wrote a little C# program that pulled most of the existing information, such as the folder structure, into the metadata.xls document.)
This site collection is fully searchable (SharePoint 2010 search) and VERY fast, returning results in under 3 seconds. The nice thing is that because all documents have so much metadata, the search is fully refinable, creating a great end-user experience.
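The folder-walking program was C#, but the same idea fits in a few lines of PowerShell (written here with v2-compatible syntax, since SP2010-era shells are PowerShell 2.0). The share path and the "Company" column are placeholder assumptions; the point is harvesting the existing folder structure into the metadata sheet instead of typing it by hand:

```powershell
# Placeholder path for the file share that holds the source documents.
$root = "\\fileserver\publicdomain"

Get-ChildItem $root -Recurse |
    Where-Object { -not $_.PSIsContainer } |          # files only (PS 2.0 style)
    Select-Object `
        @{ n = "Path";     e = { $_.FullName } },       # where DocKIT reads the file
        @{ n = "Company";  e = { $_.Directory.Name } }, # folder name reused as metadata
        @{ n = "Modified"; e = { $_.LastWriteTime } } |
    Export-Csv ".\metadata.csv" -NoTypeInformation
```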