
Searching External Data using BCS

Search and Business Connectivity Services are two of the most powerful tools in the sizeable SharePoint toolchest. Together they not only give users access to all manner of external database systems, but also let them find information that would otherwise be buried in enormous databases.


But there is one fundamental set of configurations that you have to get right, and that is security. You have to be able to get into SharePoint, of course, but having got there, you also need to get from SharePoint to the external system, which may have a completely different kind of authentication. This is just as true for search as it is when accessing the data directly.

When you set up a data source and one or more External Content Types using Business Connectivity Services you will need to decide on the method of authentication. The first option is to "pass through" the user's identity to the external system (also called "user's identity"). This means, of course, that all your users need to be given access to that system, which might not be desirable. The second method is to use the trusted subsystem security model, in which the identity of the server process is used to access the external system. This can be called "RevertToSelf" (named after the API method call involved) or "BDC Identity". The third method is called "impersonation", in which the Secure Store Service is used to store an identity that is used to access the back-end system. Depending on how the secure store is configured, this can use either a group identity, in which a set of credentials is shared by a number of users, or an individual identity, which is cached in the secure store so that each user only has to enter those credentials once.

Remember that there are two authentication steps involved - the first from the client to the BDC (Business Data Connectivity), and the second from the BDC to the back-end system (which we will assume for the sake of argument is SQL Server). So you first need to ensure that your user account has access to your BDC object (if it does not, you get the dreaded "Access denied by Business Data Connectivity" error). You do that by going to your Business Data Connectivity service application management page and giving the user permission either through the Set Object Permissions or Metadata Store Permissions buttons on the ribbon, depending on whether you want to give permissions for that particular object or for the BDC as a whole. Secondly, you need to ensure that the back-end system allows access to the account used by the BDC (which will depend on the authentication method). So, for example, in SQL Server you would need to grant the relevant account permissions on your database. If that is not correct you will get the subtly different "Access denied by LOBSystem" error.
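As a concrete example of the first step, here is a minimal sketch of granting a single user Execute rights on one External Content Type from the SharePoint Management Shell. The site URL, entity name, namespace and account below are placeholders, not values from any real environment.

# Fetch the External Content Type (entity) from the BDC metadata store.
# "Customer", the namespace and the URLs are placeholder values.
$ect = Get-SPBusinessDataCatalogMetadataObject -BdcObjectType Entity `
        -Name "Customer" -Namespace "http://intranet/crm" `
        -ServiceContext "http://intranet"

# Build a claims principal for the user and grant Execute rights on that object.
$user = New-SPClaimsPrincipal -Identity "CONTOSO\jsmith" -IdentityType WindowsSamAccountName
Grant-SPBusinessDataCatalogMetadataObject -Identity $ect -Principal $user -Right "Execute"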

So how does the search crawler authenticate? This depends on how the data source is configured, and the authentication models above have slightly different implications for the search crawler. The first thing to note is that the crawler uses a default crawl account to access content. This can be overridden by crawl rules, but let's assume that the default crawl account is being used. This account itself defaults to the search service account, but I recommend that you configure it explicitly in the search administration page. There is a link in the "Search Status" section which will allow you to change the crawl account if needed and, more importantly, to set the password.
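If you prefer to script it, the same change can be made from the SharePoint Management Shell. A minimal sketch (the crawl account name is a placeholder):

# Set the default content access (crawl) account explicitly.
$ssa = Get-SPEnterpriseSearchServiceApplication
# Prompt for the password rather than embedding it in the script.
$password = Read-Host "Crawl account password" -AsSecureString
Set-SPEnterpriseSearchServiceApplication -Identity $ssa `
    -DefaultContentAccessAccountName "CONTOSO\svc_crawl" `
    -DefaultContentAccessAccountPassword $password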

To allow the search crawler to access your external data you also need to go to the BCS service application administration page and give the crawl account permissions as you would for any other user. In this case I suggest you always use the more comprehensive Metadata Store level permission setting, rather than setting this individually on each object in the BDC, unless you have a particular reason to do so. Finally, you need to ensure that your database permissions are set up correctly, and that will depend on the authentication configuration.
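The Metadata Store permission for the crawl account can also be granted in PowerShell. A rough sketch, again with a placeholder site URL and crawl account:

# Get the Metadata Store (Catalog) for the BDC service application.
$catalog = Get-SPBusinessDataCatalogMetadataObject -BdcObjectType Catalog `
            -ServiceContext "http://intranet"

# Grant the crawl account Execute rights at this level.
$crawler = New-SPClaimsPrincipal -Identity "CONTOSO\svc_crawl" -IdentityType WindowsSamAccountName
Grant-SPBusinessDataCatalogMetadataObject -Identity $catalog -Principal $crawler -Right "Execute"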

In the following sections we'll draw a simple diagram to show the accounts used in each step. 

Authentication scheme: User Identity/Pass-through

                user          user
Normal Client  ------>  BDC  ------>  SQL

              crawl a/c     crawl a/c
Search Crawler ------>  BDC  ------>  SQL

This is probably the recommended approach for configuring crawling of external data. However, it might not be ideal for user authentication, since it places the onus on the administrator of the back-end system to ensure that all users have permissions. If that is not acceptable and the data source is to be used for both user access and crawling, then it will not be possible to use pass-through authentication.

Authentication scheme: Impersonation/Secure Store (with group identity)

                user        group a/c
Normal Client  ------>  BDC  ------>  SQL

              crawl a/c     group a/c
Search Crawler ------>  BDC  ------>  SQL

For crawling, you will probably go to your Secure Store configuration, add a group identity, and make the crawl account a member of that group. You can then configure the credentials that will be used to access the back-end system. If you already have a group target application for your users, you can probably just add the crawl account to its members.
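As a sketch, assuming a group-type target application (here called "CrmGroupIdentity" for illustration) already exists and the crawl account is in its members list, mapping the shared back-end credentials from PowerShell looks something like this. The site URL, application name, SQL account and password are all placeholders.

# Get the existing group-type Secure Store target application.
$ssApp = Get-SPSecureStoreApplication -ServiceContext "http://intranet" -Name "CrmGroupIdentity"

# One username/password pair is shared by every member of the target
# application, including the crawl account.
$credentials = (ConvertTo-SecureString "CONTOSO\svc_sql" -AsPlainText -Force),
               (ConvertTo-SecureString "P@ssw0rd" -AsPlainText -Force)
Update-SPSecureStoreGroupCredentialMapping -Identity $ssApp -Values $credentials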

Authentication scheme: RevertToSelf/BDC Identity

                user   application pool a/c
Normal Client  ------>  BDC  ------>  SQL

              crawl a/c      farm a/c
Search Crawler ------>  BDC  ------>  SQL

RevertToSelf is not really recommended, but it is there if needed, although you will have to explicitly enable it using PowerShell. Normal users will find that the application pool account is used to access the back-end system. But surprisingly, when the search crawler is used it is the farm account that needs to be given permissions on the database. You would expect this to be the application pool account, and from talking to people close to the product group this was the intention; it is even documented that way on TechNet. But it seems that this never made it into the product. Presumably the farm account is used because the search crawler runs as a timer job (which runs under the farm account), so that is the account that would be used if RevertToSelf were never called, but I am just guessing.
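For reference, enabling RevertToSelf is typically just a couple of lines in the SharePoint Management Shell, along these lines:

# RevertToSelf is disabled by default; switch it on for the whole BDC
# service application (the TypeName filter is deliberately loose).
$bdc = Get-SPServiceApplication | Where-Object { $_.TypeName -like "*Business Data*" }
$bdc.RevertToSelfAllowed = $true
$bdc.Update()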