Secrets of Office 365 Image Analysis & Search
John E. Huschka, December 9, 2018
SharePoint Image Search
Microsoft has promoted Office 365 image analysis and image search features for the last year, using both of these terms to describe Office 365's ability to recognize text and properties within images and to make that information available via search and other functionality. Customer attempts to use this compelling feature have met with mixed results.
This post provides you with insight into this feature, which we will refer to as "image content recognition". We will show you how to determine if image content recognition is working in your tenant, and if so, how to see its results. We will show you how image content recognition varies depending on the location, content, and properties of images being surveyed.
This post is current as of our date of publication, based on our work with Office 365 and with Microsoft Support. We encourage you to test in your O365 tenant from time to time to determine whether and how the functionality is working for you. We welcome feedback.
What We Found
Collaboration Foundry tested image content recognition in our tenant. In addition, we worked with Microsoft Support (case #10638094), who tested in their own tenant and worked with the relevant product team. Our findings to date are:
- The feature is available in OneDrive and SharePoint (both classic and modern sites).
- It has worked for us in both the out-of-the-box "Documents" library and a user-created library.
- It has only worked for us using the out-of-the-box "Document" content type.
We understand that Microsoft is continuing to release updates to this feature. We will continue to test this feature to understand its functionality, and we encourage you to do the same. Note that it takes a few days for image content recognition updates to be reflected in your tenant.
The rest of this article will detail some of the feature's architecture, helping you diagnose and test it within your tenant. Note that we are using SharePoint's "Classic Experience" throughout to allow us to display columns unsupported by the SharePoint Modern Experience.
Image Content Recognition is an Image Analysis Feature
Image content recognition's value is largely exposed through search: you can find images based on the text and properties within them. Search functionality itself has no apparent changes, however. What is new is the process that extracts information from the image. That extraction process (which we will call "image analysis") records its results in standard columns that are indexed by search.
We can see this by looking at a working OneDrive instance of image content recognition in which image analysis has populated metadata columns:
Internally, seven columns, three of which are visible here, have been added to the library and underlying Document content type:
- Tags: A classification of the nature of the image, such as "receipt" or "screenshot". The column's internal name is MediaServiceAutoTags, and it is indexed within search as the MediaServiceAutoTags managed property. The column's purpose, besides enabling search, is not clear.
- MediaServiceLocation: Contains the image's origin location. The column's internal name is MediaServiceLocation, and it is indexed within search as the MediaServiceLocation managed property.
- Extracted Text: Contains the text that image analysis captured from the image. The column's internal name is MediaServiceOCR. This property is indexed within search but is not available as a named, managed property.
- MediaServiceMetadata: This hidden column contains a JSON document used internally by media services. Typically, it will contain something like this:
{
  "ctag": "\"c:{92B4CEEC-44F1-4A3F-A9B5-E7897F7903C1},2\"",
  "timestamp": "2017-06-21T14:28:19.253339Z",
  "modules": [
    { "module": "OfficeBundleGeneration", "version": 1 }
  ],
  "officeBundle": {
    "ctag": "\"c:{92B4CEEC-44F1-4A3F-A9B5-E7897F7903C1},2\"",
    "fatalError": false,
    "version": "1.83182827"
  }
}
The column's internal name is MediaServiceMetadata, and it is not indexed within search.
- MediaServiceFastMetadata: This hidden column contains a JSON document used internally by media services. Typically, it will contain something like this:
{
  "officeBundle": {
    "ctag": "\"c:{92B4CEEC-44F1-4A3F-A9B5-E7897F7903C1},2\"",
    "fatalError": false,
    "version": "1.83182827"
  }
}
The column's internal name is MediaServiceFastMetadata, and it is not indexed within search.
- MediaServiceEventHashCode: This column's purpose is not apparent. Whereas the MediaServiceMetadata and MediaServiceFastMetadata columns may be present for documents in which image analysis found no content, this column only appears to have a value when image analysis found relevant content in the image and then populated the Tags, MediaServiceLocation, and/or Extracted Text columns. The column's internal name is MediaServiceEventHashCode, and it is not indexed within search.
- MediaServiceGenerationTime: Presumably, the date/time at which image analysis reviewed the document. As with MediaServiceEventHashCode, it appears to only be populated when there is relevant content in the image. The column's internal name is MediaServiceGenerationTime, and it is not indexed within search.
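Because Tags and MediaServiceLocation are exposed as named managed properties, a keyword query can target them directly, while extracted text can only be matched as free text. The following is a minimal sketch of how such a query might be assembled for the SharePoint search REST endpoint (`/_api/search/query`). The site URL is hypothetical, the helper name `build_search_url` is ours, and the code only builds the URL; it does not authenticate or send the request. It assumes the search values contain no quote characters.

```python
from urllib.parse import quote


def build_search_url(site_url, tag=None, location=None, text=None):
    """Build a search REST URL that targets image analysis results.

    tag      -> restricts on the MediaServiceAutoTags managed property
    location -> restricts on the MediaServiceLocation managed property
    text     -> free-text phrase (extracted text has no named managed property)
    """
    terms = []
    if tag:
        terms.append(f'MediaServiceAutoTags:"{tag}"')
    if location:
        terms.append(f'MediaServiceLocation:"{location}"')
    if text:
        terms.append(f'"{text}"')
    if not terms:
        raise ValueError("at least one of tag, location, or text is required")

    # Terms separated by spaces are combined by the keyword query language.
    kql = " ".join(terms)
    # The query text is passed as a single-quoted literal; everything inside
    # the quotes is percent-encoded.
    return f"{site_url.rstrip('/')}/_api/search/query?querytext='{quote(kql, safe='')}'"


if __name__ == "__main__":
    # Hypothetical tenant URL, for illustration only.
    print(build_search_url("https://contoso.sharepoint.com", tag="receipt"))
    print(build_search_url("https://contoso.sharepoint.com", location="Missouri", text="soybean"))
```

If a query restricted to one of these properties returns nothing for an image you know was tagged, that suggests the column has not yet been crawled rather than that analysis failed.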
So, the key to knowing whether image content recognition is available and working in your tenant is to check whether these columns exist and whether, and how, they have been populated.
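As a rough illustration of that check, the sketch below takes the field definitions of a library (as XML, however you export them) and reports which of the seven columns are present and whether each is hidden. The function name `column_status` is ours, and the list of expected columns is simply the set we observed in OneDrive; it says nothing about whether the columns contain data.

```python
import xml.etree.ElementTree as ET

# The seven columns observed in the OneDrive test described in this post.
EXPECTED_COLUMNS = (
    "MediaServiceAutoTags",
    "MediaServiceLocation",
    "MediaServiceOCR",
    "MediaServiceMetadata",
    "MediaServiceFastMetadata",
    "MediaServiceEventHashCode",
    "MediaServiceGenerationTime",
)


def column_status(fields_xml):
    """Map each expected column to 'missing', 'hidden', or 'visible'."""
    # Wrap the fragment so several sibling <Field> elements parse as one document.
    root = ET.fromstring(f"<Fields>{fields_xml}</Fields>")
    hidden_by_name = {
        field.get("Name"): field.get("Hidden", "FALSE").upper() == "TRUE"
        for field in root.iter("Field")
    }
    status = {}
    for name in EXPECTED_COLUMNS:
        if name not in hidden_by_name:
            status[name] = "missing"
        else:
            status[name] = "hidden" if hidden_by_name[name] else "visible"
    return status


if __name__ == "__main__":
    sample = (
        '<Field Name="MediaServiceOCR" DisplayName="Extracted Text" Hidden="FALSE"/>'
        '<Field Name="MediaServiceMetadata" DisplayName="MediaServiceMetadata" Hidden="TRUE"/>'
    )
    for name, state in column_status(sample).items():
        print(f"{name}: {state}")
```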
Our OneDrive Test Results
We uploaded three PNG files ("SoybeanProcess.png", "SoybeanTestResults1.png", and "GasValveArchitecture.png") containing diagrams of various manufacturing processes. In all three cases, image analysis was able to extract content from the diagrams.
We also uploaded two JPG files, both geotagged photographs of manufacturing plants. Image analysis extracted the geotagging information and correctly translated it into named locations in Missouri and Kansas. It correctly found no recognizable text content.
As expected, image analysis added all seven columns to the list definition (irrelevant fields and properties omitted):
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceMetadata" />
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceFastMetadata"/>
<Field ID="{ae5f3e55-9e2d-4632-9b90-282575c1c1f3}" Type="Text" DisplayName="MediaServiceEventHashCode" Name="MediaServiceEventHashCode" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceEventHashCode" />
<Field ID="{64ae69b5-5b9d-4ba9-b01c-f0ab72af8b7b}" Type="Text" DisplayName="MediaServiceGenerationTime" Name="MediaServiceGenerationTime" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceGenerationTime" />
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="Tags" Name="MediaServiceAutoTags" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceAutoTags"/>
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="Extracted Text" Name="MediaServiceOCR" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceOCR"/>
<Field ID="{5e0382d1-eb43-4590-a609-575845fa9af9}" Type="Text" DisplayName="MediaServiceLocation" Name="MediaServiceLocation" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceLocation"/>
A notable test result is that image analysis added columns by updating the underlying Document content type. This suggests that image analysis may be tied to this content type, which is consistent with our inability to apply image analysis to non-Document content types. (See below.)
<ContentType ID="0x010100E28D36E41E449B47A2E6BC0C6FD40C00" Name="Document" Group="Document Content Types" Description="Create a new document.">
<Fields>
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata"/>
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata"/>
<Field ID="{ae5f3e55-9e2d-4632-9b90-282575c1c1f3}" Type="Text" DisplayName="MediaServiceEventHashCode" Name="MediaServiceEventHashCode"/>
<Field ID="{64ae69b5-5b9d-4ba9-b01c-f0ab72af8b7b}" Type="Text" DisplayName="MediaServiceGenerationTime" Name="MediaServiceGenerationTime"/>
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="Tags" Name="MediaServiceAutoTags" />
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="Extracted Text" Name="MediaServiceOCR"/>
<Field ID="{5e0382d1-eb43-4590-a609-575845fa9af9}" Type="Text" DisplayName="MediaServiceLocation" Name="MediaServiceLocation"/>
</Fields>
</ContentType>
Our SharePoint Modern Site Test Results
We uploaded the same files to a SharePoint Modern Site, using a ProjectDocument content type derived from SharePoint's out-of-the-box Document content type:
Here, image analysis did not add columns to the ProjectDocument content type or the library, and we have no image analysis data to display.
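SharePoint content type IDs encode their inheritance: a child's ID is its parent's ID followed by either two hexadecimal digits (other than "00") or "00" plus a 32-digit GUID. The sketch below walks an ID to recover its ancestry, which can help confirm whether a library's content type is a direct copy of Document (0x0101) or sits further down the chain, as a ProjectDocument-style type would. The function names are ours, and the second ID in the usage example is made up for illustration; we did not record the actual ProjectDocument ID. Whether image analysis really keys off this distinction is our hypothesis, not something Microsoft has confirmed.

```python
DOCUMENT_ID = "0x0101"  # out-of-the-box Document content type


def ancestry(ct_id):
    """Return the chain of content type IDs from the root ("0x") down to ct_id."""
    if not ct_id.lower().startswith("0x"):
        raise ValueError("content type IDs start with 0x")
    body = ct_id[2:]
    chain, pos = ["0x"], 0
    while pos < len(body):
        # "00" introduces a GUID segment (2 + 32 digits); otherwise 2 digits.
        step = 34 if body[pos:pos + 2] == "00" else 2
        if pos + step > len(body):
            raise ValueError("truncated content type ID")
        pos += step
        chain.append("0x" + body[:pos])
    return chain


def levels_below_document(ct_id):
    """Number of inheritance steps between Document and ct_id, or None."""
    chain = ancestry(ct_id)
    if DOCUMENT_ID not in chain:
        return None
    return len(chain) - 1 - chain.index(DOCUMENT_ID)


if __name__ == "__main__":
    # List-level Document content type from our OneDrive test.
    print(levels_below_document("0x010100E28D36E41E449B47A2E6BC0C6FD40C00"))
    # Hypothetical list-level copy of a custom type derived from Document.
    print(levels_below_document("0x010100" + "1" * 32 + "00" + "2" * 32))
```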
Our Results with the Document Content Type
We uploaded three different files to a SharePoint Modern Site, using SharePoint's out-of-the-box Document content type:
Here, our results are largely consistent with our OneDrive results. Image analysis has only added five of the seven columns that we see in OneDrive, however. The MediaServiceEventHashCode and MediaServiceGenerationTime columns are missing. In addition, a MediaServiceDateTaken column has been added.
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceMetadata" />
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceFastMetadata"/>
<Field ID="{f34611d5-65a6-322f-ac39-d880b14ce28f}" Type="Text" DisplayName="MediaServiceDateTaken" Name="MediaServiceDateTaken" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceDateTaken"/>
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="MediaServiceAutoTags" Name="MediaServiceAutoTags" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceAutoTags"/>
<Field ID="{5e0382d1-eb43-4590-a609-575845fa9af9}" Type="Text" DisplayName="MediaServiceLocation" Name="MediaServiceLocation" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceLocation"/>
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="Extracted Text" Name="MediaServiceOCR" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceOCR"/>
Likewise, the Document content type now has the five original OneDrive columns and the MediaServiceDateTaken column added:
<ContentType ID="0x01010092A4AAC8C3566F46BDC289AE9883005D" Name="Document" Group="Document Content Types" Description="Create a new document.">
<Fields>
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata"/>
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata"/>
<Field ID="{f34611d5-65a6-322f-ac39-d880b14ce28f}" Type="Text" DisplayName="MediaServiceDateTaken" Name="MediaServiceDateTaken"/>
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="MediaServiceAutoTags" Name="MediaServiceAutoTags"/>
<Field ID="{5e0382d1-eb43-4590-a609-575845fa9af9}" Type="Text" DisplayName="MediaServiceLocation" Name="MediaServiceLocation" />
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="Extracted Text" Name="MediaServiceOCR" />
</Fields>
</ContentType>
The MediaServiceDateTaken column contains the value of the "Date Taken" property from the JPG image. So, for the file "2017-11-07.jpg", MediaServiceDateTaken has a value of "2017-11-07T13:36:40.000". This potentially useful field is (unfortunately) hidden and not indexed within search.
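Since the column is not searchable, one workaround is to read its value from each item and filter on the client. The sketch below assumes you have already retrieved the raw MediaServiceDateTaken strings by some means (not shown) into a mapping of file name to value. It parses the format seen above and selects files taken within a date range. The helper names and the second sample entry are ours.

```python
from datetime import datetime


def parse_date_taken(value):
    """Parse a MediaServiceDateTaken string such as '2017-11-07T13:36:40.000'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f")


def taken_between(items, start, end):
    """Return the names of files whose date taken falls within [start, end].

    items maps file name -> raw MediaServiceDateTaken string (or None when the
    column is empty for that file).
    """
    return sorted(
        name
        for name, raw in items.items()
        if raw and start <= parse_date_taken(raw) <= end
    )


if __name__ == "__main__":
    items = {
        "2017-11-07.jpg": "2017-11-07T13:36:40.000",
        "SoybeanProcess.png": None,  # sample entry with no date taken
    }
    print(taken_between(items, datetime(2017, 11, 1), datetime(2017, 11, 30)))
```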
Our SharePoint Classic Site Test Results
We uploaded our test files to a SharePoint classic site, along with some additional files to exercise more of O365's image analysis functionality. We used the out-of-the-box Document content type in a user-created library:
Our results were consistent with how image analysis worked elsewhere. The expanded test document collection also revealed more about O365's image analysis capabilities:
- We confirmed that image analysis requires photo geotagging. Although the "MarshallPlantRaw" image has location-defining text within it, image analysis determined the location only after we geotagged the photo, creating the "MarshallPlant" file.
- The "SoybeanProcessingTraining" PDF document content is not captured by image analysis.
- Image analysis correctly identified images containing a person ("PowerPlantEngineer" and "PowerPlantEngineerWide"). Notably, however, it could no longer determine that the image was taken indoors when we removed content from the image's left and right edges to reduce its width. (See MediaServiceAutoTags column.)
Image analysis' classic site implementation is largely consistent with the OneDrive implementation. Here, the MediaServiceDateTaken column used in SharePoint modern sites is not present. In addition, the MediaServiceLocation column is now displayed as "Location", and the MediaServiceOCR column is no longer displayed as "Extracted Text":
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceMetadata"/>
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceFastMetadata"/>
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="MediaServiceAutoTags" Name="MediaServiceAutoTags" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceAutoTags"/>
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="MediaServiceOCR" Name="MediaServiceOCR" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceOCR"/>
<Field ID="{ae5f3e55-9e2d-4632-9b90-282575c1c1f3}" Type="Text" DisplayName="MediaServiceEventHashCode" Name="MediaServiceEventHashCode" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceEventHashCode"/>
<Field ID="{64ae69b5-5b9d-4ba9-b01c-f0ab72af8b7b}" Type="Text" DisplayName="MediaServiceGenerationTime" Name="MediaServiceGenerationTime" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceGenerationTime"/>
<Field ID="{5e0382d1-eb43-4590-a609-575845fa9af9}" Type="Text" DisplayName="Location" Name="MediaServiceLocation" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceLocation"/>
And, once again, the columns have been added via the underlying Document content type:
<ContentType ID="0x010100B955BD3904425844A80278B6379EEFEF" Name="Document" Group="Document Content Types" Description="Create a new document.">
<Fields>
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata" />
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata"/>
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="MediaServiceAutoTags" Name="MediaServiceAutoTags"/>
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="MediaServiceOCR" Name="MediaServiceOCR" />
<Field ID="{ae5f3e55-9e2d-4632-9b90-282575c1c1f3}" Type="Text" DisplayName="MediaServiceEventHashCode" Name="MediaServiceEventHashCode" />
<Field ID="{64ae69b5-5b9d-4ba9-b01c-f0ab72af8b7b}" Type="Text" DisplayName="MediaServiceGenerationTime" Name="MediaServiceGenerationTime" />
<Field ID="{5e0382d1-eb43-4590-a609-575845fa9af9}" Type="Text" DisplayName="Location" Name="MediaServiceLocation" Group="_Hidden" />
</Fields>
</ContentType>
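Because display names differ between environments while internal names stay the same, any script that inspects these columns is safer keyed on the internal Name attribute. The sketch below compares two sets of field definitions and lists the columns whose display names differ. The field fragments are abridged from the OneDrive and classic site definitions shown in this post; the function names are ours.

```python
import xml.etree.ElementTree as ET


def display_names(fields_xml):
    """Map each field's internal Name to its DisplayName."""
    # Wrap the fragment so several sibling <Field> elements parse as one document.
    root = ET.fromstring(f"<Fields>{fields_xml}</Fields>")
    return {f.get("Name"): f.get("DisplayName") for f in root.iter("Field")}


def renamed_columns(first_xml, second_xml):
    """Return {internal name: (first display name, second display name)} where they differ."""
    first, second = display_names(first_xml), display_names(second_xml)
    return {
        name: (first[name], second[name])
        for name in sorted(first.keys() & second.keys())
        if first[name] != second[name]
    }


# Abridged from the OneDrive and classic site field definitions in this post.
ONEDRIVE = """
<Field DisplayName="Tags" Name="MediaServiceAutoTags"/>
<Field DisplayName="Extracted Text" Name="MediaServiceOCR"/>
<Field DisplayName="MediaServiceLocation" Name="MediaServiceLocation"/>
<Field DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata"/>
"""
CLASSIC = """
<Field DisplayName="MediaServiceAutoTags" Name="MediaServiceAutoTags"/>
<Field DisplayName="MediaServiceOCR" Name="MediaServiceOCR"/>
<Field DisplayName="Location" Name="MediaServiceLocation"/>
<Field DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata"/>
"""

if __name__ == "__main__":
    for name, (od, classic) in renamed_columns(ONEDRIVE, CLASSIC).items():
        print(f"{name}: OneDrive={od!r}, classic={classic!r}")
```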
Image analysis seems rather conservative about adding columns to the library and content type; it appears to add a column only once some file calls for it. For example, we have a library in which no files are geotagged with location:
Internally, only four of the original seven columns are present. The MediaServiceEventHashCode, MediaServiceGenerationTime, and MediaServiceLocation columns are not present.
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceMetadata" ColName="ntext2" RowOrdinal="0" />
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata" Group="_Hidden" Hidden="TRUE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceFastMetadata" ColName="ntext3" RowOrdinal="0" />
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="MediaServiceAutoTags" Name="MediaServiceAutoTags" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceAutoTags" ColName="nvarchar13" RowOrdinal="0" />
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="MediaServiceOCR" Name="MediaServiceOCR" Group="_Hidden" Hidden="FALSE" Sealed="TRUE" ReadOnly="TRUE" StaticName="MediaServiceOCR" ColName="ntext4" RowOrdinal="0" />
And the same is true of the underlying Document content type:
<ContentType ID="0x010100B99249AB796809429EE88DAA74453FF0" Name="Document" Group="Document Content Types" Description="Create a new document.">
<Fields>
<Field ID="{617f8947-74b2-36bc-9f7e-21ded7029bb5}" Type="Note" DisplayName="MediaServiceMetadata" Name="MediaServiceMetadata" />
<Field ID="{b887b6b2-4dcf-34fc-98b1-d5a42c605755}" Type="Note" DisplayName="MediaServiceFastMetadata" Name="MediaServiceFastMetadata" />
<Field ID="{d1cff744-ba61-4189-94d6-97d0a9eb4f6a}" Type="Text" DisplayName="MediaServiceAutoTags" Name="MediaServiceAutoTags" />
<Field ID="{67aff0cf-8e19-43f2-9987-be89075e1467}" Type="Note" DisplayName="MediaServiceOCR" Name="MediaServiceOCR" />
</Fields>
</ContentType>
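To summarize the differences across the environments we tested, the sketch below records the internal column names we observed in each and reports what is missing or extra relative to OneDrive. The sets are transcribed from the field definitions in this post; the environment labels and the `compare` helper are ours, and your tenant may well differ.

```python
# Internal column names observed in each environment in this post.
ONEDRIVE = {
    "MediaServiceMetadata",
    "MediaServiceFastMetadata",
    "MediaServiceEventHashCode",
    "MediaServiceGenerationTime",
    "MediaServiceAutoTags",
    "MediaServiceOCR",
    "MediaServiceLocation",
}
OBSERVED = {
    "modern site": (ONEDRIVE - {"MediaServiceEventHashCode", "MediaServiceGenerationTime"})
    | {"MediaServiceDateTaken"},
    "classic site": set(ONEDRIVE),
    "classic site, no geotagged files": {
        "MediaServiceMetadata",
        "MediaServiceFastMetadata",
        "MediaServiceAutoTags",
        "MediaServiceOCR",
    },
}


def compare(baseline, other):
    """Report columns missing from, and extra in, `other` relative to `baseline`."""
    return {"missing": sorted(baseline - other), "extra": sorted(other - baseline)}


if __name__ == "__main__":
    for environment, columns in OBSERVED.items():
        diff = compare(ONEDRIVE, columns)
        print(f"{environment}: missing={diff['missing']} extra={diff['extra']}")
```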
Conclusion
Office 365 image content recognition worked for us in OneDrive and in SharePoint classic and modern sites, extracting text, tags, and geotagged locations from our test images. It has only worked for us, however, on items using the out-of-the-box Document content type.
Image content recognition is a compelling feature, and we at Collaboration Foundry are looking forward to its future updates and to incorporating it into our client implementations.
We at Collaboration Foundry are experts in SharePoint and Office 365, including integration with Azure. If you need assistance, we can help. Contact us.