Metadata extraction is a crucial feature of any content management system, and I have written a couple of articles about metadata extraction in Alfresco CMS. If you have ever worked on such an extraction job, one question naturally comes to mind: what about the reverse scenario? In other words, is it possible to set properties on the binary content file from the Alfresco user interface? Normally, when content arrives in Alfresco, metadata extraction runs, the extracted metadata is stored in the Alfresco database, and the content itself is stored as a binary file in the content store. We can see that metadata in the Alfresco UI and change it (with some configuration), but those edits stay inside Alfresco. With metadata embedding we can go the other way as well: a property changed in Alfresco can also be written back to the original binary file.

Metadata embedding is a very new feature in Alfresco. I came to know about it through a blog post by one of the Alfresco employees, and here are some details I feel are worth sharing.

If you have gone through my metadata extraction posts, you will know that Alfresco uses the Apache Tika library internally to handle metadata extraction for the majority of file types.

There is a MetadataEmbedder interface with two methods: isEmbeddingSupported and embed.

MetadataExtracterRegistry is extended with a getEmbedder(String sourceMimetype) method. AbstractMappingMetadataExtracter implements the MetadataEmbedder interface mentioned above and contains:

  • a supportedEmbedMimetypes collection that is used in the isEmbeddingSupported call
  • an embedMapping that defines the mapping from Alfresco properties to metadata fields
  • an embedInternal method to be overridden by extending classes
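
To make the relationship between these pieces concrete, here is a minimal, self-contained sketch. It is not the Alfresco code: the names SketchEmbedder, SketchMappingEmbedder and PlainTextHeaderEmbedder are hypothetical, and the signatures are simplified assumptions (plain strings instead of Alfresco's property names, byte arrays instead of content readers and writers). It only shows how a supported-mimetype check, an embed mapping and an overridable embedInternal step can fit together.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the two-method embedder contract.
interface SketchEmbedder {
    boolean isEmbeddingSupported(String mimetype);
    byte[] embed(Map<String, String> properties, byte[] content);
}

// Hypothetical base class mirroring the three pieces listed above.
abstract class SketchMappingEmbedder implements SketchEmbedder {
    private final Set<String> supportedEmbedMimetypes;
    private final Map<String, String> embedMapping; // repository property -> metadata field

    protected SketchMappingEmbedder(Set<String> mimetypes, Map<String, String> mapping) {
        this.supportedEmbedMimetypes = mimetypes;
        this.embedMapping = mapping;
    }

    @Override
    public boolean isEmbeddingSupported(String mimetype) {
        return supportedEmbedMimetypes.contains(mimetype);
    }

    @Override
    public byte[] embed(Map<String, String> properties, byte[] content) {
        // Translate repository properties into file-level metadata fields,
        // silently skipping properties that have no mapping.
        Map<String, String> fields = new HashMap<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            String field = embedMapping.get(entry.getKey());
            if (field != null) {
                fields.put(field, entry.getValue());
            }
        }
        return embedInternal(fields, content);
    }

    // Format-specific step, left to subclasses.
    protected abstract byte[] embedInternal(Map<String, String> fields, byte[] content);
}

// Toy subclass: "embeds" metadata by prepending field=value lines to plain text.
class PlainTextHeaderEmbedder extends SketchMappingEmbedder {
    PlainTextHeaderEmbedder() {
        super(Set.of("text/plain"), Map.of("cm:title", "title", "cm:author", "author"));
    }

    @Override
    protected byte[] embedInternal(Map<String, String> fields, byte[] content) {
        StringBuilder out = new StringBuilder();
        // TreeMap gives a stable, alphabetical field order.
        new TreeMap<>(fields).forEach((k, v) -> out.append(k).append('=').append(v).append('\n'));
        out.append(new String(content, StandardCharsets.UTF_8));
        return out.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

In this sketch the base class owns the mapping and the mimetype check, so a subclass only has to know how to write fields into its own file format, which is the same division of labour the list above describes.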

If you are familiar with the metadata extraction flow, this is exactly the reverse of it, and you will be able to use it easily. Apart from the classes above, you need one more thing: a MyExtracter.embed.properties file, which is used for the reverse mapping. If you do not create this file, Alfresco falls back to the original mapping used for metadata extraction.
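
The fallback can be pictured with a small sketch. This is an illustration under assumptions, not Alfresco's implementation: EmbedMappingResolver is a hypothetical helper, and it assumes that "using the extraction mapping" means reading it in reverse, so that an extraction entry of the form field=property becomes an embed entry of the form property=field.

```java
import java.util.Properties;

// Hypothetical helper: picks the embed mapping, or derives one from the extract mapping.
class EmbedMappingResolver {

    /**
     * @param embedProps   contents of the embed properties file, or null if the file is missing
     * @param extractProps contents of the extraction mapping (metadata field -> property)
     * @return a mapping from property to metadata field
     */
    static Properties resolve(Properties embedProps, Properties extractProps) {
        if (embedProps != null) {
            return embedProps; // an explicit embed mapping always wins
        }
        // Assumed fallback: invert the extraction mapping.
        // If several fields map to the same property, only one of them survives.
        Properties reversed = new Properties();
        for (String field : extractProps.stringPropertyNames()) {
            reversed.setProperty(extractProps.getProperty(field), field);
        }
        return reversed;
    }
}
```

With an extraction mapping of author=cm:author and no embed file, the resolver in this sketch produces cm:author=author. Writing an explicit embed file is still the safer choice when the two directions are not strictly one-to-one.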

Since Alfresco uses Apache Tika underneath to handle the majority of file types, the next question is whether Tika supports this feature. Will it be able to handle the reverse mapping? The answer is yes: Apache Tika is in the process of developing support for it. On the Alfresco side, the Tika-based extracter overrides the embedInternal method defined in its parent, AbstractMappingMetadataExtracter, to convert Alfresco properties to Tika metadata fields. It passes those fields on to a Tika Embedder's embed method, which then passes back the new binary with the metadata embedded.

To expose this feature to end users, Alfresco has added a ContentMetadataEmbedder action executor, which shows up as a standard ‘Embed properties as metadata in content’ action that can be used in a rule on a folder. (This is available only in version 4.2.c and later, and you can find it in alfresco/extension/metadata-embedding-context.xml.sample.)

References: Metadata embedding

