Monday, March 16, 2015

Sitecore error with Lucene Thai Analyzer

ManagedPoolThread #1 2015:03:12 08:32:28 ERROR Exception
Exception: System.Reflection.TargetInvocationException
Message: Exception has been thrown by the target of an invocation.
Source: mscorlib
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at (Object , Object[] )
   at Sitecore.Pipelines.CorePipeline.Run(PipelineArgs args)
   at Sitecore.Jobs.Job.ThreadEntry(Object state)

Nested Exception

Exception: System.NotSupportedException
Message: PORT ISSUES
Source: Lucene.Net.Contrib.Analyzers
   at Lucene.Net.Analysis.Th.ThaiAnalyzer.ReusableTokenStream(String fieldName, TextReader reader)
   at Lucene.Net.Index.DocInverterPerField.ProcessFields(IFieldable[] fields, Int32 count)
   at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
   at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
   at Lucene.Net.Index.IndexWriter.UpdateDocument(Term term, Document doc, Analyzer analyzer)
   at Sitecore.ContentSearch.LuceneProvider.LuceneUpdateContext.UpdateDocument(Object itemToUpdate, Object criteriaForUpdate, IExecutionContext[] executionContexts)
   at Sitecore.ContentSearch.SitecoreItemCrawler.DoUpdate(IProviderUpdateContext context, SitecoreIndexableItem indexable)
   at Sitecore.ContentSearch.LuceneProvider.LuceneIndex.PerformUpdate(IEnumerable`1 indexableUniqueIds, IndexingOptions indexingOptions)

In a single day, we saw this error appear over 9000 times in a production environment.

From what I understand, Sitecore (7.0 and later) ships by default with a full mapping of all available Lucene.Net analyzers. They are configured under:
indexConfigurations > defaultLuceneIndexConfiguration > analyzer > param desc="map"
Based on the context of the content that's indexed/searched, Sitecore will (with reflection) figure out which mapping to use. Here’s a great post explaining execution contexts - http://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/08/execution-contexts-explained.aspx

Judging by the Lucene.Net source, the Thai analyzer seems to be a bit broken (read: not implemented). The analyzer calls the ThaiWordFilter constructor with a token stream, and that constructor simply throws the exception we see. You can decompile Lucene.Net.Contrib.Analyzers.dll or look at the source at http://lucenenet.apache.org/.

public ThaiWordFilter(TokenStream input): base(input)
{
  throw new NotSupportedException("PORT ISSUES");
  //breaker = BreakIterator.getWordInstance(new Locale("th"));
  //termAtt = AddAttribute<TermAttribute>();
  //offsetAtt = AddAttribute<OffsetAttribute>();
}

Removing or commenting out the Thai analyzer (the mapEntry below) from the execution context mappings in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config should make indexing/searching in th-TH fall back to the standard analyzer, which gets rid of the error in your log files.

              <mapEntry type="Sitecore.ContentSearch.LuceneProvider.Analyzers.PerExecutionContextAnalyzerMapEntry, Sitecore.ContentSearch.LuceneProvider">
                <param hint="executionContext" type="Sitecore.ContentSearch.CultureExecutionContext, Sitecore.ContentSearch">
                  <param hint="cultureInfo" type="System.Globalization.CultureInfo, mscorlib">
                    <param hint="name">th-TH</param>
                  </param>
                </param>
                <param desc="analyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
                  <param desc="defaultAnalyzer" type="Lucene.Net.Analysis.Th.ThaiAnalyzer, Lucene.Net.Contrib.Analyzers">
                    <param hint="version">Lucene_30</param>
                  </param>
                </param>
              </mapEntry>

If anyone has come across this before, I'd love to hear from you!


Update: Pavel Veller (@pveller) pointed out to me that this issue was fixed in Sitecore 7.2 Update 3. As per the release notes:
  • Thai Analyzer from Lucene.Net was not fully implemented and could sometimes throw Not Supported exceptions. The analyzer has been removed from the default Lucene index configuration. The default analyzer will be used instead. (420234)

Wednesday, March 11, 2015

Searchable Language Selector

If you have ever worked in a Sitecore instance with a lot of languages, you may have noticed that finding the language you need in the language picker can be quite time-consuming (and frustrating). This isn't so much a developer problem as an issue for content editors, who often make edits in multiple languages. So, here's a quick and easy client-side solution.

The language selector is generated by an XML control located here: \sitecore\shell\Applications\Content Manager\Galleries\Languages\Gallery Languages.xml

A couple of modifications to add a search box, plus a couple of JavaScript functions, and we now have a searchable language selector.
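For anyone curious about the idea, here is a rough sketch of the kind of client-side filtering involved. It is not the code from the modified control: the function names, the { name, code } shape, and the data-name / data-code attributes are all made up for illustration, and the real gallery markup may look quite different.

```javascript
// Hypothetical helper (not from the actual control): returns the languages
// whose display name or language code contains the search text,
// ignoring case and surrounding whitespace.
function filterLanguages(languages, query) {
  var needle = String(query || "").trim().toLowerCase();
  if (needle === "") {
    return languages.slice(); // an empty search box shows every language
  }
  return languages.filter(function (lang) {
    return lang.name.toLowerCase().indexOf(needle) !== -1 ||
           lang.code.toLowerCase().indexOf(needle) !== -1;
  });
}

// Hypothetical DOM wiring: optionElements is assumed to be an array of
// elements carrying data-name and data-code attributes (an assumption,
// not the real Gallery Languages markup). Non-matching options are hidden.
function applyFilter(optionElements, query) {
  var items = optionElements.map(function (el) {
    return {
      name: el.getAttribute("data-name") || "",
      code: el.getAttribute("data-code") || "",
      el: el
    };
  });
  var visible = filterLanguages(items, query);
  items.forEach(function (item) {
    item.el.style.display = visible.indexOf(item) !== -1 ? "" : "none";
  });
}
```

In a setup like this, applyFilter would be called from the search box's keyup handler with the current text, so the list narrows as the editor types. Keeping the matching logic in a separate function makes it easy to check without a browser.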



You can find the modified control up on GitHub. Let me know what you guys think!

Update: This modification is now also available for download from the Sitecore Marketplace.