Major AlchemyAPI Update

Posted by: eturner on June 18th, 2009

Today we’re announcing a significant upgrade to our AlchemyAPI content analysis online service. This update includes expanded language coverage (adding Portuguese and Swedish), enhanced text categorization, and integration with Linked Data standards.

The press release for this update is available here.

AlchemyAPI now supports analyzing content written in eight different languages, more than any other commercially available text mining service. We’re committed to supporting all of the world’s major languages, and this update moves us significantly closer to that goal. AlchemyAPI now understands the native language of over 1.2 billion individuals.

Significant updates have also been made to AlchemyAPI’s text categorization service, a mechanism that identifies content by subject category (health, politics, etc.). This release adds support for five new subject categories, improves categorization performance, and adds support for processing microblogging content.
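This post doesn’t describe how AlchemyAPI’s categorizer works internally. As a rough illustration of what statistical text categorization involves, here is a minimal multinomial naive Bayes sketch, a common textbook baseline and not AlchemyAPI’s engine. The class name, categories, and tiny training set are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multinomial naive Bayes categorizer with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocab = set()

    def train(self, text, category):
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def categorize(self, text):
        """Return the category maximizing log P(category) + sum log P(word | category)."""
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[category].values())
            for word in tokenize(text):
                count = self.word_counts[category][word]
                # Laplace (add-one) smoothing keeps unseen words from zeroing the score
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


if __name__ == "__main__":
    nb = NaiveBayesCategorizer()
    nb.train("doctors treat patients in the hospital", "health")
    nb.train("the senate vote followed the election", "politics")
    print(nb.categorize("patients at the hospital"))  # prints "health"
```

A production categorizer would train on far larger labeled corpora; the point here is only the shape of the computation.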

AlchemyAPI enters the “Semantic Web” world with integrated support for Linked Data standards. By interconnecting with assets in the Linked Data cloud, publishers gain access to a huge volume of highly relevant information, further enhancing the value of their content. AlchemyAPI integrates content links to online databases such as Wikipedia, GeoNames, the CIA World Factbook, and more. Linked Data integration also includes support for the RDF (Resource Description Framework) semantic web standard.
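To make the Linked Data idea concrete, here is a minimal sketch (not AlchemyAPI’s actual output format) that serializes an extracted entity as RDF in the N-Triples syntax, attaching a label and linking it with owl:sameAs to a DBpedia resource (DBpedia publishes Wikipedia content as Linked Data). The example.org entity URI and the function names are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def entity_to_ntriples(entity_uri: str, label: str, same_as: list[str]) -> str:
    """Emit one rdfs:label triple plus one owl:sameAs triple per linked resource."""
    lines = [f'<{entity_uri}> <{RDFS_LABEL}> "{escape_literal(label)}" .']
    for target in same_as:
        lines.append(f"<{entity_uri}> <{OWL_SAME_AS}> <{target}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical entity URI; the DBpedia URI follows DBpedia's resource naming.
    print(entity_to_ntriples(
        "http://example.org/entity/Denver",
        "Denver",
        ["http://dbpedia.org/resource/Denver"],
    ))
```

N-Triples is used here because it is the simplest line-oriented RDF serialization; the same triples could equally be written as RDF/XML or Turtle.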

These updates are available immediately to all new and existing AlchemyAPI subscription users. To learn more about AlchemyAPI, please visit http://www.alchemyapi.com/.

New Release, New Tools, Open Registration

Posted by: eturner on May 5th, 2009

A new release of AlchemyAPI is upon us!

Notable additions in this release include: updates to the AlchemyAPI programmer SDKs, secure SSL access to AlchemyAPI for subscription users, content mining / Named Entity Extraction from photographs of printed documents (OCR+Entity Extraction), and more.

We’ve also opened up AlchemyAPI registration to the general public: Register today

This release brings with it several new tools built around the AlchemyAPI content analysis services.  These include:

AlchemyTagger - Semantic-powered Tag Suggestions for WordPress Blogs

AlchemyTagger automatically works in the background as you’re blogging, analyzing your writing and suggesting useful tags for your posts. Tags make your posts easier to navigate, better-ranked by search engines, and can increase flows of relevant website traffic.

AlchemySEO - Semantic-powered Search Engine Optimization

AlchemySEO detects when a search engine is accessing your website and returns a semantically marked-up version of your content. Specifically, AlchemySEO annotates your web page content with REL-TAG Microformats and HTML META “keyword” tags. By exposing this semantic metadata to search engines such as Google and Yahoo, AlchemySEO improves your search engine rankings and increases flows of relevant traffic.
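A rough sketch of the two steps described above, assuming a simple substring check on the User-Agent header (“Googlebot” for Google, “Slurp” for Yahoo) and a list of keywords already extracted from the page. The function names and tag URL are hypothetical, and this is not AlchemySEO’s code; real crawler detection needs a maintained list and stronger verification than a header check.

```python
from html import escape
from urllib.parse import quote

# Hypothetical markers for the crawlers of the two engines named above.
CRAWLER_MARKERS = ("googlebot", "slurp")


def is_search_engine(user_agent):
    """Naive check: does the User-Agent header mention a known crawler?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def semantic_markup(keywords, tag_base_url):
    """Return a META keywords tag plus one rel-tag microformat link per keyword."""
    meta = f'<meta name="keywords" content="{escape(", ".join(keywords))}">'
    links = [
        # rel-tag: the tag name is the last path segment of the link target
        f'<a href="{escape(tag_base_url + quote(k, safe=""))}" rel="tag">{escape(k)}</a>'
        for k in keywords
    ]
    return "\n".join([meta] + links)


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    if is_search_engine(ua):
        print(semantic_markup(["Boulder", "text mining"], "http://example.com/tag/"))
```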

Orchestr8 will be exhibiting at Gluecon next week, May 12-13.  If you’re attending and would like to learn more about AlchemyAPI, please stop by our booth!

Orchestr8 presenting at BDNT on March 23rd

Posted by: eturner on March 20th, 2009

Orchestr8 will be demoing our AlchemyAPI service at the Boulder-Denver New Technology event on March 23rd.

Come join us to learn more about AlchemyAPI and its capabilities.

We have a surprise for this BDNT event: something special and entirely new in the world of NLP text mining. We’ve built a fun interactive demo to illustrate this new capability and will be giving everyone a chance to get “hands on” at BDNT.

Named Entity Disambiguation

Posted by: eturner on March 17th, 2009

We’re back with another big update to our AlchemyAPI content analysis / text mining service!

What’s new in this release?  Named Entity Disambiguation

Human language is not exact. Text referring to the city “Roanoke” can mean “Roanoke, Virginia” or “Roanoke, Texas”, depending on the surrounding context. Organizations and companies often have multiple nicknames, name variations, or common misspellings. Famous persons (“Michael Jackson”) often share a name with many non-famous individuals.

Named Entity Disambiguation works to solve these and other text ambiguity problems.

So how does it work?

Our disambiguation engine employs tens of millions of contextual hints describing traits of the world’s objects, individuals, and locations. We employ a variety of public and non-public data-sets.

Hints vary depending on the specific type of entity being disambiguated. For example, when disambiguating people, we utilize information on a person’s career, where they’re located, who they work for, and so on. For companies: key executives, notable products, industry, location, etc.

Whenever an entity is successfully disambiguated, additional information is returned in API responses. This includes the fully resolved, disambiguated entity name, and if available, the entity’s website and geographic coordinates.
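The toy sketch below illustrates the general idea of context-based disambiguation: each candidate reading of an ambiguous name carries a set of hint words, and the candidate whose hints best overlap the surrounding text wins. The hint sets are hand-written and hypothetical; the real engine’s hints, data sets, and scoring are not described here, and this sketch returns only the resolved name (no website or coordinates).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    resolved_name: str
    hints: frozenset  # context words suggesting this reading (hypothetical)


# Hypothetical hint sets, for illustration only.
CANDIDATES = {
    "roanoke": [
        Candidate("Roanoke, Virginia", frozenset({"virginia", "blue", "ridge"})),
        Candidate("Roanoke, Texas", frozenset({"texas", "fort", "worth", "dallas"})),
    ],
}


def disambiguate(mention: str, context: str) -> Optional[str]:
    """Pick the candidate whose hints overlap the context most, or None if unresolved."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scored = sorted(
        ((len(c.hints & words), c.resolved_name) for c in CANDIDATES.get(mention.lower(), [])),
        reverse=True,
    )
    if not scored:
        return None  # unknown mention
    best_score, best_name = scored[0]
    if best_score == 0 or (len(scored) > 1 and scored[1][0] == best_score):
        return None  # no supporting evidence, or a tie: leave it ambiguous
    return best_name


if __name__ == "__main__":
    print(disambiguate("Roanoke", "The Roanoke council met near Fort Worth, Texas."))
```

Returning None on a tie, rather than guessing, mirrors the post’s point that extra information is only attached when an entity is successfully disambiguated.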

AlchemyAPI’s Named Entity Disambiguation system resolves approximately two dozen entity types, more than any other commercially available text mining system!

Disambiguation functionality is available to all API preview / beta users.  If you do not currently have an API access key, please apply for one.

Also new in this release:

  1. Source text can now optionally be returned in all named entity and keyword extraction API call results.
  2. Updates to online API documentation.
  3. New developer SDKs for Ruby, C, and C++.

New SDKs, Website Updates

Posted by: eturner on February 17th, 2009

AlchemyAPI users, rejoice! We’ve just released SDKs for over a half-dozen programming languages (Java, .NET, Python, Perl, PHP, etc.) that enable easy integration of AlchemyAPI into your development project. For those of you who prefer not to use an SDK, our entire API is (and will always be) available as an Internet-accessible REST web service.

In other news: We’ve made some significant updates to the Orchestr8 website, in preparation for the AlchemyAPI general availability release.

We’ll be announcing something pretty exciting in the next week or two (a definite commercial “first” in the NLP / text mining world).  Stay tuned :)

Orchestr8 sponsoring Gluecon 2009!

Posted by: eturner on February 13th, 2009

Orchestr8 has signed on as an official sponsor / exhibitor for Gluecon 2009!

About Gluecon:

The idea that “the web is the platform” is now widely accepted among tech entrepreneurs. But even with the web as a common platform, we still find ourselves facing the same “stovepipe” problem. In the late 90s, the proliferation of separate enterprise application stovepipes of information, process and workflow led to the growth of “enterprise application integration”; today, the explosion of web-based applications will demand similar levels of web integration.

Glue is the only conference devoted solely to this new problem facing enterprise architects, developers and integrators. Glue is about all of the bits and pieces, APIs and meta-data, standards and connectors that will help us to glue together the varying applications of the new platform.

If you are interested in attending, Gluecon will be happening May 12-13th in Denver, Colorado.  You can register here.  We’ll be there, demoing our new AlchemyAPI semantic / contextual product offering.

Orchestr8 Profiled in Colorado Biz Magazine

Posted by: eturner on February 9th, 2009

Orchestr8 was recently profiled in Colorado Biz magazine.  You can read the article here.

Content Analysis API Updates

Posted by: eturner on January 22nd, 2009

We’ve just deployed a significant new update to the Orchestr8 Content Analysis API!

This release contains a number of exciting enhancements, including:

  1. JSON output support (easily integrate into any JavaScript web application).
  2. Document / Content uploading support (analyze private / non-web-accessible content).
  3. Updated API Documentation.
  4. Microformats output support (automatically generate rel-tag Microformat content).
  5. Enhanced named entity detection accuracy (improved detection of sports teams, radio stations, and more).

Beta users can take advantage of the new API release effective immediately.  If you are not currently a beta user and would like to apply for an API key, please contact Orchestr8 Support.

New Year & New Release

Posted by: eturner on January 12th, 2009

Happy 2009 from Orchestr8! We’re kicking off this new year with an exciting update to AlchemyGrid’s text analysis / entity extraction system.

We’ve blogged about text analysis and NLP (natural language processing) in the past. These capabilities are utilized within our contextual widget platform, and by our Content Analysis API.

Orchestr8 has put significant effort into building a robust natural language parsing capability: automated language identification, topic classification, sentence splitting / parsing, part-of-speech tagging, chunking, and so on. Our “language stack” is entirely statistical in its basis, operating in much the same fashion as speech recognition systems and Google’s automated translation service, Google Translate.
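Hidden Markov models are a classic statistical approach shared by speech recognizers and many part-of-speech taggers. The post doesn’t say which models Orchestr8 uses, so treat the following as an illustrative sketch only: a bigram HMM tagger decoded with the Viterbi algorithm, using a tiny hand-made (hypothetical) probability table rather than one estimated from annotated training text. Words are assumed to be lowercased.

```python
import math

TAGS = ["NNP", "VBZ", "NN"]
# Hypothetical toy probabilities; a real tagger estimates these from annotated text.
START = {"NNP": 0.6, "VBZ": 0.1, "NN": 0.3}
TRANS = {
    "NNP": {"NNP": 0.4, "VBZ": 0.4, "NN": 0.2},
    "VBZ": {"NNP": 0.3, "VBZ": 0.1, "NN": 0.6},
    "NN": {"NNP": 0.2, "VBZ": 0.5, "NN": 0.3},
}
EMIT = {
    "NNP": {"chu": 0.5, "obama": 0.5},
    "VBZ": {"runs": 0.7, "leads": 0.3},
    "NN": {"runs": 0.2, "lab": 0.8},
}
UNSEEN = 1e-6  # crude floor probability for words a tag never emitted


def viterbi(words, tags=TAGS, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []

    def log_emit(tag, word):
        return math.log(emit[tag].get(word, UNSEEN))

    # best[i][tag] = (log-prob of the best path ending in `tag` at word i, previous tag)
    best = [{t: (math.log(start[t]) + log_emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(trans[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit(t, words[i]), prev)
        best.append(column)

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


if __name__ == "__main__":
    # "runs" could be a verb (VBZ) or a noun (NN); context decides.
    print(viterbi(["chu", "runs", "lab"]))  # prints ['NNP', 'VBZ', 'NN']
```

Working in log space avoids numeric underflow on long sentences, which is the usual reason Viterbi implementations sum log-probabilities instead of multiplying probabilities.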

This new release represents a significant improvement in overall precision and recall, moving our system significantly closer to human-level tagging performance. What this means is, we’re now able to detect People, Companies, Locations, and other entity types even better than we were before. Existing API and widget users will see these improvements automatically; no changes to API calls are necessary.

We’re able to continually improve our language capabilities over time, due to the statistical nature of our approach and the fact that we’re steadily generating larger and larger sets of training materials. To create training materials for our system, we rely on a dedicated team of human annotators (something we’ve discussed in the past). Our annotation team generates hundreds of thousands of words of training materials each day, which are then used to teach our machine-learning algorithms, improving system accuracy.

We have some really exciting announcements planned for coming weeks / months related to the AlchemyGrid platform and its natural language processing capabilities. Stay tuned for more information.

Existing Users: Enjoy the new release, and feel free to contact us with any comments, questions, or feedback!

Everyone else: We’ll be opening up public access to the AlchemyGrid Contextual Analysis API very soon! If you can’t wait and would like to apply for early access, send us an email.

Statistical Language Processing & POS Tagging

Posted by: eturner on December 12th, 2008

This is the third in a rather technical series of posts on statistical language processing work being performed here at Orchestr8.

For a quick overview of statistical language processing, see this post.

We utilize these techniques in our AlchemyGrid platform for a wide variety of tasks: web content / data clipping, language identification, named entity extraction, keyword identification, and so on. Some of these tasks (such as named entity extraction) are rather complex, involving multi-stage text processing pipelines and large training corpora. Robust named entity extraction (identifying people, places, companies, etc. in text), relation extraction (identifying tuples: “[Bob] is-an-employee-of [Boeing]” in text), and other NLP techniques involve a whole series of processing steps, somewhat akin to a “language processing stack”.

One of these “processing steps” is POS tagging, otherwise known as part-of-speech tagging. This is the process of assigning a “part of speech” (noun, verb, etc.) to every word in a text sequence. POS tagging is similar to, though more complex than, the word-labeling tasks many of us had to do in early grammar school.

Modern techniques go beyond simple noun, verb, and adverb identification, using tagsets of 80 or even 130+ tags. This means identifying words as “semantically superlative adjectives”, “possessive plural proper nouns”, and so forth. Combined tags, negated tags, and other word forms complicate things further. For an example POS tagset, see this Wikipedia entry.

POS tagging gives additional insight into the meaning of text, enabling us to provide useful services, such as contextually-relevant widgets, contextual analysis APIs, etc. We custom-engineered a POS-tagging solution that is both fast (many thousands of words/sec tagging speed) and accurate (for you language geeks, 97+% accuracy in 10-fold cross-validation). Like the rest of our language stack, our POS tagging system can also be easily re-trained to tag foreign languages.
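For readers unfamiliar with the evaluation method, here is a minimal sketch of 10-fold cross-validation for a tagger: the tagged sentences are split into ten folds, each fold is held out once for testing while the other nine train the model, and the ten held-out accuracies are averaged. The model is a deliberately simple, hypothetical most-frequent-tag baseline (unknown words get the corpus’s most common tag), not Orchestr8’s tagger, and folds are formed by striding through the sentence list rather than by shuffling.

```python
from collections import Counter, defaultdict


def train_baseline(sentences):
    """Most-frequent-tag baseline: each word gets the tag it carried most often."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    default_tag = overall.most_common(1)[0][0]  # fallback for unseen words
    model = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return model, default_tag


def accuracy(model, default_tag, sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    pairs = [pair for sentence in sentences for pair in sentence]
    correct = sum(model.get(word, default_tag) == tag for word, tag in pairs)
    return correct / len(pairs)


def cross_validate(sentences, k=10):
    """Mean held-out accuracy over k folds (fold i = every k-th sentence from i)."""
    if k < 2 or len(sentences) < k:
        raise ValueError("need k >= 2 and at least k sentences")
    folds = [sentences[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model, default_tag = train_baseline(train)
        scores.append(accuracy(model, default_tag, folds[i]))
    return sum(scores) / k


if __name__ == "__main__":
    corpus = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
              [("morning", "NN"), ("runs", "NNS"), ("help", "VBP")]] * 10
    print(f"10-fold accuracy: {cross_validate(corpus):.3f}")
```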

So what does POS-tagged text look like, anyway?

Let’s take an example text sequence:

President-elect Barack Obama is likely to name Steven Chu, a physicist who runs the Lawrence Berkeley National Laboratory, as his energy secretary.

Here’s a POS-tagged version:

President-elect/JJ Barack/NNP Obama/NNP is/VBZ likely/JJ to/TO name/VB Steven/NNP Chu,/NNP a/DT physicist/NN who/WP runs/VBZ the/DT Lawrence/NNP Berkeley/NNP National/NNP Laboratory,/NNP as/IN his/PRP$ energy/NN secretary./NN

Each of these “/XX” tags represents a part-of-speech. For example, the words “Steven” and “Chu” are both tagged with “NNP”, indicating they are a “singular proper noun.”
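Text in this word/TAG format is easy to consume downstream. As a small, hypothetical sketch, the following splits each whitespace-separated token on its last slash and groups consecutive proper-noun tags (NNP, NNPS) into candidate names, a crude first step toward named entity extraction rather than a full entity extractor.

```python
def parse_tagged(text):
    """Split 'word/TAG' tokens; rsplit on the last slash keeps slashes inside words."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def proper_noun_chunks(pairs):
    """Group runs of consecutive proper-noun tokens into multi-word names."""
    chunks, current = [], []
    for word, tag in pairs:
        if tag in ("NNP", "NNPS"):
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    tagged = ("President-elect/JJ Barack/NNP Obama/NNP is/VBZ likely/JJ "
              "to/TO name/VB Steven/NNP Chu/NNP")
    print(proper_noun_chunks(parse_tagged(tagged)))  # prints ['Barack Obama', 'Steven Chu']
```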

We use machine learning and statistical modeling techniques to perform this and many other language processing and relevancy computation tasks. These components make up the various layers in our “language processing stack” and enable much of the truly exciting stuff you’re seeing from the AlchemyGrid: contextually relevant widgets, our contextual analysis API, and content enrichments.

Stay tuned for future posts that detail other components of our NLP engine — aka “the language stack”.
