Text containing formal language elements considered more searchable

Searching for a piece of sample code is often much easier than searching for a solution described in natural language. This is because source code written in a programming language, against a particular code library or system interface, almost surely has an unambiguous vocabulary (e.g. class and function names) and/or consistent syntactic conventions. For example, to find code that gets the caret position from a Win32 edit control, searching for close occurrences of the keywords GetCaretPos and AttachThreadInput will immediately turn up a few sample code snippets for this purpose. Similar observations can be made about search systems for scientific data such as DNA sequences.

So I propose that natural language document composition and retrieval may also benefit from this principle. We may define and promote a controlled vocabulary (glossary) for a knowledge domain, and encourage information producers and searchers to use that vocabulary when creating and retrieving information in this domain. Information producers may also include one or more publicly recognized domain identifiers in their information, and information searchers can use the same kind of domain identifier to narrow their search.

Besides controlled terms, more complex formal language elements may be specified to formalize structured concepts in information. For example, a "win32_programming" knowledge domain may define a term "caret" and a possible action "get" associated with "caret". This is like defining a class "caret" in a C++ program and a member function "get" for this class. The combined formal expression "caret.get" then describes the structured concept "get the caret" more precisely than the words "caret" and "get" merely appearing near each other.

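
To make the difference concrete, here is a minimal sketch in Python, not a definitive design. It assumes that documents carry a domain identifier and embed formal expressions written as "term.action" in their text; the `Doc` class, the sample documents, the "web_programming" domain, and the function names are all hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    domain: str  # hypothetical publicly recognized domain identifier
    text: str


# Assumed notation for a formal expression: "term.action", e.g. "caret.get".
FORMAL_EXPR = re.compile(r"\b([a-z_]+)\.([a-z_]+)\b")


def words(text):
    """Plain natural-language tokens (lowercased runs of letters)."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def formal_exprs(text):
    """All 'term.action' expressions that appear in the text."""
    return {f"{term}.{action}" for term, action in FORMAL_EXPR.findall(text.lower())}


def keyword_search(docs, keywords):
    """Loose search: every keyword must occur somewhere in the text."""
    return [d.title for d in docs if all(k in words(d.text) for k in keywords)]


def formal_search(docs, domain, expr):
    """Strict search: same domain identifier and the exact formal expression."""
    return [d.title for d in docs if d.domain == domain and expr in formal_exprs(d.text)]


docs = [
    Doc("Reading the caret position", "win32_programming",
        "To read the position, use caret.get on the edit control."),
    Doc("Hiding the caret", "win32_programming",
        "Get the window handle first, then hide the caret."),
    Doc("Text cursor in a web editor", "web_programming",
        "Use caret.get to read the cursor offset."),
]

print(keyword_search(docs, ["caret", "get"]))
print(formal_search(docs, "win32_programming", "caret.get"))
```

With these sample documents the loose keyword search returns all three titles, including the one about hiding the caret, while the search on the domain identifier plus "caret.get" returns only "Reading the caret position".
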
Using formal semantics also enables a search engine to reliably deduce the implications of a search criterion, and therefore to return relevant results even when no exact match exists.

Yao Ziyuan