How does TML KnowledgeCloud achieve "small knowledge + big data = big knowledge"?


Today, with artificial intelligence technology developing rapidly, every industry faces the opportunities and challenges brought by informatization and intelligence. Whether the work is manual or mental, as long as it can be repeated according to a fixed process, it is gradually being taken over by machines. For example, robotic arms on a production line work tirelessly; the back-end system of a large e-commerce website can profile each consumer in real time and may even understand the consumer's needs better than the consumer does; the job of reviewing credit card applications at banks was almost entirely handed to machines long ago; and so on. The intelligent systems in these scenarios generally have a machine brain that allows the computer to think and make decisions like a human, or even like an expert. We call a computer's capacity to have such a brain cognitive computing power. A core step in building industry intelligence is to express the knowledge system accumulated by humans in a form that a computer can understand and execute, so that the computer can think and make decisions. The related techniques of knowledge representation, knowledge matching and knowledge reasoning are collectively referred to as knowledge graph (also translated as knowledge map) technology.

The knowledge graph is one of the core components of cognitive computing, and it has attracted many outstanding scientists and engineers to research and development in this field. IBM's Watson has been investing resources and effort in this direction for more than a decade; Dynamic Ontology, one of Palantir's core technologies, was refined over many years to solve similar problems. Domestically, some startups in the public security and financial sectors claim to have technology in this area, but their actual level is unknown, because the development history of Watson and Palantir suggests that knowledge graph technology needs years of refinement to mature. So why does it take so long to build a knowledge graph and, with it, a machine brain? Is it really that difficult? The main reasons are as follows:

1. Knowledge representation methods are still lacking: the knowledge system accumulated by human society takes many forms, and the representation methods that current machines can understand and execute cannot match them. For example, the triple, which describes a relationship between semantic targets, is widely used in the knowledge graph academic community. Yet it is so simple that it cannot directly express even a pattern with more than three elements, such as "someone held a certain position at a certain company at a certain time", let alone complex forms such as hierarchical (hypernym/hyponym) relationships and compositional relationships among multiple targets. Triples therefore lack the expressive power needed to model the knowledge system accumulated by humans. In addition, when semantic targets and their relationships are defined against a specific text context, the common means are basically regular expressions plus Boolean logic constraints, which cannot describe features such as the frequency, order and spacing of semantic targets. This is a key reason why current mainstream natural language processing techniques cannot extract knowledge effectively at large scale: the available formal methods are far from the natural definition of knowledge itself.

2. Cold start is difficult: in any scenario, the number of semantic targets that need to be defined and matched explodes as the analysis deepens. For example, when we build a knowledge graph for the legal field, the first thing that comes to mind is to model and extract basic targets such as the plaintiff and the defendant in each indictment and judgment. When we focus on one type of case, say intellectual property cases, there are hundreds of major elements of evidence and outcome to extract. When we want to go further and turn the computer into a senior intellectual property expert, we must list every element related to intellectual property comprehensively, common or not, and the number of elements grows by orders of magnitude. While defining these hundreds of case elements, we also need to define a large number of intermediate semantic elements. For example, to identify the plaintiff's address, we need to define semantic elements such as province, city, county, street and house number; to define the house number, we must in turn define the most basic semantic elements, such as Chinese numerals, Arabic numerals and English number words. In practice, therefore, it is usually necessary to define a million semantic targets. For machine learning to recognize these elements well, a sufficient number of representative samples must be labeled for each target, which takes a great deal of labor and creates a serious cold start problem.

3. The engineering is extremely difficult: in practice, because the knowledge system is complex and huge and the data to be analyzed is a large volume of unstructured text, building a knowledge graph from these texts is unusually hard engineering. For example, with thousands of semantic targets and their relationships, the definition text alone can reach hundreds of megabytes; targets may depend on each other and relationships may conflict with each other; and the difficulty of matching and reasoning over them efficiently against massive text can be imagined. For instance, when mining the statistical relationships between plaintiffs and defendants from tens of millions of court judgments, whether the system identifies the relationship type first or the plaintiff and defendant first may make a huge difference to its operating efficiency. If we want to provide a general knowledge graph modeling method and matching engine for cross-industry scenarios, we must take all of these problems into account, so that the method stays general across fields while still allowing fine enough segmentation and precise definition and identification of a large number of semantic targets within each field.
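To make the first difficulty (the limited expressiveness of triples) concrete, here is a minimal Python sketch. It is an illustration only: the names `Employment` and `reify` and the sample values are hypothetical and do not come from any TML tooling. It shows that a fact with four elements does not fit into one subject-predicate-object triple and has to be split into several triples that hang off an artificial node.

```python
from dataclasses import dataclass, asdict
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Employment:
    """An n-ary fact: who held which position at which company, and when."""
    person: str
    position: str
    company: str
    period: str


def reify(fact: Employment, node_id: str) -> List[Triple]:
    """Encode one n-ary fact as several triples around an artificial node."""
    return [(node_id, key, value) for key, value in asdict(fact).items()]


# Hypothetical sample data, for illustration only.
fact = Employment("Alice", "CTO", "ExampleCorp", "2015-2018")
triples = reify(fact, "employment_1")
for triple in triples:
    print(triple)
```

The single record becomes four triples that only make sense together, which is one reason a richer representation than the bare triple is attractive for modeling expert knowledge.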
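The layering described in the second difficulty, where an address is defined from a house number and a house number from basic numerals, can be sketched with composed regular expressions in Python. This is a deliberately tiny sketch under stated assumptions: the vocabularies, the English-style address layout and the function name `extract_address` are all hypothetical, and real definitions would be far larger.

```python
import re

# Basic semantic elements (tiny illustrative vocabularies).
ARABIC_NUMERAL = r"[0-9]+"
CHINESE_NUMERAL = r"[一二三四五六七八九十百]+"
ENGLISH_NUMBER = r"(?:one|two|three|four|five|six|seven|eight|nine|ten)"

# Intermediate elements defined in terms of the basic ones.
NUMBER = rf"(?:{ARABIC_NUMERAL}|{CHINESE_NUMERAL}|{ENGLISH_NUMBER})"
HOUSE_NUMBER = rf"No\.\s*{NUMBER}"

# Higher-level elements (hypothetical, very small lists).
STREET = r"[A-Z][a-z]+ (?:Road|Street|Avenue)"
CITY = r"(?:Nanjing|Suzhou|Wuxi)"
PROVINCE = r"(?:Jiangsu|Zhejiang)"

# The top-level target reuses every element defined above.
ADDRESS = re.compile(
    rf"(?P<house>{HOUSE_NUMBER}) (?P<street>{STREET}), "
    rf"(?P<city>{CITY}), (?P<province>{PROVINCE})"
)


def extract_address(text):
    """Return the address parts found in the text, or None if there are none."""
    match = ADDRESS.search(text)
    return match.groupdict() if match else None


sample = "The plaintiff lives at No. 12 Zhongshan Road, Nanjing, Jiangsu."
print(extract_address(sample))
```

Even this toy address needs seven supporting definitions, which hints at how quickly the number of semantic targets grows in a real domain.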

Building on the technology and algorithms it has accumulated over the past two decades, the core team at NetSense refined the TML cognitive computing platform and, on top of it, launched the TML KnowledgeCloud product, which better solves the problems above. The working principle of the TML cognitive computing platform can be summarized by the phrase "small knowledge + big data = big knowledge". It uses our custom set of formal methods to model the knowledge of various industries and the relationships within it, and then integrates industry expert knowledge with deep neural networks to deal with the inaccuracy, incompleteness and conflict that arise in knowledge acquisition and knowledge reasoning, ultimately allowing computers to make decisions and reason in an interpretable manner. This process can be described simply in the following three steps:

1. Representation and acquisition of knowledge: NetSense has designed its own programming language, TML (short for Text Mining Language), which provides a universal and powerful formal method for modeling knowledge systems in various scenarios. TML's descriptive power far exceeds that of triples. It provides a context-sensitive grammar for defining semantic targets, and its knowledge modeling has two aspects. First, it uses a generative grammar that allows users to define their own concepts and the hierarchical (hypernym/hyponym) relationships between them, so that one concept can be used as part of the definition of other concepts. Second, it offers a set of contextual operators that include the Boolean logic operators and can define the context of any number of concepts, such as their frequency, spacing and order. In the TML description system, a triple is just a simple special case of context. With TML, the knowledge of industry experts in any industry scenario can be written as TML code that the computer executes automatically, which better realizes the representation and computation of "small knowledge".

2. The work of the knowledge engine: the human knowledge system defined in TML is transformed by our self-developed compiler into an intermediate bytecode form that the computer can analyze and process efficiently. This bytecode is then loaded into our self-developed runtime virtual machine. When text is submitted to the virtual machine, every defined knowledge point and the relationships between them are matched accurately. One of TML's technical advances is that, through special mutation processing, the efficiency with which the virtual machine processes a large body of knowledge is essentially independent of the amount of knowledge. TML's runtime virtual machine thus becomes a powerful knowledge engine: massive amounts of data can be submitted to it, and it quickly matches each predefined knowledge point in the data. In this way we turn the experience and knowledge of experts into a program that the computer can process and execute automatically, which largely avoids the cold start problem of traditional methods that need large amounts of labeled data.

3. Machine learning integrates human knowledge: with a powerful knowledge engine as the foundation, we can integrate deep learning with the knowledge engine to address what is incomplete, inaccurate or conflicting in the expert-defined knowledge system. There are three main aspects to this work. The first is to find the subordinate terms (hyponyms) of each defined knowledge point automatically through deep learning. The second is to lightly label the matching results described above and use them as an annotated corpus for training the models of interest with machine learning algorithms. The third is to automatically compute a weight for each expert-defined rule in order to resolve conflicts between pieces of knowledge. Through this combination of measures, we can optimize and expand small knowledge on the basis of big data and obtain big knowledge. Our big knowledge engine compensates for and extends the limits of human experts, and the intelligence it provides is an interpretable artificial intelligence.
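The two ideas in step 1, concepts built from other concepts and contextual constraints on frequency, order and spacing, can be illustrated with a short Python sketch. This is not TML syntax, which the text does not show. The helper names (`positions`, `frequency_at_least`, `occurs_before`, `within_distance`) and the word lists are assumptions made purely for illustration.

```python
from typing import List, Set


def positions(tokens: List[str], concept: Set[str]) -> List[int]:
    """Token indices at which any term of the concept occurs."""
    return [i for i, tok in enumerate(tokens) if tok in concept]


def frequency_at_least(tokens, concept, n):
    """Frequency constraint: the concept occurs at least n times."""
    return len(positions(tokens, concept)) >= n


def occurs_before(tokens, first, second):
    """Order constraint: some occurrence of `first` precedes some occurrence of `second`."""
    a, b = positions(tokens, first), positions(tokens, second)
    return bool(a) and bool(b) and min(a) < max(b)


def within_distance(tokens, first, second, max_gap):
    """Spacing constraint: some pair of occurrences is at most max_gap tokens apart."""
    return any(abs(i - j) <= max_gap
               for i in positions(tokens, first)
               for j in positions(tokens, second))


# Concepts; PARTY is defined in terms of two lower-level concepts.
PLAINTIFF = {"plaintiff", "claimant"}
DEFENDANT = {"defendant", "respondent"}
PARTY = PLAINTIFF | DEFENDANT
INFRINGEMENT = {"infringed", "infringement"}
PATENT = {"patent", "patents"}

text = ("the plaintiff claims the defendant infringed its patent "
        "and seeks damages for the infringement")
tokens = text.split()

# Contextual constraints combined with ordinary Boolean logic.
rule_fires = (
    frequency_at_least(tokens, PARTY, 2)
    and occurs_before(tokens, PLAINTIFF, INFRINGEMENT)
    and within_distance(tokens, INFRINGEMENT, PATENT, 2)
)
print(rule_fires)
```

A plain keyword test would accept any text containing these words; the frequency, order and spacing checks are what narrow the match to the intended context.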
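Step 2 says that matching speed is essentially independent of the amount of knowledge, but the text does not describe how the TML compiler and virtual machine achieve this. The Python sketch below shows one well-known technique with a similar property, offered only as an analogy: phrases are compiled ahead of time into a token trie, so the work per token is a dictionary lookup regardless of how many phrases are defined. The names `compile_terms` and `match` and the sample concepts are hypothetical.

```python
END = object()  # sentinel key marking the end of a phrase; never equal to a token


def compile_terms(concepts):
    """Compile {concept: [phrases]} into a token-level trie of nested dicts."""
    root = {}
    for concept, phrases in concepts.items():
        for phrase in phrases:
            node = root
            for tok in phrase.split():
                node = node.setdefault(tok, {})
            node[END] = concept
    return root


def match(tokens, trie):
    """Return (start, end, concept) for the longest phrase found at each position."""
    hits = []
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        # Walk the trie as far as the text allows, remembering the longest hit.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (i, j, node[END])
        if best:
            hits.append(best)
            i = best[1]  # continue after the matched phrase
        else:
            i += 1
    return hits


# Hypothetical knowledge definitions, for illustration only.
concepts = {
    "COURT": ["supreme court", "court"],
    "PARTY": ["plaintiff", "defendant"],
}
trie = compile_terms(concepts)
tokens = "the plaintiff appealed to the supreme court".split()
print(match(tokens, trie))
```

In this sketch the matching cost grows with the length of the text and of the longest phrase, not with the number of phrases in `concepts`, which is the kind of behavior the text attributes to the knowledge engine.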
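The third aspect of step 3, a weight per expert rule to settle conflicts, can be sketched as follows. The text says the weights are computed automatically but gives no formula, so this Python example swaps in a simple stand-in: each rule's weight is its smoothed precision on a few labeled samples, and conflicting rules vote with those weights. The rule names, labels, samples and functions `learn_rule_weights` and `decide` are all hypothetical.

```python
from collections import defaultdict


def learn_rule_weights(rules, labeled_samples):
    """Weight each rule by its Laplace-smoothed precision on labeled samples.

    rules: {rule_name: (predicate, label)}
    labeled_samples: [(text, gold_label)]
    """
    weights = {}
    for name, (predicate, label) in rules.items():
        fired = [gold for text, gold in labeled_samples if predicate(text)]
        correct = sum(1 for gold in fired if gold == label)
        weights[name] = (correct + 1) / (len(fired) + 2)
    return weights


def decide(text, rules, weights):
    """Let every rule that fires vote for its label with its learned weight."""
    scores = defaultdict(float)
    for name, (predicate, label) in rules.items():
        if predicate(text):
            scores[label] += weights[name]
    return max(scores, key=scores.get) if scores else None


# Two expert rules that can conflict on the same text (illustrative only).
rules = {
    "mentions_patent": (lambda t: "patent" in t, "ip_case"),
    "mentions_contract": (lambda t: "contract" in t, "contract_case"),
}
labeled_samples = [
    ("patent infringement claim", "ip_case"),
    ("patent licence contract dispute", "ip_case"),
    ("breach of contract over delivery", "contract_case"),
    ("patent ownership dispute", "ip_case"),
    ("contract for patent transfer unpaid", "contract_case"),
]

weights = learn_rule_weights(rules, labeled_samples)
print(weights)
print(decide("patent contract disagreement", rules, weights))
```

Because every decision traces back to named rules and their weights, the outcome stays explainable, which matches the interpretability goal stated in the text.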

With the TML cognitive computing platform as the back-end infrastructure, we are able to launch the TML KnowledgeCloud product and provide knowledge graph and cognitive computing cloud services to customers and partners across a wide range of industries. For experts and scholars from all walks of life, TML KnowledgeCloud is a platform for turning their experience into machine brains that computers can execute automatically, and they share the benefits of these machine brains with NetSense. For vendors that develop informatization and intelligence solutions for various industries, TML KnowledgeCloud is a soft chip: through simple API calls and integration, they gain knowledge graph and cognitive computing capabilities without needing engineers versed in NLP and machine learning. We are gradually releasing into TML KnowledgeCloud the dozens of industry knowledge graphs, tens of thousands of semantic target elements and thousands of relationships between them that we have accumulated over the past years in the judicial, financial, medical, manufacturing and other industries, and providing API interfaces in SaaS mode that developers can use for free. Through collaboration with industry experts, TML KnowledgeCloud will provide knowledge graphs for more industries and scenarios and deliver a variety of cognitive computing capabilities that directly serve the business, such as semantic element extraction, association mining and logical reasoning.

Nanjing Net Sense Zhi Information Technology Co., Ltd. is a high-tech company whose core technology is its self-developed TML cognitive computing platform. Over the past three years it has delivered knowledge graph construction, knowledge reasoning, data insight and other capabilities to various customers and partners, including:

1. Extracting thousands of knowledge points from text to build a knowledge graph, and achieving deep semantic understanding of business documents such as electronic medical records, court judgments and financial announcements;

2. Helping to establish deep insight into massive unstructured and semi-structured business data, based on the accumulated business knowledge graphs of dozens of industry scenarios;

3. Establishing logical reasoning capabilities such as intelligent pre-diagnosis and intelligent pre-judgment, based on the knowledge graphs and historical case bases of each vertical field, to help build industry artificial intelligence applications.

NetSense has launched the TML KnowledgeCloud platform to bring together the knowledge of experts from all walks of life and build it into machine brains, providing cloud services such as the knowledge engine, knowledge extraction and logical reasoning to various customers. The product is empowering clients and partners in industries including justice and public security, healthcare, finance and insurance, smart manufacturing, public opinion and business intelligence, and retail and fast-moving consumer goods, to create practical enterprise-level artificial intelligence.