Learning in Artificial Intelligence

April 4, 2018 | Author: R Ravi Teja



Topics
• What is learning?
• Rote learning
• Learning by taking advice
• Learning in problem solving
• Learning from examples: induction
• Explanation-based learning
• Discovery
• Analogy
• Formal learning theory
• Neural net learning and genetic learning

What is Learning?
• One of the most often heard criticisms of AI is that machines cannot be called intelligent until they are able to learn to do new things and adapt to new situations, rather than simply doing as they are told. Some critics of AI have even claimed that computers cannot learn.
• One definition of learning: "changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time."
• Learning covers a wide range of phenomena:
– Skill refinement: practice makes skills improve. The more you play tennis, the better you get.
– Knowledge acquisition: knowledge is generally acquired through experience.

Various Learning Mechanisms
• Simple storage of computed information, or rote learning, is the most basic learning activity. Many computer programs, e.g., database systems, can be said to learn in this sense, although most people would not call such simple storage learning.
• Another way we learn is through taking advice from others. Advice taking is similar to rote learning, but high-level advice may not be in a form simple enough for a program to use directly in problem solving.
• People also learn through their own problem-solving experience.
• Learning from examples: we often learn to classify things in the world without being given explicit rules. Learning from examples usually involves a teacher who helps us classify things by correcting us when we are wrong.

Rote Learning
• When a computer stores a piece of data, it is performing a rudimentary form of learning: the programmer is a sort of teacher, and the computer is a sort of student.
• In data caching, we store computed values so that we do not have to recompute them later. When computation is more expensive than recall, this strategy can save a significant amount of time. Such caching is known as rote learning.
• Caching has been used in AI programs to produce some surprising performance improvements.
• Rote learning does not involve any sophisticated problem-solving capabilities, but it shows the need for some capabilities required of complex learning systems, such as:
– Organized storage of information
– Generalization
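To make the caching idea concrete, here is a minimal Python sketch of rote learning as value storage. The position tuples and the stand-in evaluation function are invented for illustration; real programs cache the results of genuinely expensive computations such as game-tree searches.

```python
# A minimal sketch of rote learning as caching: store computed values so
# they need not be recomputed. The "position" tuples and the evaluation
# below are illustrative placeholders for an expensive computation.

cache = {}  # maps each previously seen position to its stored value

def expensive_evaluation(position):
    """Stand-in for a costly computation, e.g., a deep game-tree search."""
    return sum(position)

def evaluate(position):
    """Rote learning: recall beats recomputation when available."""
    if position not in cache:
        cache[position] = expensive_evaluation(position)  # compute once, store
    return cache[position]                                # cheap recall later

print(evaluate((1, 2, 3)))  # computed and stored
print(evaluate((1, 2, 3)))  # answered from the cache
```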
Learning by Taking Advice
• A computer can do very little without a program for it to run. When a programmer writes a series of instructions into a computer, a rudimentary kind of learning is taking place: after being programmed, the computer is able to do something it previously could not.
• Executing a program may not be such a simple matter. Suppose the program is written in a high-level language such as Prolog: some interpreter or compiler must intervene to change the teacher's instructions into code that the machine can execute directly.
• FOO is a program that accepts advice for playing the card game hearts, such as "play high cards when it is safe to do so." A human user first translates the advice from English into a representation that FOO can understand. A human can then watch FOO play, detect new mistakes, and correct them through yet more advice.
• The ability to operationalize knowledge is critical for systems that learn from a teacher's advice. In chess, the advice "fight for control of the center of the board" is useless unless the player can translate it into concrete moves and plans. A computer program might make use of the advice by adjusting its static evaluation function to include a factor based on the number of center squares attacked by its own pieces.
• People process advice in an analogous way.

Learning in Problem Solving
• Can a program get better without the aid of a teacher? It can, by generalizing from its own experiences.

Learning by Parameter Adjustment
• Many programs rely on an evaluation procedure that combines information from several sources into a single summary statistic. Pattern classification programs often combine several features to determine the correct category into which a given stimulus should be placed. Game-playing programs do this in their static evaluation functions, in which a variety of factors such as piece advantage and mobility are combined into a single score reflecting the desirability of a particular board position.
• In designing such programs, it is often difficult to know a priori how much weight should be attached to each feature being used. One way of finding the correct weights is to begin with some estimate of the correct settings and then let the program modify the settings on the basis of its experience: features that appear to be good predictors of overall success have their weights increased, while those that do not have their weights decreased.
• Samuel's checkers program uses a static evaluation function in the polynomial form c1t1 + c2t2 + ... + c16t16. The t terms are the values of the sixteen features that contribute to the evaluation; the c terms are the coefficients attached to each of these values. As learning progresses, the c values change, as in the sketch below.
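The following Python sketch illustrates parameter adjustment in the spirit of Samuel's polynomial. The two features, the initial coefficients, and the simple error-driven update rule are assumptions made for illustration; they are not Samuel's actual features or training procedure.

```python
# Sketch of learning by parameter adjustment: the evaluation is a weighted
# sum c1*t1 + c2*t2 + ..., and the weights are nudged after each outcome.
# Features, coefficients, and the update rule are illustrative only.

def evaluate(coeffs, features):
    """Static evaluation: a single summary score from several features."""
    return sum(c * t for c, t in zip(coeffs, features))

def adjust(coeffs, features, outcome, predicted, rate=0.01):
    """Strengthen weights on features that predicted the outcome well,
    weaken the rest."""
    error = outcome - predicted
    return [c + rate * error * t for c, t in zip(coeffs, features)]

coeffs = [0.5, 0.5]            # initial estimates of the correct settings
features = [3.0, -1.0]         # e.g., piece advantage and mobility
predicted = evaluate(coeffs, features)
coeffs = adjust(coeffs, features, outcome=2.0, predicted=predicted)
print(coeffs)                  # the c values change as learning progresses
```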
Learning by Macro-Operators
• Sequences of actions that can be treated as a whole are called macro-operators. Macro-operators were used in the early problem-solving system STRIPS: after a solution is found, the learning component takes the computed plan and stores it away as a macro-operator, or MACROP. A MACROP is just like a regular operator, except that it consists of a sequence of actions, not just a single one.
• Example: suppose you are faced with the problem of getting to the downtown post office. Your solution may involve getting in your car, starting it, and driving along a certain route. Substantial planning may go into choosing the appropriate route, but you need not plan how to start the car. You are free to treat START-CAR as an atomic action, even though it really consists of several actions: sitting down, adjusting the mirror, inserting the key, and turning the key.

Learning by Chunking
• Chunking is a process similar in flavor to macro-operators. The idea of chunking comes from the psychological literature on memory and problem solving. Its computational basis is in production systems; SOAR is an example of a production system that uses chunking.
• When a system detects a useful sequence of production firings, it creates a chunk, which is essentially a large production that does the work of an entire sequence of smaller ones.
• Chunks learned during the initial stages of solving a problem are applicable in the later stages of the same problem-solving episode. After each problem-solving episode, the chunks remain in memory, ready for use in the next problem.
• At present, chunking is inadequate for duplicating the contents of large directly-computed macro-operator tables.

The Utility Problem
• While new search-control knowledge can be of great benefit in solving future problems efficiently, there are also some drawbacks. The learned control rules can take up large amounts of memory, and the search program must take the time to consider each rule at each step during problem solving. Considering a control rule amounts to seeing if its postconditions are desirable and seeing if its preconditions are satisfied. This is a time-consuming process.
• So while learned rules may reduce problem-solving time by directing the search more carefully, they may also increase problem-solving time by forcing the problem solver to consider them. If we only want to minimize the number of node expansions in the search space, then the more control rules we learn, the better. But if we want to minimize the total CPU time required to solve a problem, we must consider this trade-off.

Learning from Examples: Induction
• Classification is the process of assigning, to a particular input, the name of a class to which it belongs. Classification is an important component of many problem-solving tasks.
• Before classification can be done, the classes it will use must be defined. The classes from which the classification procedure can choose can be described in a variety of ways; their definition will depend on the use to which they are put:
– Isolate a set of features that are relevant to the task domain, and define each class by a weighted sum of values of these features. Example: if the task is weather prediction, the parameters can be measurements such as rainfall, location of cold fronts, etc.
– Isolate a set of features that are relevant to the task domain, and define each class as a structure composed of these features. Example: if the task is classifying animals, the features can be such things as color, length of neck, etc.
• The idea of producing a classification program that can evolve its own class definitions is called concept learning or induction.

Winston's Learning Program
• An early structural concept-learning program. It operates in a simple blocks-world domain; its goal was to construct representations of the definitions of concepts in the blocks domain. For example, it learned the concepts House, Tent, and Arch.
• Basic approach of Winston's program:
1. Begin with a structural description of one known instance of the concept. Call that description the concept definition.
2. Examine descriptions of other known instances of the concept. Generalize the definition to include them.
3. Examine descriptions of near misses of the concept. Restrict the definition to exclude these.
• A near miss is an object that is not an instance of the concept in question but that is very similar to such instances.

Version Spaces
• The goal of version spaces is to produce a description that is consistent with all positive examples but no negative examples in the training set. Version spaces work by maintaining a set of possible descriptions and evolving that set as new examples and near misses are presented.
• The version space is simply a set of descriptions, so an initial idea is to keep an explicit list of those descriptions. In practice, the version space is represented by two subsets of the concept space: one subset, called G, contains the most general descriptions consistent with the training examples; the other, called S, contains the most specific descriptions consistent with the training examples.
• The algorithm for narrowing the version space is called the candidate elimination algorithm (a toy implementation follows the algorithm below).

Algorithm: Candidate Elimination
• Given: a representation language and a set of positive and negative examples expressed in that language.
• Compute: a concept description that is consistent with all the positive examples and none of the negative examples.
1. Initialize G to contain one element: the null (most general) description.
2. Initialize S to contain one element: the first positive example.
3. Accept a new training example. If it is a positive example, first remove from G any descriptions that do not cover the example, then update S to contain the most specific set of descriptions in the version space that cover the example and the current elements of S. If it is a negative example, take the inverse actions.
4. If S and G are both singleton sets and they are identical, output their value and halt; otherwise, go to step 3.
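Below is a toy Python version of candidate elimination over simple attribute vectors, where '?' matches any value. The two-attribute example data are invented, and S is kept as a single hypothesis for brevity, so this is a simplified sketch of the algorithm rather than a full implementation.

```python
# Toy candidate elimination over attribute vectors; '?' matches anything.
# S is a single most-specific hypothesis (a simplification); G is the set
# of most-general hypotheses consistent with the examples seen so far.

def covers(hyp, example):
    return all(h == '?' or h == e for h, e in zip(hyp, example))

def generalize(s, example):
    """Minimally generalize S: relax mismatching attributes to '?'."""
    return tuple(si if si == e else '?' for si, e in zip(s, example))

def specialize(g, s, example):
    """Minimal specializations of g that exclude a negative example,
    drawing candidate values from the specific hypothesis s."""
    return [g[:i] + (s[i],) + g[i + 1:]
            for i, gi in enumerate(g)
            if gi == '?' and s[i] != '?' and s[i] != example[i]]

examples = [(('round', 'large'), True),    # positive
            (('square', 'large'), False),  # negative
            (('round', 'small'), True)]    # positive

S = examples[0][0]     # initialize S to the first positive example
G = [('?', '?')]       # initialize G to the null (most general) description

for x, positive in examples[1:]:
    if positive:
        S = generalize(S, x)
        G = [g for g in G if covers(g, x)]
    else:  # inverse actions for a negative example
        G = ([h for g in G if covers(g, x) for h in specialize(g, S, x)]
             + [g for g in G if not covers(g, x)])

print('S =', S, ' G =', G)  # here both converge to ('round', '?')
```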
Decision Trees
• Decision trees are a third approach to concept learning. ID3 is an example program for decision trees: it uses an iterative method to build up decision trees, preferring simple trees over complex ones, on the theory that simple trees are more accurate classifiers of future inputs.
• ID3 begins by choosing a random subset of the training examples. This subset is called the window. The algorithm builds a decision tree that correctly classifies all examples in the window.
• To classify a particular input, we start at the top of the tree and answer questions until we reach a leaf, where the classification is stored.
• [Diagram: decision tree for "Japanese economy car"]
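The heart of ID3 is choosing, at each node, the attribute whose test most reduces the entropy of the class labels (the information gain). The Python sketch below shows that selection step on an invented toy window of car examples; a full ID3 would recurse on each branch and grow the window iteratively.

```python
# Sketch of ID3's attribute-selection step: split on the attribute with
# the largest information gain. The toy "window" of car examples and the
# attribute names are invented for illustration.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Entropy reduction obtained by splitting the window on one attribute."""
    remainder = 0.0
    for value in set(ex[attr] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Toy window: is this a "Japanese economy car"?
examples = [{'origin': 'Japan', 'type': 'economy'},
            {'origin': 'Japan', 'type': 'sports'},
            {'origin': 'USA',   'type': 'economy'},
            {'origin': 'USA',   'type': 'sports'},
            {'origin': 'Japan', 'type': 'economy'},
            {'origin': 'Japan', 'type': 'sports'}]
labels = [True, False, False, False, True, False]

best = max(['origin', 'type'],
           key=lambda a: information_gain(examples, labels, a))
print('root attribute:', best)  # ID3 places the best question at the top
```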
Explanation-Based Learning
• Learning complex concepts using induction procedures typically requires a substantial number of training instances. But people seem to be able to learn quite a bit from single examples. We do not need to see dozens of positive and negative examples of fork positions in chess in order to learn to avoid this trap in the future and perhaps use it to our advantage.
• What makes such single-example learning possible? The answer is knowledge. Much of the recent work in machine learning has moved away from the empirical, data-intensive approach toward this more analytical, knowledge-intensive approach. A number of independent studies led to the characterization of this approach as explanation-based learning (EBL).
• An EBL system attempts to learn from a single example x by explaining why x is an example of the target concept. The explanation is then generalized, and the system's performance is improved through the availability of this knowledge.
• We can think of EBL programs as accepting the following as input:
– A training example.
– A goal concept: a high-level description of what the program is supposed to learn.
– An operationality criterion: a description of which concepts are usable.
– A domain theory: a set of rules that describe relationships between objects and actions in a domain.
• From this, EBL computes a generalization of the training example that is sufficient to describe the goal concept and that satisfies the operationality criterion.
• Explanation-based generalization (EBG) is an algorithm for EBL with two steps: (1) explain, (2) generalize.
– During the explanation step, the domain theory is used to prune away all the unimportant aspects of the training example with respect to the goal concept. What is left is an explanation of why the training example is an instance of the goal concept, expressed in terms that satisfy the operationality criterion.
– The next step is to generalize the explanation as far as possible while still describing the goal concept.

Discovery
• Learning is the process by which one entity acquires knowledge. Usually that knowledge is already possessed by some number of other entities who may serve as teachers. Discovery is a restricted form of learning in which one entity acquires knowledge without the help of a teacher. We consider three kinds:
– Theory-driven discovery
– Data-driven discovery
– Clustering

AM: Theory-Driven Discovery
• Discovery is certainly learning, but more clearly than other kinds of learning, it depends on problem solving. Suppose we want to build a program to discover things in mathematics: such a program would have to rely heavily on problem-solving techniques.
• AM, written by Lenat, worked from a few basic concepts of set theory to discover a good deal of standard number theory. AM exploited a variety of general-purpose AI techniques. It used a frame system to represent mathematical concepts, and one of its major activities is to create new concepts and fill in their slots. AM uses heuristic search, guided by a set of 250 heuristic rules representing hints about activities that are likely to lead to "interesting" discoveries.
• In one run, AM discovered the concept of prime numbers. How did it do it? Having stumbled onto the natural numbers, AM explored operations such as addition, multiplication, and their inverses. It created the concept of divisibility and noticed that some numbers had very few divisors.

BACON: Data-Driven Discovery
• AM showed how discovery might occur in a theoretical setting. Scientific discovery has inspired several computer models. Langley et al. presented a model of data-driven scientific discovery that has been implemented as a program called BACON (named after Sir Francis Bacon, a philosopher of science).
• BACON begins with a set of variables for a problem. For example, in the study of the behavior of gases, some variables are p, the pressure on the gas; V, the volume of the gas; n, the amount of gas in moles; and T, the temperature of the gas.
• Physicists have long known a law, called the ideal gas law, that relates these variables: pV/nT = 8.32 for all values. BACON is able to derive this law on its own. First, BACON holds the variables n and T constant, performing experiments at different pressures p1, p2, and p3. BACON notices that as the pressure increases, the volume V decreases.
• BACON's discovery procedure is state-space search. BACON has been used to discover a wide variety of scientific laws, such as Kepler's third law, Ohm's law, the conservation of momentum, and Joule's law.
• Much more work must be done in areas of science that BACON does not model. A better understanding of the science of scientific discovery may lead one day to programs that display true creativity.
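A toy rendering of BACON's strategy in Python: simulate experiments, then search a small space of candidate terms for one whose value is invariant across the data. The candidate terms and the simulated measurements are assumptions for illustration; BACON's real state-space search constructs such terms incrementally.

```python
# Toy BACON-style discovery: look for a combination of observed variables
# that is constant across experiments. The simulated measurements obey
# p*V = n*R*T with R = 8.32, the value reported in the text.

R = 8.32
experiments = [{'p': p, 'n': n, 'T': T, 'V': n * R * T / p}
               for p, n, T in [(100.0, 1.0, 300.0),
                               (200.0, 2.0, 350.0),
                               (300.0, 1.0, 400.0)]]

candidates = {                      # a tiny hand-built hypothesis space
    'p*V':       lambda e: e['p'] * e['V'],
    'p/V':       lambda e: e['p'] / e['V'],
    'p*V/(n*T)': lambda e: e['p'] * e['V'] / (e['n'] * e['T']),
}

for name, term in candidates.items():
    values = [term(e) for e in experiments]
    if max(values) - min(values) < 1e-6:               # invariant across runs?
        print(f'law found: {name} = {values[0]:.2f}')  # pV/nT = 8.32
```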
Clustering
• Clustering is very similar to induction. In inductive learning, a program learns to classify objects based on the labelings provided by a teacher. In clustering, no class labelings are provided; the program must discover for itself the natural classes that exist for the objects.
• AUTOCLASS is one program that accepts a number of training cases and hypothesizes a set of classes. For any given case, the program provides a set of probabilities that predict into which classes the case is likely to fall. AUTOCLASS uses statistical Bayesian reasoning.
• In one application, AUTOCLASS found meaningful new classes of stars from their infrared spectral data. This was an instance of true discovery by computer, since the facts it discovered were previously unknown to astronomy.

Analogy
• Analogy is a powerful inference tool. Our language and reasoning are laden with analogies:
– "Last month, the stock market was a roller coaster."
– "Bill is like a fire engine."
– "Problems in electromagnetism are just like problems in fluid flow."
• Underlying each of these examples is a complicated mapping between what appear to be dissimilar concepts. For example, to understand the first sentence above, it is necessary to do two things:
– Pick out one key property of a roller coaster, namely that it travels up and down rapidly.
– Realize that physical travel is itself an analogy for numerical fluctuations.
• This is no easy trick; the space of possible analogies is very large. An AI program that is unable to grasp analogy will be difficult to talk to and consequently difficult to teach. Thus analogical reasoning is an important factor in learning by advice taking. Humans often solve problems by making analogies to things they already understand how to do.

Formal Learning Theory
• Learning has attracted the attention of mathematicians and theoretical computer scientists. Inductive learning in particular has received considerable attention.
• Formally, a device learns a concept if it can, given positive and negative examples, produce an algorithm that will classify future examples correctly with probability 1/h.
• The complexity of learning a concept is a function of three factors: the error tolerance (h), the number of binary features present in the examples (t), and the size of the rule necessary to make the discrimination (f). If the number of training examples required is polynomial in h, t, and f, then the concept is said to be learnable.
• For example, given positive and negative examples of strings in some regular language, can we efficiently induce the finite automaton that produces all and only the strings in the language? The answer is no: an exponential number of computational steps is required.
• It is difficult to tell how such mathematical studies of learning will affect the ways in which we solve AI problems in practice. After all, people are able to solve many exponentially hard problems by using knowledge to constrain the space of possible solutions. Perhaps mathematical theory will one day be used to quantify the use of such knowledge, but this prospect seems far off.

Neural Net Learning and Genetic Learning
• In early neural network research, collections of idealized neurons were presented with stimuli and prodded into changing their behavior via forms of reward and punishment. Researchers hoped that by imitating the learning mechanisms of animals, they might build learning machines from very simple parts. Such hopes proved elusive. However, the field of neural network learning has seen a resurgence in recent years, partly as a result of the discovery of powerful new learning algorithms.
• While neural network models are based on a computational "brain metaphor," a number of other learning techniques make use of a metaphor based on evolution. In this work, learning occurs through a selection process that begins with a large population of random programs.
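As a minimal illustration of the reward-and-punishment idea, here is a single idealized neuron (a perceptron) learning the logical AND function in Python. The task, learning rate, and number of passes are arbitrary choices for the sketch, not drawn from any particular historical system.

```python
# A single idealized neuron trained by reward and punishment: whenever its
# answer is wrong, the weights are nudged toward the correct behavior.
# Task, learning rate, and epoch count are illustrative choices.

def output(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND
weights, bias, rate = [0.0, 0.0], 0.0, 0.1

for _ in range(20):                 # repeated presentation of the stimuli
    for x, target in data:
        error = target - output(weights, bias, x)   # punishment if wrong
        weights = [w + rate * error * xi for w, xi in zip(weights, x)]
        bias += rate * error

print([output(weights, bias, x) for x, _ in data])  # prints [0, 0, 0, 1]
```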