Saturday, September 26, 2009

Development of Virtual Humans - Early Years

The idea was to develop and evaluate a conversational sales assistant that would assist users in all stages of the CBB (Consumer Buying Behaviour) model, a model used to describe the entire range of consumer needs in online commerce environments. The algorithm alone would have been enough for an M.Phil contribution. Hence, with great enthusiasm, I started my work at Middlesex. The first task on my research list was the development of a fully working prototype. I designed a rather complex system (shown below along with an explanation) and began looking for proper authoring tools.

[Figure: organization of the dialogue system]

A discourse manager module cannot be evaluated unless it is part of a general dialogue system. The organization of the system needed to test this module is shown below. Starting at the top of the figure, the user communicates with the system textually through a standard desktop PC. The input is parsed into a sequence of speech acts based on the syntactic and semantic form of the utterance and sent to the discourse manager. The discourse manager sets up its initial conversation state and passes the sequence to the context component for identification of any lexical information (e.g. names, features, etc.), and then hands the acts to the reference resolution component.

This component has two duties. First, it assigns the correct referent to anaphoric referring expressions (e.g. the pronoun “her” in the sentence “Anne asked Edward to pass her the salt” refers to Anne). Second, where necessary, it performs illocutionary remapping of the speech acts assigned by the parser to fit discourse and reference cues. For instance, an utterance that consists of a REJECT “no” followed by a REQUEST “go via bank” will have the latter REQUEST remapped into the REJECT; it is essentially the content of the REJECT, not a separate REQUEST. After this processing, reference resolution returns the speech act(s), now in an internal format, to the discourse manager for further disposition.

These speech acts, however, are only the surface acts (i.e. literal acts) of the utterance. Often a surface speech act has a deeper intended or indirect meaning (i.e. an illocutionary act). Take, for example, the utterance “Can you pass the salt?”, which on the surface looks like a QUERY speech act (i.e. asking about your ability to pass the salt). However, the indirect speech act is most likely a REQUEST (i.e. a request to pass me the salt). The discourse manager detects and interprets indirect speech acts through an intention recognition process carried out in conjunction with the Task Manager.
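The illocutionary remapping step described above (a REJECT followed by a REQUEST collapsing into a single REJECT) can be illustrated with a minimal Python sketch. The `SpeechAct` class and the `remap_reject_request` function are hypothetical names introduced here for illustration; they are not taken from the original prototype, which was never built in this form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechAct:
    kind: str     # e.g. "REJECT", "REQUEST", "QUERY"
    content: str  # surface text carried by the act


def remap_reject_request(acts: List[SpeechAct]) -> List[SpeechAct]:
    """Fold a REQUEST that directly follows a REJECT into that REJECT."""
    result: List[SpeechAct] = []
    i = 0
    while i < len(acts):
        act = acts[i]
        nxt: Optional[SpeechAct] = acts[i + 1] if i + 1 < len(acts) else None
        if act.kind == "REJECT" and nxt is not None and nxt.kind == "REQUEST":
            # The REQUEST is treated as the content of the rejection,
            # not as a separate act.
            result.append(SpeechAct("REJECT", nxt.content))
            i += 2
        else:
            result.append(act)
            i += 1
    return result


acts = [SpeechAct("REJECT", "no"), SpeechAct("REQUEST", "go via bank")]
print(remap_reject_request(acts))
```

A real reference resolution component would rely on discourse and reference cues rather than simple adjacency; the adjacency test is only a stand-in for those cues.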
More specifically, the discourse manager sends each of the postulated acts to the Task Manager and asks whether this interpretation would ‘make sense’ given the current domain and planning context. Each of these postulates is then scored by the Task Manager, which allows the Discourse Manager (DM) to find the correct interpretation of the surface speech act (i.e. the intended problem-solving act). In addition to its indirect speech act duties, the discourse manager must convert the speech act semantics from a linguistic knowledge representation (produced by the parser) to a domain-specific, problem-solving knowledge representation (used by the Task Manager for intention recognition). For example, the linguistic semantics of the utterance “Buy the HP Palm from PC-World” might be:

(buy1

:agent SYSTEM

:theme HPPalm

:goal PC-World)

The domain-specific, problem-solving knowledge representation would correspond to some action in the domain (purchase) with some corresponding parameters. For example:

(purchase

:product HPPalm

:from-merchant PC-World

:price ?p)

The ?p is a variable, since the price was not explicitly mentioned in the utterance.
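The two duties of the discourse manager described above, choosing among postulated acts and converting the linguistic frame into a domain action, can be sketched in Python as follows. Everything here is an assumption made for illustration: the `ROLE_MAP` and `REQUIRED` tables, the function names, and the numeric scores of the stand-in Task Manager are invented and do not come from the original design.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping: linguistic predicate -> (domain action, role renaming)
ROLE_MAP: Dict[str, Tuple[str, Dict[str, str]]] = {
    "buy1": ("purchase", {":theme": ":product", ":goal": ":from-merchant"}),
}

# Hypothetical table: parameters each domain action expects, with the
# variable name to use when the utterance did not mention a value.
REQUIRED: Dict[str, Dict[str, str]] = {
    "purchase": {":product": "?prod", ":from-merchant": "?m", ":price": "?p"},
}


def pick_interpretation(candidates: List[str],
                        score: Callable[[str], float]) -> str:
    """Return the postulated act that the scoring function rates highest."""
    return max(candidates, key=score)


def toy_task_manager_score(act: str) -> float:
    # Made-up scores standing in for the Task Manager's judgement of
    # "Can you pass the salt?" in a dinner-table context.
    return {"QUERY": 0.2, "REQUEST": 0.9}.get(act, 0.0)


def to_domain(predicate: str,
              roles: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Convert a linguistic frame into a domain action with parameters."""
    action, renames = ROLE_MAP[predicate]
    # Roles without a domain counterpart (here :agent) are dropped.
    params = {renames[r]: v for r, v in roles.items() if r in renames}
    for slot, variable in REQUIRED[action].items():
        # Parameters not mentioned in the utterance stay as variables.
        params.setdefault(slot, variable)
    return action, params


print(pick_interpretation(["QUERY", "REQUEST"], toy_task_manager_score))
print(to_domain("buy1", {":agent": "SYSTEM", ":theme": "HPPalm",
                         ":goal": "PC-World"}))
```

In this sketch the price is left as `?p` because no role in the linguistic frame supplies it, which mirrors the example above.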

The output of the task manager and the interface agent is a set of surface speech acts that the system wants to communicate to the user. The generation module achieves this communication using the graphical user display. This module also uses prioritized rules to match requested speech acts to appropriate means of expressing them. For example, a request to the generator to inform the user about a specific product will result in telling the display to show the image of the product and, simultaneously, provide a textual description. The above architecture is by no means complete. We suspect that several modifications to the existing components, and the addition of new ones, will be made during the course of development. However, it serves as a good example of the great complexity of the project.
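The prioritized rule matching in the generation module can be sketched roughly as below. The `Rule` class, the rule priorities, and the `show_image` / `show_text` command strings are hypothetical placeholders; the original design does not specify how rules or display commands were to be represented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Act = Dict[str, str]  # a requested speech act, e.g. {"type": "INFORM", ...}


@dataclass
class Rule:
    priority: int                        # higher priority is tried first
    matches: Callable[[Act], bool]       # does this rule apply to the act?
    render: Callable[[Act], List[str]]   # display commands to issue


RULES = [
    # Inform the user about a product: show its image and a description.
    Rule(10,
         lambda a: a.get("type") == "INFORM" and "product" in a,
         lambda a: [f"show_image({a['product']})",
                    f"show_text(description of {a['product']})"]),
    # Fallback: express any other act as plain text.
    Rule(0,
         lambda a: True,
         lambda a: [f"show_text({a.get('text', '')})"]),
]


def generate(act: Act) -> List[str]:
    """Return the display commands of the highest-priority matching rule."""
    for rule in sorted(RULES, key=lambda r: r.priority, reverse=True):
        if rule.matches(act):
            return rule.render(act)
    return []


print(generate({"type": "INFORM", "product": "HPPalm"}))
print(generate({"type": "GREET", "text": "Hello"}))
```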

I spent several months searching the WWW. I have a massive URL library from companies that probably no longer exist. An example is http://www.agentscape.de/, which asked me for 15,000 USD for their authoring tools. To that, add a) a university that refused to spend any more money on me (it was already paying me a scholarship) and b) a supervisor who kept insisting that I follow that path, and you end up with the perfect recipe for career destruction. Under the massive pressure, instead of quitting, I decided to move to a less complex domain and area of research. It was clear that this project was better suited to a team of researchers with a perhaps unlimited budget. When I think about it today, I still do not know why my supervisor failed to see something so obvious!
