Monday, December 14, 2009

Immortal Avatars - A unique use of avatar technology



Sometimes I wonder if there are any limits to the human imagination. Engineers at the University of Illinois at Chicago are trying to immortalize human beings through the use of avatar technology. The realization of this attempt is Alex the avatar (shown above). Alex is the digital double of Alex Pothen, the former director of the National Science Foundation. All I can say is WOW!!! Just imagine being able, 10-20 years from now, to transfer yourself into an AI brain and live forever as a digital entity. Would you do it if you had the opportunity? The idea alone creeps me out, but future generations may consider it an acceptable practice. The whole concept reminds me of the movie The Lawnmower Man, where the main character becomes so intelligent through the use of Virtual Reality technology that he manages to transfer himself into the WWW.

PopSci's Future Of: Immortal Avatars
http://science.discovery.com/videos/popscis-future-of-immortal-avatars.html

Avatar Movie

                                        

I am truly shocked and amazed. Avatar is the BEST movie I have ever seen. The Na'vi characters are simply breathtaking and way out of the uncanny valley (see below for a definition). That's an important aspect of every system that integrates Virtual Human technology: the more life-like a character is, the more believable it becomes and, hence, the easier it is for people to accept it in its assigned role (guide, actor, etc.). Avatar is the most tangible evidence of that. So what can we (the poor Virtual Human developers) do? How can we develop systems with Virtual Humans that people will accept? If you have access to a computer with 80,000 processors (that's what James Cameron used for the creation of Avatar), you are in HEAVEN. If not, then what? No matter how beautiful and life-like your creation is, it will only go as far as the real-time engine can take it. I have reviewed several real-time engines in the past years and nothing comes close to Avatar. Second Life has become popular despite its crappy graphics. But has anybody ever used Second Life for a serious application? There have been a number of research attempts at training people, but there is nothing serious for the common user (other than silly things to "kill" his/her time).

Def: The uncanny valley refers to the negative effect created when something approaches human appearance but isn't quite there; the result is a sense of creepiness. You can find an explanation of the term from James Cameron himself here: http://news.discovery.com/videos/tech-avatar-motion-capture-mirrors-emotions.html

Tuesday, December 1, 2009

Industry Survey

Below is a list of Character Engines I compiled a long time ago:

Open-source

1) The EMBR project (http://embots.dfki.de/EMBR/)

2) Ogre3D - www.ogre3d.com. An open-source real-time 3D engine, but without any kind of web player for content delivery.

Real-Time engines

1) Haptek - www.haptek.com. The full suite of tools is in the range of 7,000-10,000 USD.

2) Charamel - http://www.charamel.de/. A German avatar company with very realistic characters but also very expensive tools. A very interesting product of this company is Charavirld, which allows actual actors to control virtual characters in real time. They ask 10,000 Euros for their main development platform along with the player. However, for research purposes they offer a 6-month demo license for only 300 Euros.

3) QEDSOFT - http://www.qedsoft.com. A French company with a very interesting real-time 3D engine. Although I am not sure, their tools are likely very expensive.

Real-time but without a real-time web player

1) DA Group - http://www.digital-animations.com/. The creators of Ananova. The company offers a range of real-time content creation tools but there is no player for real-time delivery.

2) Codebaby - http://www.codebaby.com/. A Canadian company offering 3D characters through Flash.

3) Cantoche - http://www.cantoche.com/. A French company with cartoon-like avatars, again relying on Flash for content delivery.

4) VCom - http://www.vcom3d.com. An amazing technology for 3D character creation, again using Flash for content delivery.

5) IMS Interactive - http://www.ims3d.com/. The company uses Shockwave 3D for content delivery.

6) Visage - http://www.visagetechnologies.com. A Swedish company with an interesting technology but without a web player for content delivery.

7) Virtuoz- http://www.virtuoz.com. Another French company

8) SimGraphics - http://www.simg.com. A California company using a Wizard-of-Oz approach for bringing 3D characters to life.

Non real-time engines (Pseudo-3D)

1) Guile 3D - www.guile3d.com. Recently the company created Denise, a 3D photorealistic character with amazing visuals. There is no information about any kind of web player for content delivery.

2) Media Semantics - http://www.mediasemantics.com/. A very interesting and cheap technology using XML for character control and Flash for content delivery.

3) Gizmoz - http://www.gizmoz.com/. A photorealistic technology using Flash for content delivery.

4) http://www.karigirl.com. A virtual girlfriend

Face animation only

1) Crazytalk - http://www.reallusion.com/crazytalk/. A face-only creation tool, using mainly Flash for content delivery.

2) http://www.lifemi.com/. Another face-only company using Flash for web content delivery.

3) FaceFX - http://www.oc3ent.com/. A face-only tool mainly used for game development.

Avatar communities

1) Second Life - http://secondlife.com/. Everyone knows Second Life.

2) The Blue Mars project (http://www.bluemarsonline.com/). In terms of graphics, Blue Mars is far, far better than SL. Download the installer (be careful, it is massive: 1.01 GB) and try it. The 3D worlds are simply AMAZING.... (that's my avatar in BM, by the way)

3) http://www.imvu.com/. Very similar to Second Life, but with far more impressive graphics.

4) Google Lively - http://www.lively.com. A rather unsuccessful attempt by Google to mimic Second Life.

5) Coming soon: Star Trek Online - http://www.startrekonline.com/

6) Entropia Universe. Mainly for Sci-Fi fans - http://www.entropiauniverse.com/index.var

7) ActiveWorlds - http://www.activeworlds.com/

8) Kaneva - https://www.kaneva.com

9) The new Amsterdam - https://www.kaneva.com

Character Languages

Several attempts have been made to standardize avatar creation. Currently there are two major trends, one in Japan and the other in the Western world:

1) Behaviour Markup language - http://wiki.mindmakers.org/projects:bml:main

2) Multimodal Presentation Markup Language 3D (MPML3D) -

http://research.nii.ac.jp/~prendinger/MPML3D/MPML3D.html

Wednesday, September 30, 2009

Haptek Characters

Haptek is the best example of how overpriced products can fail your business. Their real-time 3D engine, although originally designed for Windows 95 (can you believe that? 14 years ago), is still the best on the market. I heavily modified their fullBod character for the needs of my project, and it now comes with:
 
1) 2,000 animations that cover almost every project's needs.
2) Dynamic clothes: pants, top, and shoes. Character clothes are separate models animated to flow naturally with every character movement (even walking). The clothes have more than 2,000 animations to achieve that. The animations and clothes took more than a year to complete.
3) Several textures for both top and pants.
4) A script parser that allows automatic creation and cleaning of Haptek scripts.


The Talos script parser as I call it, features the following:
 
a) Automatic creation of scripts ready to be executed by the Haptek engine. For example, Talos will convert the following text into a script that accurately animates the character, top, pants and shoes in sync with the text:
(open_tag) BOOKMARK mark=‘anim,suggest’ (close_tag) Can I suggest that you have a look at the wall (open_tag)BOOKMARK mark=‘anim,show_sea’(close_tag) that goes down the bold cliff and towards the sea?

Note: open_tag and close_tag stand for the standard opening and closing tags; for some reason I cannot display them here correctly.

b) Accurate extraction of tags from already-tagged texts. These tags (shown above in red) can easily be used in Talos to tag texts in any language (a small sketch of one way such extraction can be done is shown after this list).

c) Single/batch conversion of any text to .ogg format (the sound format used by the Haptek engine).
d) Ability to use SAPI 5.0-compatible tags in your texts. Usage of Loquendo-compatible tags is also possible. These tags add realistic effects (e.g., sneezing, laughter) to Loquendo TTS voices (e.g., http://tts.loquendo.com/ttsdemo/default.asp?page=id&language=en), but they work only with the Loquendo voices.
e) Limited automatic tagging of texts. The Talos script parser allows you to automatically and accurately tag a given text. It currently works only with a limited set of texts, but I am working hard to extend it.
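
To give an idea of how the tag extraction in (b) can work, here is a minimal sketch in VB.NET. It is not the actual Talos code; the tag format follows the BOOKMARK example above, and the function and module names are just for illustration:

Imports System.Text.RegularExpressions

Module TagExtractionSketch
    ' Minimal sketch (not the actual Talos parser): pulls the animation names out of
    ' BOOKMARK tags of the form mark='anim,NAME' and returns the clean text.
    Sub ExtractBookmarks(ByVal taggedText As String, _
                         ByRef animations As List(Of String), ByRef plainText As String)
        animations = New List(Of String)
        ' Accept both straight and curly quotes around the mark value.
        Dim bookmarkPattern As String = "<BOOKMARK\s+mark=['‘]anim,([^'’]+)['’]\s*/?>"
        For Each m As Match In Regex.Matches(taggedText, bookmarkPattern, RegexOptions.IgnoreCase)
            animations.Add(m.Groups(1).Value)   ' e.g. "suggest", "show_sea"
        Next
        ' The plain text (with all tags removed) is what gets sent to the TTS engine.
        plainText = Regex.Replace(taggedText, bookmarkPattern, "")
        ' Collapse the double spaces left behind by the removed tags.
        plainText = Regex.Replace(plainText, "\s{2,}", " ").Trim()
    End Sub
End Module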
If you want to use the character in your projects please email me at: virtual.guide.systems at googlemail.com. An example of a script created by the script parser:
#Haptek Version= 2.00 Name=LocC_HB.hap HapType= script FileType= text
## world It
##prereq= none
\clock [t= 0.0] \load [file= [sounds\LocC_HB.ogg]]
\clock [t= 0] \SetSwitch [figure= fullBod switch= during state= start]
\clock [t= 3] \SetSwitch [figure= fullBod switch= propably state= start]
\clock [t= 5.1] \SetSwitch [figure= fullBod switch= used state= start]
\clock [t= 9] \SetSwitch [figure= fullBod switch= when state= start]
\clock [t= 11] \SetSwitch [figure= fullBod switch= retake state= start]
\clock [t= 13.1] \SetSwitch [figure= fullBod switch= converted state= start]
\clock [t= 18] \SetSwitch [figure= fullBod switch= retake state= start]
\clock [t= 21] \SetSwitch [figure= fullBod switch= and_the state= start]
\clock [t= 23] \SetSwitch [figure= fullBod switch= become_again state= start]
\clock [t= 26.4] \SetSwitch [figure= fullBod switch= after state= start]
\clock [t= 31] \SetSwitch [figure= fullBod switch= while state= start]
\clock [t= 35] \SetSwitch [figure= fullBod switch= today state= start]
\clock [t= 38.75] \SetSwitch [figure= fullBod switch= from_the state= start]
\clock [t= 40.85] \SetSwitch [figure= fullBod switch= name state= start]
\clock [t= 42.7] \SetSwitch [figure= fullBod switch= contradictory2 state= start]
\clock [t= 44.6] \SetSwitch [figure= fullBod switch= exist_near state= start]
\clock [t= 48] \SetSwitch [figure= fullBod switch= large2 state= start]
\clock [t= 51] \SetSwitch [figure= fullBod switch= find_series state= start]
\clock [t= 52.8] \SetSwitch [figure= fullBod switch= important state= start]
\clock [t= 57] \SetSwitch [figure= fullBod switch= whole state= start]
\clock [t= 62] \SetSwitch [figure= blackboard switch= enter state= on]
\clock [t= 62] \settexture [figure= blackboard tex= museum1.jpg]
\clock [t= 63.9] \SetSwitch [figure= fullBod switch= museum1 state= start]
\clock [t= 66.1] \settexture [ figure= blackboard tex= museum2.jpg]
\clock [t= 66.1] \SetSwitch [figure= fullBod switch= museum2 state= start]
\clock [t= 68.3] \SetSwitch [figure= fullBod switch= as_well state= start]
\clock [t= 78] \SetSwitch [figure= blackboard switch= enter state= close]
\clock [t= 80] \SetSwitch [figure= fullBod switch= plateia_9 state= start]
\clock [t= 87] \SetSwitch [figure= fullBod switch= contradictory2 state= start]
\clock [t= 91] \SetSwitch [figure= fullBod switch= propably state= start]
\clock [t= 98] \SetSwitch [figure= fullBod switch= named state= start]
\clock [t= 101] \SetSwitch [figure= fullBod switch= no_help state= start]
\clock [t= 105] \SetSwitch [figure= fullBod switch= plateia_10 state= start]
\clock [t= 109] \SetSwitch [figure= fullBod switch= which state= start]
\clock [t= 114] \SetSwitch [figure= fullBod switch= which state= start]
\clock [t= 118] \SetSwitch [figure= fullBod switch= point state= start]
\clock [t= 124] \SetSwitch [figure= fullBod switch= of_course_left state= start]
\clock [t= 132] \SetSwitch [figure= fullBod switch= if_else state= start]
\counter [name= [LocC_HBcounter] i0=0 f0= 1]

Current Developments - Computer Vision

OpenCV is an open-source library for computer vision from Intel. Using its head-tracking functionality, I was able to construct a simple algorithm that detects the location of the user's head in the 3D environment (right, left, far away, close, etc.). Using this information, you can then implement some really fantastic scenarios in any programming environment. For example, in my prototypes I have the following scenarios:

face_detection
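
Here is a minimal sketch of the head-position logic described above, assuming the face detector (OpenCV, or any wrapper around it) has already returned the bounding box of the face in the camera frame. The thresholds are arbitrary and the names are mine, not part of OpenCV:

Module HeadPositionSketch
    ' Classifies where the user's head is, given the face bounding box returned by
    ' the face detector and the size of the camera frame. Thresholds are arbitrary.
    Function ClassifyHeadPosition(ByVal faceX As Integer, ByVal faceY As Integer, _
                                  ByVal faceWidth As Integer, ByVal faceHeight As Integer, _
                                  ByVal frameWidth As Integer, ByVal frameHeight As Integer) As String
        ' How much of the frame the face occupies is a rough proxy for distance.
        Dim coverage As Double = (faceWidth * faceHeight) / CDbl(frameWidth * frameHeight)
        If coverage > 0.25 Then Return "close"
        If coverage < 0.03 Then Return "far-away"

        ' Horizontal position of the face centre relative to the frame.
        Dim centreX As Double = faceX + faceWidth / 2.0
        If centreX < frameWidth * 0.35 Then Return "left"
        If centreX > frameWidth * 0.65 Then Return "right"
        Return "centre"
    End Function
End Module

The Virtual Human can then, for example, turn towards the "left" or "right" result, or try to regain the user's attention when the head has been "far-away" for too long.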

Face recognition is also possible with OpenCV but it doesn't offer much more than face detection. A good idea would be to integrate both into a single component. Ideally such a component would lock onto a face, and provide its coordinates to the application. Finally somehow such a component should also integrate emotion recognition abilities, an important attribute for Virtual Humans.

A VH looking at the direction of my face:


Natural Language Processing

 
VPF is a web service where "you can create virtual people for a variety of uses. Currently the most common use of the Virtual People Factory is to create Virtual Patients for Medical and Pharmacy education.".
Other free VH hosting services are the following:


Although I am sure you can cite systems more advanced than these (which often exist only as claims), these three systems are: a) publicly available, b) fully functional, and c) free to use. Although none of the three systems actually processes language, VPF is by far the most effective of them. As the VPF people were kind enough to provide me with their script-matching algorithm and a fully functional API, I was able to address this limitation.
 
VPF currently relies on a score-matching algorithm that determines the likelihood that a user's input matches a trigger (i.e., a phrase entered into the system by the content developer). In my approach, simple keyword matching is the last stage of processing, used only if all other stages fail. In particular, the algorithm proceeds as follows:

Stage 1.

Compares the user's input and the trigger returned by VPF for common tokens and POS tags. If the comparison is successful (all tokens match), it returns the answer associated with the VPF trigger. If the comparison fails (the keywords are not equal), it moves on to the next stage.

Stage 2.

Conducts a series of predicate tests on the input against a DB of predefined phrases. If the comparison is successful, it passes the phrases to VPF for 100% matching. If the comparison fails, it moves on to the final stage, where the system reverts back to the simple VPF keyword matching.

Stage 3.

VPF keyword matching.
Semantic processing and comparison was also implemented as an additional stage of the algorithm, but as it has some problems that need to be addressed first, it was decided not to use it in the final prototype. The plan is to integrate the VPF script-matching algorithm into the existing one and create a four-stage approach (with semantic processing included) that will enable the NLU component of the Talos authoring tool to fully process the user's input before matching it with a trigger.
Another fully developed idea for Talos is the creation of a dialogue manager based on HTN (Hierarchical Task Networks), but as it currently exists only on paper, I would prefer not to discuss it any further.
 

Code for Stage 1

Sub Syntactic_Keyword_Processing(ByVal userinput As String)
        'we need to load the tagger first
        Try
            tagger_counter += 1
            If tagger_counter = 1 Then
                load_tagger()
            Else
                'tagger is already loaded; don't load it again
            End If
            'remove punctuation and contractions first
            Dim contractions As List(Of String) = New List(Of String)(New String() _
           {"didn't", "'ll", "'re", "lets", "let's", "'ve", "'m", "won't", "'d", "'s", "n't"})
            Dim word_contractions As List(Of String) = New List(Of String)(New String() _
        {"did not", "will", "are", "let us", "let us", "have", "am", "will not", "would", "is", "not"})
            Dim end_line As String = "[\,\?\!\.]|\s+$"
            Dim start_line As String = "[\,\?\!]|^\s+"
            Dim userinput2 As String = Regex.Replace(userinput, end_line, "")
            Dim userinput3 As String = Regex.Replace(userinput2, start_line, "")
            'remove contractions
            For Each item As String In contractions
                If userinput.Contains(item) Then
                    Dim cont_position As Integer = contractions.IndexOf(item)
                    Dim what_word As String = word_contractions.Item(cont_position)
                    userinput3 = Regex.Replace(userinput3, item, Space(1) & what_word)
                    Exit For
                End If
            Next
            Dim ask_step As String = "step_1"
            Select Case ask_step
                Case "step_1"
                    hr.addUserInput(userinput)
                    Dim response As String = hr.findResponses(Current_Script)
                    Dim index As Integer = hr.findMostRelevantResponse()
                    VPF_trigger = hr.getResponseMatchedSentence(index)
                    If VPF_trigger <> "" Then
                        'Perform syntactic comparison between the input and the trigger
                        syntactic_keyword_comparison(userinput3)
                        If comparison = "Sucessful" Then
                            _answer = hr.getResponseMatchedSpeech(index)
                            input_list.Clear()
                            trigger_list.Clear()
                        Else
                            ask_step = "step_2"
                        End If
                    End If
            End Select
            Select Case ask_step
                Case "step_2"
                    If new_question <> "" Then
                        hr.addUserInput(new_question)
                        Dim response As String = hr.findResponses(Current_Script)
                        Dim index As Integer = hr.findMostRelevantResponse()
                        _answer = hr.getResponseMatchedSpeech(index)
                        input_list.Clear()
                        trigger_list.Clear()
                    End If
            End Select
        Catch ex As Exception
            output.Clear()
            output.Text += ex.Message + Environment.NewLine
        End Try
    End Sub
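
The syntactic_keyword_comparison routine called above is not shown here. Roughly, the Stage 1 token/POS check it performs looks like the sketch below. This is a simplified sketch rather than the exact code; TagTokens stands for a call into the POS tagger loaded by load_tagger() and is assumed to return "word/TAG" pairs:

    ' Simplified sketch of the Stage 1 comparison: the user's input and the VPF
    ' trigger match only if they contain the same tokens with the same POS tags.
    Sub syntactic_keyword_comparison(ByVal userinput As String)
        ' TagTokens is a stand-in for the POS tagger call; it returns pairs like "castle/NN".
        Dim inputTokens As List(Of String) = TagTokens(userinput)
        Dim triggerTokens As List(Of String) = TagTokens(VPF_trigger)

        input_list.Clear()
        trigger_list.Clear()
        input_list.AddRange(inputTokens)
        trigger_list.AddRange(triggerTokens)

        ' Every tagged token of the trigger must also appear in the input.
        Dim allMatch As Boolean = (inputTokens.Count = triggerTokens.Count)
        If allMatch Then
            For Each pair As String In triggerTokens
                If Not inputTokens.Contains(pair) Then
                    allMatch = False
                    Exit For
                End If
            Next
        End If

        ' The spelling of the flag matches the check in Syntactic_Keyword_Processing above.
        If allMatch Then
            comparison = "Sucessful"
        Else
            comparison = "Failed"
        End If
    End Sub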

Tuesday, September 29, 2009

Current Developments - Talos Authoring tool

Talos: A Virtual Human Authoring tool

Talos is an authoring environment designed to enable content developers to create Virtual Human systems for the domain of mobile guidance. Although the system has been designed with paper guide-book creators in mind, it can also be used in other application domains. The final design of Talos is very complex and includes several different modules. I think it would take a programming team of 5 people a year of intense work to complete.

Talos came about as an idea when I realised that there are virtually no tools on the market that would enable me to rapidly create the prototypes I needed for my research work. From the final design of Talos, I was only able to implement some of its ideas: those needed to implement my final prototype systems.


1) A script parser to automatically create Haptek scripts. A screenshot of the parser is shown below:


The tool is described in more detail in the topic "Current Developments - Haptek Characters". The script parser of the full Talos environment follows a similar approach, but in a fully automated and real-time fashion. Talos will accept pure text as input and generate full character performances as output. Of course, the problem with this approach is how to make the character point correctly to objects in its background. This requires the character to have knowledge of its environment. My design in this area is incomplete. A possible solution would be to divide the background of the character into segments and associate each segment with a keyword (a rough sketch of this idea is given at the end of this post). I need to do more research in this area.
2) A simple UI (user interface) tool that enabled me to create, without too much effort, the AIML KBs needed for my final prototype systems. In more detail, the UI includes:
a) An AIML\XML KB creator.
b) An AIML creator for existing question-sets (e.g., questions that you may already have in XML format)
c) A translator for both AIML/XML KBs
d) A scene/character modifier where you can modify various scene and character settings (e.g., props, backgrounds, etc.)
e) A GPS integration module, where you can assign Long and Lat coordinates to various scripts and test them in real time
f) A speech recognition module where you can load a grammar and test it
g) A Cyc-creator module. This module was supposed to automate the creation of CYC queries suitable for insertion into AIML scripts. However, the API of ResearchCyc is extremely tough to crack (perhaps because it is pure Java) and I ended up with just a simulation.
From the above, I only used "a", "b", and "c" in the creation of my final prototypes. The rest of the features were not of any value to the actual development, but they led to several improvements in the final Talos design. The UI tool is available for free. If you want to experiment with Virtual Humans, AIML and XML KBs, or with making location-sensitive scripts for your characters, it is the best way to start.

A screenshot of the UI tool
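
As for the segment idea mentioned in (1) above, a rough sketch of how background segments could be mapped to keywords and pointing animations is given below. All keyword, segment and animation names are made-up examples, not actual Talos data:

Module BackgroundSegmentSketch
    ' Minimal sketch of the "divide the background into segments" idea:
    ' keywords found in the text map to a pointing animation towards the
    ' corresponding background segment. All names here are examples.
    Private keywordToAnimation As Dictionary(Of String, String)

    Sub SetupSegments()
        keywordToAnimation = New Dictionary(Of String, String)
        keywordToAnimation.Add("wall", "point_upper_left")
        keywordToAnimation.Add("cliff", "point_lower_left")
        keywordToAnimation.Add("sea", "point_lower_right")
    End Sub

    ' Returns the pointing animation for the first known keyword in the text,
    ' or an empty string if the text mentions nothing in the background.
    Function PointingAnimationFor(ByVal text As String) As String
        If keywordToAnimation Is Nothing Then SetupSegments()
        For Each entry As KeyValuePair(Of String, String) In keywordToAnimation
            If text.ToLower().Contains(entry.Key) Then
                Return entry.Value
            End If
        Next
        Return ""
    End Function
End Module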


Current Developments - Haptek Clothing

After spending several months building these clothes, I realised that they are too complex to be handled by the limited hardware of the UMPC device. Hence, I decided to give them away for free. The clothes come with several textures, animations, morphs, etc. They are designed for the character (called the guide) I use in my systems. I will possibly release the character once I am done with the evaluation stage of my project.

You can find them here:

Monday, September 28, 2009

Current Developments - MGUIDE Prototypes

Prototype 1:

 

1. A 3D agent with more than 2,000 gestures and several facial expressions. The agent uses this body and facial language to augment the location presentations and navigation instructions provided.
2. A 3D agent that is aware of its environment and can dynamically draw the user's attention during a presentation about a location. For example, if the user is looking around for a certain period of time, it can call the user's attention back to the presentation.
3. A 3D agent with fully dynamic 3D clothing, whose textures the user can change during system configuration.
4. A 3D agent capable of using additional multimedia information (on a 3D board) to further enhance the information conveyed (mainly in presentation mode).
5. A Finite State Machine (FSM) dialogue manager capable of dynamically displaying questions based on the user's selection and the current context (a minimal sketch of this approach appears after this list). The questions cover a very broad range of the possible questions/clarifications that a user can ask after a presentation about a location.
6. 12 information scenarios based on what the castle has to offer to the potential visitor both culturally and historically. The total content (presentations and questions) is more than 10 hours long.
7. Customization of the agent's voice (designed but not implemented).
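
For item 5, a minimal sketch of how an FSM dialogue manager of this kind can be organised is shown below. The states and questions are invented for the example; this is not the actual MGUIDE code:

Module FsmDialogueSketch
    ' Minimal sketch of a finite-state dialogue manager: each state determines
    ' which follow-up questions are displayed, and the user's selection moves
    ' the FSM to the next state. States and questions are made-up examples.
    Enum DialogueState
        LocationPresentation
        QuestionMenu
        AnswerGiven
    End Enum

    Private currentState As DialogueState = DialogueState.LocationPresentation

    ' Returns the questions to display after the presentation of a location.
    Function QuestionsFor(ByVal location As String) As List(Of String)
        Dim questions As New List(Of String)
        If currentState = DialogueState.LocationPresentation Then
            questions.Add("When was the " & location & " built?")
            questions.Add("Who lived in the " & location & "?")
            currentState = DialogueState.QuestionMenu
        End If
        Return questions
    End Function

    ' Called when the user picks a question from the menu.
    Sub OnQuestionSelected(ByVal question As String)
        ' ...look up and play the answer script associated with this question...
        currentState = DialogueState.AnswerGiven
    End Sub
End Module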

Screenshot 1: The animated agent points to an image on the 3D board


Screenshot 2: The animated agent gestures as she speaks

Prototype 2:

Similar to the first system but with one additional feature: QR-Code-based navigation. A QR-Code is a barcode capable of storing up to 4,296 characters in a simple geometrical shape. The system uses a QR-Code recognition algorithm to recognize the location that the user is currently in (the user must photograph the QR-Code in order for the system to process it).
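
A minimal sketch of what happens once the QR-Code has been decoded (the decoding itself is done by the recognition algorithm) is shown below. The "LOC:" payload format is only an assumption made for the example:

Module QrNavigationSketch
    ' Minimal sketch: maps the text decoded from a photographed QR-Code to the
    ' script of the corresponding location. The "LOC:" payload format is an assumption.
    Function ScriptForQrPayload(ByVal payload As String) As String
        If Not payload.StartsWith("LOC:") Then
            Return ""                       ' not one of our navigation codes
        End If
        Dim locationId As String = payload.Substring(4).Trim()
        ' e.g. "castle_gate" becomes "scripts\castle_gate.hap"
        Return "scripts\" & locationId & ".hap"
    End Function
End Module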
 

Prototype 3:

Similar to the other systems, but it focuses only on the provision of navigation instructions. At the moment the system uses only photographs of landmarks, but other, more automated methods (e.g., GPS positioning) have also been considered.
 

Prototype 4:

It features one information scenario only, along with the characteristics of the first prototype and:
2) Dynamically changing voice recognition grammars that allow the user to interact with the system using only his/her voice.
3) Natural language processing abilities using the Stanford/Link parser. The system utilizes a novel algorithm that allows it to conduct predicate analysis and scored keyword matching (if the first stage fails). The second stage of analysis is conducted by a secondary web system (Virtual People Factory).
4) A highly experimental search/comparison algorithm utilizing Semantic interpretation of the user’s input.

Screenshot 3: The system preferences of prototype 3 (Natural Language Processing version)


Sunday, September 27, 2009

Development of Virtual Humans - Early Years

Domain: Mobile Guides

After the first 1.5 years of my M.Phil work, I finally found heaven. The area of mobile guides is a relatively simple domain, but one with much to offer. Tourism in several countries (e.g., Greece) is considered a vital part of the economy. In 2005 I began working on a system that a) provides navigation instructions in a specific area (i.e., a medieval castle in Greece) and b) provides information on selected locations of the castle (e.g., historical information). The system was authored in the Macromedia Director MX 2004 multimedia environment and used:

a) The AIMLPad ActiveX control
b) The Maddy character from DA Group Plc.
c) A location-sensing algorithm that triggered specific scripts at specific coordinates in the castle (a minimal sketch of such a check appears after this list).
d) A speaker-independent ASR engine provided by Babel Technologies.
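
For (c), a minimal sketch of how such a location-sensing check can work is shown below: each script is associated with a Lat/Long point, and the script fires when the GPS reading comes within a given radius of it. The distance is computed with the standard haversine formula; the names and the radius parameter are examples, not the original code:

Module LocationTriggerSketch
    ' Minimal sketch of the location-sensing idea: a script fires when the user's
    ' GPS position is within triggerRadius metres of the script's coordinates.
    Private Const EarthRadiusMetres As Double = 6371000.0

    Function DistanceMetres(ByVal lat1 As Double, ByVal lon1 As Double, _
                            ByVal lat2 As Double, ByVal lon2 As Double) As Double
        ' Haversine formula.
        Dim dLat As Double = (lat2 - lat1) * Math.PI / 180.0
        Dim dLon As Double = (lon2 - lon1) * Math.PI / 180.0
        Dim a As Double = Math.Sin(dLat / 2) ^ 2 + _
                          Math.Cos(lat1 * Math.PI / 180.0) * Math.Cos(lat2 * Math.PI / 180.0) * _
                          Math.Sin(dLon / 2) ^ 2
        Return 2 * EarthRadiusMetres * Math.Asin(Math.Sqrt(a))
    End Function

    ' Returns True when the GPS reading is close enough to the script's coordinates.
    Function ShouldTrigger(ByVal userLat As Double, ByVal userLon As Double, _
                           ByVal scriptLat As Double, ByVal scriptLon As Double, _
                           ByVal triggerRadius As Double) As Boolean
        Return DistanceMetres(userLat, userLon, scriptLat, scriptLon) <= triggerRadius
    End Function
End Module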

The system accepted input through a menu of static English phrases and buttons. It generated no output apart from the minor movement of the character's body on the screen. After field testing the system, I realised that a) the GPS component was not working because of the rocky environment of the castle, and b) the ASR engine could not distinguish the user's voice from the noise in the environment. These two components were removed and a formal evaluation took place in November 2005. The goal of the evaluation was to investigate how such an agent would affect the accessibility and usability of a mobile guide system.

 

In parallel with the development of the prototype, I also had the opportunity to mess around with ResearchCyc, the most advanced knowledge base on the planet. The design of the system is shown below.
 
 

The idea was to build a KB that would contain all the castle knowledge and integrate it with the prototype system. Back then, this was an ideal solution for me, as with a simple fact in the KB (Yannis Ritsos is a poet) the system could answer questions like:
 
“Who is Yannis Ritsos”
“What is the profession of Yannis Ritsos” (poet is a profession)
“Name a poet of Greece” (Greece is a nation and Greeks its inhabitants)
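
In CycL, the language the KB is written in (real CycL output is shown further below), such a fact is a single assertion along these lines; the constant names here are only indicative of the idea, not the actual constants used:

(#$isa #$YannisRitsos #$Poet)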

I used the CynD AIML interpreter. The system uses specially modified AIML to access the KB's NL facilities, query the KB, and then return the results. For example:
Given the AIML code below and the input sentence (note that, as with the tags above, parts of this example do not display correctly here):
WHAT COUNTRIES BORDER *
(#$isa #$Country)
Consulting Cyc returns: What countries border ?
 
The system calls the ResearchCyc parser (through the custom cycquestion tag) and translates the input into its CycL (the language in which the KB is written) representations:
((#$and
(#$isa ?WHAT-ONE-1 #$Country)
(#$politiesBorderEachOther #$Greece ?WHAT-ONE-1))
(#$and
(#$bordersOn #$Greece ?WHAT-ONE-1)
(#$isa ?WHAT-ONE-1 #$Country))
(#$and
(#$isa ?COUNTRY #$IndependentCountry)
(#$politiesBorderEachOther #$Greece ?COUNTRY))
(#$and
(#$bordersOn #$Greece ?COUNTRY)
(#$isa ?COUNTRY #$IndependentCountry)))
It then iterates through the list of parses and tries to query the KB to get a possible answer. In this particular example the system returns:
 
Consulting Cyc returns:
0 (((?COUNTRY . Bulgaria))((?COUNTRY . Macedonia))((?COUNTRY . Albania))((?COUNTRY . Turkey))).
1. (((?COUNTRY . Bulgaria))((?COUNTRY . Macedonia))((?COUNTRY . Albania))((?COUNTRY . Turkey)))
2. (((?WHAT-ONE-1 . Bulgaria))((?WHAT-ONE-1 . Macedonia)) ((?WHAT-ONE-1 . Albania))((?WHAT-ONE-1 . Turkey)))
3.(((?WHAT-ONE-1 . Bulgaria))((?WHAT-ONE-1 . Macedonia))((?WHAT-ONE-1 . Albania))((?WHAT-ONE-1. Turkey)))
Finally, it calls the NL generator of the KB and translates the results back to English:
 
Consulting Cyc returns: 0 Bulgaria Macedonia Albania Turkey. 1 Bulgaria Macedonia Albania Turkey. 2 . Bulgaria Macedonia Albania Turkey 3. Bulgaria Macedonia Albania Turkey

Problems:

1) CynD will try to query the KB for every single interpretation of the user's question. In the above example, the KB returned the same answer for all parser interpretations. Some kind of tool was needed that would automatically generate AIML categories incorporating all possible parser interpretations, allow the system's author to select the one that produced the answer he expected and make the system's answer more natural. There was no time for that.
 
2) In an open Q-A dialogue with a system in my domain (mobile guides), the user could ask questions about random objects that he may encounter in his path, something like "What is that?" for a church or a particular building. The disambiguation of "that" to the actual object name and location in the physical environment is an extremely difficult problem -- one of the many that NLP has to face in the future. At the moment, there is no location technology that can allow a computer to distinguish between two objects located at a close distance in a physical environment.
 
3) My work focuses on animated agents. If the dialogue with the system is unknown... how could I possibly generate the proper animations for each output of the CynD engine?
 
4) Hardware resources. Such a system demands massive hardware resources... and resources in an M.Phil project are always scarce.

Lessons learned:

1) The pilot evaluation suggested a) several improvements to the experimental design and b) several design improvements to the prototype.

2) The need to produce more user insights. This could only be accomplished by developing more prototypes, varying different attributes of virtual humans (e.g., competence, modality of communication, etc.).

3) The need for technological differentiation. It is impossible to continue with the simplistic approach of this prototype, as the research would lead me nowhere. The domain of VH is an extremely competitive area, with groups of researchers producing massive contributions to both technology and knowledge every single month.

4) My rather bumpy experience in developing the prototype also suggested the need for an authoring tool that would enable non-programmers to rapidly develop complex Virtual Human systems.


Saturday, September 26, 2009

Development of Virtual Humans - Early Years

The idea was to develop and evaluate a conversational sales assistant that would assist users in all stages of the CBB (Consumer Buying Behaviour) model - a model used to describe the entire range of consumer needs in on-line commerce environments. The algorithm alone would have been enough for an M.Phil contribution. Hence, with great enthusiasm, I started my work at Middlesex. The first task on my research list was the development of a fully working prototype. I designed a rather complex system (shown below, along with an explanation) and began looking for proper authoring tools.


A discourse manager module cannot be evaluated if it is not part of a general dialogue system. The organization of the system needed to test this module is shown below. Starting at the top of the figure, the user communicates with the system textually through a standard desktop PC. The input is parsed to a sequence of speech acts based on the syntactic and semantic form of the utterance and sent to the discourse manager. The discourse manager sets up its initial conversation state, passes the sequence to the context for identification of any lexical information (e.g., names, features, etc.), and then hands the acts to reference resolution. This component has two duties. First, it assigns the correct referent to anaphoric referring expressions (e.g., the pronoun her in the sentence "Anne asked Edward to pass her the salt" refers to Anne). Then, if necessary, it does illocutionary remapping of the speech acts assigned by the parser, as needed to fit discourse and reference cues. For instance, an utterance that consists of a REJECT "no" followed by a REQUEST "go via bank" will have the latter REQUEST remapped into the REJECT; it is essentially the content of the REJECT, not a separate REQUEST. After this processing, reference resolution returns the speech act(s), now in an internal format, to the discourse manager for further disposition.

These speech acts, however, are only the surface acts (i.e., literal acts) of the utterance. Oftentimes, a surface speech act has a deeper intended or indirect meaning (i.e., an illocutionary act). Take, for example, the utterance "Can you pass the salt?", which on the surface looks like a QUERY speech act (i.e., asking about your ability to pass the salt). However, the indirect speech act is most likely a REQUEST (i.e., a request to pass me the salt). The discourse manager detects and interprets indirect speech acts through an intention recognition process done in conjunction with the Task Manager. More specifically, it sends each of the postulated acts to the Task Manager and asks whether this interpretation would 'make sense' given the current domain and planning context. Each of these postulates is then scored by the Task Manager, which allows the Discourse Manager (DM) to find the correct interpretation of the surface speech act (i.e., the intended problem-solving act).

In addition to its indirect speech act duties, the discourse manager must convert the speech act semantics from a linguistic knowledge representation (produced by the parser) to a domain-specific, problem-solving knowledge representation (used by the Task Manager for intention recognition). For example, the linguistic semantics of the utterance "Buy the HP Palm from PC-World" might be:

(buy1
  :agent SYSTEM
  :theme HPPalm
  :goal PC-World)

The domain specific, problem-solving knowledge representation would correspond to some action in the domain (purchase) with some corresponding parameters. For example:

(purchase

:product HPPalm

:from-merchant PC-World

:price ?p)

The ?p is a variable, since the price was not explicitly mentioned in the utterance.
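
A minimal sketch of the conversion described above, mapping the linguistic frame to the problem-solving one, is shown below. Representing the frames as simple dictionaries (and the role names used) is only for illustration, not part of the original design:

Module FrameConversionSketch
    ' Minimal sketch of converting the linguistic semantics of "buy" into the
    ' domain-specific "purchase" action used by the Task Manager.
    ' Frames are represented here simply as dictionaries of role -> value.
    Function ToDomainFrame(ByVal linguistic As Dictionary(Of String, String)) As Dictionary(Of String, String)
        Dim domain As New Dictionary(Of String, String)
        domain.Add("action", "purchase")
        ' The :theme of buy1 becomes the :product of purchase.
        domain.Add("product", linguistic("theme"))
        ' The :goal of buy1 becomes the :from-merchant of purchase.
        domain.Add("from-merchant", linguistic("goal"))
        ' The price was not mentioned in the utterance, so it stays an open variable.
        domain.Add("price", "?p")
        Return domain
    End Function
End Module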

The output of the Task Manager and the interface agent is a set of surface speech acts that the system wants to communicate to the user. The generation module achieves this communication using the graphical user display. This module also uses prioritized rules to match requested speech acts to appropriate means of expressing them. For example, a request to the generator to inform the user about a specific product will result in telling the display to show the image of the product and simultaneously provide a textual description. The above architecture is by no means complete. We suspect that several modifications to the existing components, and additions of new ones, would be made during the course of development. However, it serves as a good example of the great complexity of the project.

I spent several months searching the WWW. I have a massive URL library from companies that probably no longer exist. An example is http://www.agentscape.de/, which asked 15,000 USD for their authoring tools. To that, add a) a university that refused to spend any more money on me (it was already paying me a scholarship) and b) a supervisor who kept insisting I follow that path, and you end up with the perfect recipe for career destruction. Under the massive pressure, instead of quitting, I decided to move to a less complex domain and area of research. It was clear that this project was better suited to a team of researchers with a perhaps unlimited budget. When I think about it today, I still do not know why my supervisor failed to see something so obvious!!!!

Friday, September 25, 2009

Development of Virtual Humans - Early Years

I think it is a good idea to start this post with a simple, non-technical definition of a virtual human. A virtual human is an intelligent system capable of providing and accepting information through a full range of human modalities (e.g., speech, gestures, facial expressions, etc.).

Constructing a VH with general intelligence is a scary idea that should never be pursued. Instead, I am more of a fan of the robot-slaves idea, i.e., virtual humans (in physical or holographic form) that ONLY look intelligent in a specific domain (e.g., tour guides, sales, etc.).

I initially started working with VHs back in 2002, during my Master's degree. The demo I constructed, the e-Briefing Room, is shown below, along with a brief description of its functionality:

The “e-Briefing Room” service provides a tool for effectively educating customers about complex, high-end products on the Web. This vehicle allows customers to fully interact with three-dimensional (3D) models of products online, and also to access personalized services on demand, provided by a three-dimensional (3D) talking virtual sales assistant. Through this interactive technology, the e-Briefing Room makes it easy for customers to access information on demand. Start your experience by directly selecting a product category of your choice, or activate Derek to listen to a detailed presentation about the functionality of the service.

The demo was extremely simple (but... I did it only for my Master's degree). Its only difference from a static web page was the 3D head (from an Australian company called famous3D). Back in those days I wasn't even aware of the existence of ALICE. But once you get the bug of VH technology, you cannot easily stop. The simple e-Briefing Room was enough to start me wondering what could be next in the exciting world of Virtual Humans.

In 2003 I was given the opportunity to study for an M.Phil degree at Middlesex University in London. My decision was to continue working on Virtual Humans for electronic commerce.