Monday, December 14, 2009
Immortal Avatars - A unique use of avatar technology
Avatar Movie
I am truly shocked and amazed. Avatar is the BEST movie I have ever seen. The Na'vi characters are simply breath-taking and way out of the uncanny valley (see below for a definition). That's an important aspect of every system that integrates Virtual Human technology: the more life-like a character is, the more believable it becomes and, hence, the easier it is for people to accept it in its assigned role (guide, actor, etc.). Avatar is the most tangible evidence of that.
So what can we (the poor Virtual Human developers) do? How can we develop systems with Virtual Humans that people will accept? If you have access to a computer with 80,000 processors (that's what James Cameron used for the creation of Avatar), you are in HEAVEN. If not, then what? No matter how beautiful and life-like your creation is, it will only go as far as the real-time engine can take it. I have reviewed several real-time engines over the past years and nothing comes close to Avatar. Second Life has become popular despite its crappy graphics, but has anybody ever used Second Life for a serious application? There have been a number of research attempts at training people, but there is nothing serious for the common user (other than silly things to "kill" his/her time).
Def: The uncanny valley refers to the negative reaction provoked when something approaches human appearance but isn't quite there; the result is a sense of creepiness. You can find an explanation of the term from James Cameron himself here: http://news.discovery.com/videos/tech-avatar-motion-capture-mirrors-emotions.html
Tuesday, December 1, 2009
Industry Survey
Below is a list of Character Engines I compiled a long time ago:
Open-source
1) The EMBR project (http://embots.dfki.de/EMBR/)
2) Ogre3D - www.ogre3d.com. An open-source real-time 3D engine, but without any kind of web player for content delivery.
Real-Time engines
1) Haptek - www.haptek.com. The full suite of tools is in the range of 7,000-10,000 USD.
2) Charamel - http://www.charamel.de/. A German avatar company with very realistic characters, but also very expensive tools. A very interesting product of this company is Charavirld, which allows actual actors to control virtual characters in real time. They ask 10,000 Euros for their main development platform along with the player. However, for research purposes they offer a 6-month demo license for only 300 Euros.
3) QEDSOFT - http://www.qedsoft.com. A French company with a very interesting real-time 3D engine. Although I am not sure, I suspect their tools are very expensive.
Real-time but without a real-time web player
1) DA Group - http://www.digital-animations.com/. The creators of Ananova. The company offers a range of real-time content creation tools but there is no player for real-time delivery.
2) Codebaby - http://www.codebaby.com/. A Canadian company offering 3D characters through Flash.
3) Cantoche - http://www.cantoche.com/. A French company with cartoon-like avatars, again relying on Flash for content delivery.
4) VCom - http://www.vcom3d.com. An amazing technology for 3D character creation, again using Flash for content delivery.
5) IMS Interactive - http://www.ims3d.com/. The company uses Shockwave 3D for content delivery.
6) Visage - http://www.visagetechnologies.com. A Swedish company with an interesting technology, but without a web player for content delivery.
7) Virtuoz - http://www.virtuoz.com. Another French company.
8) SimGraphics - http://www.simg.com. A Californian company using a Wizard-of-Oz approach for bringing 3D characters to life.
Non real-time engines (Pseudo 3D)
1) Guile 3D - www.guile3d.com. Recently the company created Denise, a photorealistic 3D character with amazing visuals. There is no information about any kind of web player for content delivery.
2) Media Semantics - http://www.mediasemantics.com/. A very interesting and cheap technology using XML for character control and Flash for content delivery.
3) Gizmoz - http://www.gizmoz.com/. A photorealistic technology using Flash for content delivery.
4) http://www.karigirl.com. A virtual girlfriend
Face animation only
1) Crazytalk - http://www.reallusion.com/crazytalk/. A face-only creation tool, using mainly Flash for content delivery.
2) http://www.lifemi.com/. Another face-only company, using Flash for web content delivery.
3) FaceFX - http://www.oc3ent.com/. A face-only tool mainly used for game development.
Avatar communities
1) Second Life - http://secondlife.com/. Everyone knows Second Life.
2) The Blue Mars project (http://www.bluemarsonline.com/). In terms of graphics, Blue Mars is far, far better than SL. Download the installer (be careful, it is a massive 1.01 GB) and try it. The 3D worlds are simply AMAZING... (that's my avatar in BM, by the way)
3) http://www.imvu.com/. Very similar to Second Life, with far more amazing graphics.
4) Google Lively - http://www.lively.com. A rather unsuccessful attempt by Google to mimic Second Life.
5) Coming soon, Star Trek Online - http://www.startrekonline.com/
6) Entropia Universe - http://www.entropiauniverse.com/index.var. Mainly for sci-fi fans.
7) ActiveWorlds - http://www.activeworlds.com/
8) Kaneva - https://www.kaneva.com
9) The New Amsterdam - https://www.kaneva.com
Character Languages
Several attempts have been made to standardize avatar creation. Currently there are two major trends, one in Japan and the other in the Western world:
1) Behaviour Markup language - http://wiki.mindmakers.org/projects:bml:main
2) Multimodal Presentation Markup Language 3D (MPML3D) -
Wednesday, September 30, 2009
Haptek Characters
(open_tag) BOOKMARK mark=‘anim,suggest’ (close_tag) Can I suggest that you have a look at the wall (open_tag)BOOKMARK mark=‘anim,show_sea’(close_tag) that goes down the bold cliff and towards the sea?
Note: open_tag and close_tag stand for the standard opening and closing tag characters; for some reason I cannot display them here correctly.
b) Accurate extraction of tags from already tagged texts. These tags (shown above in red) can easily be used in Talos to tag texts in any language.
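For readability, here is how the tagged sentence above would look with the tag characters written out. This is my reconstruction from the placeholders, so treat the exact syntax as an assumption:

<BOOKMARK mark='anim,suggest'> Can I suggest that you have a look at the wall <BOOKMARK mark='anim,show_sea'> that goes down the bold cliff and towards the sea?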
#Haptek Version= 2.00 Name=LocC_HB.hap HapType= script FileType= text
## world It
##prereq= none
\clock [t= 0.0] \load [file= [sounds\LocC_HB.ogg]]
\clock [t= 0] \SetSwitch [figure= fullBod switch= during state= start]
\clock [t= 3] \SetSwitch [figure= fullBod switch= propably state= start]
\clock [t= 5.1] \SetSwitch [figure= fullBod switch= used state= start]
\clock [t= 9] \SetSwitch [figure= fullBod switch= when state= start]
\clock [t= 11] \SetSwitch [figure= fullBod switch= retake state= start]
\clock [t= 13.1] \SetSwitch [figure= fullBod switch= converted state= start]
\clock [t= 18] \SetSwitch [figure= fullBod switch= retake state= start]
\clock [t= 21] \SetSwitch [figure= fullBod switch= and_the state= start]
\clock [t= 23] \SetSwitch [figure= fullBod switch= become_again state= start]
\clock [t= 26.4] \SetSwitch [figure= fullBod switch= after state= start]
\clock [t= 31] \SetSwitch [figure= fullBod switch= while state= start]
\clock [t= 35] \SetSwitch [figure= fullBod switch= today state= start]
\clock [t= 38.75] \SetSwitch [figure= fullBod switch= from_the state= start]
\clock [t= 40.85] \SetSwitch [figure= fullBod switch= name state= start]
\clock [t= 42.7] \SetSwitch [figure= fullBod switch= contradictory2 state= start]
\clock [t= 44.6] \SetSwitch [figure= fullBod switch= exist_near state= start]
\clock [t= 48] \SetSwitch [figure= fullBod switch= large2 state= start]
\clock [t= 51] \SetSwitch [figure= fullBod switch= find_series state= start]
\clock [t= 52.8] \SetSwitch [figure= fullBod switch= important state= start]
\clock [t= 57] \SetSwitch [figure= fullBod switch= whole state= start]
\clock [t= 62] \SetSwitch [figure= blackboard switch= enter state= on]
\clock [t= 62] \settexture [figure= blackboard tex= museum1.jpg]
\clock [t= 63.9] \SetSwitch [figure= fullBod switch= museum1 state= start]
\clock [t= 66.1] \settexture [figure= blackboard tex= museum2.jpg]
\clock [t= 66.1] \SetSwitch [figure= fullBod switch= museum2 state= start]
\clock [t= 68.3] \SetSwitch [figure= fullBod switch= as_well state= start]
\clock [t= 78] \SetSwitch [figure= blackboard switch= enter state= close]
\clock [t= 80] \SetSwitch [figure= fullBod switch= plateia_9 state= start]
\clock [t= 87] \SetSwitch [figure= fullBod switch= contradictory2 state= start]
\clock [t= 91] \SetSwitch [figure= fullBod switch= propably state= start]
\clock [t= 98] \SetSwitch [figure= fullBod switch= named state= start]
\clock [t= 101] \SetSwitch [figure= fullBod switch= no_help state= start]
\clock [t= 105] \SetSwitch [figure= fullBod switch= plateia_10 state= start]
\clock [t= 109] \SetSwitch [figure= fullBod switch= which state= start]
\clock [t= 114] \SetSwitch [figure= fullBod switch= which state= start]
\clock [t= 118] \SetSwitch [figure= fullBod switch= point state= start]
\clock [t= 124] \SetSwitch [figure= fullBod switch= of_course_left state= start]
\clock [t= 132] \SetSwitch [figure= fullBod switch= if_else state= start]
\counter [name= [LocC_HBcounter] i0=0 f0= 1]
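For anyone unfamiliar with the format: as far as can be read from the script itself, the \load command loads the narration audio (LocC_HB.ogg), each \clock [t= ...] entry is a timestamp in seconds, and the \SetSwitch commands trigger named gesture/animation switches on the fullBod figure (plus texture changes on the blackboard figure), so the body animation stays synchronized with the speech.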
Current Developments - Computer Vision
OpenCV is an open-source computer vision library originally developed by Intel. Using its head-tracking functionality, I was able to construct a simple algorithm that detects the location of the user's head in the 3D environment (right, left, far away, close, etc.). Using this information, you can then implement some really fantastic scenarios in any programming environment. For example, in my prototypes I have the following scenarios:
Face recognition is also possible with OpenCV, but it does not offer much more than face detection. A good idea would be to integrate both into a single component. Ideally, such a component would lock onto a face and provide its coordinates to the application. Finally, such a component should somehow also integrate emotion recognition capabilities, an important attribute for Virtual Humans.
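To make the head-location idea above concrete, here is a minimal sketch (not my actual prototype code) of the classification step. It assumes you already have a face bounding box, in pixels, from OpenCV's face detector (through whatever .NET wrapper you use); the function name and thresholds are illustrative only:

' Minimal sketch: classify the head position from a face bounding box.
' faceX and faceWidth come from an OpenCV face detector (assumed);
' frameWidth is the width of the camera frame in pixels.
Public Function ClassifyHeadPosition(ByVal faceX As Integer, ByVal faceWidth As Integer, _
                                     ByVal frameWidth As Integer) As String
    Dim result As String
    ' Horizontal position: compare the centre of the face with the centre of the frame
    Dim faceCentreX As Double = faceX + faceWidth / 2.0
    If faceCentreX < frameWidth * 0.4 Then
        result = "left"
    ElseIf faceCentreX > frameWidth * 0.6 Then
        result = "right"
    Else
        result = "centre"
    End If
    ' Distance: the larger the face appears relative to the frame, the closer the user is
    Dim faceRatio As Double = faceWidth / CDbl(frameWidth)
    If faceRatio > 0.45 Then
        result &= ", close"
    ElseIf faceRatio < 0.15 Then
        result &= ", far away"
    End If
    Return result
End Function

The VH can then, for example, turn its head towards the reported direction, or react when the user leans in close.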
A VH looking in the direction of my face:
Natural Language Processing
Although I am sure you can cite more advanced systems than these, and unlike mere claims, these three systems are a) publicly available, b) fully functional, and c) free to use. Although none of the three systems actually processes language, VPF is by far the most effective of all. As the VPF people were kind enough to provide me with their script-matching algorithm and a fully functional API, I was able to address this limitation.
Stage 1.
Stage 2.
Stage 3.
Code for Stage 1
Sub Syntactic_Keyword_Processing(ByVal userinput As String)
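'Note: the Regex calls below require "Imports System.Text.RegularExpressions" at the top of the file.
'"hr" is the Virtual People Factory script-matching API object mentioned above and "Current_Script"
'is the currently loaded VPF script (both are assumed to be initialised elsewhere in the class).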
'we need to load the tagger first
Try
tagger_counter += 1
If tagger_counter = 1 Then
load_tagger()
Else
'tagger is already loaded; don't load it again
End If
'remove punctuation and contractions first
Dim contractions As List(Of String) = New List(Of String)(New String() _
{"didn't", "'ll", "'re", "lets", "let's", "'ve", "'m", "won't", "'d", "'s", "n't"})
Dim word_contractions As List(Of String) = New List(Of String)(New String() _
{"did not", "will", "are", "let us", "let us", "have", "am", "will not", "would", "is", "not"})
Dim end_line As String = "[\,\?\!\.]|\s+$"
Dim start_line As String = "[\,\?\!]|^\s+"
Dim userinput2 As String = Regex.Replace(userinput, end_line, "")
Dim userinput3 As String = Regex.Replace(userinput2, start_line, "")
'remove contractions
For Each item As String In contractions
If userinput.Contains(item) Then
Dim cont_position As Integer = contractions.IndexOf(item)
Dim what_word As String = word_contractions.Item(cont_position)
userinput3 = Regex.Replace(userinput3, item, Space(1) & what_word)
Exit For
End If
Next
Dim ask_step As String = "step_1"
Select Case ask_step
Case "step_1"
hr.addUserInput(userinput)
Dim response As String = hr.findResponses(Current_Script)
Dim index As Integer = hr.findMostRelevantResponse()
VPF_trigger = hr.getResponseMatchedSentence(index)
If VPF_trigger <> "" Then
'Perform syntactic comparison between the input and the trigger
syntactic_keyword_comparison(userinput3)
If comparison = "Sucessful" Then
_answer = hr.getResponseMatchedSpeech(index)
input_list.Clear()
trigger_list.Clear()
Else
ask_step = "step_2"
End If
End If
End Select
Select Case ask_step
Case "step_2"
If new_question <> "" Then
hr.addUserInput(new_question)
Dim response As String = hr.findResponses(Current_Script)
Dim index As Integer = hr.findMostRelevantResponse()
_answer = hr.getResponseMatchedSpeech(index)
input_list.Clear()
trigger_list.Clear()
End If
End Select
Catch ex As Exception
output.Clear()
output.Text += ex.Message + Environment.NewLine
End Try
End Sub
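For illustration, this is roughly how the sub is called (the question is just an example; the surrounding class and the VPF objects are assumed to be initialised already):

Syntactic_Keyword_Processing("Where is the main gate of the castle?")

In step 1 the sub asks VPF for the best-matching trigger and accepts the answer only if the syntactic/keyword comparison succeeds; if it fails, step 2 re-queries VPF with the reformulated question (new_question, presumably produced by the comparison routine).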
Tuesday, September 29, 2009
Current Developments - Talos Authoring tool
Talos: A Virtual Human Authoring tool
Talos is an authoring environment designed to enable content developers to create Virtual Human systems for the domain of mobile guidance. Although the system has been designed with paper guidebook creators in mind, it can also be used in other application domains. The final design of Talos is very complex and includes several different modules. I think it would take a programming team of 5 people a year of intense work to complete.
Talos came about as an idea when I realised that there were virtually no tools on the market that would enable me to rapidly create the prototypes I needed for my research work. From the final design of Talos, I was only able to implement some of its ideas, namely those that I needed for my final prototype systems.
1) A script parser to automatically create Haptek scripts. A screenshot of the parser is shown below:
2) A simple UI (user interface) tool that enabled me to create, without too much effort, the AIML KBs needed for my final prototype systems. In more detail, the UI includes:
a) An AIML\XML KB creator.
b) An AIML creator for existing question-sets (e.g., questions that you may already have in XML format)
c) A translator for both AIML/XML KBs
d) A scene and character modifier, where you can modify various scene and character settings (e.g., props, backgrounds, etc.)
e) A GPS integration module, where you can assign Long and Lat coordinates to various scripts and test them in real time (a small sketch of this idea is shown after this list)
f) A speech recognition module where you can load a grammar and test it
g) A Cyc-creator module. This module was supposed to automate the creation of CYC queries suitable for insertion into AIML scripts. However, the API of ResearchCyc is extremely tough to crack (perhaps because it is pure Java) and I ended up with just a simulation.
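To give an idea of what the GPS module in (e) does, here is a minimal sketch (not the actual Talos code) of a location test: it checks whether the user's current position is within a given radius of the coordinates assigned to a script, using the haversine formula. The names and the radius are illustrative only:

' Minimal sketch, not the actual Talos code: returns True when the user is within
' radiusMetres of the coordinates assigned to a location-sensitive script.
Public Function IsNearScriptLocation(ByVal userLat As Double, ByVal userLon As Double, _
                                     ByVal scriptLat As Double, ByVal scriptLon As Double, _
                                     ByVal radiusMetres As Double) As Boolean
    Const EarthRadius As Double = 6371000.0 ' mean Earth radius in metres
    Dim dLat As Double = (scriptLat - userLat) * Math.PI / 180.0
    Dim dLon As Double = (scriptLon - userLon) * Math.PI / 180.0
    Dim lat1 As Double = userLat * Math.PI / 180.0
    Dim lat2 As Double = scriptLat * Math.PI / 180.0
    ' Haversine formula for the great-circle distance between the two points
    Dim a As Double = Math.Sin(dLat / 2) ^ 2 + Math.Cos(lat1) * Math.Cos(lat2) * Math.Sin(dLon / 2) ^ 2
    Dim c As Double = 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a))
    Return EarthRadius * c <= radiusMetres
End Function

In a setup like this, each location-sensitive script simply carries its own Lat/Long pair and trigger radius, and the guide fires the script the first time the function returns True.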
From the above, I only used "a", "b", and "c" in the creation of my final prototypes. The rest of the features were not of any value to the actual development, but they led to several improvements in the final Talos design. The UI tool is available for free. If you want to experiment with Virtual Humans, AIML and XML KBs, and with making location-sensitive scripts for your characters, it is the best way to start (a sample AIML category is shown further below).
A screenshot of the UI tool
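For anyone who has not worked with AIML before, a KB is just a set of category elements, each pairing a user-input pattern with a template that holds the response. A hypothetical castle-guide category of the kind such a tool produces might look like this (the pattern and the answer text are made up for illustration):

<category>
  <pattern>WHERE IS THE MAIN GATE</pattern>
  <template>The main gate is on the north side of the castle, about two minutes from where you are standing.</template>
</category>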
Current Developments - Haptek Clothing
Monday, September 28, 2009
Current Developments - MGUIDE Prototypes
Prototype 1:
Prototype 2:
Prototype 3:
Prototype 4:
Sunday, September 27, 2009
Development of Virtual Humans - Early Years
Domain: Mobile Guides
After the first one and a half years of my M.Phil work, I finally found heaven. The area of mobile guides is a relatively simple domain, but with much to offer. Tourism in several countries (e.g., Greece) is considered a vital part of the economy. In 2005 I began working on a system that a) provides navigation instructions in a specific area (i.e., a medieval castle in Greece) and b) provides information on selected locations of the castle (e.g., historical information). The system was authored in the Macromedia Director MX 2004 multimedia environment and used:
The system accepted input through a menu of static English phrases and buttons. It generated no output apart from the minor movement of the character's body on the screen. After field-testing the system, I realised that a) the GPS component was not working because of the rocky environment of the castle, and b) the ASR engine could not distinguish the user's voice from the noise in the environment.
These two components were removed and a formal evaluation took place in November 2005. The goal of the evaluation was to investigate how such an agent would affect the accessibility and usability of a mobile guide system.
(The CycL fragment below appears to be an example query set, asking which countries border Greece, with the bindings returned for each query listed after it.)
((#$and (#$isa ?WHAT-ONE-1 #$Country) (#$politiesBorderEachOther #$Greece ?WHAT-ONE-1))
 (#$and (#$bordersOn #$Greece ?WHAT-ONE-1) (#$isa ?WHAT-ONE-1 #$Country))
 (#$and (#$isa ?COUNTRY #$IndependentCountry) (#$politiesBorderEachOther #$Greece ?COUNTRY))
 (#$and (#$bordersOn #$Greece ?COUNTRY) (#$isa ?COUNTRY #$IndependentCountry)))
0. (((?COUNTRY . Bulgaria)) ((?COUNTRY . Macedonia)) ((?COUNTRY . Albania)) ((?COUNTRY . Turkey)))
1. (((?COUNTRY . Bulgaria)) ((?COUNTRY . Macedonia)) ((?COUNTRY . Albania)) ((?COUNTRY . Turkey)))
2. (((?WHAT-ONE-1 . Bulgaria)) ((?WHAT-ONE-1 . Macedonia)) ((?WHAT-ONE-1 . Albania)) ((?WHAT-ONE-1 . Turkey)))
3. (((?WHAT-ONE-1 . Bulgaria)) ((?WHAT-ONE-1 . Macedonia)) ((?WHAT-ONE-1 . Albania)) ((?WHAT-ONE-1 . Turkey)))
Problems:
Lessons learned:
1) The pilot evaluation suggested a) several improvements to the experimental design and b) several design improvements to the prototype.
2) The need to produce more user insights. This could only be accomplished by developing more prototypes, varying different attributes of virtual humans (e.g., competence, modality of communication, etc.).
3) The need for technological differentiation. It is impossible to continue with the simplistic approach of this prototype, as the research would lead me nowhere. The domain of VH is an extremely competitive area, with groups of researchers producing massive contributions to both technology and knowledge every single month.
4) My rather bumpy experience in developing the prototype also suggested the need for an authoring tool that would enable non-programmers to rapidly develop complex Virtual Human systems.
Saturday, September 26, 2009
Development of Virtual Humans - Early Years
The idea was to develop and evaluate a conversational sales assistant that would assist users in all stages of the CBB (Consumer Buying Behaviour) model - a model used to describe the entire range of consumer needs in on-line commerce environments. The algorithm alone would have been enough for an M.Phil contribution. Hence, with great enthusiasm, I started my work at Middlesex. The first task on my research list was the development of a fully working prototype. I designed a rather complex system (shown below, along with an explanation) and began looking for proper authoring tools.
A discourse manager module cannot be evaluated if it is not part of a general dialogue system. The organization of the system needed to test this module is shown below. Starting at the top of the figure, the user communicates with the system textually through a standard desktop PC. The input is parsed to a sequence of speech acts based on the syntactic and semantic form of the utterance and sent to the discourse manager. The discourse manager sets up its initial conversation state and passes the sequence to the context for identification of any lexical information (e.g. names, features, etc.), and then hands the acts to reference resolution.
The reference resolution component has two duties. First, it assigns the correct referent to anaphoric referring expressions (e.g. the pronoun her in the sentence "Anne asked Edward to pass her the salt" refers to Anne). Then, if necessary, it does illocutionary remapping of the speech acts assigned by the parser, as needed to fit discourse and reference cues. For instance, an utterance that consists of a REJECT "no" followed by a REQUEST "go via bank" will have the latter REQUEST remapped into the REJECT; it is essentially the content of the REJECT, not a separate REQUEST. After this processing, reference resolution returns the speech act(s), now in an internal format, to the discourse manager for further disposition.
These speech acts, however, are only the surface acts (i.e., literal acts) of the utterance. Oftentimes a surface speech act has a deeper intended or indirect meaning (i.e., an illocutionary act). Take, for example, the utterance "Can you pass the salt?", which on the surface looks like a QUERY speech act (i.e., asking about your ability to pass the salt). However, the indirect speech act is most likely a REQUEST (i.e., a request to pass me the salt). The discourse manager detects and interprets indirect speech acts through an intention recognition process done in conjunction with the Task Manager. More specifically, it sends each of the postulated acts to the Task Manager and asks whether this interpretation would 'make sense' given the current domain and planning context. Each of these postulates is then scored by the Task Manager, which allows the Discourse Manager (DM) to find the correct interpretation of the surface speech act (i.e., the intended problem-solving act).
In addition to its indirect speech act duties, the discourse manager must convert the speech act semantics from a linguistic knowledge representation (produced by the parser) to a domain-specific, problem-solving knowledge representation (used by the Task Manager for intention recognition). For example, the linguistic semantics of the utterance "Buy the HP Palm from PC-World" might be:
(buy1
  :agent SYSTEM
  :theme HPPalm
  :goal PC-World)
The domain specific, problem-solving knowledge representation would correspond to some action in the domain (purchase) with some corresponding parameters. For example:
(purchase
:product HPPalm
:from-merchant PC-World
:price ?p)
The ?p is a variable, since the price was not explicitly mentioned in the utterance.
The output of the task manager and the interface agent is a set of surface speech acts that the system wants to communicate to the user. The generation module achieves this communication using the graphical user display. This module also uses prioritized rules to match requested speech acts to appropriate means of expressing them. For example, a request to the generator to inform the user about a specific product will result in telling the display to show the image of the product and simultaneously provide a textual description. The above architecture is by no means complete. We suspect that several modifications to the existing components and additions of new ones would be made during the course of development. However, it serves as a good example of the great complexity of the project.
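As the next paragraph explains, this design was never implemented, but to make the speech-act conversion step more concrete, here is a rough sketch of what the mapping from the linguistic representation to the problem-solving representation could look like. Everything in it (the type, the rule, the parameter names) is hypothetical:

' Hypothetical sketch only; the system described above was never built.
' Maps a linguistic speech act such as (buy1 :agent SYSTEM :theme HPPalm :goal PC-World)
' onto a domain-level problem-solving act such as (purchase :product ... :from-merchant ... :price ?p).
Public Class ProblemSolvingAct
    Public Action As String                                   ' e.g. "purchase"
    Public Parameters As New Dictionary(Of String, String)    ' e.g. "product" -> "HPPalm", "price" -> "?p"
End Class

Public Function MapToProblemSolvingAct(ByVal verb As String, ByVal theme As String, _
                                        ByVal goal As String) As ProblemSolvingAct
    Dim act As New ProblemSolvingAct()
    Select Case verb.ToLower()
        Case "buy1", "buy"
            act.Action = "purchase"
            act.Parameters("product") = theme          ' the item mentioned in the utterance
            act.Parameters("from-merchant") = goal     ' the merchant mentioned in the utterance
            act.Parameters("price") = "?p"             ' unbound variable: the price was not mentioned
        Case Else
            act.Action = "unknown"
    End Select
    Return act
End Function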
I spent several months searching the WWW. I have a massive URL library from companies that probably no longer exist. An example is http://www.agentscape.de/, which asked 15,000 USD for their authoring tools. To that, add a) a university that refused to spend any more money on me (it was already paying me a scholarship) and b) a supervisor who kept insisting I follow that path, and you end up with the perfect recipe for career destruction. Under the massive pressure, instead of quitting, I decided to move to a less complex domain and area of research. It was clear that this project was better suited to a team of researchers with a perhaps unlimited budget. When I think about it today, it is still a mystery why my supervisor failed to see something so obvious!!!!
Friday, September 25, 2009
Development of Virtual Humans - Early Years
I think it is a good idea to start this post with a simple, non-technical definition of a virtual human. A virtual human is an intelligent system capable of providing and accepting information through a full range of human modalities (e.g., speech, gesture, facial expressions, etc.).
Constructing a VH with general intelligence is a scary idea that should never be pursued. Instead, I am more of a fan of the robot-slaves idea, i.e., virtual humans (in physical or holographic form) that ONLY look intelligent in a specific domain (e.g., tour guides, sales, etc.).
I initially started working with VH back in 2002, during my Master's degree. The demo I constructed, the e-Briefing Room, is shown below, along with a brief description of its functionality:
The "e-Briefing Room" service provides a tool for effectively educating customers about complex, high-end products on the Web. This vehicle allows customers to fully interact with three-dimensional (3D) models of products online, and also to access personalized services on demand, provided by a 3D talking virtual sales assistant. Through this interactive technology, the e-Briefing Room makes it easy for customers to access information on demand. Start your experience by directly selecting a product category of your choice, or activate Derek to listen to a detailed presentation of the functionalities of the service.
The demo was extremely simple (but... I did it only for my Master's degree). Its only difference from a static web page was the 3D head (from an Australian company called famous3D). Back in those days I wasn't even aware of the existence of ALICE. But once you get the bug of VH technology, you cannot easily stop. The simple "e-Briefing Room" was enough to start me wondering what could be next in the exciting world of Virtual Humans.
In 2003 I was given the opportunity to study for an M.Phil degree at Middlesex University in London. My decision was to continue working on Virtual Humans for electronic commerce.