Social Data and Log Analysis
      Using MongoDB
      2011/03/01(Tue) #mongotokyo
              doryokujin
Self-Introduction

• doryokujin (Takahiro Inoue), Age: 25
• Education: Keio University
  • Master of Mathematics, March 2011 ( Maybe... )
  • Major: Randomized Algorithms and Probabilistic Analysis

• Company: Geisha Tokyo Entertainment (GTE)
  • Data Mining Engineer (only me, part-time)

• Organized Community:
  • MongoDB JP, Tokyo Web Mining
My Job

• I’m a Fledgling Data Scientist
  • Development of analytical systems for social data
  • Development of recommendation systems for social data
• My Interest: Big Data Analysis
  • How to collect logs scattered across many servers
  • How to store and access the data
  • How to analyze and visualize billions of records
Agenda
• My Company’s Analytic Architecture
• How to Handle Access Logs
• How to Handle User Trace Logs
• How to Collaborate with Front Analytic Tools
• My Future Analytic Architecture
Agenda
• My Company’s Analytic Architecture
• How to Handle Access Logs ( Hadoop, Mongo Map Reduce )
• How to Handle User Trace Logs ( Hadoop, Schema Free )
• How to Collaborate with Front Analytic Tools ( REST Interface, JSON )
• My Future Analytic Architecture ( Capped Collection, Modifier Operation )

Of Course Everything With MongoDB
My Company’s
Analytic Architecture
Social Game (Mobile): Omiseyasan




• Players enjoy arranging their own shop (and avatar)
• They communicate with other users through shopping, part-time jobs, ...
• They buy seeds of items to display in their own shop
Data Flow

Back-end Architecture
[Figure: access logs are pretreated ( trimming, validation, filtering, ... ) with Dumbo (Hadoop Streaming) and loaded via PyMongo into MongoDB as a central data server; raw logs are backed up to S3]
Front-end Architecture
[Figure: MongoDB serves front tools via PyMongo and sleepy.mongoose (REST Interface) into a Web UI, for social data analysis and data analysis]
Environment
• MongoDB: 1.6.4
  • PyMongo: 1.9
• Hadoop: CDH2 ( soon update to CDH3 )
  • Dumbo: Simple Python Module for Hadoop Streaming
• Cassandra: 0.6.11
   • R, Neo4j, jQuery, Munin, ...
• [Data Size (a rough estimate)]
  • Access Log 15GB / day ( gzip ) - 2,000M PV
  • User Trace Log 5GB / day ( gzip )
How to Handle
 Access Logs
How to Handle Access Logs
[Figure: access logs are pretreated ( trimming, validation, filtering, ... ) and loaded into MongoDB as a data server; raw logs are backed up to S3]
Access Data Flow ( Caution: needs MongoDB >= 1.7.4 )
• Pretreatment → user_access
• 1st Map Reduce ( group by ) → user_pageview / agent_pageview / hourly_pageview
• 2nd Map Reduce → daily_pageview

• Using Hadoop: pretreatment of raw records
• [Map / Reduce]
    • Read all records
    • Split each record by '\s' (whitespace)
    • Filter out unnecessary records (such as *.swf)
    • Check whether each record is correct
    • Insert (save) records into MongoDB
    ※ write operations won’t yet fully utilize all cores
Access Logs

110.44.178.25 - - [19/Nov/2010:04:40:40 +0900] "GET /playshop.4ce13800/battle/
BattleSelectAssetPage.html;jsessionid=9587B0309581914AB7438A34B1E51125-n15.at3?collec
    tion=12&opensocial_app_id=00000&opensocial_owner_id=00000 HTTP/1.0" 200 6773 "-"
"DoCoMo/2.0 ***"


110.44.178.26 - - [19/Nov/2010:04:40:40 +0900] "GET /playshop.4ce13800/shopping/battle/
ShoppingBattleTopPage.html;jsessionid=D901918E3CAE46E6B928A316D1938C3A-n11.a
    p1?opensocial_app_id=00000&opensocial_owner_id=11111 HTTP/1.0" 200 15254 "-"
"DoCoMo/2.0 ***"


110.44.178.27 - - [19/Nov/2010:04:40:40 +0900] "GET /playshop.4ce13800/battle/
BattleSelectAssetDetailPage;jsessionid=202571F97B444370ECB495C2BCC6A1D5-n14.at11?asse
    t=53&collection=9&opensocial_app_id=00000&opensocial_owner_id=22222 HTTP/1.0" 200
11616 "-" "SoftBank/***"


...(many records)
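The pretreatment step turns raw lines like those above into documents. A minimal sketch of such a parser in Python (the regex and the `parse_line` helper are assumptions; the field names follow the documents shown on the next slide):

```python
import re

# Assumed pattern for Apache combined-format lines like the samples above.
LOG_RE = re.compile(
    r'(?P<ipaddr>\S+) \S+ \S+ \[(?P<requestTimeStr>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) \S+" '
    r'(?P<statusCode>\d{3}) (?P<responseBodySize>\d+|-) '
    r'"[^"]*" "(?P<userAgent>[^"]*)"')

def parse_line(line):
    """Parse one access-log line into a document, or None to filter it out."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    doc = m.groupdict()
    # splittedPath: the resource without jsessionid and query string
    doc["splittedPath"] = doc["resource"].split(";")[0].split("?")[0]
    return doc

line = ('110.44.178.25 - - [19/Nov/2010:04:40:40 +0900] '
        '"GET /playshop.4ce13800/battle/BattleSelectAssetPage.html'
        ';jsessionid=9587?collection=12 HTTP/1.0" 200 6773 "-" "DoCoMo/2.0 ***"')
doc = parse_line(line)
```

Invalid lines return `None`, matching the "filter unnecessary records" step above.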
Collection: user_trace
> db.user_trace.find({user: "7777", date: "2011-02-12"}).limit(1)
    .forEach(printjson)
{
        "_id" : "2011-02-12+05:39:31+7777+18343+Access",
        "lastUpdate" : "2011-02-19",
        "ipaddr" : "202.32.107.166",
        "requestTimeStr" : "12/Feb/2011:05:39:31 +0900",
        "date" : "2011-02-12",
        "time" : "05:39:31",
        "responseBodySize" : 18343,
        "userAgent" : "DoCoMo/2.0 SH07A3(c500;TB;W24H14)",
        "statusCode" : "200",
        "splittedPath" : "/avatar2-gree/MyPage",
        "userId" : "7777",
        "resource" : "/avatar2-gree/MyPage;jsessionid=...?
battlecardfreegacha=1&feed=...&opensocial_app_id=...&opensocial_viewer_id=...&
opensocial_owner_id=..."
}
1st Map Reduce

• [Aggregation]
   • Group by url, date, userId
   • Group by url, date, userAgent
   • Group by url, date, time
   • Group by url, date, statusCode
• Map Reduce operations run in parallel on all shards
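Before the PyMongo Map Reduce code, the grouping itself can be pictured in plain Python (the sample records are illustrative):

```python
from collections import Counter

# Count page views grouped by (path, date, userId),
# the same aggregation as the 1st Map Reduce.
records = [
    {"splittedPath": "/MyPage", "date": "2011-02-12", "userId": "7777"},
    {"splittedPath": "/MyPage", "date": "2011-02-12", "userId": "7777"},
    {"splittedPath": "/MyPage", "date": "2011-02-12", "userId": "8888"},
]
pageview = Counter(
    (r["splittedPath"], r["date"], r["userId"]) for r in records)
```

Swapping `userId` for `userAgent`, `time`, or `statusCode` in the key gives the other three groupings.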
1st Map Reduce with PyMongo
map = Code("""
   function(){
        emit({
              path: this.splittedPath,
              userId: this.userId,
              date: this.date
        }, 1)}
 """)
# The other 1st Map Reduce jobs key on this.userAgent,
# this.timeRange, or this.statusCode in place of this.userId.

 reduce = Code("""
   function(key, values){
        var count = 0;
        values.forEach(function(v) {
              count += 1;
        });
        return {"count": count, "lastUpdate": today};
   }
 """)
# note: "today" is assumed to be passed in via the map-reduce scope
# ( mongodb >= 1.7.4 )
result = db.user_access.map_reduce(map,
                           reduce,
                           out="user_pageview",
                           merge_output=True,
                           full_response=True,
                           query={"date": date})


• About the output collection, there are 4 options (MongoDB >= 1.7.4):
  • out : overwrite the collection if it already exists
  • merge_output : merge the new data into the old output collection
  • reduce_output : a reduce operation will be performed on the two values
    (the same key in the new result and the old collection) and the result will be
    written to the output collection
  • full_response (=False) : if True, return full statistics on the operation.
    (The separate inline option creates no collection; the whole map-reduce
    happens in RAM, and the result set must fit within the 8MB/doc limit,
    16MB/doc in 1.8.)
Map Reduce (>=1.7.4):
              out option in JavaScript
• "collectionName" : If you pass a string indicating the name of a collection, then
  the output will replace any existing output collection with the same name.
• { merge : "collectionName" } : This option will merge new data into the old
  output collection. In other words, if the same key exists in both the result set and
  the old collection, the new key will overwrite the old one.
• { reduce : "collectionName" } : If documents exist for a given key in the result
  set and in the old collection, then a reduce operation (using the specified reduce
  function) will be performed on the two values and the result will be written to
  the output collection. If a finalize function was provided, this will be run after
  the reduce as well.
• { inline : 1} : With this option, no collection will be created, and the whole map-
  reduce operation will happen in RAM. Also, the results of the map-reduce will
  be returned within the result object. Note that this option is possible only when
  the result set fits within the 8MB limit.
                                                http://www.mongodb.org/display/DOCS/MapReduce
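The semantics of these out options can be pictured with plain dicts, where the dict keys stand in for map-reduce keys and summation stands in for the reduce function (an illustration of the behavior, not the MongoDB implementation):

```python
old = {"k1": 10, "k2": 20}   # existing output collection
new = {"k2": 5, "k3": 7}     # fresh map-reduce result

# "collectionName": replace the old output entirely
replace = dict(new)

# { merge : ... }: new data merged in, new keys overwrite old ones
merge = {**old, **new}

# { reduce : ... }: reduce runs on the pair of values per key (here: sum)
reduce_out = {k: old.get(k, 0) + new.get(k, 0)
              for k in set(old) | set(new)}
```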
Collection: user_pageview
> db.user_pageview.find({
          "_id.userId": "7777",                  • Regular Expression
          "_id.path": /.*MyPage$/,
          "_id.date": {$lte: "2011-02-12"}
                                                 • <, >, <=, >=
    }).limit(1).forEach(printjson)
#####
{
          "_id" : {
                  "date" : "2011-02-12",
                  "path" : "/avatar2-gree/MyPage",
                  "userId" : "7777",
          },
          "value" : {
                  "count" : 10,
                  "lastUpdate" : "2011-02-19"
          }
}
2nd Map Reduce with PyMongo
map = Code("""
       function(){
           emit({
                  "path" : this._id.path,
                  "date":   this._id.date,
           },{
                  "pv": this.value.count,
                  "uu": 1
           });
       }
""")
reduce = Code("""
       function(key, values){
           var pv = 0;                        Must be the same key
           var uu = 0;                       ({"pv": NaN} if not)
           values.forEach(function(v){
                 pv += v.pv;
                 uu += v.uu;
           });
           return {"pv": pv, "uu": uu};
       }
""")
# ( mongodb >= 1.7.4 )
result = db.user_pageview.map_reduce(map,
                  reduce,
                  out="daily_pageview",
                  merge_output=True,
                  full_response=True,
                  query={"date": date})
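The same-key rule exists because MongoDB may call reduce on its own partial output (re-reduce). A plain-Python simulation of the pv/uu reduce above shows why the keys returned must match the keys emitted:

```python
def reduce_pv_uu(values):
    """Mirror of the JS reduce: sums pv and uu across values."""
    pv = sum(v["pv"] for v in values)
    uu = sum(v["uu"] for v in values)
    # Return the SAME keys the map emitted, so the function can
    # consume its own output during a re-reduce.
    return {"pv": pv, "uu": uu}

emitted = [{"pv": 10, "uu": 1}, {"pv": 3, "uu": 1}, {"pv": 5, "uu": 1}]
partial = reduce_pv_uu(emitted[:2])
# Re-reduce: partial result fed back in alongside remaining values.
final = reduce_pv_uu([partial, emitted[2]])
```

Had the reduce returned, say, `{"pageviews": ...}`, the re-reduce step would read a missing `pv` field — the `{"pv": NaN}` failure noted above.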
Collection: daily_pageview

> db.daily_pageview.find({
        "_id.date": "2011-02-12",
        "_id.path": /.*MyPage$/
    }).limit(1).forEach(printjson)
{
        "_id" : {
                "date" : "2011-02-12",
                "path" : "/avatar2-gree/MyPage",
        },
        "value" : {
                "uu" : 53536,
                "pv" : 539467
        }
}
Current Map Reduce is Imperfect
  • [Single Threads per node]
    • Doesn't scale map-reduce across multiple threads

  • [Overwrite the Output Collection]
    • Overwrites the old collection ( no other options like “merge” or
      “reduce” )

# map-reduce code to merge output (MongoDB < 1.7.4)
result = db.user_access.map_reduce(map,
                   reduce,
                   full_response=True,
                   out="temp_collection",
                   query={"date": date})
[db.user_pageview.save(doc) for doc in db.temp_collection.find()]
Useful Reference: Map Reduce

• http://www.mongodb.org/display/DOCS/MapReduce
• A Look At MongoDB 1.8's MapReduce Changes
• Map Reduce and Getting Under the Hood with Commands
• Map/reduce runs in parallel/distributed?
• Map/Reduce parallelism with Master/Slave
• mapReduce locks the whole server
• mapreduce vs find
How to Handle
User Trace Logs
How to Handle
               User Trace Logs
[Figure: user trace logs are pretreated ( trimming, validation, filtering, ... ) and loaded into MongoDB as a data server; raw logs are backed up to S3]
User Trace / Charge Data Flow
• Pretreatment → user_trace / user_charge
• user_charge → daily_charge
• user_trace → daily_trace
User Trace Log
Hadoop
• Using Hadoop: pretreatment of raw records
• [Map / Reduce]
    • Split each record by '\s' (whitespace)
    • Filter out unnecessary records
    • Check whether a user behaves dishonestly
    • Unify the format so records can be summed up ( raw records are
      written in free format )
    • Sum up records grouped by “userId” and “actionType”
    • Insert (save) records into MongoDB
    ※ write operations won’t yet fully utilize all cores
An Example of User Trace Log

     UserId   ActionType   ActionDetail
An Example of User Trace Log
-----Change------
ActionLogger    a{ChangeP}          (Point,1371,1383)
ActionLogger    a{ChangeP}          (Point,2373,2423)

------Get------
ActionLogger    a{GetMaterial}   (syouhinnomoto,0,-1)
ActionLogger    a{GetMaterial}   usesyouhinnomoto
ActionLogger    a{GetMaterial}   (omotyanomotoPRO,1,6)
※ The value of “actionDetail” must be in a unified format
-----Trade-----
ActionLogger    a{Trade} buy 3 itigoke-kis from gree.jp:00000 #

-----Make-----
ActionLogger     a{Make}            make item kuronekono_n
ActionLogger     a{MakeSelect}      make item syouhinnomoto
ActionLogger     a{MakeSelect}      (syouhinnomoto,0,1)

-----PutOn/Off-----
ActionLogger    a{PutOff}            put off 1 ksuteras
ActionLogger    a{PutOn}             put 1 burokkus @2500

-----Clear/Clean-----
ActionLogger    a{ClearLuckyStar}       Clear LuckyItem_1     4 times

-----Gacha-----
ActionLogger     a{Gacha} Play gacha with first free play:
ActionLogger     a{Gacha} Play gacha:
Collection: user_trace
> db.user_trace.find({date: "2011-02-12",
                         actionType: "a{Make}",
                         userId: "7777"}).forEach(printjson)
{
    "_id" : "2011-02-12+7777+a{Make}",
    "date" : "2011-02-12",
    "lastUpdate" : "2011-02-19",
    "userId" : "7777",
    "actionType" : "a{Make}",               Sum up values grouped by
    "actionDetail" : {                     “userId” and “actionType”
        "make item ksutera" : 3,
        "make item makaron" : 1,
        "make item huwahuwamimiate" : 1,
        …
    }
}
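The "sum up grouped by userId and actionType" step producing the actionDetail counts above can be sketched in plain Python (the sample log tuples are illustrative):

```python
from collections import defaultdict, Counter

# Each pretreated log record: (userId, actionType, actionDetail string).
logs = [
    ("7777", "a{Make}", "make item ksutera"),
    ("7777", "a{Make}", "make item ksutera"),
    ("7777", "a{Make}", "make item makaron"),
]

# Group by (userId, actionType), counting occurrences of each detail,
# mirroring the actionDetail sub-document of user_trace.
grouped = defaultdict(Counter)
for user_id, action_type, detail in logs:
    grouped[(user_id, action_type)][detail] += 1
```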
Collection: daily_trace
> db.daily_trace.find({
                       date: {$gte: "2011-02-12", $lte: "2011-02-19"},
                       actionType: "a{Make}"}).forEach(printjson)
{
       "_id" : "2011-02-12+group+a{Make}",
       "date" : "2011-02-12",
       "lastUpdate" : "2011-02-19",
       "actionType" : "a{Make}",
       "actionDetail" : {
             "make item kinnokarakuridokei" : 615,
             "make item banjo-" : 377,
             "make item itigoke-ki" : 135904,
             ...
       },
       ...
}...
User Charge Log
Collection: user_charge
// TOP10 users at 2011-02-12 by charge amount
> db.user_charge.find({date:"2011-02-12"})
                 .sort({totalCharge:-1}).limit(10).forEach(printjson)
{
     "_id" : "2011-02-12+7777+Charge",
     "date" : "2011-02-12",
     "lastUpdate" : "2011-02-19",
     "totalCharge" : 10000,
     "userId" : "7777",
     "actionType" : "Charge",
                                               Sum up values grouped by
     "boughtItem" : {                         “userId” and “actionType”
         "        EX" : 13,

         "    +6000" : 3,

         "        PRO" : 20

     }
}
{…
Collection: daily_charge
> db.daily_charge.find({date:"2011-02-12",T:"all"})
                                  .limit(10).forEach(printjson)
{
    "_id" : "2011-02-12+group+Charge+all+all",
    "date" : "2011-02-12",
    "total" : 100000,
    "UU" : 2000,
    "group" : {
         "              " : 1000000,

         "   " : 1000000, ...

    },
    "boughtItemNum" : {
         "        EX" : 8,

         "         " : 730, ...

    },
    "boughtItem" : {
         "        EX" : 10000,

         "         " : 100000, ...

    }
}
Categorize Users
• user_trace, user_charge, user_savedata, user_pageview → (attribution) → user_registration / user_category
• [Categorize Users]
   • by play term
   • by total amount of charge
   • by registration date
• [Take a snapshot of each category’s stats per week]
Collection: user_registration
> db.user_registration.find({userId: "7777"}).forEach(printjson)
{
    "_id" : "2010-06-29+7777+Registration",
    "userId" : "7777",
    "actionType" : "Registration",
                                                  Tagging User
    "category" : {
         "R1" : "True",
         "T" : "ll",
         …
    },
    "firstCharge" : "2010-07-07",
    "lastLogin" : "2010-09-30",
    "playTerm" : 94,
    "totalCumlativeCharge" : 50000,
    "totalMonthCharge" : 10000,
    …
}
Collection: user_category

> var cross = new Cross()    // user-defined function
> MCResign = cross.calc("2011-02-12", "MC", 1)
// each cell value is the number of users
// Charge (yen) / Term (day)
                 0(z)     ~¥1k(s)    ~¥10k(m)   ¥100k~(l)    total
~1day(z)        50000          10          5        0        50015
~1week(s)       50000         100         50        3        50153
~1month(m)     100000         200        100        1       100301
~3month(l)     100000         300         50        6       100356
3month~(ll)         0           0          0        0            0
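The cross table above can be sketched in Python; the bucket thresholds below are assumptions read off the row/column labels:

```python
from collections import Counter

def term_bucket(days):
    """Play-term rows: ~1day(z), ~1week(s), ~1month(m), ~3month(l), beyond(ll)."""
    if days <= 1:
        return "z"
    if days <= 7:
        return "s"
    if days <= 30:
        return "m"
    if days <= 90:
        return "l"
    return "ll"

def charge_bucket(yen):
    """Charge columns: 0(z), ~1k(s), ~10k(m), beyond(l)."""
    if yen == 0:
        return "z"
    if yen <= 1000:
        return "s"
    if yen <= 10000:
        return "m"
    return "l"

users = [{"playTerm": 94, "totalCumlativeCharge": 50000},
         {"playTerm": 5, "totalCumlativeCharge": 0}]
cross = Counter((term_bucket(u["playTerm"]),
                 charge_bucket(u["totalCumlativeCharge"])) for u in users)
```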
How to Collaborate With
 Front Analytic Tools
Front-end Architecture
[Figure: MongoDB serves front tools via PyMongo and sleepy.mongoose (REST Interface) into a Web UI, for social data analysis and data analysis]
Web UI and Mongo
Data Table: jQuery.DataTables
  [ Data Table ]
• Want to share a daily summary
• Want to see data from many viewpoints
• Want to implement easily

jQuery.DataTables features:
1. Variable-length pagination
2. On-the-fly filtering
3. Multi-column sorting with data type detection
4. Smart handling of column widths
5. Scrolling options for the table viewport
6. ...
Graph: jQuery.HighCharts
  [ Graph ]
• Want to visualize data
• Handle mainly time-series data
• Want to implement easily

HighCharts features:
1. Numerous chart types
2. Simple configuration syntax
3. Multiple axes
4. Tooltip labels
5. Zooming
6. ...
sleepy.mongoose

• [REST Interface + Mongo]
   • Get Data by HTTP GET/POST Request
   • sleepy.mongoose
      ‣ request as “/db_name/collection_name/_command”
      ‣ made by a 10gen engineer: @kchodorow
      ‣ Sleepy.Mongoose: A MongoDB REST Interface
sleepy.mongoose

//start server
> python httpd.py
…listening for connections on http://localhost:27080


//connect to MongoDB
> curl --data server=localhost:27017 'http://localhost:27080/_connect'


//request example
> http://localhost:27080/playshop/daily_charge/_find?criteria={}
&limit=10&batch_size=10


{"ok": 1, "results": [{"_id": "…", "date": … }, {"_id": …}], "id": 0}
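Building such a _find request URL from Python can be sketched as follows (`find_url` is a hypothetical helper, not part of sleepy.mongoose):

```python
import json
from urllib.parse import quote

def find_url(base, db, collection, criteria, limit=10, batch_size=10):
    """Build a sleepy.mongoose _find URL with a JSON-encoded criteria."""
    q = quote(json.dumps(criteria, separators=(",", ":")))
    return ("%s/%s/%s/_find?criteria=%s&limit=%d&batch_size=%d"
            % (base, db, collection, q, limit, batch_size))

url = find_url("http://localhost:27080", "playshop", "daily_charge", {})
```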
JSON: Mongo <---> Ajax




[Figure: the Web UI fetches JSON from sleepy.mongoose (REST Interface) via Ajax GET]

• jQuery and MongoDB are compatible: both speak JSON
• No need to generate HTML tags (such as <table>) on the server side
Example: Web UI
R and Mongo
Collection: user_registration
> db.user_registration.find({userId: "7777"}).forEach(printjson)
{
    "_id" : "2010-06-29+7777+Registration",       Want to know the relation
    "userId" : "7777",                            between user attributions
    "actionType" : "Registration",
    "category" : {
         "R1" : "True",
         "T" : "ll",
         …
    },
    "firstCharge" : "2010-07-07",
    "lastLogin" : "2010-09-30",
    "playTerm" : 94,
    "totalCumlativeCharge" : 50000,
    "totalMonthCharge" : 10000,
    …
}
R Code: Access MongoDB
       Using sleepy.mongoose
##### LOAD LIBRARY #####
library(RCurl)
library(rjson)
##### CONF #####
today.str    <-    format(Sys.time(), "%Y-%m-%d")
url.base     <-    "http://localhost:27080"
mongo.db     <-    "playshop"
mongo.col    <-    "user_registration"
mongo.base   <-    paste(url.base, mongo.db, mongo.col, sep="/")
mongo.sort   <-    ""
mongo.limit <-     "limit=100000"
mongo.batch <-     "batch_size=100000"
R Code: Access MongoDB
             Using sleepy.mongoose
##### FUNCTION #####
find <- function(url){
    mongo <- fromJSON(getURL(url))
    docs <- mongo$results
    makeTable(docs) # my own table-building helper
}
# Example
# Using sleepy.mongoose https://github.com/kchodorow/sleepy.mongoose
mongo.criteria <- paste0('_find?criteria=',
     '{"totalCumlativeCharge":{"$gt":0,"$lte":1000}}')
mongo.query <- paste(mongo.criteria, mongo.sort,
     mongo.limit, mongo.batch, sep="&")
url <- paste(mongo.base, mongo.query, sep="/")
user.charge.low <- find(url)
The Result
# Result: 10th Document

[[10]]
[[10]]$playTerm
[1] 31

[[10]]$lastUpdate
[1] "2011-02-24"

[[10]]$userId
[1] "7777"

[[10]]$totalCumlativeCharge
[1] 10000

[[10]]$lastLogin
[1] "2011-02-21"

[[10]]$date
[1] "2011-01-22"

[[10]]$`_id`
[1] "2011-02-12+18790376+Registration"

...
Make a Data Table from The Result

# Result: Translate Document to Table

        playTerm totalWinRate totalCumlativeCharge totalCommitNum totalWinNum
 [1,]         56           42                 1000            533         224
 [2,]         57           33                 1000            127          42
 [3,]         57           35                 1000            654         229
 [4,]         18           31                 1000             49          15
 [5,]         77           35                 1000            982         345
 [6,]         77           45                 1000            339         153
 [7,]         31           44                 1000             70          31
 [8,]         76           39                 1000            229          89
 [9,]         40           21                 1000            430          92
[10,]         26           40                 1000             25          10
...
Scatter Plot / Matrix
[Figure: scatter-plot matrix, colored by category (user attribution)]

# Run as a batch command
$ R --vanilla --quiet < mongo2R.R
Munin and MongoDB
Monitoring DB Stats
[Figure: Munin graphs of MongoDB statistics]
Munin configuration examples - MongoDB

https://github.com/erh/mongo-munin

https://github.com/osinka/mongo-rs-munin
My Future
Analytic Architecture
Realtime Analysis with MongoDB
• Access logs and user trace logs arrive hourly / in realtime via Flume into per-hour capped collections
• Trimming, filtering, and sum up feed user_access and user_trace
• Map Reduce and modifier sum up produce daily/hourly_access and daily/hourly_trace
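The "modifier sum up" step suggests upserts with $inc. A minimal sketch that only constructs the update spec (running it needs a live MongoDB, so the call is left as a comment; names like `build_inc` and `hourly_access` are assumptions, not the deck's actual code):

```python
def build_inc(doc_id, path, n=1):
    """Build a query/update pair that increments a per-path counter."""
    query = {"_id": doc_id}
    update = {"$inc": {"count.%s" % path: n}}
    return query, update

query, update = build_inc("2011-02-12+hourly", "/MyPage")
# With a live connection one would run, e.g.:
# db.hourly_access.update(query, update, upsert=True)
```

Because $inc is applied atomically server-side, many collector processes can sum into the same hourly document without read-modify-write races.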
Flume
[Figure: servers A–F ship access logs and user trace logs hourly / in realtime to a Flume collector, which writes into MongoDB through the Flume plugin]
An Output From
                 Mongo-Flume Plugin
> db.flume_capped_21.find().limit(1).forEach(printjson)
{
        "_id" : ObjectId("4d658187de9bd9f24323e1b6"),
        "timestamp" : "Wed Feb 23 2011 21:52:06 GMT+0000 (UTC)",
        "nanoseconds" : NumberLong("562387389278959"),
        "hostname" : "ip-10-131-27-115.ap-southeast-1.compute.internal",
        "priority" : "INFO",
        "message" : "202.32.107.42 - - [14/Feb/2011:04:30:32 +0900] "GET /
avatar2-gree.4d537100/res/swf/avatar/18051727/5/useravatar1582476746.swf?
opensocial_app_id=472&opensocial_viewer_id=36858644&o
pensocial_owner_id=36858644 HTTP/1.1" 200 33640 "-" "DoCoMo/2.0 SH01C
(c500;TB;W24H16)"",
        "metadata" : {}
}



Mongo Flume Plugin: https://github.com/mongodb/mongo-hadoop/tree/master/flume_plugin
Summary
• Almighty as an analytic data server
  • schema-free: social game data are changeable
  • rich queries: important for analyzing from many points of view
  • powerful aggregation: map reduce
  • mongo shell: analyzing from the mongo shell is speedy and handy

• More...
  • Scalability: replication and sharding are very easy to set up
  • Node.js: enables server-side scripting with Mongo
My Presentations (in Japanese)
• MongoDB + Web UI:
  http://www.slideshare.net/doryokujin/mongodb-uimongodb
• MongoDB + Ajax + GraphDB:
  http://www.slideshare.net/doryokujin/mongodbajaxgraphdb-5774546
• Hadoop + MongoDB:
  http://www.slideshare.net/doryokujin/hadoopmongodb
• GraphDB:
  http://www.slideshare.net/doryokujin/graphdbgraphdb
I ♥ MongoDB JP

• continue to be an organizer of MongoDB JP
• continue to propose many use cases of MongoDB
  • ex: Social Data, Log Data, Medical Data, ...

• support MongoDB users
  • by document translation, user-group, IRC, blog, book,
    twitter,...

• boosting services and products using MongoDB
Thank you for coming to
       Mongo Tokyo!!

[Contact me]
twitter: doryokujin
skype: doryokujin
mail: mr.stoicman@gmail.com
blog: http://d.hatena.ne.jp/doryokujin/
MongoDB JP: https://groups.google.com/group/mongodb-jp?hl=ja

MongoATL: How Sourceforge is Using MongoDBRick Copeland
 
Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Jesse Wang
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...MongoDB
 
Ebay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayEbay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayDataStax Academy
 
NOSQL uma breve introdução
NOSQL uma breve introduçãoNOSQL uma breve introdução
NOSQL uma breve introduçãoWise Systems
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDBMongoDB
 
An Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media PlatformAn Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media PlatformMongoDB
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)Kevin Weil
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBMongoDB
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 

Viewers also liked (14)

eBay Cloud CMS based on NOSQL
eBay Cloud CMS based on NOSQLeBay Cloud CMS based on NOSQL
eBay Cloud CMS based on NOSQL
 
No sql e as vantagens na utilização do mongodb
No sql e as vantagens na utilização do mongodbNo sql e as vantagens na utilização do mongodb
No sql e as vantagens na utilização do mongodb
 
MongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDBMongoATL: How Sourceforge is Using MongoDB
MongoATL: How Sourceforge is Using MongoDB
 
Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action: Semantic Wiki: Social Semantic Web In Action:
Semantic Wiki: Social Semantic Web In Action:
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
 
ebay
ebayebay
ebay
 
Ebay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBayEbay: DB Capacity planning at eBay
Ebay: DB Capacity planning at eBay
 
NOSQL uma breve introdução
NOSQL uma breve introduçãoNOSQL uma breve introdução
NOSQL uma breve introdução
 
Artigo Nosql
Artigo NosqlArtigo Nosql
Artigo Nosql
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
An Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media PlatformAn Elastic Metadata Store for eBay’s Media Platform
An Elastic Metadata Store for eBay’s Media Platform
 
NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)NoSQL at Twitter (NoSQL EU 2010)
NoSQL at Twitter (NoSQL EU 2010)
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDB
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 

Similar to Social Data and Log Analysis Using MongoDB

Introduction to MapReduce & hadoop
Introduction to MapReduce & hadoopIntroduction to MapReduce & hadoop
Introduction to MapReduce & hadoopColin Su
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analyticsMongoDB
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & AggregationMongoDB
 
Mongodb intro
Mongodb introMongodb intro
Mongodb introchristkv
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentationJoseph Adler
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)Paul Chao
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In HbaseGeorge Ang
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongoMichael Bright
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and PythonMike Bright
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
MongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewMongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewAntonio Pintus
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 
MongoDB at ZPUGDC
MongoDB at ZPUGDCMongoDB at ZPUGDC
MongoDB at ZPUGDCMike Dirolf
 
MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)Mike Dirolf
 
Getting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDBGetting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDBMongoDB
 
Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarMongoDB
 
Introducing the Seneca MVP framework for Node.js
Introducing the Seneca MVP framework for Node.jsIntroducing the Seneca MVP framework for Node.js
Introducing the Seneca MVP framework for Node.jsRichard Rodger
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRaghunath A
 
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19Henrik Ingo
 

Similar to Social Data and Log Analysis Using MongoDB (20)

Introduction to MapReduce & hadoop
Introduction to MapReduce & hadoopIntroduction to MapReduce & hadoop
Introduction to MapReduce & hadoop
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
Mongodb intro
Mongodb introMongodb intro
Mongodb intro
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentation
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
Hypercubes In Hbase
Hypercubes In HbaseHypercubes In Hbase
Hypercubes In Hbase
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
MongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overviewMongoDB: a gentle, friendly overview
MongoDB: a gentle, friendly overview
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
MongoDB at ZPUGDC
MongoDB at ZPUGDCMongoDB at ZPUGDC
MongoDB at ZPUGDC
 
MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)MongoDB hearts Django? (Django NYC)
MongoDB hearts Django? (Django NYC)
 
Getting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDBGetting Started with Geospatial Data in MongoDB
Getting Started with Geospatial Data in MongoDB
 
Operational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB WebinarOperational Intelligence with MongoDB Webinar
Operational Intelligence with MongoDB Webinar
 
Introducing the Seneca MVP framework for Node.js
Introducing the Seneca MVP framework for Node.jsIntroducing the Seneca MVP framework for Node.js
Introducing the Seneca MVP framework for Node.js
 
20120816 nodejsdublin
20120816 nodejsdublin20120816 nodejsdublin
20120816 nodejsdublin
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
Whats new in mongoDB 2.4 at Copenhagen user group 2013-06-19
 

More from Takahiro Inoue

Treasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTreasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTakahiro Inoue
 
トレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングトレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングTakahiro Inoue
 
Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Takahiro Inoue
 
トレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するトレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するTakahiro Inoue
 
20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューションTakahiro Inoue
 
トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方Takahiro Inoue
 
オンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータオンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータTakahiro Inoue
 
事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612Takahiro Inoue
 
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)Takahiro Inoue
 
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜Takahiro Inoue
 
Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Takahiro Inoue
 
Hadoop and the Data Scientist
Hadoop and the Data ScientistHadoop and the Data Scientist
Hadoop and the Data ScientistTakahiro Inoue
 
MongoDB: Intro & Application for Big Data
MongoDB: Intro & Application  for Big DataMongoDB: Intro & Application  for Big Data
MongoDB: Intro & Application for Big DataTakahiro Inoue
 
An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsAn Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsTakahiro Inoue
 
An Introduction to Tinkerpop
An Introduction to TinkerpopAn Introduction to Tinkerpop
An Introduction to TinkerpopTakahiro Inoue
 
An Introduction to Neo4j
An Introduction to Neo4jAn Introduction to Neo4j
An Introduction to Neo4jTakahiro Inoue
 
The Definition of GraphDB
The Definition of GraphDBThe Definition of GraphDB
The Definition of GraphDBTakahiro Inoue
 
Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Takahiro Inoue
 
Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Takahiro Inoue
 

More from Takahiro Inoue (20)

Treasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTreasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC Demo
 
トレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングトレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティング
 
Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界
 
トレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するトレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解する
 
20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション
 
トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方
 
オンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータオンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータ
 
事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612
 
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
 
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
 
Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!
 
Hadoop and the Data Scientist
Hadoop and the Data ScientistHadoop and the Data Scientist
Hadoop and the Data Scientist
 
MongoDB: Intro & Application for Big Data
MongoDB: Intro & Application  for Big DataMongoDB: Intro & Application  for Big Data
MongoDB: Intro & Application for Big Data
 
An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsAn Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB Plugins
 
An Introduction to Tinkerpop
An Introduction to TinkerpopAn Introduction to Tinkerpop
An Introduction to Tinkerpop
 
An Introduction to Neo4j
An Introduction to Neo4jAn Introduction to Neo4j
An Introduction to Neo4j
 
The Definition of GraphDB
The Definition of GraphDBThe Definition of GraphDB
The Definition of GraphDB
 
Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)
 
Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)
 
Advanced MongoDB #1
Advanced MongoDB #1Advanced MongoDB #1
Advanced MongoDB #1
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Social Data and Log Analysis Using MongoDB

  • 1. Social Data and Log Analysis Using MongoDB 2011/03/01(Tue) #mongotokyo doryokujin
  • 2. Self-Introduction • doryokujin (Takahiro Inoue), Age: 25 • Education: University of Keio • Master of Mathematics March 2011 ( Maybe... ) • Major: Randomized Algorithms and Probabilistic Analysis • Company: Geisha Tokyo Entertainment (GTE) • Data Mining Engineer (only me, part-time) • Organized Community: • MongoDB JP, Tokyo Web Mining
  • 3. My Job • I’m a Fledgling Data Scientist • Development of analytical systems for social data • Development of recommendation systems for social data • My Interest: Big Data Analysis • How to generate logs scattered across many servers • How to store and access data • How to analyze and visualize billions of records
  • 4. Agenda • My Company’s Analytic Architecture • How to Handle Access Logs • How to Handle User Trace Logs • How to Collaborate with Front Analytic Tools • My Future Analytic Architecture
  • 5. Agenda Hadoop, Mongo Map Reduce • My Company’s Analytic Architecture Hadoop, Schema Free • How to Handle Access Logs • How to Handle User Trace Logs REST Interface, JSON • How to Collaborate with Front Analytic Tools Capped Collection, • My Future Analytic Architecture Modifier Operation Of Course Everything With
  • 7. Social Game (Mobile): Omiseyasan • Enjoy arranging their own shop (and avatar) • Communicate with other users by shopping, part-time, ... • Buy seeds of items to display their own shop
  • 9. Back-end Architecture Pretreatment: Trimming, As a Central Data Server Validation, Filtering,... Dumbo (Hadoop Streaming) PyMongo Back Up To S3
  • 10. Front-end Architecture sleepy.mongoose (REST Interface) PyMongo Web UI Social Data Analysis Data Analysis
  • 11. Environment • MongoDB: 1.6.4 • PyMongo: 1.9 • Hadoop: CDH2 ( soon update to CDH3 ) • Dumbo: Simple Python Module for Hadoop Streaming • Cassandra: 0.6.11 • R, Neo4j, jQuery, Munin, ... • [Data Size (a rough estimate)] • Access Log 15GB / day ( gzip ) - 2,000M PV • User Trace Log 5GB / day ( gzip )
  • 12. How to Handle Access Logs
  • 13. How to Handle Access Logs Pretreatment: Trimming, As a Data Server Validation, Filtering, ... Back Up To S3
  • 14. Access Data Flow Caution: need MongoDB >= 1.7.4 user_pageview agent_pageview daily_pageview Pretreatment 2nd Map Reduce user_access hourly_pageview 1st Map Reduce Group by
  • 15. Hadoop • Using Hadoop: Pretreatment of Raw Records • [Map / Reduce] • Read all records • Split each record by ‘\s’ • Filter out unnecessary records (such as *.swf) • Check whether each record is correct • Insert (save) records to MongoDB ※ write operations won’t yet fully utilize all cores
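The pretreatment steps above can be sketched as the map side of a Hadoop Streaming job in pure Python. This is a sketch only: the regex and field names are illustrative assumptions based on the access-log sample on the next slide, not the production Dumbo code.

```python
import re

# Illustrative pattern for the Apache-style access logs shown on the
# next slide (field names are assumptions, not the production schema).
LOG_PATTERN = re.compile(
    r'(?P<ipaddr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) [^"]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def pretreat(line):
    """Trim, validate and filter one raw log line.
    Returns a dict ready to insert into MongoDB, or None to drop it."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None                      # malformed record: drop
    rec = m.groupdict()
    if rec["resource"].endswith(".swf"):
        return None                      # unnecessary static asset: drop
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec
```

In the real pipeline each surviving dict would then be saved via PyMongo (e.g. `db.user_trace.save(rec)`); here the mapper is kept side-effect free so it can be tested locally.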
  • 16. Access Logs 110.44.178.25 - - [19/Nov/2010:04:40:40 +0900] "GET /playshop.4ce13800/battle/ BattleSelectAssetPage.html;jsessionid=9587B0309581914AB7438A34B1E51125-n15.at3?collec tion=12&opensocial_app_id=00000&opensocial_owner_id=00000 HTTP/1.0" 200 6773 "-" "DoCoMo/2.0 ***" 110.44.178.26 - - [19/Nov/2010:04:40:40 +0900] "GET /playshop.4ce13800/shopping/battle/ ShoppingBattleTopPage.html;jsessionid=D901918E3CAE46E6B928A316D1938C3A-n11.a p1?opensocial_app_id=00000&opensocial_owner_id=11111 HTTP/1.0" 200 15254 "-" "DoCoMo/2.0 ***" 110.44.178.27 - - [19/Nov/2010:04:40:40 +0900] "GET /playshop.4ce13800/battle/ BattleSelectAssetDetailPage;jsessionid=202571F97B444370ECB495C2BCC6A1D5-n14.at11?asse t=53&collection=9&opensocial_app_id=00000&opensocial_owner_id=22222 HTTP/1.0" 200 11616 "-" "SoftBank/***" ...(many records)
  • 17. Collection: user_trace &gt; db.user_trace.find({user: "7777", date: "2011-02-12"}).limit(0) .forEach(printjson) { "_id" : "2011-02-12+05:39:31+7777+18343+Access", "lastUpdate" : "2011-02-19", "ipaddr" : "202.32.107.166", "requestTimeStr" : "12/Feb/2011:05:39:31 +0900", "date" : "2011-02-12", "time" : "05:39:31", "responseBodySize" : 18343, "userAgent" : "DoCoMo/2.0 SH07A3(c500;TB;W24H14)", "statusCode" : "200", "splittedPath" : "/avatar2-gree/MyPage", "userId" : "7777", "resource" : "/avatar2-gree/MyPage;jsessionid=...? battlecardfreegacha=1&feed=...&opensocial_app_id=...&opensocial_viewer_id=...& opensocial_owner_id=..." }
  • 18. 1st Map Reduce • [Aggregation] • Group by url, date, userId • Group by url, date, userAgent • Group by url, date, time • Group by url, date, statusCode • Map Reduce operations runs in parallel on all shards
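The group-by counting that the 1st map-reduce performs can be mimicked locally in plain Python, which is handy for sanity-checking expected counts before running the job. A sketch; the field names follow the user_trace documents above:

```python
from collections import Counter

def pageview_counts(records):
    """Group access records by (path, userId, date) and count them,
    mirroring the emit/reduce pair of the 1st map-reduce step."""
    counter = Counter()
    for r in records:
        counter[(r["splittedPath"], r["userId"], r["date"])] += 1
    return counter

records = [
    {"splittedPath": "/avatar2-gree/MyPage", "userId": "7777", "date": "2011-02-12"},
    {"splittedPath": "/avatar2-gree/MyPage", "userId": "7777", "date": "2011-02-12"},
    {"splittedPath": "/avatar2-gree/MyPage", "userId": "8888", "date": "2011-02-12"},
]
```

Swapping the key tuple for (path, userAgent, date) or (path, time, date) gives the other three groupings listed above.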
  • 19. 1st Map Reduce with PyMongo map = Code(""" function(){ • this.userId emit({ path:this.splittedPath, • this.userAgent userId:this.userId, date:this.date },1)} • this. timeRange """) • this.statusCode reduce = Code(""" function(key, values){ var count = 0; values.forEach(function(v) { count += 1; }); return {"count": count, "lastUpdate": today}; } """)
  • 20. # ( mongodb >= 1.7.4 ) result = db.user_access.map_reduce(map, reduce, merge_out="user_pageview", full_response=True, query={"date": date}) • About the output collection, there are 4 options (MongoDB >= 1.7.4): • out : overwrite the collection if it already exists • merge_output : merge new data into the old output collection • reduce_output : a reduce operation will be performed on the two values (the same key in the new result and the old collection) and the result will be written to the output collection. • full_response (=False) : If True, return stats on the operation. With inline output, no collection is created and the whole map-reduce operation happens in RAM; the result set must fit within the 8MB/doc limit (16MB/doc in 1.8?).
  • 21. Map Reduce (>=1.7.4): out option in JavaScript • "collectionName" : If you pass a string indicating the name of a collection, then the output will replace any existing output collection with the same name. • { merge : "collectionName" } : This option will merge new data into the old output collection. In other words, if the same key exists in both the result set and the old collection, the new key will overwrite the old one. • { reduce : "collectionName" } : If documents exist for a given key in the result set and in the old collection, then a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output collection. If a finalize function was provided, this will be run after the reduce as well. • { inline : 1 } : With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 8MB limit. http://www.mongodb.org/display/DOCS/MapReduce
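The difference between the out modes can be illustrated with plain dicts mapping key to value. This is a simulation of the semantics described above, not MongoDB code:

```python
def apply_out(old, new, mode, reduce_fn=None):
    """Simulate MongoDB's map-reduce out options on plain dicts.
    'replace' drops the old collection entirely, 'merge' overwrites
    colliding keys with the new value, 'reduce' re-reduces them."""
    if mode == "replace":
        return dict(new)
    result = dict(old)
    for k, v in new.items():
        if k in result and mode == "reduce":
            result[k] = reduce_fn([result[k], v])
        else:
            result[k] = v
    return result

old = {"a": 1, "b": 2}   # yesterday's output collection
new = {"b": 5, "c": 3}   # today's map-reduce result
```

With these inputs, replace loses key "a", merge keeps "a" but overwrites "b", and reduce (with a summing reduce function) combines both values of "b".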
• 22. Collection: user_pageview
  # Queries support regular expressions and range operators (<, >, <=, >=)
  > db.user_pageview.find({
      "_id.userId": "7777",
      "_id.path": /.*MyPage$/,
      "_id.date": {$lte: "2011-02-12"}
  }).limit(1).forEach(printjson)
  #####
  { "_id" : {
      "date" : "2011-02-12",
      "path" : "/avatar2-gree/MyPage",
      "userId" : "7777"
    },
    "value" : { "count" : 10, "lastUpdate" : "2011-02-19" } }
• 24. 2nd Map Reduce with PyMongo
  map = Code("""
  function(){
      emit({ "path": this._id.path, "date": this._id.date },
           { "pv": this.value.count, "uu": 1 });
  }""")
  reduce = Code("""
  function(key, values){
      // must return the same keys the map emits ({"pv": NaN} if not)
      var pv = 0;
      var uu = 0;
      values.forEach(function(v){ pv += v.pv; uu += v.uu; });
      return {"pv": pv, "uu": uu};
  }""")
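The second stage can likewise be sketched in plain Python: starting from the per-user output of the 1st map-reduce, it sums page views (pv) and counts distinct users (uu) per (path, date). The sample input documents are hypothetical:

```python
from collections import defaultdict

# Hypothetical output of the 1st stage: one doc per (path, userId, date)
stage1 = [
    {"_id": {"path": "/MyPage", "userId": "7777", "date": "2011-02-12"},
     "value": {"count": 10}},
    {"_id": {"path": "/MyPage", "userId": "8888", "date": "2011-02-12"},
     "value": {"count": 3}},
]

def second_map_reduce(stage1_docs):
    """Sum pv and count users (uu) per (path, date), like the 2nd map-reduce."""
    agg = defaultdict(lambda: {"pv": 0, "uu": 0})
    for d in stage1_docs:                      # "map": emit (path, date) -> {pv, uu: 1}
        key = (d["_id"]["path"], d["_id"]["date"])
        agg[key]["pv"] += d["value"]["count"]  # "reduce": sum both fields
        agg[key]["uu"] += 1                    # each input doc is one distinct user
    return dict(agg)

result = second_map_reduce(stage1)
```

Counting uu this way is only correct because the first stage already collapsed each user to a single document per (path, date).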
• 25. # ( mongodb >= 1.7.4 )
  result = db.user_pageview.map_reduce(map, reduce,
      out="daily_pageview", merge_output=True,
      full_response=True, query={"date": date})
  • 26. Collection: daily_pageview > db.daily_pageview.find({ "_id.date": "2011-02-12", "_id.path": /.*MyPage$/ }).limit(1).forEach(printjson) { "_id" : { "date" : "2011-02-12", "path" : "/avatar2-gree/MyPage", }, "value" : { "uu" : 53536, "pv" : 539467 } }
• 27. Current Map Reduce is Imperfect
  • [Single Thread per Node]
  • doesn't scale map-reduce across multiple threads
  • [Overwrites the Output Collection]
  • overwrites the old collection (no other options like “merge” or “reduce”)
  # map-reduce code to merge output by hand (MongoDB < 1.7.4)
  result = db.user_access.map_reduce(map, reduce,
      full_response=True, out="temp_collection", query={"date": date})
  [db.user_pageview.save(doc) for doc in db.temp_collection.find()]
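The semantics of the "merge" and "reduce" out-options described above can be illustrated with a small dictionary sketch (this mimics the behavior, not the server implementation; the sample keys are hypothetical):

```python
def merge_out(old, new):
    """'merge' semantics: new keys overwrite matching old keys,
    old keys absent from the new result survive untouched."""
    merged = dict(old)
    merged.update(new)
    return merged

def reduce_out(old, new, reduce_fn):
    """'reduce' semantics: where a key exists in both, the reduce
    function is re-run on the old and new values."""
    merged = dict(old)
    for k, v in new.items():
        merged[k] = reduce_fn(merged[k], v) if k in merged else v
    return merged

old = {"a": {"count": 4}, "b": {"count": 9}}
new = {"b": {"count": 1}}
merged  = merge_out(old, new)    # "b" replaced by the new value
reduced = reduce_out(old, new,
                     lambda x, y: {"count": x["count"] + y["count"]})
```

This is also why the reduce function must be idempotent over its own output: with the "reduce" out-option it is re-applied to already-reduced values.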
• 28. Useful Reference: Map Reduce • http://www.mongodb.org/display/DOCS/MapReduce • A Look At MongoDB 1.8's MapReduce Changes • Map Reduce and Getting Under the Hood with Commands • Map/reduce runs in parallel/distributed? • Map/Reduce parallelism with Master/Slave • mapReduce locks the whole server • mapreduce vs find
  • 29. How to Handle User Trace Logs
  • 30. How to Handle User TRACE Logs Pretreatment: Trimming, As a Data Server Validation, Filtering, ... Back Up To S3
  • 31. User Trace / Charge Data Flow user_charge Pretreatment daily_charge user_trace daily_trace
• 33. Hadoop
  • Using Hadoop: Pretreatment of Raw Records
  • [Map / Reduce]
  • split each record by '\s' (whitespace)
  • filter out unnecessary records
  • check whether records indicate dishonest user behavior
  • unify the format so records can be summed up (raw records are written in a free format)
  • sum up records grouped by “userId” and “actionType”
  • insert (save) records into MongoDB
  ※ write operations won’t yet fully utilize all cores
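The pretreatment steps above can be sketched as a single pure-Python function (in the real pipeline this runs as a Hadoop Streaming job via Dumbo; the sample records and the function name `pretreat` are hypothetical):

```python
from collections import defaultdict

# Hypothetical raw trace records: "userId actionType actionDetail"
raw = [
    "7777 a{Make} make item ksutera",
    "7777 a{Make} make item ksutera",
    "8888 a{Trade} buy 3 itigoke-kis",
    "bad-record",                       # too few fields -> filtered out
]

def pretreat(records):
    """Split on whitespace, drop malformed rows, then sum occurrences
    of each actionDetail grouped by (userId, actionType)."""
    out = defaultdict(lambda: defaultdict(int))
    for line in records:
        parts = line.split(None, 2)     # split by whitespace, keep detail intact
        if len(parts) < 3:
            continue                    # filter unnecessary / broken records
        user_id, action_type, detail = parts
        out[(user_id, action_type)][detail] += 1
    return out

summed = pretreat(raw)
```

Each `(userId, actionType)` group then maps to one document in the `user_trace` collection shown on slide 36.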
  • 34. An Example of User Trace Log UserId ActionType ActionDetail
• 35. An Example of User Trace Log
  # Columns: UserId, ActionType, ActionDetail
  # The value of “actionDetail” must be unified into a common format
  -----Change-----
  ActionLogger a{ChangeP} (Point,1371,1383)
  ActionLogger a{ChangeP} (Point,2373,2423)
  -----Get-----
  ActionLogger a{GetMaterial} (syouhinnomoto,0,-1)
  ActionLogger a{GetMaterial} usesyouhinnomoto
  ActionLogger a{GetMaterial} (omotyanomotoPRO,1,6)
  -----Trade-----
  ActionLogger a{Trade} buy 3 itigoke-kis from gree.jp:00000 #
  -----Make-----
  ActionLogger a{Make} make item kuronekono_n
  ActionLogger a{MakeSelect} make item syouhinnomoto
  ActionLogger a{MakeSelect} (syouhinnomoto,0,1)
  -----PutOn/Off-----
  ActionLogger a{PutOff} put off 1 ksuteras
  ActionLogger a{PutOn} put 1 burokkus @2500
  -----Clear/Clean-----
  ActionLogger a{ClearLuckyStar} Clear LuckyItem_1 4 times
  -----Gatcha-----
  ActionLogger a{Gacha} Play gacha with first free play:
  ActionLogger a{Gacha} Play gacha:
• 36. Collection: user_trace
  # Sum up values grouped by “userId” and “actionType”
  > db.user_trace.find({date: "2011-02-12", actionType: "a{Make}", userId: "7777"}).forEach(printjson)
  { "_id" : "2011-02-12+7777+a{Make}",
    "date" : "2011-02-12",
    "lastUpdate" : "2011-02-19",
    "userId" : "7777",
    "actionType" : "a{Make}",
    "actionDetail" : {
      "make item ksutera" : 3,
      "make item makaron" : 1,
      "make item huwahuwamimiate" : 1,
      …
    } }
  • 37. Collection: daily_trace > db.daily_trace.find({ date:{$gte:"2011-02-12”,$lte:”2011-02-19”}, actionType:"a{Make}"}).forEach(printjson) { "_id" : "2011-02-12+group+a{Make}", "date" : "2011-02-12", "lastUpdate" : "2011-02-19", "actionType" : "a{Make}", "actionDetail" : { "make item kinnokarakuridokei" : 615, "make item banjo-" : 377, "make item itigoke-ki" : 135904, ... }, ... }...
• 39. Collection: user_charge
  # Sum up values grouped by “userId” and “actionType”
  // Top 10 users by charge amount on 2011-02-12
  > db.user_charge.find({date: "2011-02-12"})
      .sort({totalCharge: -1}).limit(10).forEach(printjson)
  { "_id" : "2011-02-12+7777+Charge",
    "date" : "2011-02-12",
    "lastUpdate" : "2011-02-19",
    "totalCharge" : 10000,
    "userId" : "7777",
    "actionType" : "Charge",
    "boughtItem" : {
      " EX" : 13,
      " +6000" : 3,
      " PRO" : 20
    } }
  {…
  • 40. Collection: daily_charge > db.daily_charge.find({date:"2011-02-12",T:"all"}) .limit(10).forEach(printjson) { "_id" : "2011-02-12+group+Charge+all+all", "date" : "2011-02-12", "total" : 100000, "UU" : 2000, "group" : { " " : 1000000, " " : 1000000, ... }, "boughtItemNum" : { " EX" : 8, " " : 730, ... }, "boughtItem" : { " EX" : 10000, " " : 100000, ... } }
• 42. Categorize Users
  • [Categorize Users]
  • by play term
  • by total amount of charge
  • by registration date
  • Attribution sources: user_trace, user_registration, user_charge, user_savedata, user_pageview → user_category
  • [Take a Snapshot of Each Category’s Stats per Week]
• 43. Collection: user_registration
  # Tagging users
  > db.user_registration.find({userId: "7777"}).forEach(printjson)
  { "_id" : "2010-06-29+7777+Registration",
    "userId" : "7777",
    "actionType" : "Registration",
    "category" : { "R1" : "True", "T" : "ll", … },
    "firstCharge" : "2010-07-07",
    "lastLogin" : "2010-09-30",
    "playTerm" : 94,
    "totalCumlativeCharge" : 50000,
    "totalMonthCharge" : 10000,
    … }
• 44. Collection: user_category
  > var cross = new Cross()  # user-defined function
  > MCResign = cross.calc("2011-02-12", "MC", 1)
  # Each cell is the number of users
  Charge(yen)/Term(day)    0(z)  ~¥1k(s)  ~¥10k(m)  ¥100k~(l)   total
  ~1day(z)                50000       10         5          0   50015
  ~1week(s)               50000      100        50          3   50153
  ~1month(m)             100000      200       100          1  100301
  ~3month(l)             100000      300        50          6  100356
  month~(ll)                  0        0         0          0       0
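The cross table above is essentially a bucketed two-way count. A minimal Python sketch of the idea, with bucket boundaries and sample users that are assumptions rather than the deck's actual `Cross` implementation:

```python
from collections import Counter

# Assumed bucket boundaries, mirroring the table's row/column labels
def term_bucket(days):
    if days <= 1:  return "z"     # ~1day
    if days <= 7:  return "s"     # ~1week
    if days <= 30: return "m"     # ~1month
    if days <= 90: return "l"     # ~3month
    return "ll"                   # longer

def charge_bucket(yen):
    if yen == 0:     return "z"
    if yen <= 1000:  return "s"   # ~¥1k
    if yen <= 10000: return "m"   # ~¥10k
    return "l"                    # more

def cross(users):
    """Count users per (play-term bucket, charge bucket) cell."""
    return Counter((term_bucket(u["playTerm"]), charge_bucket(u["totalCharge"]))
                   for u in users)

users = [{"playTerm": 5,  "totalCharge": 500},
         {"playTerm": 5,  "totalCharge": 0},
         {"playTerm": 94, "totalCharge": 50000}]
table = cross(users)
```

Running the same counting over `user_registration` documents yields the weekly snapshot table on the slide.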
  • 45. How to Collaborate With Front Analytic Tools
  • 46. Front-end Architecture sleepy.mongoose (REST Interface) PyMongo Web UI Social Data Analysis Data Analysis
  • 47. Web UI and Mongo
• 48. Data Table: jQuery.DataTables
  [Data Table]
  1. Variable length pagination
  2. On-the-fly filtering
  3. Multi-column sorting with data type detection
  4. Smart handling of column widths
  5. Scrolling options for table viewport
  6. ...
  • Want to share a daily summary
  • Want to see data from many viewpoints
  • Want to implement easily
• 49. Graph: jQuery.HighCharts
  [Graph]
  1. Numerous chart types
  2. Simple configuration syntax
  3. Multiple axes
  4. Tooltip labels
  5. Zooming
  6. ...
  • Want to visualize data
  • Mainly handles time series data
  • Want to implement easily
  • 50. sleepy.mongoose • [REST Interface + Mongo] • Get Data by HTTP GET/POST Request • sleepy.mongoose ‣ request as “/db_name/collection_name/_command” ‣ made by a 10gen engineer: @kchodorow ‣ Sleepy.Mongoose: A MongoDB REST Interface
• 51. sleepy.mongoose
  // start the server
  > python httpd.py
  …listening for connections on http://localhost:27080
  // connect to MongoDB
  > curl --data server=localhost:27017 'http://localhost:27080/_connect'
  // request example
  > http://localhost:27080/playshop/daily_charge/_find?criteria={}&limit=10&batch_size=10
  {"ok": 1, "results": [{"_id": "…", "date": …}, {"_id": …}], "id": 0}
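Building such a `_find` request URL and parsing the response can be sketched in Python; the helper `find_url` is hypothetical, and the response below is a canned sample string rather than a live call to a running sleepy.mongoose server:

```python
import json
from urllib.parse import quote

def find_url(base, db, collection, criteria, limit=10):
    """Build a sleepy.mongoose _find URL with a JSON-encoded criteria."""
    query = "criteria=%s&limit=%d&batch_size=%d" % (
        quote(json.dumps(criteria)), limit, limit)
    return "%s/%s/%s/_find?%s" % (base, db, collection, query)

url = find_url("http://localhost:27080", "playshop", "daily_charge", {}, limit=10)

# Parsing the JSON shape the server returns (sample response, not a live call)
response = '{"ok": 1, "results": [{"_id": "x", "date": "2011-02-12"}], "id": 0}'
docs = json.loads(response)["results"]
```

URL-encoding the criteria matters once queries contain operators like `$lte`, whose braces and quotes are not safe to paste into a URL raw.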
• 52. JSON: Mongo <---> Ajax sleepy.mongoose (REST Interface) Get JSON • jQuery libraries and MongoDB are compatible • It is not necessary to describe HTML tags (such as <table>)
• 57. Collection: user_registration
  # Want to know the relation between user attributions
  > db.user_registration.find({userId: "7777"}).forEach(printjson)
  { "_id" : "2010-06-29+7777+Registration",
    "userId" : "7777",
    "actionType" : "Registration",
    "category" : { "R1" : "True", "T" : "ll", … },
    "firstCharge" : "2010-07-07",
    "lastLogin" : "2010-09-30",
    "playTerm" : 94,
    "totalCumlativeCharge" : 50000,
    "totalMonthCharge" : 10000,
    … }
  • 58. R Code: Access MongoDB Using sleepy.mongoose ##### LOAD LIBRARY ##### library(RCurl) library(rjson) ##### CONF ##### today.str <- format(Sys.time(), "%Y-%m-%d") url.base <- "http://localhost:27080" mongo.db <- "playshop" mongo.col <- "user_registration" mongo.base <- paste(url.base, mongo.db, mongo.col, sep="/") mongo.sort <- "" mongo.limit <- "limit=100000" mongo.batch <- "batch_size=100000"
• 59. R Code: Access MongoDB Using sleepy.mongoose
  ##### FUNCTION #####
  find <- function(url){
    mongo <- fromJSON(getURL(url))
    docs <- mongo$results
    makeTable(docs)  # my function
  }
  # Example
  # Using sleepy.mongoose https://github.com/kchodorow/sleepy.mongoose
  mongo.criteria <- '_find?criteria={"totalCumlativeCharge":{"$gt":0,"$lte":1000}}'
  mongo.query <- paste(mongo.criteria, mongo.sort, mongo.limit, mongo.batch, sep="&")
  url <- paste(mongo.base, mongo.query, sep="/")
  user.charge.low <- find(url)
  • 60. The Result # Result: 10th Document [[10]] [[10]]$playTerm [1] 31 [[10]]$lastUpdate [1] "2011-02-24" [[10]]$userId [1] "7777" [[10]]$totalCumlativeCharge [1] 10000 [[10]]$lastLogin [1] "2011-02-21" [[10]]$date [1] "2011-01-22" [[10]]$`_id` [1] "2011-02-12+18790376+Registration" ...
  • 61. Make a Data Table from The Result # Result: Translate Document to Table playTerm totalWinRate totalCumlativeCharge totalCommitNum totalWinNum [1,] 56 42 1000 533 224 [2,] 57 33 1000 127 42 [3,] 57 35 1000 654 229 [4,] 18 31 1000 49 15 [5,] 77 35 1000 982 345 [6,] 77 45 1000 339 153 [7,] 31 44 1000 70 31 [8,] 76 39 1000 229 89 [9,] 40 21 1000 430 92 [10,] 26 40 1000 25 10 ...
  • 62. Scatter Plot / Matrix Each Category (User Attribution) # Run as a batch command $ R --vanilla --quiet < mongo2R.R
  • 64. Monitoring DB Stats Munin configuration examples - MongoDB https://github.com/erh/mongo-munin https://github.com/osinka/mongo-rs-munin
• 66. Realtime Analysis: Flume with MongoDB
  Access Logs → Flume (realtime, hourly) → capped collection (per hour) → Trimming / Filtering → MapReduce / Modifier (sum up) → user_access, daily/hourly_access
  User Trace Logs → Flume (realtime, hourly) → capped collection (per hour) → Trimming / Filtering → MapReduce / Modifier (sum up) → user_trace, daily/hourly_trace
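The "Modifier (sum up)" step above refers to incrementing counters in a summary document as each log line arrives. A minimal in-memory sketch of that roll-up, where `upsert_inc` plays the role of an upsert with a `$inc` modifier (the hour key and sample log lines are hypothetical):

```python
from collections import defaultdict

# In-memory stand-in for an hourly summary collection:
# one "document" per hour, holding counter fields.
hourly = defaultdict(lambda: defaultdict(int))

def upsert_inc(hour_key, field):
    """Equivalent in spirit to
    update({_id: hour_key}, {"$inc": {field: 1}}, upsert=True)."""
    hourly[hour_key][field] += 1

# Each incoming access-log line bumps a per-path page-view counter
for line in ["GET /MyPage", "GET /MyPage", "GET /Shop"]:
    path = line.split()[1]
    upsert_inc("2011-02-12T21", "pv." + path)
```

Because each line touches only one counter, this stays cheap enough to run continuously against the hourly capped collections instead of waiting for a batch map-reduce.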
• 67. Flume
  Servers A–F (Access Logs / User Trace Logs)
    → Flume (with Mongo plugin), hourly / realtime
    → Collector
    → Mongo DB
• 68. An Output From Mongo-Flume Plugin
  > db.flume_capped_21.find().limit(1).forEach(printjson)
  { "_id" : ObjectId("4d658187de9bd9f24323e1b6"),
    "timestamp" : "Wed Feb 23 2011 21:52:06 GMT+0000 (UTC)",
    "nanoseconds" : NumberLong("562387389278959"),
    "hostname" : "ip-10-131-27-115.ap-southeast-1.compute.internal",
    "priority" : "INFO",
    "message" : "202.32.107.42 - - [14/Feb/2011:04:30:32 +0900] \"GET /avatar2-gree.4d537100/res/swf/avatar/18051727/5/useravatar1582476746.swf?opensocial_app_id=472&opensocial_viewer_id=36858644&opensocial_owner_id=36858644 HTTP/1.1\" 200 33640 \"-\" \"DoCoMo/2.0 SH01C (c500;TB;W24H16)\"",
    "metadata" : {} }
  Mongo Flume Plugin: https://github.com/mongodb/mongo-hadoop/tree/master/flume_plugin
• 70. Summary
  • Almighty as an Analytic Data Server
  • schema-free: social game data are changeable
  • rich queries: important for analyzing from many points of view
  • powerful aggregation: map reduce
  • mongo shell: analysis from the mongo shell is speedy and handy
  • More...
  • Scalability: setting up Replication and Sharding is very easy
  • Node.js: it enables server-side scripting with Mongo
  • 71. My Presentation MongoDB UI MongoDB : http://www.slideshare.net/doryokujin/mongodb-uimongodb MongoDB Ajax GraphDB : http://www.slideshare.net/doryokujin/mongodbajaxgraphdb-5774546 Hadoop MongoDB : http://www.slideshare.net/doryokujin/hadoopmongodb GraphDB GraphDB : http://www.slideshare.net/doryokujin/graphdbgraphdb
• 72. I ♥ MongoDB JP
  • continue to be an organizer of MongoDB JP
  • continue to propose many use cases of MongoDB
  • ex: Social Data, Log Data, Medical Data, ...
  • support MongoDB users
  • via document translation, user group, IRC, blog, book, twitter, ...
  • boost services and products using MongoDB
  • 73. Thank you for coming to Mongo Tokyo!! [Contact me] twitter: doryokujin skype: doryokujin mail: mr.stoicman@gmail.com blog: http://d.hatena.ne.jp/doryokujin/ MongoDB JP: https://groups.google.com/group/mongodb-jp?hl=ja