8. September 2015

"Twitter Analyse" - mit nichts als Oracle12c und SQL/JSON!

Heute stelle ich ein kleines Anwendungsbeispiel für die JSON-Unterstützung in Oracle12c vor. Mittlerweile ist die Version 12.1.0.2 (inkl. JSON-Support) ja auch in der Standard Edition verfügbar, so dass der JSON-Nutzung nun nicht mehr viel im Wege steht ...
Ein sehr nettes Anwendungsbeispiel ist die Analyse von Twitter-Daten. Und auch mit Bordmitteln kann man (in bescheidenem Umfang) schon einiges anstellen. Es braucht zunächst einen Weg, die Tweets von Twitter zu bekommen und in die Datenbank zu laden - da diese in JSON vorliegen, kann man danach die SQL/JSON-Funktionen nutzen, um Bilder wie die folgenden zu produzieren. Diese Screenshots wurden von einer APEX-Anwendung gemacht, die SQL/JSON Abfragen auf eine Tabelle mit 10.000 Tweets durchführt.

Twitter-Analyse: Wer twittert über #orclapex

Twitter-Analyse: Welche Hashtags werden zusammen mit #orclapex verwendet?

Zunächst ist also die Frage, wie man Twitter-Daten in die Datenbank bekommt. Auf diesem Blog zeige ich einen Weg auf, dies mit Datenbank-Bordmitteln und der Twitter-API zu erreichen; dass es darüber hinaus auch kommerzielle Wege gibt, ist völlig klar.
Auf Twitter kann man über eine REST-API per HTTP zugreifen. Über diese API kann man Twitter-Daten abrufen, aber auch Tweets absetzen. Wie man einen Tweet aus der Datenbank heraus absetzen kann, habe ich bereits vor einiger Zeit auf diesem Blog beschrieben. Dieses Posting ist dann auch die Grundlage für heute.
  • Registriert, wie im Blog-Posting beschrieben, eine "Anwendung" auf Twitter. Am Ende bekommt Ihr die nötigen Schlüssel für die Anwendung angezeigt: Access Token und Access Token Secret. Diese, den Consumer Key und das Consumer Secret merkt euch bitte - das brauchen wir gleich noch.
  • Ladet twitter4j von der Webseite twitter4j.org herunter. Twitter4j ist Open Source unter der Apache Lizenz - kann also für alle Anwendungsfälle, sei es kommerziell oder nicht, verwendet werden. Das ZIP-Archiv muss in ein Verzeichnis eurer Wahl ausgepackt werden - am besten legt Ihr es gleich auf den Datenbankserver.
  • Nach dem Auspacken befinden sich die Java-Bibliotheken im Unterverzeichnis lib. Für das einfache Abrufen von Tweets braucht es nur das Archiv twitter4j-core-4.0.4.jar. Das wird nun wie folgt in die Datenbank geladen.
    $ loadjava -user {dbuser}/{dbpassword}@{hostname}:{port}/{servicename} -o -r -v twitter4j-core-4.0.4.jar
    
    Ihr sehr nun einige Meldungen und zum Schluß kommen einige Fehlermeldungen, die Ihr aber ignorieren könnt.
    skipping : resource META-INF/maven/org.twitter4j/twitter4j-core/pom.xml
    skipping : resource META-INF/maven/org.twitter4j/twitter4j-core/pom.properties
    The following operations failed
        class twitter4j/internal/logging/SLF4JLogger: resolution
        class twitter4j/internal/logging/SLF4JLoggerFactory: resolution
    exiting  : Failures occurred during processing
    
Damit sind die Vorbereitungen abgeschlossen. Nun braucht es noch etwas Code, damit die Twitter4j-Bibliothek für die SQL-Ebene in der Datenbank verfügbar wird. Zunächst etwas Java Code - die folgende Java-Klasse kapselt die Aufrufe in die Twitter4j-Bibliothek und stellt Methoden bereit, die direkt auf eine PL/SQL Funktion abgebildet werden können. Achtet darauf, den Proxy-Server in den Zeilen 33 bis 36 einzutragen und die Twitter-Accountschlüssel (Access Token, Access Token Secret, Consumer Key und Consumer Secret ) in den Zeilen 48 bis 51 einzutragen. Danach lasst Ihr das Skript in der Datenbank laufen.
set define off

create or replace and compile java source named TwitterInterfaceCode as
import twitter4j.Status;
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Twitter;
import twitter4j.json.*;
import twitter4j.conf.*;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.auth.AccessToken;
import twitter4j.auth.RequestToken;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import java.util.*;
import oracle.jdbc.*;
import oracle.sql.*;
import java.sql.*;

public class TwitterInterface {
  static Twitter twitter = null;
  static Vector v = new Vector();
  static Object[] o = new Object[3];
  static Connection con = null;
  static ArrayDescriptor aDescr = null;
  static ArrayDescriptor aJsonDescr = null;
  static StructDescriptor sDescr = null;

  static  {
    System.setProperty("http.proxyHost",  "{proxyserver-hostname}");
    System.setProperty("http.proxyPort",  "{proxyserver-port}");
    System.setProperty("https.proxyHost", "{proxyserver-hostname}");
    System.setProperty("https.proxyPort", "{proxyserver-port}");
 
    ConfigurationBuilder confbuilder  = new ConfigurationBuilder(); 
    // confbuilder.setUseSSL(true); 

    /*
     ADD HERE: 
       AccessToken, AccessTokenSecret, ConsumerKey and ConsumerSecret
       
     YOU GET THESE VALUES FROM TWITTER AFTER REGISTERING YOUR APPLICATION
    */    

    confbuilder.setOAuthAccessToken("") 
    .setOAuthAccessTokenSecret("") 
    .setOAuthConsumerKey("") 
    .setOAuthConsumerSecret(""); 


    confbuilder.setJSONStoreEnabled(true);
    twitter = new TwitterFactory(confbuilder.build()).getInstance(); 

    try {
      con = DriverManager.getConnection("jdbc:default:connection:");
      aJsonDescr = ArrayDescriptor.createDescriptor("STATUS_JSON_CT", con);
    } catch (Exception e) {
      e.printStackTrace(System.out);
    }
  }

  public static ARRAY getJsonTweets(String pSearch, long pSinceId) throws Exception {
    v.setSize(0);
    CLOB cl;
    Query query = new Query(pSearch).count(50).sinceId(pSinceId);
    QueryResult result = twitter.search(query);

    for (Status tweet : result.getTweets()) {
      cl = CLOB.createTemporary(con, true, CLOB.DURATION_CALL);
      cl.setString(1, DataObjectFactory.getRawJSON(tweet));
      v.add(cl);
    }
    while (v.size() < 400 && result.hasNext()) {
      query = result.nextQuery();
      result = twitter.search(query);
      for (Status tweet : result.getTweets()) {
        cl = CLOB.createTemporary(con, true, CLOB.DURATION_CALL);
        cl.setString(1, DataObjectFactory.getRawJSON(tweet));
        v.add(cl);
      }
    }  
    return new ARRAY(aJsonDescr, con, v.toArray());
  }
}
/
sho err
Danach folgt der PL/SQL-Code. Es braucht nur einen Object Type und eine PL/SQL-Function, welche direkt auf die Java-Methode TwitterInterface.getJsonTweets abgebildet wird.
create type status_json_ct as table of clob
/

create or replace function get_tweets_json(p_search in varchar2, p_Id in number) return status_json_ct
is language java name 'TwitterInterface.getJsonTweets(java.lang.String, long) return oracle.sql.ARRAY';
/
sho err
Bevor man die Table-Function GET_TWEETS_JSON nun nutzen kann, fehlen aber noch einige Java-Privilegien. Diese sind nötig, damit die Java Stored Procedure den Proxy-Server setzen und sich über das Netzwerk zu Twitter verbinden kann. Der folgende Aufruf muss als DBA (SYS) gemacht werden. Es wird davon ausgegangen, dass der DB-User, mit dem Ihr arbeitet, TWITTER heißt - wenn nicht, müsst Ihr das Skript anpassen. Anschließend solltet Ihr die Datenbanksession, mit der Ihr den Java-Code geladen habt, als TWITTER schließen und wieder öffnen. Damit stellt Ihr sicher, dass die gesetzten Java-Privilegien auch wirklich geladen werden.
begin
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'http.proxyHost', 'read,write' );
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'http.proxyPort', 'read,write' );
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'https.proxyHost', 'read,write' );
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'https.proxyPort', 'read,write' );

 dbms_java.grant_permission('TWITTER', 'SYS:java.lang.reflect.ReflectPermission', 'suppressAccessChecks', '' );

 dbms_java.grant_permission('TWITTER', 'SYS:java.net.SocketPermission', 'api.twitter.com:443', 'connect,resolve' );
end;
/
sho err

commit
/
Nun könnt Ihr die Table-Function (als TWITTER) testen. Erster Versuch.
SQL> select * from table(GET_TWEETS_JSON('#sqlmagic',-1));

COLUMN_VALUE
--------------------------------------------------------------------------------
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr

12 Zeilen ausgewählt.
Da sieht man noch nicht viel ... mit einem set long 20000 könnt Ihr das gesamte JSON in SQL*Plus sichtbar machen - allerdings werder Ihr sehen, dass es ein wenig Einarbeitung in das JSON-Format von Twitter braucht ...
SQL> select * from table(GET_TWEETS_JSON('#sqlmagic',-1)) where rownum = 1;

COLUMN_VALUE
--------------------------------------------------------------------------------
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
uncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":640811209986572
288,"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"created_at":"Mo
n Sep 07 08:57:52 +0000 2015","favorite_count":0,"place":null,"coordinates":null
,"metadata":{"result_type":"recent","iso_language_code":"en"},"text":"RT @cczars
ki: Need to process #HTML text with SQL or #PLSQL ..? Deal with entity reference
s as follows #orcldb #sqlmagic http://t.co/IqQrTJ\u2026","contributors":null,"re
tweeted_status":{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sens
itive":false,"truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id"
:640804815308759040,"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,
"created_at":"Mon Sep 07 08:32:28 +0000 2015","favorite_count":3,"place":null,"c
oordinates":null,"metadata":{"result_type":"recent","iso_language_code":"en"},"t
ext":"Need to process #HTML text with SQL or #PLSQL ..? Deal with entity referen
ces as follows #orcldb #sqlmagic http://t.co/IqQrTJjhpC","contributors":null,"ge
o":null,"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"HTML","indices":
[16,21]},{"text":"PLSQL","indices":[39,45]},{"text":"orcldb","indices":[89,96]},
:
Aber ein wenig was kann ich hier schon verraten ... die folgende Abfrage lädt Tweets, die das Hashtag #orcldb verwenden und listet davon den Text, den Twitter-User und den Zeitpunkt. Retweets (also das "Weitersagen" auf Twitter) werden ausgeblendet.
select screen_name, created, text 
from 
  table(get_tweets_json('#orcldb', -1)), 
  json_table(
    column_value,
    '$'
    columns
      screen_name varchar2(20) path '$.user.screen_name',
      text        varchar2(140) path '$.text',
      created     varchar2(30) path '$.created_at'
  )
  where not json_exists(column_value, '$.retweeted_status')
/
Es kommt in etwa folgendes Ergebnis
SCREEN_NAME          CREATED
-------------------- ------------------------------
TEXT
--------------------------------------------------------------------------------
UKOUG                Mon Sep 07 10:03:05 +0000 2015
@FranckPachot examines what happens when deleting a datafile in #RAC and #DB12c
http://t.co/0QkkcCZ9CH #OracleScene #OracleACE #orcldb

cczarski             Mon Sep 07 08:32:28 +0000 2015
Need to process #HTML text with SQL or #PLSQL ..? Deal with entity references as
 follows #orcldb #sqlmagic http://t.co/IqQrTJjhpC

orclDBblogs          Sun Sep 06 23:00:58 +0000 2015
ODTUG Board of Directors' Responsibilities and Nomination Guidelines @odtug http
://t.co/55nYFLjqhC #orcldb #odtug

orclDBblogs          Sun Sep 06 16:10:54 +0000 2015
RMAN  --  9  :  Querying the RMAN Views / Catalog @hemantkchitale http://t.co/qo
ibfCKrLr #orcldb #odtug
Eine zweite Query: Welche Hashtags wurden verwendet - und wie oft ...?
select hashtag, count(*) anzahl from 
  table(get_tweets_json('#orcldb', -1)), 
  json_table(
    column_value,
    '$.entities.hashtags[*]'
    columns
      hashtag varchar2(50) path '$.text'
  )
where not json_exists(column_value, '$.retweeted_status')
group by hashtag order by 2 desc
/

HASHTAG                                                ANZAHL
-------------------------------------------------- ----------
orcldb                                                     41
odtug                                                      35
orclDB                                                      3
Kscope16                                                    2
sqlmagic                                                    1
orclapex                                                    1
HTML                                                        1
Es sei noch erwähnt, dass die Twitter API nur Tweets der in etwa letzten sieben Tage zurückliefert - möchte man Tweets über einen längeren Zeitraum haben, so erreicht man dies am einfachsten, indem man das JSON-Ergebnis der Funktion GET_TWEETS_JSON in eine Tabelle kopiert und das per Scheduler Job täglich durchführt. Da Twitter jeden Tweet im $.id mit einem eindeutigen Schlüssel versieht, ist das tägliche Merge kein Problem.
SQL> select json_value(column_value, '$.id') from table(GET_TWEETS_JSON('#sqlmagic',-1))
  2  /

JSON_VALUE(COLUMN_VALUE,'$.ID')
--------------------------------------------------------------------------------
640811209986572288
640809906568888320
640804815308759040
639480213349527553
639478975169986560
639452231264133120
639133679906746371
639109588076392448
639071128565219328
639070814042746880
639045511656423424
639043399853719552

12 Zeilen ausgewählt.
Hat man dann Tweets über einen gewissen Zeitraum gesammelt, lassen sich mit den SQL/JSON Funktionen bequeme Auswertungen fahren. Neben der hier verwendeten Java API twitter4J ist ein Twitter-Zugriff auch mit reinem PL/SQL möglich; die Pakete UTL_HTTP bzw. APEX_WEB_SERVICE bringen dazu alles Nötige mit. Der komplette Setup erfordert allerdings doch ein paar Schritte mehr (Aufsetzen der PL/SQL Network ACL, Aufsetzen eines Oracle Wallet zur Zertifikatsvalidierung), so dass dies einem eigenen Posting vorbehalten bleiben muss.
Today, I'll present a tiny usage example for the SQL/JSON functions in Oracle12c. In the meantime, version 12.1.0.2 is available as Standard Edition as well, so that JSON can now be used by (close to) everyone.
We'll use the SQL/JSON functions to work on twitter data. Twitter data is being provided in JSON format, so that we can achieve pretty nice things with standard database functionality. First, we need to get the tweets into the database - as soon as we have this, we can perform as much SQL/JSON queries as we want. The following images are from an APEX application is performs SQL/JSON queries on a table with about 10,000 tweets.

Twitter-Analysis: Who tweets about #orclapex

Twitter-Analysis: Which Hashtags are being used together with #orclapex?

To get the tweets into the database, we'll use the twitter API - but instead of coding REST requests over HTTP directly, I'd like to use a Java library (twitter4j) for this purpose. There are also commercial data providers, from whom you can get as much twitter data as you want - without programming. But for this blog posting, we stick with open and standard interfaces.
Some time ago, I already published a blog posting about how to access Twitter from the database. This posting was about how to post a tweet (not how to get twitter data) but the technical approach is pretty much the same. So the posting provides the basic steps for our work today: Open it and perform the following steps:
  • Register, as described in the posting, an "application" on twitter. At the end of the process, you'll be provided the Keys for your application: Access Token and Access Token Secret. Keep these as well as the Consumer Key and the Consumer Secret - we'll need it soon.
  • Download twitter4j from twitter4j.org. This is open source software under the Apache license, so it can be used for commercial purposes as well. Upon download, unpack the ZIP archive - it's best to place it directly on the database server.
  • After extracting, the relevant Java libraries are located in the subfolder lib. For our usecase, the "core" library (twitter4j-core-4.0.4.jar) is sufficient. Load it into your database as follows.
    $ loadjava -user {dbuser}/{dbpassword}@{hostname}:{port}/{servicename} -o -r -v twitter4j-core-4.0.4.jar
    
    You'll see a lot of messages and also some error messages at the end - but these can be ignored.
    skipping : resource META-INF/maven/org.twitter4j/twitter4j-core/pom.xml
    skipping : resource META-INF/maven/org.twitter4j/twitter4j-core/pom.properties
    The following operations failed
        class twitter4j/internal/logging/SLF4JLogger: resolution
        class twitter4j/internal/logging/SLF4JLoggerFactory: resolution
    exiting  : Failures occurred during processing
    
Now, preparations are completed. We have registered an application and therefore have access to the Twitter API, and we have loaded a Java library for requests to Twitter into our database. Now comes the actual code. We need a Java Stored Procedure, which calls the Twitter4j API and provides its functionality as "static" Java methods. These can easily be mapped to PL/SQL and SQL functions afterwards. Run the following script to create the Java Stored Procedure, but don't foget to add the Twitter Keys (Access Token, Access Token Secret, Consumer Key and Consumer Secret) in lines 48 to 51, and your proxy server in lines 33 to 36.
set define off

create or replace and compile java source named TwitterInterfaceCode as
import twitter4j.Status;
import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Twitter;
import twitter4j.json.*;
import twitter4j.conf.*;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.auth.AccessToken;
import twitter4j.auth.RequestToken;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import java.util.*;
import oracle.jdbc.*;
import oracle.sql.*;
import java.sql.*;

public class TwitterInterface {
  static Twitter twitter = null;
  static Vector v = new Vector();
  static Object[] o = new Object[3];
  static Connection con = null;
  static ArrayDescriptor aDescr = null;
  static ArrayDescriptor aJsonDescr = null;
  static StructDescriptor sDescr = null;

  static  {
    System.setProperty("http.proxyHost",  "{proxyserver-hostname}");
    System.setProperty("http.proxyPort",  "{proxyserver-port}");
    System.setProperty("https.proxyHost", "{proxyserver-hostname}");
    System.setProperty("https.proxyPort", "{proxyserver-port}");
 
    ConfigurationBuilder confbuilder  = new ConfigurationBuilder(); 
    // confbuilder.setUseSSL(true); 

    /*
     ADD HERE: 
       AccessToken, AccessTokenSecret, ConsumerKey and ConsumerSecret
       
     YOU GET THESE VALUES FROM TWITTER AFTER REGISTERING YOUR APPLICATION
    */    

    confbuilder.setOAuthAccessToken("") 
    .setOAuthAccessTokenSecret("") 
    .setOAuthConsumerKey("") 
    .setOAuthConsumerSecret(""); 


    confbuilder.setJSONStoreEnabled(true);
    twitter = new TwitterFactory(confbuilder.build()).getInstance(); 

    try {
      con = DriverManager.getConnection("jdbc:default:connection:");
      aJsonDescr = ArrayDescriptor.createDescriptor("STATUS_JSON_CT", con);
    } catch (Exception e) {
      e.printStackTrace(System.out);
    }
  }

  public static ARRAY getJsonTweets(String pSearch, long pSinceId) throws Exception {
    v.setSize(0);
    CLOB cl;
    Query query = new Query(pSearch).count(50).sinceId(pSinceId);
    QueryResult result = twitter.search(query);

    for (Status tweet : result.getTweets()) {
      cl = CLOB.createTemporary(con, true, CLOB.DURATION_CALL);
      cl.setString(1, DataObjectFactory.getRawJSON(tweet));
      v.add(cl);
    }
    while (v.size() < 400 && result.hasNext()) {
      query = result.nextQuery();
      result = twitter.search(query);
      for (Status tweet : result.getTweets()) {
        cl = CLOB.createTemporary(con, true, CLOB.DURATION_CALL);
        cl.setString(1, DataObjectFactory.getRawJSON(tweet));
        v.add(cl);
      }
    }  
    return new ARRAY(aJsonDescr, con, v.toArray());
  }
}
/
sho err
We still can't do anything in SQL or PL/SQL since we only have Java so far. The following two commands create a PL/SQL table function which is being mapped to the above Java Stored Procedure (TwitterInterface.getJsonTweets).
create type status_json_ct as table of clob
/

create or replace function get_tweets_json(p_search in varchar2, p_Id in number) return status_json_ct
is language java name 'TwitterInterface.getJsonTweets(java.lang.String, long) return oracle.sql.ARRAY';
/
sho err
Now all required code has been loaded to the database. An attempt to run it will still fail, since it tries to do things with the network (Network Connections, Changing the Proxy Server) - and these are protected by the Java Security Manager. So our database user needs appropriate Java privileges. The following script grants these - run it with DBA privileges (SYS). It assumes that your database user is named TWITTER - adjust the script if that is not the case. After running the script, close and reopen your database session as TWITTER - this will make sure, that the new privilege set is being loaded.
begin
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'http.proxyHost', 'read,write' );
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'http.proxyPort', 'read,write' );
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'https.proxyHost', 'read,write' );
 dbms_java.grant_permission('TWITTER', 'SYS:java.util.PropertyPermission', 'https.proxyPort', 'read,write' );

 dbms_java.grant_permission('TWITTER', 'SYS:java.lang.reflect.ReflectPermission', 'suppressAccessChecks', '' );

 dbms_java.grant_permission('TWITTER', 'SYS:java.net.SocketPermission', 'api.twitter.com:443', 'connect,resolve' );
end;
/
sho err

commit
/
Now, you can try your new function out. First attempt ...
SQL> select * from table(GET_TWEETS_JSON('#sqlmagic',-1));

COLUMN_VALUE
--------------------------------------------------------------------------------
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr

12 rows selected.
You don't see much - SQL*Plus cuts the returned CLOB after 80 characters. Do a set long 20000 to see the full JSON content - but you'll quickly realize that it will take some time to get familiar with the Twitter JSON format.
SQL> select * from table(GET_TWEETS_JSON('#sqlmagic',-1)) where rownum = 1;

COLUMN_VALUE
--------------------------------------------------------------------------------
{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sensitive":false,"tr
uncated":false,"lang":"en","in_reply_to_status_id_str":null,"id":640811209986572
288,"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,"created_at":"Mo
n Sep 07 08:57:52 +0000 2015","favorite_count":0,"place":null,"coordinates":null
,"metadata":{"result_type":"recent","iso_language_code":"en"},"text":"RT @cczars
ki: Need to process #HTML text with SQL or #PLSQL ..? Deal with entity reference
s as follows #orcldb #sqlmagic http://t.co/IqQrTJ\u2026","contributors":null,"re
tweeted_status":{"retweeted":false,"in_reply_to_screen_name":null,"possibly_sens
itive":false,"truncated":false,"lang":"en","in_reply_to_status_id_str":null,"id"
:640804815308759040,"in_reply_to_user_id_str":null,"in_reply_to_status_id":null,
"created_at":"Mon Sep 07 08:32:28 +0000 2015","favorite_count":3,"place":null,"c
oordinates":null,"metadata":{"result_type":"recent","iso_language_code":"en"},"t
ext":"Need to process #HTML text with SQL or #PLSQL ..? Deal with entity referen
ces as follows #orcldb #sqlmagic http://t.co/IqQrTJjhpC","contributors":null,"ge
o":null,"entities":{"symbols":[],"urls":[],"hashtags":[{"text":"HTML","indices":
[16,21]},{"text":"PLSQL","indices":[39,45]},{"text":"orcldb","indices":[89,96]},
:
So I'll show you one SQL/JSON query which extracts some useful attributes. The following query loads tweets with the Hashtag #orcldb from the Twitter API (GET_TWEETS_JSON). The SQL/JSON part will then extract the Twitter Handle, the Text and the tweet's timestamp. Retweets will be suppressed (see the JSON_EXISTS part).
select screen_name, created, text 
from 
  table(get_tweets_json('#orcldb', -1)), 
  json_table(
    column_value,
    '$'
    columns
      screen_name varchar2(20) path '$.user.screen_name',
      text        varchar2(140) path '$.text',
      created     varchar2(30) path '$.created_at'
  )
  where not json_exists(column_value, '$.retweeted_status')
/
You should see the following result.
SCREEN_NAME          CREATED
-------------------- ------------------------------
TEXT
--------------------------------------------------------------------------------
UKOUG                Mon Sep 07 10:03:05 +0000 2015
@FranckPachot examines what happens when deleting a datafile in #RAC and #DB12c
http://t.co/0QkkcCZ9CH #OracleScene #OracleACE #orcldb

cczarski             Mon Sep 07 08:32:28 +0000 2015
Need to process #HTML text with SQL or #PLSQL ..? Deal with entity references as
 follows #orcldb #sqlmagic http://t.co/IqQrTJjhpC

orclDBblogs          Sun Sep 06 23:00:58 +0000 2015
ODTUG Board of Directors' Responsibilities and Nomination Guidelines @odtug http
://t.co/55nYFLjqhC #orcldb #odtug

orclDBblogs          Sun Sep 06 16:10:54 +0000 2015
RMAN  --  9  :  Querying the RMAN Views / Catalog @hemantkchitale http://t.co/qo
ibfCKrLr #orcldb #odtug
A second query - which "Hashtags" have been used ...? And how often?
select hashtag, count(*) anzahl from 
  table(get_tweets_json('#orcldb', -1)), 
  json_table(
    column_value,
    '$.entities.hashtags[*]'
    columns
      hashtag varchar2(50) path '$.text'
  )
where not json_exists(column_value, '$.retweeted_status')
group by hashtag order by 2 desc
/

HASHTAG                                                ANZAHL
-------------------------------------------------- ----------
orcldb                                                     41
odtug                                                      35
orclDB                                                      3
Kscope16                                                    2
sqlmagic                                                    1
orclapex                                                    1
HTML                                                        1
Please note, that the Twitter API only returns tweets from the last seven days - so the result set returned by Twitter will always be small. But hey, we are in the database, aren't we ...? With a scheduler job, we can fetch the latest tweets for our Hashtag of Interest every day and store the JSON directly into a table. Since Twitter adds a unique ID attribute to each JSON document (attribute $.id), we can easily merge fetched results into our tweets table.
SQL> select json_value(column_value, '$.id') from table(GET_TWEETS_JSON('#sqlmagic',-1))
  2  /

JSON_VALUE(COLUMN_VALUE,'$.ID')
--------------------------------------------------------------------------------
640811209986572288
640809906568888320
640804815308759040
639480213349527553
639478975169986560
639452231264133120
639133679906746371
639109588076392448
639071128565219328
639070814042746880
639045511656423424
639043399853719552

12 rows selected.
After collecting tweets for some time, we can use the SQL/JSON functions to perform any query we want. In addition to the here described approach using the twitter4j Java API we could also do the job with pure PL/SQL - the packages UTL_HTTP and APEX_WEB_SERVICE provide everything we need. But the complete setup requires a few steps (create PL/SQL Network ACL, create an Oracle Wallet for Certificate Validation), so this needs to be done in a separate blog posting.

Beliebte Postings