Documentation for the Code¶

Community Analysis¶

lib.analysis.community.convert_id_name_community(max_hash, community_txt_file, hash_file_txt, reduced_hash_txt, reduced_community)[source]¶

Converts communities from their ID representation to names for easier understanding.

Parameters:

max_hash (int) – max possible hash value
community_txt_file (str) – location of the file containing the community analysis
hash_file_txt (str) – location of the file containing the non-reduced nick_hash
reduced_hash_txt (str) – location of the file containing the reduced nick_hash
reduced_community (bool) – switch between reduced and non-reduced communities

Returns:

None

lib.analysis.community.infomap_igraph(ig_graph, net_file_location=None, reduce_graph=False)[source]¶

Performs igraph-infomap analysis on the nx graph

Parameters:

ig_graph (object) – igraph graph object
net_file_location (str) – location to load the graph from if it is not given in ig_graph
reduce_graph (bool) – toggle to enable or disable graph reduction

Returns:

ig_graph – igraph graph object
community.membership – result of the infomap community analysis

lib.analysis.community.select_top_vertices(ig_graph, pajek_type, top_channels=None, top_users=None, top_id_for_channels_and_user_graphs=None, top_id_for_user_graphs=None)[source]¶

Reduces the ig_graph to include only the top nodes.

Parameters:

ig_graph (object) – igraph graph object
pajek_type (str) – UU/CU/CC

Returns:

ig_graph – updated (reduced) igraph graph object

Channel Analysis¶

lib.analysis.channel.conv_len_conv_refr_time(log_dict, nicks, nick_same_list)[source]¶

Calculates the conversation length (CL), the length of time for which two users communicate: if a message is not replied to within the response time (RT), the next message is considered part of another conversation. This function also calculates the conversation refresh time (CRT): for a pair of users, the time between one conversation ending and the next one starting.

Parameters:

log_dict (dict) – Dictionary of logs data created using reader.py
nicks (List) – List of nicknames created using nickTracker.py
nick_same_list – List of same_nick names created using nickTracker.py

Returns:

row_cl (zip List) – Conversation Length
row_crt (zip List) – Conversation Refresh Time
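
The CL and CRT definitions above can be illustrated with a minimal sketch. This is not the library's implementation: the helper names, the use of plain numeric timestamps (minutes) and the sample values are all assumptions made for illustration.

```python
def split_conversations(timestamps, rt):
    """Group sorted message times into conversations.

    A gap larger than rt (the response time) closes the current
    conversation and opens a new one.
    """
    if not timestamps:
        return []
    conversations = []
    current = [timestamps[0]]
    for previous, now in zip(timestamps, timestamps[1:]):
        if now - previous > rt:
            conversations.append(current)
            current = [now]
        else:
            current.append(now)
    conversations.append(current)
    return conversations


def conversation_metrics(timestamps, rt):
    """Return (conversation lengths, conversation refresh times)."""
    conversations = split_conversations(sorted(timestamps), rt)
    # CL: time from the first to the last message of each conversation.
    lengths = [conv[-1] - conv[0] for conv in conversations]
    # CRT: time from the end of one conversation to the start of the next.
    refresh = [nxt[0] - cur[-1]
               for cur, nxt in zip(conversations, conversations[1:])]
    return lengths, refresh


# Hypothetical message times (minutes) exchanged by one pair of users.
times = [0, 2, 5, 30, 33, 90]
cl, crt = conversation_metrics(times, rt=10)
```

With a response time of 10 minutes, the six messages fall into three conversations, so the sketch produces three lengths and two refresh times.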

lib.analysis.channel.response_time(log_dict, nicks, nick_same_list)[source]¶

Finds the response time (RT) of a message, i.e. the best guess for the time within which a user can expect a reply to their message.

Parameters:

log_dict (dict) – Dictionary of logs data created using reader.py
nicks (List) – List of nicknames created using nickTracker.py
nick_same_list – List of same_nick names created using nickTracker.py
output_directory (str) – Location of output directory

Returns:

rows_RT (zip List) – Response Time

Network Analysis¶

lib.analysis.network.channel_user_presence_graph_and_csv(nicks, nick_same_list, channels_for_user, nick_channel_dict, nicks_hash, channels_hash)[source]¶

Creates a directed graph in which each nick has edges pointing to the IRC channels that nick has participated in. Nick changes are tracked, and only the initial nick is shown if a user changed their nick.

Parameters:

nicks (list) – list of all the nicks
nick_same_list (list) – list of lists mentioning nicks which belong to the same users

Returns:

presence_graph_and_matrix (dict) – contains adjacency matrices and graphs for Acc, Auu and Acu
full_presence_graph (nx graph object)

lib.analysis.network.degree_analysis_on_graph(nx_graph, date=None)[source]¶

Performs degree analysis on the input graph object.

Parameters:	nx_graph (nx_object) – object to perform analysis on
Returns:	None

lib.analysis.network.degree_node_number_csv(log_dict, nicks, nick_same_list)[source]¶

Creates two CSV files listing the number of nodes that have a given in-degree and out-degree respectively, where the degree is the number of nodes a node interacted with. Also plots log(degree) against log(number of nodes) and tries to find its equation by curve fitting.

Parameters:

log_dict (dict) – with key as dateTime.date object and value as {"data": datalist, "channel_name": channels name}
nicks (list) – list of all the nicks
nick_same_list (list) – list of lists mentioning nicks which belong to the same users

Returns:

out_degree (list)
in_degree (list)
total_degree (list)
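
As a rough illustration of "number of nodes with a certain degree", the sketch below counts degrees from a plain edge list using only the standard library. The function name, the (sender, receiver) edge format and the sample edges are assumptions; the library itself works on log_dict and nick lists.

```python
from collections import Counter


def degree_histogram(edges):
    """Count how many nodes have each out-degree and in-degree.

    edges is an iterable of (sender, receiver) pairs. Repeated pairs are
    counted once, so degree means "number of distinct neighbours".
    Nodes with degree zero do not appear in the result.
    """
    unique_edges = set(edges)
    out_degree = Counter(sender for sender, _ in unique_edges)
    in_degree = Counter(receiver for _, receiver in unique_edges)
    # Map degree -> number of nodes having that degree.
    nodes_per_out_degree = Counter(out_degree.values())
    nodes_per_in_degree = Counter(in_degree.values())
    return nodes_per_out_degree, nodes_per_in_degree


# Hypothetical interactions between three users.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
out_hist, in_hist = degree_histogram(edges)
```

Each (degree, count) pair of such a histogram corresponds to one row of the CSV files described above; taking logarithms of both columns gives the data for the log-log plot.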

lib.analysis.network.filter_edge_list(edgelist_file_loc, max_hash, how_many_top)[source]¶

Reduces the edge list by selecting the top nodes through degree analysis.

Parameters:

edgelist_file_loc (str) – location of the edgelist file
max_hash (int) – max possible value of the node_hash in the edgelist
how_many_top (int) – how many top nodes to select in the new edgelist

Returns:

None
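
One way to picture this reduction is the sketch below, which ranks nodes by total degree and keeps only the edges between the top-ranked nodes. It is an assumption-laden illustration: the function name is hypothetical, it works on an in-memory list rather than an edgelist file, and keeping only edges with both endpoints in the top set is a guess at the intended behaviour.

```python
from collections import Counter


def top_nodes_edge_list(edges, how_many_top):
    """Keep only edges whose endpoints are both among the top nodes.

    Nodes are ranked by total degree (in + out), counted once per edge.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    top = {node for node, _ in degree.most_common(how_many_top)}
    return [(s, t) for s, t in edges if s in top and t in top]


# Hypothetical edge list of node hashes.
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 5), (3, 6)]
reduced = top_nodes_edge_list(edges, how_many_top=3)
```

Here nodes 1, 2 and 3 each have degree 3, so only the three edges among them survive.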

lib.analysis.network.message_number_bins_csv(log_dict, nicks, nick_same_list)[source]¶

Creates a CSV file which tracks the number of messages exchanged in a channel, in 48 bins of half an hour each covering the whole day, aggregated over the year.

Parameters:

log_dict (dictionary) – Dictionary of logs data created using reader.py
nicks (List) – List of nicknames created using nickTracker.py
nick_same_list (List) – List of same_nick names created using nickTracker.py

Returns:

None
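
The 48-bin layout can be sketched as follows. The "HH:MM" time format, the helper names and the sample times are assumptions for illustration, not taken from the library.

```python
def half_hour_bin(time_str):
    """Map an "HH:MM" time to one of 48 half-hour bins (0..47)."""
    hours, minutes = (int(part) for part in time_str.split(":"))
    return hours * 2 + minutes // 30


def count_messages_per_bin(times):
    """Return a 48-element list with the message count of each bin."""
    bins = [0] * 48
    for time_str in times:
        bins[half_hour_bin(time_str)] += 1
    return bins


# Hypothetical message times from one day of a channel log.
bins = count_messages_per_bin(["00:05", "00:29", "00:30", "23:59"])
```

Summing such per-day lists element by element over all days of a year gives the aggregated row described above.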

lib.analysis.network.message_number_graph(log_dict, nicks, nick_same_list, DAY_BY_DAY_ANALYSIS=False)[source]¶

Creates a directed graph in which each node represents an IRC user and each directed edge carries a weight giving the number of messages sent and received by that user in the selected time frame.

Parameters:

log_dict (dict) – with key as dateTime.date object and value as {"data": datalist, "channel_name": channels name}
nicks (list) – list of all the nicks
nick_same_list (list) – list of lists mentioning nicks which belong to the same users

Returns:	message_number_graph (nx graph object)

lib.analysis.network.message_time_graph(log_dict, nicks, nick_same_list, DAY_BY_DAY_ANALYSIS=False)[source]¶

Creates a directed graph where each edge denotes a message sent from one user to another, with a timestamp denoting the time at which the message was sent.

Parameters:

log_dict (dictionary) – Dictionary of logs data created using reader.py
nicks (List) – List of nicknames created using nickTracker.py
nick_same_list (List) – List of same_nick names created using nickTracker.py

Returns:

msg_time_graph_list (List) – list of message time graphs for different days
msg_time_aggr_graph – aggregate message time graph where edges are the date and time at which the sender sends a message to the receiver

lib.analysis.network.nick_receiver_from_conn_comp(nick, conn_comp_list)[source]¶

Creates nick_receiver from conn_comp_list. It is a helper function used in create_message_time_graph and message_number_graph.

lib.analysis.network.nick_same_list_to_conn_comp_list(conn_comp_list, corrected_nick)[source]¶

Changes nick_same_list to conn_comp_list, because conn_comp_list is now the main list of all users and nicks. It is a helper function used in message_number_graph and create_message_time_graph.

User Analysis¶

lib.analysis.user.keywords(log_dict, nicks, nick_same_list)[source]¶

Returns keywords for all users.

Parameters:

log_dict (dict) – Dictionary of logs data created using reader.py
nicks (List) – List of nicknames created using nickTracker.py
nick_same_list – List of same_nick names created using nickTracker.py

Returns:

keywords_filtered – filtered keywords for user
user_keyword_freq_dict – dictionary for each user having keywords and their frequency
user_words_dict – keywords for user
nicks_for_stop_words – stop words

lib.analysis.user.keywords_clusters(log_dict, nicks, nick_same_list)[source]¶

Uses keywords to form clusters of words after (optional) TF-IDF.

Parameters:

log_dict (dict) – Dictionary of logs data created using reader.py
nicks (List) – List of nicknames created using nickTracker.py
nick_same_list – List of same_nick names created using nickTracker.py

Returns:

None

lib.analysis.user.nick_change_graph(log_dict, DAY_BY_DAY_ANALYSIS=False)[source]¶

Creates a graph which tracks the nick changes of the users, where each edge has a timestamp denoting the time at which the user changed nick.

Parameters:	log_dict (dict) – Dictionary of logs created using reader.py
Returns:	a list of the day-to-day nick changes if config.DAY_BY_DAY_ANALYSIS=True, otherwise an aggregate nick change graph for the given time period.

lib.analysis.user.top_keywords_for_nick(user_keyword_freq_dict, nick, threshold, min_words_spoken)[source]¶

Outputs the top keywords for a particular nick.

Parameters:

user_keyword_freq_dict (dict) – dictionary for each user having keywords and their frequency
nick (str) – user to do analysis on
threshold (float) – threshold on normalised values to separate meaningful words
min_words_spoken (int) – threshold on the minimum number of words a user must have spoken for the analysis to be performed

Returns:

None

Utility¶

lib.util.build_graphs(nick_sender, nick_receiver, time, year, month, day, day_graph, aggr_graph)[source]¶

Parameters:

nick_sender (str) – person who has sent the message
nick_receiver (str) – person who receives the message
time (str) – time when message is sent
year (str) – year when message is sent
month (str) – month when message is sent
day (str) – day when message is sent
day_graph (networkx directed graph) – a single day's graph to which we add edges
aggr_graph (networkx directed graph) – the aggregate graph for the whole time span, to which we add edges

Returns:

None

lib.util.correctLastCharCR(inText)[source]¶

If the last character of the nick is ‘\’, replaces it with ‘CR’; for example, rohan\ becomes rohanCR. This avoids complications in nx caused by the special character ‘\’.

Parameters:	inText (str) – input nick, checked for ‘\’ at the last position
Returns:	updated string with ‘\’ replaced by CR (if it exists)
Return type:	str

lib.util.correct_last_char_list(rec_list)[source]¶

Corrects the last character of all elements in rec_list.

lib.util.correct_nick_for_(inText)[source]¶

The last character of a nick may be ‘_’, which produces errors in nick matching; this function removes it.

Parameters:	inText (str) – input nick, checked for ‘_’ at last position
Returns:	updated string with ‘_’ removed
Return type:	str
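
A minimal sketch of this clean-up, assuming that only a single trailing underscore is removed (the documentation does not say what happens with several). The function name is hypothetical.

```python
def strip_trailing_underscore(nick):
    """Return nick without its final '_' character, if it has one."""
    return nick[:-1] if nick.endswith("_") else nick


# Hypothetical nicks: IRC clients often append '_' when a nick is taken.
cleaned = [strip_trailing_underscore(n) for n in ["rohan_", "rohan", "_bot"]]
```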

lib.util.create_connected_nick_list(conn_comp_list)[source]¶

Converts each individual list member to a list.

lib.util.extend_conversation_list(nick_sender, nick_receiver, conversation)[source]¶

Takes nick_sender and nick_receiver, adds them to the conversation list and increases the weight.

Parameters:

nick_sender – nick of the user sending a message
nick_receiver – nick of the user to whom the message is being sent
conversation – list of nick_sender and nick_receiver pairs along with the number of times messages were shared between them

Returns:

conversation (list) – list containing all the nicks between whom messages have been shared
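
The bookkeeping described above might look like the sketch below. The layout of each entry as [sender, receiver, weight] is an assumption, since the documentation does not specify it, and the function name is hypothetical.

```python
def extend_conversation(nick_sender, nick_receiver, conversation):
    """Record one message from nick_sender to nick_receiver.

    conversation is assumed to be a list of [sender, receiver, weight]
    entries: an existing pair has its weight incremented, a new pair is
    appended with weight 1.
    """
    for entry in conversation:
        if entry[0] == nick_sender and entry[1] == nick_receiver:
            entry[2] += 1
            return conversation
    conversation.append([nick_sender, nick_receiver, 1])
    return conversation


# Hypothetical exchange between two users.
conversation = []
extend_conversation("alice", "bob", conversation)
extend_conversation("alice", "bob", conversation)
extend_conversation("bob", "alice", conversation)
```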

lib.util.get_year_month_day(day_content)[source]¶

A generator which takes a day_content and yields the year, month and day associated with it.

Parameters:	day_content (dictionary) – a dictionary of the form

{
    "log_data": day_data,
    "auxiliary_data": {
        "channel": channel_name,
        "year": year_iterator,
        "month": month_iterator,
        "day": day_iterator
    }
}

Returns:	year (str), month (str), day (str)
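
Reading the date out of such a dictionary can be sketched as below. For simplicity this hypothetical helper returns a tuple rather than being a generator, and the sample values are invented for illustration.

```python
def year_month_day(day_content):
    """Return the (year, month, day) strings stored in day_content."""
    aux = day_content["auxiliary_data"]
    return aux["year"], aux["month"], aux["day"]


# Hypothetical day_content following the structure documented above.
day_content = {
    "log_data": [],
    "auxiliary_data": {
        "channel": "#kubuntu-devel",
        "year": "2013",
        "month": "01",
        "day": "15",
    },
}
year, month, day = year_month_day(day_content)
```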

lib.util.to_edges(l)[source]¶

A generator which takes a graph (given as a list of nodes) and yields its edges. For example: to_edges(['a', 'b', 'c', 'd']) -> [(a,b), (b,c), (c,d)]

Parameters:	l (list) – graph object to be converted to edge_list
Returns:	edge list of the input graph object
Return type:	str
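
The behaviour shown in the example can be reproduced with a short generator. This is a sketch written from the documented example, not a copy of the library code, and the name consecutive_edges is hypothetical.

```python
def consecutive_edges(nodes):
    """Yield (nodes[i], nodes[i + 1]) for each consecutive pair."""
    iterator = iter(nodes)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # an empty input has no edges
    for current in iterator:
        yield previous, current
        previous = current


edges = list(consecutive_edges(["a", "b", "c", "d"]))
```

Because it is a generator, the result has to be wrapped in list() when the complete edge list is needed at once.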

Visualisation¶

lib.vis.calc_plot_linear_fit(x_in, y_in, output_directory, output_file_name)[source]¶

Calculates and plots a linear fit for the data.

Parameters:

x_in (list of int) – x_axis data
y_in (list of int) – y_axis data
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved

Returns:

None
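
For readers unfamiliar with linear fitting, the sketch below computes an ordinary least-squares line in pure Python. It only illustrates the calculation step under that assumption; which fitting routine the library uses is not stated here, and the plotting step is omitted.

```python
def linear_fit(x_values, y_values):
    """Return (slope, intercept) of the least-squares line y = m*x + c."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(x_values, y_values))
    variance = sum((x - mean_x) ** 2 for x in x_values)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
```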

lib.vis.csv_heatmap_generator_plotly(in_directory, output_directory, output_file_name)[source]¶

Plots heatmaps for all the csv files in the given directory

Parameters:

in_directory (str) – location of input csv files
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved

Returns:

None

lib.vis.exponential_curve_fit_and_plot(data, initial_rows_filter, output_directory, output_file_name)[source]¶

Fits the x-y data to an exponential curve and plots it, after filtering the first initial_rows_filter rows.

Parameters:

data (list of list) – list of list representation of csv data (with 2 coordinates)
initial_rows_filter (int) – analysis on first how many rows
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved

Returns:

a (int) – curve fit variable for the equation a * np.exp(-b * x) + c
b (int) – curve fit variable for the equation a * np.exp(-b * x) + c
c (int) – curve fit variable for the equation a * np.exp(-b * x) + c
mse (int) – mean squared error from the fit
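
The model equation and the error measure named above can be written out in a few lines. This sketch uses only the standard library and does not perform the fit itself; the function names and sample parameters are assumptions for illustration.

```python
import math


def exponential_model(x, a, b, c):
    """Evaluate a * exp(-b * x) + c."""
    return a * math.exp(-b * x) + c


def mean_squared_error(xs, ys, a, b, c):
    """Mean of the squared residuals between ys and the model."""
    residuals = [(y - exponential_model(x, a, b, c)) ** 2
                 for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)


# Hypothetical data generated from the model with a=2, b=0.5, c=1.
xs = [0, 1, 2, 3]
ys = [exponential_model(x, 2.0, 0.5, 1.0) for x in xs]
mse_exact = mean_squared_error(xs, ys, 2.0, 0.5, 1.0)
mse_offset = mean_squared_error(xs, ys, 2.0, 0.5, 2.0)
```

A fitting routine searches for the a, b and c that make this error as small as possible; shifting c by one unit, as in mse_offset, raises the error to about 1.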

lib.vis.exponential_curve_fit_and_plot_x_shifted(data, initial_rows_filter, output_directory, output_file_name)[source]¶

Fits the x-y data to an exponential curve and plots it, after filtering the first initial_rows_filter rows. Also ignores the input until the first non-zero y-coordinate and shifts the graph along the y axis until that first non-zero entry.

Parameters:

data (list of list) – list of list representation of csv data (with 2 coordinates)
initial_rows_filter (int) – analysis on first how many rows
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved

Returns:

a (int) – curve fit variable for the equation a * np.exp(-b * x) + c
b (int) – curve fit variable for the equation a * np.exp(-b * x) + c
c (int) – curve fit variable for the equation a * np.exp(-b * x) + c
first_non_zero_index (int) – amount by which the graph is shifted along the y axis
mse (int) – mean squared error from the fit

lib.vis.generate_group_bar_charts(y_values, x_values, trace_header, output_directory, output_file_name)[source]¶

Plots multiple bar graphs on the same graph.

Example usage:

generate_group_bar_charts(
    [
        [5.10114882, 5.0194652482, 4.9908093076],
        [4.5824497358, 4.7083614037, 4.3812775722],
        [2.6839471308, 3.0441476209, 3.6403820447]
    ],
    ['#kubuntu-devel', '#ubuntu-devel', '#kubuntu'],
    ['head1', 'head2', 'head3'],
    '/home/rohan/Desktop/',
    'multi_box'
)

Parameters:

y_values (list of list) – y_axis data
x_values (list) – x_axis data
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved

Returns:

None

lib.vis.generate_log_plots(filter_val, plot_data, output_directory, output_file_name)[source]¶

Generates log plots for the given time frame, selecting the first filter_val elements and plotting the log of the value on the y axis.

Parameters:

filter_val (int) – number of values to be used from data for plotting
plot_data (list of list) – data to be plotted
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved

Returns:

None

lib.vis.generate_probability_distribution(data, initial_rows_filter)[source]¶

Normalises the y-coordinates by dividing each by the sum of all y-coordinate entries.

Parameters:

data (list of list) – list of list representation of csv data (with 2 coordinates)
initial_rows_filter (int) – analysis on first how many rows

Returns:

x-coordinate (list)
freq (list) – normalised y-coordinates
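
The normalisation can be sketched as follows. Whether the library sums over all rows or only over the first initial_rows_filter rows is not stated; this hypothetical version assumes the latter.

```python
def probability_distribution(data, initial_rows_filter):
    """Normalise the y column of the first initial_rows_filter rows.

    data is a list of [x, y] rows; the returned frequencies sum to 1.
    """
    rows = data[:initial_rows_filter]
    x_values = [row[0] for row in rows]
    y_values = [row[1] for row in rows]
    total = sum(y_values)
    freq = [y / total for y in y_values]
    return x_values, freq


# Hypothetical csv rows; only the first three are used.
x, freq = probability_distribution([[1, 2], [2, 6], [3, 2], [4, 100]], 3)
```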

lib.vis.matplotlob_csv_heatmap_generator(csv_file, output_directory, output_file_name)[source]¶

Plots heatmaps for all the csv files in the given directory. Can be used as a script for generating heatmaps; a faster alternative to plotly.

Parameters:

in_directory (str) – location of input csv files
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved

Returns:

None

lib.vis.normal(loc=0.0, scale=1.0, size=None)¶

Draw random samples from a normal (Gaussian) distribution.

The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently [2], is often called the bell curve because of its characteristic shape (see the example below).

The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution [2].

Parameters:

loc (float or array_like of floats) – Mean (“centre”) of the distribution.
scale (float or array_like of floats) – Standard deviation (spread or “width”) of the distribution.
size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., `(m, n, k)`, then `m * n * k` samples are drawn. If size is `None` (default), a single value is returned if `loc` and `scale` are both scalars. Otherwise, `np.broadcast(loc, scale).size` samples are drawn.

Returns:	out – Drawn samples from the parameterized normal distribution.
Return type:	ndarray or scalar

Parameters:

nx_graph (object) – networkx graph object
membership (list) – membership generated by infomap.community_infomap
output_directory (str) – location to save graph
output_file_name (str) – name of the image file to be saved
vertex_label_text (bool) – toggle between label text and index
show_edges (bool) – toggle to disable/enable edges during visualisation

Returns:

None